Visualization Python Libraries? Part -2

RAVI SHEKHAR TIWARI
Published in DataDrivenInvestor
16 min read · Apr 10, 2021


In this part we will take a look at the types of graphs and charts used to visualize data, and at the information that can be obtained from them.

When dealing with numbers in statistics, data visualization is essential for producing a clear and understandable summary of a dataset. Whether the dataset is large or small, visualizing it with graphs and charts contributes greatly to your audience understanding the message.

There are, however, many types of graphs and charts used in data visualization, and it is sometimes tricky to choose which type is best for your business or data. Each of these charts has its own strengths and weaknesses that make it better than the others in certain situations.

You may need to visualize the result of scientific research, a business chart, an industry infographic, or a section of a pitch deck. Graphs and charts are simple ways of presenting all of this content.

What is a Graph?

A graph is a pictorial representation of data in an organized manner. Graphs are usually formed from various data points, which represent the relationship between two or more things.

A picture is worth a thousand words, they say. A graph, on the other hand, not only says a thousand words but also tells a million stories.

Each point, stroke, color, or shape on a graph has a different meaning that helps in interpreting it. Graphs come in various kinds and differ in structure: some have just points, others have points joined by lines, and so on.

We will be discussing the below mentioned graphs:

  1. Line Graph
  2. Scatter Plot
  3. Area Chart
  4. Histogram
  5. Pie Chart
  6. Bar Chart/Graph
  7. Box Plot

1. Line Graph

Line graphs are represented by a group of data points joined by straight line segments. Each of these data points describes the relationship between the value on the horizontal axis and the value on the vertical axis of the graph.

Fig. 1 Line Graph

The graph may rise, fall, or do both depending on the kind of data being visualized. In economics, for example, a supply curve typically slopes upward as price rises, while a demand curve typically slopes downward.

When constructing a line graph, you may choose whether or not to show the individual data point markers.

Types of Line Graph

  1. Simple Line Graph

In a simple line graph, only one line is plotted on the graph. One of the axes defines the independent variables while the other axis contains dependent variables.

2. Multiple Line Graph

Multiple line graphs contain two or more lines representing more than one variable in a dataset. This type of graph can be used to study two or more variables over the same period of time.

3. Compound Line Graph

A compound line graph is an extension of the simple line graph, which is used when dealing with different groups of data from a larger dataset. Each line in a compound line graph is shaded downwards to the x-axis.

In a compound line graph, each group of data represented by a simple line graph is stacked upon one another.
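To make the stacking concrete, here is a minimal sketch in plain Python (no plotting library) of how the y-values of a compound line graph could be derived from the individual series. The group names and figures are made up for illustration, and `stack_series` is a hypothetical helper, not part of any library.

```python
from itertools import accumulate

# Hypothetical monthly sales for three product groups (illustrative numbers only).
series = {
    "phones": [10, 12, 15, 14],
    "tablets": [5, 7, 6, 9],
    "laptops": [3, 4, 4, 5],
}

def stack_series(groups):
    """Return the cumulative (stacked) y-values for each group, in insertion order.

    The first group is drawn as-is; every later group is drawn on top of the
    running total of the groups before it.
    """
    columns = zip(*groups.values())                 # one tuple of values per time step
    stacked_columns = [list(accumulate(col)) for col in columns]
    stacked_rows = zip(*stacked_columns)            # back to one row per group
    return dict(zip(groups, (list(row) for row in stacked_rows)))

stacked = stack_series(series)
for name, values in stacked.items():
    print(name, values)
```

The last group's stacked line equals the total across all groups, which is why the top edge of a compound line graph also shows the overall trend.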

Advantages of a Line Graph

  • It helps in studying data trends over a period of time.
  • It is easy to read and plot.

Disadvantages of a Line Graph

  • It becomes cluttered and hard to read when too many lines or data points are plotted.
  • Exact values, especially fractions and decimals, are hard to read off the graph.

2. Scatter Plot

Scatter plots are charts used to visualize random variables with dot-like markers that represent each data point. These markers are usually scattered across the chart area of the plot.

Types of Scatter Plot

Scatter plots are grouped into different types according to the correlation of the data points. These correlation types are highlighted below:

  1. Positive Correlation

Two groups of data visualized on a scatter plot are said to be positively correlated if an increase in one implies an increase in the other. A scatter plot diagram can be said to have a high or low positive correlation.

Fig. 2 Scatter Plot

2. Negative Correlation

Two groups of data visualized on a scatter plot are said to be negatively correlated if an increase in one implies a decrease in the other. A scatter plot diagram can be said to have a high or low negative correlation.

3. No Correlation

Two groups of data visualized on a scatter plot are said to have no correlation if there is no clear relationship between them: knowing one value tells you little about the other.
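A scatter plot shows the direction of a relationship by eye, but a number is needed to say how strong it is. The sketch below computes the Pearson correlation coefficient with the standard library only. The data and the 0.3 cut-off used to label the result are assumptions chosen for illustration, not values from this article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Note: raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe(r, threshold=0.3):
    """Label r using an illustrative cut-off; the 0.3 threshold is an assumption."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear correlation"

hours = [1, 2, 3, 4, 5]
score_up = [52, 58, 61, 70, 74]      # rises as hours rise
score_down = [90, 81, 77, 65, 60]    # falls as hours rise

print(describe(pearson_r(hours, score_up)))
print(describe(pearson_r(hours, score_down)))
```

Values of r close to +1 or -1 correspond to a "high" positive or negative correlation, and values close to 0 correspond to no correlation.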

Advantages

  • It clearly shows data spread
  • It is usually colorful and visually appealing

Disadvantages

  • It cannot give the exact extent of correlation.
  • It can only be used to study the relationship between 2 variables.

3. Area Chart

Area charts are used to collectively measure data trends over a period of time by shading the area between the line segment and the x-axis. In simpler terms, an area chart is an extension of the line graph.

Fig. 3 Area Chart

Types of Area Chart

  1. Simple Area Chart

In a simple area chart, the colored segments overlap each other in the chart area. They are placed above each other such that they intersect.

2. Stacked Area Chart

In a stacked area chart, the colored segments are stacked on top of one another so that they do not intersect.

3. 100% Stacked Area Chart

This is a type of stacked area chart where the area occupied by each group of data on the chart is measured as a percentage of its amount from the total data. The vertical axis usually totals a hundred percent.

4. 3-D Area Chart

This is the type of area chart measured on a 3-dimensional space.
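For the 100% stacked variant described above, each value has to be converted into its share of the total at that time step before plotting. A minimal sketch of that conversion follows; the traffic figures are invented and `to_percent_shares` is a hypothetical helper name.

```python
def to_percent_shares(groups):
    """Convert absolute values to each group's percentage of the per-step total."""
    totals = [sum(col) for col in zip(*groups.values())]
    return {
        name: [100 * v / t if t else 0.0 for v, t in zip(values, totals)]
        for name, values in groups.items()
    }

# Hypothetical website visits by source over three months (illustrative numbers only).
visits = {
    "search": [40, 60, 30],
    "social": [30, 20, 30],
    "direct": [10, 20, 40],
}

shares = to_percent_shares(visits)
for name, values in shares.items():
    print(name, values)
```

Because every column of shares adds up to 100, the top edge of the chart is a flat line, and only the relative sizes of the groups remain visible.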

Advantages of an Area Chart

  • It is visually appealing.
  • It gives a clear comparison of different groups of data.
  • The same idea of shading the area under a curve is used in classification tasks, where the area under the ROC curve (AUC) summarizes model performance.

Disadvantages

  • It may be difficult to read when compared to other chart types.

4. Histogram

A histogram visualizes the frequency of discrete and continuous numerical data in a dataset using adjoining rectangular bars. Each bar shows the number of elements that fall into a predefined class interval.
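The counting behind a histogram can be sketched in a few lines of plain Python. The ages, the bin width of 10, and the `histogram_counts` helper are all assumptions made for illustration. Intervals are treated as half-open, so a value equal to an upper boundary is counted in the next bin.

```python
def histogram_counts(values, bin_width, start=None):
    """Count how many values fall into consecutive class intervals.

    Each interval is half-open: [lower, lower + bin_width).
    Returns a list of ((lower, upper), count) pairs.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lower = min(values) if start is None else start
    if lower > min(values):
        raise ValueError("start must not exceed the smallest value")
    n_bins = int((max(values) - lower) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - lower) // bin_width)] += 1
    return [((lower + i * bin_width, lower + (i + 1) * bin_width), c)
            for i, c in enumerate(counts)]

ages = [21, 22, 25, 29, 31, 34, 35, 38, 41, 47, 52]   # illustrative ages
bins = histogram_counts(ages, bin_width=10, start=20)
for (low, high), count in bins:
    print(f"{low}-{high}: {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, so it is worth trying a few widths before drawing conclusions about the distribution.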

Fig. 4 Histogram

Types of Histogram Chart

Histograms are classified into different types depending on the shape of their distribution.

  1. Normal Distribution

A normally distributed histogram is usually bell-shaped and roughly symmetric about its center, with most values clustered around the mean and fewer values toward either end.

2. Bimodal Distribution

A bimodally distributed histogram has two distinct peaks, as if two normally distributed histograms were placed side by side. It usually results from combining two different processes in one dataset.

3. Skewed Distribution

This is an asymmetric graph with an off-center peak and a longer tail on one side. A histogram is said to be right-skewed when the longer tail extends to the right (the peak sits toward the left) and left-skewed when the longer tail extends to the left.

4. Random Distribution

This type of histogram chart does not have a regular pattern. It produces multiple peaks and can also be called a multimodal distribution.

5. Edge Peak Distribution

This distribution has a structure that is similar to that of a normal distribution with a large peak at one of its edges being the distinguishing factor.
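The direction of skew mentioned for the skewed distribution can also be checked numerically. The sketch below uses the Fisher-Pearson coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common definition and not something this article prescribes. The sample data are made up.

```python
def skewness(values):
    """Fisher-Pearson coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]     # long tail toward large values
left_tailed = [-v for v in right_tailed]     # mirror image of the data above
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3))
```

A positive result indicates a longer right tail, a negative result a longer left tail, and a value near zero a roughly symmetric distribution.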

Advantages of a Histogram

  • It helps in visualizing large amounts of data.
  • It reveals the variation, centering, and distribution of the data.

Disadvantages

  • It does not visualize the exact values in a dataset.
  • It only works for numerical data grouped into intervals; categorical data calls for a bar chart instead.

5. Pie Chart

A pie chart is a circular graph used to illustrate numerical proportions in a dataset. The chart is divided into sectors, where each sector represents the proportion of a particular element in the set.

Fig. 5 Pie Chart

Just as a pizza is divided into slices, each sector in a pie chart represents the proportion of one element in the dataset. The proportion is defined by the angle of the sector, or equivalently by the percentage of the circle's area that the sector covers.
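As a small worked sketch of that relationship, the function below turns raw values into the percentage and central angle of each sector. The survey counts and the `pie_sectors` name are hypothetical.

```python
def pie_sectors(data):
    """Map each label to (percentage of total, central angle in degrees)."""
    total = sum(data.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return {label: (100 * v / total, 360 * v / total) for label, v in data.items()}

# Hypothetical survey of favourite fruit (illustrative counts only).
votes = {"apple": 50, "banana": 30, "cherry": 20}
sectors = pie_sectors(votes)
for label, (percent, angle) in sectors.items():
    print(f"{label}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```

The angles always add up to 360 degrees and the percentages to 100, which is why a pie chart only makes sense for parts of a single whole.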

Types of Pie Chart

  1. Simple Pie Chart

This is the most basic type of pie chart and can also be simply called a pie chart.

2. Exploded Pie Chart

In an exploded pie chart, one of the sectors of the circle is separated (or exploded) from the chart. It is used to lay emphasis on a particular element in the data set.

3. Pie of Pie

As the name suggests, a pie of pie is a chart that generates an entirely new (usually small) pie chart from the existing one. It can be used to reduce clutteredness and lay emphasis on a particular group of elements.

4. Bar of Pie

This is similar to the pie of pie, with the main difference being that a bar chart is what is generated in this case rather than a pie chart.

5. 3D Pie Chart

This is a type of pie chart that is represented in a 3-dimensional space.

Advantages

  • It summarizes data into a visually appealing form.
  • It is quite simple compared to many graph types.

Disadvantages

  • It is inapplicable for large datasets.
  • It cannot visualize groups of data.

6. Bar Chart/Graph

A bar chart is a graph made of spaced rectangular bars that represent the data points in a set of data. It is usually used to plot discrete and categorical data.

The horizontal axis of the chart represents the categories, while the vertical axis represents the discrete values. Although the rectangular bars in a bar chart are usually placed vertically, they can also be horizontal.

Fig. 6 Bar Graph/Chart

For horizontally placed rectangular bars, the categories are defined on the vertical axis while the horizontal axis defines the discrete values.
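Before a bar chart can be drawn, categorical data usually has to be reduced to one count per category. The sketch below does this with `collections.Counter` and renders a rough horizontal bar chart as text. The brand labels are invented and `horizontal_bars` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical list of phone brands sold in one day (illustrative data only).
sales = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

# Count occurrences per category; each count becomes the length of one bar.
counts = Counter(sales)

def horizontal_bars(counts):
    """Render one text bar per category, longest first."""
    return [f"{label} | {'#' * n} {n}" for label, n in counts.most_common()]

for line in horizontal_bars(counts):
    print(line)
```

Here the categories run down the vertical axis and the counts extend horizontally, matching the horizontal layout described above.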

Types of Bar Chart

  1. Grouped Bar Chart

Grouped bar charts are used when the datasets have subgroups that need to be visualized on the graph. Each subgroup is usually differentiated from the other by shading them with distinct colors.

2. Stacked Bar Chart

The stacked bar graphs are also used to show subgroups in a dataset. But in this case, the rectangular bars defining each group are stacked on top of each other.

3. Segmented Bar Chart

This is the type of stacked bar chart where each segment shows its value as a percentage of the bar's total, so every bar adds up to 100%.

Advantages

  • Summarizes a large amount of data in an understandable form.
  • Easily accessible to a wide audience.

Disadvantages

  • It does not reveal key assumptions like causes, effects, patterns, etc.
  • May require further explanation.

7. Box Plot

A box plot is a statistical data visualization technique that uses rectangular boxes to show groups of data through their quartiles. It may also have lines extending from the boxes, which indicate variability beyond the upper and lower quartiles.

The name "box and whisker plot" is derived from the structure of the graph: the rectangular bars are the boxes, with the top of each box showing the upper quartile, the bottom of the box showing the lower quartile, and the center line showing the median. The lines drawn from each end of the box are known as the whiskers.

The boxes can be drawn either vertically or horizontally depending on the goal of the visualization. Although rare, some box plots don’t have whiskers.

Elements of a Box Plot

  1. The Median

The median is the quantity that falls in the middle when a set of values are arranged in an ascending or descending order. The median can be easily formulated when the dataset contains an odd number of values.

However, when it is even, the median is calculated by finding the average of the two numbers in the middle. The median is also known as the second quartile.

2. First Quartile(Q1)

The first quartile is also known as the lower quartile because it is calculated at the 25th percentile. That is the lower quartile value.

It sits one-fourth of the way through the sorted data. For example, in a sorted dataset of 100 values, the first quartile lies at about the (¼)*100*1 = 25th value.

3. Third Quartile(Q3)

The third quartile is also known as the upper quartile because it is calculated at the 75th percentile. That is the upper quartile value.

It sits three-fourths of the way through the sorted data. For example, in a sorted dataset of 100 values, the third quartile lies at about the (¼)*100*3 = 75th value.

4. Interquartile Range(Q3-Q1)

The interquartile range is the difference between the third quartile and the first quartile (Q3 - Q1). It is often said to be a better measure of spread than the range because it is not affected by extreme values.

5. Highest Value

This is simply the highest non-outlier value in the dataset being visualized by the box plot. The highest value, in this case, is not necessarily the highest value in the dataset.

Given the dataset 1,2,3,4,5,1000 for instance, the highest value is 1000. However, this is most likely not the highest value in the box plot because there is a high probability that that 1000 will be an outlier.

The most feasible highest value is 5.

6. Lowest Value

This is simply the lowest non-outlier value in the dataset being visualized by the box plot. The lowest value, in this case, is not necessarily the lowest value in the dataset.

Given the dataset -100, 50, 60, 70, 80, 90 for instance, the lowest value is -100. However, this is most likely not the lowest value in the box plot because there is a high probability that -100 will be an outlier.

The most feasible lowest value is 50.
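The article does not say how a value is judged to be an outlier. A common convention, assumed here, is to treat anything more than 1.5 times the interquartile range beyond Q1 or Q3 as an outlier. The sketch below applies that rule to the two example datasets above, taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data. Both function names are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.

    Needs at least two values; the middle value is skipped when the count is odd.
    """
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

def whisker_ends(values, k=1.5):
    """Lowest and highest values that are not outliers under the k * IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr   # k = 1.5 is an assumed convention
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)

print(whisker_ends([1, 2, 3, 4, 5, 1000]))
print(whisker_ends([-100, 50, 60, 70, 80, 90]))
```

Under this rule, 1000 and -100 fall outside the fences, so the whiskers end at 5 and 50 respectively, in line with the values given in the text.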

Example of Box Plot is demonstrated below:

Example 1: David and Bryan are both sales attendants at a Phone shop. At the end of each month, they record the number of phones sold. By the end of the year, they both submitted their sales record, and they made the following number of sales.

David: 51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13.

Bryan: 30, 56, 23, 65, 42, 61, 54, 17, 21, 34, 3, 16.

  1. Arrange the monthly sales made by David and Bryan in a tabular form.
  2. Give a five-number summary of David and Bryan’s sales.
  3. Make box and whisker plots describing the sales made by David and Bryan.

Solution

  1. The monthly sales made by David and Bryan are arranged in the table below.
Fig. 7 Data for the two sales attendants
  2. The five-number summary of the data consists of the minimum value, first quartile, median, third quartile, and maximum value.

David
6, 7, 13, 17, 20, 25, 39, 41, 43, 49, 51, 62.
Median = (sixth + seventh observations) ÷ 2
= (25 + 39) ÷ 2
= 32

There are six numbers below the median, namely: 6, 7, 13, 17, 20, 25.
Q1 = the median of these six items
= (third + fourth observations) ÷ 2
= (13 + 17) ÷ 2
= 15

There are six numbers above the median, namely: 39, 41, 43, 49, 51, 62.
Q3 = the median of these six items
= (third + fourth observations) ÷ 2
= (43 + 49) ÷ 2
= 46

The five-number summary for David’s sales is 6, 15, 32, 46, 62.

Using the same calculations, for Bryan, we have: 3, 16, 17, 21, 23, 30, 34, 42, 54, 56, 61, 65.

Median = (sixth + seventh observations) ÷ 2
= (30+34) ÷ 2
= 32

There are six numbers below the median, namely: 3, 16, 17, 21, 23, 30.
Q1 = the median of these six items
= (third + fourth observations) ÷ 2
= (17 + 21) ÷ 2
= 19

There are six numbers above the median, namely: 34, 42, 54, 56, 61, 65.
Q3 = the median of these six items
= (third + fourth observations) ÷ 2
= (54 + 56) ÷ 2
= 55

The five-number summary for Bryan’s sales is 3, 19, 32, 55, 65.
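The hand calculation above can be checked with a short script. This is a minimal sketch that follows the same median-of-halves method used in the worked solution. `five_number_summary` is a hypothetical helper, and other quartile conventions (for example, the default of `statistics.quantiles`) can give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])      # median of the lower half
    q3 = median(s[-half:])     # median of the upper half
    return s[0], q1, median(s), q3, s[-1]

david = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]
bryan = [30, 56, 23, 65, 42, 61, 54, 17, 21, 34, 3, 16]

print("David:", five_number_summary(david))
print("Bryan:", five_number_summary(bryan))
```

Both attendants share the same median of 32, but Bryan's box is wider (Q3 - Q1 is 36 against David's 31), so his monthly sales vary more.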

  3. The resulting box plot from the monthly sales data can be found below.
Fig. 8 Sample box plot

Interpret Box Plot:

Before interpreting a box and whisker plot, we first need to understand the different parts of a box plot. Let us therefore consider this box plot, drawn from data produced by Excel’s random number generator.

On the graph above, the horizontal line inside the blue box represents the median value of the dataset. In this case, it is … inches. The x near the top, still inside the blue box, is the mean value of the data.

Note, however, that the mean doesn’t necessarily have to be a value present in the data. It is just a statistical measure used to represent the data.

Now, let us properly identify the parts of a box plot. The blue box represents the data points that fall between the first and third quartiles of the randomly generated dataset.

The top of the box represents the third quartile, while the bottom of the box represents the first quartile. The median can also be referred to as the second quartile.

You will see two vertical lines, one drawn from the top of the box to a point in the graph, and the other drawn from the bottom of the box to a point in the graph. These two lines are referred to as the whiskers.

The horizontal line perpendicular to the top whisker shows the maximum value, while the one perpendicular to the bottom whisker shows the minimum value in the dataset.

Just as the box gives us the interquartile range of the data, the whiskers help us determine the range of the dataset. One can easily read this information at a glance.

Finally, the dot at the far top of the graph, somewhere above the maximum value, is known as an outlier. An outlier is an unusual data point in the dataset.

This explains why the maximum and minimum values on the plot are not necessarily the actual maximum and minimum of the dataset. They represent the maximum and minimum of the non-outlier values present in the dataset.

Interpretation with Plot:

Now that we understand what the different parts of a box plot mean, we can go on to interpreting one. To explain this properly, let us consider the box plot below, which shows the average annual income of people in particular age groups.

Fig. 9 Sample for explanation

Notice that in the graph above there are two sets of box plots, with blue representing the men and orange representing the women. Box plots make it easy to compare the features of a large dataset.

In the above plot, for example, you can easily see the average annual income of males and females across the different age groups. Overall, it is easy to see that males generally earn more than females across ages.

We also see that the money earned by men is much more even, with the pay gap between individuals not being very large. In fact, the maximum annual income isn’t visible and can be inferred to be close or equal to the third quartile. Women’s annual income, on the other hand, varies much more.

Also, given the much longer “whiskers” for women, we can interpret that they vary more widely in the amount of money they earn annually, while men tend to cluster more around the average.

The third thing to look at is the skew of the data. Skew refers to the asymmetry of your data. If you look at the women, the box and whiskers are fairly even on either side of the median/mean. The case is quite different for the men, however. Hence, we say that this data is skewed.

Finally, we look for outliers, which are data points that differ markedly from the rest. We notice that only the men have outliers.

Box plots are useful little graphics that pack a lot of information into very little space. They are best used at the start of data analysis to identify early patterns in the data, although, as we have seen here, they are also useful for reporting results in clear and concise ways.

Advantages

  • It can easily visualize large datasets. Thanks to the five-number summary technique used by the box plot, it can summarize large datasets and describe them compactly on the graph.
  • It gives a clear summary of the datasets under consideration. It allows the reader to easily detect the symmetry of the data at a glance.
  • Unlike most data visualization techniques, the box plot displays outliers within a dataset. Outliers are values in a dataset that fall outside the minimum and maximum values on the box plot. One can easily detect outliers on the box plot.

Disadvantages

  • It does not retain the exact values of the dataset. It only displays the summary of the values in the dataset. Hence, it is advised to use a box plot together with other data visualization techniques that give a detailed analysis of the data.
  • It is not easy for laymen to understand box plots. It is quite complicated for non-scientists.
  • It is difficult to detect the meaning of the data from the box plot.

Conclusion:

In this article, we have seen the different types of graphs and plots that are used to visualise data. In the next article, we will discuss how to plot these graphs using various libraries in Python.

Special Thanks:

As the saying goes, “A car is useless if it doesn’t have a good engine”; similarly, a student is lost without proper guidance and motivation. I would like to thank my Guru and idol, Dr. P. Supraja, from the bottom of my heart for guiding me throughout this journey. As a Guru, she has lit the best available path for me and motivated me whenever I encountered failure or a roadblock. Without her support and motivation, this would have been an impossible task for me.

Reference:

Jupyter Notebook: Link

If you have any queries, feel free to contact me through any of the options below:

Website: www.rstiwari.com

Medium: https://tiwari11-rst.medium.com

Google Form: https://forms.gle/mhDYQKQJKtAKP78V7

YouTube : Link
