Various Types of Data Plots for Visualization: from Concept to Code

Sameer Mahajan
16 min read, Dec 15, 2021

Background

Once data is collected, it has to be interpreted and analyzed before it yields insight. That insight may concern patterns, trends, seasonality, variation, or relationships between variables. Data interpretation is the process of reviewing data through predefined methods that help assign meaning to it and arrive at a relevant conclusion. Analysis is the process of ordering, categorizing, manipulating, and summarizing data to answer research questions. Both need to be done quickly and effectively, and the results need to stand out at a glance. Visualization serves this purpose, and data plots are its main tool. As data volumes keep growing, so does the need for good data plots.

There are, however, numerous types of plots used in data visualization and it is sometimes tricky choosing which type is best for your business or data. Each of these plots has its strengths and weaknesses that make it better than others in some situations.

This article covers twelve types of data plots and their subtypes, with a discussion of each, so that you can easily identify the right one for a given problem.

Several packages can draw these plots. Two popular ones are plotly and seaborn. In this article, we will look at code that draws each plot in plotly and in seaborn / matplotlib, together with the resulting visuals. The code and the generated plots are also available on GitHub at: https://github.com/sameermahajan/MLWorkshop/tree/master/13.%20Visualization

These plots are also called graphs or charts depending on the context.

Bar Graph

A bar graph is a graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar graph is sometimes called a column graph.

Following is an illustration of a bar graph showing the population of Canada by year.

Following is the code indicating how to do it in plotly.

import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()

Following is representational code of doing it in seaborn.

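import plotly.express as px  # only used here to load the same gapminder data
import seaborn as sns
data_canada = px.data.gapminder().query("country == 'Canada'")
sns.set_theme(style="whitegrid")
ax = sns.barplot(x="year", y="pop", data=data_canada)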

This is how it looks:

The following are types of bar graphs:

Grouped Bar Graph

Grouped bar graphs are used when the dataset has subgroups that need to be visualized side by side. Each subgroup is usually given a distinct color. Here is an illustration of such a graph:

Here is a code snippet on how to do it in plotly:

import plotly.express as px
df = px.data.tips()
fig = px.bar(df, x="sex", y="total_bill",
color='smoker', barmode='group',
height=400)
fig.show()

Here is a code snippet on how to do it in seaborn:

import seaborn as sb
df = sb.load_dataset('tips')
df = df.groupby(['size', 'sex']).agg(mean_total_bill=("total_bill", 'mean'))
df = df.reset_index()
sb.barplot(x="size", y="mean_total_bill", hue="sex", data=df)

Stacked Bar Graph

Stacked bar graphs also show subgroups in a dataset, but here the rectangular bars for each subgroup are stacked on top of each other. Here is an illustration:

Here is a code snippet on how to do it in plotly:

import plotly.express as px
df = px.data.tips()
fig = px.bar(df, x="sex", y="total_bill", color='time')
fig.show()

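Seaborn code snippet. Seaborn's barplot has no stacking option, so this workaround draws the smaller "select" values over the larger "count" values on the same axes, which overlays the bars rather than truly stacking them:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams["figure.figsize"] = [7.00, 3.50]
plt.rcParams["figure.autolayout"] = True
df = pandas.DataFrame(dict(
number=[2, 5, 1, 6, 3],
count=[56, 21, 34, 36, 12],
select=[29, 13, 17, 21, 8]
))
# Draw the larger series first, then the smaller one over it on the same axes
bar_plot1 = sns.barplot(x='number', y='count', data=df, label="count", color="red")
bar_plot2 = sns.barplot(x='number', y='select', data=df, label="select", color="green")
plt.legend(ncol=2, loc="upper right", frameon=True)
plt.show()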

Segmented Bar Graph

A segmented bar graph is a stacked bar graph in which each segment shows its value as a percentage of the bar's total, so every bar adds up to 100%. Here is an illustration:
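As a minimal sketch of the arithmetic behind a segmented bar graph, the following converts raw values to percentages of each bar's total and then stacks them with matplotlib. The sample data and the helper names to_percent_shares and plot_segmented_bars are illustrative assumptions, not part of the article's GitHub listing.

```python
def to_percent_shares(values):
    """Convert one bar's raw segment values to percentages of that bar's total."""
    total = sum(values)
    if total == 0:
        raise ValueError("bar total must be non-zero")
    return [100.0 * v / total for v in values]


def plot_segmented_bars(categories, segments):
    """Draw 100% stacked bars; `segments` maps a segment name to raw values per category."""
    # Imported lazily so the helper above also works without matplotlib installed.
    import matplotlib.pyplot as plt

    names = list(segments)
    # One list of percentage shares per bar (category).
    per_bar = [to_percent_shares([segments[n][i] for n in names])
               for i in range(len(categories))]
    bottoms = [0.0] * len(categories)
    fig, ax = plt.subplots()
    for k, name in enumerate(names):
        heights = [shares[k] for shares in per_bar]
        ax.bar(categories, heights, bottom=bottoms, label=name)
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    ax.set_ylabel("Share of total (%)")
    ax.legend()
    return fig


# Made-up sample data, for illustration only.
categories = ["Lunch", "Dinner"]
segments = {"Smoker": [30, 90], "Non-smoker": [70, 110]}
lunch_shares = to_percent_shares([30, 70])
print(lunch_shares)
```

Calling plot_segmented_bars(categories, segments) and then plt.show() should draw two bars that each reach 100%.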

Line Graph

A line graph is a type of graph that displays information as a series of data points called ‘markers’ connected by straight line segments. The measurement points are ordered (typically by their x-axis value) before being joined. A line graph is often used to visualize a trend in data over intervals of time, so the line is often drawn chronologically.

The following is an illustration of Canadian life expectancy by year as a line graph.

Here is how to do it in plotly:

import plotly.express as px
df = px.data.gapminder().query("country=='Canada'")
fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
fig.show()

Here is how to do it in seaborn:

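import plotly.express as px  # only used here to load the same gapminder data
import seaborn as sns
df = px.data.gapminder().query("country=='Canada'")
sns.lineplot(data=df, x="year", y="lifeExp")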

Here are types of line graphs:

Simple Line Graph

In a simple line graph, only one line is plotted on the graph. One of the axes defines the independent variables while the other axis contains dependent variables.

Multiple Line Graph

Multiple line graphs contain two or more lines representing more than one variable in a dataset. This type of graph can be used to study two or more variables over the same period.

It can be drawn in plotly as:

import plotly.express as px
df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='lifeExp', color='country', symbol="country")
fig.show()

Here is the illustration:

In seaborn as:

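import plotly.express as px  # only used here to load the same gapminder data
import seaborn as sns
df = px.data.gapminder().query("continent == 'Oceania'")
sns.lineplot(data=df, x='year', y='lifeExp', hue='country')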

Here is the illustration:

Compound Line Graph

A compound line graph is an extension of the simple line graph, used when dealing with different groups of data from a larger dataset. The lines for the groups are stacked on top of one another, and the region beneath each line is shaded downwards, so the topmost line shows the total.

Here is an illustration:
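The stacking in a compound line graph can be sketched as a running sum: each line is drawn at the cumulative total of the groups beneath it. The helper names stack_series and plot_compound_lines and the sample values are assumptions made for illustration.

```python
def stack_series(series):
    """Return the cumulative 'top' of each series when stacked in order."""
    tops = []
    running = [0] * len(series[0])
    for values in series:
        running = [r + v for r, v in zip(running, values)]
        tops.append(running)
    return tops


def plot_compound_lines(x, series, labels):
    """Draw each stacked line and shade the band between it and the line below."""
    # Imported lazily so stack_series works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    lower = [0] * len(x)
    for top, label in zip(stack_series(series), labels):
        ax.fill_between(x, lower, top, alpha=0.4, label=label)
        ax.plot(x, top)
        lower = top
    ax.legend()
    return fig


# Made-up sample data, for illustration only.
periods = [1, 2, 3, 4]
team_a = [20, 12, 15, 14]
team_b = [5, 7, 7, 9]
tops = stack_series([team_a, team_b])
print(tops)
```

The last entry of tops is the overall total per period, which is the topmost line on the chart.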

Pie Chart

A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents. It is named for its resemblance to a pie that has been sliced.
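To make the proportionality concrete, here is a small sketch that computes the central angle of each slice from its value. The function name slice_angles is an assumption; the sample values are the same ones used in the matplotlib pie example in this section.

```python
def slice_angles(values):
    """Central angle in degrees for each slice, proportional to its value."""
    total = sum(values)
    return [360.0 * v / total for v in values]


data = [15, 25, 25, 30, 5]
angles = slice_angles(data)
print(angles)  # [54.0, 90.0, 90.0, 108.0, 18.0]
```

The angles always add up to 360 degrees, whatever the units of the input values.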

Here is how to do it in plotly:

import plotly.express as px
df = px.data.gapminder().query("year == 2007").query("continent == 'Europe'")
df.loc[df['pop'] < 2.e6, 'country'] = 'Other countries' # Represent only large countries
fig = px.pie(df, values='pop', names='country', title='Population of European continent')
fig.show()

And here is how it looks:

Seaborn doesn’t have a default function to create pie charts, but the following syntax in matplotlib can be used to create a pie chart and add a seaborn color palette:

import matplotlib.pyplot as plt
import seaborn as sns

data = [15, 25, 25, 30, 5]
labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4', 'Group 5']

colors = sns.color_palette('pastel')[0:5]

plt.pie(data, labels = labels, colors = colors, autopct='%.0f%%')
plt.show()

This is how it looks:

These are types of pie charts:

Simple Pie Chart

This is the most basic type of pie chart and can also be simply called a pie chart.

Exploded Pie Chart

In an exploded pie chart, one (or more) of the sectors of the circle is separated (or exploded) from the chart. It is used to emphasize a particular element in the data set.

This is a way to do it in plotly:

import plotly.graph_objects as go

labels = ['Oxygen','Hydrogen','Carbon_Dioxide','Nitrogen']
values = [4500, 2500, 1053, 500]

# pull is given as a fraction of the pie radius
fig = go.Figure(data=[go.Pie(labels=labels, values=values, pull=[0, 0, 0.2, 0])])
fig.show()

And this is how it looks:

With seaborn / matplotlib, the explode attribute of matplotlib's pie method can be used as:

import matplotlib.pyplot as plt
import seaborn as sns

data = [15, 25, 25, 30, 5]
labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4', 'Group 5']

colors = sns.color_palette('pastel')[0:5]

plt.pie(data, labels = labels, colors = colors, autopct='%.0f%%', explode = [0, 0, 0, 0.2, 0])
plt.show()

Donut Chart

In this pie chart, there is a hole in the center which makes it look like a donut as the name suggests.

The way to do it in plotly is:

import plotly.graph_objects as go

labels = ['Oxygen','Hydrogen','Carbon_Dioxide','Nitrogen']
values = [4500, 2500, 1053, 500]

# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()

And this is how it looks:

Seaborn has no donut function, so this is done in matplotlib by drawing a white circle over the center of a pie:

import numpy as np
import matplotlib.pyplot as plt
data = np.random.randint(20, 100, 6)
plt.pie(data)
circle = plt.Circle( (0,0), 0.7, color='white')
p=plt.gcf()
p.gca().add_artist(circle)
plt.show()

Pie of Pie

As the name suggests, a pie of pie is a chart that generates an entirely new (usually smaller) pie chart from part of the existing one. It can be used to reduce clutter and to emphasize a particular group of elements.

Here is an illustration:
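The data preparation for a pie of pie can be sketched as follows: slices below a threshold move to the secondary chart and are replaced in the main pie by one combined slice. The function name, the "Other" label, and the sample values are all assumptions for illustration.

```python
def split_for_pie_of_pie(slices, threshold):
    """Split (label, value) slices into a main pie and a secondary pie.

    Slices below `threshold` go to the secondary pie and are replaced in the
    main pie by a single 'Other' slice holding their combined value.
    """
    main = [(label, value) for label, value in slices if value >= threshold]
    secondary = [(label, value) for label, value in slices if value < threshold]
    if secondary:
        main.append(("Other", sum(value for _, value in secondary)))
    return main, secondary


# Made-up sample data, for illustration only.
slices = [("A", 50), ("B", 30), ("C", 8), ("D", 7), ("E", 5)]
main, secondary = split_for_pie_of_pie(slices, threshold=10)
print(main)
print(secondary)
```

The two lists can then be drawn as two pies on side-by-side axes, or the secondary list can be drawn as bars instead of a second pie.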

Bar of Pie

This is similar to the pie of pie, with the main difference being that a bar chart is what is generated in this case rather than a pie chart.

Here is an illustration:

3D Pie Chart

This type of pie chart is represented in a 3-dimensional space.

Here is an illustration:

Matplotlib's pie does not draw a true 3D chart, but setting the shadow attribute to True adds a drop shadow that gives a raised look in seaborn / matplotlib.

import matplotlib.pyplot as plt
labels = ['Python', 'C++', 'Ruby', 'Java']
sizes = [215, 130, 245, 210]
# Plot
plt.pie(sizes, labels=labels,
autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()

Histogram

A histogram is an approximate representation of the distribution of numerical data. The data is divided into non-overlapping intervals called bins or buckets. A rectangle is erected over each bin, with a height or area proportional to the number of data points in the bin. Histograms give a rough sense of the density of the underlying distribution of the data.
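As a rough sketch of the binning step, the following counts data points in equal-width, non-overlapping bins. The function name bin_counts and the sample values are assumptions; the plotting libraries pick bins automatically and may use different edges.

```python
def bin_counts(data, low, high, n_bins):
    """Count data points in n_bins equal-width, non-overlapping bins over [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        if low <= x <= high:
            # The last bin is closed on the right so that x == high is counted.
            index = min(int((x - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Made-up sample data, for illustration only.
data = [1, 2, 2, 3, 5, 6, 7, 7, 8, 10]
counts = bin_counts(data, low=0, high=10, n_bins=5)
print(counts)  # [1, 3, 1, 3, 2]
```

Each count becomes the height of one rectangle in the histogram.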

Here is a visual:

Plotly code:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

Seaborn code:

import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm")

Histograms are classified into the following types according to the shape of the distribution:

Normal Distribution

A normally distributed histogram chart is usually bell-shaped: it is roughly symmetric, with most values clustered around a central peak and frequencies tapering off on both sides.

Bimodal Distribution

A bimodally distributed histogram chart has two distinct peaks, as if two bell-shaped histograms were placed side by side. It typically results from combining two different processes in one dataset.

Visualization:

Plotly code:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", color="sex", marginal="rug",
hover_data=df.columns)
fig.show()

Seaborn:

import seaborn as sns
iris = sns.load_dataset("iris")
sns.kdeplot(data=iris)

Skewed Distribution

This is an asymmetric graph with an off-center peak and a longer tail on one side. By the usual convention, a histogram chart is right-skewed when the tail extends to the right (the peak sits toward the left) and left-skewed when the tail extends to the left.
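One way to check the direction of skew numerically is the third standardized moment, sketched below. It uses the common convention that a right-skewed distribution has its longer tail on the right and gives a positive value. The function name and sample data are assumptions for illustration.

```python
from statistics import mean


def skewness(data):
    """Third standardized moment (population form).

    Positive means a longer right tail, negative a longer left tail.
    """
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Made-up sample data, for illustration only.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
left_skewed = [-x for x in right_skewed]
symmetric = [1, 2, 3, 4, 5]
print(skewness(right_skewed) > 0, skewness(left_skewed) < 0)
```

A value close to zero, as for the symmetric sample, suggests no pronounced skew.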

Random Distribution

This type of histogram chart does not have a regular pattern. It produces multiple peaks and can also be called a multimodal distribution.

Edge Peak Distribution

This distribution has a structure that is similar to that of a normal distribution with a large peak at one of its edges being the distinguishing factor.

Comb Distribution

The comb distribution has a “comb-like” structure, where the rectangular bars alternate between tall and short.

Area Chart

An area chart is represented by lines and the area between these lines and the axis. The area is proportional to the quantity it represents.

These are types of area charts:

Simple Area Chart

In a simple area chart, each series is shaded from its line down to the axis, so the colored segments overlap one another in the chart area.

Stacked Area Chart

In a stacked area chart, the colored segments are stacked on top of one another so that they do not intersect.

100% Stacked Area Chart

This is a type of stacked area chart where the area occupied by each group of data on the chart is measured as a percentage of its amount from the total data. The vertical axis usually totals a hundred percent.

3-D Area Chart

This is an area chart drawn in three-dimensional space.

We will look at the visual representation and code for the most common type, the stacked area chart, below.

Visual:

Plotly:

import plotly.express as px
df = px.data.gapminder()
fig = px.area(df, x="year", y="pop", color="continent",
line_group="country")
fig.show()

Seaborn:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()

df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
'team_A': [20, 12, 15, 14, 19, 23, 25, 29],
'team_B': [5, 7, 7, 9, 12, 9, 9, 4],
'team_C': [11, 8, 10, 6, 6, 5, 9, 12]})

plt.stackplot(df.period, df.team_A, df.team_B, df.team_C)

Dot Graph

A dot graph consists of data points plotted as dots on a graph.

There are two types of these:

The Wilkinson Dot Graph

This type of dot graph uses local displacement to prevent the dots on the plot from overlapping.
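The idea of displacing dots so they do not overlap can be sketched very simply: repeated values are stacked upward. This is a simplified stand-in rather than Wilkinson's full algorithm, since only identical values are stacked here. The function name and sample values are assumptions.

```python
from collections import Counter


def stack_dots(values):
    """Return (x, y) positions where repeated values stack upward instead of overlapping."""
    seen = Counter()
    positions = []
    for v in values:
        seen[v] += 1
        positions.append((v, seen[v]))  # y is the dot's height within its stack
    return positions


# Made-up sample data, for illustration only.
values = [1, 2, 2, 3, 3, 3, 4]
positions = stack_dots(values)
print(positions)
```

The resulting positions can be passed to any scatter function to draw the stacked dots.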

Cleveland Dot Graph

This is a scatterplot-like chart in which each category gets its own row and a dot marks its value along a single quantitative axis.

Plotly code:

import plotly.express as px
df = px.data.medals_long()

fig = px.scatter(df, y="nation", x="count", color="medal", symbol="medal")
fig.update_traces(marker_size=10)
fig.show()

Visual:

Seaborn:

import seaborn as sns
sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.stripplot(x="day", y="total_bill", data=tips)

Visual:

Scatter Plot

A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it, or when both continuous variables are independent.

Visual:

Plotly code:

import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()

Seaborn code:

import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip")

Scatter plots are grouped into different types according to the correlation of the data points. These correlation types are listed below:

Positive Correlation

Two groups of data visualized on a scatter plot are said to be positively correlated if an increase in one implies an increase in the other. A scatter plot diagram can be said to have a high or low positive correlation.

Negative Correlation

Two groups of data visualized on a scatter plot are said to be negatively correlated if an increase in one implies a decrease in the other. A scatter plot diagram can be said to have a high or low negative correlation.

No Correlation

Two groups of data visualized on a scatter plot are said to be uncorrelated if there is no clear relationship between them.
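These three cases can be told apart numerically with the Pearson correlation coefficient, sketched below in plain Python. The function names and the 0.3 cutoff for "no clear correlation" are arbitrary illustrative assumptions, not a standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def describe_correlation(r, weak=0.3):
    """Label a coefficient; the `weak` cutoff is an arbitrary illustrative choice."""
    if abs(r) < weak:
        return "no clear correlation"
    return "positive correlation" if r > 0 else "negative correlation"


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]
scattered = [2, 1, 3, 1, 2]
for ys in (rising, falling, scattered):
    print(describe_correlation(pearson_r(xs, ys)))
```

Values of r near 1 or -1 correspond to a high positive or negative correlation, and values nearer zero to a low one.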

Bubble Chart

A bubble chart displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses v1 through x location, v2 through y location, and v3 through its size.
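When v3 is mapped to size, it is usually the area of the disk, not its radius, that should be proportional to the value. Here is a minimal sketch of that scaling. The function name, the max_area default, and the sample values are assumptions.

```python
def bubble_areas(values, max_area=2000.0):
    """Scale values so that marker area (not radius) is proportional to the value."""
    largest = max(values)
    return [max_area * v / largest for v in values]


# Made-up sample data, for illustration only.
populations = [10, 40, 160]
areas = bubble_areas(populations)
print(areas)  # [125.0, 500.0, 2000.0]
```

In matplotlib, such areas can be passed as the s argument of scatter, which is interpreted as marker area in points squared.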

Visualization:

Plotly code:

import plotly.express as px
df = px.data.gapminder()

fig = px.scatter(df.query("year==2007"), x="gdpPercap", y="lifeExp",
size="pop", color="continent",
hover_name="country", log_x=True, size_max=60)
fig.show()

Seaborn code:

import matplotlib.pyplot as plt
import seaborn as sns
from gapminder import gapminder # import data set

data = gapminder.loc[gapminder.year == 2007]

b = sns.scatterplot(data=data, x="gdpPercap", y="lifeExp", size="pop", legend=False, sizes=(20, 2000))

b.set(xscale="log")

plt.show()

Bubble charts are divided into different types according to the number of variables in the dataset, the type of data visualized, and the number of dimensions of the graph.

Simple Bubble Chart

The simple bubble chart is the most basic type of bubble chart and is equivalent to the normal bubble chart.

Labeled Bubble Chart

The bubbles on a labeled bubble chart are usually labeled for easy identification, particularly when dealing with different groups of data.

Multivariable Bubble Chart

In a multivariable bubble chart, the dataset has more than three variables (typically four). The fourth variable is usually distinguished with color.

Map Bubble Chart

A map bubble chart is usually used to illustrate data on a map.

3D Bubble Chart

This is a bubble chart designed in a 3-dimensional space. The bubbles on a 3D bubble Chart are spherical.

Radar Chart

A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point.

Visualization:

Plotly code:

import plotly.express as px
import pandas as pd
df = pd.DataFrame(dict(
r=[1, 5, 2, 2, 3],
theta=['processing cost','mechanical properties','chemical stability',
'thermal stability', 'device integration']))
fig = px.line_polar(df, r='r', theta='theta', line_close=True)
fig.show()

Seaborn code:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
stats=np.array([1, 5, 2, 2, 3])
labels=['processing cost','mechanical properties','chemical stability',
'thermal stability', 'device integration']
angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)
# Repeat the first point at the end so the outline closes into a polygon
stats_closed = np.append(stats, stats[0])
angles_closed = np.append(angles, angles[0])
fig=plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(angles_closed, stats_closed, 'o-', linewidth=2)
ax.fill(angles_closed, stats_closed, alpha=0.25)
ax.set_thetagrids(angles * 180/np.pi, labels)
ax.set_title("Radar Chart")
ax.grid(True)
plt.show()

These are types of radar charts:

Simple Radar Chart

This is the most basic type of radar chart and is equivalent to the normal radar chart. It consists of a sequence of radii drawn from the center point and joined together.

Radar Chart with Markers

In a radar chart with markers, each data point on the spider graph is marked.

Filled Radar Chart

In the filled radar charts, the space between the lines and the center of the spider web is colored.

Pictogram Graph

Pictogram Charts use icons to give a more engaging overall view of small sets of discrete data. Typically, the icons represent the data’s subject or category, for example, data on population would use icons of people. Each icon can represent one unit or any number of units (e.g. each icon represents a million). Data sets are compared side-by-side in either columns or rows of icons, to compare each category to one another.

Here is an illustration:

In plotly, marker symbols can be used with graph_objs Scatter. In matplotlib, an icons attribute can be used in the figure method. The complete code listing is available on GitHub.
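The core of a pictogram is deciding how many icons each category gets. Here is a text-only sketch of that step. The function name, the asterisk icon, and the city figures are hypothetical and are not taken from the GitHub listing.

```python
def pictogram_rows(counts, units_per_icon, icon="*"):
    """Render each category as a row of icons, one icon per `units_per_icon` units."""
    rows = {}
    for category, count in counts.items():
        n_icons = round(count / units_per_icon)
        rows[category] = icon * n_icons
    return rows


# Hypothetical population figures, for illustration only.
counts = {"City A": 3_000_000, "City B": 5_000_000}
rows = pictogram_rows(counts, units_per_icon=1_000_000)
for category, icons in rows.items():
    print(f"{category:8s} {icons}")
```

Here each icon stands for one million people, so the rows can be compared at a glance.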

Spline Chart

A Spline chart is a form of line/area chart where each data point from the series is connected with a fitted curve that represents a rough approximation of the missing data points.

Visual illustration:

In plotly, a spline is drawn by setting line_shape to spline in a line plot. In matplotlib, SciPy interpolation together with NumPy linspace can achieve the same effect. Again, the complete code listing is available on GitHub.
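To show what "connecting points with a fitted curve" means without extra dependencies, the sketch below swaps in a uniform Catmull-Rom spline written in plain Python. This is an assumption made for illustration: it is not necessarily the curve that plotly or SciPy fits, and the function name is hypothetical.

```python
def catmull_rom(ys, samples_per_segment=10):
    """Smooth curve through evenly spaced y-values using a uniform Catmull-Rom spline.

    Returns (x, y) pairs, where x runs from 0 to len(ys) - 1.
    """
    # Repeat the end points so the first and last segments have four neighbours.
    padded = [ys[0]] + list(ys) + [ys[-1]]
    curve = []
    for i in range(len(ys) - 1):
        p0, p1, p2, p3 = padded[i:i + 4]
        for step in range(samples_per_segment):
            t = step / samples_per_segment
            y = 0.5 * (2 * p1
                       + (p2 - p0) * t
                       + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                       + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)
            curve.append((i + t, y))
    curve.append((len(ys) - 1, ys[-1]))
    return curve


# Made-up sample data, for illustration only.
ys = [1, 5, 2, 2, 3]
curve = catmull_rom(ys, samples_per_segment=4)
print(len(curve))
```

The curve passes through every original data point, and the extra samples in between are the approximated "missing" points that make the line look smooth.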

Box Plot

A box plot is a good way of looking at how data is distributed. As the name suggests, it has a box. One end of the box is at the 25th percentile of the data, the value below which 25% of the data points lie. The other end of the box is at the 75th percentile, defined in the same way. The 25th percentile is also known as Q1 (the first quartile) and the 75th percentile as Q3 (the third quartile). The median of the data is marked by a line inside the box. The difference Q3 - Q1 is known as the IQR (interquartile range). Two further lines, called whiskers, extend from the box to the last data points on either side that fall within the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR. Data points outside the whiskers are called outliers because they deviate significantly from the rest of the data points.
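The quantities described above can be computed directly. The sketch below uses the standard library's statistics.quantiles with the inclusive method, on the same values as the plotting examples in this section. Quartile conventions differ between libraries, so plotly and seaborn may report slightly different Q1 and Q3 values. The function name box_plot_stats is an assumption.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Compute quartiles, IQR, whisker ends, and outliers for a box plot."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the last data points that are still inside the fences.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whiskers": (inside[0], inside[-1]), "outliers": outliers}


data = [-40, 1, 2, 5, 10, 13, 15, 16, 17, 40]
stats = box_plot_stats(data)
print(stats)
```

With this method the fences fall at -16.75 and 35.25, so the whiskers end at 1 and 17, and -40 and 40 are flagged as outliers.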

Plotly code:

import numpy as np 
import plotly.express as px
data = np.array([-40,1,2,5,10,13,15,16,17,40])
fig = px.box(data, points="all")
fig.show()

Visualization:

Seaborn code:

import numpy as np
import seaborn as sns
data = np.array([-40,1,2,5,10,13,15,16,17,40])  # same data as the plotly example
sns.set_style('darkgrid')
ax = sns.boxplot(y=data)

Visualization:

Box Plot is useful in understanding the overall distribution of data even with large datasets.

Cheat Sheet

Here is a cheat sheet of methods and attributes in plotly and seaborn for generating these plots.

Conclusion

We looked at a variety of plots and saw when to use each of them. We looked at code in plotly and seaborn for generating these plots and went over visualizations of them for better understanding. There is also a reference cheat sheet showing which methods and attributes to use in plotly and seaborn for generating these plots.

Now that you are equipped with these tools, techniques, and tips, I wish you happy plotting!
