What is the best visualization to show the relationship or correlation between two variables?

Introduction to scatterplots

A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset is plotted as a point whose x-y coordinates correspond to its values for the two variables.

This article continues from the previous post in discussing the most widely used analyses and the chart types that suit them best.

The previous article discussed one widely used analysis: comparing or ranking two or more categories against each other based on one or more criteria.

This article discusses another common analysis, correlation, which helps us identify relationships between two measures. It is something we do all the time in data analysis, for instance:

Q1. Does smoking cause cancer?

Q2. Is car speed related to accidents?

Q3. Does product price play any role in the quantity sold?

Some of the best visualizations for finding the relationship between two measures are as follows:

1. Scatter Plot

2. Bubble Chart

3. Combination of Line Chart with Bar Chart

Scatter Plot: A classic chart for statisticians when it comes to correlation and distribution analysis. It is the most widely used chart type for checking whether there is any relationship between two variables. Be mindful that correlation does not prove a cause-and-effect relationship; it only helps you see a potential relationship.

To create a scatter plot, we need the two measures whose relationship we want to examine. By convention, the variable on the y-axis is the dependent variable, while the variable on the x-axis is the independent variable.

Let us now discuss the possible outcomes of a scatter plot:

A. Positive Relationship: The correlation is positive when the values increase together on both axes.

B. Negative Relationship: The correlation is negative when one value decreases as the other increases.

C. No Relationship: The two variables are not linked to each other, and the points show no clear pattern.
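The three outcomes can also be checked numerically. The sketch below computes the Pearson correlation coefficient, a standard measure of linear relationship that this article does not name explicitly, in plain Python. The three small datasets and the function name `pearson_r` are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data, not taken from the article.
x = [1, 2, 3, 4, 5, 6]
rising = [2, 4, 5, 4, 6, 7]      # tends to increase with x
falling = [9, 7, 6, 6, 4, 2]     # tends to decrease as x increases
unrelated = [5, 3, 6, 6, 3, 5]   # no clear linear trend

for name, y in [("rising", rising), ("falling", falling), ("unrelated", unrelated)]:
    print(f"{name}: r = {pearson_r(x, y):+.2f}")
```

A value near +1 corresponds to outcome A, a value near -1 to outcome B, and a value near 0 to outcome C. Note that the coefficient only captures linear relationships, so a scatter plot is still worth looking at.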

DATA

Below is sample sales data for XYZ company. It shows the quantity sold and the sales price for each month.

Using this data, let us analyze whether product price plays any role in the quantity sold. We put sales price on the y-axis and sales quantity on the x-axis, with one point for each month.

It is now easy to see that a lower product price is associated with a higher quantity sold. The relationship is not perfect, but the negative correlation is strong.
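A chart like the one described can be sketched with matplotlib. The article's data table is not reproduced here, so the monthly figures below are invented placeholders, and the function name `draw_scatter` and the output file name are illustrative choices only.

```python
# Hypothetical monthly figures, invented for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales_price = [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9]
quantity_sold = [110, 125, 120, 140, 150, 145, 170, 165, 190, 200, 215, 240]


def draw_scatter(path="price_vs_quantity.png"):
    """Plot one point per month: quantity on the x-axis, price on the y-axis."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(quantity_sold, sales_price)
    ax.set_xlabel("Quantity sold")
    ax.set_ylabel("Sales price")
    ax.set_title("Sales price vs. quantity sold, by month")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_scatter()` writes the chart to a PNG file. With these placeholder numbers the points slope downward from left to right, which is the pattern of a negative correlation.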

But does this mean that the company should lower prices to boost sales?

The answer depends on the net profit the company makes at either end of the price range. If the company makes more net profit when a high quantity is sold, then reducing the price makes sense.

This question cannot be answered with a classic scatter plot, because a scatter plot uses only two measures. However, a bubble chart or a combination of line and bar charts can show a third measure, net profit, alongside the relationship between product price and quantity sold.

Bubble Chart: The bubble chart is useful for visualizing 3- or 4-dimensional data on a plane. To create the bubble chart, we put sales price (independent variable) on the y-axis and quantity sold (dependent variable) on the x-axis, and represent the third measure (net profit) by the size of each bubble.

Now we overlay net profit onto the size of the circles. From this, it looks like the company makes the greatest profit at both ends of the price range.
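One detail worth getting right in a bubble chart is how the third measure maps to bubble size. The sketch below assumes matplotlib, whose `scatter` call takes marker area (in points squared) through its `s` argument, so scaling area in proportion to net profit keeps the bubbles visually honest. The helper `bubble_areas`, the `max_area` default, and all figures are hypothetical.

```python
def bubble_areas(values, max_area=800.0):
    """Scale non-negative values so the largest one gets max_area."""
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes need non-negative values")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [max_area * v / largest for v in values]


# Hypothetical monthly figures, invented for illustration only.
sales_price = [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9]
quantity_sold = [110, 125, 120, 140, 150, 145, 170, 165, 190, 200, 215, 240]
net_profit = [900, 850, 700, 650, 600, 520, 560, 600, 700, 780, 860, 950]


def draw_bubble_chart(path="bubble_chart.png"):
    """Scatter of price vs. quantity with bubble area tracking net profit."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # `s` is the marker area, so area (not radius) is proportional to profit.
    ax.scatter(quantity_sold, sales_price, s=bubble_areas(net_profit), alpha=0.5)
    ax.set_xlabel("Quantity sold")
    ax.set_ylabel("Sales price")
    ax.set_title("Bubble size = net profit")
    fig.savefig(path)
    plt.close(fig)
    return path
```

The placeholder profits were chosen to be largest at the two ends of the price range, to mirror the pattern described above. A negative net profit cannot be shown as a size, which is why the helper rejects it.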

Combination of Bar & Line Chart: A combination of two line charts with a bar chart can give us the same results. Trend lines for sales price and quantity are placed together at the top so that the two trends can be compared. The negative correlation remains clear, while the net-profit bar chart below provides additional information without interrupting the correlation analysis.
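The layout described here, two trend lines on top and a bar chart underneath, could be sketched in matplotlib roughly as follows. All figures are invented placeholders. Giving quantity its own right-hand axis via `twinx` is an assumption made here because price and quantity are on different scales, and is not something the article specifies.

```python
# Hypothetical monthly figures, invented for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales_price = [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9]
quantity_sold = [110, 125, 120, 140, 150, 145, 170, 165, 190, 200, 215, 240]
net_profit = [900, 850, 700, 650, 600, 520, 560, 600, 700, 780, 860, 950]


def draw_combination_chart(path="combination_chart.png"):
    """Two trend lines in the top panel, net-profit bars in the bottom panel."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

    # Top panel: price on the left axis, quantity on a second axis on the right.
    ax_top.plot(months, sales_price, marker="o", color="tab:blue")
    ax_top.set_ylabel("Sales price")
    ax_qty = ax_top.twinx()
    ax_qty.plot(months, quantity_sold, marker="s", color="tab:orange")
    ax_qty.set_ylabel("Quantity sold")

    # Bottom panel: net profit as bars, aligned month by month with the lines.
    ax_bottom.bar(months, net_profit, color="tab:green")
    ax_bottom.set_ylabel("Net profit")

    fig.savefig(path)
    plt.close(fig)
    return path
```

Sharing the x-axis between the two panels keeps each month's bar directly under its points on the trend lines, so profit can be read against price and quantity at a glance.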

Conclusion

To create charts that illuminate the data and deliver the right picture, one should first understand what kind of comparison is required. The scatter plot is a classic choice for statisticians when comparing 2-dimensional data, i.e. two measures, while the bubble chart is useful for visualizing 3- or 4-dimensional data. In a bubble chart, the third and fourth variables are represented by the size of a data point or its color. The size should be proportional to the value of the measure it represents, and the color should correspond to a category. Along with the scatter plot and bubble chart, we can use a combination of trend lines and a bar chart in a single view, which gives the same results in terms of analysis. Hence, choosing the right chart type depends on individual requirements.

By Kanav Taneja

What is the best method to visualize correlation between two variables?

The most useful graph for displaying the relationship between two quantitative variables is a scatterplot.

How do you visualize a correlation between two variables?

The simplest way to visualize correlation is to create a scatter plot of the two variables. A typical example plots the heights and weights of 19 students.

What is the best visual representation to show a correlation?

Scatter plot (scattergram): A classic chart for any statistician when it comes to correlation and distribution analysis. It is well suited to spotting distribution trends in data. The variable on the y-axis is the dependent variable, while the variable on the x-axis is the independent variable.

What kind of visualization is typically used in correlation analysis?

Scatter plot: A scatter plot displays data for two variables as points plotted against the horizontal and vertical axes. This type of data visualization is useful for illustrating the relationships that exist between variables and can be used to identify trends or correlations in data.