Descriptive Statistics: Covariance and Correlation

Abdulazeez Abdullah
Feb 13, 2023
Last time, we talked about measures of distribution and dispersion. We will now focus on measures of association, namely covariance and correlation.

A scatter plot is a chart that helps us understand the relationship between two variables. Consider the chart below.

From the chart, we can say that as cost increases, revenue also increases. What if we want to quantify this relationship, i.e. put some numbers to it? That’s where the measures of association come in. They tell you the direction of the relationship and quantify it.

What are measures of association?

Measures of association are coefficients that quantify the relationship between two or more variables.

Covariance

Covariance measures the direction of the relationship between two variables. When the covariance is positive, the variables move together: as one increases, so does the other. When the covariance is negative, they move in opposite directions: as one increases, the other decreases.
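To make the idea concrete, here is a minimal Python sketch of the sample covariance: the sum of the products of each variable's deviations from its mean, divided by n - 1. The cost and revenue figures are made up for illustration and are not the data behind the chart; the function name `sample_covariance` is also just a label chosen here.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical figures, for illustration only.
cost = [10, 20, 30, 40, 50]
revenue = [25, 45, 70, 85, 110]

cov_cost_revenue = sample_covariance(cost, revenue)
print(cov_cost_revenue)  # 525.0 for this made-up data: positive, so the two rise together
```

If you are following along in Excel, COVARIANCE.S computes this same sample version, while COVARIANCE.P divides by n instead of n - 1.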

Let’s see how to do this in Excel.

One drawback of covariance is that it is greatly affected by units of measurement.
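A quick way to see this drawback is to compute the covariance of the same data twice, changing only the unit of one variable. The sketch below uses made-up heights and weights (not the Olympic data mentioned later) and a hand-written helper; treat it as an illustration under those assumptions.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical measurements, for illustration only.
height_m = [1.60, 1.70, 1.80, 1.90]
weight_kg = [55, 65, 72, 88]

# Same heights, expressed in centimetres instead of metres.
height_cm = [h * 100 for h in height_m]

cov_m = sample_covariance(height_m, weight_kg)
cov_cm = sample_covariance(height_cm, weight_kg)

# The relationship has not changed, but the covariance is about 100 times larger.
print(cov_m, cov_cm)
```

Nothing about the people changed between the two calls, only the ruler, yet the number grew a hundredfold. That is why the size of a covariance says little about how strong the relationship is.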

As a result, covariance is not the best choice if we want to determine how strongly two variables are related. It is acceptable as long as we are only interested in the direction of the relationship, in other words, whether one variable rises or falls when the other changes. The strength of the relationship, however, cannot be read directly from the covariance. For that, we use another measure of the relationship between two variables: the correlation coefficient.

Covariance values range from -∞ to +∞.

Correlation

Similar to covariance, correlation is used to quantify the relationship between two variables, but unlike covariance, it is not affected by the unit of measurement.

Correlation values range from -1 to 1. The stronger the positive relationship, the closer the value is to 1; the stronger the negative relationship, the closer it is to -1. Values near 0 indicate little or no linear relationship. Correlations above +0.5 or below -0.5 are often regarded as indicating a strong positive or strong negative association, respectively.
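As a rough sketch, the Pearson correlation coefficient can be computed by dividing the covariance by the product of the two standard deviations, which cancels the units. The data and the function name `pearson_correlation` below are assumptions made for illustration, not the article's dataset.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation: co-deviation of x and y, scaled by the spread of each."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, for illustration only.
height_m = [1.60, 1.70, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [55, 65, 72, 88]

r_m = pearson_correlation(height_m, weight_kg)
r_cm = pearson_correlation(height_cm, weight_kg)

# Both values agree (up to floating-point rounding): the unit does not matter.
print(r_m, r_cm)
```

Switching from metres to centimetres leaves the result unchanged, and the value always falls between -1 and 1. In Excel, the CORREL function returns the same coefficient.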

It is crucial to realize that these metrics show us how two variables vary together. In other words, when one variable increases or decreases, how does the other variable tend to change?

We examined the heights and weights of a few Olympic competitors and concluded that there was a significant positive relationship between an athlete’s weight and height.

This correlation does not, however, demonstrate that an increase in height causes an increase in weight. This is where causality, a different idea from correlation, enters the picture. Unfortunately, this distinction is frequently overlooked.

To establish causation, we must rule out other variables that might have caused the variable in question to change. For example, there is a strong relationship between smoking and lung cancer, but that relationship on its own does not prove that smoking causes cancer.

Over time, however, meticulous research has demonstrated that the link between smoking and cancer is more than a simple correlation. It has been conclusively shown that smoking does cause cancer.

Studies had to account for various potential cancer-causing factors to demonstrate a clear causal relationship between smoking and cancer.

I hope this has been informative. See you next time. Adios.