Correlation

Many people think they know what correlation is – and they’re wrong. Sure, it’s a measure of how things change together, but it’s much more – and less – than just that.

Correlation measures the strength of a linear relationship between two variables. It’s that rarely mentioned, often ignored qualifier, “linear,” that can trip you up. You calculate the correlation coefficient, find that it’s low, and conclude the two variables aren’t closely related. Boy! Are you in for a surprise!
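
For reference, the coefficient usually meant here is the Pearson sample correlation. The notation below (\(r\), \(n\), \(\bar{x}\), \(\bar{y}\)) is introduced for illustration and is not taken from the text above:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means of the \(n\) paired observations. The value of \(r\) reaches \(\pm 1\) only when the points fall exactly on a straight line (with nonzero slope), which is why it speaks to linear relationships and not to relationships in general.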

Consider these two plots of \(y\) vs. \(x\). The first is random, and the correlation coefficient is zero. (What’s new?) The next plot shows a perfect quadratic relationship between \(y\) and \(x\). Its correlation is also zero!

A Random Relationship has Zero Correlation.

But Zero Correlation Does NOT Mean No Relationship.

 

Find this hard to believe? Here are the data. You can copy them into your own analysis package (Excel, for example), plot them, and calculate the correlations for yourself.

Zero Correlation Dataset 1 (columns x, y) and Zero Correlation Dataset 2 (columns X, Y)

               x           y           X           Y

       0.3716803   0.6396969  -0.1833341   0.0336114
       0.2778111   0.7942405   0.6564449   0.4309199
       0.8152372  -0.6364473   0.8725039   0.7612631
       0.7715097  -0.6845633   0.3610921   0.1303875
       0.0163179  -0.6908862   0.7926144   0.6282377
      -0.4898738  -0.5034169   0.1833341   0.0336114
      -0.6060137   0.5745298  -0.6564449   0.4309199
      -0.8882970  -0.1247591  -0.4141061   0.1714839
       0.2913591  -0.5129564  -0.8725039   0.7612631
      -0.3661791   0.0745857   0.8269985   0.6839264
       0.1320750   0.0733665  -0.5878715   0.3455929
       0.2637229  -0.0118882  -0.2950443   0.0870511
      -0.7390226   0.1763471  -0.3610921   0.1303875
      -0.0395929   0.1027599  -0.8269985   0.6839264
       0.3387334  -0.9737805  -0.0470327   0.0022121
       0.8598541   0.8747677   0.4141061   0.1714839
       0.7388236   0.9479392   0.0470327   0.0022121
      -0.5928083   0.0843604   0.2950443   0.0870511
       0.9226006  -0.3518961  -0.7926144   0.6282377
      -0.3571427  -0.3034039   0.5878715   0.3455929
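
If you would rather check the numbers in code than in a spreadsheet, here is a minimal Python sketch. The helper name `pearson` and the variable names are made up for this example, and the values are paired in the order they are listed above (first x with first y, and so on). It uses the standard Pearson formula and only the standard library.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Zero Correlation Dataset 1 (values copied from the text)
x1 = [0.3716803, 0.2778111, 0.8152372, 0.7715097, 0.0163179,
      -0.4898738, -0.6060137, -0.8882970, 0.2913591, -0.3661791,
      0.1320750, 0.2637229, -0.7390226, -0.0395929, 0.3387334,
      0.8598541, 0.7388236, -0.5928083, 0.9226006, -0.3571427]
y1 = [0.6396969, 0.7942405, -0.6364473, -0.6845633, -0.6908862,
      -0.5034169, 0.5745298, -0.1247591, -0.5129564, 0.0745857,
      0.0733665, -0.0118882, 0.1763471, 0.1027599, -0.9737805,
      0.8747677, 0.9479392, 0.0843604, -0.3518961, -0.3034039]

# Zero Correlation Dataset 2 (values copied from the text)
x2 = [-0.1833341, 0.6564449, 0.8725039, 0.3610921, 0.7926144,
      0.1833341, -0.6564449, -0.4141061, -0.8725039, 0.8269985,
      -0.5878715, -0.2950443, -0.3610921, -0.8269985, -0.0470327,
      0.4141061, 0.0470327, 0.2950443, -0.7926144, 0.5878715]
y2 = [0.0336114, 0.4309199, 0.7612631, 0.1303875, 0.6282377,
      0.0336114, 0.4309199, 0.1714839, 0.7612631, 0.6839264,
      0.3455929, 0.0870511, 0.1303875, 0.6839264, 0.0022121,
      0.1714839, 0.0022121, 0.0870511, 0.6282377, 0.3455929]

r1 = pearson(x1, y1)
r2 = pearson(x2, y2)

# Largest deviation of Dataset 2 from the exact curve y = x^2
max_gap = max(abs(y - x * x) for x, y in zip(x2, y2))

print(f"Dataset 1 correlation: {r1:.6f}")
print(f"Dataset 2 correlation: {r2:.6f}")
print(f"Dataset 2 max |Y - X^2|: {max_gap:.2e}")
```

Both correlations should come out very close to zero. The last line should show that Dataset 2 follows \(Y = X^2\) to within the rounding of the printed values.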

It’s always a good idea to read the fine print, and statistics has a lot of fine print. Knowing that two variables are independent tells you that their correlation coefficient is zero (in any finite sample it will only be close to zero). But knowing the correlation coefficient to be zero does NOT mean the two variables are independent, or otherwise unrelated. It only means there is no linear relationship.
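
A short derivation shows why the quadratic case comes out at zero. It assumes \(X\) is symmetric about zero, so that \(E[X] = 0\) and \(E[X^3] = 0\). That assumption is added here, not stated in the text, though Dataset 2 appears to be built that way, since every \(X\) value occurs with both signs. With \(Y = X^2\):

\[ \text{Cov}(X, Y) = E[X \cdot X^2] - E[X]\,E[X^2] = E[X^3] - 0 \cdot E[X^2] = 0 \]

The covariance is the numerator of the correlation coefficient, so the correlation is zero even though \(Y\) is completely determined by \(X\). Shift the \(X\) values so they are no longer centered on zero and the symmetry is lost; the correlation will then generally be nonzero.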