Correlation analysis quantifies the degree to which two variables vary together. If two variables are independent, the value of one carries no information about the value of the other. If they are correlated, the value of one is related to the value of the other. Figure 5.1 illustrates this relationship. When an increase in one variable corresponds to an increase in the other, the correlation is positive. When an increase in one variable corresponds to a decrease in the other, the correlation is negative.
A commonly used correlation measure is Pearson’s r. Pearson’s r has the following characteristics:
Pearson’s correlation coefficient measures the strength and direction of the linear association between two variables and takes values in the range -1.0 ≤ r ≤ 1.0.
When r is near -1.0, there is a strong negative linear association; that is, a low value of x tends to imply a high value of y.
When r = 0, there is no linear association. There may still be an association, just not a linear one.
When r is near +1.0, there is a strong positive linear association; that is, a low value of x tends to imply a low value of y.
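The characteristics above can be checked with a small calculation. The sketch below computes r as the sum of the products of the deviations of x and y from their means, divided by the square root of the product of their sums of squared deviations. The function name pearson_r and the sample values are illustrative assumptions, not part of the course material.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))  # how x and y vary together
    ss_x = sum(a * a for a in dx)               # spread of x
    ss_y = sum(b * b for b in dy)               # spread of y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)

# Illustrative data (assumed for this sketch)
x = [-3, -2, -1, 0, 1, 2, 3]
r_positive = pearson_r(x, [2 * xi + 1 for xi in x])   # rising straight line
r_negative = pearson_r(x, [-2 * xi + 1 for xi in x])  # falling straight line
r_curved = pearson_r(x, [xi ** 2 for xi in x])        # y depends on x, but not linearly

print(r_positive, r_negative, r_curved)  # 1.0 -1.0 0.0
```

The third case is the one to note: y is completely determined by x, yet r = 0 because the relationship is a curve rather than a line.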
Remember that being able to compute a correlation between two variables does NOT imply that one causes the other. Social/demographic data (e.g., census data) are usually correlated with each other at some level.
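One way to see why correlation does not imply causation is to build two variables that never influence each other but both depend on a third. The sketch below does this with made-up numbers: the variable names, the coefficients, and the small offsets are all hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences (neither may be constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical third variable that drives both of the others
temperature = [10, 14, 18, 22, 26, 30, 34, 38]

# Small fixed offsets standing in for everything else that affects each variable
offsets_a = [2, -1, 3, -2, 1, -3, 2, -2]
offsets_b = [-1, 1, 0, 1, -1, 0, 1, -1]

# Neither variable appears in the formula for the other
ice_cream_sales = [3 * t + a for t, a in zip(temperature, offsets_a)]
sunburn_cases = [0.5 * t + b for t, b in zip(temperature, offsets_b)]

r_xy = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r_xy, 3))
```

The two series are strongly correlated even though, by construction, neither one causes the other; both simply follow temperature.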
For fun: try to guess the correlation value using this correlation applet [4].
Interactively build a scatterplot [5] and control the number of points and the correlation coefficient value.
Links
[1] https://creativecommons.org/licenses/by-nc-sa/4.0/
[2] https://senseaboutscience.org/wp-content/uploads/2016/11/Makingsenseofstatistics.pdf
[3] http://www.tylervigen.com/spurious-correlations
[4] http://www.rossmanchance.com/applets/GuessCorrelation.html
[5] https://shiny.rit.albany.edu/stat/corrsim/