What is Correlation?
Correlation is a statistical measure that is used to find the relationship between two or more variables. It measures the degree to which variables are related to each other and tells both the strength and direction of the relationship.
Pearson Correlation Coefficient Formula
The Pearson correlation coefficient is the most common way of measuring the linear relationship between two variables. Its value always lies between -1 and +1. The values -1 and +1 denote a perfect negative and a perfect positive linear correlation respectively, whereas 0 represents no linear correlation.
Pearson Correlation Coefficient = $\frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}$
where,
- cov(x,y) is the covariance between variables x and y
- $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y respectively
Covariance Formula
Cov(x,y) = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$
where
- x is variable1 and y is variable2
- $x_i$ = current value of variable1
- $\bar{x}$ = mean of variable1
- $y_i$ = current value of variable2
- $\bar{y}$ = mean of variable2
- n = data size
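The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`covariance`, `std_dev`, `pearson`) and the sample data are made up for this example, and the covariance is assumed to divide by n. The sample convention divides by n - 1 instead, but the coefficient comes out the same as long as the standard deviations follow the same convention.

```python
import math

def covariance(x, y):
    """Average product of the deviations of x and y from their means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def std_dev(values):
    """Standard deviation, computed as the square root of cov(values, values)."""
    return math.sqrt(covariance(values, values))

def pearson(x, y):
    """Pearson correlation coefficient: cov(x, y) / (sigma_x * sigma_y)."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# Hypothetical data: hours studied and a test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson(hours, scores)
print(round(r, 4))  # 0.7746
```

Here the covariance is 1.2 and the two variances are 2 and 1.2, so the coefficient is 1.2 / sqrt(2.4), a fairly strong positive correlation.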
Types of Correlation
There are three types of correlation: positive, negative and zero correlation. Let's understand each type using a scatter plot.
1. Positive Correlation
- A positive correlation between two variables means that their values move in the same direction: as one increases, the other tends to increase as well.
- A correlation of +1 is a perfect positive correlation.
- The magnitude of the correlation also signifies its strength. For example, correlations of 0.1 and 0.9 are both positive, but 0.9 denotes a much stronger correlation than 0.1.
2. Negative Correlation
- A negative correlation between two variables means that their values move in opposite directions: as one increases, the other tends to decrease.
- A correlation of -1 is a perfect negative correlation.
- The magnitude of the correlation also signifies its strength. For example, correlations of -0.2 and -0.8 are both negative, but -0.8 denotes a much stronger correlation than -0.2.
3. Zero Correlation
A zero correlation between two variables means that there is no linear relationship between them.
When to use Pearson Correlation Coefficient
Below are a few conditions the data should satisfy for the Pearson correlation to be meaningful:
- The relationship between the two variables is linear
- There are no outliers in the data
- Both variables are approximately normally distributed
- Both variables are quantitative
Correlation is not Causation
“Correlation is not Causation” means that just because two variables are strongly correlated, it does not follow that one causes the other.
For example, the sales of ice cream and the sales of umbrellas may be highly correlated, but ice cream sales have no effect on umbrella sales and vice versa. A third factor, such as the weather, may be driving both.
Limitations of Correlation Analysis
The study of how variables are correlated is called correlation analysis. But it has a few limitations:
- It cannot establish cause and effect.
- It assumes that the two variables are linearly related, so it can understate or completely miss a non-linear relationship.