QQ plot stands for quantile-quantile plot. A Q-Q plot is a graphical method for checking whether two samples come from the same population, or whether a sample follows a given theoretical distribution such as the Normal or Exponential distribution. It is drawn by plotting the quantiles of the first sample against the quantiles of the second sample (or of the theoretical distribution).
A quantile (called a percentile when expressed as a percentage) is the value below which a given fraction of the data falls. For example, the 90th percentile is the value below which 90% of the data lie.
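As a quick illustration of the definition, here is a minimal sketch that computes a 90th percentile with NumPy and checks what fraction of the data lies below it. The data (the integers 1 to 100) is made up for the example, and the exact value depends on NumPy's default linear interpolation between data points.

```python
import numpy as np

# Hypothetical data: the integers 1..100, so the result is easy to check by eye.
data = np.arange(1, 101)

# Value below which roughly 90% of the observations fall
# (NumPy interpolates linearly between neighbouring data points by default).
p90 = np.percentile(data, 90)

# Fraction of observations that are strictly below that value.
fraction_below = np.mean(data < p90)

print("90th percentile:", p90)
print("fraction of data below it:", fraction_below)
```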
Uses of Quantile Quantile Plot
- It can be used to check whether a sample follows a particular distribution. In this case, the reference distribution is kept fixed and the quantiles of the sample are compared with its theoretical quantiles.
- It can be used to compare the distributions of two samples.
If both samples follow the same distribution, the points should form a roughly straight line.
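The two-sample case can be sketched by hand with NumPy: evaluate both samples at the same probability levels and compare the matched quantiles. This is only an illustration under assumptions of my own choosing (the sample sizes, the seed, and the 99 probability levels are arbitrary), not the only way to build such a plot.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Two hypothetical samples of different sizes drawn from the same distribution.
sample_a = rng.normal(loc=0, scale=1, size=500)
sample_b = rng.normal(loc=0, scale=1, size=2000)

# Evaluate both samples at the same probability levels (1%, 2%, ..., 99%).
probs = np.linspace(0.01, 0.99, 99)
q_a = np.quantile(sample_a, probs)
q_b = np.quantile(sample_b, probs)

# Plotting q_a against q_b gives the two-sample Q-Q plot.
# The correlation of the matched quantiles indicates how straight it is.
r = np.corrcoef(q_a, q_b)[0, 1]
print(f"correlation of matched quantiles: {r:.4f}")
```

A correlation close to 1 corresponds to points lying near a straight line.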
Q-Q Plot in Python
Let us create a Q-Q plot in Python for two cases: samples that have the same distribution and samples that have different distributions.
Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
Step 2: Create a normal distribution and an exponential distribution
# A standard normal distribution: mean = 0 and standard deviation = 1
norm_dist = stats.norm(loc=0, scale=1)
# An exponential distribution
expon_dist = stats.expon()
Step 3: Create a sample from a normal distribution and another sample from an exponential distribution.
# Generate a sample of points from the normal distribution
sample_norm = norm_dist.rvs(size=1000)
# Generate a sample of points from the exponential distribution
sample_expon = expon_dist.rvs(size=1000)
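# Plot the sample from the normal distribution
# (histplot replaces distplot, which is deprecated in recent seaborn releases)
sns.set()
sns.histplot(sample_norm, kde=True)
plt.show()
# Plot the sample from the exponential distribution
sns.histplot(sample_expon, kde=True)
plt.show()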
Case 1: Q-Q Plot on Same Distribution
stats.probplot(sample_norm, dist="norm", plot=plt)
plt.show()
The sample is plotted against the normal distribution, and the points fall on an approximately straight line. But why approximately and not exactly straight? A finite sample only approximates the population, so its quantiles, especially in the tails, deviate slightly from the theoretical ones. Let us increase the sample size and plot the graph again.
sample_norm_1 = norm_dist.rvs(size=100000)
stats.probplot(sample_norm_1, dist="norm", plot=plt)
plt.show()
Now the points follow the straight line more closely. As the sample size increases, the sample becomes a better representation of the population, so its quantiles lie closer to those of the normal distribution.
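The effect of sample size can also be checked numerically. When called without a plot argument, stats.probplot returns the points of the plot together with the slope, intercept, and correlation coefficient r of the fitted line. The sketch below draws its own samples with an arbitrary seed, so the exact numbers will differ from the plots above; r is typically closer to 1 for the larger sample.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily

r_by_size = {}
for n in (1_000, 100_000):
    sample = rng.standard_normal(n)
    # Returns (theoretical quantiles, ordered sample values) and the
    # least-squares fit (slope, intercept, r) through those points.
    (theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
        sample, dist="norm"
    )
    r_by_size[n] = r
    print(f"n={n:>7}: r={r:.6f}")
```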
Case 2: Q-Q Plot on Different Distributions
Let us compare a sample of exponentially distributed data with the normal distribution.
stats.probplot(sample_expon, dist=stats.norm(), plot=plt)
plt.show()
As we can see, the points in the above graph do not form a straight line at all, which means the distribution of the sample does not match the normal distribution.
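To put a number on the mismatch, the same exponential data can be compared against two reference distributions using the correlation coefficient that stats.probplot returns. This is a rough sketch with its own sample and an arbitrary seed; the variable names are only illustrative.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
expon_sample = rng.exponential(scale=1.0, size=1000)

# Correlation of the Q-Q points when the reference is a normal distribution.
_, (_, _, r_vs_norm) = stats.probplot(expon_sample, dist="norm")

# Correlation when the reference is the exponential distribution itself.
_, (_, _, r_vs_expon) = stats.probplot(expon_sample, dist="expon")

print(f"against normal:      r={r_vs_norm:.4f}")
print(f"against exponential: r={r_vs_expon:.4f}")
```

The correlation is noticeably lower against the normal reference, which matches the curved shape of the plot.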
Advantages of Quantile Quantile Plot
- The two samples need not have the same size, because the plot compares quantiles rather than individual observations.
- There is no need to rescale the data first: a difference in location or scale only shifts or tilts the line, while a difference in shape bends it.
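The point about location and scale can be illustrated with a short sketch. It assumes a hypothetical normal sample with mean 5 and standard deviation 3, compared against the standard normal distribution without any rescaling. The fitted line stays straight, and its slope and intercept come out close to the sample's standard deviation and mean.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)  # fixed seed, chosen arbitrarily

# Hypothetical unscaled data: normal with mean 5 and standard deviation 3.
shifted_sample = rng.normal(loc=5.0, scale=3.0, size=10_000)

# Compare against the standard normal without rescaling the data.
_, (slope, intercept, r) = stats.probplot(shifted_sample, dist="norm")

print(f"slope     (close to the std dev, 3): {slope:.3f}")
print(f"intercept (close to the mean, 5):    {intercept:.3f}")
print(f"r         (straightness):            {r:.5f}")
```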