Partial Correlation Formula


\[\Huge r_{xy \cdot z} = \frac{r_{xy} - r_{xz} \cdot r_{yz}}{\sqrt{(1-r_{xz}^2) \cdot (1 - r_{yz}^2)}}\]
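The formula can be wrapped in a small R function. The name partial_cor_from_r is our own hypothetical helper, not part of base R or any package. It simply evaluates the expression above from the three pairwise Pearson correlations (x with y, x with z, y with z).

```r
# Hypothetical helper: first-order partial correlation of x and y,
# controlling for z, from the three pairwise Pearson correlations.
partial_cor_from_r <- function(r_xy, r_xz, r_yz) {
  numerator   <- r_xy - r_xz * r_yz
  denominator <- sqrt((1 - r_xz^2) * (1 - r_yz^2))
  numerator / denominator
}

# Example with mtcars: x = hp, y = mpg, z = cyl
r_xy <- cor(mtcars$hp, mtcars$mpg)
r_xz <- cor(mtcars$hp, mtcars$cyl)
r_yz <- cor(mtcars$mpg, mtcars$cyl)
partial_cor_from_r(r_xy, r_xz, r_yz)  # -0.2303439, as in the step-by-step calculation below
```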

Partial Correlation

Partial correlation lets us control for variables that may confound the relationship between two other variables in a data set. It tells us what the correlation between two variables is after the linear effect of a control variable has been removed from both. We can also control for more than one variable at a time.

  • First-order partial correlation: a correlation between two variables with one additional variable partialed out of both.
  • Higher-order partial correlation: a correlation between two variables with more than one control variable partialed out of both.
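One way to see what "partialed out" means: regress each of the two variables on the control variable and correlate the residuals. For Pearson correlation this yields the same value as the formula at the top of this section. The sketch below uses base R's lm() and resid() on mtcars, with hp and mpg as the two variables and cyl as the control; the object names are illustrative.

```r
# Remove the linear effect of cyl from hp and from mpg ...
res_hp  <- resid(lm(hp ~ cyl, data = mtcars))
res_mpg <- resid(lm(mpg ~ cyl, data = mtcars))

# ... then correlate what is left: the first-order partial correlation.
first_order <- cor(res_hp, res_mpg)
first_order

# Adding more terms to the right-hand side of both models (e.g. ~ cyl + disp)
# gives a higher-order partial correlation in the same way.
```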

Partial Correlation Overview

To compute a first-order partial correlation by hand, we start with the three pairwise correlations among the variables involved. For our purposes we use horsepower (hp), miles per gallon (mpg), and number of cylinders (cyl). If you would like to follow along, we are using the mtcars data set that is built into R.

  • Predictor (X) (Independent Variable) = hp (horsepower)
  • Criterion (Y) (Dependent Variable) = mpg (miles per gallon)
  • Control (Z) (Control Variable) = cyl (cylinders)
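Rather than calling cor() three times, a minimal sketch can pull all three pairwise correlations from a single correlation matrix. The off-diagonal entries are the r_xy, r_xz and r_yz values the partial correlation formula needs.

```r
# Correlation matrix of the predictor (hp), criterion (mpg) and control (cyl)
vars <- mtcars[, c("hp", "mpg", "cyl")]
r <- cor(vars)   # Pearson by default
round(r, 4)

# Individual entries can be indexed by name, e.g. r["hp", "mpg"]
```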


Take a look at the structure of the mtcars data set:

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Description of the variables in the mtcars data set

A data frame with 32 observations on 11 (numeric) variables.

  • Column 1 = mpg - Miles/(US) gallon
  • Column 2 = cyl - Number of cylinders
  • Column 3 = disp - Displacement (cu.in.)
  • Column 4 = hp - Gross horsepower
  • Column 5 = drat - Rear axle ratio
  • Column 6 = wt - Weight (1000 lbs)
  • Column 7 = qsec - 1/4 mile time
  • Column 8 = vs - Engine (0 = V-shaped, 1 = straight)
  • Column 9 = am - Transmission (0 = automatic, 1 = manual)
  • Column 10 = gear - Number of forward gears
  • Column 11 = carb - Number of carburetors

Correlation between Predictor and Criterion or X and Y

Here we find the correlation between mpg and hp.

cor.test(mtcars$mpg,mtcars$hp)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684
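The t statistic printed by cor.test() follows from r and the sample size alone: t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. A quick check in base R (the object names are our own):

```r
ct <- cor.test(mtcars$mpg, mtcars$hp)
r  <- unname(ct$estimate)     # sample Pearson correlation
n  <- nrow(mtcars)            # 32 cars
df <- n - 2

t_manual <- r * sqrt(df / (1 - r^2))
p_manual <- 2 * pt(-abs(t_manual), df)   # two-sided p-value
c(t = t_manual, p = p_manual)            # compare with t and p-value above
```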

Correlation between Predictor and Control or X and Z

Here we find the correlation between hp and cyl.

library(broom) # provides tidy()

cor_test_2 <- cor.test(mtcars$hp, mtcars$cyl)

tidy_result_2 <- tidy(cor_test_2) # Convert result to a neat table
knitr::kable(tidy_result_2, caption = "Correlation Test Results", row.names = TRUE)
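Correlation Test Results

|   | estimate  | statistic | p.value | parameter | conf.low  | conf.high | method                               | alternative |
|---|-----------|-----------|---------|-----------|-----------|-----------|--------------------------------------|-------------|
| 1 | 0.8324475 | 8.228604  | 0       | 30        | 0.6816016 | 0.9154223 | Pearson’s product-moment correlation | two.sided   |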
cor.test(mtcars$hp,mtcars$cyl)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$hp and mtcars$cyl
## t = 8.2286, df = 30, p-value = 3.478e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6816016 0.9154223
## sample estimates:
##       cor 
## 0.8324475
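The 95 percent confidence interval reported by cor.test() for a Pearson correlation is built on the Fisher z scale: transform r with atanh(), add and subtract a normal quantile times the standard error 1 / sqrt(n - 3), then transform back with tanh(). A sketch reproducing the interval above:

```r
ct <- cor.test(mtcars$hp, mtcars$cyl)
r  <- unname(ct$estimate)
n  <- nrow(mtcars)

z  <- atanh(r)                 # Fisher z transform of r
se <- 1 / sqrt(n - 3)          # approximate standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform the bounds
ci
```

Because the interval is symmetric on the z scale rather than the r scale, it is asymmetric around r, which is why the upper bound sits closer to 0.83 than the lower bound does.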

Correlation between Criterion and Control or Y and Z

Here we find the correlation between mpg and cyl.

cor.test(mtcars$mpg,mtcars$cyl)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$cyl
## t = -8.9197, df = 30, p-value = 6.113e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9257694 -0.7163171
## sample estimates:
##       cor 
## -0.852162

Partial Correlation: Putting it all together

###Partial Correlation when controlling for another variable
###X is hp
###Y is mpg
###Z is cyl
cor_xy <- cor(mtcars$mpg,mtcars$hp)
cor_xz <- cor(mtcars$hp,mtcars$cyl)
cor_yz <- cor(mtcars$mpg,mtcars$cyl)

print(cor_xy)
## [1] -0.7761684
print(cor_xz)
## [1] 0.8324475
print(cor_yz)
## [1] -0.852162

Calculate the numerator of the formula

\[\large r_{xy} - r_{xz} \cdot r_{yz} \]

\[\large -0.7761684 - (0.8324475 \cdot (-0.852162)) \]

###Top of the Formula
top_of_formula<- cor_xy - (cor_xz*cor_yz)
print(top_of_formula)
## [1] -0.06678832

Calculate the denominator of the formula

\[\large \sqrt{(1-r_{xz}^2) \cdot (1 - r_{yz}^2)} \]

\[\large \sqrt{(1-0.8324475^2) \cdot (1 - (-0.852162)^2)} \]

###Bottom of formula
bottom_of_formula<- sqrt((1-(cor_xz^2))*(1-(cor_yz^2)))
print(bottom_of_formula)
## [1] 0.2899505
final_answer= top_of_formula/bottom_of_formula
print(final_answer)
## [1] -0.2303439
print(round(final_answer,2))
## [1] -0.23

Note: Round only at the very end of the calculation; rounding intermediate values introduces error into the result.

\[\large r_{xy \cdot z} = \frac{-0.06678832}{0.2899505} \] \[\large r_{xy \cdot z} = -0.2303439 \]
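To see why rounding belongs at the end, here is a sketch that runs the same formula twice: once with the full-precision correlations printed earlier, and once with those correlations rounded to two decimals before plugging them in. The helper name partial_cor is hypothetical.

```r
# Hypothetical helper implementing the first-order partial correlation formula
partial_cor <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

exact   <- partial_cor(-0.7761684, 0.8324475, -0.852162)  # full precision
rounded <- partial_cor(-0.78, 0.83, -0.85)                # rounded too early

round(c(exact = exact, rounded_inputs = rounded), 2)
# exact: -0.23; rounded_inputs: -0.25
```

Rounding the inputs first moves the answer by about 0.02, because the numerator is a small difference between two much larger quantities.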

Partial Correlation utilizing the ppcor package

First-order partial correlation: a correlation between two variables with one additional variable partialed out of both.

The ppcor package offers a quick way to calculate partial correlations. Install it from CRAN before first use, for example with install.packages("ppcor").

###Partial Correlation when controlling for another variable
##Load the ppcor library
library(ppcor)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
###ppcor function controlling for cylinders in the mtcars dataset
ppcor::pcor.test(mtcars$mpg, mtcars$hp, mtcars[, c("cyl")])
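Besides the estimate, pcor.test() reports a test statistic and p-value. The standard t-test for a partial correlation with k control variables uses t = r * sqrt((n - 2 - k) / (1 - r^2)) on n - 2 - k degrees of freedom. The sketch below computes it by hand in base R so the numbers can be checked against the package output; we expect them to agree for the default Pearson method, but treat that as an assumption to verify.

```r
# Partial correlation of mpg and hp controlling for cyl, from the formula
r_xy <- cor(mtcars$hp, mtcars$mpg)
r_xz <- cor(mtcars$hp, mtcars$cyl)
r_yz <- cor(mtcars$mpg, mtcars$cyl)
r <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

n  <- nrow(mtcars)
k  <- 1                 # one control variable (cyl)
df <- n - 2 - k         # 29 degrees of freedom

t_stat  <- r * sqrt(df / (1 - r^2))
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided
c(estimate = r, statistic = t_stat, p.value = p_value, df = df)
```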

Higher-order partial correlation: a correlation between two variables with more than one control variable partialed out of both.

###ppcor function controlling for cylinders and displacement in the mtcars dataset
ppcor::pcor.test(mtcars$mpg, mtcars$hp, mtcars[, c("cyl","disp")])
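Higher-order partial correlations can also be read off the inverse of the correlation matrix (the precision matrix P): the partial correlation of variables i and j, controlling for every other variable in the matrix, is -P[i, j] / sqrt(P[i, i] * P[j, j]). A base-R sketch with mpg and hp controlling for cyl and disp; the result should agree with the pcor.test() estimate above.

```r
vars <- mtcars[, c("mpg", "hp", "cyl", "disp")]
P <- solve(cor(vars))   # precision matrix

# Scale the precision matrix into partial correlations
partial <- -P / sqrt(outer(diag(P), diag(P)))
diag(partial) <- 1

partial["mpg", "hp"]    # mpg and hp, controlling for cyl and disp
```

Every entry conditions on all the other variables in the matrix, so the choice of columns in vars defines the control set.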

References

Hatcher, L. (2013). Advanced statistics in research: Reading, understanding, and writing up data analysis results. Shadow Finch Media.

Henderson, H. V., & Velleman, P. F. (1981). Building multiple regression models interactively. Biometrics, 37, 391–411.

Kim, S. (2015). ppcor: Partial and semi-partial (part) correlation (R package version 1.1). https://CRAN.R-project.org/package=ppcor

© Intro Stats. All rights reserved.