ANCOVA

Author

James Van Slyke

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car)
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Analysis of Covariance

The ANOVA allows us to analyze the effect of various levels of an independent variable on the dependent variable. The t test is used for two groups that vary based on the presence of an independent variable, but the ANOVA allows us to look at 3 or more groups. The ANCOVA is used when analyzing levels of an independent variable adjusted based on the expected effect of another second variable (Field).

In this case, the second variable is typically a continuous variable and usually functions as a type of pre-existing condition. So the second variable is not something manipulated in the experiment, but still thought to effect the relationship between the independent and dependent variable. This second variable is called a covariate.

ANOVA Review

Remember that the ANOVA or F statistic is based on comparing the explained variance or variance we can attribute to the manipulation of the independent variable to the unexplained variance or variance attributed to measurement error. So the basic comparison is between the variation between the groups compared to the variation within the groups.

\[ \frac{SS_{between}}{SS_{within}} \] However it takes a few more steps to get to the F value base on the mean squares \(MS\), so the full formula looks like this: \[ \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}} = \frac{MS_{between}}{MS_{within}}= F \]

ANOVA in R Studio

When performing ANOVA in R Studio, we use the general linear model, which is highly related to ANOVA, so the basic formula in R Studio looks like this with \(M\) standing for Model and \(R\) standing for residuals:

\[ \frac{SS_{M}}{SS_{R}} \]

Confounds

One of the primary issues in experimental design is confounds. Confounds are other variables (sometimes called nuisance variables) besides the independent variable that have an effect on the dependent variable. So a lot of work in experimental design is trying to control for or eliminate confounds, which is why experimental psychologists work so hard to try and have everything the same in the control and experimental conditions, except for the manipulation of the independent variable.

Reducing Error Variance

The ANCOVA tries to account for other potential confounds by reducing the within groups error variance \(SS_R\). \(SS_R\) is variance that is unexplained. So if part of that unexplained variance can be accounted for by another variable (the covariate) it increases the likelihood of detecting a real effect of the independent variable.

The covariate is typically a variable that effects the relationship between the independent variable and the dependent variable, but is not likely to be something that could be manipulated in an experiment. This is often because it is something that is a pre-existing condition or a variable that develops in individuals participants over time.

Revisiting Some Mischief

For the section on the independent t test, we looked at the effect of an invisibility cloak on acts of mischief. The idea was that if you had an invisibility cloak (like the one used by Harry Potter to explore Hogwarts) you might be more inclined to do more types of mischief.

However, a potential covariate would be how mischievous you are more generally. So if you are already someone who is always getting into trouble (pulling pranks, being sent to the principal’s office, often serving detention, etc), you are already more likely to do a lot of mischief before you get access to an invisibility cloak. So an ANCOVA allows for a participant’s previous mischief to be accounted for when exploring the relationship between the primary relationship between the invisibility cloak and mischief. This would help to separate the influence of being mischievous more generally from the effect of the invisibility cloak on mischievous behavior.

Here’s a look at the data.

library(haven)
Invisibility_Baseline <- read_sav("Invisibility Baseline.sav")
Invisibility_Baseline
# A tibble: 80 × 3
   cloak        mischief1 mischief2
   <dbl+lbl>        <dbl>     <dbl>
 1 0 [No Cloak]         4        11
 2 0 [No Cloak]         5         7
 3 0 [No Cloak]         8         8
 4 0 [No Cloak]         6         7
 5 0 [No Cloak]         6        10
 6 0 [No Cloak]         4         7
 7 0 [No Cloak]         3        11
 8 0 [No Cloak]         6         7
 9 0 [No Cloak]         7        11
10 0 [No Cloak]         1         6
# ℹ 70 more rows

In this case, mischief1 is the baseline level of mischief or how often a subject engages in mischievous acts on a regular basis. Baseline data is used in a lot of different situations to refer to how persons perform on a variable in everyday situations. So you might think about a baseline heart rate or a baseline level of religiosity. So prior to any experimental manipulation, baseline gives you the current level of some measure of a variable.

An Example with Puppies

Let’s look at an example using puppies or more specifically using puppies as a form of therapy (Field). Here’s the dataset.

library(haven)
Puppy_love <- read_sav("Puppy Love.sav")
Puppy_love
# A tibble: 30 × 4
   Person Dose        Happiness Puppy_love
    <dbl> <dbl+lbl>       <dbl>      <dbl>
 1      1 1 [Control]         3          4
 2      2 1 [Control]         2          1
 3      3 1 [Control]         5          5
 4      4 1 [Control]         2          1
 5      5 1 [Control]         2          2
 6      6 1 [Control]         2          2
 7      7 1 [Control]         7          7
 8      8 1 [Control]         2          4
 9      9 1 [Control]         4          5
10     10 2 [15 mins]         7          5
# ℹ 20 more rows

The independent variable is Dose. There are three levels to the independent variable (Control, 15 mins, and 30 mins), which refers to the amount of time spent with the puppies.

Puppy_love$Dose
<labelled<double>[30]>: Dose of puppies
 [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3

Labels:
 value   label
     1 Control
     2 15 mins
     3 30 mins

The dependent variable would be Happiness, which refers to the amount of happiness participants experienced after spending time with the puppies, which is on a 0-10 point scale.

Puppy_love$Happiness
 [1] 3 2 5 2 2 2 7 2 4 7 5 3 4 4 7 5 4 9 2 6 3 4 4 4 6 4 6 2 8 5
attr(,"label")
[1] "Happiness (0-10)"
attr(,"format.spss")
[1] "F8.0"
attr(,"display_width")
[1] 11

So, the primary hypotheses would be:

Null - Puppy therapy has no effect on happiness

Alternative - Puppy therapy has an effect on happiness

As before, the first step will be to turn Dose into a factor to better show the levels of the independent variable. This time we’ll use the tidyverse to modify this type of variable

Puppy_love <- Puppy_love |> 
  mutate(Dose = factor(Dose, levels = c(1, 2, 3), 
         labels = c("Control", "15 mins", "30 mins")))

To begin, look at the mean difference of happiness based on the condition of puppy therapy.

Puppy_love |> 
  group_by(Dose) |>
    summarize(n = n(),
              mean = mean(Happiness))
# A tibble: 3 × 3
  Dose        n  mean
  <fct>   <int> <dbl>
1 Control     9  3.22
2 15 mins     8  4.88
3 30 mins    13  4.85

Notice that there is no real difference between 15 mins and 30 mins of puppy therapy, but there is a difference between the control group and both 15 and 30 mins of puppy therapy. Next, we’ll need to run the ANOVA to decide if this mean difference is significant or not.

Puppy Love ANOVA

To run the ANOVA we need to use the aov argument and save it as an object.

Puppy_ANOVA <- aov(Happiness ~ Dose, data = Puppy_love)

Once the ANOVA has been saved, we can look at the results

summary(Puppy_ANOVA)
            Df Sum Sq Mean Sq F value Pr(>F)
Dose         2  16.84   8.422   2.416  0.108
Residuals   27  94.12   3.486               

Notice that the F value is not significant. It’s only 2.416 with a p value of 0.1008. So on the face of it, it would appear that Puppy therapy doesn’t have an effect on happiness. This is where looking at the influence of a covariate might be helpful. The ability for puppy therapy to have an effect on your level of happiness would certainly be related to whether or not you liked puppies in the first place! Perhaps you are not a dog lover or you’ve had negative experiences with dogs. If we can account for the influence of this covariate, it may help us to see if puppy therapy actually has an effect on happiness.

In the dataset, the covariate is lised as Puppy_love and is based on a 0 to 7 scale.

Puppy_love$Puppy_love
 [1] 4 1 5 1 2 2 7 4 5 5 3 1 2 2 6 4 2 1 3 5 4 3 3 2 0 1 3 0 1 0
attr(,"label")
[1] "Love of puppies (0-7)"
attr(,"format.spss")
[1] "F8.0"
attr(,"display_width")
[1] 12

The code for this model is changed slightly to account for the covariate, Puppy_love.

Puppy_ANCOVA <- aov(Happiness ~ Dose + Puppy_love, data = Puppy_love)

To look at the results we have to use a slightly different version of ANOVA (“type = III”) using the car package. So this is what the summary looks like

Anova(Puppy_ANCOVA, type = "III")
Anova Table (Type III tests)

Response: Happiness
            Sum Sq Df F value  Pr(>F)  
(Intercept) 12.943  1  4.2572 0.04920 *
Dose        25.185  2  4.1419 0.02745 *
Puppy_love  15.076  1  4.9587 0.03483 *
Residuals   79.047 26                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The first thing to notice here is that the variable Dose is now significant with an F value of 4.1419 and a p value of 0.02745. Puppy_love is also significant with an F value of 4.9587 and a p value of 0.03483.

If you look back at the original ANOVA, the residual sum of squares is 94.12. Notice that here the residual sum of squares is 79.047, so the inclusion of the co-variate has reduced the residual sum of squares. It’s important to note that the total sum of squares or total variance has not changed, but how that variance has been accounted for has changed, which changes the overall results.

Effect size for ANCOVA

The effect size for ANCOVA uses something slightly different then the eta squared \(\eta^2\) we used for the ANOVA called a partial eta squared. It is a proportion of variance that a variable explains in comparison to the variance not explained by other variables. Here is the formula.

\[ \frac{SS_{effect}}{SS_{effect}+SS_{residual}} \]

So you look at the sum of squares for the effect of interest (in this case, either the independent variable or covariate) then divide by that same effect plus the residual. Here is the formula for the independent variable Dose.

\[ Dose=\frac{25.185}{25.185+79.047} \]

In R Studio

25.185/(25.185+79.047)
[1] 0.2416245

And then the formula for the covariate Puppy_love

\[ Puppy_love=\frac{15.076}{15.076+79.047} \]

In R Studio

15.076/(15.076+79.047)
[1] 0.1601734

So writing out the results looks like this:

The covariate, love of puppies, was significantly related to the participant’s happiness, F(1, 26) = 4.96, p = 0.035, partial \(\eta^2\) =0.16. There was also a significant effect of puppy therapy on levels of happiness after controlling for the effect of love of puppies, F(2, 26) = 4.14, p = 0.027, partial \(\eta^2\) = 0.24.

Graphing the results

For graphing the results, a bar graph is the best way to look at the means. We can use the same basic formula we used for the one way anova to find our descriptive statistics.

ANCOVA_Descriptives <- Puppy_love |>
  group_by(Dose) |>
  summarize(n = n(),
            mean = mean(Happiness),
            sd = sd(Happiness),
            se = sd / sqrt(n),
            ci = qt(0.975, df = n - 1) * sd / sqrt(n))

And then create the graph

ggplot(ANCOVA_Descriptives, 
       aes(x = Dose, 
           y = mean)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin=mean-ci,
                    ymax=mean+ci))

Of course, it’s better to improve the graph a bit.

ggplot(ANCOVA_Descriptives, 
       aes(x = Dose,
           y = mean)) +
  theme_minimal() +
  geom_bar(stat = "identity", fill="steelblue") +
  geom_errorbar(aes(ymin=mean-ci,
                    ymax=mean+ci), width=.3, size=1) +
  labs(title = "Does Puppy Therapy Effect Happiness?", 
       y="Mean Level of Happiness", x="Time Spent in Puppy Therapy") 
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.