Multifactor interventions and interactions

The Multiphase Optimisation Strategy for designing multifactor behavioural interventions should be used more. The idea is that you have a lot of potentially good ideas for things that might work, alone or in combination. You don’t want to test them one at a time, because that takes forever. You don’t want to test all against none, because they might not all be compatible, and in any case you don’t want to be stuck doing them all if you don’t need to.

Instead, as any sensible engineering or agricultural researcher would, you put together a small factorial (or fractional factorial) experiment and test them all at once. If you have, say, six factors, you can identify main effects and two-way interactions in a \(2^{6-2}\) design, which has only 16 intervention groups. Win!

People who are used to ordinary clinical trials (where a \(2^3\) or even \(2\times 3\) design is seen as … adventurous) often have unfounded concerns about these designs. There’s a nice misconceptions page at the PennState Methodology Center.

However, there’s one ‘misconception’ rebuttal that I’ve seen escape and mutate into a statistical myth in the other direction. And because it’s listed by a highly reputable website as correcting a common misconception, it can be hard to get people to listen

Misconception 5: There is always less statistical power for interactions than for main effects in a factorial ANOVA. Power decreases as the order of the interaction increases.

Reality: When effect coding is used, statistical power is the same for all regression coefficients of the same size, whether they correspond to main effects or interactions, and irrespective of the order of the interaction. (Note that the regression coefficient is not the only way to express the effect size of an interaction.)

That is, of course, true. If all your contrast vectors are orthonormal (or orthogonal of any other fixed length), every contrast coefficient is independent and has the same variance, so the power is the same. Hooray for balanced designs.

The note is also important, and that’s what people sometimes miss. I have seen, for example, a claim that if you use effect coding rather than dummy variable coding, the power for detecting an interaction increases. It doesn’t, because the models with and without the interaction are the same two models under either coding, just reparametrised. The test is the same; it must have the same power.

Suppose we have the world’s least interesting interaction test. We have a \(2\times 2\) factorial design of treatments A and B, with the following means for \(Y\) in a linear model

	A	not A
B	3	1
not B	1	0

Clearly, there is an interaction. We could define variables \(x_A, x_B, x_{AB}\) with dummy variable coding

	A	not A
B	1,1,1	0,1,0
not B	1,0,0	0,0,0

or variables \(z_A, z_B, z_{AB}\) with effect coding

	A	not A
B	+1,+1,+1	-1,+1,-1
not B	+1,-1,-1	-1,-1,+1

If we write \(\alpha\) for the coefficients in the first model and \(\beta\) for the coefficients in the second model, and include an intercept, we have

\(\alpha_A=1, \alpha_B=1, \alpha_{AB}=1\)
\(\beta_A=3/4, \beta_B=3/4, \beta_{AB}=1/4\)

There are two reasons for the difference; an interesting one and a boring one. The boring reason is that \(1-0=1\) but \(1--1=2\), so the \(\beta\)s are half as big as the \(\alpha\)s. The interesting reason is that the ‘main effect’ \(\alpha_A\) is the difference under ‘not B’, but the main effect \(\beta_A\) is averaged over B and not B; it includes some of what \(\alpha\) calls the interaction, so it’s bigger and \(\beta_{AB}\) is correspondingly smaller. We can think of \(\alpha_A\) as the average effect of intervention A in settings where intervention B is naturally absent; \(\beta_A\) is the average effect of intervention A in settings where intervention B is naturally present half the time.

The regression coefficient is not the only way to express the effect size of an interaction.

Now, we know that the test for \(\alpha_{AB}=0\) must have the same power as the test for \(\beta_{AB}=0\), because it compares the same two models. What’s a bit less obvious is what happens to the ‘main effects’. Let’s simulate

x_dummy<-data.frame(a=c(0,0,1,1),b=c(0,1,0,1),ab=c(0,0,0,1))
z_effects<-data.frame(a=c(-1,-1,1,1),b=c(-1,1,-1,1),ab=c(1,-1,-1,1))
i<-rep(1:4,1000)
mu<-c(0,1,1,3)
y<-rnorm(4000,mu[i])
x_model<-lm(y~a+b+ab,data=x_dummy[i,])
z_model<-lm(y~a+b+ab,data=z_effects[i,])
x_noint<-lm(y~a+b,data=x_dummy[i,])
z_noint<-lm(y~a+b,data=z_effects[i,])

First, lets look at the interaction tests

anova(x_noint, x_model)

## Analysis of Variance Table
## 
## Model 1: y ~ a + b
## Model 2: y ~ a + b + ab
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3997 4373.2                                  
## 2   3996 4155.7  1    217.42 209.06 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(z_noint,z_model)

## Analysis of Variance Table
## 
## Model 1: y ~ a + b
## Model 2: y ~ a + b + ab
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3997 4373.2                                  
## 2   3996 4155.7  1    217.42 209.06 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The tests are identical, as they should be.

Now look at the coefficients and their tests

summary(x_model)

## 
## Call:
## lm(formula = y ~ a + b + ab, data = x_dummy[i, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5432 -0.7034  0.0188  0.7051  3.9701 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.01472    0.03225   0.457    0.648    
## a            1.03193    0.04561  22.627   <2e-16 ***
## b            1.00289    0.04561  21.990   <2e-16 ***
## ab           0.93257    0.06450  14.459   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.02 on 3996 degrees of freedom
## Multiple R-squared:  0.5265, Adjusted R-squared:  0.5261 
## F-statistic:  1481 on 3 and 3996 DF,  p-value: < 2.2e-16

summary(z_model)

## 
## Call:
## lm(formula = y ~ a + b + ab, data = z_effects[i, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5432 -0.7034  0.0188  0.7051  3.9701 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.26528    0.01612   78.47   <2e-16 ***
## a            0.74911    0.01612   46.46   <2e-16 ***
## b            0.73459    0.01612   45.56   <2e-16 ***
## ab           0.23314    0.01612   14.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.02 on 3996 degrees of freedom
## Multiple R-squared:  0.5265, Adjusted R-squared:  0.5261 
## F-statistic:  1481 on 3 and 3996 DF,  p-value: < 2.2e-16

The \(t\)-statistics for the interaction terms, again, are the same. In each model, the \(t\)-statistics for the main effects are larger than for the interaction effect. In the dummy-variable model, the coefficients are the same, but the standard error is larger for the main effects. In the effects-coding model, the standard errors are all the same, but the coefficient is smaller for the interaction.

The regression coefficient is not the only way to express the effect size of an interaction.

Finally, look at the coefficients and tests for the no-interaction model. This model is misspecified – the residual variance isn’t constant – but with equal group sizes that doesn’t matter much.

summary(x_noint)

## 
## Call:
## lm(formula = y ~ a + b, data = x_dummy[i, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7764 -0.7356  0.0045  0.7083  4.2032 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.21842    0.02865  -7.625 3.04e-14 ***
## a            1.49822    0.03308  45.294  < 2e-16 ***
## b            1.46917    0.03308  44.416  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.046 on 3997 degrees of freedom
## Multiple R-squared:  0.5017, Adjusted R-squared:  0.5015 
## F-statistic:  2012 on 2 and 3997 DF,  p-value: < 2.2e-16

summary(z_noint)

## 
## Call:
## lm(formula = y ~ a + b, data = z_effects[i, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7764 -0.7356  0.0045  0.7083  4.2032 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.26528    0.01654   76.50   <2e-16 ***
## a            0.74911    0.01654   45.29   <2e-16 ***
## b            0.73459    0.01654   44.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.046 on 3997 degrees of freedom
## Multiple R-squared:  0.5017, Adjusted R-squared:  0.5015 
## F-statistic:  2012 on 2 and 3997 DF,  p-value: < 2.2e-16

Now the interesting reason for differences in coefficients has gone and the \(\alpha\)s are just twice the \(\beta\)s; there’s no difference in the tests.

If you stick to quantities that have the same meaning under the two parametrisations, you get the same answer. So what pairs of statements about interaction and power describe the simulation setting and have the same meaning? Working in terms of coefficients isn’t very satisfactory

dummy: if the interaction coefficient is the same size as the main effects you need four times the sample size to detect it
effects: if the interaction coefficient half as large as the main effects you need four times the sample size to detect it

Working in terms of contrasts is a bit better

dummy: if the difference between AB and the average of A and B is the same as the average of A and B, you need four times the sample size to detect it
effects: if the difference between AB and the average of A and B is the same as the average of A and B, you need four times the sample size to detect it

Or in sound bites

dummy: Interactions are hard to detect because the uncertainty is larger than you might expect
effects: Interactions are hard to detect because the coefficient is smaller than you might expect

And, as an update, what value would we need for the top left (AB) mean of \(Y\) to have \(\beta_{AB}=\beta_A=\beta_B\)? We would need

	A	not A
B	4	1
not B	1	0

So, that’s what the table looks like when the interaction parameter is the same size as the main-effects parameter under the effects coding.

The regression coefficient is not the only way to express the effect size of an interaction.