ANOVA CHEATSHEET (Applies to both One-way and Two-way)
- BUSINESS PROBLEM: Test of means across groups (1 or more factors)
- H0: Mean1 = Mean2 = … = MeanK (One-way: only 1 null hypothesis; Two-way: max 3 null hypotheses if interaction is included, or min 2 if there is no interaction)
- Samples are independent and random
- Each group/level is normally distributed (checked using the Shapiro-Wilk test)
- For Shapiro test,
- H0: Data in the group/level is normal
- Ha: Data is not normal
- Favourable: P > 0.05 because we do not want to reject NULL hypothesis
- For n levels, we do Shapiro test n times
- Homogeneity of variances using LEVENE'S test
- For LEVENE's test,
- H0: Sigma1^2 = Sigma2^2 = … = SigmaN^2
- Ha: At least one variance is different
- Favourable: P > 0.05
- For n factors, we do LEVENE’s test n times
- F-stat > F-crit => P-value < 0.05 => REJECT H0
- F-stat < F-crit => P-value > 0.05 => DO NOT REJECT H0
- If H0 is rejected, TukeyHSD test is done.
- In TukeyHSD test, pairwise testing of means is done between every pair of levels.
- H0: For three levels, Mu1 = Mu2, Mu1 = Mu3, Mu2 = Mu3
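- With Interaction: use '*' in the R model formula (e.g. sales ~ promotion * coupon)
- Without Interaction: use '+' in the R model formula (e.g. sales ~ promotion + coupon)
- For ANOVA, the first condition is that the data/levels should be balanced (equal number of observations in each group)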
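The F-stat / F-crit / p-value rule above can be checked numerically. The sketch below uses simulated data (the `toy` data frame, its group means and the seed are assumptions made up for illustration, not the sales dataset) and shows that comparing F-stat with F-crit gives the same decision as comparing the p-value with 0.05.

```r
# Simulated data (hypothetical): three groups of 10 observations each
set.seed(1)
toy <- data.frame(
  y = c(rnorm(10, mean = 10), rnorm(10, mean = 12), rnorm(10, mean = 15)),
  g = factor(rep(c("a", "b", "c"), each = 10))
)

fit <- aov(y ~ g, data = toy)
tab <- summary(fit)[[1]]            # ANOVA table as a data frame

f_stat <- tab[["F value"]][1]
df1    <- tab[["Df"]][1]            # between-group degrees of freedom (k - 1)
df2    <- tab[["Df"]][2]            # residual degrees of freedom (N - k)

alpha  <- 0.05
f_crit <- qf(1 - alpha, df1, df2)                    # critical value of F
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)   # upper-tail p-value

# The two decision rules always agree
reject_by_f <- f_stat > f_crit
reject_by_p <- p_val < alpha
```

Here `p_val` reproduces the `Pr(>F)` column that `summary()` prints, so in practice only the p-value needs to be read off the table.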
Consider a sample dataset showing sales figures of different stores under different combinations of coupon and promotion levels (columns: sales, promotion, coupon):
NULL Hypothesis H0:
- For Coupon: Level1 Mean = Level2 Mean and
- For Promotion: Level1 Mean = Level2 Mean = Level3 Mean
- For Interaction: there is no interaction effect between Coupon and Promotion
Open RStudio and run the following code, selecting the dataset stored in a .csv file
## Summary Statistics
## Coupon and Instore Promotion Data
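The script refers to a data frame `df` with columns `sales`, `promotion` and `coupon`, but the loading step is not shown. One possible way to load it is sketched below; `load_sales` is a hypothetical helper name, and the assumption is that the .csv file has a header row with those three column names.

```r
# Sketch only: `load_sales` is a hypothetical helper, not part of the original script
load_sales <- function(path) {
  d <- read.csv(path)
  # The rest of the script expects these three columns
  required <- c("sales", "promotion", "coupon")
  missing_cols <- setdiff(required, names(d))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  d
}

# Interactive use (opens a file picker to select the .csv file):
# df <- load_sales(file.choose())
```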
df$promotion; df$coupon;
#Step 2: Clearly identify the factors in the data
df$promotion<-factor(df$promotion, labels=c("1","2","3"))
df$coupon<-factor(df$coupon, labels=c("1","2"))
df$promotion; df$coupon;
###Calculate the mean sales for each promotion and coupon combination
tapply(df$sales, list(df$promotion, df$coupon), mean)
## Note: High promotion and high coupon give relatively higher sales values
## As the intensity of promotion and coupons is reduced, sales reduce accordingly
###Create a plot to identify the interaction effects between promotion & coupon
interaction.plot(df$promotion, df$coupon, df$sales)
##Both lines are almost parallel, hence we can say the interaction effect is negligible
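The visual check can be backed by a formal test of the interaction term. The sketch below is an illustration on simulated data, not the sales dataset: the `sim` data frame, its effect sizes and the seed are assumptions. It fits the two-way model with '*' and with '+' and compares them.

```r
# Simulated balanced design (hypothetical): 3 promotion levels x 2 coupon levels,
# 5 stores per cell, additive effects only (no true interaction)
set.seed(42)
sim <- expand.grid(rep = 1:5, promotion = factor(1:3), coupon = factor(1:2))
sim$sales <- 5 + 2 * as.integer(sim$promotion) +
  1.5 * as.integer(sim$coupon) + rnorm(nrow(sim))

fit_int <- aov(sales ~ promotion * coupon, data = sim)  # main effects + interaction
fit_add <- aov(sales ~ promotion + coupon, data = sim)  # main effects only

summary(fit_int)          # the promotion:coupon row tests the interaction
anova(fit_add, fit_int)   # same test, expressed as a model comparison
```

If the p-value for the `promotion:coupon` row is greater than 0.05, the null hypothesis of no interaction is not rejected and the simpler '+' model can be used.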
## Test for normality for all the 3 groups in promotion
cat("Normality p-values by promotion level: ")
for (i in levels(df$promotion)) {
  cat(shapiro.test(df$sales[df$promotion == i])$p.value, " ")
}
####P values are greater than 0.05, hence we do not reject the null hypothesis of normality
###Run the normality assumption test for both groups in coupon
cat("Normality p-values by coupon level: ")
for (i in levels(df$coupon)) {
  cat(shapiro.test(df$sales[df$coupon == i])$p.value, " ")
}
####P values are greater than 0.05, hence we do not reject the null hypothesis of normality
###Create a qqplot to check the normality of entire dataset visually
qqnorm(df$sales, pch=19, cex=0.6)
qqline(df$sales, col = 'red')
###From the plot, the entire dataset appears to be approximately normally distributed
######## Levene's test for variance
## Test for homogeneity of variance
####Testing variance visually using a boxplot
boxplot(sales ~ promotion, data = df)
###From the boxplot, we see that levels 2 and 3 have almost the same variance.
####Check for variances in groups using Levene's test
####As all the p values are greater than 0.05, we do not reject the null hypothesis
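The Levene's test call itself is not shown here. One common option is `leveneTest()` from the `car` package. As an alternative that needs no extra package, the sketch below computes the median-centred form of the test (the Brown-Forsythe variant) in base R: a one-way ANOVA on the absolute deviations from each group's median. The helper name `levene_base` and the simulated `sim` data are assumptions for illustration only.

```r
# Hypothetical helper: Levene's test (median-centred) using base R only
levene_base <- function(y, g) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = median))  # absolute deviation from group median
  tab <- anova(lm(z ~ g))                # one-way ANOVA on the deviations
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Simulated data (hypothetical): same spread in all three groups
set.seed(3)
sim <- data.frame(
  sales = rnorm(30, mean = 8, sd = 2),
  promotion = factor(rep(1:3, each = 10))
)
res <- levene_base(sim$sales, sim$promotion)
res   # p > 0.05 is the favourable outcome: variances are treated as equal
```

For the sales data, the same helper would be called once per factor, e.g. `levene_base(df$sales, df$promotion)` and `levene_base(df$sales, df$coupon)`.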
####ANOVA based on promotion
aov1 <- aov(sales ~ promotion, data = df)
summary(aov1)
##p value is less than 0.05, hence we reject the null hypothesis
###Pairwise comparison of the promotion levels
TukeyHSD(aov1)
###We reject the null hypothesis for every pair, as none of the intervals contains zero
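To make the "interval does not contain zero" check explicit, the sketch below runs TukeyHSD on simulated data and flags each pair. The `sim` data frame, its level names and its group means are assumptions for illustration and are not the sales dataset.

```r
# Simulated data (hypothetical): three promotion levels, 10 stores each
set.seed(7)
sim <- data.frame(
  sales = c(rnorm(10, mean = 4), rnorm(10, mean = 6.5), rnorm(10, mean = 8.6)),
  promotion = factor(rep(c("low", "medium", "high"), each = 10),
                     levels = c("low", "medium", "high"))
)

fit <- aov(sales ~ promotion, data = sim)

# One row per pair of levels; columns are diff, lwr, upr and p adj
tk <- TukeyHSD(fit)$promotion

# A pair differs significantly when its 95% interval lies entirely on one side of zero
excludes_zero <- tk[, "lwr"] > 0 | tk[, "upr"] < 0
print(cbind(tk, excludes_zero))
```

The `diff` column is the estimated difference in mean sales for that pair, which is the "sales boost" figure used in the business interpretation.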
###Business Intuition
###High, medium & low promotions all affect sales differently
###Jump from low to high gives a sales boost of 4.6
###Jump from medium to high gives a sales boost of 2.1
###If the costs of doing a high promotion are less than the above boosted sales values,
### we can conclude that all stores can run the high promotion for better sales
###However, if the costs of doing a high promotion are more than the boosted sales figures,
### we can actually drop the high promotion stores to medium or even low
####ANOVA based on coupon
aov2 <- aov(sales ~ coupon, data = df)
summary(aov2)
####p value is less than 0.05, hence we reject the null hypothesis
TukeyHSD(aov2)
###Since the interval does not contain zero, there is a significant difference between the two coupon levels
###In business terms, we can offer higher coupons in all stores if net profit increases
###We would need to calculate the net increase in profit using the sales boost of 2.666667
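The net profit calculation could look like the sketch below. The function name `net_gain` and the inputs `margin` (profit per unit of sales) and `extra_cost` (additional cost per store of the higher coupon level) are hypothetical; the values passed in are placeholders, since the actual margin and cost figures are not given here.

```r
# Hypothetical helper: net gain per store from moving to the higher level
#   boost      : estimated increase in mean sales (the TukeyHSD "diff")
#   margin     : profit earned per unit of sales (assumed input)
#   extra_cost : extra cost per store of the higher level (assumed input)
net_gain <- function(boost, margin, extra_cost) {
  boost * margin - extra_cost
}

# Placeholder margin and cost, for illustration only
gain <- net_gain(boost = 2.666667, margin = 0.30, extra_cost = 0.50)
gain > 0   # TRUE means the higher coupon level pays for itself
```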
