A decent researcher has probably used the p-test at least once in their life to show they results are statistically significant. In addition to this popular p-test, there exists another more generalized statistical method called ANOVA, which we are going to explore it briefly in this blog post.

What is ANOVA?

Developed by Ronald Fisher in 1921, ANOVA is a widely used statistical method for testing statistical differences between two or more means by analyzing variance1. It is more generalized than p-test, which can test two means at a time. The non-specific omnibus null hypothesis claims that all population means are equal. When the null hypothesis is rejected, the conclusion is that at least one population mean is different from at least one other mean. It is used to test general rather than specific differences, since ANOVA doesn’t not reveal which means are different from which. The Tukey HSD test offers more specific differences, but it’s not as commonly used as ANOVA.

Assumptions made by ANOVA/t-test:

  • Homogeneity of variance: the populations have the same variance.
  • Normality: the populations are normally distributed.
  • Independence: Each value is sampled independently from each other value. (If a subject provides two scores, then the values are not independent.)

Key concepts (Jargons)

Factor and level

Factors refers to the number of independent variables that the experiment wants to investigate. Level refers to the number of different settings within the factor. For example, one study can have three levels of age group within the age factor, or five different level of dosage within measuring the effect of a drug, or two levels within the gender factor(or three levels for non-binary as well). You get the idea.

An ANOVA conducted on a design in which there is only one factor is called a one-way ANOVA or a univariate ANOVA. If an experiment has two (or more) factors, then the ANOVA is called a two-way ANOVA or a multivariate ANOVA.

When all combinations of the levels are included, the design is called a factorial design. For example, a concise way of describing a double factor design is as a Gender (2) x Age (3) factorial design with the number of levels in parentheses.

MSE and MSB

One estimate is called the mean square error (MSE) and is based on differences among scores within the groups. MSE estimates regardless of whether the null hypothesis is true (the population means are equal). The second estimate is called the mean square between (MSB) and is based on differences among the sample means. MSB only estimates if the population means are equal.

A standard method of determining whether the difference is the F ratio between MSB to MSE. By comparing the obtained F ratio in the F distribution, the probability density at that F ratio is the statistical value that determines whether the null hypothesis can be rejected (e.g. ).

F distribution

The shape of the F distribution depends on two degrees of freedom that relates to the number of groups / conditions, , and the sample size, ( where = number of data per group). df of numerator (MSB) = and df of denominator (MSE) = . We often represent the distribution as a function, . The numerator is based on differences between groups, i.e. variance of the group means. The denominator is based on differences within groups. The F distribution has a long tail to the right which means it has a positive skew. E.g. 7 groups with 15 participants in each group: = 6, = 7(15-1) = 98.

Calculations for one-way ANOVA

Sum of squares

The between group sum of squares is the sum of squared differences between each group mean and the overall mean (Grand Mean, GM) multiply by , number of data in the respective factor.

The within group sum of squares is the sum of squared deviation of each data from its group mean. Data in group is index as , where .

Note that we can compute sum of squares error with the total sum of squares.

MSE and MSB

Now we have calculated the sum of squares and degree of freedom, the remaining mean square is very easy to compute.

Now you can go back to compute the F-ratio and compare the the value from the F-distribution. If the probability is lower than (usually 0.05), the null hypothesis can be rejected (the mean difference is significant).

Relationship to the t-test?

When we only test the significant between two means, the degree of freedom in the numerator = 2-1 = 1. The relationship to t-test is as follows:

where is the degrees of freedom for the denominator 2 of the F test, and df is the degrees of freedom for the t test. will always equal .

More: multivariate ANOVA

Notice that we have just covered the basic idea of ANOVA and demonstrated how to compute an univariate ANOVA. One can extend an univariate ANOVA to a multivariate ANOVA (MANOVA). In MANOVA, there are two or more dependent variables (the responding variables to the change in independent variables), unlike only one dependent variable in univariate ANOVA. Of course one can use multiple univariate ANOVA for each dependent variables, but MANOVA has more statistical power. But multiple univariate ANOVA is sometimes preferred if the dependent variables are uncorrelated or highly correlated. I’ll reserve this topic sometimes in the future. Technologynetworks provides a nice comparison between the two.


  1. http://onlinestatbook.com/2/analysis_of_variance/anova.pdf 

  2. Note that is sometimes referred to for degrees of freedom error.