Genetics of Evolution - Hardy-Weinberg Equilibrium

Preliminaries

If you are not already familiar with the structure of these exercises, read the Introduction first.

Note

Reminder: Save your work regularly.

Important

If you are using a Mac, we recommend that you use either Chrome or Firefox to complete these exercises. Some of the default settings in Safari prevent these exercises from running.

Contact information

If you have questions about these exercises, please contact Dr. Kevin Middleton (middletonk@missouri.edu) or drop by Tucker 224.

Learning objectives

The learning objectives for this exercise are:

  • Describe and identify the mechanisms by which variation arises and is fixed (or lost) in a population over time.
  • Model how random mating yields predicted genotype frequencies in Hardy-Weinberg Equilibrium (HWE), and how non-random mating affects allele and genotype frequencies.
  • Statistically test whether HWE is present in a population.

Mechanisms of biological evolution

Evolution is defined by the change in allele frequencies over time. In this context time refers to subsequent generations in a reproducing population. Two main scales of evolution are:

  1. Microevolution: both adaptive and non-adaptive (neutral) changes within populations across generations
  2. Macroevolution: higher level changes involving origination and diversification of species.

Separating microevolution from macroevolution like this might make it appear that they are completely distinct processes. In reality:

  • A continuum exists between the two: microevolutionary change can lead to observable macroevolutionary patterns.
  • The fundamental process of change in allele frequencies over time operates the same in both.

One final point to remember is that populations don’t evolve in isolation. Species that live in communities with one another interact with both native and introduced species at many different trophic levels (Figure 1).

Image of a fly, two birds, and a louse.
Figure 1: In the Galápagos islands, the medium ground finch (Geospiza fortis, upper right) and Galápagos mockingbird (Mimus parvulus, lower left) are both under threat from an introduced nest fly (Philornis downsi, upper left). Simultaneously, these species have unique parasites, a feather mite and feather louse (lower right) that both respond to and impact the evolutionary history of these species. Image from University of Utah

Hardy-Weinberg Equilibrium

If evolution is the change in allele frequencies over time, what defines the lack of change in allele frequencies? In population genetics, no change in allele frequencies is called Hardy-Weinberg Equilibrium (often abbreviated HWE for simplicity). The equations for HWE were developed during the first decade of the 1900s, shortly after the re-discovery of Mendelian genetics.

HWE allows us to predict genotype (and thus phenotype) frequencies under a specific set of conditions in which there are no additional forces, either internal or external, acting on a population:

  1. Infinite population size
  2. All mating is random
  3. No migration
  4. No selection
  5. No mutation

There are many processes that can lead to deviations from Hardy-Weinberg Equilibrium.

For each of the conditions above, (1) give an example of a process that would lead to a deviation from HWE, and (2) predict whether that process would lead to increased or decreased genetic variation within a single population in the subsequent generation.

Evolutionary biologists are often interested in determining if a population is in Hardy-Weinberg Equilibrium. If a population is found to be violating HWE, then it suggests that one of the processes listed above is happening in that population.

Evolution of single alleles

The first phenotypes that you learned about, as well as those described in the first set of exercises (Transmission of Genetic Information), were Mendelian traits. In Mendelian traits, a single gene is responsible for a single trait. In this context, you also learned about dominant and recessive alleles (and their variations), which lead to different observable phenotypes.

The simplest case to use for exploring Hardy-Weinberg Equilibrium is a single allele in a diploid organism. In this case, there are only three possible genotypes for two alleles (p and q)1:

  • pp
  • pq
  • qq

The HWE equation results from the basic rules of probability that you learned about in the first set of exercises. To explore HWE in a population, we will use the example of the Peppered moth (Biston betularia; Figure 2)2.

(a) Dark morph
(b) Light morph
Figure 2: The Peppered moth (Biston betularia) exhibits two different color morphs, dark and light. These phenotypes are controlled by a single dominant allele p. (a) Individuals with either pp or pq genotypes have the dark morph. (b) Light morphs have the qq genotype.

Imagine a population of Peppered moths with the following allele frequencies:

  • p = 0.1
  • q = 0.9

There are a few things to note here:

  • The frequencies summarize information about an entire population, not about any particular individuals.
  • The frequencies sum to 1: either p or q (just like a flipped coin can be either heads or tails).
  • The frequencies don’t tell us about whether one allele is dominant.

Probabilities of allele combinations

We can use the rules of probability that you have learned about to determine the probability of an individual having each of the possible genotypes: pp, qq, or pq. Because alleles assort independently, the probabilities are just the products of the probabilities.

\[pp = 0.1 \times 0.1 = 0.01\]

\[qq = 0.9 \times 0.9 = 0.81\]

\[pq = (0.1 \times 0.9) + (0.1 \times 0.9) = 0.18\]

Because pq is not distinguishable from qp, we add the probabilities of each combination (0.1 x 0.9 = 0.09 and 0.9 x 0.1 = 0.09).

Either way we add up these probabilities, the sum is 1:

Thus, basic probability leads to the expectation for a population in HWE:

\[p^2 + 2pq + q^2 = 1\]

A population in Hardy-Weinberg Equilibrium will satisfy this equation.

Counts of genotypes

We start with a population of 1,000 Peppered moth individuals that is in HWE. As above, the probability of p is 0.1 and of q is 1 - 0.1 = 0.9.

We expect the following genotypes in the population:

Notice that we calculate the probability of q as 1 - p, so we only have to change the value of p. With these starting parameters, there are 10 pp, 180 pq, and 810 qq individuals.

If each of the individuals makes 10 gametes, then pp individuals will contribute 2 p alleles, pq will contribute 1 p allele and 1 q allele, and qq individuals will contribute 2 q alleles to the gene pool.

We will have the following numbers of alleles represented and use those to calculate the resulting frequencies of p and q in the next generation.

In the first code block above, iteratively change the values of pop_size and p. Start by leaving p at 0.1 and change pop_size to larger or smaller values. Run the first code block and then the second. See how the frequencies of p and q change. Then set pop_size back to 1,000 and change the value for p to some number between 0 and 1. Again run the first code block and then the second to re-run all the calculations.

Testing for HWE in a population

A population in Hardy-Weinberg Equilibrium will satisfy the equation:

\[p^2 + 2pq + q^2 = 1\]

How can we use this information to statistically test whether a population satisfies the assumptions of HWE?

In the first sets of exercises (Transmission of Genetic Information and Case study: Investigating a newly discovered muscle mutation in mice), you learned about using the chi-squared test to determine if predicted counts of births per day and counts of mice from a test-cross showing the small muscle phenotype matched the theoretical predictions.

Because Hardy-Weinberg Equilibrium allows us to make predictions about the frequencies of alleles in a population, we can compare observed counts of alleles to the predicted counts of alleles.

If the population is in HWE, then the test will not be significant. Remember that the chi-squared test is really testing for deviations from the predictions. So the test will be significant (P < 0.05) if the counts do not match what we expect.

Imagine a population with 500 diploid individuals. These 500 individuals will have 1,000 total alleles. You observe the following genotypes among the 500 individuals:

  • pp: 5
  • pq: 95
  • qq: 400

We first calculate the counts for the p and q alleles from the counts of individuals. Each pp individual contributes 2 p and each pq individual contributes 1.

From the counts we can calculate the frequencies:

p = 0.105 and q = 0.895 are the observed frequencies in the population.

We use these two values to calculate the expected frequencies for pp, pq, and qq, assuming the population is in HWE.

The expected frequencies for pp, pq, and qq are converted into expected counts for 500 individuals:

You will notice that we have fractional individuals; we expect 5.5125 pp individuals. While this would not happen in real life, for our calculations, it is not a problem.

Recall that the equation for a chi-squared test is:

\[\chi^2 = \sum_{i=1}^n \frac{(Observed_i - Expected_i)^2}{Expected_i}\]

We have all the information we need to carry out the chi-squared test. All we have to do is organize the data into a format that we can use for the test.

We will make a data structure that holds the observed and expected counts:

And now performing the chi-squared test is just a matter of calling the function chisq.test():

The Observed column is the observed counts of each genotype. The Expected column is the expected counts of each genotype if the population is in HWE. rescale.p = TRUE tells the function that the supplied Expected values are in counts and not in probabilities.

In this case, the observed counts do not deviate from the expected counts (P = 0.97). This result makes sense because the counts only differ by 1 in 500.

Case Study: Testing for HWE in Small Muscle Phenotype Mice

Let’s apply what we have learned to the data for mice that have the small muscle phenotype from earlier exercise: Case study: Investigating a newly discovered muscle mutation in mice.

Comparing three of the replicate lines, it appeared that the frequency of the small muscle phenotype was increasing substantially in two of the lines (Lines 3 and 6; Figure 3). Although the allele frequencies all started at the same low level, the red lines appear to be increasing rapidly starting about generation 7.

Figure 3: Frequency of mice exhibiting the small muscle phenotype in three lines during the first 22 generations of selection. Data from Garland et al. (2002)

We can a use chi-squared test to determine whether these breeding lines deviate from HWE. If so, it would suggest that the allele for the small-muscle phenotype was being unintentionally favored in the Selected lines (red lines in Figure 3).

To fill in the equation for the chi-squared test, we need to know the observed and expected counts for genotypes. We can use the frequencies in the figure to determine the counts for each. For this test we will only consider only two generations: 14 and 22.

If these lines are in HWE, then the frequencies of the mini-muscle allele will not change significantly between generations 14 and 22.

In this experiment, each generation has 200 individuals (the population size). Because the “mini-muscle” allele is an autosomal recessive, we know that all individuals that show the phenotype must be homozygous for the allele (pp).

There appears to be a big change in the allele frequency in Line 6, so let’s begin with that line. Line 6 is one that is selected for increased wheel activity. Although the mini-muscle allele was not directly selected for, it appears that the frequency increases across generations.

Try to develop a hypothesis for why a line which is being selected for increased wheel activity would also show an increase in an allele that alters muscle physiology?

Here are the counts of animals in Line 6 with the small-muscle phenotype at generations 14 and 22.

  • Generation 14: 10
  • Generation 22: 136

At generation 14, there are 10 pp mice out of 200. At generation 22, there are 136 out of 200.

We can use this information to calculate the frequency of pp at generation 14. With 10 affected (pp) animals out of a population size of 200, there are 20 p alleles out of 400 total. We count up all the p alleles (\(p^2\)) and then take the square-root to get \(p\). We then use \(p\) to calculate \(q\).

The allele frequencies are p = 0.224 (“mini-muscle” allele) and q = 0.776 (“wild type”).

Using these frequencies we can determine the expected counts of animals (out of 200) with each of the genotype combinations.

We can then repeat these calculations for Generation 22, using the observed count of 136 mice at that later generation. We will do this all in one code block.

The allele frequencies are now p = 0.82 and q = 0.18. The expected counts are: 136, 57.8, and 6.2

We now have the expected counts from generation 14 that we would also predict to be found at generation 22 if the population followed HWE.

Combining all the counts into a table:

Note that the generation 14 counts are the “Expected” counts at generation 22. The “Observed” counts are what we find in reality.

At this point, we have what we needed to carry out the chi-squared test to determine if the observed counts at Generation 22 match the predicted counts. What is your prediction for this test?

What are the results of the chi-squared test for Line 6 comparing generation 14 with 22?

Is your hypothesis supported or not? Briefly explain.

Testing a Control line

Line 5 – the blue line in Figure 3 – is one of the randomly-bred control lines. In the control lines, all of the mice are tested for their wheel-activity, but the breeders are chosen randomly. This procedure is in contrast to the Selected lines in which the breeders are the top males and females.

The counts for small-muscle phenotype mice at generations 14 and 22 for Line 5 are:

  • Generation 14: 10
  • Generation 22: 19

As in Line 6, there are 10 mice at generation 14. But there are only 19 mice at generation 22.

Run the code blocks below to perform the same calculations

Calculate p and q for Line 5.

Calculate the observed counts.

Tabulate the observed and expected counts.

At this point, we have the data we need to carry out the chi-squared test to determine if the observed counts at Generation 22 match the predicted counts for Line 5. What is your prediction for this test?

Run the code block below to test your prediction.

What are the results of the chi-squared test for Line 5 comparing generation 14 with 22?

What mechanism(s) can explain the changing allele frequency in Line 5, which is not under selection? Can you hypothesize why Line 5 does not appear to follow HWE?

Feedback

We would appreciate your anonymous feedback on this exercise. If you choose to, please fill out this optional 4-question survey to help us improve.

References

Garland, T., Jr, M. T. Morgan, J. G. Swallow, J. S. Rhodes, I. Girard, J. G. Belter, and P. A. Carter. 2002. Evolution of a Small-Muscle Polymorphism in Lines of House Mice Selected for High Activity Levels. Evolution 56:1267–1275. Wiley.

Footnotes

  1. p and q are most commonly used as the allele names, but you could substitute any pair: A and B, A1 and A2, etc.↩︎

  2. Thanks to Dr. Elizabeth King for this example.↩︎