Journal of Applied Probability 38 (2001), 324-334
Sankoff and Ferretti (1996) introduced a model of the evolution of chromosome size by reciprocal translocations. They ignored the existence of centromeres, so on each step two chromosomes are chosen at random with probabilities proportional to their lengths, and each is broken at a point uniformly distributed along its length. The left piece of one is then paired with the right piece of the other. Sankoff and Ferretti found the stationary distribution for the model with two chromosomes, but because they defined the state of the model to be the chromosome lengths in decreasing order, they were not able to do this for three or more. In a project during the summer 2000 REU at Cornell, Akendra De, Michael Ferguson, and Suzanne Sindi worked with Durrett on a generalization of this model with explicit centromeres, in which two of the 2k chromosome arms are chosen at random with probabilities proportional to their lengths and the resulting fragments are recombined so that each new chromosome has exactly one centromere. This model is symmetric, p(x,y) = p(y,x), so its stationary distribution is uniform over all vectors x = (x1, x2, ..., x2k) with sum equal to T, the total number of nucleotides in the genome.
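The centromere-free translocation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: lengths are treated as real numbers rather than nucleotide counts, the two chromosomes are drawn without replacement, and the names `sf_step` and `simulate` are hypothetical. The arm-based generalization is not sketched because the text does not spell out how its fragments are recombined.

```python
import random

def sf_step(lengths, rng):
    """One reciprocal translocation, ignoring centromeres (illustrative sketch)."""
    n = len(lengths)
    # First chromosome: chosen with probability proportional to its length.
    i = rng.choices(range(n), weights=lengths, k=1)[0]
    # Second chromosome: assumed distinct from the first, again length-weighted.
    others = [m for m in range(n) if m != i]
    j = rng.choices(others, weights=[lengths[m] for m in others], k=1)[0]
    # Break each chromosome at a uniformly distributed point.
    a = rng.uniform(0, lengths[i])
    b = rng.uniform(0, lengths[j])
    new = list(lengths)
    # Left piece of i joins right piece of j, and vice versa.
    new[i] = a + (lengths[j] - b)
    new[j] = b + (lengths[i] - a)
    return new

def simulate(n_chrom=22, total=1.0, steps=10_000, seed=0):
    """Run the chain from equal lengths and return lengths in decreasing order."""
    rng = random.Random(seed)
    lengths = [total / n_chrom] * n_chrom
    for _ in range(steps):
        lengths = sf_step(lengths, rng)
    return sorted(lengths, reverse=True)
```

Each step conserves the total length and the number of chromosomes, so averaging the sorted output of `simulate` over many seeds gives a rough picture of the expected length profile under these assumptions.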
The next figure compares the average of 100 realizations of the Sankoff-Ferretti model with 22 chromosomes (the line with tick marks) and of the new model with 44 chromosome arms (plain line) to the lengths of the 22 autosomes in the human genome (dots). (Sex chromosomes do not undergo reciprocal translocations, so they are left out of the analysis.)
As the figure shows, the new model fits better, but in both cases the short chromosomes are too short and the long chromosomes are too long. Sankoff and Ferretti fixed this problem by imposing an absolute lower bound on chromosome size. We took a softer approach by introducing a relative fitness f(x) for an arrangement with chromosome lengths x = (x1, x2, ..., xk) and defining a Metropolis-type chain in which q(x,y) = p(x,y) if f(y) >= f(x) and q(x,y) = p(x,y)f(y)/f(x) if f(y) < f(x). Since p is symmetric, f(x)q(x,y) = f(y)q(y,x) for all x and y, so the stationary distribution is proportional to f(x).
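The Metropolis construction can be checked numerically on a toy example. The sketch below is not the chromosome model: it uses a hypothetical four-state space with a symmetric nearest-neighbour proposal `p` and made-up fitness values `f`, builds `q` by the rule above, and applies one step of the chain to the distribution proportional to `f`.

```python
def metropolis_kernel(p, f):
    """Build q from a symmetric proposal matrix p and positive fitness values f."""
    n = len(f)
    q = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                # Uphill moves are always accepted; downhill moves are
                # accepted with probability f(y)/f(x).
                q[x][y] = p[x][y] * min(1.0, f[y] / f[x])
        # Rejected proposals leave the chain at x.
        q[x][x] = 1.0 - sum(q[x])
    return q

# Toy example (assumed values): symmetric random walk on a 4-cycle.
p = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
f = [1.0, 2.0, 4.0, 3.0]

q = metropolis_kernel(p, f)
pi = [v / sum(f) for v in f]  # candidate stationary distribution, proportional to f
pi_next = [sum(pi[x] * q[x][y] for x in range(4)) for y in range(4)]
```

Here `pi_next` agrees with `pi` up to rounding, which is what detailed balance, f(x)q(x,y) = f(y)q(y,x), implies for any symmetric proposal.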
The introduction of the fitness function makes the fit quite good for human and rat (and for pig and yeast, not shown).
However, the fit is not very good for sheep and wheat (nor for mouse and rice, not shown).
In the case of sheep, recent chromosome fusions may have resulted in a nonequilibrium configuration. The wheat genome, on the other hand, resulted from the fusion of three closely related seven-chromosome genomes, so it is not surprising that its length distribution is more uniform than our model predicts.