9.4 STARTING FROM SCRATCH WITH A NEW MAPPING PROJECT

9.4.1 Overview

There are two types of experimental situations in which established mapping panels may not be sufficient to the needs of an independent investigator. In the first instance, an investigator may want to pursue the mapping of a large group of cloned loci to obtain, for example, a very high resolution map for an isolated genomic region. For extended mapping projects of this and other types, it becomes both cost-effective and time-effective to perform an "in-house" cross for the production of a panel of samples over which the investigator has complete control.

With a second class of experimental problems, an investigator will have no choice but to perform an "in-house" cross for analysis. This will be the case in all situations where the test locus is defined only in the context of a mutant phenotype. Often, the goal of such projects will be to clone the locus of interest through knowledge of its map position. To map a mutationally defined locus, one will have to generate a special panel of samples in which segregation of the mutant and wild-type alleles can be followed phenotypically in animals prior to DNA preparation for marker locus typing. What follows in this Section is a summary of the choices that confront an investigator in the development of a mapping project from scratch, and the process by which an investigator should proceed through the project from start to finish.

At the outset, the investigator must make decisions concerning the form of the breeding cross itself. In particular, which parental strains will be used and what type of breeding scheme will be followed? To map a mutationally defined locus, one will obviously have to include one strain that carries the mutation. The second parental strain should be chosen based on the contrasting considerations of genetic distance (the more distant the strain, the greater the chance of uncovering polymorphisms at DNA marker loci) and ability to generate offspring in which segregation of the mutant allele can be observed. The choice of breeding scheme is limited typically to one of two different two-generation crosses: the outcross-backcross (F₁ X P, where P represents one of the original parental strains) or the outcross-intercross (F₁ X F₁) illustrated in Figures 9.11 and 9.12, respectively. If the purpose of the analysis is to map loci associated with a mutant phenotype, the nature of the phenotype may limit this choice further as discussed more fully in Sections 9.4.2 and 9.4.3.

Once the strains and a breeding scheme have been chosen, one can begin to carry out the first generation cross. The number of mating pairs that should be set up need not be as large as one might think because of the expansion that will occur at the second generation. Backcrosses are usually performed with females that are F₁ hybrids and intercrosses, by definition, are always based on F₁ hybrid females. As such, the second generation cross is likely to be highly productive with larger and more frequent litters than one obtains with inbred females (see Section 4.1). Consider the goal of obtaining 1,000 offspring from either an outcross-intercross or an outcross-backcross. ⁹¹ If one assumes that 90% of the second generation mating pairs will be productive with an average of four litters with eight pups in each, one would need to set up only 35 such matings. Working backwards, to generate the 35 F₁ females and/or males required would entail only ten initial matings between the two parental strains with the assumption that 50% would be productive and these would each have three litters of five pups.
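The arithmetic in the preceding paragraph can be checked with a short calculation. The following is only a sketch: the function name and the `usable_fraction` parameter are illustrative, and the assumption that half of the F₁ pups are of the sex required is mine rather than stated in the text.

```python
import math

def matings_needed(target, frac_productive, litters, pups_per_litter,
                   usable_fraction=1.0):
    """Mating pairs to set up so that the expected yield reaches `target`."""
    expected_per_pair = (frac_productive * litters * pups_per_litter
                         * usable_fraction)
    return math.ceil(target / expected_per_pair)

# Second generation: 1,000 offspring, 90% of pairs productive,
# four litters of eight pups each.
second_gen = matings_needed(1000, 0.90, 4, 8)

# First generation: one F1 animal of the required sex per second-generation
# mating, 50% of pairs productive, three litters of five pups each,
# half of the pups assumed to be of the required sex (assumption).
first_gen = matings_needed(second_gen, 0.50, 3, 5, usable_fraction=0.5)

print(second_gen, first_gen)  # 35 10
```

Changing the productivity or litter-size figures shows how sensitive the required number of cages is to the breeding performance of the strains chosen.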

An alternative backcross strategy that may sometimes be even more efficient is to set up F₁ males with inbred females of one of the two parental strains in the second generation. This approach is only effective when the backcross parent to be used is a common inbred strain such as B6. In this situation, there is no limit to the number of females that can be purchased at a modest cost from various suppliers, and individual F₁ males can be rotated among multiple cages of these females. Thus, what is sacrificed in terms of hybrid vigor is made up for in terms of absolute number of crosses. As few as ten males could be rotated every 5 days among cages with two females each for a total of 120 matings in a month. One should be aware, however, that an analysis of this type will be based entirely on recombination in the male germline which may or may not be beneficial to the investigator according to different experimental requirements as discussed at the end of Section 9.4.4.2.
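The figure of 120 matings follows directly from the rotation schedule. A minimal sketch, assuming a 30-day month (the text says only "a month") and an illustrative function name:

```python
def rotation_matings(n_males, females_per_cage, rotation_days, period_days=30):
    """Matings set up when each male is moved to a fresh cage every
    `rotation_days` days over a period of `period_days` days."""
    rotations = period_days // rotation_days  # cages visited per male
    return n_males * females_per_cage * rotations

print(rotation_matings(10, 2, 5))  # 120
```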

When offspring from the second-generation cross are born, one will need to analyze each for expression of the mutant phenotype. In some cases, it will be possible to use both the expression and non-expression of phenotypes as direct indicators of genotype. In other cases, it will only be possible to use phenotypic expression as an indicator of genotype in a subset of animals. This will be true for all phenotypes that are only partially penetrant as well as those that are only expressed in homozygous offspring from a second generation intercross. In both cases, the lack of phenotypic expression in any particular animal will preclude an unambiguous determination of its genotype. When it is only possible to incorporate a subset of offspring into the ultimate genetic analysis, it will obviously be necessary to generate more offspring at the outset to achieve the same level of genetic resolution. Once "phenotyping" is accomplished, animals can be converted into DNA for incorporation into the panel that will be used for analysis of marker segregation. Optimal strategies for determining map position are discussed in Sections 9.4.4 and 9.4.5.

9.4.2 Choosing strains

9.4.2.1 For developing DNA marker maps

Upon commencing a new linkage study, an investigator will first have to decide upon the two parental mouse strains that will be used in the initial cross to generate F₁ animals. This choice will be informed by the goal of the linkage study. If the goal is simply to develop a new panel for mapping loci defined as DNA markers, there will be no a priori limitation on the strains that can be chosen. The most important considerations will be the degree of polymorphism that exists between the two parental strains and the ease with which they, and their offspring, can be bred to produce a large panel of second generation animals for DNA typing.

As discussed earlier in this chapter and previous ones, the traditional inbred M. musculus strains show minimal levels of interstrain polymorphism. It was for this reason that the initial two-generation mapping panels were all based on interspecific crosses between a M. musculus strain and a M. spretus strain as described in Section 9.3. M. spretus is the most distant species from M. musculus that still allows the production of fertile F₁ hybrids (see Section 2.3.5). As such, the M. musculus X M. spretus cross will provide the highest level of polymorphism that is theoretically obtainable for the purpose of mapping.

It is certainly possible to replicate this interspecific cross with any one of a number of inbred M. musculus strains (such as B6, C3H, or DBA) and an inbred M. spretus strain (such as SPRET/Ei) that can be purchased from the Jackson Laboratory or another supplier. However, the interspecific cross is less than ideal for several reasons. First, breeding between M. musculus strains and M. spretus is generally poor with infrequent, small litters. As a consequence, one must begin with a larger number of initial mating pairs, and wait for a considerable length of time before obtaining a complete panel of second generation animals. Second, only F₁ females are fertile. This rules out any possibility of setting up a second generation intercross. Finally, as discussed in Section 9.3.2.1, M. musculus and M. spretus differ by at least one, and perhaps many more, small inversions that will act to eliminate recombination and, as a consequence, distort the true genetic map.

These limitations have led investigators to test the practicality of using an intersubspecific cross as an alternative that would still show sufficient levels of polymorphism but without any of the problems inherent in the interspecific cross. In particular, several laboratories have published mapping results based on crosses between the inbred strain B6, which is derived predominantly from the subspecies M. m. domesticus and the inbred strain CAST/Ei (distributed by the Jackson Laboratory) which is derived entirely from the M. m. castaneus subspecies (Dietrich et al., 1992; Himmelbauer and Silver, 1993).

The two subspecies M. m. domesticus and M. m. castaneus evolved apart from a common ancestor approximately one million years before present (see Figure 2.2 and Section 2.3.2). As a consequence, the level of polymorphism between the two is much greater than that observed among strains that are predominantly derived from just M. m. domesticus, but not quite as high as that observed between M. m. domesticus and M. spretus which evolved apart three million years ago. A sense of the relative levels of polymorphism that exist in various pairwise comparisons can be achieved by looking at the frequencies with which TaqI RFLPs are detected at random loci. In a comparison of two predominantly M. m. domesticus strains — B6 and DBA — 19% of the tested loci showed TaqI RFLPs; between B6 and the M. m. castaneus strain CAST/Ei, 39% of the tested loci showed RFLPs. Finally, between B6 and M. spretus, 63% of the tested loci showed RFLPs (LeRoy et al., 1992; Himmelbauer and Silver, 1993). Relative rates of polymorphism have also been surveyed at random microsatellite loci with the following results: among M. m. domesticus strains, the average rate of polymorphism is ~50%; between B6 and CAST/Ei, the polymorphism rate is 77%, and between B6 and M. spretus, the polymorphism rate is 88% (Love et al., 1990; Dietrich et al., 1992).

The bottom line from these various comparisons is that the level of polymorphism inherent in the B6 X CAST/Ei cross seems more than sufficient for generating high resolution linkage maps, especially with the use of highly polymorphic markers like microsatellites. Furthermore, the somewhat lower rate of polymorphism is more than compensated for by various advantages that this cross has over the interspecific cross with M. spretus. First, the two strains, B6 and CAST/Ei, breed easily in the laboratory with the production of large numbers of offspring. Second, both male and female F₁ hybrids are fully fertile. Third, the single well-characterized interspecific inversion polymorphism does not exist between B6 and CAST/Ei (Himmelbauer and Silver, 1993), and it is likely that most other postulated interspecific inversion polymorphisms are also absent (Copeland et al., 1993). Consequently, the linkage map that one obtains with this intersubspecific cross is much more likely to represent the map that would have been derived from a cross within the M. m. domesticus subspecies itself.

As indicated in Figure 2.2, and as discussed in Section 2.3.2, there are a number of other M. musculus subspecies that are just as divergent from M. m. domesticus as is M. m. castaneus. Inbred strains have been developed from the M. m. musculus subspecies, and at least two (Skive and CzechII) are available from the Jackson Laboratory. In addition, another set of inbred strains (MOLF/Ei) have been derived from the faux subspecies M. m. molossinus which is actually a natural mixture of M. m. musculus and M. m. castaneus (see Figure 2.2 and Section 2.3.3). It is likely that each of these inbred strains could be used in place of CAST/Ei with a similar level of polymorphism relative to M. m. domesticus, and with the same advantages described above. In fact, the availability of several unrelated wild-derived strains provides a means for overcoming the limitation to genetic resolution caused by recombination hotspots as described in Section 7.2.3.3 (and illustrated in Figure 7.5). This is because F₁ hybrids between B6 and CAST/Ei, MOLF/Ei, or Skive are all likely to have different hotspots for recombination. Thus, by combining data from all three crosses, one will be able to "see" recombination sites that are spread out among perhaps three times as many possible locations.

9.4.2.2 For mapping a simple mutation

Another factor in strain choice comes into play when the goal of a breeding study is to map a locus defined solely by a mutant phenotype. In this case, it is obvious that one of the parental strains must carry the mutant allele to be mapped. Ideally, the mutation will be carried in an inbred, congenic, or coisogenic strain. In the second best situation, the mutation will be present in a genetic background that is a mixture of just two well-defined inbred strains. Finally, the most potentially difficult situation occurs when the mutation is present in a non-inbred, undefined genetic background.

In this last situation, it is advisable to use a single male as the sole representative of the mutant strain in matings to produce all F₁ hybrids. The advantage to this approach is that the number of alleles contributed by the mutant "strain" at any one locus in all of the F₁ animals will be limited to just two. If, on the other hand, one had begun with multiple males as representatives of the non-inbred mutant strain, the number of potential alleles at every locus in the panel would be twice the number of males used. The larger the number of alleles, the more complicated the analysis could become. By rotating a single male among a large set of cages containing females from the second strain, it will be possible to produce a sufficient number of F₁ hybrids in a reasonable period of time.

In essentially all cases, the mice that carry the mutation will be derived from the traditional inbred strains which are themselves mostly derived from the M. m. domesticus subspecies. For all of the same reasons discussed in the previous subsection, the best choice of a second parental strain would be one that is inbred from a different M. musculus subspecies such as CAST/Ei, MOLF/Ei, or "Mus musculus" Skive or CzechII, which are all available from the Jackson Laboratory.

9.4.3 Choosing a breeding scheme

The second choice that an investigator will make upon beginning a new linkage study is between the two prescribed breeding schemes. With both schemes, illustrated in Figures 9.11 and 9.12, the first mating will always be an outcross between the two parental strains chosen according to the strategies outlined above. However, once F₁ hybrid animals have been obtained, an investigator must decide whether to backcross them to one of the parental strains or intercross them with each other. There are advantages and disadvantages to each breeding scheme.

9.4.3.1 The backcross

The primary advantages of the backcross approach are all based on the fact that each offspring from the backcross can be viewed as representing an isolated meiotic event. The entire set of alleles contributed by the inbred parent (strain B in Figure 9.11) is pre-determined. Thus, the only question to be resolved at each typed locus is whether the F₁ parent has contributed the same parental allele (from strain B) or the allele from the other parent (strain A): in the first instance, typing would demonstrate the presence of only the strain B allele, and in the second instance, typing would demonstrate the presence of both the strain A and strain B alleles.

By looking at Figure 9.15, one can visualize the actual meiotic products contributed by the F₁ parent in the form of individual haplotypes. Every recombination event can be detected and the frequency of recombination between any two loci can be easily determined. The existence of strong interference over distances of 20 cM or more can be used to advantage in the determination of gene order, since any order which requires nearby double crossover events in any haplotype is likely to be incorrect. ⁹²
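The use of interference to choose a gene order can be illustrated with a small sketch. It scores each candidate order by the total number of crossovers that the order would require across all backcross haplotypes and keeps the order requiring the fewest; an order that forces double crossovers between nearby loci scores poorly. The locus names and haplotype data are invented for illustration, and exhaustive enumeration is practical only for a handful of loci.

```python
from itertools import permutations

def crossovers(haplotype, order):
    """Number of allele switches along one haplotype for a given locus order."""
    alleles = [haplotype[locus] for locus in order]
    return sum(a != b for a, b in zip(alleles, alleles[1:]))

def score_orders(haplotypes, loci):
    """Total crossovers required by each distinct order (mirror images merged)."""
    scores = {}
    for order in permutations(loci):
        if order[0] > order[-1]:  # skip the mirror image of an order already seen
            continue
        scores[order] = sum(crossovers(h, order) for h in haplotypes)
    return scores

# Hypothetical data: allele contributed by the F1 parent at each locus
# ("A" or "B") in five backcross animals.
haplotypes = [
    {"D1": "A", "New": "A", "D2": "A"},
    {"D1": "A", "New": "A", "D2": "B"},
    {"D1": "B", "New": "A", "D2": "A"},
    {"D1": "B", "New": "B", "D2": "B"},
    {"D1": "A", "New": "B", "D2": "B"},
]

scores = score_orders(haplotypes, ["D1", "D2", "New"])
best = min(scores, key=scores.get)
print(best, scores[best])  # ('D1', 'New', 'D2') 3
```

In this toy data set, the two alternative orders each require at least one haplotype to carry a double crossover, which is why they accumulate higher totals.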

The analysis of backcross data is very straightforward, and when all loci are known to map on the same chromosome, it is possible to derive linkage relationships even in the absence of specialized computer programs. However, with the use of the Macintosh computer-based Map Manager program (described in Appendix B), data presentation and analysis become even more transparent. The major disadvantage with the backcross is that it is not universally applicable to all genetic problems. In particular, it cannot be used to map loci defined only by recessive phenotypes that interfere with viability or absolute fecundity in both males and females.

9.4.3.2 The intercross

The intercross approach has two main advantages over the backcross. The first is that it can be used to map loci defined by recessive deleterious mutations since both heterozygous F₁ parents will be normal, and homozygous F₂ offspring can be recovered at any stage (postnatal or prenatal if necessary) for use in typing further markers. The second advantage is a consequence of the fact that informative meiotic events will occur in both parents. This will lead to essentially twice as much recombination information on a per animal basis as compared to the backcross approach.

The main disadvantage with the intercross is also a consequence of informative meiotic events in both parents. The problem is that the data obtained are more complex, as illustrated in Figure 9.4 and discussed in Section 9.1.3.4, and more difficult to analyze because of the impossibility of determining which allele at each heterozygous F₂ locus came from which parent. Thus, while each animal will, by definition, carry two separate haplotypes for each linkage group, the assignment of alleles to each haplotype can only be accomplished retrospectively or, in some circumstances, not at all. In addition, interference is no longer as powerful a tool for ordering loci, since nearby crossover sites can be brought together into individual F₂ animals from the two parents. To generate de novo linkage maps from large scale intercross experiments, it is essential to use computer programs such as Mapmaker that carry out multilocus maximum likelihood analysis (Lander et al., 1987 and Appendix B). However, when previously mapped codominant anchor loci are typed within an intercross, the more user-friendly Map Manager program (version 2.5 and higher) can be used for data input and analysis.

9.4.3.3 Making a choice

In large-scale mapping experiments with many loci spread over one or more chromosomes, the backcross is usually the breeding scheme of choice. What is sacrificed in terms of mapping resolution is made up for in terms of ease of data handling and presentation. However, when an investigator is focusing on a small genomic region (on the order of a few centimorgans or less) for very high resolution mapping as a precursor to positional cloning, the intercross may be a better choice. At this level of analysis, the data will be much less complex with only a small fraction of animals expected to show mostly single recombination events in the interval of interest; the advantage gained by doubling the frequency of such events may be critical to efforts aimed at zeroing in on the locus of interest.

Of course, as discussed above, in the case of recessive deleterious mutations, one may not have a choice but to use the intercross. Unfortunately, in situations where the mutation is strictly recessive, one will only be able to map the mutant locus with those 25% of F₂ animals that express the mutant phenotype because the genotype of non-expressing animals cannot be determined (see Figure 9.12). Since two meiotic events are scored in each F₂ animal, the total amount of genetic information obtained will be approximately double that obtained from an equivalent number of backcross animals that can be typed. Nevertheless, this still comes out to only 50% of the information obtained from typing a complete backcross panel of the same size as the complete intercross panel. Consequently, if the trait under analysis is strictly recessive but does not seriously hinder viability or fecundity in homozygotes of at least one sex, it is more advantageous to use the backcross. In these situations, a backcross can be set up with a homozygous mutant parent, as illustrated in Figure 9.11, and 100% of the offspring can be scored phenotypically for the contribution of either the mutant or wild-type allele from the F₁ parent.
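The comparison of information content between the two schemes reduces to counting scorable meioses. A brief sketch of that count, with an illustrative function name and a panel size of 1,000 chosen only as an example:

```python
def informative_meioses(n_animals, cross, fraction_scorable=1.0):
    """Meiotic events that can be scored for the mutant locus in a panel."""
    meioses_per_animal = {"backcross": 1, "intercross": 2}[cross]
    return n_animals * fraction_scorable * meioses_per_animal

n = 1000
backcross = informative_meioses(n, "backcross")         # every N2 animal scorable
intercross = informative_meioses(n, "intercross", 0.25)  # only mutant F2 animals

print(intercross / backcross)  # 0.5
```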

9.4.4 The first stage: mapping to a subchromosomal interval

9.4.4.1 A stratified approach to high resolution mapping

An optimal strategy for high-resolution linkage mapping is one that proceeds in stages with nested sets of both marker loci and animals. One can see the logic of this sequential approach by considering the numbers of markers and animals required to obtain a high-resolution map in a single pass. For example, suppose one wanted to obtain a linkage map with both an average crossover resolution of 0.1 cM and an average marker density of one per centimorgan. In a one-pass approach, one would have to analyze 1,000 backcross animals for segregation at 1,500 marker loci (spanning 1,500 cM), which would require one and one-half million independent typings.
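The size of the one-pass experiment follows from two simple relationships: a single crossover among N backcross animals corresponds to 100/N cM, and the number of markers is the genome length multiplied by the marker density. The sketch below reproduces the figure of one and one-half million typings and, for contrast, the size of a first-stage framework of 52 animals typed for 60 anchors (numbers quoted in Section 9.4.4.2). The function name is illustrative.

```python
def one_pass_typings(resolution_cM, genome_cM, markers_per_cM):
    """Animals, markers, and total typings for a single-pass backcross map."""
    animals = round(100 / resolution_cM)   # one crossover in N meioses = 100/N cM
    markers = round(genome_cM * markers_per_cM)
    return animals, markers, animals * markers

animals, markers, typings = one_pass_typings(0.1, 1500, 1.0)
print(animals, markers, typings)  # 1000 1500 1500000

framework_typings = 52 * 60        # first-stage framework map
print(framework_typings)           # 3120
```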

A much more efficient approach is to divide the protocol into two separate stages. The goal of the first stage should be to link the locus to a defined subchromosomal interval. This can be accomplished by typing a relatively small set of markers on a relatively small random subset of phenotypically typed animals from within the larger panel. Once this first stage is completed, it becomes possible to proceed to the second stage which should focus on the construction of a high resolution map just in the vicinity of the locus of interest with a selected set of markers and a selected set of animal samples. The ultimate goal of this entire protocol is the identification of a handful of markers and recombinant animals that bracket a very small interval containing an interesting gene that can then be subjected to positional cloning as described in Section 10.3.

9.4.4.2 How many animals and how many markers?: evaluation of the swept radius

The first step in the first stage of the protocol is to develop a framework map that is "anchored" by previously well-mapped loci spaced uniformly throughout the entire genome. To accomplish this task most efficiently, it is critical to calculate the minimum number of anchor loci required to develop this low resolution, but comprehensive, map. This calculation is based on the length of the swept radius that extends on either side of each marker. As discussed earlier in this chapter (Section 9.2.2.3), the swept radius is a measure of the distance over which linkage can be detected between any marker and a test locus when both are typed in a set number of offspring generated with a defined breeding protocol. Although the swept radius was defined originally in terms of map distances (Carter and Falconer, 1951), it is much easier to work directly with recombination fractions, and in the following discussion, charts, and figures, this alternative metric will be used. ⁹³

Two measures of the backcross swept radius, determined for sample sizes that range from 20 to 100 animals, are presented in Figure 9.13. The first measure is based on the traditional view of a swept radius as a boundary that separates significant from non-significant rates of observed recombination. This "experimental swept radius" is shown as the solid curve in Figure 9.13. The graph can be used to find out quickly whether any experimentally determined recombination fraction, or concordance value, meets the strictly defined Bayesian-corrected cutoff for demonstration of linkage at a probability of 95% or greater (see Section 9.1.3.6).

Although the experimental swept radius provides a means to evaluate the significance of newly derived data, it is not useful as a means to establish the distances that should separate marker loci to be chosen for a framework map in a new cross. The problem is that marker loci that are actually separated by a map distance equivalent to the experimental swept radius will, by chance, recombine to a greater or lesser extent with equal probability in any particular experimental cross. In those 50% of the crosses where a higher recombination fraction is observed, the data will not be sufficient to establish linkage at a 95% level of significance. Thus, a second, more conservative measure of swept radius is needed to determine the maximum actual recombination distance between two loci that will allow the demonstration of linkage at a probability of 95% with a frequency of 95%. ⁹⁴ I will call this parameter the "framework swept radius".

The "framework swept radius" can be evaluated as a recombination fraction associated with a 95% confidence interval having an upper confidence limit equivalent to the value of the experimental swept radius for a sample set of a particular size. In the discussion that follows, I will use this framework swept radius as a means for establishing the distances that should separate markers to be used in setting up a new framework map.
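The logic of the framework swept radius can be sketched with a plain binomial model. This is a simplified stand-in and not the calculation behind Figure 9.13: it assumes that the number of recombinants observed among N backcross animals is binomially distributed, and it searches for the largest true recombination fraction at which the observed fraction stays at or below the experimental swept radius in at least 95% of crosses. The function names, the one-sided criterion, and the search step are my assumptions, so the values returned should not be expected to match the published curves.

```python
from math import comb, floor

def prob_at_most(k, n, r):
    """Binomial probability of at most k recombinants among n meioses."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k + 1))

def framework_radius(n, experimental_radius, power=0.95, step=0.001):
    """Largest true recombination fraction (to the nearest `step`) for which
    the observed fraction stays within `experimental_radius` with
    probability >= `power` under the binomial model."""
    k_max = floor(experimental_radius * n + 1e-9)  # most recombinants allowed
    best = 0.0
    for i in range(1, int(0.5 / step)):
        r = i * step
        if prob_at_most(k_max, n, r) < power:
            break
        best = r
    return best

# N = 52 with an experimental swept radius of 0.27, the value quoted in the
# text for this sample size.
r_framework = framework_radius(52, 0.27)
print(r_framework)
```

Whatever the exact statistical treatment, the result is necessarily smaller than the experimental swept radius, which is the point of the more conservative measure.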

With a set number of backcross samples, one can use Figure 9.13 to find the corresponding framework swept radius associated with each anchor locus. For example, with 52 samples, the framework swept radius is 15 cM, with 72 samples, it is 20 cM, and with 94 samples, it becomes 24 cM. It is clear that once a critical number of samples has been reached (45-50), further increases in number provide only a marginal increase in the distance that is swept. Figure 9.13 can also provide a first approximation of the framework swept radius associated with a panel of intercross samples. To a first approximation, each intercross sample is equivalent to two backcross samples. Thus, a swept radius of ~15 cM can be obtained with 26 intercross samples, and ~20 cM can be obtained with 36 intercross samples. ⁹⁵

The framework swept radius can be used in conjunction with the lengths of each individual chromosome to determine the number of anchor loci required to provide complete coverage over the entire genome. Essentially, anchors can be chosen such that their "swept diameters" (twice the swept radius) cover directly adjacent regions that span the length of every chromosome as illustrated in Figure 9.14. The first and last anchors on each chromosome must be placed within one swept radius of their respective ends, while the distance between adjacent anchors should be within two swept radii. The estimated lengths of all twenty mouse chromosomes are sorted into a set of ranges in Table 9.4. The number of anchors required per chromosome for a backcross analysis is calculated by dividing the chromosome length by the swept diameter defined with a sample set of a particular number (from the graph in Figure 9.13) and rounding up to the nearest integer. As indicated in Table 9.4, with 52 backcross samples, it is possible to cover the entire mouse genome with 60 well-placed anchors. With 72 samples, the number of required anchors decreases to 46, and with 94 samples, it decreases to 43. It is clear that little is to be gained by including more than 72 samples in this initial analysis.
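The anchor calculation described above is a ceiling division of chromosome length by swept diameter. In the sketch below, the chromosome names and lengths are hypothetical placeholders, not the values in Table 9.4; the three radii are those quoted in the text for 52, 72, and 94 backcross samples.

```python
import math

def anchors_needed(chromosome_cM, swept_radius_cM):
    """Anchors required so that adjacent swept diameters tile the chromosome."""
    return math.ceil(chromosome_cM / (2 * swept_radius_cM))

# Hypothetical chromosome lengths in cM (NOT taken from Table 9.4).
lengths = {"chrA": 110, "chrB": 75, "chrC": 58}

for radius in (15, 20, 24):
    per_chromosome = {c: anchors_needed(cM, radius) for c, cM in lengths.items()}
    print(radius, per_chromosome, sum(per_chromosome.values()))
# 15 {'chrA': 4, 'chrB': 3, 'chrC': 2} 9
# 20 {'chrA': 3, 'chrB': 2, 'chrC': 2} 7
# 24 {'chrA': 3, 'chrB': 2, 'chrC': 2} 7
```

Because of the rounding up, a larger swept radius does not always reduce the anchor count for a given chromosome, which is consistent with the diminishing returns noted for larger sample sets.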

The minimalist approach just outlined to a comprehensive framework map has only become feasible as this chapter is being written. This feasibility is based on the availability of over 3,000 highly polymorphic microsatellite loci that span the genome with an average spacing of less than 1 cM (Copeland et al., 1993). Primer pairs that define each of these loci are commercially available at a modest cost from Research Genetics Inc. in Huntsville, Alabama. By contacting the Genome Center at the Whitehead Institute, as described in Appendix B, one can obtain chromosome-specific lists of microsatellites that are polymorphic between the particular parental strains that an investigator has used to generate his or her linkage panel. With this information, one can choose specific microsatellite loci that map to each of the general locations required to span each chromosome as illustrated in Figure 9.14.

In the backcross linkage studies reported to date, the gender of the F₁ hybrid used in the second-generation cross has usually been female. In the case of the interspecific cross, there is no other choice since the F₁ male is sterile. However, this is not a factor in the intraspecific or intersubspecific cross. Rather, F₁ hybrid females are used for two other reasons. First, they have a much higher fecundity relative to inbred females, and second, they generally display higher frequencies of recombination (Section 7.2.3.2) which, in turn, will produce a higher resolution map in the second stage of linkage analysis described in the next Section. Interestingly, the lower recombination frequency associated with male mice is actually better suited to the first stage of mapping because it can act, in effect, to reduce the centimorgan length of each chromosome by 15%-40%. Thus, with the use of male F₁ hybrids in the backcross, one would, in theory, need fewer anchor loci to span the genome. Furthermore, as discussed in Section 9.4.1, in backcrosses to a common inbred parent such as B6, the use of F₁ males is likely to be much more efficient and provide many more N₂ progeny more quickly than the reciprocal cross. Unfortunately, at the time of this writing, male-specific linkage maps have not been developed for the new libraries of microsatellite loci. Hence, at the current time, the spacing of microsatellites for this purpose would be a matter of guesswork.

9.4.4.3 Determining linkage

The first analysis of backcross data should be directed at simply determining the existence of linkage to the locus of interest. This is accomplished by comparing the pattern of allele segregation from the new locus with the patterns of allele segregation from each anchor locus. Essentially, the frequency of recombination between the new locus and each anchor locus is calculated, one at a time, to identify one or more anchors that show a significant departure from the independent assortment frequency of 50%. This task is performed most easily by entering the accumulated allele segregation data into an electronic file that is analyzed by a special computer program developed for this type of analysis. A number of such computer programs have been described (see Appendix B). The most user-friendly of these is the Apple Macintosh-based Map Manager program developed by K. Manly (1993) and described in Appendix B.

It is also possible to determine linkage, when a backcross set is not too large, without the use of a specialized computer program. This can be accomplished by entering the allele segregation information for each locus along a separate row or line in a spreadsheet or word processing file, where each column represents a separate animal (analogous to the RI strain data matrix illustrated in Figure 9.6). Anchor loci should be placed in sequential rows according to their known order along each chromosome. The very first rows should be reserved for the new locus (or loci). The complete file will be a matrix of information with the number of rows equal to the number of anchor and new loci typed and the number of columns equal to the number of backcross animals analyzed. For the N = 52 backcross typed for one new locus in addition to a minimal number of anchors (from Table 9.4), this would be a 61 X 52 matrix of data.

Next, one would take the row representing a new locus and compare it row by row, either on the computer or on paper, for pattern similarities with each anchor locus allele distribution. Visual inspection alone will be sufficient to distinguish similar runs of alleles in two rows. The total recombination fraction between the new locus and any anchor locus identified in this way can be easily calculated; if the fraction of recombinants is greater than the experimental swept radius found in Figure 9.13 (0.27 for N = 52), linkage can be rejected and one can move on to the next locus. Although this process is somewhat tedious, the time that it takes is minimal compared to the time involved in actually typing DNA markers in the first place. In contrast, with whole genome data obtained from an intercross, manual determination of linkage is extremely difficult. Instead, one should use one of the limited number of programs available for this type of analysis. The best known of these programs is Mapmaker, developed by Eric Lander (1987; see also Appendix B).
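The row-by-row comparison just described is easy to automate for backcross data. The following sketch stores each locus as a string with one character per animal ("A" or "B" for the allele contributed by the F₁ parent), computes the recombination fraction between the new locus and each anchor, and keeps the anchors that fall within the cutoff. The locus names and the 52-animal data are synthetic, and the cutoff of 0.27 is the value quoted above for N = 52; for any other sample size it would have to be read from Figure 9.13.

```python
def recombination_fraction(row_a, row_b):
    """Fraction of animals in which two loci show different F1-derived alleles."""
    recombinants = sum(a != b for a, b in zip(row_a, row_b))
    return recombinants / len(row_a)

def linked_anchors(new_locus, anchors, cutoff):
    """Anchors whose recombination fraction with the new locus is <= cutoff."""
    hits = {}
    for name, row in anchors.items():
        r = recombination_fraction(new_locus, row)
        if r <= cutoff:
            hits[name] = r
    return hits

def flip(row, positions):
    """Helper for building synthetic data: swap alleles at the given animals."""
    swap = {"A": "B", "B": "A"}
    return "".join(swap[c] if i in positions else c for i, c in enumerate(row))

# Synthetic 52-animal backcross (hypothetical data).
new_locus = "AB" * 26
anchors = {
    "anchor1": flip(new_locus, {3, 10, 21, 30, 47}),  # 5 recombinants
    "anchor2": "AABB" * 13,                            # unlinked pattern
}

hits = linked_anchors(new_locus, anchors, cutoff=0.27)
print(hits)  # only anchor1 is retained, at 5/52 recombinants
```

A fuller version would also need to handle animals with missing typings, which this sketch does not attempt.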

Ideally, linkage analysis will identify at least one, and at most two, loci that are linked at a significance level of 95% to the new locus of interest. If there are two linked loci, they should be adjacent to each other within the framework map formed on the same chromosome. With results of this type, one can move on to the next task of determining the order of the new locus relative to the framework map as discussed below.

It is possible that the data will not be sufficient to demonstrate linkage with a significance of 95% to any of the anchor loci that were typed. It is critical at this point to confirm all DNA marker typings as well as phenotypic determinations for each animal. If there is still no evidence of linkage at the 95% significance level, one can attempt to uncover potential linkage relationships by reducing the required level of significance. ⁹⁶ This may allow the suggestion of linkage in the middle of a particular chromosomal interval between two anchors or near a chromosome end. If this approach fails, one should examine the recombination intervals that separate each anchor along each chromosome (with the haplotype method described in the next section) in order to pick out intervals that are larger than anticipated. One can retype the same set of backcross animals for new anchors in regions suggested by any of these approaches. If this approach fails as well, one should consider the possibility that the new locus may map very close to a centromere or telomere; to test this possibility, it would be necessary to type more centromeric and telomeric anchors on each chromosome. Finally, one should consider the possibility that complex genetic interactions such as incomplete penetrance and/or polygenic effects may be acting to distort the one to one relationship between phenotype and genotype at any single locus (see Section 9.5).

9.4.4.4 Pooling DNA samples for the initial identification of linked markers

In essentially all mapping experiments performed today, the vast majority, if not all, of the marker loci used are typed by DNA-based techniques. At the time of this writing, the most versatile, and most commonly used, genetic marker is the microsatellite (Section 8.3.6). But other DNA markers that are useful in particular cases include those that can be assayed by the SSCP protocol (Section 8.3.3) and RFLP analysis (Section 8.2). The genotyping of all of these marker types within offspring from a mapping cross is based on the detection of "codominant" alleles recognized as different size bands after gel electrophoresis.

In the mapping approach just described in the previous section, each backcross animal is converted into a DNA sample that is typed independently for each marker locus that has been chosen to sweep the genome. The total number of PCR reactions (or restriction digests) required can be determined from Table 9.4 by multiplying the number of markers by the number of backcross animals. The smallest number is obtained with 52 animals typed for 60 markers, which comes out to 3,120 reactions (followed by an equivalent number of lanes on gels). Unless one has access to automated PCR and gel-running equipment and unlimited funds for thermostable DNA polymerase, this approach could be prohibitive in cost.

A much more efficient approach can be used when the goal of a cross is to map the locus or loci responsible for a particular mutant phenotype or polymorphic trait that is segregating in either a backcross or an intercross. The only essential prerequisite is that the parents used in the first-generation mating must be from an inbred or segregating inbred strain (see Section 3.2.4).

The basic strategy is to reduce the number of PCR reactions (or restriction digests) and subsequent gel runs through the analysis of only one or two combined DNA samples that are obtained by pooling together equivalent amounts of high quality DNA from all second generation animals expressing the same phenotype (Michelmore et al., 1991; Asada et al., 1994). This pooled DNA strategy works for both the backcross protocol and the intercross protocol. It works for incompletely penetrant traits and for quantitative traits controlled by segregating alleles at more than one locus (see Section 9.5.4.2). However, it requires the use of markers with segregating alleles that can be reproducibly distinguished and detected with equal levels of intensity. Thus, not all PCR-based markers will be suitable.

Let us consider the simple example of a backcross in which all N₂ animals can be phenotypically distinguished at a single mutant locus as illustrated in Figure 9.11. The first step of the analysis would be to classify each animal as +/m or m/m followed by the conversion of each individual into a high quality DNA sample. Then, equal amounts of DNA from each m/m sample would be combined into one pool, and equal amounts of DNA from each +/m sample would be combined into a second pool. A third control sample would be formed by combining equal amounts of DNA from the two parents of the cross: the F₁ hybrid and strain B in Figure 9.11. Finally, an aliquot from each of these three composite samples would be subjected to PCR amplification with primer pairs specific for one marker at a time (or restriction digestion), and the amplified (or digested) samples would be separated by gel electrophoresis and analyzed by ethidium bromide staining, or probing, or autoradiography.

The results expected for markers showing different linkage relationships to the mutant locus are illustrated in Table 9.5. For all markers that are not linked to the test locus, the allele patterns obtained with the three composite DNA samples should be indistinguishable with a ratio of 1:3 in the intensities of the strain A and strain B alleles. In contrast, when a marker is very closely linked to the mutant locus, the ratio of alleles in the two pooled samples will diverge significantly in opposite directions from the control sample: in the m/m sample, the strain A allele will be absent or very light, while in the +/m sample, the intensity of the strain A allele will climb to equality with the strain B allele (whose signal will decrease proportionally). For ease of analysis, it is best to run the control sample in between the two pooled N₂ samples.
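
The expected band intensities can be worked out for any degree of linkage with a short calculation. The sketch below assumes, as in the backcross of Figure 9.11, that the mutant allele m is carried by strain B, that the two marker alleles amplify equally well, and that each pool is large enough for sampling error to be ignored. The function name is hypothetical.

```python
def expected_strain_a_fraction(r):
    """Expected fraction of strain A alleles at a marker in each sample.

    r is the recombination fraction between the marker and the mutant
    locus (0 = complete linkage, 0.5 = unlinked). The F1 parent is
    assumed to be A(+)/B(m); the strain B parent contributes only
    strain B alleles, which make up half of every sample.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    return {
        # F1 transmitted m: the marker is A only if a recombination occurred
        "m/m pool": r / 2,
        # F1 (A/B) mixed with strain B (B/B): one A allele in four
        "control": 0.25,
        # F1 transmitted +: the marker is A unless a recombination occurred
        "+/m pool": (1 - r) / 2,
    }


for r in (0.0, 0.1, 0.2, 0.5):
    fractions = expected_strain_a_fraction(r)
    print(r, {name: round(value, 3) for name, value in fractions.items()})
```

At r = 0.5 all three samples carry one strain A allele in four (the 1:3 ratio), and at r = 0 the strain A allele is absent from the m/m pool and equal to the strain B allele in the +/m pool, as described above.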

The power of this strategy for linkage analysis derives from the huge reduction in the number of samples that must be typed for each marker. Instead of 40, 50, 60, or more, the number is reduced to just three. However, to get a sense of the overall savings in time and cost, it is important to consider two factors: (1) the number of individual N₂ samples that must be included in each pool; and (2) the recombination distance over which a significant departure from the control sample can be observed.

Increasing the number of individual samples in each pool serves two purposes. First, random errors in the measurement of individual sample aliquots will tend to become evened out over a larger pool. Second, chance departures from the control ratio of alleles (i.e. false positives) will become much less frequent for unlinked markers (see Figure 9.13). For both of these reasons, one should set a minimum pool size at 20 animals. There is no maximum to the pool size but there is nothing to be gained from pooling more than 50 samples together.

It is difficult to predict the level of concordance that must exist between the test locus and a marker before one can judge a result to be evidence of linkage. A certain level of non-genetic variation is likely from sample to sample, and thus, a positive result must be one with a signal ratio that goes significantly beyond this normal variation. Consequently, the swept radius for markers analyzed in pooled samples will almost certainly be less than that possible with individual animal analysis as well as different from one marker to another. From the numbers shown in Table 9.5, the detection of linkage out to a distance of ~20 cM, but not much farther, would appear possible. Thus, up to 50% more markers may be required to sweep the entire genome.

The pooled DNA approach is maximally resolving when the nature of the phenotype under analysis allows the investigator to obtain two pools representing samples from each of the parents in the backcross (the F₁ and strain B in Figure 9.11) or the two original strains used to generate the intercross (strain A and strain B in Figure 9.12). In a situation of this type, each departure from the control ratio observed for a marker in one pool should be accompanied by a departure in the opposite direction for the other pool (see Table 9.5). This requirement for confirmation will act to reduce the frequency of false positive results. In many experimental situations, however, it will only be possible to develop a single pool of homozygous m/m samples for analysis. This will be the case for backcross studies of incompletely penetrant traits and for intercross studies of fully recessive phenotypes (Figure 9.12). In such cases, it will be necessary to generate and phenotype a larger number of animals in order to identify the smaller subset of samples that can be included within the single pool that can be made available for comparison to the control.

Once markers potentially linked to the test locus have been identified by the DNA pooling approach, it is essential to go back with each "positive" marker and individually type each sample in the pool to obtain quantitative confirmation of linkage or to rule it out. However, even with the reduction in genetic resolution and the requirement for confirmatory analysis, the DNA pooling approach can reduce the number of samples to be analyzed by at least an order of magnitude with large savings in labor and cost. If linkage to a single marker has been confirmed through individual sample analysis, the investigator can retype each of the samples with additional markers that lie within a 30 cM radius on either side to pursue the haplotype analysis described in the next section.

9.4.4.5 Determining gene order: generating a map

Once linkage has been demonstrated for a new locus, it is usually straightforward to determine its relative position on the chromosome framework map. For backcross data, this is accomplished by a method referred to as haplotype analysis. Haplotype analysis is performed on one linkage group at a time. For the mapping of any new locus, it is only necessary to carry out this approach for the chromosome to which the locus has been linked. The first task is to classify each backcross animal according to the alleles that it carries at the anchor loci typed just on the chromosome of interest. By definition, when two or more animals carry an identical set of alleles, they have the same "haplotype" on that chromosome. By comparing the data obtained for all members of the backcross panel, one can determine the total number of different haplotypes that are present.

As illustrated in Figure 9.15, each distinct haplotype is represented by a column of boxes, with one box for each locus; each box is either filled in to indicate one parental allele or left empty to indicate the other parental allele. Anchor loci are placed according to their order along the chromosome from most centromeric at the top to most telomeric at the bottom. The number of animals that carry each haplotype is indicated at the bottom of each column. Haplotypes are presented in order from left to right according to the number and location of recombination events. Parental haplotypes, which show no recombination, are indicated first. Haplotypes with single recombination events are presented next, followed by those with two recombination events, and more, if they exist. Vertical lines can be used to separate haplotype pairs defined by reciprocal allele combinations. The class of single recombination haplotypes is presented in order from left to right according to the position of the breakpoint from most centromeric to most telomeric. The haplotype diagram shown in Figure 9.15 can be generated automatically (in printable form) from recorded data in the Manly (1993) Map Manager program.

The haplotype diagram can be used to generate a linkage map by adding up the total number of animals that are recombinant between adjacent loci. For example, the G, H, I, and K haplotypes show recombination between the hypothetical D51 and D33 loci shown in Figure 9.15; these haplotypes are carried by 9, 10, 1, and 1 animals respectively. Thus, in total, 21 animals are recombinant between these loci for a calculated recombination fraction (rf) of 0.404. When a recombination fraction is larger than 0.25, one should use the Carter-Falconer mapping function (Equation 7.3) to obtain a more accurate estimate of map distance in centimorgans. The calculated mFC value is 44 cM. Similarly, the recombination fractions that separate D81 from D12, and D12 from D51 are both found to be 0.269. With the Carter-Falconer equation, this recombination fraction value is adjusted slightly to a map distance of 27.3 cM.
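
The arithmetic in this paragraph can be sketched as follows. The function names are hypothetical, and the formula used is the standard form of the Carter-Falconer mapping function, which is assumed here to correspond to Equation 7.3; small differences in rounding from the values quoted in the text are possible.

```python
import math


def interval_rf(recombinant_counts, total_animals):
    """Recombination fraction from the animal counts of the haplotype
    classes that are recombinant between two adjacent loci."""
    return sum(recombinant_counts) / total_animals


def carter_falconer_cm(rf):
    """Map distance in centimorgans for a recombination fraction.

    Standard Carter-Falconer form (distance in morgans):
        m = 0.25 * (0.5 * ln((1 + 2r) / (1 - 2r)) + atan(2r))
    Valid for 0 <= rf < 0.5.
    """
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must satisfy 0 <= rf < 0.5")
    morgans = 0.25 * (0.5 * math.log((1 + 2 * rf) / (1 - 2 * rf))
                      + math.atan(2 * rf))
    return 100.0 * morgans


# Haplotype classes G, H, I and K (9, 10, 1 and 1 animals) in a panel of 52.
rf_d51_d33 = interval_rf([9, 10, 1, 1], 52)
distance_d51_d33 = carter_falconer_cm(rf_d51_d33)
```

The correction is small for short intervals and grows quickly as the recombination fraction approaches 0.5, which is why it matters most for the widely spaced anchors of a framework map.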

With a framework haplotype diagram and map, it becomes possible to determine the location of a new locus under analysis. Consider the hypothetical example in Figure 9.15 where linkage has already been demonstrated between a new locus and just one anchor locus, D51. In this case, the new locus could be in either one of two positions on the chromosome, proximal or distal to D51. To test these two locations, one can draw a second set of haplotype diagrams that include only those newly defined haplotypes showing recombination between the linked anchor D51 and the new locus. In this example, a subset of animals from the previously defined haplotype classes A, G, H, and I define four new haplotypes labeled A', G', H' and I' respectively as illustrated in Figure 9.15. These haplotypes are drawn in two different ways with the new locus either proximal or distal to D51. The correct order can be determined by minimizing both the number of multiply recombinant haplotypes and the total number of implied recombination events within the sample set. In the example shown, a distal location requires a total of eight crossover events that take place within four single recombinant chromosomes and two double recombinant chromosomes. Alternatively, a proximal location requires a total of 18 crossover events with no single recombinant chromosomes, one double, four triples and one quadruple. Data of this type clearly point to a distal location for the new locus. Although any real set of data will obviously give different results, the same logical progression will almost always provide a definitive map position. With the computer program Map Manager, this analysis can be accomplished automatically.
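
The comparison of candidate orders amounts to counting allele switches along each haplotype. The sketch below illustrates that logic with invented data: the haplotypes, animal counts, and function names are hypothetical and do not reproduce Figure 9.15, and "distal" is taken to mean a position between D51 and D33.

```python
def crossovers(haplotype, order):
    """Number of allele switches between adjacent loci for one haplotype.

    haplotype maps locus name -> parental allele ('A' or 'B');
    order is the candidate locus order from centromere to telomere.
    """
    alleles = [haplotype[locus] for locus in order]
    return sum(1 for a, b in zip(alleles, alleles[1:]) if a != b)


def total_crossovers(haplotypes, order):
    """Total implied crossovers, weighting each haplotype by its animal count."""
    return sum(count * crossovers(hap, order) for hap, count in haplotypes)


# Hypothetical backcross panel: (haplotype, number of animals).
haplotypes = [
    ({"D81": "A", "D12": "A", "D51": "A", "D33": "A", "new": "A"}, 20),
    ({"D81": "B", "D12": "B", "D51": "B", "D33": "B", "new": "B"}, 22),
    ({"D81": "A", "D12": "A", "D51": "A", "D33": "B", "new": "B"}, 4),
    ({"D81": "B", "D12": "B", "D51": "B", "D33": "A", "new": "A"}, 3),
    ({"D81": "A", "D12": "B", "D51": "B", "D33": "B", "new": "B"}, 3),
]

candidate_orders = {
    "proximal": ["D81", "D12", "new", "D51", "D33"],
    "distal": ["D81", "D12", "D51", "new", "D33"],
}
totals = {name: total_crossovers(haplotypes, order)
          for name, order in candidate_orders.items()}
best_order = min(totals, key=totals.get)
```

With these invented data the proximal order turns every animal that is recombinant between D51 and the new locus into a triple recombinant, so the distal order is favored. A fuller treatment would also compare the number of multiply recombinant haplotypes, as the text recommends.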

With intercross data, whole chromosome haplotype analysis can be much less straightforward (as illustrated in Figure 9.3). Consequently, gene order is usually determined computationally by the method of maximum likelihood analysis (Lander et al., 1987). Nevertheless, with the aid of a framework map, it is usually possible to break down F₂ genotype information into pairs of most likely haplotypes for each animal (D'Eustachio and Clarke, 1993). At this point, a new locus could be mapped according to the same logic described above.

9.4.5 The second stage: high resolution mapping

The ultimate goal of the second stage of many mapping projects is to identify both DNA markers and recombination breakpoints that are tightly enough linked to a new locus of interest to provide the tools necessary to begin positional cloning. This second stage can be broken down optimally into a series of steps as follows:

Step 2.1. The first goal of this second stage should be to narrow down the map interval as much as possible using only the small panel of samples typed in stage 1. This can normally be accomplished by selecting and typing additional microsatellite markers spaced across the 20 cM region to which the locus of interest has been mapped. With an original panel of 54 backcross samples, for example, recombination breakpoints will be distributed at average distances of about 2 cM. Thus, by typing additional markers, one should be able to reduce the size of the gene-containing interval from an original 25-40 cM down to 4-10 cM. The goal of this step is to identify the closest "limiting markers" on both sides of the locus of interest that do show recombination with it in order to establish an interval within which the locus must lie.

Step 2.2. The next step requires the breeding of a large number of animals that segregate the mutant allele. Ideally, the total number of animals bred should be at least 300 with a maximum of 1,000 spread out among several crosses (see Section 7.2.3.3). However, this large set can be quickly reduced to the smaller set of samples that show recombination in the interval to which the gene has already been mapped. This can be accomplished by typing each animal for just the two "limiting markers" identified in step 2.1. If, for example, the locus-containing interval had previously been restricted to a 10 cM region bounded by these markers, this analysis would eliminate from further consideration approximately 90% of the total samples in the large cohort. If a PCR-based analysis is used to type the two markers, rapid methods for obtaining small quantities of partially purified DNA from members of the large cohort may be sufficient (Gendron-Maguire and Gridley, 1993).
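
The size of the subset that survives this screen can be estimated in advance. The sketch below is a rough approximation for a backcross only: it assumes a short interval in which 1 cM corresponds to about 1% recombination, and the function name and example cohort size are hypothetical.

```python
def expected_recombinants(n_animals, interval_cm):
    """Expected number of backcross animals recombinant within an interval.

    Assumes a short interval, so that the recombination fraction is
    approximately interval_cm / 100.
    """
    if n_animals < 0 or interval_cm < 0:
        raise ValueError("arguments must be non-negative")
    return n_animals * interval_cm / 100.0


cohort = 1000          # hypothetical number of animals bred
interval = 10          # cM between the two limiting markers
retained = expected_recombinants(cohort, interval)
fraction_eliminated = 1 - retained / cohort
first_pass_typings = 2 * cohort   # two limiting markers per animal
```

For a 10 cM interval, roughly one animal in ten is carried forward to step 2.3, which matches the elimination of approximately 90% of the cohort described above.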

Step 2.3. The smaller subset of animals selected in step 2.2 can now be typed with a larger set of markers previously localized to the genomic interval between the two limiting markers defined in step 2.1. At this point, it makes sense to test all segregating microsatellites that have been placed into 1 cM bins extending from one limiting marker to the other as well as any other suitably located DNA markers (Copeland et al., 1993 and Appendix B). Newly tested markers that show no recombination with either one limiting marker or the other (among all animals tested) are likely to map outside the defined interval. However, all new markers that show recombination in different samples with each of the previously defined limiting markers will almost certainly map between them. Haplotype analysis can be used once again to obtain a relative order for these newly mapped markers. If the initial interval defined in step 2.1 is 10 cM or less, double recombination events will be extremely unlikely, and with this underlying assumption, it should be possible to obtain an unambiguous order for all markers that show recombination with each other and/or the phenotypically-defined locus.

Step 2.4. As multiple new markers are mapped to the interval between the two previously defined limiting markers, it should become possible to reduce the size of the gene-containing interval even further than the one defined in step 2.1. As the size of the interval is reduced, the number of animal samples within the panel that need to be analyzed further can also be reduced to include only those that show recombination between the newly defined limiting markers. Additional markers should be typed until one reaches the ultimate goal of identifying limiting markers that each show only one (ideally) or a few recombination events on either side of the locus of interest along with one or more markers that show absolute concordance with the locus itself as illustrated in Figure 10.1. If one exhausts the available sources of markers without coming close to this goal, it may be necessary to derive additional region-specific markers as discussed in Section 8.4.

With the identification of one or more DNA markers that show no recombination with the locus of interest, Figures 9.16 or 9.17 can be used to gain a sense of the distance that separates them. For example, with an initial cohort of 380 animals, the average distance that will separate marker and locus will be 0.2 cM and the 95% confidence interval will extend out to 1 cM. With an initial cohort of 1,000 animals, the mean distance will be less than 0.1 cM and the 95% confidence interval will extend out less than 0.4 cM. At this stage of analysis, one can move on to the task of generating a physical map that extends across the genomic region between the two closest limiting markers as described in Section 10.3.3.3 and Figure 10.1.
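
A rough feel for how cohort size limits the distance between a non-recombining marker and the locus can be obtained from a simple binomial argument. This is a sketch under stated assumptions, not a reproduction of the method behind Figures 9.16 and 9.17: it asks for the largest recombination fraction r at which observing zero recombinants in N meioses still has a probability of at least 5%, that is, (1 - r)^N >= 0.05. The function name is hypothetical.

```python
def upper_bound_cm(n_meioses, confidence=0.95):
    """Upper bound (in cM) on the distance between a marker and a locus
    that show zero recombinants among n_meioses informative meioses.

    Solves (1 - r) ** n_meioses = 1 - confidence for r and treats
    1% recombination as 1 cM, which is reasonable at short distances.
    """
    if n_meioses <= 0:
        raise ValueError("n_meioses must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    r_max = 1.0 - (1.0 - confidence) ** (1.0 / n_meioses)
    return 100.0 * r_max


bound_380 = upper_bound_cm(380)
bound_1000 = upper_bound_cm(1000)
```

Bounds calculated this way fall in the same general range as the 95% limits quoted above, and they shrink roughly in proportion to 1/N, which is the reason a larger cohort is bred for the second stage.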