8.2 RESTRICTION FRAGMENT LENGTH POLYMORPHISMS

8.2.1 The molecular basis for RFLPs

A restriction fragment length polymorphism is defined by the existence of alternative alleles associated with restriction fragments that differ in size from each other. ⁵⁴ RFLPs are visualized by digesting DNA from different individuals with a restriction enzyme, followed by gel electrophoresis to separate fragments according to size, then blotting and hybridization to a labeled probe that identifies the locus under investigation. An RFLP is demonstrated whenever the Southern blot pattern obtained with one individual is different from the one obtained with another individual. An illustration of a result of this type is shown in Figure 8.2. In this example, DNA samples from five individual mice were digested with the same enzyme, electrophoresed side by side on a gel, and probed with the same clone of a single-copy DNA sequence. The five patterns detected are all different from each other and are representative of five different genotypes. The simplest interpretation of data of this type is that the first and last samples shown in the Figure are homozygous for a different restriction fragment while the middle samples are all heterozygous with different combinations of alleles. This simple interpretation can be tested and confirmed (or rejected) by simply breeding each of the animals to mates with different genotypes at this "RFLP locus" so that segregation of the two restriction fragment alleles can be demonstrated from putative heterozygotes and uniform transmission of the same restriction fragment allele can be demonstrated from putative homozygotes. ⁵⁵
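
The interpretation rule just described (one band suggests a homozygote, two bands suggest a heterozygote, and a test cross confirms or rejects the call) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the band sizes of 3, 4, and 5 kb are hypothetical and are not taken from Figure 8.2.

```python
from itertools import product

def call_genotype(bands_kb):
    """Turn the distinct band sizes seen in one lane into a putative genotype.

    One distinct band is read as a homozygote, two as a heterozygote.
    Anything else cannot be explained by a single di-allelic locus.
    """
    alleles = tuple(sorted(set(bands_kb)))
    if len(alleles) == 1:
        return alleles * 2          # e.g. (4.0, 4.0)
    if len(alleles) == 2:
        return alleles              # e.g. (4.0, 5.0)
    raise ValueError("pattern is not consistent with one single-copy locus")

def offspring_genotypes(parent_a, parent_b):
    """List the equally likely offspring genotypes (one gamete from each parent)."""
    return sorted(tuple(sorted(pair)) for pair in product(parent_a, parent_b))

# Hypothetical test cross: putative 4/5 heterozygote mated to a 3/3 homozygote.
het = call_genotype([5.0, 4.0])
mate = call_genotype([3.0])
print(offspring_genotypes(het, mate))
```

If the putative heterozygote really carries two alleles, the 4 kb and 5 kb fragments should segregate in roughly equal numbers among the offspring; a putative homozygote should transmit the same fragment to every offspring.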

RFLPs were the predominant form of DNA variation used for linkage analysis until the advent of PCR. Even now, in the PCR age, RFLPs provide a convenient means for turning an uncharacterized DNA clone into a reagent for the detection of a genetic marker. The main advantage of RFLP analysis over PCR-based protocols is that no prior sequence information, nor oligonucleotide synthesis, is required. Furthermore, in some cases, it may not be feasible to develop a PCR protocol to detect a particular form of allelic variation. Nevertheless, if and when a PCR assay for typing a particular locus is developed, it will almost certainly be preferable over RFLP analysis for the reasons to be described in Section 8.3.

The detection of an RFLP, in and of itself, does not provide information as to the mechanism by which it was created. Although the different-sized restriction fragments shown in Figure 8.2 can be followed readily in a genetic cross, one cannot tell, from these data alone, how they differ from each other at the molecular level. In fact, RFLPs can be generated by all of the mechanisms through which DNA variation can occur. The simplest RFLPs are those caused by single base-pair substitutions. However, RFLPs can also be generated by the insertion of genetic material, such as transposable elements, or by tandem duplications, deletions, translocations, or other rearrangements.

Several different mechanisms of RFLP generation are illustrated in Figure 8.3. In this set of hypothetical examples, the first chromosome represents the ancestral state, and chromosomes 2, 3, and 4 represent different mutations from this state. In this example, DNA has been digested with the enzyme TaqI (with a recognition site of TCGA), fractionated and probed with a clone that recognizes the region shown in the Figure. The Southern blot results that would be obtained with animals that carry representative pairs of these chromosomes are shown in Figure 8.2. The length of the restriction fragment that will be observed with each chromosome type is indicated by the boxed-in region. Chromosome 1 has TaqI sites that flank the probed region at a distance of 4 kb from each other. In chromosome 2, the right-flank TaqI site has been mutated (the base substitution is marked with a *); a previously more distal TaqI site now becomes the new flanking site, leading to the production of a 5 kb restriction fragment. In chromosome 3, a mutation has occurred with an opposite effect, causing the creation of a TaqI site where none existed before; this new TaqI site becomes the left-flank site, leading to the production of a smaller restriction fragment. Finally, in chromosome 4, an insertion has occurred within the region between the two flanking TaqI sites, leading to an actual increase in the length of the region between these same two sites. More complicated scenarios can be built upon these simple examples with restriction sites created or removed from within the probed region itself, or with new restriction sites brought in with inserted DNA elements. A final class of RFLPs is commonly generated through the expansion and contraction of families of tandemly repeated DNA elements as illustrated in Figure 8.4. Loci having this type of organization are referred to as minisatellites, or VNTRs, and will be discussed separately in Section 8.2.3.
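
The first three mutational mechanisms described above (site loss, site gain, and insertion) can be mimicked with a toy in-silico digest. The sketch below is an assumption-laden illustration rather than a model of Figure 8.3: the sequences are short strings in which each character can be thought of as standing in for roughly 100 bp, the probe is treated as a single position, and the helper names are hypothetical.

```python
import re

TAQI_SITE = "TCGA"  # TaqI recognition sequence, as given in the text

def site_positions(seq, site=TAQI_SITE):
    """Return the 0-based start position of every occurrence of the site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]

def detected_fragment_length(seq, probe_pos, site=TAQI_SITE):
    """Distance between the two sites that flank the probed position.

    Sequence ends act as boundaries if no flanking site exists. Lengths are
    measured from site start to site start, which ignores the exact cut
    position within the site but keeps the differences between alleles intact.
    """
    sites = site_positions(seq, site)
    left = max((p for p in sites if p <= probe_pos), default=0)
    right = min((p for p in sites if p > probe_pos), default=len(seq))
    return right - left

def substitute(seq, pos, new):
    """Overwrite len(new) bases starting at pos."""
    return seq[:pos] + new + seq[pos + len(new):]

PROBE = 20  # hypothetical probe position
chrom1 = "TCGA" + "A" * 36 + "TCGA" + "A" * 6 + "TCGA"   # ancestral state
chrom2 = substitute(chrom1, 40, "TCAA")                   # right-flank site lost
chrom3 = substitute(chrom1, 10, "TCGA")                   # new left-flank site gained
chrom4 = chrom1[:25] + "C" * 15 + chrom1[25:]             # insertion between the sites

for name, chrom in [("1", chrom1), ("2", chrom2), ("3", chrom3), ("4", chrom4)]:
    print(name, detected_fragment_length(chrom, PROBE))
```

In this toy, the ancestral fragment of 40 units grows to 50 when the right-flank site is lost, shrinks to 30 when a new left-flank site appears, and grows to 55 after the insertion, paralleling the 4 kb and 5 kb fragments described for chromosomes 1 and 2.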

Attempts to identify RFLPs between different inbred strains of mice often meet with limited success even after testing with large numbers of enzymes. In one study, RFLPs were identified at only 30% of the single copy loci tested with 22 different restriction enzymes (Knight and Dyson, 1990). Furthermore, when RFLPs are identified, they are almost always di-allelic binary systems: the insertion, deletion, or restriction site change is either present or absent. ⁵⁶ Unfortunately, di-allelic loci can only be mapped in crosses where the two parental chromosomes carry the two alternative alleles. Thus, even if an RFLP is identified between two inbred strains of mice, there is no guarantee that another pair of strains will also happen to carry alternative alleles. As a consequence, only a subset of the RFLP markers developed for analysis of one cross between traditional mouse strains will be of use for mapping in a cross between any other pair of inbred strains.

A major leap in mouse genetics came with the observation of an extremely high rate of RFLP detection between the common M. musculus-based inbred strains and the independent species M. spretus. As described in Section 2.3, these species can breed under laboratory conditions to produce interspecific F₁ hybrids. Although F₁ males are sterile, the F₁ females are fertile and they can be backcrossed to either parent to obtain offspring which can be analyzed for linkage (Bonhomme et al., 1978; Avner et al., 1988). More recently, the observation of an increased rate of RFLPs has been extended to comparisons between the inbred strains and wild-derived samples of M. m. castaneus which are more closely related to each other than either is to M. spretus (Figure 2.2). The pros and cons of performing genetic studies with interspecific or intersubspecific crosses are discussed in Section 9.4.2.1.

8.2.2 Choice of restriction enzymes to use for RFLP detection

With so many restriction enzymes available, how does one decide which ones are the best to use in the search for RFLPs? Obviously, cost is an important consideration. Another consideration is whether the enzyme is optimally active with genomic DNA obtained from animal tissues. ⁵⁷ However, a critical consideration is the rate at which RFLPs can be detected based on the enzyme that is chosen.

A systematic study of RFLP detection between B6 and M. spretus DNA subsequent to digestion with one of ten different enzymes has been reported by LeRoy et al. (1992). One hundred and ten anonymous DNA sequences of less than 4 kb in length were used as probes. The highest rate of RFLP detection (63%) was observed with DNA digested with TaqI. The second highest rate (56%) was observed with MspI. In decreasing order of effectiveness were the enzymes BamHI (50%), XbaI (47%), PstI (44%), BglII (41%), HindIII (39%), PvuII (38%), RsaI (38%), and EcoRI (33%). It is ironic that of the ten enzymes tested, the one most commonly used in molecular biological research, EcoRI, was the worst one, by a long shot, at detecting polymorphisms.

A theoretical explanation for the observation that TaqI and MspI are more likely than other enzymes to detect RFLPs can be found in the dinucleotide CpG which is at the center of both recognition sites. This dinucleotide is unusual in two respects. First, it is present in mammalian genomes at a frequency one-fifth of that expected from base composition alone. Second, when it is present, the cytosine within the dinucleotide is usually methylated. ⁵⁸ As it turns out, the latter fact explains the former because methylated cytosine has a propensity to undergo spontaneous deamination to form thymine. This complete transition is not recognized as abnormal by the repair machinery present in mammalian cells, and thus methylated-CpG dinucleotides serve as one-way hotspots for mutation (Barker et al., 1984). As a consequence, the CpG dinucleotide is relatively rare, and when it does occur in a methylated form, it is more likely to mutate than any other dinucleotide. Even an unmethylated CpG can undergo a spontaneous deamination from cytosine to uracil; however, this abnormal base is more likely to be recognized and repaired. Nevertheless, in those few cases where repair does not occur, the uracil will basepair with an adenine in the following round of DNA replication, leading to the same substitution as found with methylated CpGs.
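
The depletion of CpG relative to base composition can be measured on any sequence as an observed-to-expected ratio. The sketch below uses the commonly used formula (number of CpG × sequence length) / (number of C × number of G); the function name is hypothetical, and the test strings are toy inputs, not mouse sequence. A ratio near 1.0 means CpG occurs as often as base composition predicts, whereas the one-fifth figure quoted above would correspond to a ratio of about 0.2.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for one strand of a DNA sequence.

    Expected CpG count under independence is (count of C * count of G) / length,
    so the ratio is (count of CG * length) / (count of C * count of G).
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)

print(cpg_obs_exp("CCGG"))       # CpG at the frequency expected from composition
print(cpg_obs_exp("CATG" * 5))   # C and G present, but never as CpG
```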

Thus, TaqI and MspI are the most useful enzymes for the identification of RFLPs. Both enzymes recognize four-basepair sites: TaqI recognizes TCGA and MspI recognizes CCGG. If nucleotides were randomly distributed across the genome, TaqI and MspI sites would be distributed at average distances of 270 bp and 514 bp, respectively. ⁵⁹ However, as a consequence of the paucity of CpG dinucleotides, these two restriction enzyme sites are actually found much less frequently in mammalian DNA. Empirical data indicate restriction fragment size distributions that average 2.9 and 3.5 kb for TaqI and MspI respectively (Barker et al., 1984).
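
The expected spacing of a recognition site under a random-sequence model is simply the reciprocal of the probability of the site, with each base weighted by its genomic frequency. The sketch below is one way to do the arithmetic. The GC content of 42% is an assumption chosen here because it reproduces the 270 bp and 514 bp figures quoted above; the calculation behind those figures may have been framed differently.

```python
def expected_site_spacing(site, gc_content):
    """Mean distance (bp) between occurrences of a site in random sequence.

    Assumes independent bases with G = C = gc_content / 2 and
    A = T = (1 - gc_content) / 2.
    """
    freq = {
        "G": gc_content / 2,
        "C": gc_content / 2,
        "A": (1 - gc_content) / 2,
        "T": (1 - gc_content) / 2,
    }
    prob = 1.0
    for base in site:
        prob *= freq[base]
    return 1 / prob

ASSUMED_GC = 0.42  # assumption, not a value stated in the text
print(round(expected_site_spacing("TCGA", ASSUMED_GC)))  # TaqI
print(round(expected_site_spacing("CCGG", ASSUMED_GC)))  # MspI
print(round(expected_site_spacing("TCGA", 0.50)))        # equal base frequencies: 4**4
```

Under this model MspI sites are rarer than TaqI sites because CCGG consists entirely of the less frequent G and C bases. Neither prediction accounts for CpG depletion, which is why the empirical averages of 2.9 and 3.5 kb are so much larger.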

In practice, the enzyme TaqI is the better choice of the two for use in RFLP analysis. It is relatively cheap and it works well with animal DNA samples that other enzymes refuse to cut (presumably aided by the high temperature at which the digestion is carried out). MspI is somewhat more sensitive to contaminants within animal tissue DNA samples, but is a good second choice. When the results obtained with TaqI and MspI were combined, the Guénet group detected RFLPs at 74% of the loci tested for variation between spretus and musculus (LeRoy et al., 1992). When the results obtained with XbaI were added in, 79% of the loci were polymorphic. When the results obtained with the remaining seven enzymes were included, RFLPs were detected at 83% of the loci. The take-home lesson from this study is that it is most cost-effective to search for RFLPs on standard 1% agarose gels with just three enzymes: TaqI, MspI, and XbaI. If the search is unsuccessful at this point, it would appear that the locus under analysis is not highly polymorphic at the DNA level, and in those cases where the locus is just "one more marker," it is probably not worth pursuing further. On the other hand, if the locus is of importance in and of itself, it makes sense to pursue more sensitive, PCR-based avenues of polymorphism detection such as single-strand conformation polymorphism (Section 8.3.3) or linked microsatellites (Section 8.3.6).

8.2.3 Minisatellites: variable number tandem repeat loci

In contrast to traditional RFLPs caused by basepair changes in restriction sites, a special class of RFLP loci present in all mammalian genomes is highly polymorphic with very large numbers of alleles. These "hypervariable" loci were first exploited in a general way by Jeffreys and his colleagues for genetic mapping in humans (Jeffreys et al., 1985).

Hypervariable RFLP loci of this special class are known by a number of different names including variable number tandem repeat (VNTR) loci and minisatellites, which is the more commonly used term today. Minisatellites are composed of unit sequences that range from 10 to 40 bp in length and are tandemly repeated from tens to thousands of times. Although various functions have been suggested for minisatellite loci as a class, none of these has withstood the test of further analysis (Jarman and Wells, 1989; Harding et al., 1992). Rather, it appears most likely that minisatellite loci (like the microsatellite loci described in Section 8.3.6) evolve in a neutral manner through expansion and contraction caused by unequal crossing over between out-of-register repeat units as diagrammed in Figure 8.4 (Harding et al., 1992). Recombination events of this type will yield reciprocal products which both represent new alleles with a change in the number of repeat units.
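
The bookkeeping behind unequal crossing over is easy to state in code. The sketch below is a deliberately minimal model, not a description of Figure 8.4: each allele is reduced to a count of repeat units, the crossover is assumed to fall on a boundary between units, and the function and parameter names are hypothetical.

```python
def unequal_crossover(n_a, n_b, break_a, break_b):
    """Repeat counts of the two reciprocal products of a single crossover.

    n_a, n_b   -- repeat units in the two parental arrays
    break_a    -- number of units to the left of the breakpoint in array A
    break_b    -- number of units to the left of the breakpoint in array B
    The arrays are in register only when break_a == break_b.
    """
    if not (0 <= break_a <= n_a and 0 <= break_b <= n_b):
        raise ValueError("breakpoint lies outside the repeat array")
    product_1 = break_a + (n_b - break_b)   # left part of A joined to right part of B
    product_2 = break_b + (n_a - break_a)   # left part of B joined to right part of A
    return product_1, product_2

print(unequal_crossover(10, 10, 5, 5))   # in register: no change in size
print(unequal_crossover(10, 10, 6, 4))   # two units out of register
```

The total number of repeat units is conserved, so one product expands by exactly the amount by which the other contracts, which is why both reciprocal products represent new alleles.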

The frequency with which new alleles are created at minisatellite loci — on the order of 10^-3 per locus per gamete — is much greater than the classical mutation rate of 10^-5 to 10^-6 (Jeffreys et al., 1988). This leads to a much higher level of polymorphism between unrelated individuals within a population. At the same time, one change in a thousand gametes is low enough so as to not interfere with the ability to follow minisatellite alleles in classical breeding studies.
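
To see why a rate of 10^-3 is compatible with breeding studies, it helps to compute the chance that at least one new allele arises among the gametes scored in a cross. The sketch below assumes independent gametes and a constant rate; the sample size of 100 gametes is a hypothetical example, not a figure from the text.

```python
def prob_new_allele(rate_per_gamete, n_gametes):
    """Probability that at least one new allele appears among n_gametes."""
    return 1 - (1 - rate_per_gamete) ** n_gametes

# Hypothetical cross in which 100 gametes are scored at one locus.
print(prob_new_allele(1e-3, 100))   # minisatellite-like rate
print(prob_new_allele(1e-5, 100))   # upper end of the classical rate
```

With these assumptions, roughly one locus in ten would show a new allele somewhere in a 100-gamete cross at the minisatellite rate, compared with about one in a thousand at the classical rate.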

Length polymorphisms at minisatellite loci are most simply detected by digestion of genomic DNA samples with a restriction enzyme that does not cut within the minisatellite itself but does cut within closely flanking sequences. As with all other RFLP analyses, the restriction digests are fractionated by gel electrophoresis, blotted and hybridized to probes derived from the polymorphic locus. However, unlike traditional point mutation RFLPs, minisatellites are caused by, and reflect, changes in the actual size of the locus itself.

The best restriction enzymes to use for minisatellite analysis are those with 4 bp recognition sites such as HaeIII, HinfI or Sau3A; it is likely that one of these enzymes will not cut within the relatively short minisatellite unit sequence, but will cut within several hundred basepairs of flanking sequence on both sides. Standard 1% agarose gels with maximal separation in the 1-4 kb range are usually best for the resolution of minisatellite bands; however, conditions can be optimized for each minisatellite system under analysis.

There is nothing special about the unit sequence present within minisatellites, which are defined only by their repeated nature and their repeat unit size. Thus, it is not possible to develop a general protocol for identifying all minisatellite sequences within the genome, and there is no way of knowing how many loci of this type are actually present. However, significant homology (indicative of evolutionary relatedness) often exists among unlinked minisatellite loci that are scattered throughout the genome. Homologies that allow cross-hybridization define minisatellite families that can have as few as two and as many as 50 members (Nakamura et al., 1987). It is often possible to take advantage of these cross-homologies to map ten or more minisatellite loci as independent RFLPs within single Southern blot hybridization patterns.

The simultaneous detection of 10-40 unlinked and highly polymorphic loci provides a whole genome "fingerprint" pattern which is very likely to show differences between any two unrelated individuals (Jeffreys et al., 1985). These DNA fingerprints provide a powerful tool in human forensic analysis in the absence of any knowledge as to the map location of any of the individual loci that are being detected (Armour and Jeffreys, 1992). DNA fingerprinting per se is of much less use in the analysis of laboratory animals, who do not bring paternity suits or stand trial for rape or murder. However, fingerprinting can allow field biologists to follow individual animals in wild populations subjected to repeated capture and release sampling. It can also be used to monitor the integrity of inbred strains of mice and for the characterization and comparison of different breeds of domesticated animals that have commercial importance.

New minisatellite families are uncovered by chance, by cross-hybridization with probes defined in other species (Jeffreys et al., 1987), or by the use of "synthetic tandem repeats" of arbitrary 14-20mer oligonucleotides (Mariat and Vergnaud, 1992). The first analysis of minisatellites in the mouse was performed with the use of several human minisatellite sequences as probes (Jeffreys et al., 1987). The results obtained in the analysis of a set of recombinant inbred strains (described at length in Chapter 9) demonstrated the expected high level of polymorphism as well as a high level of stability over time, both of which are critical properties for a useful mapping tool. Julier and his colleagues have performed more detailed mapping studies with a larger panel of human minisatellite probes (Julier et al., 1990) and, in collaboration with Mariat and colleagues, they have also performed minisatellite mapping with the use of arbitrary oligonucleotides of 14-16 bases in length (Mariat et al., 1993). With the 29 human-derived minisatellite probes tested, these authors found that 48% gave well-resolved complex fingerprint patterns upon hybridization to the mouse genome. With a set of 24 arbitrary oligonucleotides that were preselected for detection of minisatellites in humans, 23 were found to detect polymorphic loci in the mouse as well.

In an initial analysis with just 11 of the human minisatellite probes, a total of 115-234 restriction fragment differences were detected in pairwise comparisons among a series of seven M. musculus-derived inbred strains. The smallest number of polymorphic loci was observed in a comparison of C3H/He and DBA/2J; the largest number was observed between SJL/J and 129/Sv. Approximately twice as many polymorphisms were observed in pairwise comparisons between M. musculus-derived strains and a M. spretus inbred line.

The 11 characterized probes were used to follow the segregation of minisatellite alleles in a higher resolution analysis of the BXD set of RI strains as described in Chapter 9 (Julier et al., 1990). The 346 polymorphic bands followed in this study sorted into 166 independent loci, approximately half of which were represented by a single restriction fragment, with the remaining represented by two or more fragments. As expected, in several cases, new fragments were detected in particular RI strains that were not present in either of the parental inbred strains from which they were generated, attesting to the rapid rate at which minisatellite loci mutate to new alleles.

Mapping with multi-locus minisatellite probes is most effective for whole genome studies rather than for single-chromosome analyses. Thus, like the two-dimensional RFLP and RAPD technologies described below, minisatellite mapping is actually of greatest use for the initial development of whole genome "framework maps" of relatively uncharacterized species, of which the mouse is not one.

8.2.4 Dispersed multilocus analysis with cross-hybridizing probes

Minisatellite families are just one example of dispersed, cross-hybridizing loci that can be mapped simultaneously by Southern blot analysis. Another class of this type includes those gene families that have multiple members dispersed to unlinked chromosomal locations. In general, protein-encoding genes will be much less polymorphic than minisatellite loci; thus, simultaneous mapping of multiple members of gene families through RFLP analysis is best accomplished with interspecific backcrosses of the spretus-domesticus type. In one such study, probes for just two gene families (ornithine decarboxylase and triose phosphate isomerase) were combined with a probe for the highly polymorphic mouse mammary tumor virus (MMTV) elements (described below) in traditional Southern blot studies to detect and map a total of 28 loci to 16 of the 19 mouse autosomes (Siracusa et al., 1991).

A third broad class of cross-hybridizing loci is represented by the endogenous retroviral and retroviral-like elements which have been dispersed to random positions throughout the genome. A number of different families and subfamilies of this class have been identified (see Section 5.4.1). The best characterized of these (with average copy number per haploid genome in parentheses) are MMTVs (4-12), ecotropic MuLVs (0-10), non-ecotropic MuLVs (40-60), VL30s (~200), and IAPs (~2,000). In all of these cases, polymorphisms are a consequence of the recent integration of proviral elements so that particular elements are present in the genomes of some strains but not others; thus, each polymorphism is represented by a binary plus/minus system.

Both the MMTVs and ecotropic MuLVs are present at copy numbers that are suitable for mapping by standard agarose gel electrophoresis. By combining data from various crosses, Jenkins and colleagues (1982) mapped a total of 18 ecotropic MuLV integration sites that were named Emv-1 through Emv-18. In similar studies, 26 MMTV integration sites have been mapped among various inbred strains (Kozak et al., 1987); these have been named Mtv-1 through Mtv-26.

The non-ecotropic MuLV elements are present at a copy number which is somewhat too high for complete resolution of all elements on standard agarose gels. To overcome this problem, and to obtain maximal mapping information, it is possible to take advantage of the subfamily structure of this class of elements. Oligonucleotides that recognize different subsets of 10 to 30 loci per genome have been used as Southern blot probes with excellent resolving power (Frankel et al., 1990; Frankel et al., 1992). In general, 30-50% of the non-ecotropic viral elements are shared in any one pairwise comparison of inbred strains. By combining data from different sets of recombinant inbred lines, Frankel and colleagues were able to map over 100 non-ecotropic integration sites; these have been named with the prefixes Polytropic murine virus (Pmv-), modified polytropic murine virus (Mpmv-), or Xenotropic murine virus (Xmv-) according to the particular oligonucleotide that cross-hybridized to each element. With the MMTV and various MuLV families, it is still possible to use the same probes to map even more integration sites through the examination of strains that were not previously studied.

The retroviral-like families IAP (Lueders and Kuff, 1977) and VL30 (Courtney et al., 1982; Keshet and Itin, 1982) are present in approximately 2,000 and 200 copies, respectively, per haploid genome. These families and others of the same class contain a large potential reservoir of useful genetic markers. However, their copy number is much too high to allow the resolution of individual family members in traditional Southern blot studies with restriction digested DNA samples. It is typically difficult to resolve more than 20 bands in a traditional one-dimensional hybridization pattern. Furthermore, as the copy number of cross-hybridizing bands increases, the resolution of individual bands actually decreases as more and more merge into each other to eventually form a continuous smear.

In theory, this problem could be alleviated in two different ways. The first approach would be the same as that used for the non-ecotropic loci, which is to reduce the complexity of the Southern blot pattern with the use of oligonucleotide probes that detect small subsets of the whole family. The validity of this approach has been demonstrated for the IAP family of elements (Meitz and Kuff, 1992).

A second, very different approach is based on increasing resolving power, rather than decreasing complexity, by fractionating genomic DNA in two sequential dimensions. This can be accomplished as illustrated in Figure 8.5. DNA samples are first subjected to digestion with a restriction enzyme that cuts relatively infrequently (step 1 in the Figure) followed by fractionation on an agarose gel (step 2). At the completion of electrophoresis, each sample-containing gel lane is excised and incubated directly with a second restriction enzyme that cuts more frequently (step 3). Finally, the gel slice itself is used as the sample for a second round of electrophoresis in a direction perpendicular to the first round (step 4). At the completion of this second dimension run, the gel is blotted, hybridized to the high copy-number probe, and autoradiographed.

Separation of DNA fragments in two dimensions, rather than one, should theoretically provide a "power of two" increase in resolution, from approximately 20 bands to 400. In fact, over 130 restriction fragment "spots" have been observed in individual two-dimensional patterns obtained with probes for the IAP and VL30 families (Sheppard et al., 1991; Sheppard and Silver, 1993). In general, each spot represents a single retroviral-like locus of the type defined by the probe used for hybridization. The X coordinate of the spot measures the distance between flanking restriction sites produced in the first digestion. The Y coordinate provides a measure of the distance between the two closest restriction sites (of either type) that flank the locus after the double digestion.
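
The two coordinates of a spot follow directly from the positions of the restriction sites around the locus. The sketch below computes them for one locus, under simplifying assumptions that are not in the text: the locus is treated as a single point, site positions are given in bp along one chromosome, and coordinates are reported as fragment lengths rather than gel migration distances. All names and positions are hypothetical.

```python
import bisect

def flanking_distance(sites, locus):
    """Distance between the nearest site on each side of the locus."""
    ordered = sorted(sites)
    i = bisect.bisect_right(ordered, locus)
    if i == 0 or i == len(ordered):
        raise ValueError("locus is not flanked by sites on both sides")
    return ordered[i] - ordered[i - 1]

def spot_coordinates(first_enzyme_sites, second_enzyme_sites, locus):
    """Fragment lengths that determine the X and Y position of one spot."""
    x = flanking_distance(first_enzyme_sites, locus)
    # After the double digest, sites of either enzyme can bound the fragment.
    y = flanking_distance(list(first_enzyme_sites) + list(second_enzyme_sites), locus)
    return x, y

# Hypothetical site map (bp) around a locus at position 8,000.
rare_cutter = [0, 20_000]
frequent_cutter = [6_000, 9_000, 15_000]
print(spot_coordinates(rare_cutter, frequent_cutter, 8_000))

# Theoretical gain in resolution: about 20 bands per dimension.
print(20 ** 2)
```

Because the Y fragment is always contained within the X fragment, a polymorphism in either class of site can move a spot, and the second dimension separates loci that would co-migrate as a single band in one dimension.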

The main advantage of a two-dimensional mapping approach is that large numbers of loci from each animal can be mapped simultaneously. There are two main disadvantages to the general use of this approach for analyzing a large cross. First, only one animal can be analyzed within each gel. Second, from start to finish, each gel run can take five days to complete, and there is very little tolerance for mistakes of any kind throughout the protocol.

8.2.5 Restriction landmark genomic scanning

A significant variation on two-dimensional RFLP analysis has been developed by Hayashizaki and his colleagues (Hatada et al., 1991). With this novel protocol, restriction sites are scanned directly without the intervention of probes for specific loci. This can be accomplished through the direct end-labeling of a class of restriction sites that are generated by a rare-cutting enzyme followed by additional rounds of restriction digestion and gel separation. Briefly, the first restriction digestion is carried out with a rare-cutting enzyme like NotI which has an 8 bp recognition site that is present, on average, only once per megabase in the mouse genome (a first component of step 1 in Figure 8.5). Digestion with NotI will produce a total of only ~3,000 fragments from each haploid genome. Labeling of NotI sites is accomplished by filling in the single strand restriction site overhangs with radioactive nucleotides. Subsequently, the NotI fragments are reduced in size by digestion with a second enzyme having a 6 bp recognition site that produces fragments with an average size of 4-6 kb (a second component of step 1). Although the total number of restriction fragments per genome is increased 200-fold by this second digestion, only those fragments that have an original NotI site at one end will be labeled. This total mixture is fractionated by agarose gel electrophoresis (step 2) and then digested in situ with a third enzyme that has a 4 bp recognition site and thus cuts very frequently in the genome (step 3). The average size of restriction fragments has now been reduced to several hundred basepairs. The gel strip containing each sample is now placed on top of a polyacrylamide gel and a second orthogonal dimension of electrophoresis is carried out (step 4).
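
The fragment numbers quoted in this protocol can be checked with simple arithmetic. The sketch below assumes a haploid genome size of 3 × 10^9 bp and takes 5 kb as the midpoint of the 4-6 kb range; both values are assumptions made for illustration, chosen because they are consistent with the ~3,000 fragments and the 200-fold increase stated above.

```python
GENOME_BP = 3_000_000_000       # assumed haploid genome size
NOTI_SPACING_BP = 1_000_000     # one NotI site per megabase, as stated in the text
SECOND_ENZYME_MEAN_BP = 5_000   # assumed midpoint of the 4-6 kb range

def fragment_count(genome_bp, mean_fragment_bp):
    """Approximate number of fragments produced by a complete digest."""
    return genome_bp // mean_fragment_bp

noti_fragments = fragment_count(GENOME_BP, NOTI_SPACING_BP)
double_digest_fragments = fragment_count(GENOME_BP, SECOND_ENZYME_MEAN_BP)
fold_increase = double_digest_fragments / noti_fragments

# Each NotI site is labeled on both sides, so at most two labeled
# fragments per site survive the second digestion.
max_labeled_fragments = 2 * noti_fragments

print(noti_fragments, double_digest_fragments, fold_increase, max_labeled_fragments)
```

Only about one fragment in a hundred therefore carries label after the second digestion under these assumptions, which is what keeps the final pattern within the resolving power of a two-dimensional gel.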

This RFLP protocol differs from all those described previously in that the rare restriction sites are visualized directly without the use of probes that light up particular loci or locus families. Thus, the complete set of fragments that flank both sides of every NotI site in the genome of an individual will be displayed in the pattern that is obtained. The X coordinate of each labeled spot will be a measure of the distance between the first labeled NotI restriction site and the nearest neighbor second restriction site. The Y coordinate of each spot will be a measure of the distance between the first labeled restriction site and the nearest neighbor third restriction site. Polymorphisms can arise from changes that affect any of the three restriction sites that define each spot.

Since the rare restriction sites themselves are labeled, blotting and hybridization steps are eliminated and autoradiographs are obtained by direct exposure of gels to film. The elimination of two lengthy steps significantly reduces the overall time required to process each sample. In addition, without blotting and hybridization, spots are much more sharp and well delineated from each other. Resolution is also improved with the use of a polyacrylamide, rather than agarose, medium in the second dimension of separation. Hayashizaki, Hatada and colleagues have reported the detection of several thousand spots on two-dimensional gels derived from individual mice (Hatada et al., 1991). Analysis of the BXD set of RI strains with this protocol has allowed the mapping of 473 polymorphic loci.

The advantages in resolution notwithstanding, the restriction landmark genomic scanning (RLGS) protocol is still technically demanding and it still allows the processing of only one sample per gel. Like other multiplex whole genome scanning methods, it is actually of greatest utility for the initial development of whole genome maps of relatively uncharacterized species.