5.3 GENOMIC ELEMENTS, GENOME EVOLUTION, AND GENE FAMILIES


5.3.1 Classification of genomic elements

5.3.1.1 Functional and non-functional sequences

Sequences within the genome can be classified according to a number of criteria. The most important of these is functionality and the largest class of functional DNA elements consists of coding sequences within transcription units. Transcription units usually contain exons and introns, and are usually associated with flanking regulatory regions that are necessary for proper expression. For the most part, transcription units correspond one-to-one with Mendelian genes, and they usually function on behalf of the organism within which they lie. However, mammalian genomes also contain transcribable elements that do not benefit the organism and whose sole function appears to be self-propagation. Such sequences are referred to as selfish DNA or selfish genes and will be described at length in Section 5.4. Although these sequences may undergo transcription, they cannot be detected, in and of themselves, in terms of traditional Mendelian phenotypes. The functional class of DNA elements also includes a number of specialized sequences that play roles in chromosome structure and transmission. The best characterized structural elements are associated with the centromeres and telomeres.

Most of the genome appears to consist of DNA sequences that are entirely non-functional. This non-functional class includes pseudogenes, which derive from, and still share homology with, specific genes but are no longer transcribed or translated into a functional product. However, for the most part, non-functional DNA is present in the context of long lengths of apparently random sequence, located between genes and within their introns, with origins that have long since become indecipherable as a consequence of constant "genetic drift."

5.3.1.2 Single copy and repeated sequences

Both functional and non-functional sequences can be distinguished by a second criterion — copy number. Sequences in a genome that do not share homology with any other sequences in the same genome are considered unique or single copy. This single copy class contains both functional and non-functional elements. Sequences that do share homology with one or more other genomic regions are considered to be repeated or multicopy.

At one homology extreme, two sequences can show 100% identity to each other at the nucleotide level. At the other extreme, homology may be recognized only through the use of computer algorithms that show a level of identity between two sequences that is unlikely to have occurred by chance. In the case of many gene families, individual members are not identical — in fact, they are likely to have evolved different functions — yet a probe from one will cross-hybridize with sequences from the others. Cross-hybridization provides a powerful tool for the identification of multi-copy DNA elements by simple Southern blot analysis and for their characterization by library screening and cloning.

Homologies among more distantly related functional sequences that do not show cross-hybridization can sometimes be uncovered through the use of the polymerase chain reaction (PCR). The rationale behind this approach — which has been used successfully with a number of different gene families — is that specific short regions of related gene sequences may be under more intense selective pressure to remain relatively unchanged due to functional constraints on the encoded peptide regions. These highly conserved regions may not be long enough to allow cross-hybridization under blotting conditions, but the constrained peptide sequences that they encode can be used to devise two degenerate oligonucleotides for use as primers to identify additional members of the gene family through amplification from either genomic DNA or tissue-specific cDNA.
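The cost of using a peptide to design a degenerate primer can be made concrete by counting how many different nucleotide sequences could encode it. The following is a minimal sketch, assuming the standard genetic code; the function names and the example protein string are hypothetical and serve only as illustration. It multiplies the number of synonymous codons for each residue and then scans a protein for the window of a given width with the lowest degeneracy.

```python
# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Count the distinct nucleotide sequences that could encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(protein: str, width: int) -> tuple[str, int]:
    """Return the `width`-residue window with the fewest possible encodings."""
    windows = [protein[i:i + width] for i in range(len(protein) - width + 1)]
    best = min(windows, key=primer_degeneracy)
    return best, primer_degeneracy(best)


# Hypothetical conserved region, used only to exercise the functions.
example_protein = "LSRMWKDCFLA"
best_peptide, degeneracy = least_degenerate_window(example_protein, 6)
```

In this sketch, windows rich in methionine and tryptophan (one codon each) score best, while leucine, serine, and arginine (six codons each) inflate the size of the primer pool. A real design would also weigh primer length, melting temperature, and the position of the degenerate bases, none of which is modeled here.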

All sequences that are partially identical to each other — as recognized by hybridization, PCR, or sequence comparisons — are considered to be members of the same DNA element family. Families of functional elements are called gene families. Families of non-functional elements have been referred to simply as "repeat families" or "DNA element families". Multicopy DNA families — both functional and non-functional — can be further classified according to copy number, element size, and distribution within the genome. Related sequences can be found closely linked to each other in a cluster, they can be unlinked to each other and dispersed to different chromosomes, or they can have a combination of these two arrangements with multiple clusters dispersed to different sites.

From a distance, the genome appears to be a chaotic mixture of sequences from all of these classes thrown together without any structure or order, like craters, one overlapping the next, on the surface of the moon. However, on closer examination, it becomes possible to make sense of the genome, the relationship of different genomic elements to each other, and the mechanisms by which they have evolved as indicated for the hypothetical genomic region shown in Figure 5.4.

5.3.2 Forces that shape the genome

5.3.2.1 Genomic complexity increases by gene duplication and selection for new function

Mice, humans, the lowly intestinal bacterium E. coli, and all other forms of life evolved from the same common ancestor that was alive on this planet a few billion years ago. We know this is the case from the universal use of the same molecule, DNA, for the storage of genetic information, and from the nearly universal genetic code. But E. coli has a genome size of 4.2 Mb, while the mammalian genome is roughly 700-fold larger at ~3,000 Mb. If one assumes that our common ancestor had a genome size that was no larger than that of the modern-day E. coli, the obvious question one can ask is where did all of our extra DNA come from?

The answer is that our genome grew in size and evolved through a repeated process of duplication and divergence. Duplication events can occur essentially at random throughout the genome and the size of the duplication unit can vary from as little as a few nucleotides to large subchromosomal sections that are tens, or even hundreds, of megabases in length. When the duplicated segment contains one or more genes, either the original or duplicated copy of each is set free to accumulate mutations without harm to the organism since the other good copy with an original function will still be present.

Duplicated regions, like all other genetic novelties, must originate in the genome of a single individual and their initial survival in at least some animals in each subsequent generation of a population is, most often, a simple matter of chance. This is because the addition of one extra copy of most genes — to the two already present in a diploid genome — is usually tolerated without significant harm to the individual animal. In the terminology of population genetics, most duplicated units are essentially neutral (in terms of genetic selection) and thus, they are subject to genetic drift, inherited by some offspring but not others derived from parents that carry the duplication unit. By chance, most neutral genetic elements will succumb to extinction within a matter of generations. However, even when a duplicated region survives for a significant period of time, random mutations in what were once-functional genes will almost always lead to non-functionality. At this point, the gene becomes a pseudogene. Pseudogenes will be subject to continuous genetic drift with the accumulation of new mutations at a pace that is so predictable (~0.5% divergence per million years) as to be likened to a molecular clock. Eventually, nearly all pseudogene sequences will tend to drift past a boundary where it is no longer possible to identify the functional genes from which they derived. Continued drift will act to turn a once-functional sequence into a sequence of essentially random DNA.
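The molecular clock figure quoted above lends itself to a back-of-the-envelope calculation. The sketch below assumes that the ~0.5% per million years applies to the divergence observed between the two sequences being compared and that divergence accumulates linearly; it ignores repeated changes at the same site, so it would underestimate the age of highly diverged sequences. The function name is hypothetical.

```python
RATE_PERCENT_PER_MYR = 0.5  # rate quoted in the text; treated here as a constant


def divergence_time_myr(percent_divergence: float,
                        rate: float = RATE_PERCENT_PER_MYR) -> float:
    """Estimate elapsed time (million years) from observed percent divergence.

    Assumes a strictly linear clock with no correction for multiple hits.
    """
    if percent_divergence < 0:
        raise ValueError("percent divergence cannot be negative")
    return percent_divergence / rate


# A pseudogene that differs from its parent gene at 5% of sites.
estimated_age = divergence_time_myr(5.0)
```

Under these assumptions, a pseudogene that differs from its functional relative at 5% of nucleotide positions would have begun to drift about 10 million years ago.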

Miraculously, every so often, the accumulation of a set of random mutations in a spare copy of a gene can lead to the emergence of a new functional unit — or gene — that provides benefit and, as a consequence, selective advantage to the organism in which it resides. Usually, the new gene has a function that is related to the original gene function. However, it is often the case that the new gene will have a novel expression pattern — spatially, temporally, or both — which must result from alterations in cis-regulatory sequences that occur along with codon changes. A new function can emerge directly from a previously-functional gene or even from a pseudogene. In the latter case, a gene can go through a period of non-functionality during which there may be multiple alterations before the gene comes back to life. Molecular events of this class can play a role in "punctuated evolution" where, according to the fossil or phylogenetic record, an organism or evolutionary line appears to have taken a "quantum leap" forward to a new phenotypic state.

5.3.2.2 Duplication by transposition

With duplication acting as such an important force in evolution, it is critical to understand the mechanisms by which it occurs. These fall into two broad categories: (1) transposition is responsible for the dispersion of related sequences; (2) unequal crossing over is responsible for the generation of gene clusters. Transposition refers to a process in which one region of the genome relocates to a new chromosomal location. Transposition can occur either through the direct movement of original sequences from one site to another or through an RNA intermediate that leaves the original site intact. When the genomic region itself (rather than its proxy) has moved, the "duplication" of genetic material actually occurs in a subsequent generation after the transposed region has segregated into the same genome as the originally-positioned region from a non-deleted homolog. In theory, there is no upper limit to the size of a genomic region that can be duplicated in this way.

A much more common mode of transposition occurs by means of an intermediate RNA transcript that is reverse-transcribed into DNA and then inserted randomly into the genome. This process is referred to as retrotransposition. The size of the retrotransposition unit — called a retroposon — cannot be larger than the size of the intermediate RNA transcript. Retrotransposition has been exploited by various families of selfish genetic elements (described in Section 5.4), some of which have been copied into 100,000 or more locations dispersed throughout the genome with a self-encoded reverse transcriptase. However, examples of functional, intronless retroposons — such as Pgk2 and Pdha2 — have also been identified (Boer et al., 1987; Fitzgerald et al., 1993). In such cases, functionality is absolutely dependent upon novel regulatory elements either present at the site of insertion or created by subsequent mutations in these sequences.

5.3.2.3 Duplication by unequal crossing over

The second broad class of duplication events results from unequal crossing over. Normal crossing over, or recombination, can occur between equivalent sequences on homologous chromatids present in a synaptonemal complex that forms during the pachytene stage of meiosis in both male and female mammals. Unequal crossing over (also referred to as illegitimate recombination) refers to crossover events that occur between non-equivalent sequences. Unequal crossing over can be initiated by the presence of related sequences, such as highly repeated retroposon-dispersed selfish elements, located nearby in the genome (Figure 5.5). Although the event is unequal, in this case, it is still mediated by the homology that exists at the two non-equivalent sites.

So-called non-homologous unequal crossovers can also occur, although they are much rarer than homologous events. I say so-called because even these events may be dependent on at least a short stretch of sequence homology at the two sites at which the event is initiated. The initial duplication event that produces a two-gene cluster may be either homologous or non-homologous, but once two units of related sequence are present in tandem, further rounds of homologous unequal crossing over can be easily initiated between non-equivalent members of the pair as illustrated in Figure 5.5. Thus, it is easy to see how clusters can expand to contain three, four, and many more copies of an original DNA sequence.

In all cases, unequal crossing over between homologs results in two reciprocal chromosomal products: one will have a duplication of the region located between the two sites and the other will have a deletion that covers the same exact region (Figure 5.5). It is important to remember that, unlike retrotransposition, unequal crossing over operates on genomic regions without regard to functional boundaries. The size of the duplicated region can vary from a few basepairs to tens or even hundreds of kilobases and it can contain no genes, a portion of a gene, a few genes, or many.
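The reciprocal nature of the two products can be illustrated with a toy model. The sketch below represents each homolog as a simple list of labeled segments, which is an assumption made purely for illustration, and exchanges the distal portions at two breakpoints that are not equivalent. The function name and the segment labels are hypothetical.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments distal to two breakpoints.

    A breakpoint index n means the break falls just before element n.
    When break_a != break_b the exchange is unequal.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


# Two identical homologs, misaligned so that the break after "c" on one
# pairs with the break after "a" on the other.
homolog = ["a", "b", "c", "d", "e"]
duplication, deletion = unequal_crossover(homolog, homolog, 3, 1)
# duplication -> ['a', 'b', 'c', 'b', 'c', 'd', 'e']
# deletion    -> ['a', 'd', 'e']
```

The segment between the two breakpoints ("b" and "c") is present twice in one product and absent from the other, and the total amount of material across the two products is unchanged. The breakpoints in this sketch fall between whole segments only for convenience; as noted above, the real process pays no attention to functional boundaries.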

5.3.2.4 Genetic exchange between related DNA elements

There are many examples in the genome where genetic information appears to flow from one DNA element to other related — but non-allelic — elements located nearby or even on different chromosomes. In some special cases, the flow of information is so extreme as to allow all members of a gene family to co-evolve with near-identity as discussed in Section 5.3.3.3. In at least one case — that of the class I genes of the major histocompatibility complex (MHC or H2) — information flow is unidirectionally selected, going from a series of 25 to 38 non-functional pseudogenes into two or three functional genes (Geliebter and Nathenson, 1987). In this case, intergenic information transfer serves to increase dramatically the level of polymorphism that is present at the small number of functional gene members of this family.

Information flow between related DNA sequences occurs as a result of an alternative outcome from the same exact process that is responsible for unequal crossing over. This alternative outcome is known as intergenic gene conversion. Gene conversion was originally defined in yeast through the observation of altered ratios of segregation from individual loci that were followed in tetrad analyses. These observations were fully explained within the context of the Holliday model ³² of DNA recombination which states that homologous DNA duplexes first exchange single strands that hybridize to their complements and migrate for hundreds or thousands of bases. Resolution of this "Holliday intermediate" can lead with equal frequency to crossing over between flanking markers or back to the status quo without crossing over. In the latter case, a short single-strand stretch from the invading molecule will be left behind within the DNA that was invaded. If an invading strand carries nucleotides that differ at any site from the strand that was replaced, these will lead to the production of heteroduplexes with basepair mismatches. Mismatches can be repaired (in either direction) by specialized "repair enzymes" or they can remain as they are to produce non-identical daughter DNAs through the next round of replication.

By extrapolation, it is easy to see how the Holliday model can be applied to the case of an unequal crossover intermediate. With one resolution, unequal crossing over will result; with the alternative resolution, gene conversion can be initiated between non-allelic sequences. Remarkably, information transfer, presumably by means of gene conversion, can also occur between related DNA sequences that are distributed to different chromosomes.

5.3.3 Gene families and superfamilies

5.3.3.1 Origins and examples

Much of the functional DNA in the genome is organized within gene families and hierarchies of gene superfamilies. The superfamily term was coined to describe relationships of common ancestry that exist between and among two or more gene families, each of which contains more closely related members. As increasingly more genes are cloned, sequenced, and analyzed by computer, deeper and older relationships among superfamilies have unfolded. Complex relationships can be visualized within the context of branches upon branches in evolutionary trees. All of these superfamilies have evolved out of combinations of unequal crossover events that expanded the size of gene clusters and transposition events that acted to seed distant genomic regions with new genes or clusters.

A prototypical small-size gene superfamily is represented by the very well-studied globin genes illustrated in Figure 5.6. All functional members of this superfamily play a role in oxygen transport. The superfamily has three main families (or branches) represented by the beta-like genes, the alpha-like genes and the single myoglobin gene. The duplication and divergence of these three main branches occurred early during the evolution of vertebrates and, as such, all three are a common feature of all mammals. The products encoded by genes within two of these branches — alpha-globin and beta-globin — come together (with heme cofactors) to form a tetramer which is the functional hemoglobin protein that acts to transport oxygen through the blood stream. The product encoded by the third branch of this superfamily — myoglobin — acts to transport oxygen in muscle tissue.

The beta-like branch of this gene superfamily has duplicated by multiple unequal crossing over events and diverged into five functional genes and two beta-like pseudogenes that are all present in a single cluster on mouse chromosome 7 as shown in Figure 5.6 (Jahn et al., 1980). Each of the beta-like chains codes for a similar polypeptide, which has been selected for optimal functionality at a specific stage of mouse development: one functions during early embryogenesis, one during a later stage of embryogenesis, and two in the adult. The alpha-like branch has also expanded by unequal crossing over into a cluster of three genes — one functional during embryogenesis and two functional in the adult — on mouse chromosome 11 (Leder et al., 1981). The two adult alpha genes are virtually identical at the DNA sequence level, which is indicative of a very recent duplication event or concerted evolution (see Section 5.3.3.3).

In addition to the primary alpha-like cluster, there are two isolated alpha-like genes (now non-functional) that have transposed to dispersed locations on chromosomes 15 and 17 (Leder et al., 1981). When pseudogenes are found as single copies in isolation from their parental families, they are called "orphons." Interestingly, one of the alpha-globin orphons (Hba-ps3 on Chr 15) is intronless and would appear to have been derived through a retrotransposition event, whereas the other orphon (Hba-ps4 on Chr 17) contains introns and may have been derived by a direct DNA-mediated transposition. Finally, the single myoglobin gene on chromosome 15 does not have any close relatives either nearby or far away (Blanchetot et al., 1986; Drouet and Simon-Chazottes, 1993). Thus, the globin gene superfamily provides a view of the many different mechanisms that can be employed by the genome to evolve structural and functional complexity.

The Hox gene superfamily provides an alternative prototype for the expansion of gene number as illustrated in Figure 5.7. In this case, the earliest duplication events (which pre-date the divergence of vertebrates and insects) led to a cluster of related genes that encoded DNA-binding proteins used to encode spatial information in the developing embryo. The original gene cluster has been duplicated en masse and dispersed to a total of four chromosomal locations (on Chrs 2, 6, 11, and 15) each of which contains 9-11 genes (McGinnis and Krumlauf, 1992). Interestingly, because of the order in which the duplication events occurred — unequal crossing over to expand the cluster size first, transposition en masse second — an evolutionary tree would show that a single "gene family" within this superfamily is actually splayed out physically across all of the different gene clusters as shown in Figure 5.7. Some gene additions and subtractions within individual clusters have occurred by unequal crossing over since the en masse duplication so that differences in gene number and type can be seen within a basic framework of homology among the different whole clusters.

A final example of a gene super-superfamily is the very large set of genes that contain immunoglobulin-like (Ig) domains and function as cell surface or soluble receptors involved in immune function or other aspects of cell-cell interaction. This set includes the Ig gene families themselves, the major histocompatibility genes (called H2 in mice), the T cell receptor genes and many more (Hood et al., 1985). There are dispersed genes and gene families, small clusters, large clusters, and clusters within clusters, tandem and interspersed. Dispersion has occurred with the transposition of single genes that later formed clusters and with the dispersion of whole clusters en masse. Furthermore, the original Ig domain can occur as a single unit in some genes, but it has also been duplicated intragenically to produce gene products that contain two, three, or four domains linked together in a single polypeptide. The Ig superfamily, which contains hundreds (perhaps thousands) of genes, illustrates the manner in which the initial emergence of a versatile genetic element can be exploited by the forces of genomic evolution with a consequential enormous growth in genomic and organismal complexity.

5.3.3.2 Does gene order or localization matter?

Does the chromosome on which a gene lies matter to its function? Is gene clustering significant to function or is it simply a remnant of the fact that duplicated genes are most often generated by unequal crossover events? One can gain insight toward the answers to these questions by comparing the positions of homologous genetic information in different species: specifically mice and humans. Whole genome comparisons quickly demonstrate that the question of conservation to a particular chromosome only makes sense in the context of the X and Y. This is because every autosome from one species contains significant stretches of homology with two or more autosomes in the other species. Thus, the question of autosome conservation is meaningless. The X and Y chromosomes are a different story for three interrelated reasons. First, as a pair, they play a special role in sex determination. Second, they are the only chromosomes that can appear in a hemizygous state in normal genomes. Third, the X chromosome alone is subject to stable inactivation in all normal female mammals. With few exceptions, X-linked and Y-linked genes have remained in the same linkage groups throughout mammalian evolution as originally proposed by Ohno (1967), although various intra-chromosomal rearrangements have occurred (Bishop, 1992; Brown et al., 1992; Foote et al., 1992).

The second question asked at the head of this section can be re-stated as follows: do fine-structure genetic maps have functional significance? The answer is that in at least some cases, the integrity of genes within a clustered family is clearly important to function. This was first illustrated in the case of the beta-globin gene family with its five members arranged in a 70 kb array (Figure 5.6). Although beta-globin was used in the first transgenic experiments conducted in 1980 and many subsequent experiments, it was never possible for researchers to achieve full expression of the transgene at the same level as the endogenous gene. The problem was that all of the members of the endogenous gene family are dependent for expression on a locus control region (or LCR) that maps outside of the gene cluster and appears to play a role in "opening-up" the chromatin structure of the entire cluster in hematopoietic cells so that individual family members can then be regulated in different temporal modes (Talbot et al., 1989; Townes and Behringer, 1990). When transgene constructs are produced with the beta-globin LCR linked to the beta-globin structural gene, full endogenous levels of expression can be obtained (Grosveld et al., 1987). In recent years, evidence has accumulated for the role of LCRs in the global control of other gene clusters as well.

There is not only a requirement for some genes to remain in their ancestral cluster, but in some cases, the precise order of genes is conserved as well. Actual gene order has been observed to play roles in two different patterns of expression. Transgenic experiments indicate that for the beta-globin cluster, the temporal sequence of expression appears to be directly encoded (to a certain extent) in the order in which the genes occur (Hanscombe et al., 1991). In the Hox gene clusters, the order of genes correlates with the pattern of spatial expression along the anterior-posterior axis of the developing embryo (McGinnis and Krumlauf, 1992 and Figure 5.7).

There are also a few examples of genes and clusters that are unrelated by sequence, but which map together in a small chromosomal region and have a common arena of function. The best example of this phenomenon is the major histocompatibility complex which contains various gene families that diverged from a common immunoglobulin-like domain ancestor but also unrelated genes that play a role in antigen presentation and other aspects of immune function. This conjunction of immune genes has been conserved in all mammalian species that have been examined. Is this significant? Farr and Goodfellow (1992) quote Sydney Brenner in likening gene mappers to astronomers boldly mapping the heavens and conclude that "Seeking meaning in gene order may be the equivalent of astrophysics — or it might be astrology". I think it is safe to bet that sometimes it will be one and sometimes it will be the other. The problem will be to distinguish between the two.

5.3.3.3 Tandem families of identical genetic elements

A limited number of multi-copy gene families have evolved under a very special form of selective pressure that requires all members of the gene family to maintain essentially the same sequence. In these cases, the purpose of high copy number is not to effect different variations on a common theme, but rather to supply the cell with a sufficient amount of an identical product within a short period of time. The set of gene families with identical elements includes those that produce RNA components of the cell’s machinery within ribosomes and as transfer RNA. It also includes the histone genes which must rapidly produce sufficient levels of protein to coat the new copy of the whole genome that is replicated during the S phase of every cell cycle.

Each of these gene families is contained within one or more clusters of tandem repeats of identical elements. In each case, there is strong selective pressure to maintain the same sequence across all members of the gene family because all are used to produce the same product. In other words, optimal functioning of the cell requires that the products from any one individual gene are directly interchangeable in structure and function with the products from all other individual members of the same family. How is this accomplished? The problem is that once sequences are duplicated, their natural tendency is to drift apart over time. How does the genome counteract this natural tendency?

When ribosomal RNA genes and other gene families in this class were first compared both between and within species, a remarkable picture emerged: between species, there was clear evidence of genetic drift with rates of change that appeared to follow the molecular clock hypothesis described earlier. However, within a species, all sequences were essentially equivalent. Thus, it is not simply the case that mutational changes in these gene families are suppressed. Rather, there appears to be an on-going process of "concerted evolution" which allows changes in single genetic elements to spread across a complete set of genes in a particular family. So the question posed previously can now be narrowed down further: how does concerted evolution occur?

Concerted evolution appears to occur through two different processes (Dover, 1982; Arnheim, 1983). The first is based on the expansion and contraction of gene family size through sequential rounds of unequal crossing over between homologous sequences. Selection acts to maintain the absolute size of the gene family within a small range around an optimal mean. As the gene family becomes too large, the shorter of the unequal crossover products will be selected; as the family becomes too small, the longer products will be selected. This cyclic process will cause a continuous oscillation around a mean in size. However, each contraction will result in the loss of divergent genes, whereas each expansion will result in the indirect "replacement" of these lost genes with identical copies of other genes in the family. With unequal crossovers occurring at random positions throughout the cluster and with selection acting in favor of the least divergence among family members, this process can act to slow down dramatically the continuous process of genetic drift between family members.
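The homogenizing effect of repeated expansion and contraction can be explored with a small simulation. This is a deliberately simplified sketch, not a model taken from the cited work: each round either duplicates or deletes a single randomly chosen copy (an unequal exchange offset by one repeat), the only selection is a hard limit on array size, and no new mutations arise. All names and parameter values are hypothetical.

```python
import random


def unequal_exchange(array, rng, min_size, max_size):
    """One round of crossing over misaligned by a single repeat unit.

    Returns either the longer product (one copy duplicated in tandem) or
    the shorter product (that copy deleted). Size selection rejects any
    product outside [min_size, max_size]; requires min_size < max_size.
    """
    start = rng.randrange(len(array))
    longer = array[:start + 1] + array[start:]
    shorter = array[:start] + array[start + 1:]
    if len(shorter) < min_size:
        return longer
    if len(longer) > max_size:
        return shorter
    return rng.choice([longer, shorter])


def simulate(n_copies=10, rounds=2000, seed=0):
    """Start with n_copies distinct variants and apply repeated exchanges."""
    rng = random.Random(seed)
    array = list(range(n_copies))  # every copy begins as its own variant
    for _ in range(rounds):
        array = unequal_exchange(array, rng, n_copies - 2, n_copies + 2)
    return array


final_array = simulate()
variants_remaining = len(set(final_array))
```

Because a lost variant can never be regained in this sketch, the number of distinct variants can only stay the same or fall, so over many rounds the array tends toward copies descended from a single original element. Adding occasional mutation and selection against divergent copies, as described in the text, would be needed before drawing any quantitative conclusion.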

The second process responsible for concerted evolution is intergenic gene conversion between "non-allelic" family members. It is easy to see that different tandem elements of nearly identical sequence can take part in the formation of Holliday intermediates which can resolve into either unequal crossing over products or gene conversion between non-allelic sequences. Although the direction of information transfer from one gene copy to the next will be random in each case, selection will act upon this molecular process to ensure an increase in homogeneity among different gene family members. As discussed above, information transfer — presumably by means of gene conversion — can also occur across gene clusters that belong to the same family but are distributed to different chromosomes.

Thus, with unequal crossing over and intergenic gene conversion (which are actually two alternative outcomes of the same initial process) along with selection for homogeneity, all of the members of a gene family can be maintained with nearly the same DNA sequence. Nevertheless, concerted evolution will still lead to increasing divergence between whole gene families present in different species.

5.3.4 Centromeres and satellite DNA

In the early days of molecular biology, equilibrium sedimentation through CsCl gradients was used as a method to fractionate DNA according to buoyant density. Genomic DNA prepared from animal tissues according to standard protocols is naturally degraded by shear forces into fragments that are, on average, smaller than 100 kb. When a solution of genomic DNA fragments is subjected to high-speed centrifugation in CsCl, each fragment will move to a position of equivalent density in the CsCl concentration gradient that forms. DNA buoyant density is related to the molar ratio of G:C basepairs to A:T basepairs by a simple linear function. ³³ The greater the G:C content, the higher the density. When mouse DNA is subjected to CsCl fractionation, the bulk of the DNA (90%) is distributed within a narrow bell-shaped curve having an average density of 1.701 g/cm³, equivalent to a G:C content of 42%.

In addition to this "main band" of DNA, a second "satellite band" was observed with an average density of 1.690 g/cm³ equivalent to a G:C content of 31% (Kit, 1961). Approximately 5.5% of the total mouse genome is found within this band and the DNA within this fraction was given the name "satellite DNA" (Davisson and Roderick, 1989). It was not until 1970 that Pardue and Gall used their newly invented technique of in situ hybridization to demonstrate the localization of satellite DNA sequences to the centromeres of all mouse chromosomes except the Y (Pardue and Gall, 1970). Centromeres are highly specialized structural elements that function to segregate eukaryotic chromosomes during mitosis and meiosis (Rattner, 1991).
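The G:C contents quoted for the main band and the satellite band can be checked against the linear relation between buoyant density and base composition. The sketch below assumes the commonly quoted empirical form, density = 1.660 + 0.098 × (G:C fraction) in g/cm³; these coefficients are not stated in the text above and should be treated as an assumption. The function name is hypothetical.

```python
# Assumed empirical coefficients for DNA in cesium chloride (g/cm^3).
DENSITY_AT_ZERO_GC = 1.660
DENSITY_SLOPE = 0.098


def gc_percent_from_density(density: float) -> float:
    """Convert buoyant density (g/cm^3) to percent G:C under a linear model."""
    return (density - DENSITY_AT_ZERO_GC) / DENSITY_SLOPE * 100


main_band_gc = gc_percent_from_density(1.701)       # about 41.8
satellite_band_gc = gc_percent_from_density(1.690)  # about 30.6
```

With these assumed coefficients, the two densities round to the 42% and 31% G:C values given in the text.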

When DNA recovered from the satellite band was subjected to renaturation analysis, as described earlier in this chapter, the C₀t₁/₂ value obtained indicated a complexity of only ~200 nucleotides. This result showed that the satellite DNA fraction was composed of a simple sequence that was repeated over and over again, many times. Modern cloning and sequence analysis has demonstrated a basic repeating unit with a size of 234 bp (Hörz and Altenburger, 1981). ³⁴ One can calculate the copy number of this basic repeat unit by dividing the proportion of the genome devoted to satellite sequences (5.5% × 3 × 10⁹ bp = 1.65 × 10⁸ bp) by the repeat size (234 bp) to obtain approximately 700,000 copies. If these copies were distributed equally among all chromosomes, each centromere would contain approximately 35,000 copies having a total length of about 8 Mb.
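The copy-number arithmetic can be reproduced in a few lines. The sketch below uses the figures given in the text; the chromosome count of 20 per haploid set is an assumption, inferred from the ratio of total copies to copies per centromere, and the variable names are illustrative only.

```python
GENOME_SIZE_BP = 3e9        # approximate haploid genome size used in the text
SATELLITE_FRACTION = 0.055  # 5.5% of the genome
REPEAT_UNIT_BP = 234        # size of the basic repeating unit
CHROMOSOMES = 20            # assumed number of centromeres per haploid set

satellite_bp = SATELLITE_FRACTION * GENOME_SIZE_BP
total_copies = satellite_bp / REPEAT_UNIT_BP
copies_per_centromere = total_copies / CHROMOSOMES
megabases_per_centromere = copies_per_centromere * REPEAT_UNIT_BP / 1e6

print(f"satellite DNA:         {satellite_bp:.3g} bp")
print(f"total repeat copies:   {total_copies:,.0f}")
print(f"copies per centromere: {copies_per_centromere:,.0f}")
print(f"length per centromere: {megabases_per_centromere:.2f} Mb")
```

The unrounded result is slightly above 700,000 copies and a little over 8 Mb per centromere, consistent with the rounded values in the text.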

Although the original definition of "satellite" DNA was based on a density difference observed in CsCl gradients, the meaning of the term has expanded to describe all highly repeated simple sequences found in the centromeres of chromosomes from higher eukaryotes. In many species, satellite sequences do not have G:C contents that differ from that of the bulk DNA.

The M. musculus genome has a second family of satellite sequences present in only 50,000 to 100,000 copies (Davisson and Roderick, 1989). This "minor satellite" is also localized to the centromeres and appears to share a common ancestry with the major satellite. It is of interest that the relative proportion of the two satellites in M. spretus is the reverse of that found in M. musculus. The M. spretus genome has only 25,000 copies of the "major satellite" and 400,000 copies of the "minor satellite". This difference can be exploited to allow the determination of centimorgan distances between centromeres and linked loci in interspecific crosses as discussed in Section 9.1.2 (Matsuda and Chapman, 1991).

The satellite sequences in the distant Mus species M. caroli, M. cervicolor, and M. cookii (Figure 2.2) have diverged so far from the musculus sequences that cross-hybridization between the two is minimal. This qualitative difference can be exploited, once again, by in situ hybridization, to mark differentially cells from each species in interspecific chimeras (Rossant et al., 1983). A satellite DNA marker is useful for cell lineage studies because it is easy to detect by hybridization of tissue sections and it is present in all cells irrespective of gene activity or developmental state.

The term satellite has been incorporated as a suffix into a number of other terms (microsatellite, minisatellite, midisatellite, etc.) that are used to describe DNA sequences formed from basic units that have become amplified by multiple rounds of tandem duplication. Some of these sequence classes are described in Section 5.4.5 and Chapter 8 (8.2.3, 8.3.6).