APPENDIX B: COMPUTATIONAL TOOLS AND ELECTRONIC DATABASES


B1 ON-LINE ELECTRONIC DATABASES
B1.1 The World Wide Web
An ever-increasing amount of genetic information can be easily accessed and selectively retrieved over the Internet through the use of newly developed software protocols that break through the arcane language that was characteristic of Internet communications up until just a few years ago. It is now possible to access information over the Internet without knowing any computer commands and by barely touching the computer keyboard. The opening of the academic information superhighway was initiated with the development of a user interface called Gopher at the University of Minnesota. Gopher provided anyone sitting at an Internet-linked computer with the ability to "surf the net" by clicking through a series of menus to move from one Internet location to another, where it was often possible to search for and recover specific information. Although Gopher opened up the Internet to the masses, it is limited in the sense that it is essentially a text-based system (although it does allow one to download images as well). Those interested in the Gopher system of software should write to the Gopher development group at the following E-mail address: Gopher@boombox.micro.umn.edu.

By the summer of 1994, a new vehicle for Internet surfing had taken hold in the form of the World Wide Web, which is also known as WWW, W3, or simply the Web. WWW is a body of software and a set of protocols and conventions developed initially at CERN, a high-energy physics laboratory in Switzerland. WWW is a true multimedia system with a highly graphical interface containing embedded hypermedia. WWW allows the flow not only of text, but also of images, video animation, and sound over the Internet. Hypermedia refers to the fact that text and objects on the computer screen can be given life in the sense that pointing at them and clicking on a mouse will initiate software protocols that, among other possibilities, provide "gateways" that transport a user from one Internet site to another, which may be halfway around the world. In this way, hypermedia allows information to be organized as an interconnected web of associations rather than as a linear chain.

In order to use the Web to the fullest extent, it is important to understand the meaning of three critical terms: server, client, and uniform resource locator (or URL). Any computer site on the Internet that has opened itself up to "browsing" and information retrieval via WWW is called a server. A WWW server must run special server software to present its information in the form of hypermedia, but otherwise, there are few limitations on who may develop a server.

Surfing the Web is accomplished by linking into, and jumping between, WWW servers over the Internet. To begin WWW explorations, you must install special software on your computer known as client software (or a WWW client). Different types of client software have been developed for the Macintosh, IBM-compatible computers running Windows, and other computer platforms. The most popular client was developed by the National Center for Supercomputing Applications (NCSA) and is called Mosaic, with separate versions for different computer types. For more information, make contact through the following E-mail address: mosaic-mac@ncsa.uiuc.edu. It is also possible to download Mosaic software directly by anonymous file transfer protocol (anonymous FTP) at the NCSA FTP server: ftp ftp.ncsa.uiuc.edu. Several other types of WWW client software have been developed independently, including MacWeb for the Macintosh, and Cello and WinWeb for the Windows environment. For more information on MacWeb, write to the following E-mail address: macweb@einet.net. For more information on Cello, write to the following E-mail address: cellobug@fatty.law.cornell.edu. For more information on WinWeb, write to the following E-mail address: winweb@einet.net. To download either Cello or MacWeb client software directly by anonymous FTP, go to the following server: ftp.einet.net.

An understanding of the final critical term, URL, will allow you to find particular WWW servers of interest among the many thousands that exist on the Internet. The URL is a long file name that acts as an address for each WWW server as well as each packet of hypermedia available at that server. Once WWW client software is up and running on your computer, you can move to any WWW site by opening the File menu and choosing the "Open URL..." command. Then type the URL of interest into the box, hit return, and off you go. URLs of particular interest to mouse geneticists are described below. As this book is being written, WWW is still in its infancy and is expanding rapidly. It is a near-certainty that the data sources described below will be enriched and supplemented with new sources of information and modes for their retrieval. You should contact the sources of client software described above or your local computer advisor for up-to-date information on the most advanced system available at the time of this reading.
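To make the structure of a URL concrete, the following minimal Python sketch splits two of the URLs cited in this appendix into their parts: the protocol (scheme), the server name, an optional port number, and the path to a particular document on that server. The helper name describe_url is purely illustrative and is not part of any WWW client mentioned here.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the pieces a WWW client needs to locate a document."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # protocol used to talk to the server
        "host": parts.hostname,     # name of the WWW server
        "port": parts.port,         # None when the protocol default is used
        "path": parts.path or "/",  # document requested from that server
    }


if __name__ == "__main__":
    for url in (
        "http://www.informatics.jax.org/",
        "http://gc.bcm.tmc.edu:8088/bio/bio_home.shtml",
    ):
        print(describe_url(url))
```

The first URL names only a server, so the path is the server's top-level page; the second also carries an explicit port and the path of a specific hypermedia document.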

B1.2 Mouse Genetics

The central WWW server for mouse geneticists is located at the Jackson Laboratory and has the following URL: http://www.informatics.jax.org/. This server provides access to the Mouse Genome Database (MGD), which is a compendium of linked data sources that includes locus information, genetic mapping data, polymorphisms, probes, clones, PCR primers, citations, and homology data with other mammalian genomes. Included in this compendium is the online version of Margaret Green's "Catalog of mutant genes and polymorphic loci" (Green, 1989). The online version, called the Mouse Locus Catalog (or MLC), is updated regularly by staff members of the Jackson Laboratory led by Dr. M.T. Davisson and Dr. D.P. Doolittle. The entire database can be searched to retrieve a list of loci (with associated descriptions and citations) that contain a particular search word or combination of words anywhere within the locus record. For further information, the MGD staff can be reached at the following E-mail address: mgi-help@jax.org.

The Genome Center at the Whitehead Institute, MIT provides a highly specialized but very useful source of information through either an automated E-mail query and answer protocol or anonymous FTP. This Genome Center has characterized and mapped over 3,000 polymorphic mouse microsatellite markers (as of January, 1994) that are each defined by a unique pair of oligonucleotide primers. By filling out and transmitting a special E-mail query form, an investigator can retrieve an automatic response containing information about sets of microsatellite markers defined by various criteria, including chromosomal location and polymorphisms between particular strains. It is possible to retrieve primer sequences as well as a graphical representation of a chromosome map that shows all the microsatellites (in Macintosh PICT format suitable for incorporation into all Mac drawing programs). To receive a blank E-mail query form and instructions for its use, send an E-mail message with the single word help to genome_database@genome.wi.mit.edu.

Data can also be retrieved by anonymous FTP to genome.wi.mit.edu. For more information, contact Lincoln Stein at the Whitehead Institute (E-mail address: lstein@genome.wi.mit.edu).

B1.3 Other WWW servers of interest to mouse geneticists

The central WWW server for human genetic information is maintained by the Genome Database group at the Johns Hopkins University and has the following URL: http://www.gdb.org/. A compendium of interlinked databases is available at this site, where it is possible to recover information on human chromosomes, loci, maps, mutations, citations, cell lines, probes, polymorphisms, and other data items. Included in this compendium is the online version of Victor McKusick's Mendelian Inheritance in Man (called OMIM), which contains comprehensive textual information on all defined human loci. For further information, the GDB staff can be reached at the following E-mail address: Help@gdb.org, or the following address: GDB User Support, Genome Data Base, Johns Hopkins University, 2024 E. Monument Street, Baltimore, MD 21205-2100.

Another critical WWW server is maintained by the National Center for Biotechnology Information (NCBI), which is the location of the GenBank database. The URL for this server is: https://www.ncbi.nlm.nih.gov/. GenBank searches can be conducted through this linkup. For further information, contact NCBI at the following E-mail address: info@ncbi.nlm.nih.gov.

There are numerous other databases that may be of interest to mouse geneticists including servers devoted to other experimental genetic systems. Many of the WWW servers described above provide gateways to these other servers. A very useful set of gateways to WWW servers of particular interest to biologists is called the Biologist's Control Panel and can be reached at the following URL: http://gc.bcm.tmc.edu:8088/bio/bio_home.shtml.

B2 MOUSE GENETICS COMMUNITY BULLETIN BOARD

An electronic bulletin board (called MGI-LIST) has been set up at the Jackson Laboratory with E-mail addresses of mouse geneticists throughout the world. The Mouse Genome Informatics Group at the Jackson Laboratory will use this bulletin board to make announcements of new software related to the Mouse Genome Database. In addition, all subscribers will be able to broadcast to, and receive messages of interest from, all other linked-up members of the community. If you would like to subscribe to this bulletin board, send a message that reads "subscribe mgi-list <your name>" to lyris@lists.informatics.jax.dundee.net. For further information or help, send an E-mail message to mgi-help@jax.org.

B3 COMPUTER PROGRAMS

B3.1 Linkage analysis

Numerous computer programs are available for the determination of linkage relationships among loci typed in large numbers of individuals. Most of these were developed for the analysis of complex human pedigrees with algorithms that incorporate maximum likelihood estimation statistics (Elston and Stewart, 1971) and provide LOD score (Morton, 1955) output. Although these pedigree-based programs can be used for the analysis of mouse linkage data, they are usually less than optimal for this task. First, it is often not possible to derive unequivocal haplotype information for all individuals studied in a human pedigree, and as such, pedigree-based programs are not oriented toward haplotype analysis, which is so common in mouse linkage studies (see Figure 9.15). Second, for most studies, mouse geneticists will not need to avail themselves of the sophisticated maximum likelihood/LOD score estimation tools required for human pedigree analysis. Two linkage programs developed specifically for mouse linkage studies, Map Manager and GENE-LINK, are described below. In addition, the most useful multi-locus pedigree-based program, Mapmaker, is also described. Other pedigree-based linkage analysis packages are listed and reviewed by Bryant (1991).
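For readers unfamiliar with the LOD score, the following Python sketch shows the simplest two-locus case. It is a minimal illustration that assumes every meiosis is fully informative and phase-known, as in a well-designed backcross; it is not the algorithm used by any of the programs named in this appendix, and the function names are hypothetical. The LOD score is the base-10 logarithm of the likelihood of the data at a recombination fraction theta divided by the likelihood under free recombination (theta = 0.5).

```python
import math


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """Return log10 of L(theta) / L(0.5) for phase-known meioses."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = total - recombinants
    if theta == 0.0 and recombinants > 0:
        # Complete linkage cannot explain any observed recombinant.
        return float("-inf")
    log_likelihood = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    # Under free recombination every meiosis has probability 0.5.
    return log_likelihood - total * math.log10(0.5)


def best_lod(recombinants: int, total: int) -> tuple:
    """Return the maximum likelihood theta (capped at 0.5) and its LOD score."""
    if total <= 0:
        raise ValueError("total must be positive")
    theta_hat = min(recombinants / total, 0.5)
    return theta_hat, lod_score(recombinants, total, theta_hat)


if __name__ == "__main__":
    theta_hat, lod = best_lod(recombinants=5, total=100)
    print(f"theta = {theta_hat:.3f}, LOD = {lod:.2f}")
```

With 5 recombinants among 100 backcross animals, the estimated recombination fraction is 0.05 and the LOD score is about 21.5, which is why simple counting is usually sufficient for mouse crosses of this kind.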

Map Manager is a Macintosh program that can be used for the analysis of linkage data obtained from a backcross, F₁ × F₁ intercross, or from a group of recombinant inbred strains (Manly, 1993). It was written by Dr. Kenneth F. Manly and is made available without charge from the author. Map Manager uses a standard Macintosh format and is both extremely versatile and extremely easy to use. It allows easy storage, display, and retrieval of information from mapping experiments. The strain distribution patterns (SDPs) obtained with newly typed loci can be evaluated rapidly to determine likely map positions relative to other loci in the database. The program is distributed with a large database of previously published RI strain distribution patterns, which greatly facilitates the RI analysis of new loci. Map Manager has many other sophisticated features including a statistical evaluation package and various output options, as well as a means for importing and exporting data to and from spreadsheets or Mapmaker files (see below). It can be obtained from the author on disk or by anonymous FTP or Gopher from several sites including (mcbio.med.buffalo.edu), (hobbes.jax.org), (ftp.bio.indiana.edu), and (ftp.embl-heidelberg.de). For further information, Dr. Manly can be contacted at the E-mail address Kmanly@mcbio.med.buffalo.edu or at Roswell Park Cancer Institute, Buffalo, New York 14263.
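The idea of evaluating a new strain distribution pattern against a database of known patterns can be sketched in a few lines of Python. This is only an illustration of the general principle (loci with few discordant strains are candidates for linkage), not a description of Map Manager's actual method. The locus names, the SDP strings, and the use of "-" for an untyped strain are all assumptions made for the example.

```python
def discordance(sdp_a: str, sdp_b: str) -> tuple:
    """Count strains whose alleles differ between two SDPs.

    Each SDP is a string with one letter per RI strain, in the same strain
    order; "-" marks an untyped strain and is skipped.
    Returns (number of discordant strains, number of strains compared).
    """
    if len(sdp_a) != len(sdp_b):
        raise ValueError("SDPs must cover the same set of strains")
    typed = [(a, b) for a, b in zip(sdp_a, sdp_b) if a != "-" and b != "-"]
    mismatches = sum(1 for a, b in typed if a != b)
    return mismatches, len(typed)


def rank_candidates(new_sdp: str, known: dict) -> list:
    """Order known loci from most to least concordant with the new SDP."""
    scored = []
    for name, sdp in known.items():
        mismatches, compared = discordance(new_sdp, sdp)
        fraction = mismatches / compared if compared else 1.0
        scored.append((fraction, name, mismatches, compared))
    scored.sort()
    return [(name, mismatches, compared) for _, name, mismatches, compared in scored]


if __name__ == "__main__":
    # Hypothetical data: B and D denote the two progenitor strain alleles.
    known_loci = {"LocusA": "BBDDBDBB", "LocusB": "DBDBBDDB"}
    for name, mismatches, compared in rank_candidates("BBDDBDBD", known_loci):
        print(f"{name}: {mismatches} discordant of {compared} strains")
```

In this made-up example the new locus differs from LocusA in one of eight strains and from LocusB in four of eight, so LocusA would be examined first as a possible neighbor.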

GENE-LINK is a DOS program that can be used for the analysis of linkage data from a backcross (Montagutelli, 1990). It was written by Dr. Xavier Montagutelli and has been made available to investigators without cost. It will provide best-fit map positions for entered strain distribution patterns. For further information, Dr. Montagutelli can be contacted at Institut Pasteur de Paris, 25 Rue du Docteur Roux, 75724 Paris, Cedex 15, France (gensoft@pasteur.fr).

Mapmaker and Mapmaker/QTL are a pair of pedigree-based programs written by Dr. Eric Lander and his colleagues for constructing linkage maps from raw genotyping and phenotyping data recovered from large numbers of loci (Lander et al., 1987). These programs use a highly efficient algorithm for "likelihood of linkage" computations. They can be run on UNIX-based workstations or VAX minicomputers running under the VMS operating system. The programs and a manual are available from the author for licensing to academic researchers. Mapmaker is most useful to mouse geneticists for the analysis of linkage data obtained from an F₁ × F₁ intercross. Mapmaker/QTL can be used for the analysis of quantitative traits. For further information, contact Dr. Eric Lander (lander@genome.wi.mit.edu), Whitehead Institute, 9 Cambridge Center, Cambridge, MA 02142.

B3.2 Mouse colony management software

Animal House Manager (AMAN) and MacAMAN are software packages for IBM-compatibles and Macintosh computers that can be used for keeping track of a breeding mouse colony with records for animals, litters, cages, and samples derived from them. These programs are described more fully in Chapter 3 (Silver, 1986; Silver, 1993b). They were written by the author of this book and can be licensed for use from Princeton University. For further information, the author can be reached at his E-mail address: Lsilver@Molbiol.Princeton.Edu, or at the Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014.
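The kinds of linked records involved in colony management (animals, litters, cages, and samples) can be pictured with a small Python sketch. The record layout, field names, and identifiers below are hypothetical and chosen only for illustration; they do not reflect how AMAN or MacAMAN store their data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Animal:
    animal_id: str
    sex: str                          # "F" or "M"
    cage_id: str
    litter_id: Optional[str] = None   # None for founders with no recorded litter
    samples: List[str] = field(default_factory=list)


@dataclass
class Litter:
    litter_id: str
    dam_id: str
    sire_id: str
    pup_ids: List[str] = field(default_factory=list)


@dataclass
class Colony:
    animals: Dict[str, Animal] = field(default_factory=dict)
    litters: Dict[str, Litter] = field(default_factory=dict)

    def add_animal(self, animal: Animal) -> None:
        if animal.animal_id in self.animals:
            raise ValueError(f"duplicate animal id: {animal.animal_id}")
        self.animals[animal.animal_id] = animal

    def record_litter(self, litter_id: str, dam_id: str, sire_id: str,
                      pups: List[Animal]) -> Litter:
        """Register a litter and link every pup back to it."""
        for parent_id in (dam_id, sire_id):
            if parent_id not in self.animals:
                raise KeyError(f"unknown parent: {parent_id}")
        litter = Litter(litter_id, dam_id, sire_id)
        for pup in pups:
            pup.litter_id = litter_id
            self.add_animal(pup)
            litter.pup_ids.append(pup.animal_id)
        self.litters[litter_id] = litter
        return litter

    def parents_of(self, animal_id: str) -> Optional[Tuple[str, str]]:
        """Return (dam, sire) for an animal, or None for a founder."""
        litter_id = self.animals[animal_id].litter_id
        if litter_id is None:
            return None
        litter = self.litters[litter_id]
        return litter.dam_id, litter.sire_id

    def cage_occupants(self, cage_id: str) -> List[str]:
        return sorted(a.animal_id for a in self.animals.values()
                      if a.cage_id == cage_id)
```

Linking each pup to its litter, and each litter to its dam and sire, is what allows a pedigree to be traced back from any animal or from any sample taken from it.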
