If you are on the QTL Quick Tour Click Here when you are done with this page!
Key to Interpreting the QTL Data Model Diagram:
Data Editor Form Layouts
This is a description of a conceptual model for representing QTL data using a relational data model format. I have structured the information in the form of an entity relation diagram to help in the mapping of the QTL data types onto the existing MGD schema.
(Note that some of the links are probably not appropriate if taken literally. For example, I use jnum as the link to the reference in MGD. This is a conceptual link. It means that there should be a link to the MGD bibliographic tables, not that the link should be explicitly made using the J#. )
The model was designed to support basic queries such as:
***"How many QTLs are associated with Resistance/Susceptibility to
viral infection?"
***"How many QTLs have been mapped to the region of chromosome 5
between markers D5Mityy and D5Mitxx?"
***"What genes have been mapped in the vicinity of QTLs associated with
diabetes?"
Although a bit tricky, the data model can support some intra-specific comparisons.....(the tricky part is that QTL locations are often highly uncertain)
***"Among all of the studies of alcohol avidity, are any QTLs found in all or most of the studies?"What this model does not do well is provide direct support for more complex queries that involve inter-specific comparisons
***"Are the QTL regions in mouse associated with bronchial hyperresponsiveness in regions of synteny with the human genome? If so, what genes mapped in humans in the syntenic region also have been mapped in mouse?"
Controlled vocabularies referenced throughout are accessible through my
web page: http://www/~cjb. There also are several QTL reports based on
my attempts to code QTL papers there. Sometimes looking at those
reports can help clarify what I mean by certain fields in the
conceptual model.
Carol Bult
April 1997
To browse examples of coded QTL-related papers, Click Here and look at the "Real World QTL Data: QTL Reports".
#######################
Phenotype_Class represents a controlled vocabulary of broad categories into which phenotypes can be binned. The purpose of PhenotypeClass is to provide a phenotypically-based entrance point into the QTL data that is general enough to capture most of the studies of interest to the user and specific enough to be useful for grouping studies together.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| pc_id | unique identifier for Phenotype Class | numeric | NULL |
| p_id | unique identifier for Phenotype | numeric | NULL |
| pc_name | name of Phenotype_Class | varchar | controlled vocab |
Form
Top
###################
Phenotype is the name of the complex trait described in a manuscript. This is not meant to be a controlled vocabulary. The data editor will need to paraphrase the phenotype name from the title or abstract of the paper.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| p_id | unique identifier for Phenotype | numeric | NULL |
| jnum | conceptual link to MGD BIB | numeric | NULL |
| m_id | unique identifier for PhenotypeMode | numeric | NULL |
| pc_id | unique identifier for PhenotypeClass | numeric | NULL |
| pheno_name | name of the Phenotype | varchar | Multigenic obesity |
| p_notes | comment field | varchar | For notes on treatment dosages and how traits were measured |
Form
Top
##############
Trait means the specific measured or observed attribute that is used to
define the phenotype. For example, multigenic obesity is defined or
measured by more than a dozen traits in one published study (see
J24238). Therefore, each phenotype may have
multiple traits. The traits used to measure a phenotype in one study
are not necessarily the traits used to measure the "same" phenotype in
another. Obesity is a good example again. One group may measure obesity
as a suite of 16 traits. Another group may simply use body weight or
percent body fat.
Trait evaluation is a controlled vocabulary to indicate how a trait was measured. If the raw or transformed scores for a trait are used directly it is a quantitative evaluation, if the raw scores are then lumped into a discrete set of categories (more than two) then the evaluation is semi-quantitative, if the trait is measured by binning into two categories (for example, "dead" and "not dead") then the evaluation is qualitative.
Heretibility is a measure of how much of the total variation in a trait is due to a genetic component. It will be rare for this information to be provided in a manuscript.
I use a linking table to connect phenotype and trait in this model so that the entire Phenotype record doesn't have to be duplicated for each trait.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| t_id | unique identifier for Trait | numeric | NULL |
| trait_name | name of the Trait | varchar | NULL |
| trait_eval | trait evaluation method | varchar | controlled vocab |
| heret | Heretibility | varchar | the estimate of the heretibility of a trait; it will be rare for this value to be reported |
Form
Top
#############
Mode describes whether the Phenotype is Induced, Spontaneous, or Hereditary. These divisions come from a review paper on QTL methodology written by Wayne Frankel (1995, TIG 11(12):471-477).
| Column | Comments | Element Type | Example |
|---|---|---|---|
| m_id | unique identifier for Mode | numeric | NULL |
| p_id | unique identifier for Phenotype | numeric | NULL |
| mode | mode name | varchar | controlled vocab |
| mode_notes | comment field | varchar | primarily for description of methods for "induced" traits |
Form
Top
################
Keyword is a type of synonymy table. I envision it as a means to link together studies that are about esentially the same complex phenotype but where the phenotypes are either slightly different or simply described differently. For example, a researcher that asks for all studies having to do with asthma would like to get back phenotypes like "bronchial hyperresponsiveness" in addition to those that have "asthma" explicitly in the complex trait name.
Because a Phenotype can have many Keywords I employ a linking table to join these two tables.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| kw_id | unique identifier for Keyword | numeric | NULL |
| keyword | the synonym or keyword for the Phenotype | varchar | see text description above |
Form
Top
###############
The Strain table is for linking information about the strain of the mice to the Phenotype. This is a skeleton table because the development of the vocabularies for these data types are under development by others in the MGI group.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| s_id | unique identifier for Strain | numeric | NULL |
| strain_name | name of the strain of mouse | varchar | BALB/c |
| s_notes | comment field | varchar | primarily for describing the phenotype of the strain |
Form
Top
Because multiple strains/progenitors can be associated with the genetic dissection of a Phenotype I used a linking table to connect the Strain and Phenotype tables.
############
Table: BIB
############
Top
#################
QTL Notes is for comments about a QTL paper that can't be formally coded into the data model structure. Often times these are points raised by the authors of a paper in the discussion.
Some examples of the kinds of data that are important to record, but which may not fit into any of the currently defined categories are:
| Column | Comments | Element Type | Example |
|---|---|---|---|
| jnum | conceptual link to BIB | numeric | NULL |
| qtl_var | comments | numeric | % phenotypic/genetic variation accounted for by all significant QTLs |
| qtl_notes | comments | varchar | NULL |
Form
Top
#######################
QTL Experiment is the initial link to the heart of a QTL paper. Each manuscript (i.e., jnum) can have multiple QTL Experiments associated with it. However there is a one to one relationship between QTL Experiment and the Progeny population used for the experiment.
So, if a manuscript describes the results for an F2 population from cross X and and F2 population from cross Y. Those would be two separate QTL Experiments associated with the same jnum.
To see an example of a single study with more than one QTL experiment, Click Here
| Column | Comments | Element Type | Example |
|---|---|---|---|
| qe_id | unique identifier for QTL_Experiment | numeric | NULL |
| prog_id | unique identifier for Progeny population | numeric | NULL |
| jnum | conceptual link to BIB | numeric | NULL |
| qe_name | name for QTL Experiment | varchar | QTL Experiment #1, QTL Experiment #2, etc. |
Top
################
The Progeny table is for information about the progeny population upon which a QTL experiment is based. There can be multiple Mapping Crosses associated with a single Progeny. But there can be only one QTL_Experiment linked to a Progeny.
The # of strains and # of animals fields are meant for the animals used in the mapping component of a QTL experiment, not for the total number of animals evaluated for the trait or traits used to measure a complex phenotype. Varchar element types are indicated for these fields because often the information is reported best in text form. For example, the # of animals is often described as "4 animals per strain" which conveys more information to the user than just reporting the total number.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| prog_id | unique identifier for Progeny | numeric | NULL |
| qe_id | unique identifier for QTL_Experiment | numeric | NULL |
| mc_id | unique identifier for Mapping Cross | numeric | NULL |
| prog_type | name of the progeny type | varchar | F2, backcross, N1, name of RI or RC set |
| *prog_name | name of the progeny population | varchar | B6D2F2 |
| # animals | no. of animals used for QTL mapping | varchar | NULL |
| # strains | no. of strains used (when RI or RC sets are used) for QTL mapping | varchar | NULL |
| prog_com | comment field | varchar | includes information such as "males only evaluated", or "equal numbers of males and females evaluated" |
Form
Top *sometimes a shorthand name is assigned to a specific progeny population, especially in cases where there is more than one progeny population described in a manuscript. The prog_name field is meant to handle these kinds of non-standard names applied to progeny populations.
Examples of: #####################
The Mapping Cross table provides links to the kinds of crosses used to generate the Progeny population. There can be multiple Mapping Crosses associated with a single progeny population. This generally is not the case, but the possibility exists. Representation of RI and RC sets is quite simple since controlled vocabs for those sets are in MGD already!
| Column | Comments | Element Type | Example |
|---|---|---|---|
| mc_id | unique identifier for MappingCross | numeric | NULL |
| prog_id | unique identifier for Progeny Population | numeric | NULL |
| fem_par | female parent | varchar | NULL |
| mal_par | male parent | varchar | NULL |
| cross_type | type of cross | varchar | controlled vocab |
| mc_notes | comment field | varchar | NULL |
Form
Top
(I didn't use a linking table is this case because it doesn't seem to be a very common phenomenon to have multiple mapping crosses associated with a single progeny population.)
##################
Marker_Map is for representing data about how the marker map for the QTL analysis was constructed. There can be multiple marker types used for a single progeny population. For example, a microsatellite map could be combined with a few specific RFLP markers and some observable phenotype markers. A consequence is that there can be multiple marker map "methods" associated with a progeny used in a QTL analysis.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| mm_id | unique identifier for Marker_Map | numeric | NULL |
| prog_id | unique identifier for Progeny population | numeric | NULL |
| coverage | the coverage will be relative to chromosome | varchar | controlled vocab |
| map_method | the lab method used for genotyping | varchar | controlled vocab |
| marker_num | the total number of polymorphic markers in the map | int | NULL |
| marker_interval | a text field for describing the average intermarker distance | varchar | NULL |
| map_notes | comment field | varchar | NULL |
Form
Top
Although there can be multiple marker map methods associated with a progeny, the majority of studies use a single method, microsatellites. Therefore, I have not used a linking table to connect the Marker_Map and Progeny tables.
######################
QTL_Detection is for representing the methods used to determine statistically significant associations between traits and markers. (The actual data are represented in the QTL_Marker table.) As with marker map methods, there can be multiple QTL detection methods used on a single progeny.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| qtld_id | unique identifier for QTL_Detection | numeric | NULL |
| prog_id | unique identifier for Progeny Population | numeric | NULL |
| t_id | unique identifier for Trait | numeric | NULL |
| qtl_id | unique identifier for Locus | numeric | NULL |
| qdetect_soft | name of the software used for the QTL detection method | varchar | controlled vocab |
| software_vers | version of the software used for QTL detection | float | use NULL if not reported in manuscript |
| qdetect_meth | the statistical method used for detecting QTLs | varchar | controlled vocab |
| threshold | the statistical cutoff used to determine significance of association | varchar | "LOD > 3.0", "P < 0.001" |
| detect_notes | comment field | varchar | NULL |
Form
Top Most of the time, a single QTL Detection method is used in a QTL Mapping study. Therefore, even though the possibility exists that there can be many QTL Detection methods associated with a Progeny I have not included a linking table between QTL_Detection and Progeny.
#######################
QTL_Estimation is for representing the methods used to estimate the location of QTLs.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| qtle_id | unique identifier for QTL_Estimation | numeric | NULL |
| prog_id | unique identifier for Progeny Population | numeric | NULL |
| qtl_id | unique identifier for Locus | numeric | NULL |
| t_id | unique identifier for Trait | numeric | NULL |
| qestimate_soft | name of the software used for the QTL estimation method | varchar | controlled vocab |
| software_vers | version of the software used for QTL estimation | float | use NULL if not reported in manuscript |
| qestimate_meth | the method used for estimating QTLs | varchar | controlled vocab |
| model_notes | comment field | varchar | used to describe the QTL estimation model assumptions |
| estimate_notes | comment field | varchar | used to describe aspects of the QTL estimation method that don't fit into any other field in the table |
Form
Top
##############
The Locus table is for providing basic information about the putative QTL. The trait id is in this table because the locus needs to be associated with a specific trait name when there are multiple traits described or measured for a phenotype.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| qtl_id | unique identifier for Locus | numeric | NULL |
| qm_id | unique identifier for markers most tightly associated with the QTL | numeric | NULL |
| t_id | unique identifier for Trait | numeric | NULL |
| loc_id | unique identifier for QTL Location | numeric | NULL |
| **qtld_id | unique identifier for QTL Detection | numeric | NULL |
| qtl_symbol | QTL symbol if assigned; unnamed if not assigned | varchar | curated |
| % pheno_var | maximum percent of phenotypic variation accounted for in the trait by the QTL | int | NULL |
| % gene_var | maximum percent of genetic variation accounted for in the trait by the QTL | int | NULL |
| qtl_effects | allele effects of a qtl | varchar | controlled vocab |
| cstrain | strain that contributes the allele described by the qtl_effects field | varchar | strain name |
| epi_effects | epistatic effects, if noted | varchar | Yes, No, NULL (not reported) |
| epi_loc | loci involved in epistatic interaction | varchar | NULL |
| epi_pval | probability value used for detecting epistatic interaction | varchar | NULL |
| pleio_effects | pleiotropic effects noted | varchar | Yes, No,NULL (if not reported) |
| pleio_com | description of pleiotropic effects | varchar | NULL |
| locus_notes | comment field | varchar | NULL |
Form
Top * ultimately this will be some kind of controlled vocabulary but for now it is a text field because the descriptions of the modes of gene action vary greatly from manuscript to manuscript. If i can create a robust controlled vocab it will likely be in 3 to 4 separate fields.
** this link is in here because the actual qtldetect_meth will need to be reported for Locus in those cases where more than one method is described in the paper.
To see an example of QT Locus data, Click Here
################
QTL_Marker is for representing the association scores between traits and map markers. In other words, the table is for storing the results of a QTL_Detection method. Links to the marker symbol, name, and chromosome location can probably be accessed through the existing MRK_Marker table in MGD.
Two fields for the probability value are provided for when a range of probabilities are reported instead of a single value. The same is true for the association score. In some cases the association scores are listed as a range.
While the QTL data model structure allows for any number of QTL_Markers to be described it is not practical to record every linkage reported in a manuscript. The suggested strategy is to record up to three of the most strongly associated (as measured by the probability values) markers with a particular QTL-trait combination.
The other table in this data model where marker-trait linkages are reported is the QTL_Location table. See the table description for more details.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| qm_id | unique identifier for QTL_Marker | numeric | NULL |
| qtld_id | unique identifier for QTL Detection | numeric | NULL |
| t_id | unique identifier for Trait | numeric | NULL |
| qtl_id | unique identifier for Locus | numeric | NULL |
| marker_key | conceptual link to MGD MRK_Marker Table | varchar | link to information about marker symbol, name and chromosome location |
| assoc_score1 | association score between marker and trait | float | NULL |
| assoc_score2 | association score between marker and trait | float | NULL |
| Pval1 | significance score | float | NULL |
| Pval2 | significance score | float | NULL |
| mtl_com | comment field | varchar | NULL |
Form
Top There usually will be many significant linkages to report as the outcome of a QTL_Detection event. The QTL_Marker and QTL_Detection tables are joined through a linking table.
#####################
QTL Location is for representing the predicted location for a QTL. This is a problem table because locality information is often very vague and the boundries of certainty are not clearly defined. For example, the flanking marker information is often not provided clearly so many of the locational data are purely text based descriptions.
Some Guidelines for Describing QTL Location:
Ideally there should be a way to translate the location information encoded in this table into a "minimap" that MGD currently uses as a graphical report on locus position. QTLs might require a slightly different graphical representation to accomodate the location uncertainty. Maybe some kind of shaded rectangle to show approximate boundries. Of course this is only possible where the data supports the location inference. I'm not sure how useful it would be to provide a graphic with the distal or proximal part of a chromosome entirely shaded.
Another idea for down the road is to have authors submit the graphical output for QTL estimation from MapMaker in a gif format. I bet it would be easy to pop these things up on the Web. (This was actually Wayne and Ben's idea. They thought we might be able to scan the MapMaker images in from the manuscript. But I thought that might be a nightmare of copyright entanglements, etc. However, if the journal editors knew that the authors were going to submit the figures to MGD from the time the manuscript was submitted for consideration of publication it might not be such a big deal.) The image_id placeholder in the proposed table structure is for remembering this idea.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| loc_id | unique identifier for Location | numeric | NULL |
| qtl_id | unique identifier for Locus | numeric | NULL |
| image_id | unique identifer for MapMaker image | graphic | gif file |
| support_interval | support interval used to define boundries of certainty for QTL location (not to be confused with a confidence interval!) | varchar | "1lod", "95%" |
| rep_lod_prox | reported name of proximal flanking marker for a support interval | varchar | name of marker reported in manuscript |
| replodprox_marker_key | conceptual link to MARKER Table in MGD | numeric | To get name, current symbol, location info for the reported marker |
| rep_lod_dist | reported name of distal flanking marker for a support interval | varchar | name of marker reported in manuscript |
| reploddist_marker_key | conceptual link to MARKER Table in MGD | numeric | To get name, current symbol, location info for the reported marker |
| rep_prox | name of proximal flanking marker for QTL region | varchar | NULL |
| repprox_marker_key | conceptual link to MARKER Table in MGD | numeric | To get name, current symbol, location info for the marker |
| rep_dist | name of distal flanking marker for QTL region | varchar | NULL |
| repdist_marker_key | conceptual link to MARKER Table in MGD | numeric | To get name, current symbol, location info for the marker |
| location_notes | comment field | varchar | used when specific marker boundries are not provided for QTL location; for example, "proximal half of chromosome 1." |
Form
Top
One of the main challenges here is how to use unspecific locality data, such as "the QTL falls in the proximal half of chromosome 1", to create a graphic image of the QTL region and to help describe all of the genes currently known to fall in the location.
#######################
CandidateGenes is for representing what the authors claim are possible candidate genes for their QTL of interest. This will be static data of course. It would be nice to have a pointer to the most current list of genes that have been mapped to a QTL region since the publication of a study.
The criteria for being included as a Candidate Gene includes some justification by the authors as to why a gene might be involved in the expression of a complex phenotype. A statistically significant linkage or association with a gene in the putative QTL region is not sufficient justification for inclusion as a Candidate Gene.
The symbol, name, and chromosomal location information for the Candidate Genes are accessed through the current MRK_Markers table in MGD.
| Column | Comments | Element Type | Example |
|---|---|---|---|
| marker_key | conceptual link to MRK_Markers table in MGD | varchar | link to information on Candidate Gene symbol, name, and chromosomal location |
| t_id | unique identifier for Trait | numeric | NULL |
| qtl_id | unique identifier for the QTL Locus | numeric | link to Locus table for QTL symbol and name |
| loc_id | unique identifier for the QTL Location | numeric | link to location information for a QTL Locus |
| reported_name | gene symbol as reported in manuscript | varchar | this symbol may be outdated because of gene nomenclature changes; including this field will allow for accurate legacy records for gene names used in the literature |
| gene_notes | comment field | varchar | NULL |
Form
Top
CandidateGenes and Location are joined by a linking table. There can be many candidate genes per QTL location.