If you are on the QTL Quick Tour Click Here when you are done with this page!


QTL Data Model Description



Click within the circle or box to find out the details of each table in the QTL Data Model


Key to Interpreting the QTL Data Model Diagram:

  1. A single solid line indicates a one to one relationship
  2. Solid double lines indicates a many to one relationship
  3. A dashed single line indicates that the relationships is usually one to one, but can be many to one.

Click Here for An Introduction to the QTL Data Model Click Here for Phenotype Class Information Click Here for Strain Information Click Here for Keyword Information Click Here for Trait Information Click Here for Phenotype Mode Information Click Here for Phenotype Information Click Here for QTL_Experiment Information Click Here for Information about the Bibliography Table Click Here for Information on Progeny Click Here for Mapping_Cross Information Click Here for Marker_Map Information Click Here for QTL_Detection Information Click Here for QTL_Estimation Information Click Here for QTL_Marker Information Click Here for QT Locus Information Click Here for QTL Location Information Click Here for Candidate_Gene Information
Data Editor Form Layouts


Introduction to Proposed QTL Data Model

This is a description of a conceptual model for representing QTL data using a relational data model format. I have structured the information in the form of an entity relation diagram to help in the mapping of the QTL data types onto the existing MGD schema.

(Note that some of the links are probably not appropriate if taken literally. For example, I use jnum as the link to the reference in MGD. This is a conceptual link. It means that there should be a link to the MGD bibliographic tables, not that the link should be explicitly made using the J#. )

The model was designed to support basic queries such as:
***"How many QTLs are associated with Resistance/Susceptibility to viral infection?"
***"How many QTLs have been mapped to the region of chromosome 5 between markers D5Mityy and D5Mitxx?"
***"What genes have been mapped in the vicinity of QTLs associated with diabetes?"

Although a bit tricky, the data model can support some intra-specific comparisons.....(the tricky part is that QTL locations are often highly uncertain)

***"Among all of the studies of alcohol avidity, are any QTLs found in all or most of the studies?"

What this model does not do well is provide direct support for more complex queries that involve inter-specific comparisons

***"Are the QTL regions in mouse associated with bronchial hyperresponsiveness in regions of synteny with the human genome? If so, what genes mapped in humans in the syntenic region also have been mapped in mouse?"

Controlled vocabularies referenced throughout are accessible through my web page: http://www/~cjb. There also are several QTL reports based on my attempts to code QTL papers there. Sometimes looking at those reports can help clarify what I mean by certain fields in the conceptual model.

Carol Bult
April 1997

To browse examples of coded QTL-related papers,
Click Here and look at the "Real World QTL Data: QTL Reports".


#######################

Table: Phenotype_Class

#######################

Phenotype_Class represents a controlled vocabulary of broad categories into which phenotypes can be binned. The purpose of PhenotypeClass is to provide a phenotypically-based entrance point into the QTL data that is general enough to capture most of the studies of interest to the user and specific enough to be useful for grouping studies together.

Column Comments Element Type Example
pc_id unique identifier for Phenotype Class numeric NULL
p_id unique identifier for Phenotype numeric NULL
pc_name name of Phenotype_Class varchar controlled vocab
Form Top

###################

Table: Phenotype

###################

Phenotype is the name of the complex trait described in a manuscript. This is not meant to be a controlled vocabulary. The data editor will need to paraphrase the phenotype name from the title or abstract of the paper.

Column Comments Element Type Example
p_id unique identifier for Phenotype numeric NULL
jnum conceptual link to MGD BIB numeric NULL
m_id unique identifier for PhenotypeMode numeric NULL
pc_id unique identifier for PhenotypeClass numeric NULL
pheno_name name of the Phenotype varchar Multigenic obesity
p_notes comment field varchar For notes on treatment dosages and how traits were measured
Form Top

##############

Table: Trait

##############

Trait means the specific measured or observed attribute that is used to define the phenotype. For example, multigenic obesity is defined or measured by more than a dozen traits in one published study (see J24238). Therefore, each phenotype may have multiple traits. The traits used to measure a phenotype in one study are not necessarily the traits used to measure the "same" phenotype in another. Obesity is a good example again. One group may measure obesity as a suite of 16 traits. Another group may simply use body weight or percent body fat.

Trait evaluation is a controlled vocabulary to indicate how a trait was measured. If the raw or transformed scores for a trait are used directly it is a quantitative evaluation, if the raw scores are then lumped into a discrete set of categories (more than two) then the evaluation is semi-quantitative, if the trait is measured by binning into two categories (for example, "dead" and "not dead") then the evaluation is qualitative.

Heretibility is a measure of how much of the total variation in a trait is due to a genetic component. It will be rare for this information to be provided in a manuscript.

I use a linking table to connect phenotype and trait in this model so that the entire Phenotype record doesn't have to be duplicated for each trait.

Column Comments Element Type Example
t_id unique identifier for Trait numeric NULL
trait_name name of the Trait varchar NULL
trait_eval trait evaluation method varchar controlled vocab
heret Heretibility varchar the estimate of the heretibility of a trait; it will be rare for this value to be reported
Form Top

#############

Table: Mode

#############

Mode describes whether the Phenotype is Induced, Spontaneous, or Hereditary. These divisions come from a review paper on QTL methodology written by Wayne Frankel (1995, TIG 11(12):471-477).

Column Comments Element Type Example
m_id unique identifier for Mode numeric NULL
p_id unique identifier for Phenotype numeric NULL
mode mode name varchar controlled vocab
mode_notes comment field varchar primarily for description of methods for "induced" traits
Form Top

################

Table: KeyWord

################

Keyword is a type of synonymy table. I envision it as a means to link together studies that are about esentially the same complex phenotype but where the phenotypes are either slightly different or simply described differently. For example, a researcher that asks for all studies having to do with asthma would like to get back phenotypes like "bronchial hyperresponsiveness" in addition to those that have "asthma" explicitly in the complex trait name.

Because a Phenotype can have many Keywords I employ a linking table to join these two tables.

Column Comments Element Type Example
kw_id unique identifier for Keyword numeric NULL
keyword the synonym or keyword for the Phenotype varchar see text description above
Form Top

###############

Table: Strain

###############

The Strain table is for linking information about the strain of the mice to the Phenotype. This is a skeleton table because the development of the vocabularies for these data types are under development by others in the MGI group.

Column Comments Element Type Example
s_id unique identifier for Strain numeric NULL
strain_name name of the strain of mouse varchar BALB/c
s_notes comment field varchar primarily for describing the phenotype of the strain
Form Top

Because multiple strains/progenitors can be associated with the genetic dissection of a Phenotype I used a linking table to connect the Strain and Phenotype tables.


############

Table: BIB

############

BIB is the placeholder for the bibliographic data tables already in MGD.
Top

#################

Table: QTL_Notes

#################

QTL Notes is for comments about a QTL paper that can't be formally coded into the data model structure. Often times these are points raised by the authors of a paper in the discussion.

Some examples of the kinds of data that are important to record, but which may not fit into any of the currently defined categories are:

Column Comments Element Type Example
jnum conceptual link to BIB numeric NULL
qtl_var comments numeric % phenotypic/genetic variation accounted for by all significant QTLs
qtl_notes comments varchar NULL
Form Top

#######################

Table: QTL_Experiment

#######################

QTL Experiment is the initial link to the heart of a QTL paper. Each manuscript (i.e., jnum) can have multiple QTL Experiments associated with it. However there is a one to one relationship between QTL Experiment and the Progeny population used for the experiment.

So, if a manuscript describes the results for an F2 population from cross X and and F2 population from cross Y. Those would be two separate QTL Experiments associated with the same jnum.

To see an example of a single study with more than one QTL experiment, Click Here

Column Comments Element Type Example
qe_id unique identifier for QTL_Experiment numeric NULL
prog_id unique identifier for Progeny population numeric NULL
jnum conceptual link to BIB numeric NULL
qe_name name for QTL Experiment varchar QTL Experiment #1, QTL Experiment #2, etc.
Top

################

Table: Progeny

################

The Progeny table is for information about the progeny population upon which a QTL experiment is based. There can be multiple Mapping Crosses associated with a single Progeny. But there can be only one QTL_Experiment linked to a Progeny.

The # of strains and # of animals fields are meant for the animals used in the mapping component of a QTL experiment, not for the total number of animals evaluated for the trait or traits used to measure a complex phenotype. Varchar element types are indicated for these fields because often the information is reported best in text form. For example, the # of animals is often described as "4 animals per strain" which conveys more information to the user than just reporting the total number.

Column Comments Element Type Example
prog_id unique identifier for Progeny numeric NULL
qe_id unique identifier for QTL_Experiment numeric NULL
mc_id unique identifier for Mapping Cross numeric NULL
prog_type name of the progeny type varchar F2, backcross, N1, name of RI or RC set
*prog_name name of the progeny population varchar B6D2F2
# animals no. of animals used for QTL mapping varchar NULL
# strains no. of strains used (when RI or RC sets are used) for QTL mapping varchar NULL
prog_com comment field varchar includes information such as "males only evaluated", or "equal numbers of males and females evaluated"
Form Top

*sometimes a shorthand name is assigned to a specific progeny population, especially in cases where there is more than one progeny population described in a manuscript. The prog_name field is meant to handle these kinds of non-standard names applied to progeny populations.

Examples of:

#####################

Table: Mapping_Cross

#####################

The Mapping Cross table provides links to the kinds of crosses used to generate the Progeny population. There can be multiple Mapping Crosses associated with a single progeny population. This generally is not the case, but the possibility exists. Representation of RI and RC sets is quite simple since controlled vocabs for those sets are in MGD already!

Column Comments Element Type Example
mc_id unique identifier for MappingCross numeric NULL
prog_id unique identifier for Progeny Population numeric NULL
fem_par female parent varchar NULL
mal_par male parent varchar NULL
cross_type type of cross varchar controlled vocab
mc_notes comment field varchar NULL
Form Top

(I didn't use a linking table is this case because it doesn't seem to be a very common phenomenon to have multiple mapping crosses associated with a single progeny population.)


##################

Table: Marker_Map

##################

Marker_Map is for representing data about how the marker map for the QTL analysis was constructed. There can be multiple marker types used for a single progeny population. For example, a microsatellite map could be combined with a few specific RFLP markers and some observable phenotype markers. A consequence is that there can be multiple marker map "methods" associated with a progeny used in a QTL analysis.

Column Comments Element Type Example
mm_id unique identifier for Marker_Map numeric NULL
prog_id unique identifier for Progeny population numeric NULL
coverage the coverage will be relative to chromosome varchar controlled vocab
map_method the lab method used for genotyping varchar controlled vocab
marker_num the total number of polymorphic markers in the map int NULL
marker_interval a text field for describing the average intermarker distance varchar NULL
map_notes comment field varchar NULL
Form Top

Although there can be multiple marker map methods associated with a progeny, the majority of studies use a single method, microsatellites. Therefore, I have not used a linking table to connect the Marker_Map and Progeny tables.


######################

Table: QTL_Detection

######################

QTL_Detection is for representing the methods used to determine statistically significant associations between traits and markers. (The actual data are represented in the QTL_Marker table.) As with marker map methods, there can be multiple QTL detection methods used on a single progeny.

Column Comments Element Type Example
qtld_id unique identifier for QTL_Detection numeric NULL
prog_id unique identifier for Progeny Population numeric NULL
t_id unique identifier for Trait numeric NULL
qtl_id unique identifier for Locus numeric NULL
qdetect_soft name of the software used for the QTL detection method varchar controlled vocab
software_vers version of the software used for QTL detection float use NULL if not reported in manuscript
qdetect_meth the statistical method used for detecting QTLs varchar controlled vocab
threshold the statistical cutoff used to determine significance of association varchar "LOD > 3.0", "P < 0.001"
detect_notes comment field varchar NULL
Form Top

Most of the time, a single QTL Detection method is used in a QTL Mapping study. Therefore, even though the possibility exists that there can be many QTL Detection methods associated with a Progeny I have not included a linking table between QTL_Detection and Progeny.


#######################

Table: QTL_Estimation

#######################

QTL_Estimation is for representing the methods used to estimate the location of QTLs.

Column Comments Element Type Example
qtle_id unique identifier for QTL_Estimation numeric NULL
prog_id unique identifier for Progeny Population numeric NULL
qtl_id unique identifier for Locus numeric NULL
t_id unique identifier for Trait numeric NULL
qestimate_soft name of the software used for the QTL estimation method varchar controlled vocab
software_vers version of the software used for QTL estimation float use NULL if not reported in manuscript
qestimate_meth the method used for estimating QTLs varchar controlled vocab
model_notes comment field varchar used to describe the QTL estimation model assumptions
estimate_notes comment field varchar used to describe aspects of the QTL estimation method that don't fit into any other field in the table
Form Top

##############

Table: Locus

##############

The Locus table is for providing basic information about the putative QTL. The trait id is in this table because the locus needs to be associated with a specific trait name when there are multiple traits described or measured for a phenotype.

Column Comments Element Type Example
qtl_id unique identifier for Locus numeric NULL
qm_id unique identifier for markers most tightly associated with the QTL numeric NULL
t_id unique identifier for Trait numeric NULL
loc_id unique identifier for QTL Location numeric NULL
**qtld_id unique identifier for QTL Detection numeric NULL
qtl_symbol QTL symbol if assigned; unnamed if not assigned varchar curated
% pheno_var maximum percent of phenotypic variation accounted for in the trait by the QTL int NULL
% gene_var maximum percent of genetic variation accounted for in the trait by the QTL int NULL
qtl_effects allele effects of a qtl varchar controlled vocab
cstrain strain that contributes the allele described by the qtl_effects field varchar strain name
epi_effects epistatic effects, if noted varchar Yes, No, NULL (not reported)
epi_loc loci involved in epistatic interaction varchar NULL
epi_pval probability value used for detecting epistatic interaction varchar NULL
pleio_effects pleiotropic effects noted varchar Yes, No,NULL (if not reported)
pleio_com description of pleiotropic effects varchar NULL
locus_notes comment field varchar NULL
Form Top

* ultimately this will be some kind of controlled vocabulary but for now it is a text field because the descriptions of the modes of gene action vary greatly from manuscript to manuscript. If i can create a robust controlled vocab it will likely be in 3 to 4 separate fields.

** this link is in here because the actual qtldetect_meth will need to be reported for Locus in those cases where more than one method is described in the paper.

To see an example of QT Locus data, Click Here

################

Table: QTL_Marker

################

QTL_Marker is for representing the association scores between traits and map markers. In other words, the table is for storing the results of a QTL_Detection method. Links to the marker symbol, name, and chromosome location can probably be accessed through the existing MRK_Marker table in MGD.

Two fields for the probability value are provided for when a range of probabilities are reported instead of a single value. The same is true for the association score. In some cases the association scores are listed as a range.

A note on the practicality of coding QTL_Marker information

While the QTL data model structure allows for any number of QTL_Markers to be described it is not practical to record every linkage reported in a manuscript. The suggested strategy is to record up to three of the most strongly associated (as measured by the probability values) markers with a particular QTL-trait combination.

The other table in this data model where marker-trait linkages are reported is the QTL_Location table. See the table description for more details.

Column Comments Element Type Example
qm_id unique identifier for QTL_Marker numeric NULL
qtld_id unique identifier for QTL Detection numeric NULL
t_id unique identifier for Trait numeric NULL
qtl_id unique identifier for Locus numeric NULL
marker_key conceptual link to MGD MRK_Marker Table varchar link to information about marker symbol, name and chromosome location
assoc_score1 association score between marker and trait float NULL
assoc_score2 association score between marker and trait float NULL
Pval1 significance score float NULL
Pval2 significance score float NULL
mtl_com comment field varchar NULL
Form Top

There usually will be many significant linkages to report as the outcome of a QTL_Detection event. The QTL_Marker and QTL_Detection tables are joined through a linking table.


#####################

Table: QTL_Location

#####################

QTL Location is for representing the predicted location for a QTL. This is a problem table because locality information is often very vague and the boundries of certainty are not clearly defined. For example, the flanking marker information is often not provided clearly so many of the locational data are purely text based descriptions.

Some Guidelines for Describing QTL Location:

Ideally there should be a way to translate the location information encoded in this table into a "minimap" that MGD currently uses as a graphical report on locus position. QTLs might require a slightly different graphical representation to accomodate the location uncertainty. Maybe some kind of shaded rectangle to show approximate boundries. Of course this is only possible where the data supports the location inference. I'm not sure how useful it would be to provide a graphic with the distal or proximal part of a chromosome entirely shaded.

Another idea for down the road is to have authors submit the graphical output for QTL estimation from MapMaker in a gif format. I bet it would be easy to pop these things up on the Web. (This was actually Wayne and Ben's idea. They thought we might be able to scan the MapMaker images in from the manuscript. But I thought that might be a nightmare of copyright entanglements, etc. However, if the journal editors knew that the authors were going to submit the figures to MGD from the time the manuscript was submitted for consideration of publication it might not be such a big deal.) The image_id placeholder in the proposed table structure is for remembering this idea.

Column Comments Element Type Example
loc_id unique identifier for Location numeric NULL
qtl_id unique identifier for Locus numeric NULL
image_id unique identifer for MapMaker image graphic gif file
support_interval support interval used to define boundries of certainty for QTL location (not to be confused with a confidence interval!) varchar "1lod", "95%"
rep_lod_prox reported name of proximal flanking marker for a support interval varchar name of marker reported in manuscript
replodprox_marker_key conceptual link to MARKER Table in MGD numeric To get name, current symbol, location info for the reported marker
rep_lod_dist reported name of distal flanking marker for a support interval varchar name of marker reported in manuscript
reploddist_marker_key conceptual link to MARKER Table in MGD numeric To get name, current symbol, location info for the reported marker
rep_prox name of proximal flanking marker for QTL region varchar NULL
repprox_marker_key conceptual link to MARKER Table in MGD numeric To get name, current symbol, location info for the marker
rep_dist name of distal flanking marker for QTL region varchar NULL
repdist_marker_key conceptual link to MARKER Table in MGD numeric To get name, current symbol, location info for the marker
location_notes comment field varchar used when specific marker boundries are not provided for QTL location; for example, "proximal half of chromosome 1."
Form Top

One of the main challenges here is how to use unspecific locality data, such as "the QTL falls in the proximal half of chromosome 1", to create a graphic image of the QTL region and to help describe all of the genes currently known to fall in the location.


#######################

Table: Candidate_Genes

#######################

CandidateGenes is for representing what the authors claim are possible candidate genes for their QTL of interest. This will be static data of course. It would be nice to have a pointer to the most current list of genes that have been mapped to a QTL region since the publication of a study.

The criteria for being included as a Candidate Gene includes some justification by the authors as to why a gene might be involved in the expression of a complex phenotype. A statistically significant linkage or association with a gene in the putative QTL region is not sufficient justification for inclusion as a Candidate Gene.

The symbol, name, and chromosomal location information for the Candidate Genes are accessed through the current MRK_Markers table in MGD.

Column Comments Element Type Example
marker_key conceptual link to MRK_Markers table in MGD varchar link to information on Candidate Gene symbol, name, and chromosomal location
t_id unique identifier for Trait numeric NULL
qtl_id unique identifier for the QTL Locus numeric link to Locus table for QTL symbol and name
loc_id unique identifier for the QTL Location numeric link to location information for a QTL Locus
reported_name gene symbol as reported in manuscript varchar this symbol may be outdated because of gene nomenclature changes; including this field will allow for accurate legacy records for gene names used in the literature
gene_notes comment field varchar NULL
Form Top

CandidateGenes and Location are joined by a linking table. There can be many candidate genes per QTL location.