Introduction
MeSHLinker is a tool that relates IGTC genes that have NCBI Entrez identifiers to Medical Subject Heading (MeSH; Coletti 2001) Disease descriptors using literature assignments through PubMed entries. Links between genes and MeSH descriptors are updated as part of the IGTC Pipeline. The results can be searched by gene name or MeSH descriptor, or browsed by traversing the MeSH hierarchy.
Descriptor Assignment
The National Library of Medicine oversees a process by which scientific literature is indexed with relevant descriptors from the MeSH controlled vocabulary. One descriptor category is Diseases. Links between this literature and NCBI Entrez genes are maintained in the Entrez database. MeSHLinker traverses this linkage path; every IGTC gene linked with a publication in Entrez is associated in MeSHLinker with all Disease descriptors indexed in that publication.
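The linkage path above amounts to a two-step join: gene to publication, then publication to Disease descriptor. The following is a minimal sketch of that idea in Python, not MeSHLinker's actual implementation. The gene names, publication identifiers, dictionaries, and the function link_genes_to_descriptors are all hypothetical and used for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data; the real links are maintained in Entrez and PubMed.
gene_to_pubs = {
    "GeneA": ["pub1", "pub2"],
    "GeneB": ["pub2"],
}
pub_to_disease_descriptors = {
    "pub1": ["Myocardial Infarction"],
    "pub2": ["Neoplasms", "Myocardial Infarction"],
}

def link_genes_to_descriptors(gene_to_pubs, pub_to_descriptors):
    """Map each gene to {descriptor: sorted publications providing the link}."""
    links = {}
    for gene, pubs in gene_to_pubs.items():
        per_descriptor = defaultdict(set)
        for pub in pubs:
            # Every Disease descriptor indexed on the publication is
            # associated with every gene linked to that publication.
            for descriptor in pub_to_descriptors.get(pub, []):
                per_descriptor[descriptor].add(pub)
        links[gene] = {d: sorted(p) for d, p in per_descriptor.items()}
    return links

links = link_genes_to_descriptors(gene_to_pubs, pub_to_disease_descriptors)
```

Keeping the linking publications alongside each gene-descriptor pair mirrors the descriptor page, where each assignment is shown with the publication that provides the link.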
In addition, all MeSH descriptors have synonyms associated with them (denoted Terms in the MeSH vernacular). These synonyms expand the search functionality when looking for a particular descriptor, in some simple cases by pluralizing the descriptor, and in others by providing a true synonym ("AIDS" as a synonym for "Acquired Immune Deficiency Syndrome"). These synonyms likewise are associated with the IGTC gene and can be used to search MeSHLinker.
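Synonym-aware searching can be pictured as a lookup table that maps every descriptor name and every synonym back to its descriptor. The sketch below is illustrative only. The table contents, the function names build_index and search, and the case-insensitive matching are assumptions, not a description of how MeSHLinker is implemented.

```python
# Hypothetical synonym table: descriptor name -> list of synonyms (Terms).
descriptor_terms = {
    "Acquired Immune Deficiency Syndrome": ["AIDS"],
    "Myocardial Infarction": ["Myocardial Infarctions"],
}

def build_index(descriptor_terms):
    """Map each lowercased name or synonym to (matched name, descriptor)."""
    index = {}
    for descriptor, terms in descriptor_terms.items():
        index[descriptor.lower()] = (descriptor, descriptor)
        for term in terms:
            index[term.lower()] = (term, descriptor)
    return index

def search(query, index):
    """Return (matched name, descriptor), or None if MeSH lists no such name."""
    # Case-insensitive matching is an assumption made for this sketch.
    return index.get(query.strip().lower())

index = build_index(descriptor_terms)
hit = search("aids", index)
miss = search("Unlisted Name", index)
```

A query that matches a synonym returns both the synonym and its descriptor, while a name that MeSH does not list for any descriptor returns nothing.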
MeSHLinker Browser
Results can be viewed by choosing the MeSHLinker option in the IGTC Data Access menu. There are three ways to search MeSHLinker: searching by IGTC gene name, searching by MeSH descriptor or synonym name, and browsing the MeSH hierarchy.
Search by IGTC gene name:
This field allows the user to search for descriptors linked to a specific IGTC gene (the name is the MGI symbol assigned to the gene). This is similar to the functionality on the main IGTC page but uses MeSHLinker directly. The result is a list of descriptors with links to their individual pages.
Search by MeSH descriptor name:
This field allows the user to search for a particular disease (by descriptor or synonym) and find the genes referenced in the same publications as that disease. A match on a synonym returns both the synonym and its descriptor. All descriptors link to their descriptor pages.
Browse the MeSH hierarchy:
All descriptor pages provide links to their child descriptors (subcategories) in the MeSH hierarchy, along with the number of genes assigned to the subtree rooted at each child. In addition, the MeSHLinker home page displays the top-level Disease descriptors as entry points for browsing the hierarchy.
Descriptor Page
The MeSHLinker browser is centered on descriptor pages. Each page contains information regarding the genes linked to the descriptor as well as its context in the MeSH hierarchy. A summary follows:
Lineage:
A list of all ancestors of this descriptor in the MeSH hierarchy appears at the top of the page.
Descriptor information:
This lists the name of the descriptor, its tree number in the MeSH hierarchy, and the number of genes to which it is linked (this number does not include genes linked to its descendants in the hierarchy).
Child descriptor list:
This lists the descriptors that are children of the current descriptor, along with the number of genes assigned to the subtree rooted at each child. Each child is a link to its own descriptor page.
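The two counts on a descriptor page differ in scope: the descriptor's own count covers only genes linked directly to it, while each child's count covers the whole subtree rooted at that child. The sketch below shows one way to compute both from tree numbers, under several assumptions: each descriptor has a single tree number, a subtree count is the number of distinct genes in the subtree, and the tree numbers and gene names are toy data. The function names are hypothetical.

```python
# Hypothetical toy data: tree number -> genes linked directly to that descriptor.
genes_by_tree_number = {
    "C14": {"GeneA"},
    "C14.280": {"GeneA", "GeneB"},
    "C14.280.647": {"GeneC"},
    "C14.907": {"GeneD"},
}

def in_subtree(tree_number, root):
    """True if tree_number is root itself or one of its descendants."""
    return tree_number == root or tree_number.startswith(root + ".")

def subtree_genes(root, genes_by_tree_number):
    """Collect the distinct genes assigned anywhere in the subtree at root."""
    genes = set()
    for tree_number, assigned in genes_by_tree_number.items():
        if in_subtree(tree_number, root):
            genes |= assigned
    return genes

def child_counts(parent, genes_by_tree_number):
    """Map each child tree number of parent to its subtree gene count."""
    child_depth = parent.count(".") + 2  # number of dot-separated segments in a child
    children = sorted({
        ".".join(tree_number.split(".")[:child_depth])
        for tree_number in genes_by_tree_number
        if tree_number.startswith(parent + ".")
    })
    return {c: len(subtree_genes(c, genes_by_tree_number)) for c in children}

direct_count = len(genes_by_tree_number["C14"])
counts = child_counts("C14", genes_by_tree_number)
```

In this toy data the descriptor at C14 has one directly linked gene, while its child C14.280 reports three because the count also includes the gene linked to C14.280.647.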
Assigned IGTC genes:
This is the heart of MeSHLinker: a list of all genes associated with the descriptor through a publication. Each assignment includes the name and description of the gene and the publication that provides the link. Genes are hyperlinked to the main IGTC site.
Limitations of MeSHLinker
Searching is limited by the MeSH ontology
We provide the ability to search for IGTC genes related to a particular disease. Our search algorithm uses the MeSH ontology to allow synonym searches, as described above with the "AIDS" example. However, if MeSH has not designated a given name as a synonym for a disease, a search on that name will not return any results. For example, a search for "Myocardial Infarction" will return a list of genes annotated with that MeSH descriptor, but a search for "Heart Attack" will return nothing, as this term has not been designated a synonym of this disease in the MeSH ontology.
The IGTC annotation procedure is limited by PubMed abstract curation
The method described above assigns MeSH descriptors to IGTC genes using the curated PubMed literature. This method assumes that every gene linked to a PubMed abstract is relevant to that article. If a gene is only tangentially related to the article (as in large-scale gene expression studies, where potentially thousands of genes are linked to a single article), then a MeSH Disease descriptor assigned to the abstract may not be related to that gene in an informative way. An article may even show that a gene is not related to a disease, yet searching MeSHLinker for the disease will still return that gene as a result. Thus, careful examination of the abstract that links the gene to the disease in MeSHLinker is required to understand the search results.
References:
Coletti, M.H. and Bleich, H.L. (2001) Medical subject headings used to search the biomedical literature. J. Am. Med. Inform. Assoc., 8, 317-323.