International Gene Trap Consortium
Tutorials: Locating A Cell Line

Locating a gene trap cell line in a gene/locus of interest

This tutorial demonstrates how to locate a gene trap cell line in a gene/locus of interest. For information on the gene trapping process, please see our overview tutorial.

The best way to determine if the IGTC has trapped your gene or locus of interest is to use the BLAST search function to align your sequence to our database of trapped genes and cell line sequences. If you do not know the sequence of your gene, you may perform searches based on keywords or expression profile, or browse the contents of the IGTC database.


To begin a search, select Data Access in the menu bar at the top of the page. Scroll down to select the type of search you wish to perform.


BLAST Search

Sequence searching with BLAST is the best way to find matches in the IGTC database. BLAST reports all matches with significant sequence similarity, regardless of annotation.

To start your BLAST search, select the Blast Search option from the Data Access menu located on the menu bar. It will take you to the following BLAST form:

Enter the sequence of your gene of interest in FASTA format and click on the Quick Search button. Multiple sequences can be searched simultaneously; results appear in the same order in which the sequences were entered in the search field.
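
As a rough illustration of the input the search field expects, the sketch below builds a multi-record FASTA string in Python. The record names and sequences are made-up placeholders, and to_fasta is a hypothetical helper, not part of the IGTC site. Each record is a ">" header line followed by the sequence, wrapped here to a fixed width.

```python
import textwrap

def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, one record after another."""
    lines = []
    for name, seq in records:
        # Header line: '>' followed by the identifier.
        lines.append(">" + name)
        # Remove whitespace, upper-case, then wrap the sequence to `width` columns.
        cleaned = "".join(seq.split()).upper()
        lines.extend(textwrap.wrap(cleaned, width))
    return "\n".join(lines) + "\n"

# Placeholder records for illustration only (not real gene sequences).
example = to_fasta([
    ("query_1", "atggcgtacgttagc"),
    ("query_2", "ttgacc ggatta"),
])
print(example)
```

The resulting text can be pasted into the search field as a single block; keeping one header per sequence is what lets the results be matched back to the order of the queries.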

You can change the settings of your BLAST search if you do not wish to use the defaults. You can turn low complexity filtering on or off; this filter prevents matches to regions of highly repetitive sequence. You can raise or lower the maximum allowable expect value (E-value); the expect value is inversely related to the significance of a match, so the smallest values indicate the best matches. To alter the presentation of the results, you can turn the graphical overview on or off, increase or decrease the number of descriptions and alignments shown, or change the alignment view. See the NCBI BLAST Handbook or FAQ page for further details. Once you have adjusted the settings, click on the Search button.
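
To make the effect of the maximum expect value concrete, here is a minimal sketch that keeps only hits at or below a chosen cutoff and lists the most significant first. The hit names and expect values are invented for illustration, filter_hits is a hypothetical helper rather than anything BLAST or the IGTC provides, and the default cutoff of 10.0 is an assumption, not necessarily the form's default.

```python
def filter_hits(hits, max_expect=10.0):
    """Keep (name, e_value) hits with e_value <= max_expect, smallest first."""
    kept = [hit for hit in hits if hit[1] <= max_expect]
    # Smaller expect values indicate better matches, so sort ascending.
    return sorted(kept, key=lambda hit: hit[1])

# Invented hits for illustration only.
hits = [("line_A", 2e-5), ("line_B", 3.0), ("line_C", 1e-80)]
print(filter_hits(hits, max_expect=1e-3))
```

Lowering the cutoff shortens the result list to the strongest matches; raising it admits weaker, possibly chance, alignments.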

The default BLAST search for the NM_146108 sequence yields the following result: (alignments not pictured)

The cell line RRS545 is the best IGTC match to NM_145108 (the gene of interest for this example). Record the cell line number. Search the IGTC for the cell line to get additional information from the cell line annotation page.


Browse the IGTC

You can browse the IGTC database for your gene(s) of interest. You can also browse through the available IGTC cell lines. Select Browse from the Data Access menu to start browsing the IGTC database.

  • Browsing by Gene

    To start browsing by gene, click the circle next to Gene. The form will then give you options to sort the display of the available trapped genes:

    You can sort the genes by

    • Gene Description - Alphabetical order by gene description according to MGI. If a gene description is not available from MGI, the gene description is retrieved from either Entrez Gene or Ensembl.
    • Gene Symbol - Alphabetical order by MGI gene symbol.
    • Chromosome - Ordered by chromosome number.

    You can also choose how many entries you want to show per page: 1,000, 5,000, or all entries. Once you have selected your parameters, click on the Browse button to start browsing.

  • Browsing by Cell Line

    To start browsing by cell line, click the circle next to Cell Line. The form will then give you options to sort the display of the available IGTC cell lines:

    You can sort the cell lines by

    • Cell Line - Alphabetical order by cell line name.
    • Source - Alphabetical order by gene trap resource.
    • Chromosome - Ordered by chromosome number.
    • Gene Description - Alphabetical order by gene description according to MGI. If a gene description is not available from MGI, the gene description is retrieved from either Entrez Gene or Ensembl.
    • Gene Symbol - Alphabetical order by MGI gene symbol.
    • Status - Alphabetical by identification status (see below).

    You can also limit the browsing by the status of the cell line and the source of the cell line.

    To limit the cell lines by identification status, select one of the following status terms:

    • Localized+Transcript - Shows all cell lines where a single genomic locus has been identified by direct genome localization of the sequence tag derived from the cell line and a transcript matching the cell line sequence was found.
    • Localized Only - Shows all cell lines where a single genomic locus has been identified only by direct genome localization of the sequence tag derived from the cell line.
    • Transcript Only - Shows all cell lines where a single genomic locus has been identified only by genome localization of a full-length mRNA transcript related to the cell line.
    • Conflict - Shows all cell lines where the genomic loci identified by sequence tag localization and transcript localization do not overlap.
    • Unlocalized - Shows all cell lines where there is no genomic locus found for either the cell line sequence or any transcript associated with the cell line sequence. Although there is no localization, there may be mRNA transcripts found for an Unlocalized cell line.

    To limit the browsing by gene trap resource, select one of the following gene trap resources from the menu:

    You can also choose how many entries you want to show per page: 1,000, 5,000, or all entries. Once you have selected your parameters, click on the Browse button to start browsing.
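
The identification status terms listed above can be read as a small decision rule over two localization results: one from the sequence tag and one from a related transcript. The sketch below is only one interpretation of those definitions, not the IGTC's actual annotation pipeline. It assumes each locus is a (chromosome, start, end) tuple, and that Localized+Transcript corresponds to the two loci overlapping; the function names are hypothetical.

```python
def overlaps(a, b):
    """True when two (chromosome, start, end) loci share at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def identification_status(tag_locus, transcript_locus):
    """Map the two localization results for a cell line to a status term.

    Each argument is a (chromosome, start, end) tuple, or None if no single
    genomic locus was found by that route.
    """
    if tag_locus is None and transcript_locus is None:
        return "Unlocalized"
    if transcript_locus is None:
        return "Localized Only"
    if tag_locus is None:
        return "Transcript Only"
    # Assumption: agreement between the two routes means overlapping loci.
    if overlaps(tag_locus, transcript_locus):
        return "Localized+Transcript"
    return "Conflict"

# Invented coordinates for illustration only.
print(identification_status(("13", 100, 200), ("13", 150, 400)))
```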


Search the IGTC

You can search the IGTC database for a particular gene or cell line. Select Keyword/ID Search from the Data Access menu to start searching the IGTC database.

  • Search by Gene

    To start searching by gene, click on the circle next to Gene. This will prompt the form to give you options to define the parameters for your search by gene:

    You can search for a gene by

    • Keyword - Any word, phrase, identifier, or number in any field of the database.
    • Accession - The NCBI accession number. (e.g. NM_008255 or AK005015)
    • Gene Description - A keyword search of the name or description of a gene. (e.g. 'ADP-ribosylarginine hydrolase' would be found with a full-name search or 'hydrolase', 'ADP', 'ribo', etc.)
    • Gene Symbol - The MGI gene symbol. (e.g. Adam23 or Csad)
    • MGI ID - The MGI gene identifier. (e.g. MGI:1345162)
    • Entrez ID - The NCBI Entrez gene identifier. (e.g. 23792)
    • Ensembl ID - The Ensembl gene identifier. (e.g. ENSMUSG00000025964)
    • Microarray - A keyword search can be performed against the names of the probe sets in the IGTC database. However, if you do not know the name of the probe set, it is better to use the 'search expression data' tool.
    • Phenotype - A keyword search of phenotype information for a gene. This information is retrieved from the Mouse Genome Informatics web site.
    • Gene Ontology - A keyword search of Gene Ontology terms associated with the gene. Further Gene Ontology information can be retrieved using the 'browse Biological Pathways or Gene Ontology categories' tool.

    Once you select how you would like to search for your gene, enter the search term in the box labeled "Search Term".

    You can limit your search by chromosome, by selecting a chromosome number in the Chromosome field.

    You can also sort your search by

    • Gene Description - Alphabetical order by gene description according to MGI. If a gene description is not available from MGI, the gene description is retrieved from either Entrez Gene or Ensembl.
    • Gene Symbol - Alphabetical order by MGI gene symbol.
    • Chromosome - Ordered by chromosome number.

    Choose how many entries you want to show per page: 1,000, 5,000, or all entries. Once you have selected your parameters, click on the Search button to start your gene search.

  • Search by Cell Line

    To start searching by cell line, click on the circle next to Cell Line. The form will give you options to define the parameters for your search by cell line:

    You can search for a cell line by

    • Cell Line - The cell line name (e.g. CMHD-GT_85A8-3 or KST021)
    • Gene Description - A keyword search of the name or description of a gene. (e.g. 'ADP-ribosylarginine hydrolase' would be found with a full-name search or 'hydrolase', 'ADP', 'ribo', etc.)
    • Gene Symbol - The MGI gene symbol (e.g. Adam23 or Csad)

    Once you select how you would like to search for your cell line, enter the search term in the box labeled "Search Term".

    You can limit your search by chromosome, by selecting a chromosome number in the Chromosome field.

    You can sort your search by

    • Cell Line - Alphabetical order by cell line name.
    • Source - Alphabetical order by gene trap resource.
    • Chromosome - Ordered by chromosome number.
    • Gene Description - Alphabetical order by gene description according to MGI. If a gene description is not available from MGI, the gene description is retrieved from either Entrez Gene or Ensembl.
    • Gene Symbol - Alphabetical order by MGI gene symbol.
    • Status - Alphabetical by identification status (see below).

    You can also limit the search by the status of the cell line and the source of the cell line.

    To limit the cell lines by identification status, select one of the following status terms:

    • Localized+Transcript - Shows all cell lines where a single genomic locus has been identified by direct genome localization of the sequence tag derived from the cell line and a transcript matching the cell line sequence was found.
    • Localized Only - Shows all cell lines where a single genomic locus has been identified only by direct genome localization of the sequence tag derived from the cell line.
    • Transcript Only - Shows all cell lines where a single genomic locus has been identified only by genome localization of a full-length mRNA transcript related to the cell line.
    • Conflict - Shows all cell lines where the genomic loci identified by sequence tag localization and transcript localization do not overlap.
    • Unlocalized - Shows all cell lines where there is no genomic locus found for either the cell line sequence or any transcript associated with the cell line sequence. Although there is no localization, there may be mRNA transcripts found for an Unlocalized cell line.

    To limit the search by gene trap resource, select one of the following gene trap resources from the menu:

    Choose how many entries you want to show per page: 1,000, 5,000, or all entries. Once you have selected your parameters, click on the Search button to start your cell line search.
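
The identifier examples given for the gene search (accession, MGI ID, Entrez ID, Ensembl ID) follow recognizable patterns, so a search term can often be matched to the right search field before it is entered. The sketch below is a hypothetical helper using simplified patterns inferred from the examples in this tutorial; the patterns are assumptions, not the IGTC's own validation rules, and gene symbols or descriptions simply fall through to Keyword.

```python
import re

# Simplified patterns inferred from the tutorial's examples (assumptions).
PATTERNS = [
    ("MGI ID", re.compile(r"MGI:\d+")),               # e.g. MGI:1345162
    ("Ensembl ID", re.compile(r"ENSMUSG\d{11}")),     # e.g. ENSMUSG00000025964
    ("Accession", re.compile(r"[A-Z]{2}_\d+|[A-Z]{1,2}\d{5,6}")),  # NM_008255, AK005015
    ("Entrez ID", re.compile(r"\d+")),                # e.g. 23792
]

def guess_search_field(term):
    """Suggest which search field a term belongs in; default to Keyword."""
    term = term.strip()
    for field, pattern in PATTERNS:
        if pattern.fullmatch(term):
            return field
    return "Keyword"

for term in ["MGI:1345162", "ENSMUSG00000025964", "NM_008255", "23792", "hydrolase"]:
    print(term, "->", guess_search_field(term))
```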


Browse Biological Pathways or Gene Ontology categories

The IGTC, in collaboration with GenMAPP.org, has mapped gene trap data to sets of biological pathways and Gene Ontology (GO) terms. You can browse the MAPPs (Map Annotator and Pathway Profiler) and GO Pathways by selecting Biological Pathways from the Data Access menu.

Once you select "Biological Pathways", you can choose to view either MAPPs or Gene Ontology Pathways. Selecting MAPPs lets you view trapped genes in biological pathways. Selecting GO Pathways lets you view genes associated with GO terms.

When you look at the pathways, if a gene trap exists for a particular gene, the name of the cell line appears next to the gene. For example, in this portion of the Gene Ontology Term cell cycle MAPP, you can see the name of the gene trap (if available) appears next to the gene name. The colors indicate how many gene traps exist for that particular gene. See the legend for more detailed information.

Once you find the gene that you are interested in, click on the gene name to get more information. The gene annotation page will display information from MGI, SwissProt, Ensembl, Affymetrix, UniGene, RefSeq, Entrez Gene, and Gene Ontology. It will also have expression profile information. The Expression Profile section lists all the gene traps associated with the gene.

When you find the gene trap you would like to order, record the name. Search the IGTC for the cell line to get additional information from the cell line annotation page.


Search Expression Data

You can use the IGTC website to search for trapped genes that exhibit a desired expression profile in specific mouse tissues. The site supports two kinds of searches. The first is a search for genes that are upregulated/downregulated in a particular tissue. The second is a search for genes that have a specific expression level in a single tissue. Expression data is provided by the GNF SymAtlas project, and the search is designed to provide relative comparisons of expression levels, rather than analysis of statistical significance.

To begin a search of expression data, select Expression Search from the Data Access menu.

When the Search Form loads, in the Category field, select either 'Gene' or 'Cell Line'. A 'Gene' search will return genes associated with any expression data that meets your search criteria. Selecting 'Cell Line' will return cell lines that have been mapped to genes associated with expression data that meets your search criteria.

Next, choose the type of search you wish to perform.

Searching for upregulated/downregulated genes

Searching for genes that are upregulated/downregulated in a tissue allows the user to select the mouse tissue of interest, and the desired expression level for the trapped genes. The expression is calculated by comparing the expression level of each gene in the selected tissue with the median expression for the gene across all tissues. All genes that match the entered criteria are returned. By using this search, users can find trapped genes that are expressed in a tissue-specific manner.

To perform a search for upregulated/downregulated genes, select "Upregulated/downregulated" from the expression search page.
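
The comparison behind the upregulated/downregulated search can be sketched as a ratio of a gene's level in the chosen tissue to that gene's median level across all tissues. The code below is a minimal illustration with invented expression values, not the site's implementation. The function names are hypothetical, and treating "downregulated" as a ratio at or below 1/factor is an assumption.

```python
from statistics import median

def fold_vs_gene_median(levels_by_tissue, tissue):
    """Ratio of a gene's level in `tissue` to its median level across all tissues."""
    return levels_by_tissue[tissue] / median(levels_by_tissue.values())

def regulation(levels_by_tissue, tissue, factor):
    """Return 'up', 'down', or None for one gene in one tissue.

    Assumes positive expression levels and factor > 1.
    """
    ratio = fold_vs_gene_median(levels_by_tissue, tissue)
    if ratio >= factor:
        return "up"
    if ratio <= 1.0 / factor:  # assumption: mirror image of the 'up' rule
        return "down"
    return None

# Invented expression levels for a single gene, for illustration only.
levels = {"liver": 10.0, "brain": 20.0, "dorsal root ganglia": 400.0}
print(fold_vs_gene_median(levels, "dorsal root ganglia"))
print(regulation(levels, "dorsal root ganglia", 20))
```

Because the ratio is taken against the gene's own median, a gene that is highly expressed everywhere does not register as tissue-specific.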

Searching by expression level

You can also search for all genes that match a desired level in a single tissue. This search compares gene expression levels in the tissue of interest to the median expression for all genes in the selected tissue. This search is useful for finding genes expressed or not expressed in a selected tissue. Again, the analysis is based on relative expression and is not meant to indicate statistical significance.

To perform a search by expression level, select "Expressed at a set level" from the search type menu.

Next, select a tissue to search. Keep in mind that, although the microarray data is extensive, not all genes have been tested in all tissues.

Then, choose an expression profile for which to search. In order to normalize across different microarray data sets, all expression levels are measured in comparison to the median expression level for a tissue or gene. Choosing 'Above' will return any genes expressed at a level above the number you input times the median level for the tissue or gene. Likewise, choosing 'Below' will return genes expressed at a level lower than your input number times the median. The input number can be any positive real number.

Finally, choose how you would like the results to be sorted, and select the number of results per page to be returned.
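
The 'Above' and 'Below' options described above amount to comparing each gene's level with the input number times the median level in the selected tissue. The sketch below shows that comparison for the per-tissue case with invented values. The name search_by_level is hypothetical, and results are sorted by gene name only to give a stable order.

```python
from statistics import median

def search_by_level(levels_by_gene, direction, factor):
    """Return genes whose level is above or below factor * median in one tissue."""
    if factor <= 0:
        raise ValueError("factor must be a positive real number")
    # Median across all genes measured in the selected tissue.
    threshold = factor * median(levels_by_gene.values())
    if direction == "Above":
        return sorted(g for g, lvl in levels_by_gene.items() if lvl > threshold)
    if direction == "Below":
        return sorted(g for g, lvl in levels_by_gene.items() if lvl < threshold)
    raise ValueError("direction must be 'Above' or 'Below'")

# Invented expression levels in one tissue, for illustration only.
tissue_levels = {"gene_a": 5.0, "gene_b": 10.0, "gene_c": 50.0}
print(search_by_level(tissue_levels, "Above", 2))
print(search_by_level(tissue_levels, "Below", 1))
```

Input numbers below 1 are valid too: 'Below' with 0.5 would return genes expressed at less than half the tissue median.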

As an example, a search has been performed for genes that are upregulated or downregulated at 20 times the median level in dorsal root ganglia tissue, with the results sorted by gene description.

This search returned 13 genes in the IGTC database. The genes are organized by gene name or description, symbol and chromosome.

Viewing trapped genes that match the expression search criteria

Click on a gene name in the results ("fibroblast growth factor 1" in our example). This will connect you to the IGTC Gene Annotation page for that gene.

Toggling the arrows beside the Affymetrix Probe Sets and GNF Probesets (or selecting "Show All" under Additional Information) will display the Affymetrix or Novartis microarray chips containing the selected gene.

Clicking on the probe set name will connect you to the Affymetrix or Novartis website associated with that gene. As an example, the probe set "gnf1m11999_at" has been chosen, with the resulting Novartis page shown below. Please see the Novartis Frequently Asked Questions page for details regarding the use of the site.


Browse Ensembl or UCSC Genome Browsers

You can also browse for a gene trap cell line using either the Ensembl or the UCSC Genome Browsers.

  • Ensembl

    Go to http://www.ensembl.org, the Ensembl web site. Select the mouse genome. Once you are on the Mouse Genome Server page, shown below, either select a chromosome to browse through or enter a specific location on your chromosome of interest. A region on chromosome 13 has been entered as an example.

    To view IGTC gene traps on a region of the genome, scroll down to the Detailed view section, and select GeneTrap from the list of DAS sources as shown below.

    After selecting GeneTrap from DAS sources, click on the "close menu" option at the bottom of the menu. The page will automatically reload and the GeneTrap track will appear at the top of the Detailed view section.

    Here is the same location as shown previously with the GeneTrap track visible:

    In this example, there is one gene trap cell line available for this particular area of the genome. Click on that cell line (RRI328 in this example) and select "DAS LINK:Genetrap info" to get more information about the gene trap. It will take you to the annotation page of the IGTC resource that created the gene trap. You can also get more information on that gene trap by recording the cell line name and searching the IGTC for the cell line.

  • UCSC Genome Browser

    Go to http://genome.ucsc.edu, the UCSC Genome Browser web site. Select Custom Tracks from the side bar on the left. Currently, only the subset of IGTC sequence tags that originate from BayGenomics tags are represented. Once you are on the Custom Annotation Tracks page, scroll down to the Mouse Genome section, and look for the BayGenomics listing. Each chromosome is listed separately, but clicking on any of them will activate the BayGenomics custom track for the whole genome. As chromosome Y is the smallest, it will be the quickest to load. Pictured below is the UCSC mouse genome browser, showing the BayGenomics knockout sequence tags on the Y chromosome.
