The SEED: an Annotation/Analysis Tool Provided by FIG
Gene essentiality data in SEED are integrated as gene attribute key-value pairs (see Help on Attributes). Each attribute Key corresponds to a single experimental dataset generated under uniform genetic and environmental conditions (briefly outlined in Table 1; follow the links to the original publications for details). If two or more independent studies have been published for an organism (e.g., for E. coli, S. aureus, S. pneumoniae), several Attributes are associated with the corresponding genome in SEED. Note that gene essentiality assertions ('Values') obtained for the same gene in different experiments ('Keys') may differ and even contradict each other. In addition, several derived Keys were generated by merging:
To facilitate comparative analysis of gene essentiality data in SEED, the original heterogeneous essentiality assignments have been converted to a unified format: 'essential' (E) or 'nonessential' (N), with a default attribute 'undefined' (U) for all other genes. In several ambiguous cases the authors' notion of a 'possibly essential gene' has been retained.
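The conversion to the unified format can be illustrated with a short Python sketch. It is not the SEED implementation: the raw labels in `RAW_TO_UNIFIED` and the function names `unify_call` and `unify_dataset` are hypothetical, chosen only for illustration. Only the target codes E, N, U and the retained 'possibly essential' notion come from the text above.

```python
# Hypothetical raw labels; each publication uses its own vocabulary.
RAW_TO_UNIFIED = {
    "essential": "E",
    "lethal": "E",
    "nonessential": "N",
    "dispensable": "N",
}

# Ambiguous author calls that are retained as-is rather than forced into E or N.
RETAINED = {"possibly essential"}


def unify_call(raw_label):
    """Map one publication-specific label to E, N, a retained label, or U."""
    if raw_label is None:
        return "U"
    label = raw_label.strip().lower()
    if label in RETAINED:
        return label
    return RAW_TO_UNIFIED.get(label, "U")


def unify_dataset(raw_calls, all_genes):
    """Return {gene: code} for every gene in the genome, defaulting to U."""
    unified = {gene: "U" for gene in all_genes}
    for gene, label in raw_calls.items():
        if gene in unified:
            unified[gene] = unify_call(label)
    return unified
```

Genes that were never scored in an experiment keep the default U, which is why the function starts from the full gene list rather than from the scored genes.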
The notion of gene essentiality is meaningful only in the context of the specific environmental and genetic conditions under which it was surveyed (see Table 1). The specifics of the technology used to generate each dataset influence gene essentiality assessments as well. The important distinction between the techniques is whether each mutant grows clonally or in a mixed population. Although in both strategies gene 'essentiality' is deduced from the inability of a mutant cell to undergo a certain number of divisions, the passing threshold is much more stringent in mixed populations than in clonal studies. Thus, a mutant with substantially decreased fitness would be quickly selected against under the conditions of competitive outgrowth in planktonic culture, while it might still be capable of forming an isolated colony. In the EGGS database, E (essential gene) stands for 'essential for survival' in datasets generated via clonal outgrowth and 'essential for fitness' in datasets generated via population screens (Table 1).
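As a reminder of how the meaning of E depends on the outgrowth strategy, here is a minimal Python sketch. The outgrowth values for the four Keys are copied from Table 1; the dictionary and function names (`OUTGROWTH`, `E_MEANING`, `meaning_of_e`) are hypothetical and not part of SEED.

```python
# Mutant outgrowth strategy for a few attribute Keys, copied from Table 1.
OUTGROWTH = {
    "MG_essential_Hutchison_2006": "clones",
    "HI_contribute_to_fitness_Akerley": "population",
    "MT_contribute_to_fitness_Rubin": "population",
    "EC_essential_Keio": "clones",
}

# How an E call should be read for each outgrowth strategy.
E_MEANING = {
    "clones": "essential for survival",
    "population": "essential for fitness",
}


def meaning_of_e(key):
    """Return the interpretation of an E call for the given attribute Key."""
    return E_MEANING[OUTGROWTH[key]]
```

For example, `meaning_of_e("HI_contribute_to_fitness_Akerley")` yields 'essential for fitness', because that dataset comes from a population screen.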
I. Visualization and analysis of essentiality data in Subsystem context: Essentiality data can be visualized in the biochemical and phylogenetic contexts of a Subsystem (SS) spreadsheet. This type of analysis, performed across 134 metabolic Subsystems, has been published in Current Opinion in Biotechnology.
To view essentiality assessments of genes in the context of SS spreadsheet click
II. Essentiality of individual genes can be viewed from a gene/protein (PEG) page. An Attributes link is available near the bottom of every PEG page. Activating this link opens a list of the attribute Keys associated with the gene or its protein product (see Help on Attributes), including gene essentiality. The 'Key' column lists all the experiments (gene essentiality datasets) in which this gene has been scored. The 'Value' column shows the essentiality assessments made in each experiment, which at times contradict each other. Please note the specific environmental conditions and experimental details that might have influenced each essentiality call, outlined in Table 1 and specified in the original publication.
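Contradictory calls for one gene can be spotted programmatically once its Key/Value pairs are in hand. The Python sketch below assumes the attributes have already been retrieved as (Key, Value) tuples; how they are retrieved is outside its scope. The function name `find_conflict` is hypothetical, and the Values in `example` are invented for illustration, not real data for any gene.

```python
def find_conflict(attributes):
    """Collect E/N calls from (key, value) pairs and flag contradictions.

    Returns (calls, is_conflicting), where calls maps each Key with a
    definite call (E or N) to its Value. U values are ignored.
    """
    calls = {key: value for key, value in attributes if value in ("E", "N")}
    is_conflicting = {"E", "N"} <= set(calls.values())
    return calls, is_conflicting


# Invented Values for illustration only; the Keys are taken from Table 1.
example = [
    ("EC_essential_Keio", "N"),
    ("EC_contribute_to_fitness", "E"),
]
calls, conflicting = find_conflict(example)
```

A conflict of this kind is not necessarily an error: a clonal screen and a population screen can legitimately disagree about the same gene, so the experimental conditions in Table 1 should be checked before discarding either call.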
III. To view a complete list of essential (E) or nonessential (N) genes, as well as all essentiality assignments (E, N, and U) produced in a specific experiment, open Table 1 and click on a number (corresponding to the experiment of interest) that appears in one of the columns: Essentiality assessment: ORFs total, E, N, or U. The resultant output table(s) can be sorted by any of the columns (by clicking on a heading) or searched by typing key words into a search field provided.
| Organism | SEED genome ID | Experiment | Mutagenesis strategy | Mutation | Mutant outgrowth strategy | Environmental conditions | ORFs total | N | E | U | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BS, EC, HI, HP, MG, MT, PA, SA, SP, ST | | Essential_Gene_Sets_Bacterial | Combined nonredundant dataset; includes global gene essentiality data for 10 bacterial species (a single dataset per organism, marked with a star (*) below) | | | | | | | | |
| M. genitalium | 243273.1 | *MG_essential_Hutchison_2006 | random | insertion | clones | Rich undefined medium SP4, 37°C, microaerobic growth in 5% CO2 | 482 | 100 | 382 | 0 | [14] |
| S. aureus N315 | 158879.1 | SA_essential_Ji | random | antisense RNA | clones | Rich undefined medium TSA, aerobic growth | 2,600 | n/a | 168³ | n/a | [2] |
| S. aureus N315 | 158879.1 | SA_essential_Forsyth | random | antisense RNA | clones | Rich undefined medium LB+0.2% glucose, 37°C, aerobic growth | 2,892 | n/a | 658⁴ | n/a | [3] |
| S. aureus N315 | 158879.1 | *SA_essential_merged_Forsyth_and_Ji | A combined nonredundant dataset derived from the data obtained in two similar global gene essentiality screens in S. aureus [2, 3] | | | | | | | | |
| H. influenzae Rd | 71421.1 | *HI_contribute_to_fitness_Akerley | random | insertion | population | Rich undefined medium BHI, 37°C, aerobic growth | 1,657 | 602 | 670 | 385 | [5] |
| S. pneumoniae R6 | 171101.1 | SP_essential_Thanassi | targeted | insertion | clones | Rich undefined medium Todd-Hewitt, 37°C, microaerobic growth in 5% CO2 | 2,043 | n/a | 113³ | 1,696 | [4] |
| S. pneumoniae R6 | 171101.1 | SP_essential_Song | targeted | deletion | clones | Rich undefined medium Todd-Hewitt, 37°C, microaerobic growth in 5% CO2 | 2,043 | 560 | 133³ | 1,350 | [13] |
| S. pneumoniae R6 | 171101.1 | *SP_essential_merged | A combined nonredundant dataset derived from the data obtained in two similar global gene essentiality screens in S. pneumoniae [4, 13] | | | | | | | | |
| M. tuberculosis H37Rv | 83332.1 | *MT_contribute_to_fitness_Rubin | random | insertion | population | Rich defined medium OADC | 3,989 | 2,567 | 614 | 808 | [6] |
| B. subtilis 168 | 224308.1 | *BS_essential_Kobayashi | targeted | insertion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,105 | 3,830⁵ | 271⁵ | 4 | [7] |
| E. coli K-12 MG1655 | 83333.1 | EC_contribute_to_fitness | random | insertion | population | Rich undefined medium LB, 37°C, aerobic growth | 4,308 | 3,126 | 620 | 562 | [8] |
| E. coli K-12 MG1655 | 83333.1 | EC_essential_Blattner | targeted | insertion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,308 | 2,001 | n/a | n/a | [12] |
| E. coli K-12 BW25113 | 83333.1 | *EC_essential_Keio | targeted | deletion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,390 | 3,985 | 303 | 102 | [15] |
| P. aeruginosa PAO1 | 208964.1 | PA_candidate_essential_Jacobs1 | random | insertion | clones | Rich undefined medium LB, room temp, aerobic growth | 5,570 | 4,783 | 787 | 0 | [9] |
| P. aeruginosa PAO1 | 208964.1 | *PA_essential_PA14_PAO1_Liberati2 | random | insertion | clones | Rich undefined medium LB, aerobic growth | 5,688 | 4,469 | 335² | 884 | [16] |
| S. typhimurium LT2 | 99287.1 | *ST_essential_Knuth | random | insertion | clones | Rich undefined medium LB, 30°C, aerobic growth | 4,425 | n/a | 257³ | n/a | [10] |
| H. pylori G27 | 85962.1 | *HP_candidate_essential_Salama1 | random | insertion | population | Rich undefined medium HB, 37°C, microaerobic growth in 10% CO2 | 1,576 | 1,178 | 344 | 54 | [11] |
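For datasets that report all three counts, the N, E, and U columns should add up to the ORFs total. The Python sketch below checks this for a handful of rows copied from Table 1 and also computes the fraction of ORFs called essential. The names `ROWS`, `check_row`, and `essential_fraction` are hypothetical, and rows with 'n/a' entries are represented with `None` and skipped.

```python
# (Key, ORFs total, N, E, U) for selected rows of Table 1; None stands for n/a.
ROWS = [
    ("MG_essential_Hutchison_2006", 482, 100, 382, 0),
    ("HI_contribute_to_fitness_Akerley", 1657, 602, 670, 385),
    ("MT_contribute_to_fitness_Rubin", 3989, 2567, 614, 808),
    ("EC_contribute_to_fitness", 4308, 3126, 620, 562),
    ("EC_essential_Blattner", 4308, 2001, None, None),
    ("EC_essential_Keio", 4390, 3985, 303, 102),
]


def check_row(total, n, e, u):
    """Return True/False if N + E + U can be compared with the total, else None."""
    if None in (n, e, u):
        return None
    return n + e + u == total


def essential_fraction(total, e):
    """Fraction of ORFs called essential, or None if E is not reported."""
    if e is None:
        return None
    return e / total


results = {key: check_row(total, n, e, u) for key, total, n, e, u in ROWS}
fractions = {key: essential_fraction(total, e) for key, total, n, e, u in ROWS}
```

The essential fractions differ widely between datasets, which reflects the differences in organism, conditions, and outgrowth strategy listed in the table rather than a single comparable quantity.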
Subsystems in SEED are developed and maintained by curators who aim to capture the current state of knowledge of specific biological processes (e.g., metabolic pathways or multipeptide complexes) in model species and to project this knowledge to other species via comparative genomics and metabolic reconstruction techniques (Overbeek et al., 2005). Populated subsystems are spreadsheets connecting relevant functional roles with annotated genes in hundreds of integrated genomes. Core metabolic subsystems often contain extensive notes and diagrams that help explain the topology of a subsystem and the variations in its implementation (functional variants) across a collection of diverse species. The SEED Subsystem collection is available here. Examples of about 50 subsystems are available here and are discussed in detail in Overbeek et al. (2005).
The Subsystem (SS) spreadsheet is used in SEED as a framework for integrating various types of data organized as gene attributes (including essentiality, gene clustering on a chromosome, virulence, microarray data, relevant publications, etc.). Projecting experimentally determined essentiality assertions over a collection of subsystems in SEED opens new opportunities for data evaluation and functional interpretation: