The SEED: an Annotation/Analysis Tool Provided by FIG

SEED version cvs.1555556707 (3/17/2019 22:5:7) on

EGGS database: Essential Genes on Genome Scale

SEED maintains a database of microbial gene essentiality data obtained experimentally in published genome-scale gene essentiality screens (listed in Table 1). Comparative analysis of these data across multiple organisms, in the rich genomic, biochemical, and phylogenetic contexts provided by the collection of annotated Subsystems, greatly facilitates their interpretation and practical application, such as understanding cellular networks, gene and pathway discovery, identification of novel drug targets, and strain engineering.

EGGS contents and structure

Gene essentiality data in SEED are integrated as gene attribute key-value pairs (see Help on Attributes). Each attribute Key corresponds to a single experimental dataset generated under uniform genetic and environmental conditions (briefly outlined in Table 1; follow the links to the original publications for details). If two or more independent studies have been published for an organism (e.g., for E. coli, S. aureus, S. pneumoniae), several Attributes are associated with the corresponding genome in SEED. Note that gene essentiality assertions ('Values') obtained for the same gene in different experiments ('Keys') may differ and even contradict each other. In addition, several derived Keys were generated by merging:

To facilitate comparative analysis of gene essentiality data in SEED, the original heterogeneous essentiality assignments have been converted to a unified format: 'essential' (E) or 'nonessential' (N), with a default attribute 'undefined' (U) for all other genes. In several ambiguous cases the authors' notion of a 'possibly essential gene' has been retained.
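The conversion to the unified E/N/U format can be pictured with a minimal Python sketch. The raw labels in the mapping, the retained-label string, and the function names are assumptions made for illustration; the text does not list the vocabularies used in the original studies or how SEED stores a retained 'possibly essential' call.

```python
# Hypothetical raw labels; the original studies' vocabularies are not given in the text.
RAW_TO_UNIFIED = {
    "essential": "E",
    "lethal": "E",
    "nonessential": "N",
    "non-essential": "N",
    "dispensable": "N",
}

# Assumed spelling of the authors' label that is retained rather than converted.
RETAINED_LABELS = {"possibly essential"}


def unify_call(raw_label):
    """Map one raw essentiality label to E or N; anything unrecognized becomes U."""
    if raw_label is None:
        return "U"
    label = raw_label.strip().lower()
    if label in RETAINED_LABELS:
        return label  # ambiguous calls keep the authors' wording
    return RAW_TO_UNIFIED.get(label, "U")


def unify_genome(gene_ids, raw_calls):
    """Return {gene_id: unified call} for every gene, defaulting to U when unscored."""
    return {gene_id: unify_call(raw_calls.get(gene_id)) for gene_id in gene_ids}
```

Genes that were never scored in an experiment simply fall through to U, which mirrors the default attribute described above.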

The notion of gene essentiality is meaningful only in the context of the specific environmental and genetic conditions under which it was surveyed (see Table 1). The specifics of the technology used to generate each dataset influence gene essentiality assessments as well. The important distinction between the techniques is whether each mutant grows clonally or in a mixed population. Although in both strategies gene 'essentiality' is deduced from the inability of a mutant cell to undergo a certain number of divisions, the passing threshold is much more stringent in mixed populations than in clonal studies. Thus, a mutant with substantially decreased fitness would be quickly selected against under the conditions of competitive outgrowth in planktonic culture, while it might still be capable of forming an isolated colony. In the EGGS database, E (essential gene) stands for 'essential for survival' in datasets generated via clonal outgrowth and 'essential for fitness' in datasets generated via population-based screens (Table 1).
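As an illustration of how the meaning of E depends on the outgrowth strategy, the following Python sketch looks up the strategy for a given attribute Key and returns the corresponding reading of the call. The Keys and strategies are copied from Table 1; the dictionary and function names are hypothetical and not part of SEED.

```python
# Reading of an 'E' call for each mutant outgrowth strategy, as described in the text.
E_MEANING = {
    "clones": "essential for survival",
    "population": "essential for fitness",
}

# A few attribute Keys and their outgrowth strategies, taken from Table 1.
OUTGROWTH_BY_KEY = {
    "BS_essential_Kobayashi": "clones",
    "EC_essential_Keio": "clones",
    "EC_contribute_to_fitness": "population",
    "HI_contribute_to_fitness_Akerley": "population",
    "MT_contribute_to_fitness_Rubin": "population",
}


def describe_call(key, value):
    """Translate a unified call (E, N, or U) for a given Key into plain words."""
    if value == "N":
        return "nonessential"
    if value != "E":
        return "undefined"
    return E_MEANING[OUTGROWTH_BY_KEY[key]]
```

For example, an E call under EC_essential_Keio reads as 'essential for survival', while an E call under EC_contribute_to_fitness reads as 'essential for fitness'.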

How to use EGGS database

I. Visualization and analysis of essentiality data in Subsystem context: Essentiality data can be visualized in the biochemical and phylogenetic contexts of a Subsystem (SS) spreadsheet. An analysis of this type, performed across 134 metabolic Subsystems, has been published in Current Opinion in Biotechnology.

To view essentiality assessments of genes in the context of SS spreadsheet click

II. Essentiality of individual genes can be viewed from a gene/protein (PEG) page. An Attributes link is available near the bottom of every PEG page. Activating this link opens a list of the attribute Keys associated with the gene or its protein product (see Help on Attributes), including gene essentiality. The 'Key' column lists all the experiments (gene essentiality datasets) in which this gene has been scored. The 'Value' column shows the essentiality assessments (at times contradictory) made in each experiment. Please note the specific environmental conditions and experimental details that might have influenced each essentiality call; they are outlined in Table 1 and specified in the original publications.
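Because the Values recorded for one gene under different Keys may contradict each other, it can help to summarize them before interpreting a gene's essentiality. The Python sketch below is one possible way to do that; the function name and the example calls are hypothetical, and the attribute list is assumed to have already been copied from a PEG page as (Key, Value) pairs.

```python
def summarize_calls(attributes):
    """Summarize essentiality calls for one gene.

    attributes: iterable of (key, value) pairs, where value is 'E', 'N', or 'U'.
    Returns 'undefined' if no experiment scored the gene as E or N,
    'consistent E' or 'consistent N' if all scored calls agree,
    and 'conflict' if both E and N occur.
    """
    scored = {value for _key, value in attributes if value in ("E", "N")}
    if not scored:
        return "undefined"
    if len(scored) == 1:
        return "consistent " + scored.pop()
    return "conflict"


# Hypothetical calls for a single E. coli gene; the Keys are from Table 1.
example = [
    ("EC_contribute_to_fitness", "E"),
    ("EC_essential_Blattner", "N"),
    ("EC_essential_Keio", "N"),
]
print(summarize_calls(example))
```

A 'conflict' result is not necessarily an error: a gene can be required for competitive fitness in a population screen and still be dispensable for colony formation in a clonal screen.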

III. To view a complete list of essential (E) or nonessential (N) genes, or all essentiality assignments (E, N, and U) produced in a specific experiment, open Table 1 and click on the number for the experiment of interest in one of the columns under Essentiality assessment: ORFs total, E, N, or U. The resulting output table(s) can be sorted by any column (by clicking on its heading) or searched by typing keywords into the search field provided.

Table 1. Genome-scale experimentally determined bacterial gene essentiality data-sets available in SEED

Organism | SEED genome ID | Experiment | Mutagenesis: strategy | Mutagenesis: mutation | Mutant outgrowth: strategy | Mutant outgrowth: environmental conditions | Essentiality assessment: ORFs total | N | E | U | Reference
BS, EC, HI, HP, MG, MT, PA, SA, SP, ST | | Essential_Gene_Sets_Bacterial | Combined nonredundant dataset; includes global gene essentiality data for 10 bacterial species (a single dataset per organism, labeled with a star (*) below)
M. genitalium | 243273.1 | *MG_essential_Hutchison_2006 | random | insertion | clones | Rich undefined medium SP4, 37°C, microaerobic growth in 5% CO2 | 482 | 100 | 382 | 0 | [14]
S. aureus N315 | 158879.1 | SA_essential_Ji | random | antisense RNA | clones | Rich undefined medium TSA, aerobic growth | 2,600 | n/a | 168^3 | n/a | [2]
S. aureus N315 | 158879.1 | SA_essential_Forsyth | random | antisense RNA | clones | Rich undefined medium LB + 0.2% glucose, 37°C, aerobic growth | 2,892 | n/a | 658^4 | n/a | [3]
S. aureus N315 | | *SA_essential_merged_Forsyth_and_Ji | A combined nonredundant dataset derived from the data obtained in two similar global gene essentiality screens in S. aureus [2, 3]
H. influenzae Rd | 71421.1 | *HI_contribute_to_fitness_Akerley | random | insertion | population | Rich undefined medium BHI, 37°C, aerobic growth | 1,657 | 602 | 670 | 385 | [5]
S. pneumoniae R6 | 171101.1 | SP_essential_Thanassi | targeted | insertion | clones | Rich undefined medium Todd-Hewitt, 37°C, microaerobic growth in 5% CO2 | 2,043 | n/a | 113^3 | 1,696 | [4]
S. pneumoniae R6 | 171101.1 | SP_essential_Song | targeted | deletion | clones | Rich undefined medium Todd-Hewitt, 37°C, microaerobic growth in 5% CO2 | 2,043 | 560 | 133^3 | 1,350 | [13]
S. pneumoniae R6 | 171101.1 | *SP_essential_merged | A combined nonredundant dataset derived from the data obtained in two similar global gene essentiality screens in S. pneumoniae [4, 13]
M. tuberculosis H37Rv | 83332.1 | *MT_contribute_to_fitness_Rubin | random | insertion | population | Rich defined medium OADC | 3,989 | 2,567 | 614 | 808 | [6]
B. subtilis 168 | 224308.1 | *BS_essential_Kobayashi | targeted | insertion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,105 | 3,830^5 | 271^5 | 4 | [7]
E. coli K-12 MG1655 | 83333.1 | EC_contribute_to_fitness | random | insertion | population | Rich undefined medium LB, 37°C, aerobic growth | 4,308 | 3,126 | 620 | 562 | [8]
E. coli K-12 MG1655 | 83333.1 | EC_essential_Blattner | targeted | insertion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,308 | 2,001 | n/a | n/a | [12]
E. coli K-12 BW25113 | 83333.1 | *EC_essential_Keio | targeted | deletion | clones | Rich undefined medium LB, 37°C, aerobic growth | 4,390 | 3,985 | 303 | 102 | [15]
P. aeruginosa PAO1 | 208964.1 | PA_candidate_essential_Jacobs^1 | random | insertion | clones | Rich undefined medium LB, room temp, aerobic growth | 5,570 | 4,783 | 787 | 0 | [9]
P. aeruginosa PAO1 | 208964.1 | *PA_essential_PA14_PAO1_Liberati^2 | random | insertion | clones | Rich undefined medium LB, aerobic growth | 5,688 | 4,469 | 335 | 884 | [16]
S. typhimurium LT2 | 99287.1 | *ST_essential_Knuth | random | insertion | clones | Rich undefined medium LB, 30°C, aerobic growth | 4,425 | n/a | 257^3 | n/a | [10]
H. pylori G27 | 85962.1 | *HP_candidate_essential_Salama^1 | random | insertion | population | Rich undefined medium HB, 37°C, microaerobic growth in 10% CO2 | 1,576 | 1,178 | 344 | 54 | [11]

The actual numbers of essential and nonessential genes in the EGGS database might differ slightly from those published in each original study. These discrepancies are due to automatic mapping of gene IDs, differences in ORF calling, and other potential errors, and will be gradually corrected through manual curation.
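One simple sanity check on the counts in Table 1 is that, for datasets reporting all three categories, N + E + U should equal the total number of ORFs. The Python sketch below runs that check on a few rows copied from Table 1 and also computes the fraction of genes called essential; the function names are hypothetical and the check is an illustration, not a SEED feature.

```python
# (Key, ORFs total, N, E, U) for several rows of Table 1 that report all counts.
ROWS = [
    ("MG_essential_Hutchison_2006", 482, 100, 382, 0),
    ("HI_contribute_to_fitness_Akerley", 1657, 602, 670, 385),
    ("MT_contribute_to_fitness_Rubin", 3989, 2567, 614, 808),
    ("EC_contribute_to_fitness", 4308, 3126, 620, 562),
    ("EC_essential_Keio", 4390, 3985, 303, 102),
]


def inconsistent_rows(rows):
    """Return the Keys whose N + E + U does not add up to the ORF total."""
    return [key for key, total, n, e, u in rows if n + e + u != total]


def essential_fraction(rows, key):
    """Fraction of all ORFs called essential (E) in the dataset named by key."""
    for row_key, total, _n, e, _u in rows:
        if row_key == key:
            return e / total
    raise KeyError(key)


print(inconsistent_rows(ROWS))  # an empty list means every row adds up
for key, *_ in ROWS:
    print(f"{key}: {essential_fraction(ROWS, key):.1%} of ORFs called E")
```

The fractions are only comparable between datasets generated with the same outgrowth strategy, since E means 'essential for survival' in clonal screens and 'essential for fitness' in population screens.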



Subsystems in SEED are developed and maintained by curators who aim to capture the current state of knowledge of specific biological processes (e.g. metabolic pathways or multipeptide complexes) in model species and to project this knowledge to other species via comparative genomics and metabolic reconstruction techniques (Overbeek et al., 2005). Populated subsystems are spreadsheets connecting relevant functional roles with annotated genes in hundreds of integrated genomes. Core metabolic subsystems often contain extensive notes and diagrams that help explain the topology of a subsystem and the variations in its implementation (functional variants) across a collection of diverse species. The SEED Subsystem collection is available here. Examples of about 50 subsystems are available here and are discussed in detail in (Overbeek et al., 2005).

The Subsystem (SS) spreadsheet is used in SEED as a framework for integrating various types of data organized as gene attributes (including essentiality, gene clustering on a chromosome, virulence, microarray data, relevant publications, etc.). Projecting experimentally determined essentiality assertions over the collection of subsystems in SEED opens new opportunities for data evaluation and functional interpretation: