FIGfams

FIGfams are sets of protein sequences that are similar along the full length of the proteins. Proteins are thought of as implementing one or more abstract functional roles, and all of the members of a single FIGfam are believed to implement precisely the same set of functional roles.

The FIGfams are based on the subsystems view, in which the cell is composed of a set of functional subsystems, and each active variant of a subsystem is thought of as a set of functional roles. Proteins implement one or more functional roles. The shallow hierarchy imposed by subsystems is induced by grouping sets of functional roles.

The FIGfam effort may be thought of as constructing the infrastructure needed to automatically project the manual annotations maintained within the subsystem collection. The construction of FIGfams is based on forming protein-sets in cases in which it can reliably be asserted that sequences implement identical functions. Currently, there are three cases in which we place different sequences into the same protein-set: families constructed from subsystems, families constructed from closely related genomes, and families constructed by comparison of chromosomal context. The actual FIGfams are constructed by first using a set of rules to infer which pairs of genes must be placed in the same FIGfam, then forming the set of FIGfams as the maximum set of protein-sets consistent with those pairwise constraints. A post-processing step checks for the rare case in which a resulting FIGfam contains two protein sequences, each contained in a subsystem, that have differing functions. Such a case is evidence of an error in the curation of the relevant subsystems and is corrected manually.
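The construction step described above can be sketched in code. This is a minimal illustration, not the actual FIGfam pipeline: it assumes that "the maximum set of protein-sets consistent with the pairwise constraints" means the finest grouping in which every constrained pair ends up together, which is the set of connected components of the pair graph. The function names (build_families, conflicting_families), the protein ids, and the role labels are all hypothetical.

```python
from collections import defaultdict


def build_families(proteins, same_family_pairs):
    """Group proteins so that every constrained pair shares a family.

    Assumption: families are the connected components of the graph whose
    edges are the "must be in the same family" pairs (union-find).
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the representative, halving paths as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in same_family_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(members) for members in groups.values())


def conflicting_families(families, subsystem_function):
    """Post-processing check: report families whose subsystem-annotated
    members carry more than one distinct function."""
    conflicts = []
    for family in families:
        functions = {subsystem_function[p] for p in family if p in subsystem_function}
        if len(functions) > 1:
            conflicts.append((family, sorted(functions)))
    return conflicts


# Hypothetical ids and roles, for illustration only.
proteins = ["p1", "p2", "p3", "p4", "p5"]
pairs = [("p1", "p2"), ("p2", "p3")]
families = build_families(proteins, pairs)
conflicts = conflicting_families(families, {"p1": "role A", "p3": "role B"})
```

In this toy input, p1, p2 and p3 are merged into one family while p4 and p5 remain singletons. Because p1 and p3 carry different subsystem functions, the merged family is reported as a conflict, which corresponds to the case that the text says is corrected manually.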

The following graphic depicts the gene context of a case in which protein sequences from closely related Bacillus anthracis genomes belong to the same FIGfams because they share similar gene context and the same subsystems.

[Figure: Closely-related genomes]