Subsystems: the Notion

by Ross Overbeek

I believe that there is a need to understand precisely what is meant by the term subsystem. The notion of metabolic reconstructions was originally formulated by Evgeni Selkov. There was no precise definition of what was meant, but the essential ideas he advanced have profoundly affected what followed. Perhaps the first step was taken when several systems began presenting metabolic maps and connecting genes to enzymatic roles. In most cases, the connections were forged using EC numbers. The feeling was that it was adequate to group genes into small sets, each connected to an EC number (a set was needed in cases with multiple subunits or multiple copies of a single gene). It is clear to me that these early efforts were hugely productive and laid the foundation for what followed.

When we began the Project to Annotate 1000 Genomes (P1K), we introduced the notion of subsystems.

Definition: a subsystem is a collection of functional roles. A functional role is an atomic (undefined primitive) notion.

Loosely, it corresponds to the abstract function performed by a single protein sequence. For example,

are all functional roles used in existing subsystems.

We tend to think of "binding" genes to one or more functional roles as a central component of the annotation process. The functional role is an abstract Platonic concept; binding a gene to the functional role amounts to an assertion that the given gene implements the designated functional role.
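One way to picture a binding is as a simple (gene, role) assertion record, with the functional role kept as an opaque string because it is an undefined primitive. The sketch below is a minimal illustration under that assumption; the names `Binding` and `roles_of`, and the gene IDs in the usage, are hypothetical and not part of any existing SEED API. It also shows that a single gene may be bound to more than one role.

```python
from collections import defaultdict
from dataclasses import dataclass

# A functional role is an atomic, undefined primitive; here it is
# represented only by its text string (an assumption for illustration).
FunctionalRole = str


@dataclass(frozen=True)
class Binding:
    """Hypothetical record: an assertion that a gene implements a role."""
    gene_id: str
    role: FunctionalRole


def roles_of(bindings: list[Binding]) -> dict[str, set[FunctionalRole]]:
    """Group bindings by gene; a gene may be bound to several roles."""
    result: dict[str, set[FunctionalRole]] = defaultdict(set)
    for b in bindings:
        result[b.gene_id].add(b.role)
    return dict(result)


# Usage with hypothetical gene IDs: "gene1" stands for a bifunctional
# protein bound to two roles, "gene2" for a protein bound to one.
bindings = [
    Binding("gene1", "Aspartokinase (EC 2.7.2.4)"),
    Binding("gene1", "Homoserine dehydrogenase (EC 1.1.1.3)"),
    Binding("gene2", "Aspartokinase (EC 2.7.2.4)"),
]
by_gene = roles_of(bindings)
```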

Definition: a subsystem spreadsheet is a spreadsheet in which the rows are genomes, the columns are functional roles, and each cell includes IDs of genes/gene products that are believed to implement the functional role in the specific genome.

The spreadsheet represents precisely which genes in which organisms implement each of the functional roles.

Definition: a populated subsystem is a subsystem plus a subsystem spreadsheet.
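The three definitions above map naturally onto a small data structure: the subsystem fixes the columns, and the spreadsheet fills in, for each genome (row), the set of gene IDs in each column. The following is a minimal sketch under that reading; the class and method names, the genome and gene IDs, and the choice of example roles are all hypothetical and serve only to illustrate the definitions. A set per cell allows for multiple copies of a gene implementing the same role.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical: a named collection of functional roles (the columns)."""
    name: str
    roles: tuple[str, ...]


@dataclass
class PopulatedSubsystem:
    """Hypothetical: a subsystem plus its spreadsheet.

    The spreadsheet maps genome -> role -> set of gene IDs believed to
    implement that role in that genome (rows are genomes, columns are roles).
    """
    subsystem: Subsystem
    spreadsheet: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def add_gene(self, genome: str, role: str, gene_id: str) -> None:
        # Only roles that belong to the subsystem can be columns.
        if role not in self.subsystem.roles:
            raise ValueError(f"{role!r} is not a role of {self.subsystem.name}")
        # Create the genome's row with one empty cell per role on first use.
        row = self.spreadsheet.setdefault(
            genome, {r: set() for r in self.subsystem.roles}
        )
        row[role].add(gene_id)

    def cell(self, genome: str, role: str) -> set[str]:
        # An absent row or column is reported as an empty cell.
        return self.spreadsheet.get(genome, {}).get(role, set())


# Usage: two genes (e.g. two copies) implementing the same role in one genome.
sdh = Subsystem(
    "Succinate dehydrogenase",
    (
        "Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1)",
        "Succinate dehydrogenase iron-sulfur protein (EC 1.3.5.1)",
    ),
)
ps = PopulatedSubsystem(sdh)
ps.add_gene("genome_1", sdh.roles[0], "gene_1")
ps.add_gene("genome_1", sdh.roles[0], "gene_2")
```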

So, how does this differ from older systems like PUMA, WIT, KEGG, ERGO, MetaCyc, and PUMA-2? There are two basic differences:
  1. The primary difference is the type of thing to which genes are bound. In the case of these early systems, genes are bound to reactions or enzymatic functions (this is not completely true anymore, but in essence we believe that it still holds in most cases). In the case of subsystems, genes are bound to functional roles. This gives subsystems more precision. In the case of a complex, there will usually be distinct functional roles for each subunit. In the case of nonenzymatic processes, one has complete freedom in formulating the text string to represent the functional role.
  2. The second primary difference relates to diagrams. In the case of these earlier systems, diagrams played a central role. In effect, genes were bound to objects in a diagram. In the case of subsystems, there is normally an associated diagram, and that diagram usually includes an object corresponding to each functional role; however, there is no need to supply a diagram for the subsystem. The thing to which genes are bound is the abstract notion of a functional role.
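To make the first difference concrete, consider the subunits of a complex that share a single EC number. The sketch below uses hypothetical gene IDs and illustrative role names for two succinate dehydrogenase subunits (both catalogued under EC 1.3.5.1); the helper `group_by` is likewise hypothetical. Grouping by EC number merges both subunits into one set, while grouping by functional role keeps them distinct.

```python
from collections import defaultdict

# Hypothetical bindings for one genome: (gene ID, functional role, EC number).
# The two roles are distinct subunits of one complex sharing an EC number.
BINDINGS = [
    ("gene_sdhA", "Succinate dehydrogenase flavoprotein subunit", "1.3.5.1"),
    ("gene_sdhB", "Succinate dehydrogenase iron-sulfur protein", "1.3.5.1"),
]


def group_by(bindings, key_index):
    """Collect gene IDs under the chosen key (1 = role, 2 = EC number)."""
    groups = defaultdict(set)
    for binding in bindings:
        groups[binding[key_index]].add(binding[0])
    return dict(groups)


by_ec = group_by(BINDINGS, 2)    # one EC key holding both subunit genes
by_role = group_by(BINDINGS, 1)  # one key per subunit role
```

The EC-keyed view cannot say which gene is which subunit; the role-keyed view can.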


We believe that the notion of subsystem is a significant advance. By providing the additional precision offered by functional roles, it becomes possible to separate things like Different subsystems might include either or both of these functional roles.