Subsystems: the Notion
by Ross Overbeek
I believe that there is a need to understand precisely what is meant by the term subsystem.
The notion of metabolic reconstructions was originally
formulated by Evgeni Selkov. There was no precise definition of what
was meant, but the essential ideas he advanced have profoundly
affected what followed. Perhaps the first step was made when several
systems started putting up metabolic maps and connecting genes to the
enzymatic roles. In most cases, the connections were forged using EC
numbers. The feeling was that it was adequate to group the genes
into small sets connected to an EC number (in cases in which there were
multiple subunits or multiple copies of a single gene). It is clear
to me that these early efforts were hugely productive and laid the
foundation for what followed.
When we began the Project to Annotate 1000 Genomes (P1K), we
introduced the notion of subsystems.
Definition: a subsystem is a collection of functional roles. A
functional role is an atomic (undefined primitive) notion. Loosely,
it corresponds to the abstract function performed by a single protein
sequence. For example,
- Holliday junction DNA helicase RuvB
- 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7)
- ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92)
- SSU ribosomal protein S1p
are all functional roles used in existing subsystems.
We tend to think of "binding" genes to one or more functional roles as
a central component of the annotation process. The functional role is
an abstract platonic concept; binding a gene to the functional role
amounts to an assertion that the given gene implements the designated
functional role.
Definition: a subsystem spreadsheet is a spreadsheet in which the
rows are genomes, the columns are functional roles, and each cell
includes IDs of genes/gene products that are believed to implement the
functional role in the specific genome.
The spreadsheet represents precisely which genes in which organisms
implement each of the functional roles.
Definition: a populated subsystem is a subsystem plus a subsystem
spreadsheet.
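As a rough illustration, the three definitions above could be modeled as follows. This is a minimal Python sketch, not the representation used by any actual system; the class names, the bind and cell helpers, and the subsystem, genome, and gene identifiers are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class FunctionalRole:
    """An atomic functional role, identified only by its text string."""
    name: str


@dataclass
class Subsystem:
    """A subsystem is simply a collection of functional roles."""
    name: str
    roles: List[FunctionalRole]


@dataclass
class PopulatedSubsystem:
    """A subsystem plus its spreadsheet.

    Rows are genomes, columns are functional roles, and each cell holds
    the IDs of the genes believed to implement that role in that genome.
    """
    subsystem: Subsystem
    spreadsheet: Dict[str, Dict[FunctionalRole, Set[str]]] = field(default_factory=dict)

    def bind(self, genome: str, role: FunctionalRole, gene_id: str) -> None:
        """Assert that gene_id implements role in the given genome."""
        if role not in self.subsystem.roles:
            raise ValueError(f"{role.name!r} is not a role of this subsystem")
        self.spreadsheet.setdefault(genome, {}).setdefault(role, set()).add(gene_id)

    def cell(self, genome: str, role: FunctionalRole) -> Set[str]:
        """Return the gene IDs in one spreadsheet cell (empty if none)."""
        return self.spreadsheet.get(genome, {}).get(role, set())


# Hypothetical subsystem, genome, and gene ID, for illustration only.
ruvb = FunctionalRole("Holliday junction DNA helicase RuvB")
s1p = FunctionalRole("SSU ribosomal protein S1p")
populated = PopulatedSubsystem(Subsystem("Example subsystem", [ruvb, s1p]))
populated.bind("genome_A", ruvb, "gene_0001")
```

Note that a cell holds a set of gene IDs rather than a single ID, since a genome may contain several genes implementing the same functional role, or none at all.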
So, how does this differ from older systems like PUMA, WIT, KEGG,
ERGO, MetaCyc, and PUMA-2? There are two basic differences:
- The primary difference is the type of thing to which genes are
bound. In the case of these early systems, genes are bound to
reactions or enzymatic functions (this is not completely true anymore,
but in essence we believe that it still holds in most cases). In the
case of subsystems, genes are bound to functional roles, which allows
more precision. In the case of a complex, there will usually be
distinct functional roles for each subunit. In the case of
nonenzymatic processes, one has complete freedom in formulating the
text string to represent the functional role.
- The second difference relates to diagrams. In the case
of these earlier systems, diagrams played a central role. In effect,
genes were bound to objects in a diagram. In the case of subsystems,
there is normally an associated diagram, and that diagram usually
includes an object corresponding to a functional role; however, there
is no need to supply a diagram for the subsystem -- the thing to which
genes are bound is an abstract notion of functional role.
We believe that the notion of subsystem is a significant advance. By
providing the additional precision offered by functional roles, it
becomes possible to separate things like
- Threonine dehydratase, catabolic (EC 4.3.1.19) and
- Threonine dehydratase (EC 4.3.1.19)
Different subsystems might include either or both of these functional
roles.
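The distinction can be made concrete with a small Python sketch. It assumes, purely for illustration, that the EC number can be pulled out of the role string with a simple regular expression (incomplete EC numbers containing dashes are ignored here); the helper name ec_number is hypothetical. Grouping by EC number collapses the two roles into one entry, while treating the role strings themselves as the unit keeps them apart.

```python
import re
from collections import defaultdict

# Simplifying assumption: a fully specified EC number in parentheses.
EC_PATTERN = re.compile(r"\(EC (\d+\.\d+\.\d+\.\d+)\)")

roles = [
    "Threonine dehydratase, catabolic (EC 4.3.1.19)",
    "Threonine dehydratase (EC 4.3.1.19)",
]


def ec_number(role):
    """Return the EC number embedded in a role string, or None if absent."""
    match = EC_PATTERN.search(role)
    return match.group(1) if match else None


# Binding genes to EC numbers: both roles fall into the same group.
by_ec = defaultdict(list)
for role in roles:
    by_ec[ec_number(role)].append(role)

# Binding genes to functional roles: the two roles stay distinct.
distinct_roles = set(roles)

print(len(by_ec), "EC group;", len(distinct_roles), "functional roles")
```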