What Is Meant by a SEED Function?
A functional role is a basic, undefined concept meaning roughly
"an atomic role that could be played by a gene". Sorry; we cannot make it more precise than that.
A function (also called a gene function) expresses
the connection between a PEG (protein-encoding gene) and one or more functional roles. In
addition, we have grafted on the ability to attach key-value pairs to
a function; these are treated as attributes of the gene.
In this document, we formulate a set of conventions that we will try
to use consistently. In some cases, our existing annotations already
violate these conventions, but not too seriously (we hope).
The Case of a Multi-domain, Multifunctional Gene
When a gene has a sequence of domains that play distinct functional
roles, we express the function of the gene as
Role1 / Role2 / ... / RoleN
That is, we use " / " as an operator separating distinct functional
roles. Note that the slash must have spaces on both sides. This
usage appears to be common in both Swiss-Prot and UniProt functions.
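A minimal Python sketch of the " / " convention, assuming a function is stored as a plain string. Because only a slash with a space on each side acts as the separator, a bare slash inside a role name is left alone, and the roles come back in domain order. The helper name `split_domains` and the example role names are hypothetical.

```python
from typing import List


def split_domains(function: str) -> List[str]:
    """Split a multi-domain function into its roles, one per domain, in order.

    Only " / " (a slash with a space on each side) separates roles;
    a bare "/" inside a role name is kept as part of that name.
    """
    return [role.strip() for role in function.split(" / ")]


print(split_domains("Role1 / Role2 / Role3"))
# ['Role1', 'Role2', 'Role3']
print(split_domains("Kinase A/B / Phosphatase C"))  # hypothetical role names
# ['Kinase A/B', 'Phosphatase C']
```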
The Case of a Single Domain Playing Multiple Roles
In many cases, a gene has broad specificity, and we
think of it as multifunctional. In this case, we use
Role1 @ Role2 @ ... @ RoleN
That is, we use " @ " as an operator separating distinct functional
roles. Note that the "@" must have spaces on both sides.
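Because the "@" must have spaces on both sides, a parser can treat any other "@" as a violation of the convention. The sketch below assumes (our assumption, not stated above) that "@" never occurs inside a role name; `split_multifunctional` is a hypothetical helper name.

```python
from typing import List


def split_multifunctional(function: str) -> List[str]:
    """Split a single-domain, multifunctional gene function on " @ ".

    Assumes "@" never appears inside a role name, so any "@" that is not
    written as " @ " is reported as a violation of the convention.
    """
    roles = function.split(" @ ")
    for role in roles:
        if "@" in role:
            raise ValueError(f"'@' must have spaces on both sides: {function!r}")
    return [role.strip() for role in roles]


print(split_multifunctional("Role1 @ Role2 @ Role3"))
# ['Role1', 'Role2', 'Role3']
```

With this check, a string such as "Role1@Role2" raises `ValueError` instead of being read silently as a single role.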
The Case of an Ambiguous Function
Consider a gene that is believed to be either a malate dehydrogenase
or a lactate dehydrogenase, or maybe both. In this case we need to
express that the gene has one of these functions, but that we cannot
tell which. To express this, we use
Role1; Role2; ...; RoleN
Note that we require a space following the semicolon, but not preceding
it. The distinction between " @ " and "; " is that with
the former we assert that all of the functional roles are present;
with the semicolon we say only that at least one of the listed roles
may be present, without committing to which (a much weaker notion).
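The difference between asserting all roles and asserting only one of them can be made concrete in code. This is a minimal sketch under our own assumptions: a function uses at most one kind of separator (the conventions above do not say how mixed separators combine, so the sketch rejects them), and the names `parse_roles` and `definitely_has_role` are hypothetical.

```python
from typing import List, Tuple

# Separators and what each one asserts about the listed roles.
SEPARATORS = {
    " / ": "all",  # multi-domain: every role is present, one per domain
    " @ ": "all",  # single domain, broad specificity: every role is present
    "; ": "any",   # ambiguous: the gene plays one (or more) of the roles
}


def parse_roles(function: str) -> Tuple[str, List[str]]:
    """Return (mode, roles) for a function using at most one separator kind."""
    found = [sep for sep in SEPARATORS if sep in function]
    if len(found) > 1:
        # Precedence between separators is not defined, so do not guess.
        raise ValueError(f"mixed separators in {function!r}")
    if not found:
        return "all", [function.strip()]
    sep = found[0]
    return SEPARATORS[sep], [role.strip() for role in function.split(sep)]


def definitely_has_role(function: str, role: str) -> bool:
    """True only if the function asserts that the gene plays this role."""
    mode, roles = parse_roles(function)
    return mode == "all" and role in roles


ambiguous = "Malate dehydrogenase; Lactate dehydrogenase"
print(parse_roles(ambiguous))
# ('any', ['Malate dehydrogenase', 'Lactate dehydrogenase'])
print(definitely_has_role(ambiguous, "Malate dehydrogenase"))  # False
print(definitely_has_role("Malate dehydrogenase @ Lactate dehydrogenase",
                          "Malate dehydrogenase"))  # True
```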
Attaching Key-Value Pairs in the Expression of Function
Consider the case in which we have a gene that includes an intein. To
be concrete, let's talk about
Translation initiation factor B ! intein-containing
Using " ! " allows one to attach a key-value pair to the feature.
For boolean key-value pairs, we allow the value to be omitted.
Thus, intein-containing is treated as a key
expressing a boolean condition; when it is used, it means "this gene
is intein-containing". Technically, when the value is omitted,
it defaults to 1.
When you actually want to append a key-value pair with a value other
than 1, use a " ^ " to separate the key and value. Thus,
Translation initiation factor C ! intein-containing ^ 3
attaches the key-value pair (intein-containing, 3) to the gene
for which the function is being asserted. The SEED includes other
mechanisms for attaching key-value pairs to features (adding key-value
pairs to files in the Attributes subdirectory), but these are not easily
used by annotators; they are intended for making "batch" assertions.
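The " ! " and " ^ " rules can be sketched as a small parser that separates the role expression from the attached pairs. Several assumptions here are ours, not the conventions': several " ! " clauses may follow one role expression, keys never contain "^", and values are kept as strings. The helper name `split_attributes` is hypothetical. A clause written as "K ^" with nothing after the caret comes back with an empty-string value.

```python
from typing import Dict, Tuple


def split_attributes(function: str) -> Tuple[str, Dict[str, str]]:
    """Separate the role expression from key-value pairs attached with " ! ".

    Each clause is either "key" (a boolean; the value defaults to "1")
    or "key ^ value".
    """
    parts = function.split(" ! ")
    roles, pairs = parts[0].strip(), {}
    for clause in parts[1:]:
        key, sep, value = clause.partition("^")
        # No caret means a boolean key, whose value defaults to "1".
        pairs[key.strip()] = value.strip() if sep else "1"
    return roles, pairs


print(split_attributes("Translation initiation factor B ! intein-containing"))
# ('Translation initiation factor B', {'intein-containing': '1'})
print(split_attributes("Translation initiation factor C ! intein-containing ^ 3"))
# ('Translation initiation factor C', {'intein-containing': '3'})
```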
This mechanism is purely a convenience for entering key-value
pairs. When the function of the gene is displayed, it does not
include the list of pairs; by convention, the protein page
used for displaying genes shows the attached key-value pairs as a
separate part of the display. These key-value pairs can be used to
express attributes such as essentiality, virulence, expression values,
and so forth.
Technical Note:
Key-value pairs asserted in gene functions override those from the
"batch sources". That is, a key has at most one value for a given
PEG, and that value comes from the assigned_functions file, if
present. Finally, the use of
Role ! K ^
has the effect of removing the key-value pair with a key of K.
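The override and removal rules can be sketched as a merge of one PEG's batch attributes with the pairs in its function. This is a minimal illustration, not how the SEED itself stores or reads the Attributes files or the assigned_functions file. The dictionary standing in for batch data, the helper name `effective_attributes`, and the example keys are all hypothetical, and we assume that "K ^" also removes a batch value for K. The parsing is repeated so that the example stands alone.

```python
from typing import Dict


def effective_attributes(batch: Dict[str, str], function: str) -> Dict[str, str]:
    """Combine one PEG's batch attributes with pairs asserted in its function.

    Pairs in the function override batch values for the same key (a key has
    at most one value per PEG); "K ^" with no value removes key K entirely.
    """
    result = dict(batch)  # copy so the caller's batch data is left untouched
    for clause in function.split(" ! ")[1:]:
        key, sep, value = clause.partition("^")
        key, value = key.strip(), value.strip()
        if sep and not value:
            result.pop(key, None)  # "K ^": remove the pair with key K
        else:
            result[key] = value if sep else "1"  # a bare key defaults to "1"
    return result


batch = {"essential": "1", "intein-containing": "2"}  # hypothetical batch values
print(effective_attributes(batch, "Translation initiation factor C ! intein-containing ^ 3"))
# {'essential': '1', 'intein-containing': '3'}
print(effective_attributes(batch, "Translation initiation factor C ! intein-containing ^"))
# {'essential': '1'}
```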