What Is Meant by a SEED Function?

A functional role is a basic, undefined concept meaning roughly "an atomic role that could be played by a gene". Sorry.

A function (or, alternatively, a gene function) expresses the connection between a PEG (protein-encoding gene) and one or more functional roles. In addition, we have grafted on the ability to attach key-value pairs to a function; these are thought of as attributes of the gene.

In this document, we formulate a set of conventions that we will try to use consistently. In some cases, we have historically already violated these conventions, but not too seriously (we hope).

The Case of a Multi-domain, Multifunctional Gene

When a gene has a sequence of domains that play distinct functional roles, we express the function of the gene as
     Role1 / Role2 / ... / RoleN

That is, we use " / " as an operator separating distinct functional roles. Note that the slash must have spaces on both sides. This seems to be a common usage that we have noticed in both Swissprot and UniProt functions.
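As a rough illustration of why the surrounding spaces matter, here is a minimal Python sketch of splitting a function into its roles. The helper name split_domain_roles and the sample role "Na+/H+ antiporter" are made up for this example; they are not part of the SEED code.

```python
def split_domain_roles(function):
    """Split a function string on ' / ' (a slash with a space on each side)."""
    # A slash without surrounding spaces is treated as part of a role name.
    return [role.strip() for role in function.split(" / ")]


roles = split_domain_roles("Na+/H+ antiporter / Role2")
print(roles)
```

Because only the spaced form acts as an operator, the slash inside the first role name is left alone and the string yields two roles.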

The Case of a Single Domain Playing Multiple Roles

In many cases we have a gene with a broad specificity, and we think of the gene as multifunctional. In this case, we use
     Role1 @ Role2 @ ... @ RoleN

That is, we use " @ " as an operator separating distinct functional roles. Note that the "@" must have spaces on both sides.

The Case of an Ambiguous Function

Consider a gene that is believed to be either a malate dehydrogenase or a lactate dehydrogenase, or maybe both. In this case we need to express that the gene has at least one of these functions, but that we cannot disambiguate them. To express this, we use
     Role1; Role2; ...; RoleN

Note that we require a space following the semicolon, but not preceding it. The distinction between the use of " @ " and "; " is that with the former we are asserting the presence of all of the functional roles; with the semicolon we are simply saying that any of the functional roles may or may not be present (a much weaker notion).
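The three operators can be told apart mechanically. The following Python sketch shows one way to do so; the function name interpret_function and the labels "domains", "all", and "ambiguous" are our own inventions for illustration. It assumes that a function uses at most one kind of operator, since these conventions do not say how mixed operators would be read, and it ignores any attached key-value pairs.

```python
# Each operator paired with a label for the meaning given in the text.
OPERATORS = [
    (" / ", "domains"),    # distinct roles played by separate domains
    (" @ ", "all"),        # one domain; every listed role is asserted
    ("; ", "ambiguous"),   # any listed role may or may not be present
]


def interpret_function(function):
    """Return (kind, roles), assuming at most one operator kind is used."""
    for separator, kind in OPERATORS:
        if separator in function:
            return kind, [role.strip() for role in function.split(separator)]
    return "single", [function.strip()]


kind, roles = interpret_function("Malate dehydrogenase; Lactate dehydrogenase")
print(kind, roles)
```

A caller that needs certainty would trust the roles only when the kind is "domains", "all", or "single"; for "ambiguous" the roles are merely candidates.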

Attaching Key-Value Pairs in the Expression of Function

Consider the case in which we have a gene that includes an intein. To be concrete, let's talk about
     Translation initiation factor B ! intein-containing

Using " ! " allows one to attach a key-value pair to the feature. For boolean key-value pairs, we allow omission of the value (thus, intein-containing is thought of as a key expressing a boolean condition; when it is used, it means "this gene is intein-containing"). Technically, when the value is omitted, it defaults to 1. When you actually want to append a key-value pair with a value other than 1, use a " ^ " to separate the key and value. Thus,
     Translation initiation factor C ! intein-containing ^ 3

attaches the key-value pair (intein-containing,3) to the gene for which the function is being asserted. The SEED includes other mechanisms for attaching key-value pairs to features (adding key-value pairs to files in the Attributes subdirectory), but these are not easily used by annotators; they are for making "batch" assertions.
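To make the syntax concrete, here is a small Python sketch that separates the role text from the attached pairs. The name parse_attributes is hypothetical, values are kept as strings, and we assume that several pairs may be chained by repeating " ! "; the conventions here show only a single pair.

```python
def parse_attributes(function):
    """Split 'Role ! key ^ value' into (role_text, {key: value})."""
    role_text, *pair_texts = function.split(" ! ")
    attributes = {}
    for pair_text in pair_texts:
        key, separator, value = pair_text.partition(" ^ ")
        # An omitted value defaults to 1, as described above.
        attributes[key.strip()] = value.strip() if separator else "1"
    return role_text.strip(), attributes


print(parse_attributes("Translation initiation factor B ! intein-containing"))
print(parse_attributes("Translation initiation factor C ! intein-containing ^ 3"))
```

The first call gives the key intein-containing the default value 1; the second gives it the value 3.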

This mechanism is used solely as a convenience for inputting key-value pairs. When the function of the gene is displayed, it will not include the list of pairs. By convention, the protein page used for displaying genes gives the attached key-value pairs as a separate part of the display. These key-value pairs can be used to express attributes like essentiality, virulence, expression values, and so forth.

Technical Note: the assertion of key-value pairs in gene functions overrides the "batch sources". That is, a key has at most one value for a given PEG, and that value will come from the assigned_functions file, if present. Finally, the use of

     Role ! K ^

has the effect of removing the key-value pair with a key of K.
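
The override and removal rules can be sketched in Python as follows. This is only an illustration under stated assumptions: apply_function_attributes is a made-up name, the batch pairs are modeled as a plain dictionary, the key "essential" is invented for the example, and keys are assumed never to contain a "^" character.

```python
def apply_function_attributes(batch_attributes, function):
    """Merge pairs asserted in a function string over batch-sourced pairs."""
    merged = dict(batch_attributes)  # copy, so the batch data is not modified
    _role, *pair_texts = function.split(" ! ")
    for pair_text in pair_texts:
        key, caret, value = pair_text.partition("^")
        key, value = key.strip(), value.strip()
        if caret and not value:
            merged.pop(key, None)        # 'K ^' with no value removes key K
        else:
            merged[key] = value or "1"   # an omitted value defaults to 1
    return merged


batch = {"essential": "1", "intein-containing": "2"}
print(apply_function_attributes(batch, "Role ! intein-containing ^ 3"))
print(apply_function_attributes(batch, "Role ! essential ^"))
```

In the first call the value asserted in the function replaces the batch value for intein-containing; in the second, the key essential is dropped and the other pair is left as it was.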