Ambiguity and the Choice of Functional Roles

I am writing this to capture the essence of a discussion among Carol Bonner, Roy Jensen, Andrei Osterman, myself (Ross Overbeek), and Veronika Vonstein as we try to start a subsystem to capture the work of Carol and Roy relating to TyrA homologs. The discussion is worth capturing, since the problem it exposes recurs frequently, and other annotators have faced these same issues with somewhat inconsistent responses. The central issues of how to capture specificity, cofactors and uncertainty are coming up in many cases, and we should seek a more-or-less consistent policy on how to handle them. We use the TyrA example just because it exposed these issues so wonderfully.

The Overall Set of Reactions

As in many cases, we began by trying to describe the scope of the subsystem. I argued for embedding the discussion of TyrA within a more general discussion of phenylalanine, tyrosine and p-aminophenylpyruvate synthesis, since the TyrA homologs are embedded within these three biosynthesis pathways. So, let me start by trying to lay out the relevant reactions. For now, I am going to ignore p-aminophenylpyruvate synthesis and focus on just phenylalanine and tyrosine synthesis; we can add the third pathway if we can get these two done correctly. I believe that the relevant reactions are as follows:

Two Paths to Phenylalanine
To KEGG  Reaction                                                               Catalyzed By
R01373   Prephenate => Phenylpyruvate + H2O + CO2                               Prephenate dehydratase (EC 4.2.1.51)
R00688   Phenylpyruvate + NH3 + NADH => L-Phenylalanine + H2O + NAD+            Phenylalanine dehydrogenase (EC 1.4.1.20)
R00694   Phenylpyruvate + L-Glutamate <=> L-Phenylalanine + 2-Oxoglutarate      L-Phenylalanine:2-oxoglutarate aminotransferase
***
R01731   L-Aspartate + Prephenate <=> Oxaloacetate + L-Arogenate                Aromatic-amino-acid aminotransferase (EC 2.6.1.57)
R03120   L-Glutamate + Prephenate <=> 2-Oxoglutarate + L-Arogenate              Aromatic-amino-acid aminotransferase (EC 2.6.1.57)
R00691   L-Arogenate => L-Phenylalanine + H2O + CO2                             Arogenate dehydratase (EC 4.2.1.91)


Two Paths to Tyrosine
To KEGG  Reaction                                                               Catalyzed By
R01731   L-Aspartate + Prephenate <=> Oxaloacetate + L-Arogenate                Aromatic-amino-acid aminotransferase (EC 2.6.1.57)
R00732   L-Arogenate + NAD+ => L-Tyrosine + CO2 + NADH                          Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)
R00733   L-Arogenate + NADP+ => L-Tyrosine + CO2 + NADPH                        Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
***
R01728   Prephenate + NAD+ => 4-hydroxyphenylpyruvate + CO2 + NADH + H+         Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
R01730   Prephenate + NADP+ => 4-hydroxyphenylpyruvate + CO2 + NADPH + H+       Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
R00734   4-hydroxyphenylpyruvate + L-Glutamate <=> L-Tyrosine + 2-Oxoglutarate  Tyrosine aminotransferase (EC 2.6.1.5)

The TyrA Aspect of the Problem

The proteins that catalyze reactions R00732, R00733, R01728, and R01730 are extremely hard to disambiguate. Further, many single proteins catalyze several of these reactions. This leads to two distinct approaches to choosing functional roles for the subsystem.


The following table illustrates the first approach:

Gene         Functional Role
NADTyrAa     Arogenate + NAD specific dehydrogenase (EC 1.3.1.12)
NADPTyrAa    Arogenate + NADP specific dehydrogenase (EC 1.3.1.12)
NAD(P)TyrAa  Arogenate specific + NAD(P) dehydrogenase (EC 1.3.1.12)
xTyrAa       Arogenate specific + NAD(P) unknown specificity dehydrogenase
NADTyrAc     Cyclohexadienyl broad specificity + NAD dehydrogenase (EC 1.3.1.12)
NADPTyrAc    Cyclohexadienyl broad specificity + NADP dehydrogenase (EC 1.3.1.12)
NAD(P)TyrAc  Cyclohexadienyl broad specificity + NAD(P) dehydrogenase (EC 1.3.1.12)
xTyrAc       Cyclohexadienyl broad specificity + NAD(P) unknown specificity dehydrogenase (EC 1.3.1.12)
NADTyrAp     Prephenate + NAD specific dehydrogenase (EC 1.3.1.12)
NADPTyrAp    Prephenate + NADP specific dehydrogenase (EC 1.3.1.12)
NAD(P)TyrAp  Prephenate specific + NAD(P) dehydrogenase (EC 1.3.1.12)
xTyrAp       Prephenate specific + NAD(P) unknown specificity dehydrogenase (EC 1.3.1.12)
NADTyrAx     Substrate specificity unknown + NAD dehydrogenase (EC 1.3.1.12)
NADPTyrAx    Substrate specificity unknown + NADP dehydrogenase (EC 1.3.1.12)
NAD(P)TyrAx  Substrate specificity unknown + NAD(P) dehydrogenase (EC 1.3.1.12)
xTyrAx       Substrate specificity unknown + NAD(P) specificity unknown dehydrogenase (EC 1.3.1.12)


Using this approach, every gene has a function that connects to a single functional role that conveys the level of ambiguity in specificity and current knowledge.
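
To make the size of this role set concrete, here is a minimal Python sketch that enumerates the sixteen gene labels as the cross product of four cofactor categories and four substrate categories. The names COFACTORS, SUBSTRATES, and gene_labels are hypothetical, introduced only for illustration; the label spelling (cofactor prefix, then TyrA, then a one-letter substrate suffix) is read off the table above.

```python
from itertools import product

# Cofactor prefixes as they appear in the table; "x" marks unknown cofactor specificity.
COFACTORS = ["NAD", "NADP", "NAD(P)", "x"]

# Substrate suffixes: a = arogenate, c = cyclohexadienyl (broad specificity),
# p = prephenate, x = substrate specificity unknown.
SUBSTRATES = ["a", "c", "p", "x"]


def gene_labels():
    """Return the labels in table order: substrate varies slowest, cofactor fastest."""
    return [f"{cof}TyrA{sub}" for sub, cof in product(SUBSTRATES, COFACTORS)]


labels = gene_labels()
print(len(labels))  # 16
```

Every additional category on either axis adds a full row or column of roles, which is why this approach grows quickly.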


The second approach utilizes the "/", "@", and ";" connectives when specifying the gene function and includes only four functional roles:

Functional Role
Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)
Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)


Under this approach, one would make assignments of the form shown in the following table:

Gene         Gene Function
NADTyrAa     Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)
NADPTyrAa    Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
NAD(P)TyrAa  Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
xTyrAa       Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
NADTyrAc     Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
NADPTyrAc    Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
NAD(P)TyrAc  Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
xTyrAc       cannot be expressed
NADTyrAp     Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
NADPTyrAp    Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
NAD(P)TyrAp  Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
xTyrAp       Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
NADTyrAx     Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
NADPTyrAx    Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
NAD(P)TyrAx  cannot be expressed
xTyrAx       Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
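
Assignments of this kind can be checked mechanically. The following Python sketch splits a gene function string at the three connectives and confirms that every piece is one of the four functional roles. It assumes that "@" and "/" are always surrounded by spaces and that ";" is followed by a space, with or without one before it. The helper names split_function and unknown_roles are hypothetical, not part of any existing tool.

```python
import re

# The four functional roles of the second approach.
ROLES = {
    "Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)",
    "Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)",
    "Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)",
    "Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)",
}

# Assumed delimiter forms: "/" and "@" surrounded by whitespace,
# ";" followed by whitespace (optionally preceded by it).
CONNECTIVE = re.compile(r"\s+([/@])\s+|\s*(;)\s+")


def split_function(function):
    """Split a gene function into (roles, connectives), preserving order."""
    roles, connectives = [], []
    pos = 0
    for match in CONNECTIVE.finditer(function):
        roles.append(function[pos:match.start()].strip())
        connectives.append(match.group(1) or match.group(2))
        pos = match.end()
    roles.append(function[pos:].strip())
    return roles, connectives


def unknown_roles(function):
    """Return the pieces of a gene function that are not declared roles."""
    roles, _ = split_function(function)
    return [role for role in roles if role not in ROLES]


example = (
    "Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ "
    "Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)"
)
roles, connectives = split_function(example)
print(connectives)             # ['@']
print(unknown_roles(example))  # []
```

A row such as "cannot be expressed" comes back from unknown_roles as a piece that matches no declared role, which flags exactly the cases this approach fails to cover.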



I find the above table a little unsettling. Since this is the approach that I advocated, I find it doubly unsettling. When one adds the fact that we have a similar situation with the aminotransferases, the difficulties involved in specifying the roles of the whole subsystem (in a way that would be transparent and make obvious sense to most biologists) become pretty forbidding.

I also feel that I need to add a few comments made by Roy in order to set the stage properly for this discussion:


The Hybrid Alternative

One proposal that has arisen, based on things a number of annnotators have been doing to avoid some of the difficulties, would go as follows:

Thus, we would use the following functional roles:

Functional Role
Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)
Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity


This leads to the following ways to express gene function:


Gene         Gene Function
NADTyrAa     Arogenate dehydrogenase, NAD specific (EC 1.3.1.43)
NADPTyrAa    Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
NAD(P)TyrAa  Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) @ Arogenate dehydrogenase, NADP specific (EC 1.3.1.43)
xTyrAa       Arogenate dehydrogenase, NAD specific (EC 1.3.1.43) ; Arogenate dehydrogenase, NADP specific (EC 1.3.1.43) ; Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NADTyrAc     Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NADPTyrAc    Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NAD(P)TyrAp  Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
xTyrAc       Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NADTyrAp     Prephenate dehydrogenase, NAD specific (EC 1.3.1.13)
NADPTyrAp    Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
NAD(P)TyrAp  Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) @ Prephenate dehydrogenase, NADP specific (EC 1.3.1.13)
xTyrAp       Prephenate dehydrogenase, NAD specific (EC 1.3.1.13) ; Prephenate dehydrogenase, NADP specific (EC 1.3.1.13) ; Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NADTyrAx     Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NADPTyrAx    Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
NAD(P)TyrAx  Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity
xTyrAx       Cyclohexadienyl dehydrogenase of unknown substrate and cofactor specificity


These are not completely satisfactory, but I believe they are pretty close to what we want. In a few cases, information is lost. For these very few, I believe that the curator can maintain external records until things clarify.

Anyone reading the spreadsheet can get an accurate grasp of what is known with high probability and what issues remain murky.

The Issue of How Much Detail to Include in Functional Roles

The previous discussion focused on how to represent uncertainty in a functional role. It is, perhaps, worth noting that Roy and Carol decided to go ahead with their original choices of functional roles. Any of the above solutions would lead to conflicts with existing subsystems containing the role Prephenate dehydrogenase (EC 1.3.1.13). In effect, when the TyrA subsystem is added to our collection, it will force a resolution with the existing subsystems. Given the choice to go with sixteen separate functional roles, each containing both specificity and uncertainty information, this will mean replacing a single functional role in a number of subsystems with sixteen distinct columns.

A similar issue will arise as we consider the chorismate mutase, which occurs in a closely related piece of metabolism. In this case, there are three distinct, nonhomologous forms of the enzyme. The question here is "Should we have a single chorismate mutase column (representing comments relating to form as notes, annotations, or attributes), or should we have three distinct columns?" Veronika puts all three forms in a single column, while a number of us have adopted the convention of placing nonhomologous alternatives in separate columns.

Veronika feels (obviously correctly) that as the spreadsheet becomes huge, we lose the ability to maintain an overview. The detail swamps the representation of the essential.

I am exploring the issue of how well we can have both detail and overview by proper use of subsets. I will report on my experiments at the April meeting, undoubtedly using aromatic metabolism as a setting for exploring the issues.

Before leaving this topic, let us consider the position that including the cofactors in the functional role obscures the situation rather than clarifying it. We have chosen (in most cases) to leave properties like thermostable out of the functional role. Why include cofactors? Which cofactors are needed is important, but we consciously have chosen not to keep all important aspects of the function within the actual functional role. Might it not be better to save the cofactor elsewhere? I am reluctant to do so, since I would like to capture the actual reactions. On the other hand, we can attach multiple reactions to a single functional role. This view would lead to the following version:


Functional Role
Arogenate dehydrogenase (EC 1.3.1.43)
Prephenate dehydrogenase (EC 1.3.1.13)


Uncertainty or ambiguity would be represented using the ";" and "/" operators. There is a certain appeal to this brevity.
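
One way to picture attaching several reactions to one functional role is a simple mapping from role to KEGG reaction IDs, with the cofactor recorded on the reaction rather than in the role name. The sketch below is only an illustration in Python; the structure ROLE_REACTIONS and the function reactions_for are hypothetical, while the reaction IDs and cofactors are taken from the tyrosine reaction table given earlier.

```python
# Each functional role maps to the reactions it may catalyze, keyed by cofactor.
ROLE_REACTIONS = {
    "Arogenate dehydrogenase (EC 1.3.1.43)": {"NAD": "R00732", "NADP": "R00733"},
    "Prephenate dehydrogenase (EC 1.3.1.13)": {"NAD": "R01728", "NADP": "R01730"},
}


def reactions_for(role, cofactors=None):
    """Return the reaction IDs for a role, optionally restricted to given cofactors.

    With cofactors=None (cofactor specificity unknown), every attached
    reaction is returned.
    """
    by_cofactor = ROLE_REACTIONS[role]
    if cofactors is None:
        return sorted(by_cofactor.values())
    return [by_cofactor[c] for c in cofactors if c in by_cofactor]


print(reactions_for("Arogenate dehydrogenase (EC 1.3.1.43)"))
# ['R00732', 'R00733']
print(reactions_for("Prephenate dehydrogenase (EC 1.3.1.13)", ["NADP"]))
# ['R01730']
```

Under this arrangement the cofactor still reaches the reaction level, but it has to be stored somewhere other than the role name, for example as an attribute on the gene.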

Veronika believes that there should be three functional roles: the two above and Cyclohexadienyl dehydrogenase, which would cover either broad specificity or uncertainty.


This is where the discussion stands at the moment. I am actually feeling extremely satisfied that these issues are being cast so vividly in a form we must address.