Defining gene names

But Not

There are occasions where you are sure what something is not, but you do not know what it is. For example, there is a duplicate protein in your spreadsheet, and you are sure that it is the wrong one because it is not in a cluster. To rule it out, you can add "But Not" to the end of the annotation. This is very useful because it lets you see which things have been checked and shown not to be something.

Replace names

  1. Suppose that you wish to replace all assignments of the form homoserine kinase with Homoserine kinase (EC
  2. To do this, first go to the group of fields on the initial page that has a button just beneath it called "Generate Assignments via Translation". Fill out three fields:

    1. About 1.5 inches up, there is a field called "Save as user". Put your user name (without master: in front of it) in that box.
    2. The From box gets "homoserine kinase".
    3. The To box gets "Homoserine kinase (EC"

    Click on the "Generate Assignments via Translation" button. This builds something called an "assignment set" under your user name. Note that no assignments have been made yet.

  3. Then you need to process the assignment set. Go to the search form about 1.5 inches below where you were, where it says "Process Saved Assignment Sets". Fill in your user name (WITH the master:) and click on "Process Assignment Sets".
  4. You will see the assignment sets that have been saved. Select one and "process it". Processing an assignment set amounts to
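
The two-stage flow in the steps above (first build an assignment set, then process it) can be pictured with a small Python sketch. This is only an illustration and not the tool's actual code: the function names, the dictionary of annotations, the PEG identifiers, and the exact-match rule are all assumptions made for the example. The To text is kept truncated exactly as it appears above, and the master: prefix is not modeled.

```python
from dataclasses import dataclass, field

# Truncated in the source text; deliberately left as-is rather than completed.
TO_TEXT = "Homoserine kinase (EC"


@dataclass
class AssignmentSet:
    """Hypothetical stand-in for a saved assignment set."""
    user: str
    changes: dict = field(default_factory=dict)  # PEG id -> proposed function


def generate_via_translation(annotations, user, from_text, to_text):
    """Stage 1: collect proposed changes without touching the annotations.

    Assumption: a PEG matches only if its function equals from_text exactly.
    """
    pending = AssignmentSet(user=user)
    for peg, function in annotations.items():
        if function == from_text:
            pending.changes[peg] = to_text
    return pending


def process_assignment_set(annotations, pending):
    """Stage 2: return a copy of the annotations with the changes applied."""
    updated = dict(annotations)
    updated.update(pending.changes)
    return updated


# Hypothetical data for illustration only.
annotations = {"peg.1": "homoserine kinase", "peg.2": "threonine synthase"}
pending = generate_via_translation(annotations, "alice", "homoserine kinase", TO_TEXT)
updated = process_assignment_set(annotations, pending)
```

After the first call, `annotations` is unchanged and only `pending` holds the proposed rename, which mirrors the note that no assignments have been made until the set is processed.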

Edit variants

A very useful tool for managing the variants of your subsystems.


IMPORTANT: Close your subsystem before using this tool to change the variant codes.

Make trees

Here is how to make a tree and use it to edit annotations, as described by Ross.

  1. Pick a column in one of your subsystems that contains, say, 20-50 PEGs.
    For myself, I am choosing TcuB: works with TcuA to oxidize tricarballylate to cis-aconitate in Tricarballylate Utilization.
  2. Go down to For sequences in a column (i.e., role): and pick the appropriate column. Then click on Align Sequences in Column
    This may run for a while, since the program must try to construct one or more multiple sequence alignments.
  3. When it comes back, go to the very end of the subsystem stuff, and you should see one or more trees shown. They all have numbers. Mine is "2.1" where the "2" means "role 2" and the ".1" means the first alignment (of possibly several) constructed from the sequences in the column.
  4. Now go up to Realign subgroup within a column (adding homologs):
  5. Type in your tree number (in my case, 2.1)
  6. In the box labeled Include homologs that pass the following threshhold: type in "1.0e-20" (which means look for sequences that are similar to those in the column at a cutoff of 1.0e-20)
  7. Now click on Realign Sequences in Column
  8. When the page comes back, go to the bottom, select the tree you made (in my case "2.1"), and click on use_tree.
  9. You are now in an environment where you can look at the phylogenetic tree, you can reroot it, or you can make assignments using it.
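
Two details in the steps above, the tree numbering and the similarity cutoff, can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not part of the tool: both function names are hypothetical, the hit list is made-up data, and the cutoff is assumed to behave like a BLAST-style E-value, where a smaller value means a stronger match.

```python
def parse_tree_number(label):
    """Split a tree number such as "2.1" into (role, alignment).

    "2.1" means role 2, first alignment built from that column.
    """
    role_text, alignment_text = label.split(".")
    return int(role_text), int(alignment_text)


def passing_homologs(hits, threshold=1.0e-20):
    """Keep the ids of hits whose score is at or below the threshold.

    Assumption: hits is a list of (sequence_id, e_value) pairs and
    smaller E-values indicate greater similarity.
    """
    return [seq_id for seq_id, e_value in hits if e_value <= threshold]


# Hypothetical hits for illustration only.
hits = [("seqA", 1.0e-35), ("seqB", 1.0e-20), ("seqC", 1.0e-5)]
role, alignment = parse_tree_number("2.1")
kept = passing_homologs(hits)
```

Under these assumptions, raising the threshold (for example to 1.0e-10) would pull in more distant homologs, while lowering it restricts the realignment to closer relatives.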

Here are some things you can do with the tree