The Initial Exposure to Subsystems
The initial exposure to subsystems should take approximately 5-7
weeks. The goals of this section will be
- to understand one subsystem in depth,
- to gain familiarity with SEED, KEGG, and other tools needed to
explore the subsystem, and
- to produce a detailed analysis of at least the SEED and KEGG
assessments of which genes actually play each role in the
sequenced genomes.
This can be summarized by thinking of the goal as having the student
work with an existing subsystem and write a detailed review of what
is known of the subsystem. Obviously, the quality of the review is
limited by the time available. It should lay the foundation for a
more complete review (to be done over an entire semester), should the
student and professor select this project as a follow-on. The review itself
should be an HTML document, although many of the detailed notes should
also be included in the notes section of the updated subsystem.
I believe that the instructor should probably meet once per week for a
lecture and be available for questions throughout the week. Students
should sign up for the seed-users mailing list, and some advice
can be gained via the subsystems forum. Students are
encouraged to ask questions. If we get too many to handle properly, we
will try some other mechanism.
Students need to compile a set of comments, criticisms, proposed
alterations, and conjectures all relating to their subsystems. The
final "product" of the course will be an enhanced subsystem with
copious notes.
The First Week
During the first meeting, students must
- learn how to access whatever version of the SEED they are to use,
- pick or be assigned a subsystem to work on,
- learn how to make a copy of the subsystem for themselves, and
- begin an analysis of the subsystem.
The first three points need to be covered in a handout prepared by the
instructor (and the handout should probably be a URL given to the
students at the start of the first lecture).
The actual lecture should cover the overall goals of the course,
assignments of subsystems, and then a period in which the instructor
goes through the steps in an on-line demo.
I believe that the students should be given fairly substantial, but
very well-known, metabolic subsystems that are not overly complex.
Feel free to ask me for advice, and I urge instructors to let me know
which they choose (so I can try to avoid having the same subsystem
reworked in multiple classes -- remember, the goal will be for the best
efforts to be extended to full reviews).
I suggest that the instructor pick an existing subsystem and practice
copying and deleting it. You might try
one of mine, since I try to keep my latest copies published
(so there is no chance that you can do anything too awful to them -- I
can always just download a current version).
Once the students have seen how to copy a subsystem (to create their
version with a distinct name), they should be given
several specific goals to work on:
- They should locate the section of the KEGG maps covered by the
subsystem, and they should make drawings (or print copies) of it on
which they can make notes.
- They should create an editable file showing the specific
reactions, and each reaction should be marked as either reversible or
irreversible. I believe that this information is probably accessible
via the EMP Database, but there
may be many ways (including chatting with a good biochemist) to get
the data.
- They should begin compiling literature references to reviews and
to papers in which enzymes of the pathway have been characterized.
- They should learn how to access their copy, make a list of the
variant codes, and attempt to figure out the meaning of each code.
It is the responsibility of the author of the subsystem to assign
meaningful codes, but this is often not done. The student needs to
formulate what constitutes an operational variant of the
subsystem.
This seems to me to be about what one might be able to accomplish in
the first week.
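The "editable file" of reactions mentioned above could be as simple as
a tab-separated text file. The following is a minimal sketch, not a
SEED or KEGG format: the column names (role, reaction, reversible,
source), the allowed values for the reversible column, and the
function names are all assumptions made for illustration, and the
last sample row is a placeholder.

```python
import csv
import io

# Allowed values for the "reversible" column (an assumed convention).
ALLOWED = {"yes", "no", "unknown"}

# Illustrative contents of the editable file: one reaction per line,
# tab-separated. The column layout is hypothetical, not a SEED format.
SAMPLE = (
    "role\treaction\treversible\tsource\n"
    "Hexokinase\tATP + D-glucose => ADP + D-glucose 6-phosphate\tno\ttextbook\n"
    "Glucose-6-phosphate isomerase\tD-glucose 6-phosphate <=> D-fructose 6-phosphate\tyes\ttextbook\n"
    "Placeholder role\tA + B => C\tunknown\t\n"
)


def load_reactions(text):
    """Parse the file and check that every reaction has a valid marking."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        if row["reversible"] not in ALLOWED:
            raise ValueError(f"bad reversibility value for {row['role']!r}")
    return rows


def unresolved(rows):
    """Return the roles whose reversibility still has to be looked up."""
    return [row["role"] for row in rows if row["reversible"] == "unknown"]


if __name__ == "__main__":
    for role in unresolved(load_reactions(SAMPLE)):
        print("still to resolve:", role)
```

Keeping an explicit "unknown" value gives the student a running list
of reactions that still need a literature check or a conversation
with a biochemist.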
The Second and Third Weeks
The goals of the second and third weeks are roughly as follows:
- The student will need to learn how to access his subsystem and to
understand the information that is included in it. The instructor
will need to go through many of the fields and options. Gradually,
the student will learn how to open up separate windows to keep track
of distinct genes/proteins. A minimal introduction should be given,
and then the instructor and students should move to specific questions
(and let these drive the exploration).
- I suggest that much of these two weeks be focused on cells of the
spreadsheet that contain duplicate genes (if you find yourself with a
subsystem that has few duplicates, so be it -- add more organisms to
the spreadsheet). First ask the students to compile a list of
spreadsheet cells that contain multiple gene IDs. Then, ask them to
move through them seeing whether or not the duplicates make sense.
Each duplicate warrants a discussion. Exploring the duplicates can
consume an arbitrary amount of time, and in most cases the results
will not be clear. However, this does offer a framework to explore
the functionality of the SEED, and in many cases the analysis will
reveal errors in the subsystems.
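
Compiling the list of cells with multiple gene IDs can be done by
hand, but a short script makes the list reproducible. The sketch below
assumes the spreadsheet has been saved as tab-separated text with a
"genome" column, a "variant" column, and one column per role, with
multiple gene IDs in a cell separated by commas. That layout, the
column names, and the gene IDs are assumptions made for illustration;
an actual export may look different.

```python
import csv
import io

# Made-up spreadsheet: columns "genome" and "variant", then one column
# per functional role. The layout and the gene IDs are hypothetical.
SAMPLE = (
    "genome\tvariant\tRoleA\tRoleB\n"
    "83333.1\t1\tfig|83333.1.peg.10\tfig|83333.1.peg.20, fig|83333.1.peg.57\n"
    "224308.1\t1\tfig|224308.1.peg.5\tfig|224308.1.peg.9\n"
)


def cells_with_duplicates(text, fixed_columns=("genome", "variant")):
    """Return (genome, role, gene IDs) for every cell holding more than one gene."""
    hits = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for role, cell in row.items():
            # Skip bookkeeping columns and any overflow fields.
            if role is None or role in fixed_columns:
                continue
            ids = [g.strip() for g in (cell or "").split(",") if g.strip()]
            if len(ids) > 1:
                hits.append((row["genome"], role, ids))
    return hits


if __name__ == "__main__":
    for genome, role, ids in cells_with_duplicates(SAMPLE):
        print(genome, role, ids)
```

Each entry in the resulting list is one duplicate to be discussed.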
Perhaps the most valuable tool in analysis of duplicates will be the
use of clustering on the chromosome. One might also consider
introducing other techniques, but clustering is essential. The
student should be asked to formulate a rough idea of which organisms
show clustering. In cases with duplicates, the student should record
which of the genes, if any, are in clusters.
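
To make "in a cluster" concrete for the student's notes, one can use a
simple working definition: a gene counts as clustered if another gene
of the subsystem lies within a fixed distance on the same contig. The
sketch below applies that rule. The definition, the 5000 bp default,
and the input format are assumptions made for illustration; they are
not the SEED's own clustering computation.

```python
from collections import defaultdict


def clustered_genes(positions, max_gap=5000):
    """Return the genes that lie within max_gap bp of another subsystem gene.

    positions maps gene ID -> (contig, midpoint in bp) and should contain
    only genes assigned to the subsystem in one genome.
    """
    by_contig = defaultdict(list)
    for gene, (contig, midpoint) in positions.items():
        by_contig[contig].append((midpoint, gene))

    clustered = set()
    for genes in by_contig.values():
        genes.sort()
        # Checking adjacent pairs in sorted order is enough: if any gene
        # is within max_gap of a gene, so is its nearest neighbor.
        for (p1, g1), (p2, g2) in zip(genes, genes[1:]):
            if p2 - p1 <= max_gap:
                clustered.update((g1, g2))
    return clustered


if __name__ == "__main__":
    # Made-up coordinates for four genes on two contigs.
    example = {
        "geneA": ("contig1", 1000),
        "geneB": ("contig1", 3000),
        "geneC": ("contig1", 90000),
        "geneD": ("contig2", 2000),
    }
    print(sorted(clustered_genes(example)))
```

For a cell with duplicates, the student can then record which of the
candidate genes fall in the returned set.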
The Fourth Week
The fourth week should be used to explore "missing genes"; that is, to
focus on spreadsheet cells which probably should contain a gene, but
do not. There are many possible explanations, and the student should
move through the spreadsheet cells trying to determine what the most
likely explanation is in each case.
The student will need to learn how to determine
- whether or not the genome is missing pieces,
- whether the missing gene occurs in a number of organisms,
- whether it might be in the genome, but just not "called",
- whether the pathway might not really be active in the organism, or
- whether there is probably a new form of the enzyme that has not yet
been characterized.
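
A first pass over the missing genes can be automated: list every empty
cell, then count how many genomes lack each role. A role that is empty
in many genomes points toward an uncharacterized form of the enzyme,
while an isolated gap points toward an uncalled gene or an incomplete
genome. The sketch below assumes the same hypothetical tab-separated
layout ("genome" and "variant" columns, then one column per role). The
inactive_variants parameter and the code "X" are also assumptions; they
stand for whatever codes the student decides mark a non-operational
variant.

```python
import csv
import io
from collections import Counter

# Made-up spreadsheet; "X" is a hypothetical code for an inactive variant.
SAMPLE = (
    "genome\tvariant\tRoleA\tRoleB\n"
    "G1\t1\tgeneA1\t\n"
    "G2\t1\tgeneA2\tgeneB2\n"
    "G3\tX\t\t\n"
)


def empty_cells(text, fixed_columns=("genome", "variant"), inactive_variants=()):
    """Return (genome, role) for every empty cell in an active variant."""
    missing = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Gaps are expected where the subsystem is judged inactive.
        if row["variant"] in inactive_variants:
            continue
        for role, cell in row.items():
            if role is None or role in fixed_columns:
                continue
            if not (cell or "").strip():
                missing.append((row["genome"], role))
    return missing


def missing_per_role(missing):
    """Count the genomes in which each role is empty."""
    return Counter(role for _, role in missing)


if __name__ == "__main__":
    gaps = empty_cells(SAMPLE, inactive_variants=("X",))
    print(gaps)
    print(missing_per_role(gaps))
```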
The Fifth and Sixth Weeks
The fifth and sixth weeks should be used to summarize results and
polish the subsystem. The student should compose consistent variant
codes and document them. A list of genes that should have been
called, but were not, should be added to the notes. Outstanding
issues requiring wet lab confirmation should be summarized. If there
exist genomes which should be added to the spreadsheet, the student
should begin adding them.
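
One simple consistency check for the variant codes is to confirm that
every code used in the spreadsheet has a documented meaning. The
sketch below is a minimal illustration; the function name, the codes,
and the descriptions are hypothetical.

```python
from collections import Counter


def undocumented_codes(assigned, documented):
    """Return {code: number of genomes} for codes lacking a documented meaning.

    assigned maps genome -> variant code; documented maps code -> description.
    """
    counts = Counter(assigned.values())
    return {code: n for code, n in counts.items() if code not in documented}


if __name__ == "__main__":
    # Made-up assignments and documentation.
    assigned = {"G1": "1", "G2": "1", "G3": "2"}
    documented = {"1": "all roles present"}
    print(undocumented_codes(assigned, documented))
```

Any code reported here should either be documented in the notes or
replaced by one of the documented codes.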
Summary
These comments are intended only as loose guidelines.
Instructors are urged to keep detailed notes on how the overall
structure of the class could be improved, what problems to watch out
for, and so forth. As this component evolves, we will try to update
this all-too-brief discussion.