Documentation read from 04/17/2019 22:07:24 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_CS_pipeline.
Generate data needed to support close-strain analysis.
------
Example:
    mkdir Data.Strep
    svr_CS_pipeline -d Data.Strep -g Streptococcus

or fill in Data.kmers, rep.genomes, and genome.names and use

    svr_CS_pipeline -d Data.Streptococcus

or fill in Data.kmers, rep.genomes, genome.names, Seqs, PegLocs, and PegDNA and use

    svr_CS_pipeline -d Data.Streptococcus

or fill in Data.kmers, rep.genomes, genome.names, Seqs, PegLocs, PegDNA, and families.all and use

    svr_CS_pipeline -d Data.Streptococcus
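The invocations in the example differ only in how many inputs are already present in the Data directory. The example suggests, but does not state, that the pipeline builds whatever is missing. Under that assumption, the hypothetical Python helper below reports which of the pre-filled input sets a directory already satisfies. The name `prefilled_stage` and the stage numbering are illustrative and not part of svr_CS_pipeline; only the file names come from the example.

```python
from pathlib import Path
from typing import List

# Cumulative input sets, copied from the pre-filled invocations in the example.
# Stage 0 means no set is complete (the -g invocation). The stage numbering
# and helper name are hypothetical, not part of svr_CS_pipeline.
STAGES: List[List[str]] = [
    ["Data.kmers", "rep.genomes", "genome.names"],
    ["Data.kmers", "rep.genomes", "genome.names", "Seqs", "PegLocs", "PegDNA"],
    ["Data.kmers", "rep.genomes", "genome.names", "Seqs", "PegLocs", "PegDNA",
     "families.all"],
]


def prefilled_stage(data_dir: str) -> int:
    """Return the 1-based index of the largest input set fully present in
    data_dir, or 0 if not even the first set is complete."""
    root = Path(data_dir)
    stage = 0
    for i, names in enumerate(STAGES, start=1):
        # Data.kmers is a directory and the rest are files; exists() covers both.
        if all((root / name).exists() for name in names):
            stage = i
    return stage
```

For instance, `prefilled_stage("Data.Streptococcus")` returns 2 once Seqs, PegLocs, and PegDNA have been added but families.all has not.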
This is an extended Data directory (what Bob might call a "close strain workspace"). It includes a Data.kmers directory that kmer_guts uses to annotate PEGs; "rep.genomes" and "genome.names" files that identify the genomes to be included; a set of derived protein families; and a set of derived files used to support comparative analysis of the genomes.
This is the fraction used by Gary's representative_sequences when choosing representative genomes.
Output is added to the extended Data directory. The key files are:

families.all [the protein families underlying everything]
    FamilyID    - an integer
    Function    - the function assigned to the family
    SubFunction - an integer; the Function and SubFunction together uniquely determine the FamilyID.
                  Another way to look at it is:
                    a) each family is assigned a unique ID and a function
                    b) multiple families can have the same function (consider "hypothetical protein")
                    c) the Function+SubFunction uniquely determine the FamilyID
    PEG
    LengthProt  - the length of the translated PEG
    Mean        - the mean length of PEGs in the family
    StdDev      - the standard deviation of lengths for the family
    Z-sc        - the Z-score associated with the length of this PEG

labeled.tree [a rooted, labeled Newick tree]

readable.tree [an ASCII version of labeled.tree]

placed.events [adjacency shifts placed on the tree]
    Each line describes an event that occurred on an arc. The events are encoded as follows:
    ancestral node
    node                 [the event occurred on the arc from the ancestor to the node]
    family:direction     [thus, 1206:upstream means the event was a change of the protein family upstream of family 1206]
    ancestral-adjacency  [family:strand of the adjacent family at the ancestral node]
    node-adjacency       [family:strand of the adjacent family at the child]

where.shifts.occurred [where families were gained/lost on arcs]
    Describes where families were gained or lost. The fields are:
    ancestral node
    node (child of the ancestor)
    family
    ancestral value
    node value
These are the files that drive the "What Changed?" application.
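The families.all fields listed above suggest two simple consistency checks: recomputing a PEG's length Z-score, and confirming that each Function+SubFunction pair maps to a single FamilyID. This is a minimal sketch under assumptions this documentation does not confirm: families.all is tab-separated with one PEG per line and the columns in the listed order, and Z-sc is the standard Z-score (LengthProt - Mean) / StdDev. All names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FamilyRow:
    family_id: int
    function: str
    sub_function: int
    peg: str
    length_prot: int
    mean: float
    std_dev: float
    z_sc: float


def parse_families_all(lines: Iterable[str]) -> List[FamilyRow]:
    # Assumption: tab-separated, columns in the documented order.
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fid, func, sub, peg, length, mean, sd, z = line.split("\t")[:8]
        rows.append(FamilyRow(int(fid), func, int(sub), peg, int(length),
                              float(mean), float(sd), float(z)))
    return rows


def z_score(length_prot: float, mean: float, std_dev: float) -> float:
    # Standard Z-score; assumed (not confirmed) to be what Z-sc holds.
    return 0.0 if std_dev == 0 else (length_prot - mean) / std_dev


def family_keys(rows: Iterable[FamilyRow]) -> Dict[Tuple[str, int], int]:
    # Map (Function, SubFunction) -> FamilyID, raising ValueError if one
    # key ever points at two different families.
    keys: Dict[Tuple[str, int], int] = {}
    for r in rows:
        key = (r.function, r.sub_function)
        if keys.setdefault(key, r.family_id) != r.family_id:
            raise ValueError(f"{key} maps to {keys[key]} and {r.family_id}")
    return keys
```

For example, a PEG of length 110 in a family with Mean 100.0 and StdDev 5.0 gives `z_score(110, 100.0, 5.0) == 2.0`, two standard deviations longer than the family average.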
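Each placed.events line can be read directly from the field list above. This is a minimal sketch, assuming the five fields are tab-separated and that the last three use the documented family:suffix encoding. It makes no assumption about which direction or strand values actually occur. The node names and strand symbols in the usage line are made up, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PlacedEvent:
    ancestor: str                          # ancestral node
    node: str                              # child; the event is on the arc ancestor -> node
    family: int                            # family whose neighbor changed
    direction: str                         # e.g. "upstream"
    ancestral_adjacency: Tuple[int, str]   # (family, strand) at the ancestor
    node_adjacency: Tuple[int, str]        # (family, strand) at the child


def split_family(text: str) -> Tuple[int, str]:
    # "1206:upstream" -> (1206, "upstream"); "877:+" -> (877, "+")
    family, _, suffix = text.partition(":")
    return int(family), suffix


def parse_placed_event(line: str) -> PlacedEvent:
    # Assumption: the five documented fields are tab-separated.
    anc, node, fam_dir, anc_adj, node_adj = line.rstrip("\n").split("\t")[:5]
    family, direction = split_family(fam_dir)
    return PlacedEvent(anc, node, family, direction,
                       split_family(anc_adj), split_family(node_adj))
```

Under these assumptions, `parse_placed_event("n3\tn7\t1206:upstream\t877:+\t1455:-")` reads as: on the arc from n3 to n7, the family upstream of family 1206 changed from 877 to 1455.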