Documentation read from 04/17/2019 22:07:27 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_make_pan_genome_prot_families.
Construct the protein families needed to study Pan Genomes
The study of pan genomes focuses on protein families made up of corresponding proteins. This program reads an input file that specifies where to find the genomes, which proteins each genome contains, where those proteins are located, and what functions are assigned to them.
The program is invoked as:
svr_make_pan_genome_prot_families [options] < FileDefiningGenomes > ProteinFamilies
Each genome can be identified by a genome ID from P-SEED, by a SEED/RAST directory, or by a triple of files (fasta, tbl, assigned_functions). Each line of the input file describes one genome in one of these three forms.
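The three forms of input line can be told apart mechanically. The Python sketch below is illustrative only and assumes that fields are whitespace-separated, that a triple lists the fasta, tbl and assigned_functions paths in that order, and that a P-SEED genome ID looks like "83333.1" (taxon ID, a dot, a version). The names classify_source and read_sources are hypothetical and are not part of the SEED tools.

```python
import re

# Assumed shape of a P-SEED genome ID: taxon ID, a dot, a version (e.g. "83333.1").
GENOME_ID = re.compile(r"^\d+\.\d+$")


def classify_source(line):
    """Classify one line of the genome-definition file (hypothetical helper).

    Returns (kind, value), where kind is "triple", "genome_id" or "directory".
    Fields are assumed to be whitespace-separated.
    """
    fields = line.split()
    if len(fields) == 3:
        # Assumed order: fasta file, tbl file, assigned_functions file.
        fasta, tbl, functions = fields
        return "triple", {"fasta": fasta, "tbl": tbl, "assigned_functions": functions}
    if len(fields) != 1:
        raise ValueError(f"cannot interpret genome source line: {line!r}")
    token = fields[0]
    if GENOME_ID.match(token):
        return "genome_id", token
    # Anything else is treated as a SEED/RAST genome directory.
    return "directory", token


def read_sources(lines):
    """Parse every non-blank line of the input file into a genome source."""
    return [classify_source(line) for line in lines if line.strip()]
```

Under these assumptions a relative directory literally named like a genome ID (for example "83333.1") would be read as an ID; the actual program may resolve such cases differently.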
Directory used to store the binary correspondences
Minimum identity used in forming binary correspondences (defaults to 80)
Use -bbhs=1 to require that connections be bidirectional best hits (BBHs). The default is 1; use -bbhs=0 for a looser matching procedure.
Minimum number of genes in context that can be paired (defaults to 5)
Minimum number of pairs in context that must have matching functions (defaults to min(2, number of genes in context)).
Maximum p-score allowed in correspondences (defaults to 1.0e-10)
Fraction of each gene in a pair that must lie within the region of similarity for the pair to be considered "corresponding" (defaults to 0.7)
Number of correspondence computations that can run in parallel.
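The identity, p-score, coverage and -bbhs options above act as filters on pairwise similarity hits between two genomes. This Python sketch shows one way those filters could combine, using the documented defaults (80, 1.0e-10, 0.7 and -bbhs=1). It is not the program's implementation: the Hit record, the 0-100 identity scale, ranking best hits by lowest p-score, and reading -bbhs=0 as "keep every hit that passes the thresholds" are all assumptions. The context-based options are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One similarity hit from a query protein to a subject protein (hypothetical record)."""
    query: str
    subject: str
    identity: float     # percent identity, assumed to be on a 0-100 scale
    p_score: float      # significance score; smaller is better
    query_cov: float    # fraction of the query inside the region of similarity
    subject_cov: float  # fraction of the subject inside the region of similarity


def passes_thresholds(hit, min_identity=80.0, max_p_score=1.0e-10, min_coverage=0.7):
    """Apply the documented default identity, p-score and coverage cut-offs."""
    return (hit.identity >= min_identity
            and hit.p_score <= max_p_score
            and hit.query_cov >= min_coverage
            and hit.subject_cov >= min_coverage)


def best_hits(hits):
    """Map each query to its best subject: lowest p-score, ties broken by higher identity."""
    best = {}
    for h in hits:
        current = best.get(h.query)
        if current is None or (h.p_score, -h.identity) < (current.p_score, -current.identity):
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def correspondences(hits_ab, hits_ba, bbhs=True, **thresholds):
    """Return (protein_in_A, protein_in_B) pairs for two genomes A and B.

    hits_ab are searches of A's proteins against B; hits_ba are the reverse.
    With bbhs=True (like -bbhs=1) a pair is kept only if each protein is the
    other's best hit. With bbhs=False (assumed meaning of -bbhs=0) every hit
    that passes the thresholds is kept.
    """
    ab = [h for h in hits_ab if passes_thresholds(h, **thresholds)]
    ba = [h for h in hits_ba if passes_thresholds(h, **thresholds)]
    if not bbhs:
        return {(h.query, h.subject) for h in ab}
    best_ab, best_ba = best_hits(ab), best_hits(ba)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}
```

Keeping the thresholds as keyword arguments mirrors how the command-line options override the defaults one at a time.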
The output file defines the resulting protein families. Each line contains
[SetNumber,ProteinID,AssignedFunction]
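Families made of corresponding proteins can be formed by grouping binary correspondences transitively. The sketch below assumes exactly that, computing connected components with union-find, and writes one SetNumber, ProteinID, AssignedFunction row per protein. Transitive grouping, tab-separated output, numbering sets from 1, and the function names build_families, family_rows and write_families are assumptions for illustration; the program may cluster or format its output differently.

```python
import sys


def build_families(pairs):
    """Group proteins into families: connected components of the correspondence pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), []).append(protein)
    # Sort members, then families, so set numbers are reproducible between runs.
    return sorted(sorted(members) for members in groups.values())


def family_rows(families, functions):
    """Yield (set_number, protein_id, assigned_function) rows, numbering sets from 1."""
    for set_number, members in enumerate(families, start=1):
        for protein in members:
            yield set_number, protein, functions.get(protein, "")


def write_families(rows, out=sys.stdout):
    """Write one tab-separated SetNumber, ProteinID, AssignedFunction line per row."""
    for set_number, protein, function in rows:
        out.write(f"{set_number}\t{protein}\t{function}\n")
```

Sorting before numbering makes SetNumber values stable across runs of the sketch; whether the real program guarantees that is not stated here.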