Documentation read from the 04/17/2019 22:07:26 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_get_rep_genomes.
Get a set of representative genomes using heuristics and the NCBI taxonomy
------
Example:
svr_get_rep_genomes -n 80 -f taxonomies -t Proteobacteria -c 1 -m 2000000
would produce a five-column table. For each selected genome, the columns are its KBase ID, its SEED ID, the size of the genome (in bp), the number of contigs, and the NCBI taxonomy.
-n 80 says "get 80 genomes".
-f taxonomies names the file that should contain the NCBI taxonomies. If that file does not exist, the program builds it.
-t Proteobacteria says "get the 80 genomes from the taxonomic grouping Proteobacteria".
-c 1 says "give me only genomes with a single contig".
-m 2000000 says "get only genomes that are at least 2,000,000 bp (2 Mbp) in size".
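Because the program does not guarantee exactly the requested number of genomes, it can be worth checking what came back. The following is a minimal sketch, not part of the SEED distribution, that reads the five-column output from standard input. It counts the genomes returned and flags any row that violates the example's -c 1 and -m 2000000 limits. It assumes that -c acts as an upper bound on the contig count and -m as a lower bound on genome size in bp. The function name check_output and the script name check_reps.py are hypothetical.

```python
import sys
from typing import Iterable


def check_output(lines: Iterable[str], max_contigs: int = 1,
                 min_size: int = 2_000_000) -> tuple[int, list[str]]:
    """Count genomes and list rows that break the (assumed) -c / -m limits.

    Assumption: -c is a maximum contig count and -m a minimum size in bp.
    """
    count = 0
    problems: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Five tab-separated columns: KBase ID, SEED ID, size, contigs, taxonomy.
        _kbase_id, seed_id, size, contigs, _taxonomy = line.split("\t")
        count += 1
        if int(contigs) > max_contigs:
            problems.append(f"{seed_id}: {contigs} contigs")
        if int(size) < min_size:
            problems.append(f"{seed_id}: {size} bp")
    return count, problems


if __name__ == "__main__":
    n, issues = check_output(sys.stdin)
    print(f"{n} genomes returned (80 requested in the example)")
    for issue in issues:
        print(issue)
```

Usage, assuming the sketch is saved as check_reps.py: svr_get_rep_genomes -n 80 -f taxonomies -t Proteobacteria -c 1 -m 2000000 | python3 check_reps.py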
------
You may or may not get exactly the number of genomes requested with -n.
If the taxonomy file named with -f does not exist, running the program builds it. This may take a few minutes.
Scan the taxonomy file if you are not sure of the NCBI names of the taxonomic groups to pass with -t.
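One way to scan the file is to search for a partial group name. The sketch below is illustrative only and does not reflect the program's own code. It rests on an assumption, because this documentation does not specify the file's layout: each line has tab-separated fields, and lineages list groups separated by ": ", as in the program's output. It returns the distinct group names that contain a case-insensitive query. The names find_group_names and find_groups.py are hypothetical.

```python
import sys
from typing import Iterable


def find_group_names(lines: Iterable[str], query: str) -> list[str]:
    """Return sorted, distinct group names containing `query` (case-insensitive).

    Assumption: fields are tab-separated and lineages use ": " between groups.
    """
    needle = query.lower()
    found: set[str] = set()
    for line in lines:
        for field in line.rstrip("\n").split("\t"):
            for group in field.split(": "):
                group = group.strip()
                if group and needle in group.lower():
                    found.add(group)
    return sorted(found)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 find_groups.py taxonomies proteo
    path, query = sys.argv[1], sys.argv[2]
    with open(path) as fh:
        for name in find_group_names(fh, query):
            print(name)
```

If the file turns out to use a different layout, a plain case-insensitive text search (for example grep -i) works just as well for finding candidate names.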
The standard output is a tab-delimited file with the following fields:
1. the KBase ID of a selected genome
2. the SEED ID of a selected genome
3. the size of the genome (in bp)
4. the number of contigs in the genome
5. a representation of the NCBI taxonomy, with consecutive groups separated by ": "
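Downstream scripts usually need these fields as typed values rather than raw text. The following is a minimal parsing sketch based only on the field list above. It converts the size and contig count to integers and splits the taxonomy on ": " into a list of groups, kept in the order printed. The RepGenome class and parse_rep_genomes function are hypothetical names introduced for illustration.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RepGenome:
    kbase_id: str
    seed_id: str
    size_bp: int
    contigs: int
    taxonomy: list[str]  # NCBI groups in the order printed, split on ": "


def parse_rep_genomes(lines: Iterable[str]) -> Iterator[RepGenome]:
    """Yield one RepGenome per non-blank, tab-delimited output line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        kbase_id, seed_id, size, contigs, taxonomy = line.split("\t")
        yield RepGenome(kbase_id, seed_id, int(size), int(contigs),
                        taxonomy.split(": "))


if __name__ == "__main__":
    # Example: pipe svr_get_rep_genomes output into this script.
    for genome in parse_rep_genomes(sys.stdin):
        print(genome.seed_id, genome.size_bp, genome.contigs)
```

A generator keeps memory use flat if the output is large, and a malformed line (not exactly five fields) raises a ValueError instead of being silently skipped.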