Documentation read from 04/17/2019 22:07:26 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_img_analysis.

svr_img_analysis


    svr_img_analysis <directory>

Read an IMG genome directory and compare it to the corresponding Sapling genomes (if any). The single positional parameter is the name of the IMG genome directory. The last component of the directory path must be the IMG genome number: for example, if the directory is ~/genomes/IMG/637000001, then the genome number must be 637000001.
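As an illustration of the naming rule, the Python sketch below pulls the genome number out of the last path component. The helper name img_genome_id is hypothetical, and the check that the number consists only of digits is an assumption, not something the script is documented to do.

```python
import os


def img_genome_id(directory: str) -> str:
    """Return the IMG genome number taken from the last path component.

    Hypothetical helper: assumes the genome number is the final
    directory name and is made up entirely of digits.
    """
    # Expand "~" and drop any trailing slash before taking the basename.
    path = os.path.normpath(os.path.expanduser(directory))
    genome_id = os.path.basename(path)
    if not genome_id.isdigit():
        raise ValueError(f"last path component {genome_id!r} is not an IMG genome number")
    return genome_id
```

For example, img_genome_id("~/genomes/IMG/637000001") returns "637000001".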

This script loads an IMG genome into memory and then performs a gene-to-gene comparison between it and each Sapling genome that has the same contigs. It reports how many genes are found in both genomes, which genes are found only in the Sapling genome, and which genes are found only in the IMG genome.
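The three-way split described above amounts to a set comparison on the gene keys. The sketch below assumes each genome has already been reduced to a dictionary mapping a gene MD5 to a gene ID, and that MD5s are unique within a genome; the function name compare_genomes and that data layout are illustrative assumptions, not the script's actual internals.

```python
from typing import Dict, List, Tuple


def compare_genomes(
    img: Dict[str, str], sap: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Split genes into (in both, Sapling only, IMG only).

    Hypothetical sketch: `img` and `sap` map gene MD5 -> gene ID.
    """
    # Genes whose MD5 appears in both genomes, as (IMG ID, Sapling ID) pairs.
    common = sorted(img.keys() & sap.keys())
    both = [(img[m], sap[m]) for m in common]
    # Genes whose MD5 appears in only one of the two genomes.
    sap_only = sorted(sap[m] for m in sap.keys() - img.keys())
    img_only = sorted(img[m] for m in img.keys() - sap.keys())
    return both, sap_only, img_only
```

Keying on the MD5 rather than on gene IDs is what lets the two genomes be matched even though IMG and Sapling use different identifiers for the same gene.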

The key files in the IMG directory are the *.fna file, a FASTA file containing the contigs, and the *.genes.tab.txt file, a tab-delimited file describing the genes. An MD5 identifier is computed for each gene, and these MD5s are used to map genes between the IMG and Sapling genomes.
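The sketch below shows one way to locate the two key files and read the contigs from the FASTA file. What exactly the script feeds into each gene's MD5 is not stated here, so sequence_md5 simply hashes an uppercased sequence string; treat that choice, and all three helper names, as assumptions.

```python
import glob
import hashlib
import os
from typing import Dict, List, Optional, Tuple


def find_key_files(directory: str) -> Tuple[str, str]:
    """Return the paths of the *.fna and *.genes.tab.txt files (hypothetical helper)."""
    fna = glob.glob(os.path.join(directory, "*.fna"))
    genes = glob.glob(os.path.join(directory, "*.genes.tab.txt"))
    if len(fna) != 1 or len(genes) != 1:
        raise FileNotFoundError(f"expected one .fna and one .genes.tab.txt file in {directory}")
    return fna[0], genes[0]


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into {contig ID: sequence}, keyed on the first header word."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                words = line[1:].split()
                current = words[0] if words else ""
                seqs[current] = []
            elif current is not None:
                seqs[current].append(line)
    return {cid: "".join(parts) for cid, parts in seqs.items()}


def sequence_md5(seq: str) -> str:
    """Hex MD5 of a sequence, case-insensitive (assumed normalization)."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()
```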

Currently, all of this processing is done in memory, which may be a strain for large eukaryotic genomes.

The report is written to the standard output.

Command-Line Options

url

The URL of the Sapling server, if it differs from the default.

recursive

If this option is specified, the command-line parameter is treated as a directory containing IMG genome directories rather than as a single IMG genome directory. Use this option to batch-process a large number of genomes.
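A rough sketch of how recursive mode might enumerate its inputs: list the subdirectories of the parent directory whose names look like IMG genome numbers. Skipping entries whose names are not all digits is an assumption based on the naming rule for genome directories; the helper name is hypothetical.

```python
import os
from typing import List


def img_genome_dirs(parent: str) -> List[str]:
    """Return subdirectories of `parent` named like IMG genome numbers.

    Hypothetical helper: assumes genome directories are named with
    digits only, and silently skips everything else.
    """
    result = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if name.isdigit() and os.path.isdir(path):
            result.append(path)
    return result
```

Each path returned could then be processed exactly as a single IMG genome directory would be.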

terse

If this option is specified, only statistical information is output; the detailed lists of genes and proteins that do not match are omitted.
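To illustrate the difference between terse and detailed output, the sketch below formats a per-genome report from the three comparison results. The layout and labels are invented for illustration and do not reproduce the script's actual report; it also lists only genes, not proteins.

```python
from typing import List, Tuple


def format_report(
    genome_id: str,
    both: List[Tuple[str, str]],
    sap_only: List[str],
    img_only: List[str],
    terse: bool = False,
) -> str:
    """Build a hypothetical report; `terse` keeps only the counts."""
    lines = [
        f"Genome {genome_id}",
        f"  genes in both: {len(both)}",
        f"  Sapling only: {len(sap_only)}",
        f"  IMG only: {len(img_only)}",
    ]
    if not terse:
        # Detailed mode also names every unmatched gene.
        lines += [f"  Sapling-only gene: {g}" for g in sap_only]
        lines += [f"  IMG-only gene: {g}" for g in img_only]
    return "\n".join(lines)
```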