Documentation read from 04/17/2019 22:07:27 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_mapped_genomes.

svr_mapped_genomes

Get maps between a reference genome and a set of genomes to which you wish to compare the reference genome.

------

Example:

    svr_mapped_genomes -g 83333.1 -d Maps < genomes.to.compare.against

would construct a directory of mappings between genome 83333.1 and the genomes read from standard input. The maps would come back as files in the directory "Maps" (which would be created if necessary).

------

The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line would contain a genome to be compared against the reference genome. If some other column contains the genomes, use

    -c N

where N is the number of the column (counting from 1) that contains the genome on each line.
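The column rule above can be illustrated with a minimal Python sketch. It is not part of svr_mapped_genomes; the function name genome_from_line and the sample rows are assumptions made for illustration only. It picks the last field by default and a 1-based column when one is given, mirroring -c N.

```python
def genome_from_line(line, column=None):
    """Return the genome ID from one tab-separated input line.

    column is 1-based, mirroring -c N; None means "use the last field".
    (Hypothetical helper, not part of the svr_mapped_genomes script.)
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    return fields[column - 1]


# Illustrative rows only: a free-text label followed by a genome ID.
rows = ["sample-1\t83333.1\n", "sample-2\t100226.1\n"]

last_column = [genome_from_line(r) for r in rows]
first_column = [genome_from_line(r, column=1) for r in rows]
```

With these rows, the default picks the genome IDs from the last field, while column=1 would pick the labels instead, which is the situation -c is meant to correct.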

Command-Line Options

-c Column

This is used only if the column containing the genomes is not the last.

-g genome

This designates the reference genome.

-d directory

This designates the directory into which maps are written. It will be created if it does not already exist.
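For scripted use, the three options can be assembled programmatically. The following is a sketch only: build_command and run_mapping are hypothetical helper names, and it assumes svr_mapped_genomes is on the PATH. Only the -g, -d and -c options described above are used.

```python
import subprocess


def build_command(genome, directory, column=None):
    """Assemble the argument list for svr_mapped_genomes (hypothetical helper)."""
    cmd = ["svr_mapped_genomes", "-g", genome, "-d", directory]
    if column is not None:
        # -c is needed only when the genome is not in the last column.
        cmd += ["-c", str(column)]
    return cmd


def run_mapping(genome, directory, genome_table, column=None):
    """Run the script, feeding the tab-separated table on standard input.

    Assumes svr_mapped_genomes is installed and on the PATH.
    """
    return subprocess.run(
        build_command(genome, directory, column),
        input=genome_table,
        text=True,
        check=True,
    )


example = build_command("83333.1", "Maps")
```

Here example corresponds to the command line shown in the example at the top of this page, with the genome table still to be supplied on standard input.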

Output Format

The output is written as "maps" in the designated directory. Each map is a tab-separated file with 18 fields per line:

1   The ID of a PEG in genome 1.
2   The ID of a PEG in genome 2 that is our best estimate of a "corresponding gene".
3   Count of the pairs of matching genes that were found in the context.
4   Pairs of corresponding genes from the contexts.
5   The function of the gene in genome 1.
6   The function of the gene in genome 2.
7   Comma-separated list of aliases for the gene in genome 1 (any protein with an identical sequence is considered an alias, whether or not it is actually the name of the same gene in the same genome).
8   Comma-separated list of aliases for the gene in genome 2 (any protein with an identical sequence is considered an alias, whether or not it is actually the name of the same gene in the same genome).
9   Bi-directional best hits will contain "<=>" in this column; otherwise, "->" will appear.
10  Percent identity over the region of the detected match.
11  The P-score for the detected match.
12  Beginning match coordinate in the protein encoded by the gene in genome 1.
13  Ending match coordinate in the protein encoded by the gene in genome 1.
14  Length of the protein encoded by the gene in genome 1.
15  Beginning match coordinate in the protein encoded by the gene in genome 2.
16  Ending match coordinate in the protein encoded by the gene in genome 2.
17  Length of the protein encoded by the gene in genome 2.
18  Bit score for the match. Divide by the length of the longer PEG to get what we often refer to as a "normalized bit score".
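The 18 fields can be read into named values with a short Python sketch. This is illustrative, not part of the SEED tools: the field names in MapRow are assumptions chosen to match the descriptions above, the sample line is made up, and field 4 is kept as an unparsed string because its internal layout is not specified here. The normalized bit score follows the rule given for field 18: the bit score divided by the length of the longer protein.

```python
from collections import namedtuple

# Field names are illustrative; the order follows fields 1-18 above.
MapRow = namedtuple("MapRow", [
    "peg1", "peg2", "context_count", "context_pairs",
    "function1", "function2", "aliases1", "aliases2",
    "direction", "percent_identity", "p_score",
    "begin1", "end1", "length1",
    "begin2", "end2", "length2", "bit_score",
])


def parse_map_line(line):
    """Split one line of a map file into its 18 tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 18:
        raise ValueError("expected 18 fields, got %d" % len(fields))
    return MapRow(*fields)


def is_bidirectional(row):
    """True when field 9 marks a bi-directional best hit ("<=>")."""
    return "<=>" in row.direction


def normalized_bit_score(row):
    """Bit score divided by the length of the longer of the two proteins."""
    longer = max(int(row.length1), int(row.length2))
    return float(row.bit_score) / longer


# Made-up sample line, for illustration only.
sample = "\t".join([
    "fig|83333.1.peg.1", "fig|100226.1.peg.5", "3", "pairs",
    "function A", "function B", "alias1", "alias2", "<=>",
    "50.0", "1e-20", "1", "100", "100", "1", "100", "200", "150",
])
row = parse_map_line(sample)
score = normalized_bit_score(row)
```

For the sample line, the longer protein has length 200 and the bit score is 150, so score is 0.75. All fields are kept as strings and converted only where a number is needed.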