Documentation read from 04/17/2019 22:07:25 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_cluster_pegs.

svr_cluster_pegs [-c Column] [-m MaxDist] < PEGs > +[ClusterID,Location] 2> singletons

Cluster PEGs that are close to one another on the same contig


The standard input should be a tab-separated table (each line is a set of tab-separated fields). By default, the PEG ID is taken from the last field of each line, but you can specify which column the PEG IDs come from.

If some other column contains the PEGs, use

    -c N

where N is the number of the column (counting from 1) that contains the PEG ID.
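As a minimal sketch of the column rule (default: last field; with -c N: field N, counting from 1), the hypothetical Python helper `peg_from_line` below extracts the PEG ID from one input line. It is illustrative only and is not taken from the svr_cluster_pegs source.

```python
from typing import Optional


def peg_from_line(line: str, column: Optional[int] = None) -> str:
    """Return the PEG ID from one tab-separated input line.

    column is 1-based (mirroring -c N); None means "use the last field",
    which is the documented default.
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    if not 1 <= column <= len(fields):
        raise ValueError(
            f"column {column} is out of range for a line with {len(fields)} fields"
        )
    return fields[column - 1]
```

For example, `peg_from_line("x\tfig|83333.1.peg.7")` returns the last field, `"fig|83333.1.peg.7"`, while passing `column=1` returns `"x"`.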

This is a pipe command. The input is taken from the standard input, and the output is written to both the standard output and the standard error: clusters containing multiple genes go to STDOUT, while singletons go to STDERR.
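This documentation does not spell out the clustering rule beyond grouping PEGs that are close on a contig, with -m MaxDist appearing in the synopsis. The sketch below assumes a simple single-linkage rule: PEGs are sorted by contig and position, and a PEG joins the current group when it is on the same contig and its gap to the group's rightmost end is at most max_dist. The names `cluster_pegs` and `Span`, and the (contig, left, right) coordinate representation, are hypothetical; the real tool may measure distance differently. Groups of two or more correspond to the STDOUT clusters, and one-member groups to the STDERR singletons.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: PEG ID -> (contig, left, right),
# 1-based coordinates with left <= right.
Span = Tuple[str, int, int]


def cluster_pegs(
    spans: Dict[str, Span], max_dist: int
) -> Tuple[List[List[str]], List[str]]:
    """Assumed single-linkage clustering along each contig.

    Returns (clusters, singletons): clusters hold two or more PEGs,
    singletons are PEGs with no neighbour within max_dist.
    """
    ordered = sorted(spans, key=lambda peg: (spans[peg][0], spans[peg][1]))
    groups: List[List[str]] = []
    prev_contig: Optional[str] = None
    prev_right = 0
    for peg in ordered:
        contig, left, right = spans[peg]
        # Join the current group only if on the same contig and the gap to the
        # furthest right end seen so far in that group is at most max_dist.
        if contig == prev_contig and left - prev_right <= max_dist:
            groups[-1].append(peg)
            prev_right = max(prev_right, right)
        else:
            groups.append([peg])
            prev_right = right
        prev_contig = contig
    clusters = [g for g in groups if len(g) > 1]
    singletons = [g[0] for g in groups if len(g) == 1]
    return clusters, singletons
```

Sorting first keeps the pass linear after the sort; tracking the maximum right end lets overlapping or nested genes stay in the same group.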

Command-Line Options

-c Column

Specifies the column (numbered from 1) that contains the PEG IDs. It is needed only if the PEGs are not in the last column.

Output Format

Lines for PEGs that belong to a multi-gene cluster are written to STDOUT, with two columns appended to the end of each line: a ClusterID (an integer that uniquely identifies the cluster) and a Location. The location has the form GID:Contig_Start[+-]Length. For example,

    10226.1:NC_003888_3766170+612

would designate a gene in genome 10226.1 on contig NC_003888 that starts at position 3766170 (positions are numbered from 1), lies on the positive strand, and has a length of 612 base pairs.
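A minimal Python sketch for reading and writing the GID:Contig_Start[+-]Length form. It assumes, as in the NC_003888 example, that a contig name may itself contain underscores, so Start is taken as the digit run immediately before the strand sign. `Location`, `parse_location`, and `format_location` are hypothetical names, not part of the tool.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    genome: str   # GID, e.g. "10226.1"
    contig: str   # may contain underscores, e.g. "NC_003888"
    start: int    # 1-based start position
    strand: str   # "+" or "-"
    length: int


# Greedy (.+) backtracks until "_<digits><strand><digits>" matches at the end,
# so underscores inside the contig name are kept in the contig field.
_LOC_RE = re.compile(r"^([^:]+):(.+)_(\d+)([+-])(\d+)$")


def parse_location(text: str) -> Location:
    m = _LOC_RE.match(text)
    if m is None:
        raise ValueError(f"not a GID:Contig_Start[+-]Length location: {text!r}")
    gid, contig, start, strand, length = m.groups()
    return Location(gid, contig, int(start), strand, int(length))


def format_location(loc: Location) -> str:
    return f"{loc.genome}:{loc.contig}_{loc.start}{loc.strand}{loc.length}"
```

Round-tripping the example through `parse_location` and `format_location` reproduces the original string.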

When a PEG does not belong to any cluster, its original line (with no added columns) is written to STDERR.
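The output rules can be sketched as a pure Python function that builds the STDOUT and STDERR lines. It assumes the cluster assignments and location strings have already been computed; the dictionaries are hypothetical inputs, and `format_output` is an illustrative name, not part of the tool.

```python
from typing import Dict, Iterable, List, Tuple


def format_output(
    rows: Iterable[Tuple[str, str]],  # (original input line, its PEG ID)
    cluster_of: Dict[str, int],       # PEG ID -> ClusterID, multi-gene clusters only
    location_of: Dict[str, str],      # PEG ID -> "GID:Contig_Start[+-]Length"
) -> Tuple[List[str], List[str]]:
    """Return (stdout_lines, stderr_lines): clustered lines, then singleton lines."""
    stdout_lines: List[str] = []
    stderr_lines: List[str] = []
    for line, peg in rows:
        line = line.rstrip("\n")
        if peg in cluster_of:
            # Clustered PEG: append ClusterID and Location as two new tab-separated columns.
            stdout_lines.append(f"{line}\t{cluster_of[peg]}\t{location_of[peg]}")
        else:
            # Singleton: the original line, with no added columns.
            stderr_lines.append(line)
    return stdout_lines, stderr_lines
```

A caller would then print each STDOUT line normally and each STDERR line with `print(line, file=sys.stderr)`; keeping the function free of I/O makes it easy to check in isolation.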