Documentation read from 04/17/2019 22:07:25 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_cluster_locations.

svr_cluster_locations < LOCs > CLUSTERs

Cluster locations on the chromosome

------

Example:

    svr_all_features 3702.1 peg | svr_fids_to_locations | svr_cluster_locations -m 3000 -n 3

would produce a 3-column table. The first column would contain PEG IDs, the second the PEG locations, and the third cluster IDs. The output would be sorted on the second column.

------

The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line would contain the LOC for which clusters are being requested. If some other column contains the LOCs, use

    -c N

where N is the number of the column (counting from 1) that contains the location on each line.
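
The column rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the script's actual code; the function name pick_location and the sample location string are hypothetical.

```python
def pick_location(line, column=None):
    """Return the location field from one tab-separated input line.

    column is 1-based, mirroring -c N; None means "use the last field".
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    if not 1 <= column <= len(fields):
        raise ValueError(f"column {column} out of range for {len(fields)} fields")
    return fields[column - 1]


# Hypothetical input line: ID, location, free-text annotation.
line = "fig|3702.1.peg.1\tChr1_3631_5899\tsome annotation\n"
print(pick_location(line))     # default: last field -> some annotation
print(pick_location(line, 2))  # like -c 2 -> Chr1_3631_5899
```

In this example the location is not in the last field, so the default would pick the wrong column and the equivalent of -c 2 is needed.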

This is a pipe command. The input is taken from the standard input, and the output is to the standard output.

Command-Line Options

-c Column

This is used only if the column containing LOCs is not the last.

-m Maximum Gap between LOCs in a cluster [default is 3000]

Clusters are thought of as "runs with gaps less than or equal to this value". A run can include genes in either (or both) orientations.

-n Minimum Size of Cluster [default is 2]

Kept clusters will contain at least this many locations. Runs of size less than this will not show up in the output (use 1 if you want to keep all input lines).
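
How -m and -n interact can be illustrated with a minimal Python sketch. It rests on assumptions that this documentation does not confirm: locations are taken to be strings of the form contig_begin_end, the gap is measured from the rightmost coordinate seen so far in the run to the leftmost coordinate of the next location, and cluster IDs are consecutive integers. All function names are hypothetical.

```python
from collections import defaultdict


def parse_loc(loc):
    """Split an ASSUMED 'contig_begin_end' string into (contig, low, high)."""
    contig, begin, end = loc.rsplit("_", 2)
    begin, end = int(begin), int(end)
    # Orientation is ignored: a run may mix both strands.
    return contig, min(begin, end), max(begin, end)


def _flush(run, clusters, next_id, min_size):
    """Record a finished run if it is large enough; return the next free ID."""
    if len(run) >= min_size:
        for loc in run:
            clusters[loc] = next_id
        return next_id + 1
    return next_id


def cluster_locations(locs, max_gap=3000, min_size=2):
    """Map each kept location string to a cluster number (assumed 1-based)."""
    by_contig = defaultdict(list)
    for loc in locs:
        contig, low, high = parse_loc(loc)
        by_contig[contig].append((low, high, loc))

    clusters, next_id = {}, 1
    for contig in sorted(by_contig):
        run, run_end = [], None
        for low, high, loc in sorted(by_contig[contig]):
            # A gap larger than max_gap closes the current run.
            if run and low - run_end > max_gap:
                next_id = _flush(run, clusters, next_id, min_size)
                run, run_end = [], None
            run.append(loc)
            run_end = high if run_end is None else max(run_end, high)
        next_id = _flush(run, clusters, next_id, min_size)
    return clusters


locs = ["c1_100_900", "c1_1200_2000", "c1_9000_8000", "c1_40000_41000", "c2_50_500"]
print(cluster_locations(locs))
# {'c1_100_900': 1, 'c1_1200_2000': 1}
```

With the defaults, only the first two locations form a run of at least two members; the others are singletons and are dropped. Passing min_size=1 keeps every location, each singleton in its own cluster.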

Output Format

The standard output is a tab-delimited file. It consists of the input lines that belong to kept clusters, with an extra column added (the cluster IDs).
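
The shape of the output can be sketched as follows. This is an assumption-laden illustration rather than the script's implementation: the function name add_cluster_column, the integer cluster IDs, and the sample location strings are all hypothetical.

```python
def add_cluster_column(lines, cluster_of, column=None):
    """Yield input lines with a cluster ID appended as the last field.

    cluster_of maps a location string to its cluster ID; lines whose
    location has no entry (runs smaller than -n) are skipped.
    column is 1-based like -c N; None means the last field.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        loc = fields[-1] if column is None else fields[column - 1]
        if loc in cluster_of:
            yield "\t".join(fields + [str(cluster_of[loc])])


# Hypothetical two-column input and a precomputed cluster assignment.
lines = ["peg1\tc1_100_900", "peg2\tc1_1200_2000", "peg3\tc1_40000_41000"]
cluster_of = {"c1_100_900": 1, "c1_1200_2000": 1}
for out in add_cluster_column(lines, cluster_of):
    print(out)
# peg1	c1_100_900	1
# peg2	c1_1200_2000	1
```

The third input line has no cluster assignment, so it does not appear in the output.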