Documentation read from 04/17/2019 22:07:28 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_with_close_blast_hits.

svr_with_close_blast_hits -d DB [-c N] [-m MaxDist] [-p MaxPsc] < PEGs > +[PegLocation,HitLocation] 2> no.hits

Determine which of the input PEGs have a "close" blastX hit against a given protein DB, meaning a hit in the DNA within MaxDist of the PEG.

------

The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line contains a PEG ID, but you can specify which column the PEG IDs come from.

If some other column contains the PEGs, use

    -c N

where N is the number of the column (counting from 1) that contains the PEG ID on each line.
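
As an illustration of the column convention only (not the script's own code), the following Python sketch picks the PEG ID out of one input line. The function name peg_from_line and the sample PEG IDs are assumptions made for this example.

```python
def peg_from_line(line, column=None):
    """Return the PEG ID field from one tab-separated input line.

    column is 1-based, mirroring the -c N option; None means
    "use the last field", which is the documented default.
    (peg_from_line is a hypothetical helper, not part of the tool.)
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    # Convert the 1-based column number to a 0-based list index.
    return fields[column - 1]
```

With no column given, the last field is returned; with column=1, the first field is returned.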

This is a pipe command. The input is taken from the standard input, and the output goes to the standard output and standard error. Genes that have a close blastX hit against the given protein DB are written to STDOUT with three appended columns: the gene location, the hit location, and the other gene that had the best blast score. Genes that fail to hit anything are written to STDERR.

Command-Line Options

-c Column

This is used only if the column containing PEGs is not the last.

-d BlastDB

This is the name of a protein blast DB. It is assumed that formatdb has already been run to properly format it.

-m MaxDist

This is the distance used to snip out a section of DNA centered on the PEG. The snipped DNA will be of length ((2 * MaxDist) + length of PEG). Default is 2000.

-p MaxPsc

This is the maximum Psc a blast hit may have and still count as a significant similarity.
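
The length formula given under -m MaxDist can be checked with a small Python sketch. The function name snipped_length is an assumption for this example, and the sketch ignores any truncation at the ends of a contig, which the text above does not describe.

```python
def snipped_length(peg_length, max_dist=2000):
    """Length of the DNA section snipped out around a PEG.

    Implements the documented formula (2 * MaxDist) + length of PEG,
    with the documented default MaxDist of 2000.
    Assumption: the window is not clipped at contig boundaries.
    """
    return 2 * max_dist + peg_length
```

For a PEG of length 612 and the default MaxDist, this gives 2 * 2000 + 612 = 4612.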

Output Format

PEGs that have a close blast hit are written to STDOUT. Three columns are added at the end of each line in STDOUT: the location of the PEG, the location of a significant blast hit, and the other PEG that generated the hit. The locations will be in the form GID:Contig_Start[+-]Length. For example,

    100226.1:NC_003888_3766170+612

would designate a gene in genome 100226.1 on contig NC_003888 that starts at position 3766170 (positions are numbered from 1), is on the positive strand, and has a length of 612.
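
A location string in this form can be taken apart as in the following Python sketch. It is only an illustration: the names LOCATION_RE and parse_location are assumptions for this example, and the pattern assumes that the genome ID contains no colon. Because contig names may themselves contain underscores (as NC_003888 does), the start position is taken from the last underscore-delimited field.

```python
import re

# GID:Contig_Start[+-]Length. The greedy contig group backtracks to the
# last underscore that is followed by Start, strand and Length.
LOCATION_RE = re.compile(
    r"^(?P<genome>[^:]+):(?P<contig>.+)_(?P<start>\d+)"
    r"(?P<strand>[+-])(?P<length>\d+)$"
)

def parse_location(loc):
    """Split a location string into its five parts (hypothetical helper)."""
    m = LOCATION_RE.match(loc)
    if m is None:
        raise ValueError("not a GID:Contig_Start[+-]Length location: " + loc)
    return {
        "genome": m.group("genome"),
        "contig": m.group("contig"),
        "start": int(m.group("start")),    # positions are numbered from 1
        "strand": m.group("strand"),
        "length": int(m.group("length")),
    }
```

Applied to the example above, parse_location("100226.1:NC_003888_3766170+612") yields genome 100226.1, contig NC_003888, start 3766170, strand + and length 612.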

When a PEG has no hit, the original line (with no added columns) is written to STDERR.
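
Downstream tools that read the STDOUT stream may want to separate the original input fields from the three appended columns. The Python sketch below shows one way to do that; split_hit_line is a hypothetical helper, and it assumes each STDOUT line has at least one original field followed by exactly the three added columns described above.

```python
def split_hit_line(line):
    """Split one STDOUT line into (original_fields, peg_loc, hit_loc, other_peg).

    Assumption: the last three tab-separated fields are the appended
    columns (PEG location, hit location, other PEG), in that order.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-separated fields")
    # Everything before the last three fields is the untouched input line.
    *original, peg_loc, hit_loc, other_peg = fields
    return original, peg_loc, hit_loc, other_peg
```

Lines from the STDERR stream carry no added columns, so they should not be passed through this function.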