Documentation read from 04/17/2019 22:07:28 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_with_close_blast_hits.
Determine which of the input PEGs have close blastX hits against a given protein DB.
------
The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line would contain a PEG, but you can specify what column the PEG IDs come from.
If some other column contains the PEGs, use
-c N
where N is the column (from 1) that contains the PEG in each case.
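The column rule above can be sketched in a few lines of Python. This is only an illustration of the convention (last field by default, otherwise a 1-based column), not the script's actual code; the function name peg_from_line and the sample PEG ID are assumptions made for the example.

```python
def peg_from_line(line, column=None):
    """Return the PEG field from one tab-separated input line.

    column is 1-based, mirroring "-c N"; None means "use the last field".
    (Hypothetical helper, not part of svr_with_close_blast_hits.)
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]        # default: the PEG is in the last field
    return fields[column - 1]    # -c N counts columns from 1


# The sample ID below is illustrative only.
print(peg_from_line("some annotation\tfig|100226.1.peg.1\n"))
print(peg_from_line("fig|100226.1.peg.1\tsome annotation\n", column=1))
```

Both calls print the same PEG ID: the first takes it from the last field, the second from column 1.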
This is a pipe command. The input is taken from the standard input, and the output goes to the standard output and standard error. Genes that have a close blastX hit against the given protein DB are written to STDOUT (with three appended columns: the gene location, the hit location, and the other gene that had the best blast score). Genes that fail to hit anything are written to STDERR.
The -c option is needed only if the column containing PEGs is not the last.
This is the name of a protein blast DB. It is assumed that formatdb has already been run to properly format it.
MaxDist is the distance used to snip out a section of DNA centered on the PEG. The snipped DNA will have length (2 * MaxDist) + (length of PEG). The default is 2000.
This is the maximum Psc used to determine whether or not there was a significant similarity.
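As a quick check of the length formula, here is a minimal Python sketch. The function name snipped_length is hypothetical, and the sketch computes only the nominal length from the formula; how the script behaves when the window would run past the end of a contig is not stated here and is not modeled.

```python
def snipped_length(peg_length, max_dist=2000):
    """Nominal length of the snipped DNA: (2 * MaxDist) + length of PEG.

    Hypothetical helper; 2000 is the documented default for MaxDist.
    """
    return 2 * max_dist + peg_length


# A 612-base PEG with the default MaxDist of 2000:
print(snipped_length(612))  # 4612
```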
PEGs that have a close hit are written to STDOUT. Three columns are added at the end of each line in STDOUT: the location of the PEG, the location of a significant blast hit, and the other PEG that generated the hit. The locations will be in the form GID:Contig_Start[+-]Length. For example,
100226.1:NC_003888_3766170+612
would designate a gene in genome 100226.1 on contig NC_003888 that starts at position 3766170 (positions are numbered from 1), is on the positive strand, and has a length of 612.
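For downstream processing, a location string of this form can be split into its parts. The following Python sketch is one way to do it, assuming every location follows the GID:Contig_Start[+-]Length pattern exactly; the names LOCATION_RE and parse_location are hypothetical and not part of the tool.

```python
import re

# Contig names may themselves contain underscores (e.g. NC_003888), so the
# greedy contig group makes the regex split on the LAST underscore that is
# followed by Start, a strand sign, and Length.
LOCATION_RE = re.compile(
    r"^(?P<genome>[^:]+):(?P<contig>.+)_(?P<start>\d+)(?P<strand>[+-])(?P<length>\d+)$"
)


def parse_location(loc):
    """Split 'GID:Contig_Start[+-]Length' into a dict of its fields."""
    m = LOCATION_RE.match(loc)
    if m is None:
        raise ValueError(f"not a GID:Contig_Start[+-]Length location: {loc!r}")
    return {
        "genome": m.group("genome"),
        "contig": m.group("contig"),
        "start": int(m.group("start")),    # positions are numbered from 1
        "strand": m.group("strand"),
        "length": int(m.group("length")),
    }


print(parse_location("100226.1:NC_003888_3766170+612"))
```

For the example location this yields genome 100226.1, contig NC_003888, start 3766170, strand + and length 612.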
When a PEG has no hit, the original line (with no added columns) is written to STDERR.