Documentation read from 04/17/2019 22:07:25 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_big_repeats.

svr_big_repeats [-i MinIdentity] [-l MinLength] [-g Genome] [-f FastaContigs] [-t Features] [-b BlastDB] > repeats

Find regions that appear to be big repeats (at the DNA level). This can be done by looking for multiple copies of identical DNA within a single genome or looking for instances of large repeats maintained as a Blast DB.

Command-Line Options

If neither the -g nor the -f option is specified, contigs will be read from STDIN.

-i MinIdentity

To be considered a repeat, a BLAST match must show an identity value greater than this parameter (default is 95).

-l MinLength

The minimum length of an identified region of similarity (default is 100).

-g Genome

Run the program on contigs from this genome.

-f FastaContigs

A file containing the contigs for a genome in fasta format.

-t Features

If this is specified, it names a file that contains feature IDs and locations. The right way to get such a file is to concatenate the tbl files from a RAST/myRAST/SEED directory.

-b BlastDB

If this is specified, a repeat is defined as a similarity against an entry in this DB (unlike the more normal case in which it is computed from multiple occurrences in a single genome).
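
As a rough illustration of how the options above combine, the following Python sketch assembles an argument list using only the flags documented here. The helper name build_command and its keyword parameters are hypothetical and are not part of svr_big_repeats; the file name in the demo is a placeholder.

```python
def build_command(min_identity=None, min_length=None, genome=None,
                  fasta_contigs=None, features=None, blast_db=None):
    """Assemble an argv list for svr_big_repeats.

    Hypothetical helper: it only maps keyword arguments onto the
    documented flags and leaves out any option that is not set, so the
    tool's own defaults (95 for -i, 100 for -l) apply.
    """
    argv = ["svr_big_repeats"]
    options = [
        ("-i", min_identity),   # minimum identity
        ("-l", min_length),     # minimum length of a similar region
        ("-g", genome),         # genome to take contigs from
        ("-f", fasta_contigs),  # FASTA file of contigs
        ("-t", features),       # file of feature IDs and locations
        ("-b", blast_db),       # BLAST DB of known repeats
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # Placeholder file name, for illustration only.
    print(" ".join(build_command(min_identity=98, min_length=500,
                                 fasta_contigs="contigs.fasta")))
```

The resulting list can be handed to subprocess.run with stdout pointed at an open file, which mirrors the "> repeats" redirection in the usage line. With neither -g nor -f set, the contigs would have to be supplied on STDIN.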

Output Format

The output is an 8-column table of the form

    [LengthOfRepeat,Identity,Contig1,Beg1,End1,Contig2,Beg2,End2]

If the -t option is specified, you will get extra lines listing the features (and their locations) that occur in the similar regions.
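
For downstream processing, the rows described above could be read with a short Python sketch like the one below. It assumes the columns are tab-separated, which this documentation does not state, and it assumes that the extra feature lines produced by -t can be told apart because they do not have eight fields with numeric values in the expected positions. The names Repeat and parse_repeats are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Repeat(NamedTuple):
    """One row of the repeats table, in the documented column order."""
    length: int
    identity: float
    contig1: str
    beg1: int
    end1: int
    contig2: str
    beg2: int
    end2: int


def parse_repeats(lines: Iterable[str], sep: str = "\t") -> Iterator[Repeat]:
    """Yield a Repeat for each line that looks like a repeat row.

    Assumption: fields are separated by `sep` (tab by default). Lines
    that do not have eight fields, or whose numeric fields do not
    parse, are skipped; under -t those are presumed to be feature lines.
    """
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) != 8:
            continue
        try:
            row = Repeat(int(fields[0]), float(fields[1]),
                         fields[2], int(fields[3]), int(fields[4]),
                         fields[5], int(fields[6]), int(fields[7]))
        except ValueError:
            continue
        yield row


if __name__ == "__main__":
    import sys
    # Reads the table from STDIN, e.g. the "repeats" file from the usage line.
    for rep in parse_repeats(sys.stdin):
        print(rep.contig1, rep.beg1, rep.end1, rep.contig2, rep.beg2, rep.end2)
```

If the real output turns out to use a different delimiter, passing another value for sep is the only change needed.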