Documentation read from 04/17/2019 22:07:25 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_coregulated_by_correspondence.

svr_coregulated_by_correspondence [-m MinPCC] [-f] [-n MaxConn] [G1 G2 G3 ...]

Get genes that have indirect evidence of coexpression (i.e., coexpression that seems to exist between the corresponding genes in one or more other genomes with expression data).

------

Example:

    svr_all_features 83333.1 peg | svr_coregulated_by_correspondence -m 0.8 83333.1

would produce a 3-column table. The first column would contain PEG IDs for genes occurring in genome 83333.1, the second would give the "relevant evidence" from genes in 83333.1, and the third would give the PEG that seems to have a similar expression profile. Note that this would probably be an enormous file, for reasons explained below. Unless you use the -n option (say, -n 30), you should probably run only a small set of genes as input.

The "relevant evidence" field consists of one or more entries separated by semicolons (one entry per genome in which the corresponding genes are correlated according to expression data). Each such entry is a triple of the form

     "Gene1,PCC,Gene2"
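
As an illustration of the evidence format described above, here is a minimal Python sketch that splits one evidence field into its triples. The function name, the Evidence type, and the sample identifiers and PCC values are all made up for this example; the sketch assumes that gene IDs contain neither commas nor semicolons.

```python
from typing import List, NamedTuple


class Evidence(NamedTuple):
    """One "Gene1,PCC,Gene2" triple from the evidence column."""
    gene1: str
    pcc: float
    gene2: str


def parse_evidence(field: str) -> List[Evidence]:
    """Split a semicolon-separated evidence field into triples."""
    triples = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty entry, e.g. a trailing semicolon
        gene1, pcc, gene2 = entry.split(",")
        triples.append(Evidence(gene1, float(pcc), gene2))
    return triples


# Hypothetical sample data, for illustration only.
sample = (
    "fig|99999.1.peg.10,0.91,fig|99999.1.peg.42;"
    "fig|88888.1.peg.7,0.85,fig|88888.1.peg.9"
)
parsed = parse_evidence(sample)
```

Each element of the resulting list then exposes the two genes and their correlation as named fields, which makes it easy to filter or sort the evidence afterwards.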

------

The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line would contain the PEG for which coregulated genes are being requested. If some other column contains the PEGs, use

    -c N

where N is the column (from 1) that contains the PEG in each case.
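
The column convention can be sketched in Python as follows. This only mirrors the rule stated above (last field by default, otherwise a 1-based column number); the helper name peg_from_line is hypothetical and is not part of the tool.

```python
from typing import Optional


def peg_from_line(line: str, column: Optional[int] = None) -> str:
    """Return the PEG field of a tab-separated line.

    column is 1-based, as with -c N; None means "use the last field".
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    if column < 1 or column > len(fields):
        raise ValueError(f"column {column} is out of range for this line")
    return fields[column - 1]
```

For instance, a line whose PEG sits in the first of two fields would need column=1, the equivalent of passing -c 1.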

This is a pipe command. The input is taken from the standard input, and the output is to the standard output.

Command-Line Options

-c Column

This is used only if the column containing PEGs is not the last.

-m MinPCC

Minimum value for the Pearson correlation coefficient (PCC).
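
For readers unfamiliar with the measure, the following Python sketch computes the standard Pearson correlation coefficient between two expression profiles. How the tool itself obtains its PCC values is not described here, so this is only the textbook definition, and the function name is an assumption.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Textbook Pearson correlation of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two profiles rise and fall together across experiments, so -m 0.8 keeps only fairly strong positive correlations.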

-b

Show only the best indirect correlation

-f

Requests a full display. This produces 1 line per item of supporting evidence. It is the "expanded" format with functions of PEGs displayed. Do not use it for more than a relatively small set of PEGs (or you may get flooded with output).

-n MaxConnections [default is 50]

Often two genes have a common expression pattern just because they are both "on" in all experiments or both are "off" all the time. When you use indirect evidence from other organisms, this can balloon the output. This parameter says "Consider only genes that have correlation coefficients above 0.9 for MaxConnections genes or fewer."
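
The idea behind this filter can be sketched in Python as below. This is a guess at the logic based only on the description above, not the tool's actual implementation; the function name and the input representation (a list of gene pairs with their PCC) are assumptions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def genes_within_limit(
    pairs: Iterable[Tuple[str, str, float]],
    max_connections: int = 50,
    threshold: float = 0.9,
) -> Set[str]:
    """Keep genes that have at most max_connections partners with PCC > threshold."""
    counts: Counter = Counter()
    genes: Set[str] = set()
    for gene1, gene2, pcc in pairs:
        genes.update((gene1, gene2))
        if pcc > threshold:
            # A strong correlation counts as one connection for each gene.
            counts[gene1] += 1
            counts[gene2] += 1
    return {gene for gene in genes if counts[gene] <= max_connections}
```

Genes that are always "on" or always "off" tend to correlate strongly with a great many others, so they exceed the limit and are dropped before they can inflate the output.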

Output Format

The standard output is a tab-delimited file. It consists of the lines from the input file for PEGs whose Pearson correlation coefficients indicate potential correlation. Each line will have two appended columns: the relevant evidence and the PEG that appears to have a correlated profile.
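
A downstream script might separate the original input fields from the two appended columns as in this Python sketch. It assumes the default output described above (not the expanded -f display), and the function name is hypothetical.

```python
from typing import List, Tuple


def split_output_line(line: str) -> Tuple[List[str], str, str]:
    """Return (input fields, evidence, correlated PEG) for one output line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least one input field plus two appended columns")
    # The last two fields are the appended columns; everything before is input.
    *input_fields, evidence, correlated_peg = fields
    return input_fields, evidence, correlated_peg
```

The evidence string returned here is still in its raw semicolon-separated form and can be broken down further as needed.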