Documentation read from 04/17/2019 22:07:27 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_protein_assertions.

svr_protein_assertions

    svr_protein_assertions <gene_ids.tbl >assertion_data.tbl

Get a list of Annotation Clearinghouse assertions for the specified proteins.

This is a pipe command. The input is taken from the standard input, and the output is written to the standard output.

The standard input should be a tab-delimited file with IDs in the last column. The IDs should be prefixed protein or gene IDs (e.g. uni|AYQ44, fig|360108.3.peg.1041, md5|4a+6lQzFY8hRkQyWPliFjw). For each of these identifiers, this script searches the Annotation Clearinghouse for an identifier with an identical protein sequence that has an associated functional assignment. For each identifier found, the following fields are returned.

  1. The identifier found.
  2. The scientific name of the associated genome (if any).
  3. 1 if we believe the identifier corresponds to the exact gene identified by the input identifier, else 0. If the input identifier does not specify a particular gene, this column will always be 0.
  4. The functional assignment associated with the protein ID.
  5. The source of the assignment.
  6. 1 if the assignment is considered expert, else 0.
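
As a minimal sketch of how a downstream script might read these fields, the Python below parses one output line into a named record. It assumes the output is tab-delimited and that the six fields above are the last six columns of each line (any leading columns are ignored); neither point is stated on this page. The names Assertion and parse_assertion_line are hypothetical.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """One output row, in the field order listed above."""
    identifier: str    # field 1: the identifier found
    genome_name: str   # field 2: scientific name of the genome, may be empty
    same_gene: bool    # field 3: True if the flag column is "1"
    function: str      # field 4: functional assignment
    source: str        # field 5: source of the assignment
    expert: bool       # field 6: True if the flag column is "1"


def parse_assertion_line(line: str) -> Assertion:
    """Parse one tab-delimited line (assumed layout: six fields at the end)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6:
        raise ValueError(
            f"expected at least 6 tab-delimited columns, got {len(cols)}"
        )
    ident, genome, same_gene, function, source, expert = cols[-6:]
    return Assertion(
        ident, genome, same_gene == "1", function, source, expert == "1"
    )
```

A caller could apply parse_assertion_line to each line of assertion_data.tbl and then filter on the same_gene or expert flags.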

The net effect is that for each input identifier, we find the assignments for protein-equivalent identifiers in the Annotation Clearinghouse. Because many identifiers can share the same protein sequence, a single input line may generate multiple output lines.
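
Because one protein can yield several rows, a consumer will often want to condense them. The sketch below groups rows by functional assignment and lists expert-backed assignments first. This ranking is an illustrative choice and not something the script does. Rows are assumed to be 6-tuples of strings in the field order listed above, and the function name summarize_assertions is hypothetical.

```python
from collections import defaultdict


def summarize_assertions(rows):
    """Group assertion rows by functional assignment.

    Each row is a 6-tuple of strings:
    (identifier, genome, same_gene_flag, function, source, expert_flag).
    Returns a list of (function, sorted_sources, has_expert) tuples,
    expert-backed functions first, then by number of supporting rows.
    """
    by_function = defaultdict(list)
    for _ident, _genome, _same_gene, function, source, expert in rows:
        by_function[function].append((source, expert == "1"))

    summary = []
    for function, hits in by_function.items():
        sources = sorted({src for src, _ in hits})
        has_expert = any(is_expert for _, is_expert in hits)
        summary.append((function, sources, has_expert, len(hits)))

    # Expert-backed first, then more supporting rows, then alphabetical.
    summary.sort(key=lambda item: (not item[2], -item[3], item[0]))
    return [(func, srcs, exp) for func, srcs, exp, _count in summary]
```

Sorting on a tuple key keeps the ordering rule in one place, so a different preference (for example, by source) only requires changing the key.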

Command-Line Options

url

The URL of the Annotation Clearinghouse server, if it differs from the default.