Documentation read from 04/17/2019 22:07:28 version of /vol/public-pseed/FIGdisk/FIG/bin/svr_summarize_protein_families.
Write out three simple reports relating to a proposed set of protein families.
If you are constructing a set of protein families for a set of genomes (say, the genomes that are input to an attempt to form a "pangenome"), it is useful to get some summaries of how well the genes were separated into families. This little program writes three reports: Report-on-Sets summarizes the number of sets of different sizes, Report-on-Genome shows, for each genome, the distribution of sizes of the sets that contain genes from that genome, and Report-on-Intersections shows the number of sets in common between each pair of genomes.
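To make the three summaries concrete, here is a minimal Python sketch of how such tables could be derived from a proposed set of families. It is not the program's own code. It assumes the families are held in a dictionary from a set ID to a list of gene IDs, and that gene IDs look like "fig|83333.1.peg.4", so the genome can be read out of the ID. The names genome_of and summarize are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def genome_of(gene_id):
    """Extract the genome ID from a gene ID of the assumed form fig|GENOME.peg.N."""
    match = re.match(r"fig\|(\d+\.\d+)\.", gene_id)
    if match is None:
        raise ValueError("unexpected gene ID: %r" % gene_id)
    return match.group(1)


def summarize(families):
    """Return (set_rows, genome_rows, pair_rows) for a dict of set ID -> gene IDs."""
    # One row per set: (set ID, size of set).
    set_rows = [(set_id, len(genes)) for set_id, genes in families.items()]

    genome_counts = Counter()  # (genome, set size) -> number of sets
    pair_counts = Counter()    # (genome1, genome2) -> number of common sets
    for genes in families.values():
        genomes = sorted({genome_of(g) for g in genes})
        for genome in genomes:
            genome_counts[(genome, len(genes))] += 1
        for g1, g2 in combinations(genomes, 2):
            pair_counts[(g1, g2)] += 1

    genome_rows = [(g, size, n) for (g, size), n in sorted(genome_counts.items())]
    pair_rows = [(g1, g2, n) for (g1, g2), n in sorted(pair_counts.items())]
    return set_rows, genome_rows, pair_rows


# Made-up families over three made-up genomes (1.1, 2.1, 3.1).
families = {
    1: ["fig|1.1.peg.1", "fig|2.1.peg.1", "fig|3.1.peg.1"],
    2: ["fig|1.1.peg.2", "fig|2.1.peg.2"],
    3: ["fig|3.1.peg.2"],
}
set_rows, genome_rows, pair_rows = summarize(families)
```

In this toy input, genomes 1.1 and 2.1 share two sets, while each of them shares only one set with genome 3.1. Pairs of genomes that share no set do not appear in pair_rows at all; whether the real program writes such pairs with a count of zero is not stated here.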
The output files have the following formats:
Set-Report is a 2-column table: ['Set','Size-of-Set']
Report-on-Genomes is a 3-column table: ['Genome','Size of Set','Number Sets']
Report-on-Intersections is a 3-column table: ['Genome1','Genome2','Number of Common Sets']
These are meant to be used as input to a spreadsheet or some other tool for trying to analyze how well the correspondences could be formed.
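As one example of "some other tool", the following Python sketch loads a 3-column intersections table into a symmetric lookup so that any pair of genomes can be queried in either order. It assumes the file is tab-separated and has no header row; neither detail is stated above, so adjust the delimiter or skip a first line as needed. The function name load_intersections and the sample data are hypothetical.

```python
import csv
import io


def load_intersections(handle):
    """Read rows of (Genome1, Genome2, Number of Common Sets) into a nested dict."""
    matrix = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        genome1, genome2, count = row
        # Store both orientations so lookups do not depend on column order.
        matrix.setdefault(genome1, {})[genome2] = int(count)
        matrix.setdefault(genome2, {})[genome1] = int(count)
    return matrix


# Made-up sample standing in for an opened report file.
sample = io.StringIO("1.1\t2.1\t2\n1.1\t3.1\t1\n2.1\t3.1\t1\n")
matrix = load_intersections(sample)
```

With a real file, the io.StringIO object would be replaced by open(path, newline="").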