Virus Orthologous Groups

This release of Virus Orthologous Groups (VOGs) was calculated from all virus genomes in NCBI RefSeq. A brief outline of the VOG construction workflow and summary statistics of a previous version (based on RefSeq 77) are available in the poster linked at the right side of this page. A manuscript is in preparation. For questions, suggestions, and problem reports, please contact us: contact.cube@univie.ac.at

From now on, VOG release numbers are equivalent to the RefSeq release numbers they are based on.

Latest version: VOG80 (February 2017; RefSeq release 80)

Web portal: under construction; expected early 2017
File download: http://fileshare.csb.univie.ac.at/vog/
File names and formats are similar to those used in eggNOG 4.5:

vog_functional_categories.txt
Text file listing the letter codes of functional categories. Each code consists of "X" (unused in NCBI COG functional categories) followed by a lowercase character indicating the functional category.

vog.species.list
Tab separated file of genomes used for VOG construction.
Columns: species name|taxon id|phage/nonphage|source|source version

vog.proteins.all.fa.gz
FASTA formatted file of all proteins from the genomes in vog.species.list. Protein IDs encode the taxonomy id of the genome and the RefSeq protein id. For peptides derived from polyproteins, the corresponding protein id of the polyprotein (CDS) is also given.

vog.genes.all.fa.gz
FASTA formatted file of all gene sequences from the genomes in vog.species.list. The same IDs as in the protein file are used. For polyprotein genes, both the partial gene sequences of the peptides and the complete gene sequence of the polyprotein are included.

vog.faa.tar.gz
Compressed archive of FASTA formatted files of the proteins contained in each VOG.

vog.raw_algs.tar.gz
Compressed archive of multiple sequence alignments for each VOG.

vog.hmm.tar.gz
Compressed archive of the HMMER3 compatible Hidden Markov Models obtained from the multiple sequence alignments for each VOG.

vog.members.tsv.gz
Tab separated file of VOGs and the comma separated lists of their member protein ids.
Columns: GroupName|ProteinCount|SpeciesCount|FunctionalCategory|ProteinIDs
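
As an illustration of the column layout above, here is a minimal Python sketch that maps each VOG to its member protein ids. It assumes that header or comment lines, if present, start with "#"; this is not confirmed by the description. The function name parse_members and all ids in the sample rows are hypothetical and made up for the example.

```python
from typing import Dict, Iterable, List


def parse_members(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each VOG name to the list of its member protein ids."""
    members: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: header/comment lines start with "#"; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        group_name, protein_ids = fields[0], fields[4]
        # The fifth column holds a comma separated list of protein ids.
        members[group_name] = [pid for pid in protein_ids.split(",") if pid]
    return members


# Made-up sample rows; real ids and category codes will differ.
SAMPLE = (
    "#GroupName\tProteinCount\tSpeciesCount\tFunctionalCategory\tProteinIDs\n"
    "VOG0001\t2\t2\tXr\t1000.YP_000001.1,2000.YP_000002.1\n"
    "VOG0002\t1\t1\tXu\t3000.NP_000003.1\n"
)
members = parse_members(SAMPLE.splitlines())
print(len(members["VOG0001"]))
```

For the downloaded file, the same function should work on the handle returned by gzip.open("vog.members.tsv.gz", "rt"), which yields text lines.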

vog.annotations.tsv.gz
Tab separated file of VOGs and their consensus functional annotations (preferably from Swiss-Prot; where no Swiss-Prot annotation was available, the RefSeq annotations were used).
Columns: GroupName|ProteinCount|SpeciesCount|FunctionalCategory|ConsensusFunctionalDescription

vog.lca.tsv.gz
Tab separated file of VOGs and the taxonomic lineage of the last common ancestor (LCA) of member genomes. Genomes with unclassified taxonomic lineages were not used for LCA determination. For each VOG, the number of member genomes within the LCA and the total number of genomes in the LCA are given.
Columns: GroupName|LastCommonAncestor|GenomesInGroupAndLCA|GenomesTotalInLCA
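
The two genome counts can be combined into the fraction of LCA genomes that contain a member of the VOG. The sketch below is one possible way to compute it, not part of the VOG pipeline. It assumes that header or comment lines start with "#"; the function name lca_coverage, the lineage string, and the counts in the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def lca_coverage(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Return, per VOG, the LCA lineage and GenomesInGroupAndLCA / GenomesTotalInLCA."""
    coverage: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: header/comment lines start with "#"; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        group, lca, in_group, total = line.split("\t")[:4]
        total_genomes = int(total)
        if total_genomes == 0:
            continue  # avoid division by zero on unexpected rows
        coverage[group] = (lca, int(in_group) / total_genomes)
    return coverage


# Made-up sample rows; the lineage format in the real file may differ.
SAMPLE = (
    "#GroupName\tLastCommonAncestor\tGenomesInGroupAndLCA\tGenomesTotalInLCA\n"
    "VOG0001\tViruses;ExampleOrder\t5\t20\n"
    "VOG0002\tViruses\t10\t10\n"
)
coverage = lca_coverage(SAMPLE.splitlines())
print(coverage["VOG0001"])
```

A value close to 1 would indicate a VOG present in nearly all genomes of its LCA clade, and a low value a VOG found in only a few of them.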

vog.virusonly.tsv.gz
Tab separated file of VOGs and their specific occurrence in virus genomes. For this purpose, the homology of all member proteins to cellular genomes from eggNOG 4.5 was determined at three different stringencies:
High stringency: blastp E-value <=1e-04 and hits in at most 2 cellular genomes
Medium stringency: blastp E-value <=1e-10 and hits in at most 3 cellular genomes
Low stringency: blastp E-value <=1e-15 and hits in at most 4 cellular genomes
"Only in viruses" is set to true if the members matched no more than the maximum number of cellular genomes at the E-value threshold of the respective stringency level.
Columns: GroupName|Only in viruses (high stringency)|Only in viruses (medium stringency)|Only in viruses (low stringency)
1=True; 0=False
This file is useful to extract virus-specific markers from all VOGs, based on your preferred level of stringency.
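
A minimal Python sketch of such an extraction is shown below. It assumes the column order given above, 1/0 flags, and that header or comment lines start with "#"; the function name virus_specific_vogs and the sample rows are hypothetical.

```python
from typing import Iterable, List

# Column index of the "Only in viruses" flag for each stringency level.
STRINGENCY_COLUMN = {"high": 1, "medium": 2, "low": 3}


def virus_specific_vogs(lines: Iterable[str], stringency: str = "high") -> List[str]:
    """Return the names of VOGs flagged as only in viruses at the given stringency."""
    column = STRINGENCY_COLUMN[stringency]
    selected: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: header/comment lines start with "#"; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[column] == "1":  # 1=True; 0=False
            selected.append(fields[0])
    return selected


# Made-up sample rows for illustration only.
SAMPLE = (
    "#GroupName\tHigh\tMedium\tLow\n"
    "VOG0001\t1\t1\t1\n"
    "VOG0002\t0\t1\t1\n"
    "VOG0003\t0\t0\t0\n"
)
high = virus_specific_vogs(SAMPLE.splitlines(), "high")
low = virus_specific_vogs(SAMPLE.splitlines(), "low")
print(high, low)
```

Because the lower stringency levels use stricter E-value thresholds and allow more cellular genomes, a VOG flagged at high stringency should also be flagged at medium and low stringency, so the high-stringency set is the most conservative choice of markers.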