ConsPred – a rule-based (re-)annotation framework for prokaryotic genomes

ConsPred is a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, and predictions of non-coding genes and complex features, then integrates all evidence into a consensus annotation. It achieves high-quality, comprehensive annotations by applying rules and priorities, similar to the decision-making in manual curation. Users can configure the parameters that control the annotation process.

ConsPred can be easily extended and adapted to specific needs, and it generates genome annotations in formats ready for submission to public sequence archives.

ConsPred is implemented in Java, Perl, and Shell and is freely available under a Creative Commons license from https://sourceforge.net/projects/conspred/ or as an Amazon Machine Image for cloud computing. The ConsPred database files (about 60 GB) are updated once per month, usually on the first Friday.

ConsPred workflow: Coding sequences (CDS) are predicted by combining several ab initio gene predictions with conserved open reading frames (ORFs) detected by homology search against the NCBI nr database. Database entries from closely related taxa are excluded from this search to prevent possible misannotations. Putative pseudogenes are exported for user inspection. Among all predicted non-protein-coding elements (NCE), those that must not overlap with CDS are treated as blocking NCE, and CDS overlapping a blocking NCE are removed. Consensus CDS are then derived from the predicted CDS and conserved ORFs using predefined weights and rules. Finally, the consensus CDS are functionally annotated and merged with the NCE into the final annotation files.
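
The two filtering steps of the workflow (removing CDS that overlap blocking NCE, then weighting the remaining evidence) can be illustrated with a minimal Python sketch. This is not ConsPred's actual code or rule set: the `Feature` type, the source labels, the weights, the score threshold, and the rule of grouping candidates by shared stop position are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    source: str   # hypothetical label for the predictor or element type


def overlaps(a, b):
    """True if the two closed intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def remove_blocked(cds, blocking_nce):
    """Drop every CDS that overlaps a blocking NCE (strand is ignored here)."""
    return [c for c in cds if not any(overlaps(c, n) for n in blocking_nce)]


def consensus(cds, weights, min_score):
    """Assumed rule: candidates sharing strand and stop position are one locus.

    A locus is kept if the summed weight of its distinct sources reaches
    min_score; the candidate from the highest-weighted source represents it.
    """
    groups = defaultdict(list)
    for c in cds:
        stop = c.end if c.strand == "+" else c.start
        groups[(c.strand, stop)].append(c)

    result = []
    for members in groups.values():
        score = sum(weights.get(s, 0.0) for s in {m.source for m in members})
        if score < min_score:
            continue
        # Highest source weight wins; the longer candidate breaks ties.
        best = max(members, key=lambda m: (weights.get(m.source, 0.0), m.end - m.start))
        result.append(best)
    return sorted(result, key=lambda f: f.start)


# Hypothetical weights and candidates, for illustration only.
WEIGHTS = {"predictor_a": 1.0, "predictor_b": 1.0, "conserved_orf": 2.0}

candidates = [
    Feature(100, 400, "+", "predictor_a"),
    Feature(130, 400, "+", "conserved_orf"),
    Feature(500, 800, "+", "predictor_b"),   # overlaps the blocking element below
    Feature(900, 1100, "-", "predictor_a"),  # supported by a single low-weight source
]
blocking = [Feature(750, 825, "+", "blocking_nce")]

kept = remove_blocked(candidates, blocking)
final = consensus(kept, WEIGHTS, min_score=2.0)
```

In this toy input, the candidate at 500-800 is discarded because it overlaps the blocking element, the minus-strand candidate falls below the assumed score threshold, and the locus ending at position 400 is represented by the homology-supported ORF. A real rule set would also need to handle strand-specific overlap rules, alternative start sites, and partial overlaps between neighbouring CDS.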