Re-annotation Session 19.03.2013, 6.00 pm, CBRS meeting, San Antonio



Discussion 1: Intro and automatic annotation

Drawbacks of annotations in current databases were pointed out and illustrated with examples. The approach proposed for the re-annotation project was presented.


Q: What approach is used in ConsPred? Which tools are used?

A: A combination of intrinsic and extrinsic methods; multiple tools (Glimmer, GeneMark, Prodigal, Critica) for intrinsic gene prediction; tRNAscan-SE, RNAmmer and cmsearch for prediction of tRNAs, rRNAs and ncRNAs; GFF (General Feature Format) used as exchange format; each piece of evidence is weighted individually in order to prevent obvious false-positive predictions (e.g. ORFs in rRNA gene loci);
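Note: the idea of weighting evidence so that an ORF inside an rRNA locus is rejected can be sketched as follows. This is a minimal illustration only; the weights, the penalty, the threshold, the tuple layout and all function names are assumptions made for this example and are not taken from ConsPred.

```python
# Hypothetical per-tool weights; the real ConsPred weighting is not given in these minutes.
TOOL_WEIGHTS = {"Glimmer": 1.0, "GeneMark": 1.0, "Prodigal": 1.0, "Critica": 1.0}
RRNA_PENALTY = 10.0  # assumption: an rRNA overlap outweighs any amount of ORF support
MIN_SCORE = 2.0      # assumption: at least two tools must agree


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def score_orf(orf, rrna_loci):
    """Sum the weights of the supporting tools, minus a penalty for rRNA overlap."""
    start, end, tools = orf
    score = sum(TOOL_WEIGHTS.get(tool, 0.0) for tool in tools)
    if any(overlaps(start, end, s, e) for s, e in rrna_loci):
        score -= RRNA_PENALTY
    return score


def filter_orfs(orfs, rrna_loci):
    """Keep only ORFs whose combined evidence reaches the threshold."""
    return [orf for orf in orfs if score_orf(orf, rrna_loci) >= MIN_SCORE]


# Toy data: (start, end, tools that predicted the ORF)
orfs = [
    (100, 900, ["Glimmer", "Prodigal", "GeneMark"]),
    (1500, 1800, ["Glimmer", "GeneMark", "Critica"]),  # lies inside the rRNA locus
    (3000, 3300, ["Prodigal"]),                        # supported by one tool only
]
rrna_loci = [(1400, 2900)]
kept = filter_orfs(orfs, rrna_loci)
```

In this toy case only the first ORF is kept: the second is rejected despite support from three tools because it overlaps the rRNA locus, and the third has too little support.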


Q: How does usage of sequence similarity work?

A: ORFs with significant homology outside close relatives are integrated into the gene prediction; close relatives are therefore excluded from the BLAST database to avoid false positives resulting from synteny;
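Note: the effect of excluding close relatives can be sketched as a filter over similarity hits. The minutes state that close relatives are removed from the database itself; the sketch below filters hits after the search instead, which gives the same set of supported ORFs for this purpose. The organism list, the E-value cutoff and the hit layout are assumptions made for this example.

```python
# Hypothetical exclusion list and cutoff; neither is specified in the minutes.
CLOSE_RELATIVES = {"Chlamydia trachomatis", "Chlamydia muridarum"}
MAX_EVALUE = 1e-5


def supported_orfs(hits):
    """Return IDs of ORFs with at least one significant hit outside close relatives.

    hits: iterable of (orf_id, subject_organism, evalue) tuples.
    """
    supported = set()
    for orf_id, organism, evalue in hits:
        if organism in CLOSE_RELATIVES:
            continue  # a hit in a close relative is not counted as evidence
        if evalue <= MAX_EVALUE:
            supported.add(orf_id)
    return supported


hits = [
    ("orf1", "Chlamydia muridarum", 1e-50),
    ("orf1", "Escherichia coli", 1e-20),
    ("orf2", "Chlamydia trachomatis", 1e-80),  # only found in a close relative
    ("orf3", "Escherichia coli", 0.5),         # not significant
]
supported = supported_orfs(hits)
```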


Q: Does the state of annotation change?

A: Principles of prokaryotic gene prediction are rather stable now; the pipeline can be adapted and extended with additional tools or newer tool versions;


Q: Does the annotation pipeline also look for genes in intergenic regions?

A: Genes in intergenic regions which were overlooked by the intrinsic prediction methods are found by sequence similarity if the amino acid sequence is conserved;
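Note: as a rough illustration of which stretches would be examined for overlooked genes, the regions not covered by any predicted gene can be derived from the gene coordinates. This sketch assumes 1-based inclusive coordinates on a sequence treated as linear (a circular chromosome would need the two end regions joined); the function name and the minimum-length parameter are assumptions, not part of the pipeline described here.

```python
def intergenic_regions(genes, genome_length, min_length=1):
    """Return (start, end) regions not covered by any gene.

    genes: list of (start, end) tuples, 1-based and inclusive.
    Regions shorter than min_length are skipped.
    """
    regions = []
    position = 1  # first position not yet covered by a gene
    for start, end in sorted(genes):
        if start - position >= min_length:
            regions.append((position, start - 1))
        position = max(position, end + 1)  # handles overlapping genes
    if genome_length - position + 1 >= min_length:
        regions.append((position, genome_length))
    return regions


genes = [(100, 200), (150, 300), (500, 600)]
gaps = intergenic_regions(genes, genome_length=700, min_length=50)
```

The resulting regions could then be translated and compared against a protein database to recover conserved genes missed by the intrinsic methods.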



Discussion 2: Nomenclature

Proposed rules for nomenclature used in the re-annotation were shown.


Q: How will differently named genes be handled (e.g. E. coli, Yersinia)?

A: Use synonyms for conflicting genes; create lists for synonymous gene names; contact SwissProt to integrate synonyms into their annotation; establish separate session on gene names at future CBRS meetings;


Q: What threshold is used for “putative” functions?

A: A threshold according to existing annotation protocols will be used;



Discussion 3: Manual refinement

The proposed framework for manual refinement, based on Excel tables, was explained.


Q: How to deal with deleted genes (pseudogenes, stop codons)?

A: Pseudogenes will be highlighted as such in the GenBank file; orphans (false positives) will be indicated as “genes to be deleted”;


Q: How are conflicting data / different naming from different groups dealt with?

A: Three rounds of manual refinement; converge different entries if necessary; multiple gene names will be supported;


Q: How to split annotation effort?

A: Groups only annotate experimentally characterized proteins from their research area using existing literature (PubMed IDs); annotation software will transfer these annotations to orthologous genes;
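Note: the transfer of curated annotations to orthologous genes can be sketched as below. This is a minimal sketch under stated assumptions: the data layout, the gene IDs, the placeholder PubMed ID and the function name are all hypothetical, and the ortholog table is assumed to be supplied by a separate step.

```python
def transfer_annotations(curated, orthologs):
    """Copy curated annotations to orthologous genes.

    curated:   {gene_id: {"function": str, "pmids": [str, ...]}}
    orthologs: {gene_id: [ortholog_gene_id, ...]}
    Returns a dict of annotations for the orthologs, each marked as
    transferred and pointing back to the curated source gene.
    """
    transferred = {}
    for gene_id, annotation in curated.items():
        for target in orthologs.get(gene_id, []):
            if target in curated:
                continue  # never overwrite a manually curated entry
            # setdefault keeps the first transfer if several sources map to one target
            transferred.setdefault(target, {
                "function": annotation["function"],
                "pmids": list(annotation["pmids"]),
                "source": gene_id,
                "evidence": "transferred from ortholog",
            })
    return transferred


# Hypothetical IDs; "00000000" is a placeholder, not a real PubMed ID.
curated = {
    "genomeA_0001": {"function": "example function", "pmids": ["00000000"]},
    "genomeB_0001": {"function": "manually curated function", "pmids": ["00000000"]},
}
orthologs = {"genomeA_0001": ["genomeB_0001", "genomeC_0001"]}
result = transfer_annotations(curated, orthologs)
```

Keeping the source gene and an evidence tag with each transferred entry makes it possible to distinguish literature-based annotations from inferred ones in later refinement rounds.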


Q: Number of papers that should/can be entered?

A: At least one paper per function; the paper should be the original paper in which the annotation was established;


Q: Is position of functional domains added in the annotation?

A: Functional domains are annotated in ConsPred;


Q: How to deal with information from large-scale OMICS projects?

A: Results from large-scale screens should be integrated into SwissProt (mainly into the “Protein existence” tags); but will also be used for the ConsPred annotation if suitable;


Q: How to deal with “conserved hypothetical proteins”?

A: Will be integrated into the naming scheme/nomenclature; ConsPred is flexible w.r.t. the taxonomic level which determines the conservation;



Discussion 4: Submission/publication

Details of the submission and benefits of the publication were illustrated.


Q: Additional side results?

A: List of synonymous gene names as supplementary information; documentation of re-annotation methodology, efforts and problems;


Q: Positive effects / benefits?

A: Not only the genome records but also the SwissProt database will benefit from manually refined annotations; every contributor is a co-author of the re-annotation publication; updated annotations of all chlamydial genomes can be used by the community; future annotation projects should follow to keep the annotation up to date;