Re-annotation Session 19.03.2013, 6.00 pm, CBRS meeting, San Antonio

Discussion 1: Intro and automatic annotation

Drawbacks of annotations in current databases were pointed out and illustrated with examples. The approach proposed for the re-annotation project was presented.

 

Q: What approach is used in ConsPred? Which tools are used?

A: A combination of intrinsic and extrinsic methods; multiple tools (Glimmer, GeneMark, Prodigal, Critica) for intrinsic gene prediction; tRNAscan-SE, RNAmmer and cmsearch for prediction of tRNAs, rRNAs and ncRNAs; GFF (General Feature Format) used as exchange format; each piece of evidence is weighted individually in order to prevent obvious false-positive predictions (e.g. ORFs in rRNA gene loci);
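
The weighting idea can be illustrated with a small sketch. This is not ConsPred's implementation: the weight values, the support threshold, the grouping of CDS predictions by shared stop coordinate and the function names are all assumptions made for illustration. It reads GFF-style lines, sums the weights of the tools supporting each CDS locus, and drops loci that overlap more strongly weighted rRNA evidence.

```python
from collections import defaultdict

# Hypothetical weights per evidence source (the minutes do not give real values).
WEIGHTS = {"Glimmer": 1.0, "GeneMark": 1.0, "Prodigal": 1.0,
           "Critica": 1.0, "RNAmmer": 5.0}

def parse_gff_line(line):
    """Parse one tab-separated GFF feature line into a dict."""
    cols = line.rstrip("\n").split("\t")
    return {"seqid": cols[0], "source": cols[1], "type": cols[2],
            "start": int(cols[3]), "end": int(cols[4]), "strand": cols[6]}

def consensus_cds(features, weights=WEIGHTS, min_support=2.0):
    """Return CDS loci with enough weighted support and no stronger rRNA overlap."""
    support = defaultdict(float)
    extent = {}
    for f in features:
        if f["type"] != "CDS":
            continue
        # Predictions of the same gene share the stop coordinate (assumption).
        stop = f["end"] if f["strand"] == "+" else f["start"]
        key = (f["seqid"], f["strand"], stop)
        support[key] += weights.get(f["source"], 0.0)
        lo, hi = extent.get(key, (f["start"], f["end"]))
        extent[key] = (min(lo, f["start"]), max(hi, f["end"]))
    rrnas = [f for f in features if f["type"] == "rRNA"]
    kept = []
    for key, (lo, hi) in extent.items():
        # Strongest rRNA evidence overlapping this locus acts as a veto.
        veto = max((weights.get(r["source"], 0.0) for r in rrnas
                    if r["seqid"] == key[0] and r["start"] <= hi and lo <= r["end"]),
                   default=0.0)
        if support[key] >= min_support and support[key] > veto:
            kept.append((key[0], lo, hi, key[1]))
    return sorted(kept)

EXAMPLE_GFF = [
    "chr\tGlimmer\tCDS\t100\t400\t.\t+\t0\t.",
    "chr\tGeneMark\tCDS\t130\t400\t.\t+\t0\t.",
    "chr\tProdigal\tCDS\t100\t400\t.\t+\t0\t.",
    "chr\tGlimmer\tCDS\t1000\t1300\t.\t+\t0\t.",
    "chr\tProdigal\tCDS\t1000\t1300\t.\t+\t0\t.",
    "chr\tRNAmmer\trRNA\t900\t2400\t.\t+\t.\t.",
    "chr\tCritica\tCDS\t5000\t5300\t.\t-\t0\t.",
]
EXAMPLE_RESULT = consensus_cds([parse_gff_line(l) for l in EXAMPLE_GFF])
```

In this toy input, the locus supported by three tools is kept, the two-tool ORF inside the rRNA locus is vetoed, and the single-tool prediction falls below the assumed threshold.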

 

Q: How does usage of sequence similarity work?

A: ORFs with significant sequence similarity to genes outside the close relatives are integrated into the gene prediction; close relatives are thus excluded from the BLAST database to avoid false positives resulting from synteny;
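
A minimal sketch of this filter, under stated assumptions: the answer describes removing close relatives from the database itself, whereas the sketch below applies the equivalent filter after the search, on BLAST tabular output (the default 12-column `-outfmt 6` layout, with the e-value in column 11). The e-value cut-off, the subject-to-taxon lookup and the function name are hypothetical.

```python
def orfs_with_distant_support(blast_lines, subject_taxon, close_relatives,
                              max_evalue=1e-10):
    """Return query ORFs that have a significant hit outside the close relatives.

    blast_lines     -- lines of BLAST tabular output (default outfmt 6 columns)
    subject_taxon   -- hypothetical lookup: subject sequence id -> taxon name
    close_relatives -- set of taxon names whose hits are ignored
    max_evalue      -- assumed significance cut-off, not taken from the minutes
    """
    supported = set()
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid, evalue = cols[0], cols[1], float(cols[10])
        taxon = subject_taxon.get(sseqid)
        # Skip hits of unknown origin and hits to close relatives.
        if taxon is None or taxon in close_relatives:
            continue
        if evalue <= max_evalue:
            supported.add(qseqid)
    return supported

EXAMPLE_HITS = [
    "orf1\tsubjA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "orf1\tsubjB\t45.0\t290\t150\t3\t5\t295\t2\t290\t1e-40\t160",
    "orf2\tsubjA\t99.0\t120\t1\t0\t1\t120\t1\t120\t1e-60\t230",
    "orf3\tsubjB\t30.0\t80\t56\t2\t1\t80\t10\t90\t0.5\t25",
]
EXAMPLE_TAXA = {"subjA": "close_relative_genus", "subjB": "distant_genus"}
EXAMPLE_SUPPORTED = orfs_with_distant_support(
    EXAMPLE_HITS, EXAMPLE_TAXA, {"close_relative_genus"})
```

Here only `orf1` is retained: `orf2` matches nothing but a close relative, and the hit for `orf3` is not significant.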

 

Q: Does the state of annotation change?

A: Principles for prokaryote gene prediction are rather stable now; the pipeline can be adapted and extended by additional tools/newer tool versions;

 

Q: Does the annotation pipeline also look for genes in intergenic regions?

A: Genes in intergenic regions which were overlooked by the intrinsic prediction methods are found by sequence similarity if the amino acid sequence is conserved;

 

 

Discussion 2: Nomenclature

Proposed rules for nomenclature used in the re-annotation were shown.

 

Q: How will differently named genes be handled (e.g. E. coli, Yersinia)?

A: Use synonyms for conflicting genes; create lists for synonymous gene names; contact SwissProt to integrate synonyms into their annotation; establish separate session on gene names at future CBRS meetings;
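
One way such synonym lists could be assembled is sketched below. It is only an illustration: the gene names are placeholders, and merging pairwise "same gene" statements with a small union-find structure is an assumed approach, not something decided at the session.

```python
def build_synonym_groups(pairs):
    """Merge pairwise synonym statements into sorted groups of gene names."""
    parent = {}

    def find(name):
        # Find the representative of a name, compressing the path on the way.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())

# Placeholder names; real entries would come from the contributing groups.
EXAMPLE_PAIRS = [("abcA", "xyzA"), ("xyzA", "orf1"), ("defB", "uvwB")]
EXAMPLE_GROUPS = build_synonym_groups(EXAMPLE_PAIRS)
```

Each resulting group lists all names reported for one gene, which could then be exported as the synonym list.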

 

Q: What threshold is used for “putative” functions?

A: A threshold according to existing annotation protocols will be used;

 

 

Discussion 3: Manual refinement

The proposed framework for manual refinement, based on Excel tables, was explained.

 

Q: How to deal with deleted genes (pseudogenes, stop-codons)?

A: Pseudogenes will be highlighted as such in the GenBank file; orphans (false positives) will be indicated as “genes to be deleted”;

 

Q: How are conflicting data / different naming from different groups dealt with?

A: Three rounds of manual refinement; converge different entries if necessary; multiple gene names will be supported;

 

Q: How to split annotation effort?

A: Groups only annotate experimentally characterized proteins from their research area using existing literature (PubMed IDs); annotation software will transfer these annotations to orthologous genes;
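
The transfer step might look roughly like the following sketch. The data layout, field names and function name are assumptions, and the PubMed IDs are placeholders. Curated entries are copied to their orthologs, each copy records the gene it was inferred from, and genes that already carry a curated annotation are left untouched.

```python
def transfer_annotations(curated, orthologs):
    """Propagate curated annotations to orthologous genes.

    curated   -- gene id -> {"product": str, "pubmed_ids": [str, ...]}
    orthologs -- gene id -> list of orthologous gene ids in other genomes
    """
    transferred = {}
    for gene, annotation in curated.items():
        for target in orthologs.get(gene, []):
            if target in curated:
                continue  # never overwrite a manual, literature-based entry
            transferred.setdefault(target, []).append({
                "product": annotation["product"],
                "pubmed_ids": list(annotation["pubmed_ids"]),
                "inferred_from": gene,
            })
    return transferred

# Placeholder identifiers for illustration only.
EXAMPLE_CURATED = {
    "genomeA_0001": {"product": "example protein", "pubmed_ids": ["PMID_PLACEHOLDER"]},
    "genomeB_0007": {"product": "other protein", "pubmed_ids": ["PMID_PLACEHOLDER_2"]},
}
EXAMPLE_ORTHOLOGS = {"genomeA_0001": ["genomeB_0007", "genomeC_0042"]}
EXAMPLE_TRANSFERRED = transfer_annotations(EXAMPLE_CURATED, EXAMPLE_ORTHOLOGS)
```

Keeping `inferred_from` alongside the copied PubMed IDs lets a later refinement round distinguish experimentally supported entries from transferred ones.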

 

Q: Number of papers that should/can be entered?

A: At least one paper per function; the paper should be the original paper in which the annotation was established;

 

Q: Is position of functional domains added in the annotation?

A: Functional domains are annotated in ConsPred;

 

Q: How to deal with information from large-scale OMICS projects?

A: Results from large-scale screens should be integrated into SwissProt (mainly into the “Protein existence” tags) and will also be used for the ConsPred annotation if suitable;

 

Q: How to deal with “conserved hypothetical proteins”?

A: They will be integrated into the naming scheme/nomenclature; ConsPred is flexible w.r.t. the taxonomic level which determines the conservation;

 

 

Discussion 4: Submission/publication

Details of the submission and benefits of the publication were illustrated.

 

Q: Additional side results?

A: List of synonymous gene names as supplementary information; documentation of re-annotation methodology, efforts and problems;

 

Q: Positive effects / benefits?

A: Not only the genome records but also the SwissProt database will benefit from manually refined annotations; every contributor is a co-author on the re-annotation publication; updated annotations of all chlamydial genomes can be used by the community; future annotation projects should follow in order to keep the annotation maintained;