Re-annotation Session 19.03.2013, 6.00 pm, CBRS meeting, San Antonio
Discussion 1: Intro and automatic annotation
Drawbacks of the annotations in current databases were pointed out and illustrated with examples. The approach proposed for the re-annotation project was then presented.
Q: What approach is used in ConsPred? Which tools are used?
A: A combination of intrinsic and extrinsic methods; multiple tools (Glimmer, GeneMark, Prodigal, Critica) for intrinsic gene prediction; tRNAscan-SE, RNAmmer and cmsearch for prediction of tRNAs, rRNAs and ncRNAs; GFF (General Feature Format) as the exchange format; each type of evidence is weighted individually to suppress obvious false-positive predictions (e.g. ORFs within rRNA gene loci);
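The rRNA example of evidence weighting can be illustrated with a minimal Python sketch. It is not ConsPred's implementation: it assumes GFF3-style tab-separated feature lines and a hypothetical hard rule that rejects any CDS prediction overlapping an rRNA feature on the same sequence, regardless of strand. The function names and sample records are illustrative only.

```python
def parse_gff(lines):
    """Parse GFF3 feature lines into (seqid, type, start, end, strand) tuples."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid 9-column GFF feature line
        seqid, _source, ftype, start, end, _score, strand = cols[:7]
        features.append((seqid, ftype, int(start), int(end), strand))
    return features


def overlaps(a, b):
    """True if two features are on the same sequence and share at least one base.

    GFF coordinates are 1-based and closed, so touching ends count as overlap.
    """
    return a[0] == b[0] and a[2] <= b[3] and b[2] <= a[3]


def filter_cds_in_rrna(features):
    """Drop CDS predictions overlapping any rRNA feature (hypothetical weighting rule)."""
    rrnas = [f for f in features if f[1] == "rRNA"]
    return [
        f for f in features
        if f[1] != "CDS" or not any(overlaps(f, r) for r in rrnas)
    ]


# Hypothetical input: one rRNA locus and two ORF predictions from different tools.
gff = [
    "##gff-version 3",
    "chr1\tRNAmmer\trRNA\t1000\t2500\t.\t+\t.\tID=rrn16S",
    "chr1\tGlimmer\tCDS\t1200\t1800\t.\t-\t0\tID=orf1",
    "chr1\tProdigal\tCDS\t3000\t3900\t.\t+\t0\tID=orf2",
]
kept = filter_cds_in_rrna(parse_gff(gff))
print([f[1:4] for f in kept])  # prints [('rRNA', 1000, 2500), ('CDS', 3000, 3900)]
```

A real weighting scheme would more likely lower a score than discard a prediction outright; the hard filter just keeps the example short.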
Q: How is sequence similarity used?
A: ORFs with significant similarity to sequences outside their close relatives are integrated into the gene prediction; close relatives are therefore excluded from the BLAST database to avoid false positives resulting from synteny;
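The selection rule can be sketched in Python, assuming each BLAST hit has already been annotated with the taxon of its subject sequence. This filters hits after the search rather than rebuilding the database; because e-values depend on database size, the results can differ slightly from excluding the sequences before searching. The `Hit` type, the taxon names and the e-value cutoff are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str     # ORF identifier
    subject: str   # database sequence identifier
    taxon: str     # taxon of the subject sequence (assumed to be known)
    evalue: float


def supported_orfs(hits, close_relatives, max_evalue=1e-10):
    """Return IDs of ORFs with at least one significant hit outside the close relatives."""
    supported = set()
    for h in hits:
        if h.taxon in close_relatives:
            # Hits to close relatives may reflect genomic conservation (synteny)
            # rather than a real protein-coding gene, so they are ignored.
            continue
        if h.evalue <= max_evalue:
            supported.add(h.query)
    return supported


hits = [
    Hit("orf1", "sp|P1", "Chlamydia muridarum", 1e-40),  # close relative: ignored
    Hit("orf2", "sp|P2", "Escherichia coli", 1e-25),      # distant, significant: kept
    Hit("orf3", "sp|P3", "Bacillus subtilis", 0.5),       # distant, not significant
]
close = {"Chlamydia muridarum", "Chlamydia suis"}
print(sorted(supported_orfs(hits, close)))  # prints ['orf2']
```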
Q: Is the annotation methodology likely to change?
A: The principles of prokaryotic gene prediction are fairly stable now; the pipeline can be adapted and extended with additional tools or newer tool versions;
Q: Does the annotation pipeline also look for genes in intergenic regions?
A: Genes in intergenic regions that were missed by the intrinsic prediction methods are found by sequence similarity if the amino acid sequence is conserved;
Discussion 2: Nomenclature
The proposed nomenclature rules for the re-annotation were presented.
Q: How will genes that are named differently in other organisms (e.g. E. coli, Yersinia) be handled?
A: Use synonyms for conflicting gene names; compile lists of synonymous gene names; contact SwissProt about integrating the synonyms into their annotation; hold a separate session on gene names at future CBRS meetings;
Q: What threshold will be used for assigning “putative” functions?
A: A threshold consistent with existing annotation protocols will be used;
Discussion 3: Manual refinement
The proposed framework for manual refinement, based on Excel tables, was explained.
Q: How should deleted genes (pseudogenes, genes interrupted by stop codons) be handled?
A: Pseudogenes will be flagged as such in the GenBank file; orphans (false positives) will be marked as “genes to be deleted”;
Q: How will conflicting data or different names proposed by different groups be handled?
A: There will be three rounds of manual refinement; divergent entries will be reconciled where necessary; multiple gene names will be supported;
Q: How will the annotation effort be divided?
A: Each group annotates only experimentally characterized proteins from its own research area, citing the existing literature (PubMed IDs); the annotation software will then transfer these annotations to orthologous genes;
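The notes do not say how the annotation software identifies orthologs. Reciprocal best hits (RBH) is one common heuristic, and the transfer step can be sketched with it in Python. The gene IDs, the function text and the PubMed placeholder are all hypothetical, and the best-hit tables are assumed to come from a prior all-against-all similarity search.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def transfer_annotations(curated, best_a_to_b, best_b_to_a):
    """Copy curated (function, PubMed reference) entries from genome A to RBH orthologs in genome B."""
    transferred = {}
    for a, b in reciprocal_best_hits(best_a_to_b, best_b_to_a):
        if a in curated:
            transferred[b] = curated[a]
    return transferred


# Hypothetical data: a1 is experimentally characterized in genome A.
curated = {"a1": ("example function", "PMID_PLACEHOLDER")}
best_a_to_b = {"a1": "b1", "a2": "b2"}
best_b_to_a = {"b1": "a1", "b2": "a3"}  # b2's best hit is a3, so a2/b2 is not an RBH pair
print(transfer_annotations(curated, best_a_to_b, best_b_to_a))
# prints {'b1': ('example function', 'PMID_PLACEHOLDER')}
```

RBH is simple but misses orthologs in gene families with recent duplications, which is one reason the transferred annotations would still pass through manual refinement.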
Q: How many papers should or can be entered?
A: At least one paper per function; this should be the original publication in which the function was established;
Q: Are the positions of functional domains included in the annotation?
A: Functional domains are annotated in ConsPred;
Q: How should information from large-scale omics projects be handled?
A: Results from large-scale screens should be integrated into SwissProt (mainly via the “Protein existence” tag) but will also be used for the ConsPred annotation where suitable;
Q: How should “conserved hypothetical proteins” be handled?
A: They will be integrated into the naming scheme; ConsPred is flexible with respect to the taxonomic level at which conservation is defined;
Discussion 4: Submission/publication
Details of the submission and the benefits of the publication were presented.
Q: What additional results will be produced?
A: A list of synonymous gene names as supplementary information; documentation of the re-annotation methodology, the effort involved and the problems encountered;
Q: What are the benefits?
A: Not only the genome records but also the SwissProt database will benefit from the manually refined annotations; every contributor will be a co-author of the re-annotation publication; updated annotations of all chlamydial genomes will be available to the community; future annotation projects should follow to keep the annotation up to date;