Q(uestion): Hi, when will the recording of this webinar be available online?
A(nswer): The webinar recording(s) will be made available after the last webinar on June 16th, 2020.
Q: Is there a maximum number of genes that IMG can save in a cart/workspace?
A: There is a limit on the amount of data in Workspace: 5 GB per user, counting the combined size of all sets and generated files, such as FASTA files for export. The limit on each cart is 20,000 objects (20,000 genomes, 20,000 genes, 20,000 scaffolds, 20,000 functions). Note that you may have to change your preferences by going to MyIMG -> Preferences to increase the display limits to the maximum number of objects.
Q: Can you share the link for the public query?
A: Here it is: https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=Workspace&page=public_genome_search_history
You can also view the Advanced Genome Search webinar for more information about genome search: https://bit.ly/IMGwebinar20GenomeSearch
Q: Why is only this module selected and not the others?
A: It’s relevant to the demo biological example that was outlined in the slides. She is interested in seeing if N2 fixation and nodulation functions are present across all Bradyrhizobia.
Q: I missed what the difference between function vs genomes and genomes vs function was.
A: Selecting one or the other determines the orientation of rows and columns: functions as rows and genomes as columns, or vice versa.
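To illustrate the two orientations, here is a minimal Python sketch. It is not IMG code; the genome names, function labels, and hit pattern are all made-up placeholders. It only shows that the two views hold the same presence/absence data, with one being the transpose of the other.

```python
# Hypothetical data: genome -> set of function labels found in it.
# All names and hits are placeholders for illustration only.
hits = {
    "Genome_A": {"FUNC_1", "FUNC_2"},
    "Genome_B": {"FUNC_1"},
    "Genome_C": set(),
}
functions = ["FUNC_1", "FUNC_2"]
genomes = sorted(hits)


def function_vs_genomes(hits, functions, genomes):
    """Rows are functions, columns are genomes (1 = present, 0 = absent)."""
    return [[int(f in hits[g]) for g in genomes] for f in functions]


def genomes_vs_function(hits, functions, genomes):
    """Rows are genomes, columns are functions (transpose of the view above)."""
    return [[int(f in hits[g]) for f in functions] for g in genomes]


fvg = function_vs_genomes(hits, functions, genomes)
gvf = genomes_vs_function(hits, functions, genomes)

for label, row in zip(functions, fvg):
    print(label, row)
for label, row in zip(genomes, gvf):
    print(label, row)
```

Either orientation answers the same question; the choice mostly depends on whether you have more genomes or more functions to scan across.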
Q: Is the search based on annotations metadata or sequences?
A: It’s based on sequence annotations. Protein CDS are searched against (for example) the Pfam, COG, or KEGG Orthology databases; you could do the same search using other annotations as well. You are then querying the results of these annotations using IDs or search terms.
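As a rough sketch of what "querying the results of these annotations" means, the Python below searches a small, hypothetical annotation table either by exact term ID or by keyword. The table layout, gene names, term IDs, and descriptions are placeholders, not real database entries, and this is not how IMG is implemented.

```python
# Hypothetical annotation results: (gene, database, term ID, term name).
# Every ID and name here is a placeholder, not a real Pfam/COG/KO entry.
ANNOTATIONS = [
    ("gene_1", "Pfam", "PF_X1", "example nitrogenase domain"),
    ("gene_1", "COG", "COG_X1", "example nitrogenase component"),
    ("gene_2", "KO", "KO_X1", "example nodulation protein"),
]


def search_by_id(term_id):
    """Return genes whose annotations include the exact term ID."""
    return sorted({gene for gene, _, tid, _ in ANNOTATIONS if tid == term_id})


def search_by_keyword(keyword):
    """Return genes whose annotation names contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted({gene for gene, _, _, name in ANNOTATIONS if kw in name.lower()})


print(search_by_id("KO_X1"))
print(search_by_keyword("Nitrogenase"))
```

In both cases the sequences themselves are never compared at query time; only the stored annotation results are searched.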
Q: Is it possible to not find a gene due to outdated annotation from several years ago?
A: Yes, it is a possibility. Some very old genomes in IMG were annotated using an older pipeline, and those genomes do not undergo structural re-annotation (that is, there are no updates of ORF finding and CDS calling). However, functional annotations of existing CDS are updated even for older genomes (isolates only): Pfam, TIGRFAM, COG, KO, etc. assignments are updated with the latest versions of these databases.
Q: And the other explanation might be the horizontal transfer of those genes, which do not share the characteristics of the rest of the genome?
A: Yes, agreed. Horizontally transferred regions would also be missed by the binning process, as would other regions with deviant oligonucleotide composition, such as ribosomal operons, if they are found on a separate contig and not assembled with a large fragment of the core genome.
Q: Would you please explain more why the sequences from a single genome share similar coverage?
A: Coverage is interpreted as a proxy for the “number of copies of the genome in the original sample”. Although the sequencing process fragments these original templates into short “reads”, the number of these short reads should be consistent across the whole genome. The only exceptions would be actively replicating genomes, in which the chromosomal regions surrounding the origin of replication are present in more than one copy. For instance, in exponentially growing E. coli, the regions around the origin of replication may be present in 6 or even 8 copies. However, exponentially growing bacteria are rare in environmental communities.
In contrast, plasmids can exist in multiple copies per cell, i.e. for one copy of the main chromosome, you could have multiple copies of the plasmid. That means their coverage will sometimes differ from that of the main chromosome.
A simpler way to think about it: if a sample contains two genomes, one at very high abundance and the other a minor fraction, and both are sampled and sequenced uniformly, you should see consistent, more or less uniform high coverage for the abundant genome and low coverage for the minor genome. But as stated above, natural variation can confound this.
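The reasoning above can be made concrete with a small Python sketch. It uses the common approximation that mean coverage equals the total number of sequenced bases mapped to a contig divided by the contig length. The contig names, read counts, lengths, and the 150 bp read length are hypothetical values chosen for illustration, not data from the webinar.

```python
READ_LENGTH = 150  # assumed read length in bp (hypothetical)


def mean_coverage(read_count, read_length, contig_length):
    """Average depth: total mapped bases divided by contig length."""
    return read_count * read_length / contig_length


# Hypothetical contigs: (name, mapped reads, contig length in bp).
contigs = [
    ("abundant_1", 40000, 100000),  # two contigs from the abundant genome
    ("abundant_2", 20000, 50000),
    ("minor_1", 2000, 100000),      # contig from the low-abundance genome
    ("plasmid_1", 8000, 5000),      # multi-copy plasmid of the abundant genome
]

coverage = {
    name: mean_coverage(reads, READ_LENGTH, length)
    for name, reads, length in contigs
}

for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}x")
```

With these made-up numbers, the two contigs from the abundant genome come out at the same depth even though their lengths differ, the minor genome sits far lower, and the plasmid shows several times the depth of its host chromosome, which is why coverage alone may not place a plasmid in the same bin.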