Q(uestion): Are MIMAG values provided through IMG? Or do we need to calculate these values ourselves based on info available on IMG?
A(nswer): IMG provides CheckM completeness and contamination and RNA counts for each bin, and indicates the corresponding quality tier following the MIMAG standard.
Q: Do you check MIMAG compliance of bins loaded into IMG?
A: We do, and we only load bins with medium or high quality (i.e. low quality bins will not be available through the IMG interface).
Q: Any control for “strain heterogeneity”?
A: Not directly, i.e. a genome bin is a “population genome”, and may include genome fragments from several closely related strains.
Q: If one searches for genomes with Find Genomes –> Genome by Taxonomy, they can find not only genomes from isolates but also MAGs. What is the overlap of those MAGs with the bins found via the path shown now in the presentation?
A: MAGs available through “Find Genomes” were either submitted by individual users or are legacy MAGs imported from NCBI. These are genome bins for which the corresponding metagenome may or may not be in IMG, and for which we expect the submitter did some level of curation. Our policy at present is not to import MAGs from GenBank, given the variable quality of this data. The genome bins that IMG generates and displays through “Search Bins” are automatically created by the IMG pipeline and not curated, so they are not included in the “Genome Search” and are only available through this specific “Bin Search”.
Q: Does the quality category mostly refer to the completeness, or some other metrics?
A: The quality is a combination of completeness and contamination per the MIMAG standards (see the MIMAG paper for details).
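As a rough illustration of how completeness and contamination combine into a tier, here is a minimal Python sketch of the thresholds published in the MIMAG standard. The function name and arguments are hypothetical; this is not IMG's code, and the IMG pipeline may apply the criteria differently.

```python
def mimag_tier(completeness, contamination,
               has_5s=False, has_16s=False, has_23s=False, trna_count=0):
    """Assign a MIMAG-style quality tier to a bin (illustrative sketch only).

    completeness and contamination are percentages (0-100), e.g. from CheckM.
    The rRNA flags and tRNA count matter only for the high-quality tier.
    """
    # High quality: >90% complete, <5% contaminated, 5S/16S/23S rRNA genes
    # present, and at least 18 tRNAs.
    has_all_rrna = has_5s and has_16s and has_23s
    if (completeness > 90 and contamination < 5
            and has_all_rrna and trna_count >= 18):
        return "high"
    # Medium quality: >=50% complete and <10% contaminated.
    if completeness >= 50 and contamination < 10:
        return "medium"
    # Simplification (assumption): everything else is reported as "low",
    # including bins with >=10% contamination, which the MIMAG tiers do not
    # explicitly cover.
    return "low"


if __name__ == "__main__":
    print(mimag_tier(95.0, 2.0, True, True, True, 20))
    print(mimag_tier(95.0, 2.0))
    print(mimag_tier(40.0, 3.0))
```

Note that a bin with very high completeness and low contamination still lands in the medium tier under these rules if the rRNA genes or tRNAs are missing, which is why the RNA counts are reported alongside completeness and contamination.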
Q: Are all the bins available in IMG created with the same pipeline? e.g. same versions of tools etc?
A: The pipeline is essentially the same in the sense that it uses the same tools (MetaBat, CheckM and GTDB-tk). However, the tool versions may differ, which is why the pipeline information is listed for each bin.
Q: Are bins representing only the core genome of the population? I guess that this population would correspond to a species but not to higher taxonomic levels, is it right?
A: Correct, bins will tend to represent the core genome of the population, and “accessory” regions of the genomes can be missed, i.e. would be unbinned scaffolds/contigs.
Q: I noticed some metagenomes have different assembly versions. Often there’s a newer version of the assembly made with SPADES. It seems the bins are added only for this SPADES assembly version. Do you plan to add bins for the other assembly versions, such as those done with Megahit, or combined assemblies?
A: At this point, there is no plan to go back to older datasets and bin these, as we prioritize binning newly submitted metagenomes.
Q: What’s the maximum number of scaffolds/bins that can be selected at once?
A: The limits are set in the IMG preferences, under the MyIMG menu. By default they are set at 1,000, but you can increase them.
Q: I don’t quite get the difference between adding the scaffolds and adding the genomes to your cart?
A: One thing that may not be intuitive is that, for IMG, a “Genome bin” is a list of scaffolds, but it is not a genome. That is the reason why bins do not appear in “Genome Search”, and why users cannot add genome bins to a “Genome Cart”. For bins, adding to the “Genome Cart” would instead add the corresponding metagenome to the cart.
Q: Can you reduce the bins by taxonomy to only search across select metagenomes, i.e. only the metagenomes in my cart? What if we are interested in just a set of metagenomes?
A: This is not possible with the “browse” visualization, but is available through the “Bin Search” page.
Q: The Genome Portal has places it allows regex. Can you use regex in these searches/any IMG search box?
A: You can’t use regex in the “Quick Search” field at the top of the IMG page (as far as I know). However, specific fields in the Bin Search (and other advanced search tools) do allow for regex. The way you should format your regex is detailed at the top of the “Advanced Search Builder”. Note that I use “regex” loosely here: what is available is mostly partial text search and “range” search for numeric fields. In addition, IMG allows regex filtering of most tables, i.e. you could use regex on the result table you would get from a search. Please refer to the previous webinar for details.
Q: What is the difference between bin lineage and gtdbtk lineage given the fact that the img pipeline uses gtdbtk to infer phylogeny?
A: The IMG pipeline uses its own taxonomic classification tool, which is not gtdbtk. The two classifications differ both in the tool used to infer taxonomic placement and in the underlying taxonomy. You can see the GTDB classification here. IMG “lineage” is instead based on NCBI taxonomy.
Q: Can I BLAST a particular gene sequence within a specific set of binned metagenomes?
A: No, right now BLAST is available only against the entire metagenome. However, you can filter BLAST results to retain only hits to scaffolds within bins using tools in the Scaffold Sets tab in Workspace, like Set Operation.
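Conceptually, the Set Operation step above is an intersection between the scaffolds that received BLAST hits and the scaffolds that belong to bins. The sketch below shows the same logic offline in Python; the data layout (hits as `(query, scaffold_id)` pairs, bins as a dictionary of scaffold ID lists) and all names are assumptions made for illustration, not an IMG export format.

```python
def hits_within_bins(hits, bins):
    """Keep only BLAST hits that land on binned scaffolds.

    hits: iterable of (query_id, scaffold_id) pairs (hypothetical layout).
    bins: dict mapping a bin name to the list of scaffold IDs in that bin.
    Returns a list of (query_id, scaffold_id, bin_name) tuples, in hit order.
    """
    # Invert the bin -> scaffolds mapping so each scaffold points to its bin.
    scaffold_to_bin = {
        scaffold: bin_name
        for bin_name, scaffolds in bins.items()
        for scaffold in scaffolds
    }
    return [
        (query, scaffold, scaffold_to_bin[scaffold])
        for query, scaffold in hits
        if scaffold in scaffold_to_bin
    ]


if __name__ == "__main__":
    example_hits = [("gene1", "scf_A"), ("gene1", "scf_B"), ("gene2", "scf_C")]
    example_bins = {"bin_1": ["scf_A", "scf_X"], "bin_2": ["scf_C"]}
    for row in hits_within_bins(example_hits, example_bins):
        print(row)
```

Hits on unbinned scaffolds (here `scf_B`) are dropped, which mirrors what the intersection in Workspace would return.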
Q: Is there an API version for filtering and exporting data similar to the GUI?
A: At this point, there is no API that allows browsing and searching IMG genome bins as is done through the GUI. But
Q: How can you download IMG bins?
A: You would add the scaffolds from the bin you are interested in to your cart or set, and you can download the scaffolds from there, as shown in the previous webinar.
Q: How can I export the specific metagenome bins after I’ve got the ones I want? Or export all the genes (possibly tens or hundreds of thousands) out of the specific bins ? I’m pretty sure you can’t do more than 20,000 genes from the IMG GUI
A: You can add them as Scaffold Sets to your Workspace and then use Export there to generate a nucleotide fasta file. Note that this is limited by the size of your workspace, so indeed will be limited to 20,000. You can also add all genes on these scaffolds as a Gene Set in your Workspace and export their protein fasta from there. For large-scale studies, the easiest approach would indeed be to retrieve the metagenome of interest through the Genome Portal, and then use the list of scaffolds exported for each bin to “reconstruct” bins on your side.
Q: Is BLAST search as you described for Martha the only way to challenge metagenomes with specific nucleotide sequences? In other words, is that the only method to do an “in silico” primer search?
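The “reconstruct bins on your side” step can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the scaffold IDs exported from IMG are assumed to match the first whitespace-delimited token of each header in the downloaded nucleotide fasta, and the function names and file names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_bin(fasta_lines, scaffold_ids):
    """Return the (header, sequence) records whose ID is in scaffold_ids.

    Assumption: the record ID is the first whitespace-delimited token of the
    header; adjust this if your exported IDs are formatted differently.
    """
    wanted = set(scaffold_ids)
    records = []
    for header, sequence in read_fasta(fasta_lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            records.append((header, sequence))
    return records


if __name__ == "__main__":
    # Hypothetical file names: a metagenome assembly downloaded from the
    # Genome Portal and a one-ID-per-line scaffold list exported for one bin.
    with open("bin_scaffold_ids.txt") as id_file:
        ids = [line.strip() for line in id_file if line.strip()]
    with open("metagenome_assembly.fna") as fasta, open("bin.fna", "w") as out:
        for header, sequence in extract_bin(fasta, ids):
            out.write(f">{header}\n{sequence}\n")
```

Running this once per exported scaffold list gives one fasta file per bin, without going through the cart or Workspace size limits.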
A: Correct, this would be the way to do this type of primer search using IMG.
Q: I want to search for some species of bacteria within a set of 3k metagenomes. Is the easier way to go through IMG/MER > Bins by taxonomy, or to do “Compare Genomes” > Genomes within Metagenomic?
A: Since only medium and high quality bins are included in IMG, in order to detect species in metagenomes with some sensitivity, I would suggest using “Compare Genomes”. But as Emiley mentioned in the webinar, it mostly depends on the analysis you would like to run: i.e. is it more comparative genomics, or detection of a species across metagenomes. For the latter, as a quick search to see whether the species is present at all, we would recommend IMG BLAST->RNA with 16S or 23S rRNA sequences. Don’t forget to change the e-value to e-50 or so, or else you will get too many hits.
Q: Can you download scaffolds like you can download genes/genomes from your genome cart (like last webinar)?
A: Correct, once you have added the scaffolds to your cart, or the genes to your cart, download would work the same way as presented in the previous webinar.
Q: And is it possible to add bins from multiple metagenomes in the scaffold cart and download them all together?
A: Yes, you can add multiple bins to Scaffold Cart or copy them to your Workspace Scaffold Sets and export all of them at once.
Q: There’s a cart limit on the number of scaffolds that is quickly reached with these fragmented bins, so is there a Globus version?
A: IMG is not the place for batch downloads. However, you can export bins in the form of scaffold ids (again, preferably from your Workspace and not Scaffold Cart), and then extract the corresponding scaffolds/contigs from the nucleotide fasta file you’ve downloaded via Genome Portal.
Q: I tried to export a set of rRNA genes (20K) from a metagenome, however I was able to add only 999 rRNA genes to the gene cart. How do I go about that?
A: Two things can help here: 1. Change your IMG preferences to increase the limits of gene lists. This is under the MyIMG menu: go to prefs, and set gene list to 20000 max. 2. Try creating a gene set in Workspace and export it from there.