Q(uestion): Is it possible to also find the raw/clean reads of metagenomes in IMG?
A(nswer): For JGI-generated datasets, reads can be accessed through the Genome Portals site. Portals Website: https://genome.jgi.doe.gov/portal/
Q: For the functional annotations, are the annotations updated en masse? I.e., for PFAMs, for an organism that was annotated 5 yrs ago, if I want to compare it to one that I just finished, are the PFAM data ‘equivalent’ for the two organisms?
A: For isolate genomes, yes, there is a backfill process to update Pfam assignments. For metagenomes, annotations are not updated. However, specifically for Pfams, the majority of them have been stable for the last 5 years or so.
Q: Is there some place to find the date of the backfill, for publication purposes?
A: Please check the genome add date in the Genome Details page. We generated the download files when we loaded the genome into IMG. The log of system-wide annotation updates (e.g. Pfam refresh) can be found in the IMG version page, and users are also notified via the IMG user forum.
Q: When I download the bacterial genome I get three files (taxonomy, genes and intergenic). I downloaded the bacterial genome to use as a reference in an assembly of my isolates, so my question is whether the taxonomy FASTA file contains the genes that are in the separate genes file.
A: The standard IMG tarball does not include the taxonomy data. We may be mistaken, but you probably refer to the nucleotide fasta file of the genome assembly, which indeed has the name of the genome (taxonomy) in the fasta header. If we understood you correctly, then the answer is – yes, this fasta file includes the entire nucleotide sequence of the genome, both the genes and intergenic regions.
Q: In the gff files of some genomes, there is more than one ID with product=16S. Are the sequences of the corresponding IDs always the same?
A: You’re right – some isolate genomes have multiple copies of 16S. In IMG each of them will be assigned a unique numerical gene id. Even if their sequences are identical, their locations on the chromosome are different. However, some genomes have multiple divergent copies of 16S. This is very rare in Bacteria, but not that uncommon in Archaea. The divergence of 16S copies in the same archaeal genome may be as high as 5%.
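For readers who want to check this on their own files, here is a minimal sketch in Python. It assumes the gff file is tab-delimited with the attributes in the ninth column written as `ID=...;product=...` pairs; the function names are hypothetical and are not part of any IMG tool. The divergence function only counts mismatches between two sequences that have already been aligned to the same length; it does not perform an alignment.

```python
def find_16s_ids(gff_text):
    """Return the ID attribute of every feature whose product mentions 16S.

    Assumption: attributes are in column 9 as semicolon-separated key=value pairs.
    """
    ids = []
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = value.strip()
        if "16S" in attrs.get("product", ""):
            ids.append(attrs.get("ID", ""))
    return ids


def percent_divergence(aligned_a, aligned_b):
    """Percent of mismatched positions between two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    mismatches = sum(1 for a, b in pairs if a != b)
    return 100.0 * mismatches / len(aligned_a)
```

If `find_16s_ids` returns more than one ID, the corresponding sequences can be aligned with a tool of your choice and compared with `percent_divergence` to see whether the copies are identical or divergent.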
Q: Hi, what would be the easiest way to find if the dataset was published?
A: IMG stores this data in the “Is published” field. You can access this information in multiple places: you can use Advanced Genome Search with the field “Is published” (in “Sequence Assembly Annotation” category) to select only published or only unpublished genomes by setting the filter to “Yes” or “No”, respectively. You can add the column “Is published” to your search results or in your Genome Cart. Lastly, you can go to the Genome Details page of your dataset of interest. If you see a “Genome Publication” field with publication details (normally under the Project Map), this dataset is published.
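If you export a table that includes the “Is published” column, the same filter can be applied offline. The sketch below assumes the export is a tab-delimited text file with a header row; that format and the function name are assumptions for illustration, and only the column name “Is published” comes from IMG.

```python
import csv
import io


def published_rows(table_text, wanted="Yes"):
    """Return rows of a tab-delimited table whose "Is published" value equals `wanted`.

    Assumption: the first line of the table is a header containing "Is published".
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row for row in reader
        if (row.get("Is published") or "").strip() == wanted
    ]
```

Calling `published_rows(text, wanted="No")` returns the unpublished datasets instead.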
Q: To be sure: if this is IMG data, I can’t use it?
A: It depends on whether the data is public or private, and for public data it depends on whether it’s a JGI-generated dataset (“Sequencing Center” field contains “DOE Joint Genome Institute (JGI)”) or not. Non-JGI generated public data can be used freely. JGI-generated published data can be used freely with an appropriate citation. Use of the JGI-generated unpublished data (“Is published” field is “No”) requires that you ask permission from the PI of the dataset (the information stored in the fields “Contact Name” and “Contact email”).
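The decision described in this answer can be written down as a small function. This is only a sketch of the rules as stated here, not an official policy checker; the function name and return strings are hypothetical, and the answer does not say what applies to private data, so that case is returned without a rule.

```python
JGI_CENTER = "DOE Joint Genome Institute (JGI)"


def usage_rule(is_public, sequencing_center, is_published):
    """Summarize the usage rule for a dataset from three IMG metadata values.

    is_public: bool; sequencing_center and is_published: field values as text.
    """
    if not is_public:
        # The answer above only covers public data.
        return "private data: the public-data rules do not apply"
    if JGI_CENTER not in sequencing_center:
        return "use freely"
    if is_published.strip().lower() == "yes":
        return "use freely with an appropriate citation"
    return "ask the PI for permission (Contact Name / Contact email)"
```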
Q: What about using data from gene neighborhoods? Where does that fall?
A: This is a pretty specialized topic, and we are not sure how many participants are interested in it. In general, IMG has several places where conserved neighborhoods are shown: on the Gene Details page of your gene of interest you can click on one of the “Conserved Neighborhoods” links (next to the chromosome neighborhood graphic display). You can also find homologs of your gene of interest either across all of IMG (use “Top IMG Homologs” on Gene Details page) or in your set of genomes of interest (use “IMG Genome BLAST” on Gene Details page). Then add the homologs to the Gene Cart, and view gene neighborhoods in the “Neighborhoods” tab.
You can also search for conserved cassettes, which are based on the protein families of interest (rather than a specific gene of interest). This option can be found in the Find Genes menu. Lastly, IMG-ABC allows you to search for co-occurrences of Pfams via the ClusterScout tool. Even though the tool is currently available only in IMG-ABC, it actually searches through entire genomes, not just biosynthetic gene clusters. Please refer to the IMG-ABC publications and the IMG-ABC site for details.
Q: Can we save a set of genomes that we did from a search to later do a cassette search?
A: Yes, you add them to your Genome Cart – however there are limits on #s of genomes for cassette search.
Q: Is there a way to get around the # limitation? Or another way to look?
A: You can change Preferences in “My IMG” to increase the limit. Alternatively, you can save them in your Workspace.
Q: Do you mind walking me over how to extend that limitation?
A: Please go to the “My IMG” menu item, and then select “Preferences”. There are limits on the number of search results listed, gene neighborhoods shown, and so on. You can increase them (even to the upper limit) and then save preferences.
Q: I have noticed that for a dataset I was searching for, fewer metagenomes were available through the JGI Genome Portal than in the IMG search results. Why is that so?
A: There are multiple reasons why the number of datasets for the same sample may be different between IMG and Genome Portal: some of them may have been made “obsolete” at the request of the PI or Project Manager and are no longer visible in IMG, while the data still shows up in the Portal. Alternatively, the data in the Portal may have been moved to the “hidden” status. In addition, there may be some data in the Portal that wasn’t loaded into IMG, again at the request of the PI or Project Manager.
Q: Is there a limit to the number of genomes one can download at once?
A: Yes, there are limits on the content of IMG Carts including Genome Cart (20,000). So you can’t request a download of more than 20,000 genomes at a time. However, irrespective of IMG limits, if you request download of tens of thousands of datasets, JGI Genome Portal may not be able to fulfill the request due to the sheer size of the data. In general, downloads work for tens to hundreds of datasets, not for thousands.
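One practical way to stay within these limits is to split a long list of genome IDs into smaller batches and request each batch separately. The sketch below is plain list handling and does not call any IMG or Genome Portal service; the function name and the default batch size of 100 are assumptions chosen to match the "tens to hundreds" guidance above.

```python
GENOME_CART_LIMIT = 20_000  # Genome Cart limit quoted in the answer above


def download_batches(genome_ids, batch_size=100):
    """Split genome IDs into consecutive batches of at most `batch_size` items."""
    if batch_size < 1 or batch_size > GENOME_CART_LIMIT:
        raise ValueError("batch_size must be between 1 and the Genome Cart limit")
    ids = list(genome_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Each batch can then be added to the Genome Cart and downloaded in turn.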
Q: Can I use a bacterial genome sequenced by DOE JGI as a reference to assemble my bacterial genome and publish the paper?
A: In theory, yes. See the details of the policy here: https://jgi.doe.gov/user-programs/pmo-overview/policies/
You may or may not have to contact the PI of the genome, depending on whether the genome is published or not.
Q: Is it possible to find the same metagenome/genome in IMG and Genome Portal by using the IMG Genome ID, or does Genome Portal use different ID numbers?
A: Yes, the IMG taxon ID is searchable in the Genome Portal.
Q: Is there a statute of limitations on how long a PI could restrict use of “unpublished” data? E.g. 5 years? 10 years?
A: Please refer to the JGI data usage policy: https://jgi.doe.gov/user-programs/pmo-overview/policies/
Briefly, for datasets generated for proposals accepted prior to Nov 2018, there is no statute of limitations and they remain in reserved status indefinitely until the PI publishes a paper. For datasets generated for proposals accepted after Nov 2018 the datasets transition to unreserved status 2 years after being made public.
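The rule above can be sketched as a date calculation. The answer gives only "Nov 2018" as the boundary, so the exact cutoff day used here (1 November 2018) is an assumption, as is the function name; please check the policy page for the authoritative wording.

```python
from datetime import date

# Assumption: the answer says only "Nov 2018"; the exact day is not stated.
POLICY_CUTOFF = date(2018, 11, 1)


def unreserved_on(proposal_accepted, made_public):
    """Return the date a dataset becomes unreserved, or None if it stays
    reserved until the PI publishes (proposals accepted before the cutoff)."""
    if proposal_accepted < POLICY_CUTOFF:
        return None
    try:
        return made_public.replace(year=made_public.year + 2)
    except ValueError:
        # 29 February has no counterpart two years later; fall back to 28 February.
        return made_public.replace(year=made_public.year + 2, day=28)
```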
Q: Can you export the files after this last table with Public data?
A: Yes, you can; anything you reconfigure and redisplay is exportable.
Q: What is the reason that some genomes are not available?
A: Please see the document on data downloads in the IMG Help Section. Briefly, for some genomes the IMG data is virtually identical to the GenBank record, so IMG redirects you to GenBank for downloads. Alternatively, a private submitter may have opted out of providing the data via Genome Portal.
Q: Can she show us again how she eliminated the 43 non-public genomes from the cart (to get 26 remaining)?
A: It will be on YouTube. Also please refer to the “IMG Tips and Tricks for table configuration and filtering” here:
Q: I would like to search for some genes instead of the whole genome sequences
A: IMG provides multiple ways to search for genes of interest, including keyword search in Quick and Advanced Gene Search, sequence similarity searches and export of genes retrieved by Function Profile. If you are interested in all or any of them, please respond to the webinar survey and indicate your interest in these topics in the Suggestions section.
Q: I’m interested in advanced gene searches (e.g., finding/downloading orthologs)
A: Please mention that in the feedback form. For now, please check out Find Genes > Gene Search > Advanced Search Builder, and feel free to ask us follow-up questions using https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=Questions
Q: Why do I have more than one 16S rRNA?
A: Isolate genomes can have anywhere from one to 15 or more copies of 16S rRNA. IMG treats them as individual genes and assigns unique identifiers even if their sequences are identical.
Q: Because Natalia started with only 1K genes, she might be missing some other genes with that specific name?
A: Yes, you’re right, she would have needed to change her preferences first and start over.
Q: What if you want to search for genes across multiple genomes?
A: You can use Gene Search. Alternatively, you can use the function profile in the Function Cart. Briefly: add genomes/metagenomes to the Genome Cart first, then add functions (e.g., Pfam, TIGRfam) to the Function Cart, and then use the “Profile & Alignment” tab in the Function Cart.
Q: Is BLAST the best way to do this?
A: If your query does not have a protein family affiliation (e.g., Pfam, COG, TIGRfam, etc.), then you will have to resort to BLAST instead, and cannot use Profile & Alignment.
Q: Are all GenBank microbial genomes in IMG, or do the owners have to choose to have them included in IMG?
A: We do not have all GenBank genomes. We try to maintain a broad “diversity” of genomes, so you will not see all possible strains of M.tb, for example. If you would like to see any particular GenBank genomes in IMG, please contact us through the question link.
Q: Are items like “Transmembrane helices” regularly recalculated (e.g., calculated on the fly) or are they locked in to the original annotation?
A: TMs are precomputed at the time of genome loading. We didn’t see large changes when we switched to IMG pipeline v.5+, which uses a newer version of TMHMM.