mlballon, Author at MGM Workshops

IMG Webinar: Statistical Analysis Tool

June 16, 2020 By mlballon

Q(uestion): Can I record this webinar?
A(nswer): The webinar is being recorded on our end and will be made available on this playlist in our YouTube channel (we will send the direct link to this webinar). You will not be able to record on your end.

Q: Is it possible to do a statistical analysis by comparing the abundance of 16S or 18S genes in a metagenomic dataset?
A: No. We typically discourage using 16S or 18S gene information from metagenomes. Because these genes are highly conserved between different microbes, they are poorly recovered from metagenomes, so they are not ideal markers to use.
Instead, comparison of taxonomic composition can be done using the taxonomic assignment of protein-coding genes on contigs in the Stats tool (see features under “By taxonomy”).

Q: Is there a possibility to make such comparisons using metatranscriptomes?
A: It is possible to use Stats Tool with metatranscriptomes, although we don’t have accurate coverage (read depth) estimates for ALL of them. IMG currently treats metatranscriptomes (mostly) like metagenomes (with a few exceptions).

Q: In the case of metagenomes, were they assembled or not?
A: The vast majority of metagenomes on IMG are assembled. The current example presented in the webinar is also assembled and included read depth (i.e. coverage) information. However there are a few legacy metagenomes with unassembled data as well. We recommend NOT using the Stats Tool with these unassembled legacy metagenomes.

Q: Are there also phylogenetic aware methods available?
A: If by “phylogenetic aware” you mean methods based on an actual phylogenetic tree (e.g. UniFrac), then no, not currently. We do have a tool in development that will use general linear models with taxonomic (phylogenetic) composition as a fixed variable, but that will be in a later release.

Q: Can we compare our 16S amplicon data with the metagenome-derived 16S genes from the database in this tool based on taxonomic information?
A: This is doable using Find Genes > BLAST > RNA and choosing the 16S rRNA assembled metagenomes database from the pull-down menu. See: https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=WorkspaceBlast&page=16form

Q: Can this tool be used to compare two or more groups of genomes to infer the extent to which functions are maintained or lost among genomes? Example: I have reduced genome size endosymbiont genomes, and I wanted to discover functions that are lost or significantly reduced compared to a set of free-living relatives genomes.
A: Yes, it can be used to compare groups of genomes. Endosymbionts vs free-living relatives is a great use case for this tool.

Q: Can I use IMG for functional analysis of MinION-generated data?
A: Yes, IMG includes genomes and metagenomes sequenced with MinION sequencers. However, if the assemblies are not properly polished (e.g. using short reads or very high coverage), the resulting assemblies and protein predictions may be suboptimal, which will impact the quality of the comparison.

Q: Does the ANOVA test assume that you have previously tested the homogeneity of variances or normality?
A: Yes, it does. The pipeline does not check for normality of distribution; it is assumed. See details for each test in the user guide under “Default statistical method.”

Q : Can analysis by taxonomy be done on metatranscriptomes?
A: In principle, yes, comparison of taxonomy can be used for analysis of metatranscriptomes, but with the caveat that metatranscriptomes are often amplified during library prep, meaning that the abundances will be biased.

Q: Can estimated gene copies be used on metatranscriptomes?
A: Yes, estimated gene copies are available for most recently added metatranscriptomes, but not all.

Q: For taxonomic assignment of a scaffold, what is the rationale of at least 50% of the genes being required to have a hit?
A: We did benchmark the taxonomic assignment of scaffolds based on the majority rule (>=50% of genes with the hits to the same lineage). It produces very few false positives (defined as an assignment to a wrong lineage), but quite a few false negatives (defined as an assignment to a correct higher-level lineage, such as phylum or class, when in principle an assignment to a lower-level lineage, such as order or family, was possible). We prefer to err on the side of caution with this rather conservative threshold for assignment.
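
The majority rule described above can be sketched roughly as follows. This is only an illustration, not IMG's actual implementation: the function name, the input layout (one lineage tuple per gene, `None` for genes without a hit), the use of all genes on the scaffold as the denominator, and the back-off from lower to higher ranks are all assumptions made for the example.

```python
from collections import Counter


def assign_scaffold_lineage(gene_lineages, min_fraction=0.5):
    """Assign a lineage to a scaffold by majority rule (illustrative sketch).

    gene_lineages: one entry per gene on the scaffold. Each entry is a tuple
    of taxon names ordered from highest to lowest rank, or None if the gene
    has no hit. Returns the most specific lineage prefix supported by at
    least min_fraction of all genes, or an empty tuple if none qualifies.
    """
    total = len(gene_lineages)
    if total == 0:
        return ()
    hits = [lineage for lineage in gene_lineages if lineage]
    max_depth = max((len(lineage) for lineage in hits), default=0)
    # Try the most specific rank first, then back off to higher ranks.
    for depth in range(max_depth, 0, -1):
        prefixes = Counter(l[:depth] for l in hits if len(l) >= depth)
        if not prefixes:
            continue
        prefix, count = prefixes.most_common(1)[0]
        if count / total >= min_fraction:
            return prefix
    return ()
```

With this sketch, a scaffold whose genes agree at the phylum level but disagree at the order level is assigned only to the phylum. That is the conservative behavior described in the answer: a correct but higher-level assignment rather than a wrong lower-level one.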

Q: Is the taxonomic affiliation similar to GTDB?
A: No, the taxonomy in IMG is NCBI, not GTDB.

Q: Is GTDB taxonomy available for genomes?
A: GTDB taxonomy is available only for metagenome bins, not isolate genomes (see Metagenome Bins webinar for more information).

Q: Does “carefully curate” (in the presentation) mean high quality genomes and scaffolds in each set?
A: Curate means multiple things: high-quality genomes, metagenomes with similar levels of coverage (avoid comparing metagenomes with drastically different total assembled length and/or gene count). But it also means curating the “metadata”: e.g. if you want to compare freshwater vs marine genomes for a specific taxon, curating would be making sure that the genomes in the “freshwater” group are really from freshwater environments, etc. Descriptions could be inadequate or misleading in some cases – it’s always better to either refer to a publication for the samples or contact the P.I. for additional details that may not be captured.

Q: For a publication, can I do a statistical comparison like the demo example with 2 samples with two different sample sizes? 30 and 14?
A: Yes, you can use groups of different sizes (although ideally within the same order of magnitude).

Q: What should the minimum number of samples be in each set for good statistics?
A: There is no real “minimum” number of samples; you could technically do a comparison between 2 groups of 2 (meta)genomes each. The important part is to not over-interpret the results, i.e. if your groups are small (e.g. fewer than 10, or even fewer than 5), then any results should be carefully presented as based on only a very small sample size.

Another consideration: the “minimum” number of samples depends on how different the samples you’re comparing are (in the sense of statistical power; please see https://en.wikipedia.org/wiki/Power_of_a_test for a discussion of the relationship between sample sizes and magnitude of the effects). Generally speaking, if you compare human gut to marine samples, you’re likely to find significant differences even with 2-3 samples per group. On the other hand, if you’re comparing soil samples from different plots, even 50 samples per group may not be enough to find statistically significant differences. In such cases you may want to download the “Full Results” file generated by the tool and try more powerful tests better suited for your particular example.

Q: Can a genome be eliminated from a genome set?
A: Yes, you can edit a genome set. View the list of genomes in the set. Select the genomes you want to keep (i.e. all the genomes except the one you would like to eliminate). Click the “save” tab. In “save to workspace”, choose the “replace” option.

Q: If we have submitted our metagenome sequences in NCBI database, is it possible to pull the study into IMG?
A: IMG does not automatically import metagenomes from NCBI. However, you can submit your (assembled) metagenomes to IMG to have them annotated and made available for the type of statistical comparison we are showing today. Please view our webinar on the topic of data submission.

Q: Using the Mann-Whitney test, I get a p-value of 0.0005 for many features comparing two metagenome groups…but the false discovery rate (FDR) adjusted p value is increased to 0.5….what does it mean? What is most important – p -value or adjusted p-value?
A: “Adjusted p-value” is the one you should be looking at (it is adjusted for the number of comparisons performed). Here’s something to read about false discovery rate and why it’s important for genome and metagenome analysis: https://en.wikipedia.org/wiki/False_discovery_rate
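
To illustrate why an adjusted p-value can be so much larger than the raw one, here is a minimal sketch of the Benjamini-Hochberg procedure, a common FDR adjustment. The answer above does not say which FDR procedure the Stats tool uses, so treating it as Benjamini-Hochberg is an assumption, and the function name is made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this procedure each raw p-value is scaled by the number of features tested divided by its rank. For example, a single feature with p = 0.0005 among 1,000 tested features, where no other feature has a small p-value, ends up with an adjusted p-value of about 0.5.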

Q: Mean here means…..mean number of scaffolds?
A: “mean” is the mean count of the feature being compared. For instance, if it’s one of taxonomic levels (like class or phylum), then it’s the number of scaffolds assigned at this level, multiplied by their coverage if the “estimated gene copies” option was selected.

Q: Can you do this analysis at the genus level?
A: Yes, you can. Just keep in mind that there will be fewer contigs with taxonomy assignments at the genus level than at higher levels, meaning that the results may be skewed (e.g. if you’re comparing communities with many populations that don’t have close relatives in the isolate genome reference database that is used for scaffold lineage prediction).

Q: Can we perform these analyses with central log ratio transformations?
A: You can’t do this directly in IMG. However, the “Full results” file you download will include the input matrix of feature counts, which you could then use in your favorite statistical analysis software to try other transformations and/or tests.
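
As a rough sketch of what such an external step could look like, the function below applies a centered log-ratio (CLR) transformation to the feature counts of one sample. The pseudocount used to avoid taking the log of zero is an assumption (other zero-handling strategies exist), and reading the counts out of the downloaded file is left out because its exact layout is not described here.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    A pseudocount is added to every count so that zeros can be logged;
    the value 0.5 is an arbitrary choice for this illustration.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]
```

The transformed values of a sample always sum to zero, which is a quick sanity check after applying the function to each sample in the matrix.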

Q: Is it possible to restrict features to a specific taxon. For example, can I compare KOs between my groups of samples only for scaffolds annotated as Archaea?
A: This feature is being beta-tested at present. We hope to make this type of comparison possible in the near future!

Q: So what if I am interested in organisms that are not present in the IMG database, such as nematodes or fungi?
A: IMG is very prokaryote-centric. We don’t load eukaryotic genomes other than as a reference for metagenome analysis.

Q: KEGG mentioned that using their resources requires a licence. Does this hold when using them through IMG?
A: IMG has a KEGG licence which enables us to provide it to our users, so you don’t have to worry about having a licence on your side.

Q: Once approved, do you know how long it would take to annotate an assembled metagenome?
A: Rule of thumb would be a few weeks for an average sized submission, however it could also take several months, depending on the current load of the system, the size of the backlog, and the size of your dataset.

Q: How efficient is this stats tool for comparing novel species that may contain novel functions not represented in these functional annotation databases (like KO, Pfam, COG, Tigrfam, etc)?
A: For highly divergent or novel groups, there is a possibility of missing out on additional differentially abundant conserved hypothetical proteins that are not captured by any of these databases. On average, about 80% of the total CDS of bacterial genomes are assigned to a functional annotation database.

Q: How does assembly impact the relative gene/function counts relative to unassembled metagenomes?
A: That is what “estimated gene copies” measurement tries to address – by multiplying the feature count by the average read depth of the scaffold. However please ascertain that all the samples in all your groups do indeed have coverage information available to calculate estimated gene copy. If even one sample in your input is missing this information, the stats tool will report an error. While most of the JGI-sequenced metagenomes do possess this information, many others do not.
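
The calculation described above (feature count multiplied by the average read depth of the scaffold) can be sketched as follows. The function name and the input layout are hypothetical. Raising an error when a scaffold has no depth value mirrors the behavior described in the answer, but it is not the tool's actual code.

```python
def estimated_gene_copies(genes, scaffold_depth):
    """Sum depth-weighted counts per feature (illustrative sketch).

    genes: iterable of (feature_id, scaffold_id) pairs, one per gene.
    scaffold_depth: dict mapping scaffold_id to average read depth.
    """
    totals = {}
    for feature_id, scaffold_id in genes:
        if scaffold_id not in scaffold_depth:
            # Without coverage, estimated gene copies cannot be computed.
            raise KeyError(f"no read depth for scaffold {scaffold_id}")
        # Each gene counts once, weighted by the depth of its scaffold.
        depth = scaffold_depth[scaffold_id]
        totals[feature_id] = totals.get(feature_id, 0.0) + depth
    return totals
```

In this sketch a feature found twice on a scaffold with depth 10 and once on a scaffold with depth 2.5 gets 22.5 estimated copies, compared with a raw count of 3.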

Q: Can you graphically summarize the results of the stat analysis in this set-up such as the ternary plot example from Vigneron et al 2017 Sci Rep.?
A: For advanced visualization, we recommend downloading the full results set and using external statistical software (e.g. R).

IMG Webinar: Analysis Carts & Workspaces

June 9, 2020 By mlballon

Q(uestion): Hi, when will the recording of this webinar be available online?
A(nswer): Webinar recording(s) will be made available after the last one on June 16th, 2020.

Q: Is there a maximum number of genes that IMG can save in a cart/workspace?
A: There is a limit to the amount of data in Workspace. It is 5Gb per user, which covers the combined size of all sets and generated files, such as fasta files for export. The limit on the carts is 20,000 (20,000 genomes, 20,000 genes, 20,000 scaffolds, 20,000 functions). Note that you may have to change your preferences by going to MyIMG -> Preferences to increase the display limits to the maximum number of objects.

Q: Can you share the link for the public query?
A: Here it is: https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=Workspace&page=public_genome_search_history
And you can view the Advanced Genome Search Webinar for more information re. genome search: https://bit.ly/IMGwebinar20GenomeSearch

Q: Why is only this module selected and not the others?
A: It’s relevant to the demo biological example that was outlined in the slides. She is interested in seeing if N2 fixation and nodulation functions are present across all Bradyrhizobia.

Q: I missed what the difference between function vs genomes and genomes vs function was.
A: Selecting one or the other will determine the rows vs columns orientation: function (rows) vs genomes (columns), or vice versa.

Q: Is the search based on annotations metadata or sequences?
A: It’s based on sequence annotations. Protein CDS are searched against (for example) Pfam, COG, or KEGG Orthology databases – you could do the same search using other annotations as well. You are then querying the results of these annotations using IDs or search terms.

Q: Is it possible to not find a gene due to outdated annotation from several years ago?
A: Yes, it is a possibility. Some very old genomes in IMG were annotated using an older pipeline, and those genomes do not undergo structural re-annotation (that is, there are no updates of ORF finding and CDS calling). However, there are updates of functional annotations of existing CDS – even for older genomes (isolates only) – Pfam, Tigrfam, COG, KO, etc. assignments are updated with the latest versions of these databases.

Q: And the other explanation might be due to the horizontal transfer of those genes which do not share the characteristics of the rest of the genome
A: Yes, agree. Horizontally transferred regions would be also missed by the binning process, and so will be other regions with deviant oligonucleotide composition, like ribosomal operons, if they are found on a separate contig, and not assembled with a large fragment of core genome.

Q: Would you please explain more why the sequences from a single genome share similar coverage?
A: Coverage is interpreted as a proxy for “number of copies of the genome in the original sample”. While the sequencing process will include a fragmentation of these original templates in short “reads”, the number of these short reads should be consistent across the whole genome. The only exceptions would be actively replicating genomes, in which the chromosomal regions surrounding the origin of replication will be present in more than 1 copy. For instance, in an exponentially growing E. coli, the regions around the origin of replication may be present in 6 or even 8 copies. However, exponentially growing bacteria are rare in environmental communities. In contrast, plasmids can exist in multiple copies per cell, i.e. for one copy of the main chromosome, you could have multiple copies of the plasmid. That means their coverage will sometimes be different from the one of the main chromosome. Another more simplistic way to think about it – if you have 2 genomes in a sample – one is present at very high abundance in the sample, the other genome represents a minor fraction of the sample – assuming both are sampled and sequenced uniformly – you should see a consistent and more or less uniform higher coverage for the abundant genome and low coverage of the minor genome. But as stated above, there can be natural variances that confound this.

IMG Webinar: Sequence Similarity Searches

May 22, 2020 By mlballon

Q(uestion): Is it possible to have access to the first part of this Webinar series?
A(nswer): Yes, IMG conducted a pilot webinar series with four topics – these individual webinar recordings can be found here in the “PAST WEBINARS” section.

Q: Is it free to use the Workspace?
A: Yes, it’s free. Workspace is only available in IMG/MER version however, for which you need to request an IMG account, which is free too. All IMG tools and services are free to users.

Q: Are there any IMG tools to create consensus sequences?
A: Sort of. You can generate a Clustal-based alignment of genes (nucleotide or protein) residing in your gene cart using the “Sequence Alignment” tab, and clicking on “consensus” in the resulting alignment viewer. However, this is not the ideal strategy. We recommend using alternative tools that are designed explicitly for this purpose and provide a full range of functionality for editing and creating multiple sequence alignments and consensus sequences. If you’re interested in online resources in particular, we recommend: http://www.phylogeny.fr/index.cgi

Q: What is the difference between BLAST and PSI-BLAST?
A: PSI-BLAST uses position-specific scoring matrices (PSSMs) to score matches between query and database sequences, and assigns higher weight and larger scores to highly conserved positions in the alignments to multiple subject sequences. Searching with a PSSM can detect hits to more distant sequences (<20% sequence identity), as long as they have these conserved positions. In contrast, BLAST uses position-independent scores, i.e. the same scores for all positions in the alignment, which can be found in scoring matrices such as BLOSUM62, regardless of whether they are highly conserved or highly variable. Please read further: https://www.ncbi.nlm.nih.gov/books/NBK2590/

Q: In a BLAST pairwise protein alignment, what does the + represent?
A: The “+” symbol denotes a conservative substitution: the two aligned amino acids differ but have similar “chemical properties” – they are in the same “class” and can possibly replace each other without affecting the overall function/structure of the protein.

Q: Does JGI offer services for 16S, metagenomics or metabolome sequencing of human stool or oral samples?
A: No, it does not. JGI is funded by the U.S. Department of Energy (DOE), and DOE research focus areas are restricted to biogeochemical cycles, bioremediation, biofuels and the like. For JGI’s product portfolio, please see: https://jgi.doe.gov/our-science/product-offerings/. To gain access to these capabilities, please review our user program pages and submit a proposal.

Q: Does JGI IMG database contain data derived from human samples?
A: Yes, it does. IMG does have numerous isolate genomes as well as environmental metagenomes arising from human host-associated environments (such as human microbiome project (HMP)).

Q: If I have 250 genes in my gene cart and I want to get (using blastp) only the sequence of the top hit isolate (top blastp hit) for each gene (not all alignments, only the sequence of the best hit/alignment), is there a way to automate that search? (instead of going gene by gene)
A: You would have to break your list of query sequences into multiple batches and submit searches using Find Genes > BLAST > All Isolates and set the “number of hits” to 1. This will produce a results table with the top hit per query sequence. The query limit is 10,000 characters (including headers) – so assuming an average protein query length of 300 aa, you can submit about 30 query sequences in each batch and repeat about 10 times to get your results for 250.

Q: Is there a way to find the top isolate hit with the gene neighborhoods option?
A: Not at present. One workaround (only if your query gene is assigned to a COG or Pfam) would be to use Find Genes > “Cassette search”. Alternatively, you can use the “Top IMG Homologs” option to retrieve a set of homologs, add the best isolate hit (or multiple hits) to Gene Cart and use the “Neighborhoods” tab to visualize their neighborhood conservation.

Q: What’s the difference between LAST and BLAST?
A: Citing http://last.cbrc.jp/: “The main technical innovation is that LAST finds initial matches based on their multiplicity, instead of using a fixed length (e.g. BLAST uses 11-mers). To find these variable-length matches, it uses a suffix array (inspired by Vmatch). To achieve high sensitivity, it uses a spaced suffix array (or subset suffix array), analogous to spaced seeds (or subset seeds)”

Q : How did you get to the Advanced Genome Search page?
A: https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=GenomeSearch&page=searchForm

Q: How do you see the list of pre-loaded public advanced genome search queries?
A: Public list is here: https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=Workspace&page=public_genome_search_history

Q: Does registration allow for analysis of >1000 bacterial genomes? If yes, is there a ceiling on the number of genomes that can be analysed in parallel?
A: There are various comparative genomic tools in IMG with varying “limits” – some of these limits can be extended by using Workspace functionality. If you mean BLAST search of 1000 genomes – “All Isolates” BLAST runs against the entire IMG database, which is significantly larger than 1000 genomes. However, there are limits placed on the size of the query (up to 10,000 characters including the headers – so multiple query sequences may be submitted) as well as the number of hits (<=500), i.e. you may not be able to see hits from all genomes. If you need an exhaustive search to find hits in each and every genome out of your set, you should use Find Genes -> BLAST -> Selected Isolates (up to 100 genomes/metagenomes in one batch) OR Workspace -> Genome Sets -> BLAST (up to 500 genomes/metagenomes in one batch); in this case the limit of <=500 applies to each individual genome/metagenome, not the total count of hits displayed.

Q: On what basis was Cultivation metadata selected?
A: Please view this webinar on Advanced Genome Search that explains functionality and organization of underlying metadata.

Q : Can we upload and analyse our own genomes or metagenomes using IMG?
A: Yes, please visit our submission page and follow guidelines. You can also view our Data Submission and Management webinar

Q: Does IMG have all the prokaryotic sequences that NCBI has?
A: No, not all NCBI data. There is a small backlog at present. Also, our objective is to have a broad diversity represented – so not every available strain of every genus and species will be loaded (especially for cases like Mycobacterium tuberculosis or Staphylococcus aureus with 1000s of available strains), but at least one strain of every available species should be included. Please contact us if you find that specific genomes are missing and would like to see them added to IMG.

Q: So IMG has what they sequence from JGI and external submitters?
A: Yes, IMG contains all JGI-generated datasets, as well as data from external submitters and NCBI. For isolate genomes, as stated above, we do not import all public data from NCBI (e.g. IMG has only a small subset of strains from human pathogens). For metagenomes as well, there is a large number of non-JGI-generated datasets (e.g. HMP, TARA Oceans), but IMG does not contain every dataset in the public domain.

Q: Does IMG do metagenomic binning?
A: This is provided as a service for JGI sequenced projects primarily. At present, we have auto-generated 89,672 bins from 11,061 public metagenomes. Please visit our Metagenome Bins page and view the webinar on this topic.

Q: Are the databases available for running a command line BLAST? Does IMG have an FTP resource?
A: All JGI-generated datasets in IMG can be downloaded in bulk using the Genome Portals pages. However, the downloads include only fasta files, from which BLAST databases can be generated, not the databases themselves. We don’t provide BLAST databases for download, since generating them is trivial if you have a fasta file.

Q: Should we prefer draft genomes or complete sequences when running a similarity search?
A: It depends on your objective. Since draft genomes could have physical or sequencing gaps, “absence of evidence” in such a genome is not “evidence of absence”.

Q: Do you have a database of non-coding RNAs?
A: Yes, we have databases for non-coding RNAs. For genomes we have a combined database for all non-coding RNAs (all rRNAs, tRNAs, as well as others, such as tmRNA, RNase P RNA component, etc.) and a 16S rRNA database. For metagenomes we have specialized databases for 5S, 16S, 23S, 18S, 28S and other non-coding RNAs (including tRNAs, tmRNAs, etc.). These are generated by collecting the respective sequences from assembled metagenomes and metatranscriptomes.

Q: Do you allow submission of MAGs?
A: In principle, we do accept submission of MAGs in 2 ways: a small set of high-quality MAGs can be submitted as genomes, while for large sets of variable quality we prefer that you submit a metagenome and these MAGs as bins (see IMG Metagenome Bins as an example of this format). Please contact us via “Contact Us” link in Help section for additional information.

Q: What is the difference between IMG search and NCBI BLAST?
A: The IMG Gene Search by BLAST uses the same BLAST program as GenBank, but the database that is searched is different. Differences in databases arise from different content in IMG versus GenBank.

Q: Also, could we use that tool for plant genomes?
A: While plant genome sequences are included in IMG, they may not be completely up to date with all public genomes. JGI has the Phytozome portal dedicated to plant genomic analyses.

Q: Are the TARA oceans metagenomes and metatranscriptomes in the IMG database?
A: Yes, they are.

Q: In IMG, can we work on 18S rRNA metagenome amplicon analysis?
A: IMG does not contain amplicon data. However, 16S or 18S rDNA genes are predicted (to the extent possible) from metagenome assemblies, and can be analyzed further or downloaded. In addition, you can run BLAST against the IMG collection of 18S sequences collected from metagenomes using Find Genes -> BLAST -> RNA. You can use a multi-fasta file with multiple sequences as a query, but keep in mind that the query size is limited to 10,000 characters.

Q: What is the minimum cutoff for percent identity of 16S for the same species?
A: The species %identity cutoff reported in many publications is 97%, but that is not a hard and fast rule.

Q: Does it provide sync to R stats?
A: We are not sure we understand the question. BLAST results can be exported as tab delimited files, and then used with any software package that understands tab-delimited format. IMG does not provide BLAST via API calls.

Q: Can we compare more than 3 genomes on this platform?
A: Yes you can. A variety of options are available under “Compare Genomes” and elsewhere. Please stay tuned for upcoming webinars.

Q: When I BLAST a non-16S sequence query against a metagenome, I often wonder what correct thresholds of % identity are appropriate, or even if this question is appropriate.
A: If you mean thresholds for taxonomic assignments, please review: https://academic.oup.com/nar/article/42/8/e73/1076763

Q : Can we search for a specific sequence against multiple metagenomes via BLASTP?
A: Yes. Add your metagenomes to the cart. Go to Find Genes > BLAST > Selected Genomes to launch your BLAST job. Only 100 genomes/metagenomes can be searched at one time. If you want to search up to 500 metagenomes at once, use the Genome Set BLAST functionality in the workspace.

Q: While filtering a table, can you do a negative filter (not plants)?
A: Yes, table filters do allow regular expression searches – please review this webinar at minute 26:39 on how to do this. Also more details were provided in the “Tips and Tricks” document linked to the Q&A transcript of that webinar.

Q: Does finding hits from an aquatic metagenome really mean that the bacteria live in this environment? Or could it be DNA washed from the soil and then recovered from water?
A: Yes, sporadic occurrences are always a possibility, even though the sequences that end up in IMG metagenomic rRNA databases are usually those of relatively abundant lineages. But it would be better to assert a claim based on multiple instances from multiple datasets or studies.

Q : I have my set of genomes saved and I want to find a specific gene but I just have its function. So I went to “find function tool” – gene product name. I got a table with all the hits throughout my genomes. Is there a way to download the alignment and also see the gene neighbors for all genomes I am working with?
A: To view an alignment of your genes, you need to add them all to the gene cart and use the “Sequence alignment” tab to generate one. For the gene neighborhood, see the answer to the next question (below).

Q: Can we run the same homolog search as this NifH gene search by using a gene cluster (neighbor genes)?
A: It depends on what kind of information you expect to find as a result of such search. If you want to see neighborhood conservation (i. e. gene order, orientation, and functions), you can use “Top IMG Homologs” to find best hits of your query gene, then add them to Gene Cart and use “Neighborhood” tab to review the conservation of chromosomal neighborhoods. This viewer doesn’t show best hits, instead it shows assignment of neighborhood genes to protein families (specifically COGs). If you want to make sure that the neighborhood genes have hits in the same genomes within the same % identity range, you can use Find Genes -> BLAST -> Selected Genomes or Workspace -> Genome Sets -> BLAST with sequences of several neighbor genes as a query. On the other hand, if you are looking for all genomes in which two genes of your interest are found next to each other (or within a certain distance of each other), we are in the process of developing a “Cluster scout” tool to find conserved clusters of genes – at present we have Find Genes > “Cassette search”- you would need to use COG or Pfam (for NifH and its neighbor).

Q : A question just out of curiosity. Have you collaborated with NCBI (or any other database) to get all these sequences and information?
A: No, we don’t collaborate with NCBI on this particular task. We do collaborate with NCBI on the issues of taxonomy and genome submission. In addition, we import NCBI isolate genomes and process public data from SRA to include in IMG (when we have the bandwidth).

Q: When %identity is high but with a small coverage there is the possibility of only a conserved shared domain. Is filtering with both %identity and coverage possible?
A: Yes, you can filter the BLAST results table using alignment length – however you cannot specify a minimum alignment length in the BLAST input page, since this is not a default parameter that can be included in BLAST search.
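
If you prefer to filter outside IMG, the exported tab-delimited results can be processed with a short script. The sketch below is only an illustration: the column names (`gene_id`, `percent_identity`, `alignment_length`, `query_length`) and the default thresholds are hypothetical and need to be adapted to the headers in your exported file. Alignment length divided by query length is used as an approximate measure of query coverage, since alignment length includes gaps.

```python
import csv
import io


def filter_hits(tsv_text, min_identity=30.0, min_query_coverage=0.7):
    """Keep hits that pass both an identity and a coverage threshold.

    tsv_text: tab-delimited text with a header row. The column names used
    below are hypothetical placeholders, not the actual IMG export headers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["percent_identity"])
        # Approximate query coverage: aligned length over query length.
        coverage = int(row["alignment_length"]) / int(row["query_length"])
        if identity >= min_identity and coverage >= min_query_coverage:
            kept.append(row)
    return kept
```

Requiring both thresholds removes hits that are highly similar over only a short shared domain.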

Q: Hello, I have NirK sequences extracted from my metagenome and I would like to determine taxonomy for each gene. I guess I would use blastp but I have hundreds of sequences. Is it possible to determine taxonomy in IMG or do I have to download sequences and classify them via some other program?
A: We assume your metagenome has not been submitted to IMG. You have several options: 1) If your metagenome is assembled, you can submit it to IMG, in which case IMG will generate the best LAST hits for all proteins in your metagenome including NirK, as well as attempt to assign the scaffold “lineage” (i.e. predicted taxonomy) to all scaffolds, including the scaffolds on which NirK is found. 2) Alternatively, you can run multiple queries using Find Genes -> BLAST -> All isolates. The query is limited to 10,000 characters, so you can submit multiple sequences as multi-fasta. That said, even for deeply sequenced, assembled metagenomes, we haven’t seen instances of hundreds of NirK genes. Are you sure it’s not amplicon data or unassembled WGS sequences? If the latter, taxonomic assignment won’t be terribly accurate, so it may be better to try and assemble the data. The correct way is to curate a reference set of established, closely related NirK sequences, align them with your metagenome candidates, and build a tree.

Q: Will we be receiving emails for the next webinars?
A: Yes, you will.

Q: When looking for a gene product, if I’m not sure about the name (nodule or nodulation perhaps) can I type nodul# to allow anything starting with nodul?
A: Yes, every search term is treated as a partial search – there is no need for “#” or anything else.

Q: Is there a way to get the exported fasta file as a download and not just the text in a webpage?
A: Yes. Save the genes to a gene set in your Workspace, then export them from there.

Q: Is there a recommended tool to assess phylogenetic diversity or similarity between all predicted BGCs within a collection of bacteria, e.g. a genus or species of bacteria?
A: Please explore tools and content in IMG-ABC, a data mart dedicated to biosynthetic gene clusters (BGC) for secondary metabolites.

Q: How to blast with multiple queries?
A: You can submit more than one sequence in the query window. However, there is a 10,000 character limit on the total length of the query. If you want to run BLAST with multiple sequences totaling more than 10,000 characters (including fasta headers), you may have to split it into 10,000 character chunks.
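
The splitting step can be scripted. Below is a minimal Python sketch that groups whole FASTA records into chunks of at most 10,000 characters. It assumes that headers and newline characters count toward the limit (the answer above only states that headers do), and the function name `split_fasta` is ours, not part of IMG.

```python
def split_fasta(text, limit=10_000):
    """Group whole FASTA records into chunks whose total length stays within `limit`.

    Records are never cut in the middle; a single record longer than the
    limit raises an error because it cannot be submitted as-is.
    """
    # Collect records: a line starting with ">" opens a new record.
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">") or not records:
            records.append(line + "\n")
        else:
            records[-1] += line + "\n"

    # Pack records greedily into chunks.
    chunks, current = [], ""
    for record in records:
        if len(record) > limit:
            raise ValueError("record exceeds the limit: " + record.splitlines()[0])
        if current and len(current) + len(record) > limit:
            chunks.append(current)
            current = ""
        current += record
    if current:
        chunks.append(current)
    return chunks


# Tiny demonstration with an artificially small limit.
example = ">a\nMKV\n>b\nMLL\n>c\nMAA\n"
demo_chunks = split_fasta(example, limit=15)
```

Each returned chunk can then be pasted into the query window as a separate BLAST submission.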

Q: Can we get a certificate of participation for these seminars?
A: Sorry, we don’t give certificates for webinars. But if you attend an in-person MGM workshop at JGI, you will get a certificate.

GOLD-IMG Webinar: Data Submission & Management

May 14, 2020 By mlballon

Q(uestion): For projects submitted to GOLD by external users, do we need to submit our sequences to NCBI first to get the NCBI umbrella Name/ID? Or can it remain an empty field?
A(nswer): No, you do not need to submit your sequences to NCBI before submitting project metadata in GOLD. To reiterate what Reddy said earlier, an NCBI Umbrella BioProject only applies to large-scale studies with several projects under them, for example the Human Microbiome Project.

Q: How do you determine the sequencing quality of your project?
A: You can find more information in our help section under GOLD Terminology and in this publication.

Q: Do the external references come from NCBI?
A: On a GOLD AP (Analysis Project), external references such as GenBank ID, Assembly Accession, Release date and SRA Runs come from NCBI. Once your genome is annotated in IMG, the IMG Taxon ID will show up in the external references section of the AP.

Q: How long does the entire submission process usually take?
A: That depends on whether it’s an isolate or a metagenome. Isolates generally get processed within 2 weeks; metagenome loading depends on the current IMG workload. They may show up within the same 2 weeks or take a lot longer if we have to process a large batch of JGI-generated data, which takes priority.

Q: Can we get a copy of Marcel’s slides too?
A: They were emailed to you – they’re all in a single pdf and should follow GOLD slides.

Q: If you have a study where you want to annotate genomes of several isolates, do you need to submit a separate sequencing project for each one?
A: For large submissions, email GOLD and IMG teams.

IMG Webinar: Metagenome Bins Q&A

May 1, 2020 By mlballon

Q(uestion): Are MIMAG values provided through IMG? Or do we need to calculate these values ourselves based on info available on IMG?
A(nswer): IMG provides CheckM completeness and contamination and RNA counts for each bin, and indicates the corresponding quality tier following the MIMAG standard.

Q: Do you check MIMAG compliance of bins loaded into IMG?
A: We do, and we only load bins with medium or high quality (i.e. low quality bins will not be available through the IMG interface).

Q: Any control for “strain heterogeneity”?
A: Not directly, i.e. a genome bin is a “population genome”, and may include genome fragments from several closely related strains.

Q: If one searches for genomes with Find Genomes –> Genome by Taxonomy, they can find not only genomes from isolates but also MAGs. What is the overlap of those MAGs with the bins found via the path shown now in the presentation?
A: MAGs available through “Find Genomes” were either submitted by individual users or are legacy MAGs imported from NCBI. These are genome bins for which the corresponding metagenome may or may not be in IMG, and for which we expect the submitter did some level of curation. Our policy is to not import MAGs from GenBank at present given the variable quality of this data. The genome bins that IMG generates and displays through “Search Bins” are automatically created by the IMG pipeline and not curated, so they are not included in the “Genome Search” but are only available through this specific “Bin Search”.

Q: Does the quality category mostly refer to the completeness, or some other metrics?
A: The quality is a combination of completeness and contamination per the MIMAG standards (go here to read the paper).
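
As a rough illustration of how completeness and contamination combine into a tier, here is a short Python sketch using the thresholds published in the MIMAG standard (high quality: >90% complete, <5% contamination, with 23S, 16S and 5S rRNA genes and at least 18 tRNAs; medium quality: at least 50% complete, <10% contamination). The function and parameter names are ours, and how the IMG pipeline applies these rules internally is an assumption.

```python
def mimag_tier(completeness, contamination, has_rrnas=False, trna_count=0):
    """Return a MIMAG-style quality tier for a genome bin.

    completeness and contamination are percentages (e.g. from CheckM).
    has_rrnas means 23S, 16S and 5S rRNA genes were all found.
    Bins that meet neither the high nor the medium criteria are reported
    as "low" here, which is a simplification for illustration.
    """
    if completeness > 90 and contamination < 5 and has_rrnas and trna_count >= 18:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


tier = mimag_tier(95.0, 2.0, has_rrnas=True, trna_count=20)
```

Note that a bin with excellent completeness and contamination still drops to medium quality when the rRNA or tRNA criteria are not met, which is why RNA counts are reported alongside the CheckM values.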

Q: Are all the bins available in IMG created with the same pipeline? e.g. same versions of tools etc?
A: The pipeline is essentially the same in the sense that it uses the same tools: MetaBAT, CheckM and GTDB-Tk. However, their versions may be different. This is why the pipeline information is listed for each bin.

Q: Are bins representing only the core genome of the population? I guess that this population would correspond to a species but not to higher taxonomic levels, is it right?
A: Correct, bins will tend to represent the core genome of the population, and “accessory” regions of the genomes can be missed, i.e. would be unbinned scaffolds/contigs.

Q: I noticed some metagenomes have different assembly versions. Often there’s a newer version of the assembly made with SPAdes. It seems the bins are added only for this SPAdes assembly version. Do you plan to add bins for the other assembly versions, such as those done with MEGAHIT, or combined assemblies?
A: At this point, there is no plan to go back to older datasets and bin these, as we prioritize binning newly submitted metagenomes.

Q: What’s the maximum number of scaffolds/bins that can be selected at once?
A: The limits are set by IMG preferences in the MyIMG menu. By default they are set at 1,000, but you can increase them.

Q: I don’t quite get the difference between adding the scaffolds and adding the genomes to your cart.
A: One thing that may not be intuitive is that, for IMG, a “Genome bin” is a list of scaffolds, but it is not a genome. That is the reason why bins do not appear in “Genome Search”, and why users cannot add genome bins to a “Genome Cart”. For bins, adding to the “Genome Cart” would instead add the corresponding metagenome to the cart.

Q: Can you reduce the bins by taxonomy to only search across select metagenomes, i.e. only the metagenomes in my cart? What if we are interested in just a set of metagenomes?
A: This is not possible with the “browse” visualization, but is available through the “Bin Search” page.

Q: The Genome Portal has places it allows regex. Can you use regex in these searches/any IMG search box?
A: You can’t use regex in the “Quick Search” field on top of IMG page (as far as I know). However, specific fields in the Bin Search (and other advanced search tools) do allow for regex. The way you should format your regex is detailed at the top of the “Advanced Search Builder”. Note that I use “regex” loosely here, what is available is mostly partial text search and “range” search for numeric fields. In addition, IMG allows regex filtering of most tables, i.e. you could use regex on the result table you would get from a search. Please refer to the previous webinar for details.

Q: What is the difference between bin lineage and gtdbtk lineage given the fact that the img pipeline uses gtdbtk to infer phylogeny?
A: The IMG pipeline uses its own taxonomic classification tool, which is not gtdbtk. The two classifications differ both in the tool used to infer taxonomic placement and in the underlying taxonomy. You can see the GTDB classification here. IMG “lineage” is instead based on NCBI taxonomy.

Q: Can I BLAST a particular gene sequence within a specific set of binned metagenomes?
A: No, right now BLAST is available only against the entire metagenome. However, you can filter BLAST results to retain only hits to scaffolds within bins using tools in the Scaffold Sets tab in Workspace, like Set Operation.

Q: Is there an API version for filtering and exporting data similar to the GUI?
A: At this point, there is no API which allows to browse and search IMG genome bins as done through the GUI. But

Q: How can you download IMG bins?
A: You would add the scaffolds from the bin you are interested in to your cart or set, and you can download the scaffolds from there, as shown in the previous webinar.

Q: How can I export the specific metagenome bins after I’ve got the ones I want? Or export all the genes (possibly tens or hundreds of thousands) out of the specific bins ? I’m pretty sure you can’t do more than 20,000 genes from the IMG GUI
A: You can add them as Scaffold Sets to your Workspace and then use Export there to generate a nucleotide fasta file. Note that this is limited by the size of your workspace, so it will indeed be limited to 20,000. You can also add all genes on these scaffolds as a Gene Set in your Workspace and export their protein fasta from there. For large-scale studies, the easiest approach would indeed be to retrieve the metagenome of interest through the Genome Portal, and then use the list of scaffolds exported for each bin to “reconstruct” bins on your side.

Q: Is blast search as you described for Martha the only way to challenge metagenomes with specific nucleotide sequences? In other words, is that the only method to do an “in silico” primer search?
A: Correct, this would be the way to do this type of primer search using IMG.

Q: I want to search for some species of bacteria within a set of 3k metagenomes. Is the easier way to go through IMG/MER > Bins by taxonomy or through “Compare Genomes” > Genomes within Metagenomic?
A: Since only medium- and high-quality bins are included in IMG, in order to detect species in metagenomes with some sensitivity, I would suggest using “Compare Genomes”. But as Emiley mentioned in the webinar, it mostly depends on the analysis you would like to run: i.e. is it more comparative genomics, or detection of a species across metagenomes. For the latter, as a quick search to see whether the species is present at all, we would recommend IMG BLAST->RNA with 16S or 23S rRNA sequences. Don’t forget to change the e-value to 1e-50 or so, or else you will get too many hits.

Q: Can you download scaffolds like you can download genes/genomes from your genome cart (like last webinar)?
A: Correct, once you have added the scaffolds to your cart, or the genes to your cart, download would work the same way as presented in the previous webinar.

Q: And is it possible to add bins from multiple metagenomes in the scaffold cart and download them all together?
A: Yes, you can add multiple bins to Scaffold Cart or copy them to your Workspace Scaffold Sets and export all of them at once.

Q: There’s a cart limit on the number of scaffolds that is quickly reached with these fragmented bins, so is there a Globus version?
A: IMG is not the place for batch downloads. However, you can export bins in the form of scaffold ids (again, preferably from your Workspace and not Scaffold Cart), and then extract the corresponding scaffolds/contigs from the nucleotide fasta file you’ve downloaded via Genome Portal.
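
The extraction step on your side can be done with a few lines of Python. This is a minimal sketch, assuming that the first word of each FASTA header in the Genome Portal file matches the scaffold ids exported from IMG; please check this on your own files, since header formats can vary. The function name and the example ids are hypothetical.

```python
def extract_scaffolds(fasta_lines, wanted_ids):
    """Yield the FASTA lines of records whose id is in `wanted_ids`.

    The record id is taken to be the first whitespace-separated word
    after ">" in the header line (an assumption about the file format).
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line


# Made-up example: a three-scaffold metagenome and a bin with two scaffolds.
metagenome = [">s1 desc\n", "ACGT\n", ">s2\n", "GGCC\n", ">s3\n", "TTAA\n"]
bin_ids = {"s1", "s3"}
bin_fasta = list(extract_scaffolds(metagenome, bin_ids))
```

For real data, pass an open file handle as `fasta_lines` and write the yielded lines to one output file per bin; the function streams the input, so the whole metagenome does not need to fit in memory.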

Q: I tried to export a set of rRNA genes (20K) from a metagenome, however I was able to add only 999 rRNA genes to the gene cart. How do I go about that?
A: Two things can help here: 1) Change your IMG preferences to increase the limits on gene lists. This is under the My IMG menu: go to Preferences and set the gene list maximum to 20,000. 2) Try creating a gene set in Workspace and exporting it from there.

IMG Webinar: Data Export and Download Q&A

April 27, 2020 By mlballon

Q(uestion): Is it possible to also find the raw/clean reads of metagenomes in IMG?
A(nswer): For JGI-generated datasets, reads can be accessed through the Genome Portals site. Portals Website: https://genome.jgi.doe.gov/portal/

Q: For the functional annotations, are the annotations updated en masse? I.e., for PFAMs, for an organism that was annotated 5 yrs ago, if I want to compare it to one that I just finished, are the PFAM data ‘equivalent’ for the two organisms?
A: For isolate genomes, yes, there is a backfill process to update Pfam assignments. For metagenomes, annotations are not updated. However, specifically for Pfams, the majority of them have been stable for the last 5 years or so.

Q: Is there some place to find the date of the backfill, for publication purposes?
A: Please check the genome add date in the Genome Details page. We generated the download files when we loaded the genome into IMG. The log of system-wide annotation updates (e.g. Pfam refresh) can be found in the IMG version page, and users are also notified via the IMG user forum.

Q: When I download the bacterial genome I have three files (taxonomy, genes and intergenic). Because I downloaded the bacterial genome to be used as a reference in an assembly of my isolates, my question is whether the taxonomy FASTA file contains the genes that are in a separate file.
A: The standard IMG tarball does not include the taxonomy data. We may be mistaken, but you are probably referring to the nucleotide fasta file of the genome assembly, which indeed has the name of the genome (taxonomy) in the fasta header. If we understood you correctly, then the answer is yes: this fasta file includes the entire nucleotide sequence of the genome, both the genes and intergenic regions.

Q: In gff files of some genomes, there is more than one ID with product=16S. Are the sequences of the corresponding IDs always the same?
A: You’re right – some isolate genomes have multiple copies of 16S. In IMG each of them will be assigned a unique numerical gene id. Even if their sequences are identical, their locations on the chromosome are different. However, some genomes have multiple divergent copies of 16S. This is very rare in Bacteria, but not that uncommon in Archaea. The divergence of 16S copies in the same archaeal genome may be as high as 5%.
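
To list the 16S copies and their locations in a gff file, something like the following Python sketch can help. It relies only on the general GFF3 layout (nine tab-separated columns, with `key=value` attributes separated by semicolons in the last column); the attribute names `ID` and `product`, and the example lines, are assumptions, so adjust them to what your file actually contains.

```python
def find_16s(gff_lines):
    """Return (gene id, scaffold, start, end) for features whose product mentions 16S."""
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard feature line
        # Parse the attribute column into a dictionary.
        attrs = dict(p.strip().split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "16S" in attrs.get("product", ""):
            hits.append((attrs.get("ID"), cols[0], int(cols[3]), int(cols[4])))
    return hits


# Made-up example lines, for illustration only.
gff = [
    "##gff-version 3\n",
    "scaf1\tIMG\trRNA\t100\t1600\t.\t+\t.\tID=2001;product=16S ribosomal RNA\n",
    "scaf1\tIMG\tCDS\t2000\t2900\t.\t-\t0\tID=2002;product=hypothetical protein\n",
    "scaf2\tIMG\trRNA\t50\t1550\t.\t-\t.\tID=2003;product=16S ribosomal RNA\n",
]
copies = find_16s(gff)
```

Each copy comes back with its own id and coordinates, so you can pull the corresponding sequences and compare them to see whether they are identical or divergent.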

Q: Hi, what would be the easiest way to find if the dataset was published?
A: IMG stores this data in the “Is published” field. You can access this information in multiple places: you can use Advanced Genome Search with the field “Is published” (in “Sequence Assembly Annotation” category) to select only published or only unpublished genomes by setting the filter to “Yes” or “No”, respectively. You can add the column “Is published” to your search results or in your Genome Cart. Lastly, you can go to the Genome Details page of your dataset of interest. If you see a “Genome Publication” field with publication details (normally under the Project Map), this dataset is published.

Q: To be sure: if this is IMG data, can I use it or not?
A: It depends on whether the data is public or private, and for public data it depends on whether it’s a JGI-generated dataset (“Sequencing Center” field contains “DOE Joint Genome Institute (JGI)”) or not. Non-JGI generated public data can be used freely. JGI-generated published data can be used freely with an appropriate citation. Use of JGI-generated unpublished data (“Is published” field is “No”) requires that you ask permission from the PI of the dataset (the information stored in the fields “Contact Name” and “Contact email”).

Q: What about using data from gene neighborhoods? Where does that fall?
A: This is a pretty specialized topic, and we are not sure how many participants are interested in it. In general, IMG has several places where conserved neighborhoods are shown: on the Gene Details page of your gene of interest you can click on one of the “Conserved Neighborhoods” links (next to the chromosome neighborhood graphic display). You can also find homologs of your gene of interest either across all of IMG (use “Top IMG Homologs” on Gene Details page) or in your set of genomes of interest (use “IMG Genome BLAST” on Gene Details page). Then add the homologs to the Gene Cart, and view gene neighborhoods in the “Neighborhoods” tab.
You can also search for conserved cassettes, which are based on the protein families of interest (rather than specific gene of interest). This option can be found in the Find Genes menu. Lastly, IMG-ABC allows you to search for co-occurrences of Pfams via the ClusterScout tool. Even though the tool is currently available only in IMG-ABC, it actually searches through the entire genomes, not just biosynthetic gene clusters. Please refer to IMG-ABC publications and IMG-ABC site for details.

Q: Can we save a set of genomes that we did from a search to later do a cassette search?
A: Yes, you add them to your Genome Cart – however there are limits on #s of genomes for cassette search.

Q: Is there a way to get around the # limitation? Or another way to look?
A: You can change Preferences in MyIMG to increase the limit. Alternatively, you can save them in your Workspace.

Q: Do you mind walking me over how to extend that limitation?
A: Please go to “My IMG” menu item, and then select “Preferences”. There are limits on the number of search results list, gene neighborhoods, and so on. You can increase them (even to the upper limit) and then save preferences.

Q: I have noticed that, for a dataset I was searching for, fewer metagenomes were available through the JGI Genome Portal compared to the IMG search results. Why is that so?
A: There are multiple reasons why the number of datasets for the same sample may be different between IMG and Genome Portal: some of them may have been made “obsolete” based on request from P.I. or Project Manager and no longer visible in IMG, while the data still shows up in the portal. Alternatively, the data in the Portal may have been moved to the “hidden” status. In addition, there may be some data in the portal that wasn’t loaded into IMG, again based on the requests of the PI or Project manager.

Q: Is there a limit to the number of genomes one can download at once?
A: Yes, there are limits on the content of IMG Carts including Genome Cart (20,000). So you can’t request a download of more than 20,000 genomes at a time. However, irrespective of IMG limits, if you request download of tens of thousands of datasets, JGI Genome Portal may not be able to fulfill the request due to the sheer size of the data. In general, downloads work for tens to hundreds of datasets, not for thousands.

Q: Can I use a bacterial genome sequenced by DOE JGI as a reference to assemble my bacterial genome and publish the paper?
A: In theory, yes; see the details of the policy here: https://jgi.doe.gov/user-programs/pmo-overview/policies/
You may or may not have to contact the PI of the genome depending on whether the genome is published or not.

Q: Is it possible to find the same metagenome/genome in IMG and Genome Portal by using the IMG Genome ID or Genome Portal uses different ID numbers?
A: Yes, the IMG taxon ID is searchable in Genome Portal.

Q: Is there a statute of limitations on how long a PI could restrict use of “unpublished” data? E.g. 5 years? 10 years?
A: Please refer to the JGI data usage policy: https://jgi.doe.gov/user-programs/pmo-overview/policies/
Briefly, for datasets generated for proposals accepted prior to Nov 2018, there is no statute of limitations and they remain in reserved status indefinitely until the PI publishes a paper. For datasets generated for proposals accepted after Nov 2018 the datasets transition to unreserved status 2 years after being made public.

Q: Can you export the files after this last table with Public data?
A: Yes, you can; anything you reconfigure and redisplay is exportable.

Q: What is the reason that some genomes are not available?
A: Please see the document on data downloads in the IMG Help Section. Briefly, either the IMG data is virtually identical to the GenBank record, in which case IMG redirects you to GenBank for downloads, or a private submitter has opted out of providing the data via Genome Portal.

Q: Can she show us again how she eliminated the 43 non-public genomes from the cart (to get 26 remaining)?
A: It will be on YouTube. Also please refer to the “IMG Tips and Tricks for table configuration and filtering” here:

Q: I would like to search for some genes instead of the whole genome sequences.
A: IMG provides multiple ways to search for genes of interest, including keyword search in Quick and Advanced Gene Search, sequence similarity searches and export of genes retrieved by Function Profile. If you are interested in all or any of them, please respond to the webinar survey and indicate your interest in these topics in the Suggestions section.

Q: I’m interested in advanced gene searches (e.g. finding/downloading orthologs).
A: Please mention that in the feedback form. For now, please check out Find Genes > Gene Search > Advanced Search Builder, and feel free to ask us follow-up questions using https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=Questions

Q: Why do I have more than one 16S rRNA?
A: Isolate genomes can have anywhere from one to 15 or more copies of 16S rRNA. IMG treats them as individual genes and assigns unique identifiers even if their sequences are identical.

Q: Because Natalia started with only 1K genes, she might be missing some other genes with that specific name?
A: Yes, you’re right, she would have needed to change her preferences first and start over.

Q: What if you want to search for genes across multiple genomes?
A: You can use Gene Search. Alternatively, you can use the function profile in Function Cart. Briefly: add genomes/metagenomes to the Genome Cart first, then add functions (e.g., Pfam, TIGRfam) to the Function Cart, and then use the “Profile & Alignment” tab from the Function Cart.

Q: Is BLAST the best way to do this?
A: If your query does not have a protein family affiliation (e.g., Pfam, COG, TIGRfam, etc.), then you will have to resort to BLAST instead, and cannot use Profile & Alignment.

Q: Are all GenBank microbial genomes in IMG, or do the owners have to choose to have them included in IMG?
A: We do not have all GenBank genomes. We try to maintain a broad “diversity” of genomes, so you will not see all possible strains of M.tb, for example. If you are interested in seeing any particular GenBank genomes in IMG, please contact us through the question link.

Q: Are items like “Transmembrane helices” regularly recalculated (e.g. calc’d on the fly) or are they locked in to the original annotation?
A: TMs are precomputed at the time of genome loading. We didn’t see large changes when we switched to IMG pipeline v.5+, which uses a newer version of TmHMM.

IMG Webinar: Genome Search Q&A

April 16, 2020 By mlballon

Q(uestion): Do the viruses include prophage that are integrated into bacterial genomes?
A(nswer): Not right now. We’re working on the new version of IMG-VR, which will include prophage annotations.

Q: Is this webinar being recorded for future reference?
A: Yes, it is being recorded and the recording will be posted. You'll get the link.

Q: Symbionts associated with marine invertebrates would be Host-associated or Environmental?
A: These would be host-associated. We know, it may be confusing. One of the hardest cases that we’ve seen is so-called fungal gardens – fungi cultivated by leaf cutter ants on shredded leaves. It is sort of host-associated, but also somewhat engineered 🙂

Q: Is it mandatory to classify our sample in all levels to submit it to IMG?
A: No. We ask you to classify the sample at the broadest level of environmental category (host-associated etc.).

Q: Is it possible to search for Conserved Domains using cdd name?
A: No, IMG doesn’t have the CDD database, only some of its components (COGs, Pfams, TIGRfams, SMART).

Q: Is there a way to search for a particular gene in the entire bacterial genome database?
A: It depends on the definition of a "particular gene". If you have the sequence, you can run BLAST against all IMG isolates.

Q: When I do the same search, I get a different (smaller) number of hits.
A: Rekha is what we call a “superuser,” meaning that she has access to a bunch of private genomes to which you don’t have access. It is expected that you see a smaller number of hits.

Q: Will the BLAST search option be covered in today’s webinar?
A: No, it will be one of the webinars that we’re planning. We’ve settled on the first 4, and are asking for feedback on the additional topics that we need to cover. IMG similarity searches is one of the possible topics.

Q: Is searching for PFAM fold an option in the advanced search builder? I can’t seem to see it.
A: Search for Pfams is in the Gene Search. You can search by Pfam id, name, or construct a complex query with Advanced Search.

Q: What is the difference between IMG and genome portal?
A: JGI Genome Portal is the place where you go to download the data in bulk. IMG provides tools for data analysis and some (not bulk) download. It may not be obvious with Genome Search, but will be clearer with additional tools.

Q: Would it be possible to do this same type of search for a particular bacterial species with a set of metagenomes that I have saved in my Workspace?
A: A short answer is "not through Advanced Genome Search". A long answer is that IMG has several options to search for a particular species in metagenomic data. We're soliciting feedback for future webinar topics, and this can be one of them.

Q: For analysis with metagenome data, do I need a known sequence of a gene? It is the first time that I am going to carry out this type of study.
A: It depends on how you want to analyze the metagenome data. If you want to find homologs of a particular gene in metagenomic data, then you need either its sequence or its affiliation with some protein family, like Pfam.

Q: Are the genomes in GOLD or submitted to IMG also deposited in GenBank?
A: No. You need to do your own GenBank submissions. If the genomes are JGI sequenced, we do submit them. But for all externally sequenced genomes you annotate in IMG, users do their own GenBank submissions. IMG (and GOLD) have 3 types of genomes: JGI-generated; private submissions not sequenced by the JGI; public genomes downloaded from GenBank. The third category is in GenBank. JGI-generated genomes eventually are submitted to GenBank, with the exception of single-cell genomes and metagenome-assembled genomes. Private submissions may or may not go to GenBank.

Q: Can we keep the annotation from JGI when we submit to GenBank, so we don’t need to do a re-annotation via GenBank?
A: Yes, you can use the JGI annotation to submit the genome to GenBank or reannotate at GenBank. The reason why we don't provide this service to external submitters is that we never or almost never have the complete metadata needed to generate the structured comment for the submission. However, JGI-generated gff files generally pass GenBank validation, so you can use them for submission.

Q: How to analyze hypothetical proteins more elaborately? What is the difference between draft and permanent draft genome?
A: WRT hypothetical proteins, we can try to address this in one of the future webinars. WRT draft vs permanent draft, “draft” in general means that people may continue sequencing efforts in order to improve the assembly. “Permanent draft” means that no more sequencing is being done. You will find definitions for these and other terms used at the following help pages on GOLD https://gold.jgi.doe.gov/help -> GOLD Terminology

Q: Hi, I have a question about which special symbols are acceptable in the searches. For example I learned today to use “<=” symbol for less than or equal. Are there other useful symbols?
A: Search Builder accepts ranges expressed as “x to y.” In Results you can filter tables using Perl regular expressions. We’ll post some advanced examples to show how exactly these can be used.
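
Until those examples are posted, here is a small Python sketch of what filtering a results table with a regular expression looks like in principle. Python's `re` module is not identical to Perl, but a simple pattern like the one below behaves the same in both. The rows, the column name and the helper `filter_rows` are made up for illustration, and exactly how the IMG table filter applies a pattern is an assumption.

```python
import re


def filter_rows(rows, column, pattern):
    """Keep the rows whose value in `column` matches the regular expression."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(str(row[column]))]


# Made-up result table with a single column.
rows = [
    {"Genome Name": "Bacillus subtilis 168"},
    {"Genome Name": "Bacillus cereus ATCC 14579"},
    {"Genome Name": "Escherichia coli K-12"},
]

# "^" anchors the match at the start of the name, "(a|b)" means either
# alternative, and "\b" requires a word boundary after the species name.
matches = filter_rows(rows, "Genome Name", r"^Bacillus (subtilis|cereus)\b")
```

The same pattern typed into a table filter would be expected to keep the two Bacillus rows and drop the E. coli row.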

Q: I cannot follow sometimes. Can we get the recorded video somewhere? That would greatly help in reviewing what we discussed today.
A: Yes, the webinar is being recorded and will be posted. You will get the link.

Q: I am really interested in the workshop, but I am not sure whether I will be able to participate due to a visa issue. Can I get a refund if I am not able to attend?
A: Yes, you will be fully refunded for registration if you cannot attend.