This is somewhat of a follow-up from previous posts (most recently C: How to determine which NCBI sequence to map against (multiple sequences for sing ).

I have fastq files (where each read has about 150 bases). The wet-lab researchers believe it is from an isolated strain of bacteria, likely Helicobacter (this may be questionable). My job is to determine the identity (closest species etc.) of these samples. I have been attempting this for weeks and am stuck.

1. Blasted contigs. Some seemed to align to Helicobacter cetorum MIT 00-7128. So, I downloaded that as a reference and aligned the sample to it using BWA. I also noticed that some contigs (shorter ones) blasted to Mus musculus/Homo sapiens.

I thought I should take a more sophisticated approach than blasting contigs. So, I tried (2), (3), (4) below:

2. Downloaded all "Complete genomes" of the "Helicobacter" genus from NCBI (n=221). Used sourmash to compute k-mer sequences to determine relatedness of genomes. The highest similarity was 43.5% to Helicobacter pylori P12, but there were dozens of similarities with almost the same percentage to other Helicobacter strains.
3. Ran Kraken on Galaxy against a database of "plasmids", "viruses", and "bacteria". It had only 14% bacteria classified (and even less virus/plasmid), with most (5.66%) mapping to Helicobacter pylori.
4. Ran sendsketch, where H. macacae showed up most.

I thought the numbers above seemed low (please let me know if you think otherwise). They also do not seem consistent (with H. pylori showing up more in sourmash and Kraken and H. macacae showing up most in sendsketch) - but still that probably does not matter given how low these values are anyway.

In my latest post (linked above), a user suggested taking small selections of reads (~20-25) and blasting. I did this for both the first 25 bases and the last 25 bases in the file:

- BlastN on the first 25 bases had Helicobacter cetorum MIT 00-7128 with the lowest e-value (3e-39).
- Megablast on the first 25 bases returned a "no significant similarity found" message.
- BlastN on the last 25 bases had Helicobacter felis ATCC 49179 with the lowest e-value (1e-25).
- Megablast on the last 25 bases had Helicobacter felis ATCC 49179 with the lowest e-value (3e-19).

I am not sure if there are certain parameters I should use (such as blastN versus megablast) and whether these results seem consistent - especially because the highest WKID value from sendsketch was a different Helicobacter species.

I do not have much experience at all with this type of analysis. Over the weeks, I feel like I am working in circles and throwing different software at this sample. My main findings so far: 1) There may be contamination, 2) Mapping/alignment/similarity scores seem low, 3) Different software can point (despite being low in value) to different Helicobacter species.

The biologists press me to tell them what species their sample is and I have been unable to feel confident about giving an answer. My worries are that it could be that it is not bacteria (maybe archaea), is too contaminated, is not "isolated", etc.

For anyone with experience, what would you recommend to someone in my position to confidently determine what species this is (if that even seems possible at this point)? Specifically:

- How should I determine if contamination should be removed, and how should I do so safely (if needed)?
- Are my numbers above indeed too low and conflicting?
- When does one feel confident they can report the species back to biologists?
- How can I determine this is isolated bacteria, and not something else like archaea or pure contamination or multiple species?
- What other approaches should I take to determine the species? I am especially interested to hear ideas that add something new to what I have already tried (i.e. are not simply throwing another software similar to the ones above at it) :oD.

I apologize for the long post (wanted to be clear about what I have tried).
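
For reference, the "small selection of reads" step can be sketched roughly as below. This is only a minimal sketch, not the exact procedure I ran: it assumes a plain, uncompressed FASTQ with exactly four lines per record, and the function names (`read_fastq`, `first_and_last_reads`, `to_fasta`) are made up for illustration. It pulls the first and last `n` reads and writes them as FASTA text that can be pasted into BLAST.

```python
import io
from collections import deque
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines (e.g. at the end of the file)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield header.rstrip("\n")[1:], seq  # drop the leading '@'


def first_and_last_reads(handle, n=25):
    """Return the first n and the last n reads.

    The two lists overlap when the file holds fewer than 2*n reads.
    """
    records = read_fastq(handle)
    first = list(islice(records, n))
    last = deque(first, maxlen=n)  # bounded buffer keeps only the final n
    last.extend(records)
    return first, list(last)


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


# Tiny in-memory example (made-up reads, not from the real sample).
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nGGCC\n+\nIIII\n"
    "@r3\nTTAA\n+\nIIII\n"
)
first, last = first_and_last_reads(demo, n=2)
print(to_fasta(first))
print(to_fasta(last))
```

With a real file, the `io.StringIO` object would be replaced by an opened file handle. Reads at the very start or end of a FASTQ file are not a random sample, so drawing reads from random positions might be more representative.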