To test this idea, L. acidophilus was sorted from one of the bacterial yogurt extractions (L. acidophilus abundance <0.2% by flow analysis) as either single cell or 50-cell templates for MDA and sequenced on the Illumina MiSeq platform. For reference mapping, reads from both the single cell and 50-cell sorted amplicons were normalized and mapped to L. acidophilus NCFM (Figure 5). In parallel, because reference genomes are unavailable in most cases, we also assembled the genome de novo from the normalized reads. The assembly tool CLC was used both to map reads and to assemble contigs de novo. Having a reference genome available allowed us to accurately assess the extent of genome coverage using both mapped reads and de novo assembly. As we hypothesized, mapped reads from the 50-cell template yielded near-complete genome coverage at 99.9%, while the single cell template fell short at 68% with far more amplification bias (Figure 5). The bias is clear in the single cell template (Figure 5B): a large portion of the genome lacks coverage, while other regions are covered at very high depths of >8,000-fold. For the de
novo assembled genome, the 50-cell template yielded 124 contigs (compared to 555 for the single cell template) with >99.8% coverage of the reference and ~8-10% contamination by sequences from non-L. acidophilus species. The contaminating non-Lactobacillus reads were identified by searching the assembled contigs against sequenced microbial genomes. We found that the single cell data were contaminated with sequences from bacteria closely related to a sequenced Pseudomonas genome (accession number CP002290), and the 50-cell data were contaminated with genomic sequences related to Rhodopseudomonas (CP000283), Bradyrhizobium (BA000040) and Nitrobacter (CP000115). Of the single cell read data, 13.37% mapped to the Pseudomonas genome; of the 50-cell data, 3.23% mapped to the Rhodopseudomonas genome, 0.6% to the Bradyrhizobium genome and 0.14% to the Nitrobacter genome. The contamination was likely introduced during cell sorting and/or the MDA process. MDA-related contaminants, such as non-specific amplification products and DNA present in reagents, are common to virtually any approach that uses whole genome amplification [33, 43–46]. Besides possible contamination from the MDA process, most contaminants were probably introduced during cell sorting, since the contaminating sequences were not shared between the single and 50-cell results.
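The coverage figures quoted above are breadth-of-coverage values: the fraction of reference positions covered by at least one mapped read. As a minimal sketch of that arithmetic, assuming per-base read depth has already been exported from the read mapper as a list of integers, the calculation could look like the following. The function name `coverage_breadth` and the two depth vectors are hypothetical toy values chosen for illustration; they are not the study's data and this is not the CLC workflow itself.

```python
from typing import Sequence


def coverage_breadth(depth: Sequence[int], min_depth: int = 1) -> float:
    """Return the fraction of reference positions with depth >= min_depth."""
    if not depth:
        raise ValueError("depth must contain at least one position")
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


# Toy 1,000-position reference (illustrative values only, not real data).
# Evenly amplified template: almost every position covered at moderate depth.
even = [30] * 999 + [0]

# Biased template: a large uncovered block, a small region at extreme depth,
# and the remainder at low depth.
biased = [0] * 320 + [8000] * 10 + [5] * 670

print(f"even template breadth:   {coverage_breadth(even):.3f}")
print(f"biased template breadth: {coverage_breadth(biased):.3f}")
print(f"biased template max depth: {max(biased)}")
```

Breadth alone does not reveal amplification bias, which is why the maximum (or the full distribution) of per-base depth is worth reporting alongside it.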