Taxonomical classification of sediment metagenome
All the high-quality reads obtained from the sediments of the different locations were used as input for sensitive taxonomic classification. Not all reads could be classified. From the Kanpur sediment samples, only 41.58%, 49.35% and 54.79% of the total reads were classified from the KAN-1, KAN-2 and KAN-3 samples, respectively. From the Farakka sediment samples, only 50.82%, 52.08% and 35.68% of the total reads were classified from the FAR-1, FAR-2 and FAR-3 samples, respectively. Similarly, from New Delhi, 53.37%, 38.95% and 44.82% of the total reads were classified from the ND-1, ND-2 and ND-3 sediment samples, respectively (Supplementary Table 1). On the basis of the taxonomic classification, a large number of probiotic species were identified in the sediment samples of the rivers Ganga and Yamuna. Four Vibrio species, nine Bacillus species, sixteen Lactobacillus species, five Bifidobacterium species, three Shewanella species, three Pediococcus species, seven Enterococcus species and four Pseudomonas species, along with one species each of Streptococcus, Leuconostoc, Aeromonas, Micrococcus, Paenibacillus and Lactococcus, were identified from the metagenome. Roseobacter, Vagococcus and Oenococcus species were also found at different locations.
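As a quick arithmetic check on the classification rates, the per-sample percentages can be averaged by location. The sketch below uses only the percentages quoted in this section; the dictionary layout and variable names are illustrative and not part of the original analysis.

```python
from statistics import mean

# Percentage of high-quality reads classified per sample, as reported in the text.
classified_pct = {
    "Kanpur": {"KAN-1": 41.58, "KAN-2": 49.35, "KAN-3": 54.79},
    "Farakka": {"FAR-1": 50.82, "FAR-2": 52.08, "FAR-3": 35.68},
    "New Delhi": {"ND-1": 53.37, "ND-2": 38.95, "ND-3": 44.82},
}

# Mean classified percentage per location, rounded to two decimals.
mean_by_location = {
    location: round(mean(samples.values()), 2)
    for location, samples in classified_pct.items()
}

# Flatten to one sample -> percentage mapping to find the extremes.
all_samples = {
    sample: pct
    for samples in classified_pct.values()
    for sample, pct in samples.items()
}
lowest = min(all_samples, key=all_samples.get)
highest = max(all_samples, key=all_samples.get)

for location, value in mean_by_location.items():
    print(f"{location}: {value}% of reads classified on average")
print(f"lowest: {lowest}, highest: {highest}")
```

On these figures, roughly half of the reads remain unclassified at every location, with FAR-3 the lowest and KAN-3 the highest individual sample.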
Phylogenetic analysis of probiotic species
To understand the evolutionary relationships among the 242 identified genomes of probiotic species derived from the sediments of the rivers Ganga and Yamuna, a multiple sequence alignment (MSA) was carried out. The MSA revealed that some of the probiotic species are highly conserved throughout evolution; however, the majority of the species showed diversity. Phylogenetic tree analysis showed that the identified probiotic species formed eight distinct clusters (Figure 2). In CLUSTER-1, L. murinus and L. animalis, both derived from the Kanpur sample, were phylogenetically very close, with a bootstrap value of 99. In CLUSTER-2, L. nodensis and L. perolens, derived from the Farakka sample, were very close, with a bootstrap value of 83, as were L. gasseri and L. coryniformis, derived from the Kanpur sample, with a bootstrap value of 98. The highest number of evolutionarily close probiotic species was found in CLUSTER-3: E. faecium from the Yamuna sample and P. chlororaphis from the Farakka sample (bootstrap value 88); L. coleohominis, L. pobuzihii and B. clausii, all derived from the Yamuna sample (bootstrap value 97); and L. rapi, derived from the Yamuna sample, and B. subtilis, derived from the Farakka sample (bootstrap value 81). Similarly, in CLUSTER-4, L. kimchicus, L. delbrueckii and L. helveticus, all derived from the Kanpur sample, were phylogenetically closely related to B. animalis, derived from the Yamuna sample, with a bootstrap value greater than 80. In CLUSTER-6, L. paracasei (strain 1) and L. paracasei (strain 2), both derived from the Kanpur sample, showed a very high bootstrap value (98).
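For readers less familiar with bootstrap values, the support for a grouping is the percentage of bootstrap replicate trees in which that grouping is recovered. The sketch below illustrates only this counting step. The taxon labels and replicate trees are hypothetical placeholders, and representing each tree as a set of clades is an assumption made for brevity; a real analysis would resample alignment columns and rebuild a tree for each replicate.

```python
def bootstrap_support(clade, replicate_trees):
    """Percentage of replicate trees that contain `clade` as a grouping."""
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical replicates: each tree is represented by the set of clades it contains.
pair_ab = frozenset({"taxon_A", "taxon_B"})
pair_ac = frozenset({"taxon_A", "taxon_C"})
replicates = [{pair_ab}] * 83 + [{pair_ac}] * 17

support_ab = bootstrap_support({"taxon_A", "taxon_B"}, replicates)
print(f"support for (taxon_A, taxon_B): {support_ab:.0f}")
```

Under this reading, a value of 99 means the pairing was recovered in 99 of every 100 replicates, while values near 80 indicate weaker but still commonly accepted support.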
Statistical metagenomics for probiotic species
From the classified metagenomic data, a total of 67 probiotic species from 18 genera were considered for statistical metagenomic analysis. Heatmap analysis showed a demarcation between Kanpur and Farakka, attributed to the higher level of pollution at Kanpur, which reduces the relative abundance of a large number of the probiotic species. The prevalence of probiotic species in the Yamuna also differed from that at Kanpur and Farakka (Figure 3).
Relative abundance analysis:
Statistical metagenomic analysis revealed that the genus Lactobacillus had a similar relative abundance at all nine sampling sites; however, L. casei was present in a relatively high proportion in the Farakka stretch, with statistical significance (p = 0.02). B. clausii was found in a high proportion (p ≤ 0.05) in the Farakka stretch, whereas B. mycoides was found in a high proportion (p ≤ 0.05) in the Kanpur stretch. Our metagenomic data showed that one species of Vibrio (V. harveyi) displayed differential relative abundance among the three locations and was found in a relatively lower proportion (p ≤ 0.05) in the Yamuna stretch than in the Kanpur stretch. Similarly, Shewanella colwelliana was present in different proportions among the three locations and was found in a higher proportion (p ≤ 0.05) in the Kanpur stretch. E. faecium was found in a high proportion in the Yamuna stretch compared with the other locations (p ≤ 0.05) (Table 2).
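Relative abundance is commonly computed as the share of a sample's classified reads assigned to each taxon. The sketch below assumes that definition, which the text does not spell out; the function name and the read counts are hypothetical and are not values from this study.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts for one sample into percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical read counts for a single sample (illustrative only).
sample_counts = {
    "Lactobacillus casei": 150,
    "Bacillus clausii": 50,
    "Vibrio harveyi": 300,
}

abundance = relative_abundance(sample_counts)
for taxon, pct in abundance.items():
    print(f"{taxon}: {pct:.1f}%")
```

Normalising within each sample in this way makes samples with different sequencing depths comparable before any test between locations is applied.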
On the basis of the taxonomic hierarchy, L. curvatus had similar relative abundances at all three locations (Kanpur, Farakka and New Delhi). L. brevis showed a similar trend, although its relative abundance was slightly higher at Farakka, whereas L. casei showed significantly lower abundance at New Delhi than at the other two locations (Figure 4A). Among the Pediococcus population, it is interesting to note that at Kanpur P. acidilactici was significantly (Student’s t-test, p ≤ 0.05) dominant across the whole taxonomic profile; however, P. pentosaceus and P. ethanolidurans were equally distributed among the three locations (Figure 4B). Likewise, in the Pseudomonas population, P. stutzeri and P. synxantha showed equal relative abundance at all three locations, whereas P. fluorescens and P. chlororaphis differed significantly (Student’s t-test, p ≤ 0.05) and were higher at Kanpur (Figure 4C). Among the Enterococcus species, Enterococcus durans, Enterococcus malodoratus, Enterococcus raffinosus, Enterococcus hirae and Enterococcus mundtii showed no significant differences in distribution. Enterococcus faecium and Enterococcus faecalis showed higher abundance (Student’s t-test, p ≤ 0.05) in the Yamuna than at Kanpur and Farakka, respectively (Figure 4D).
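The pairwise comparisons above rely on Student's t-test. A minimal sketch of the two-sample, equal-variance form is given below, assuming three samples per location as in the sampling design. The abundance values are hypothetical, and the decision uses the tabulated two-sided critical value for 4 degrees of freedom rather than an exact p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its own degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical relative abundances (%) of one species at two locations,
# three samples each; these are not values from this study.
location_1 = [4.0, 5.0, 6.0]
location_2 = [1.0, 2.0, 3.0]

t_stat, df = student_t(location_1, location_2)

# Two-sided critical value of the t distribution for df = 4 at alpha = 0.05.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

With only three samples per location the test has few degrees of freedom, so a difference must be large relative to the within-location spread to reach p ≤ 0.05. In practice `scipy.stats.ttest_ind` performs the same test and also returns the p-value.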
Principal component and cluster analysis
In the present study, the data were standardised before PCA and cluster analysis. The PCA showed that the first principal component (PC1) explains 65.57% of the variation, the second (PC2) explains 13.51% and the third (PC3) explains 6.31%; together they explain 85.4% of the variation in the data set. Only these three principal components were retained, as each had an eigenvalue greater than one. The biplot of the probiotics was drawn on the first two principal components, PC1 and PC2 (Figure 5). The biplot showed that all the probiotics except Shewanella, Bifidobacterium and Vibrio showed maximum variation among sampling sites and were highly correlated with PC1. Shewanella and Bifidobacterium were correlated with PC2, whereas Vibrio was correlated with PC3.
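The PCA steps described above (standardisation, eigen-decomposition, retention of components with eigenvalue greater than one) can be sketched as follows. This is a minimal illustration using NumPy, not the software used in the study: the abundance table is random placeholder data and the function name is hypothetical. Only the three reported percentages at the end come from the text.

```python
import numpy as np


def pca_standardised(X):
    """PCA on standardised data (rows = sampling sites, columns = taxa)."""
    # z-score each column so every taxon contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = (Z.T @ Z) / (X.shape[0] - 1)          # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()  # % variance per component
    scores = Z @ eigvecs                         # site coordinates on the PCs
    return eigvals, explained, scores


# Placeholder table: 9 sites x 4 taxa of random values (not study data).
rng = np.random.default_rng(0)
X = rng.random((9, 4))

eigvals, explained, scores = pca_standardised(X)
n_kept = int((eigvals > 1.0).sum())              # eigenvalue-greater-than-one rule

# Cumulative variance of the three components reported in the text.
reported = [65.57, 13.51, 6.31]
reported_total = round(sum(reported), 2)
print(f"components kept on placeholder data: {n_kept}")
print(f"reported PC1-PC3 total: {reported_total}%")
```

Because the data are standardised, the eigenvalues sum to the number of variables, so an eigenvalue above one marks a component that explains more than a single original variable would.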
In the cluster analysis, the sampling stations were grouped using Ward's method, and the resulting dendrogram is shown in Figure 6. Sampling sites ND-1 and KAN-3 formed one cluster; KAN-2, KAN-1, ND-3, FAR-3 and FAR-2 formed a second cluster; and ND-2 and FAR-1 formed a third cluster. It is interesting to note that the third cluster was completely separated from the rest of the sampling sites (Figure 6).
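Ward's method builds the dendrogram by repeatedly merging the two clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a deliberately naive pure-Python version of that criterion, written for readability rather than speed. The coordinates are hypothetical two-dimensional points, not the standardised abundance profiles of the nine sites.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's merge criterion.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    dims = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: merge_cost(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical 2-D coordinates forming three tight pairs (illustrative only).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
groups = ward_clusters(points, 3)
print(groups)
```

For real data, `scipy.cluster.hierarchy.linkage` with `method="ward"` computes the full merge history needed to draw a dendrogram such as Figure 6.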
Functional metagenomics analysis
This analysis was done to explore the interaction networks among proteins (predicted by ORF analysis) from important genes of the probiotic species identified at the two locations on the river Ganga (Kanpur and Farakka) and the one location on the river Yamuna (New Delhi). The interaction network was built using the STRING web application and is shown in Figure 7. The network contained 54 nodes involved in protein-protein interactions (PPIs), with a very low PPI enrichment p-value (
Figure 7 showed that all proteins in the network were connected to each other. Functional pathway enrichment analysis of the network revealed one GO term (GO:0016868) enriched in Molecular Function, three GO terms (GO:0005737, GO:0044424 and GO:0005622) significantly enriched in Cellular Component, and a single pathway (00500) enriched in KEGG Pathways (Table 3).
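Term enrichment of this kind is often assessed with an over-representation test based on the hypergeometric distribution. The sketch below shows that generic test and does not reproduce STRING's own statistics. The network size of 54 comes from the text, while the background size, the number of annotated background proteins and the number of annotated network proteins are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated proteins in a set of n,
    drawn without replacement from N proteins of which K carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


network_size = 54          # nodes in the network, from the text
background_size = 4000     # hypothetical number of proteins in the background
annotated_background = 40  # hypothetical background proteins carrying the term
annotated_network = 5      # hypothetical network proteins carrying the term

p_value = hypergeom_enrichment_p(
    annotated_network, network_size, annotated_background, background_size
)
print(f"enrichment p-value (hypothetical counts): {p_value:.3g}")
```

When many terms are tested at once, the resulting p-values are normally corrected for multiple testing before a term is reported as enriched.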