ZIEL Institute for Food and Health, Core Facility Microbiome/NGS, Technical University of Munich, Freising, Germany;
16S rRNA gene sequencing has become a popular method for rapid and comprehensive analysis of the diversity and composition of complex microbial communities. However, the method is prone to technical artefacts at various stages of the workflow. The most common approach to analysing 16S amplicon data is to cluster sequences at 97% sequence identity into operational taxonomic units (OTUs), each representing a single microbial entity. Diversity measures derived from OTU-based datasets are strongly influenced by parameter settings such as the filtering of spurious OTUs, which in turn affects the interpretation, reproducibility and quality of results. This study aims to clarify which filtering thresholds are suitable for excluding spurious OTUs from high-throughput 16S rRNA amplicon datasets.
To determine an appropriate filtering cutoff, two types of mock communities were used: ten different communities from published studies and two datasets generated in-house. This was complemented by the analysis of fecal samples from four gnotobiotic mice colonized with different mixtures of bacteria. To analyse the impact of filtering, two studies with openly accessible sequence datasets were used as references.
When data were filtered with the commonly used method of removing singletons, on average 71% of all OTUs were artefacts. A filtering cutoff of 0.25% reduces this number to 1.17% in mock communities and 3.57% in gnotobiotic mice, while still capturing 85% of true positives. Despite their low cumulative abundance of 1.14%, these artefacts remain in the dataset and are also included as sequences when the phylogenetic tree is built. Richness in particular is influenced by the absolute number of OTUs and differs between filtering methods in both reference studies (0.25% cutoff: 195 ± 78 and 156 ± 44; singletons removed: 364 ± 140 and 531 ± 201). Intra-individual stability of the microbiome depends on the filtering method used, as does the stability of richness, which is less dynamic when filtering with the 0.25% cutoff (p-value < 0.001). A shift in the median unweighted UniFrac distance of 0.36 per individual can be observed, which suggests a more stable microbiome. It is worth noting that in the first study the outcome is inverted for the generalized UniFrac distance. The second study shows the same pattern for both methods and both distances, although the distances differ for unweighted UniFrac. This affects the interpretation of the stability of the human gut microbiome.
With this work we would like to raise awareness of the care needed when interpreting the outcome of 16S rRNA gene sequencing data. Since there is no standardisation, it is important to know the methods behind the analysis and to be aware of the possible impact of different filtering approaches and of the distance matrices used. It is important not only to interpret results carefully but also to obtain high-quality results. A proportional cutoff is an independent filtering method for removing spurious OTUs from microbial datasets.
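The two filtering approaches compared here can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names and the toy count table are hypothetical, and the rule that an OTU is kept if it reaches the cutoff in at least one sample is an assumption, since the abstract does not state how the 0.25% cutoff is applied across samples.

```python
# Minimal sketch of two OTU filtering strategies (illustrative only).
# Assumption: `table` maps an OTU id to a list of read counts, one per sample.

def remove_singletons(table):
    """Drop OTUs whose total count across all samples is exactly 1."""
    return {otu: counts for otu, counts in table.items() if sum(counts) > 1}

def filter_relative_abundance(table, cutoff=0.0025):
    """Keep OTUs reaching `cutoff` relative abundance in at least one sample.

    The "at least one sample" rule is an assumption made for this sketch.
    """
    n_samples = len(next(iter(table.values()), []))
    # Sequencing depth (total reads) per sample.
    depths = [sum(counts[i] for counts in table.values()) for i in range(n_samples)]
    return {
        otu: counts
        for otu, counts in table.items()
        if any(d > 0 and c / d >= cutoff for c, d in zip(counts, depths))
    }

def richness(table):
    """Number of OTUs observed (count > 0) in each sample."""
    n_samples = len(next(iter(table.values()), []))
    return [sum(1 for counts in table.values() if counts[i] > 0)
            for i in range(n_samples)]

# Hypothetical toy data: two samples with 10,000 reads each.
table = {
    "OTU_1": [6000, 5000],
    "OTU_2": [3980, 4990],
    "OTU_3": [18, 8],   # low abundance, below 0.25% in both samples
    "OTU_4": [2, 1],    # low abundance, but not a singleton
    "OTU_5": [0, 1],    # singleton
}

no_singletons = remove_singletons(table)
proportional = filter_relative_abundance(table)

print(sorted(no_singletons), richness(no_singletons))
print(sorted(proportional), richness(proportional))
```

In this toy table, singleton removal discards only OTU_5, whereas the proportional cutoff also discards the two low-abundance OTUs, so per-sample richness depends directly on the filtering method chosen.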