A comparison of methods for clustering 16S rRNA sequences into OTUs

Journal Title: PLoS ONE, 01 January 2013, Vol.8(8), p.e70837
Main Author: Wei Chen
Other Authors: Clarence K Zhang, Yongmei Cheng, Shaowu Zhang, Hongyu Zhao
Format: Electronic Article
Language: English
Subjects: Sciences (General)
ID: E-ISSN: 1932-6203 ; DOI: 10.1371/journal.pone.0070837
Staff View
recordid: doaj_soai_doaj_org_article_a12ef91bb52740918f3e34778feaf0c2
title: A comparison of methods for clustering 16S rRNA sequences into OTUs
format: Article
creator:
  • Wei Chen
  • Clarence K Zhang
  • Yongmei Cheng
  • Shaowu Zhang
  • Hongyu Zhao
subjects:
  • Sciences (General)
ispartof: PLoS ONE, 01 January 2013, Vol.8(8), p.e70837
description: Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to using these data to explore the genetic diversity of a microbial community is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species, diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster the sequences into OTUs, relatively little guidance is available on their relative performance and on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs, because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing thresholds for both dissimilarity and abundance for OTU inference.
language: eng
source: Directory of Open Access Journals (DOAJ)
identifier: E-ISSN: 1932-6203 ; DOI: 10.1371/journal.pone.0070837
issn:
  • 1932-6203
url: https://doaj.org/article/a12ef91bb52740918f3e34778feaf0c2
publisher: Public Library of Science (PLoS)
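
The description above summarizes clustering 16S rRNA reads into OTUs at a dissimilarity threshold (commonly 3%). As a minimal sketch, assuming the reads are already aligned to equal length and using the uncorrected p-distance, the Python below implements naive average-linkage hierarchical clustering and shows how the number of OTUs depends on the threshold. It is not a reimplementation of any of the ten methods evaluated in the study; the function names (`p_distance`, `average_linkage_otus`, `mutate`) and the toy reads are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of mismatched positions in two aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_otus(seqs: list[str], threshold: float = 0.03) -> list[list[int]]:
    """Naive average-linkage (UPGMA-style) clustering of reads into OTUs.

    Repeatedly merges the two closest clusters while their mean pairwise
    distance is <= threshold; returns OTUs as lists of read indices.
    """
    n = len(seqs)
    # Precompute the full pairwise distance matrix once.
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(seqs[i], seqs[j])

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean distance over all cross-cluster read pairs (average linkage).
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # closest pair is already farther apart than the OTU cutoff
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b
    return clusters


def mutate(seq: str, positions) -> str:
    """Hypothetical helper: substitute a different base at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = "C" if chars[p] == "A" else "A"
    return "".join(chars)


if __name__ == "__main__":
    base = "ACGT" * 25  # toy 100 bp aligned read
    seqs = [
        base,
        mutate(base, [0]),            # 1% from base
        mutate(base, [1, 2]),         # 2% from base, 3% from seqs[1]
        mutate(base, range(50, 60)),  # 10% from base
    ]
    for t in (0.005, 0.015, 0.03, 0.12):
        print(f"threshold {t:.3f}: {len(average_linkage_otus(seqs, t))} OTUs")
```

On this toy input the OTU count falls from 4 to 3, 2 and 1 as the threshold rises from 0.5% to 12%, which illustrates in miniature why the threshold choice matters; it does not reproduce any result from the study. The exhaustive pair search scales poorly and is only meant for a handful of reads.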