How Many Bootstrap Replicates Are Necessary?

Journal Title: Journal of computational biology 2010-03, Vol.17 (3), p.337-354
Main Author: Pattengale, Nicholas D
Other Authors: Alipour, Masoud , Bininda-Emonds, Olaf R P , Moret, Bernard M E , Stamatakis, Alexandros
Format: Electronic Article
Language: English
Subjects:
Source: Alma/SFX Local Collection
Publisher: United States: Mary Ann Liebert, Inc
ID: ISSN: 1066-5277
Link: https://www.ncbi.nlm.nih.gov/pubmed/20377449
Staff View
recordid: cdi_proquest_miscellaneous_733565014
title: How Many Bootstrap Replicates Are Necessary?
format: Article
creator:
  • Pattengale, Nicholas D
  • Alipour, Masoud
  • Bininda-Emonds, Olaf R P
  • Moret, Bernard M E
  • Stamatakis, Alexandros
subjects:
  • Bootstrapping (Statistics)
  • Computational biology
  • Computational Biology - methods
  • Confidence Intervals
  • Databases, Genetic
  • Likelihood Functions
  • Phylogenetic trees
  • Phylogeny
  • Reproducibility of Results
  • Time Factors
  • Usage
ispartof: Journal of computational biology, 2010-03, Vol.17 (3), p.337-354
description: Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees. It is based on reconstructing many trees, called replicates, from minor variations of the input data. BS is used with all phylogenetic reconstruction approaches, but we focus here on one of the most popular, maximum likelihood (ML). Because ML inference is so computationally demanding, it has so far proved too expensive to assess the impact of the number of replicates used in BS on the relative accuracy of the support values. For the same reason, only a rather small number (typically 100) of BS replicates is computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1 to 2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this article, we propose stopping criteria (that is, thresholds computed at runtime to determine when enough replicates have been generated), and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA datasets (single-gene as well as multi-gene), which include 125 to 2,554 taxa. We find that our stopping criteria typically stop computations after 100-500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes.
Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through BS, and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess or worry about the quality of support values; moreover, with most counts of replicates in the 100-500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in the latest release of RAxML v7.2.5, available at http://wwwkramer.in.tum.de/exelixis/software.html.
language: eng
source: Alma/SFX Local Collection
identifier: ISSN: 1066-5277
fulltext: fulltext
issn:
  • 1066-5277
  • 1557-8666


identifier:
  • ISSN: 1066-5277
  • EISSN: 1557-8666
  • DOI: 10.1089/cmb.2009.0179
  • PMID: 20377449
rights: COPYRIGHT 2010 Mary Ann Liebert, Inc.
lds50: peer_reviewed
oa: free_for_read