SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees

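Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324: 1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.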

Journal Title: Systematic Biology (Syst Biol), 2012-01-01, Vol. 61 (1), pp. 90-106
Main Author: Liu, Kevin
Other Authors: Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C. Randal
Format: Electronic Article
Language: English
Subjects:
DNA
Source: Alma/SFX Local Collection
Publisher: England: Oxford University Press
ID: ISSN: 1063-5157
Link: https://www.ncbi.nlm.nih.gov/pubmed/22139466
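
The abstract states only that SATé-II uses a different divide-and-conquer strategy from SATé-I, one that yields smaller, more closely related subsets; it does not give the decomposition rule. The Python sketch below assumes one plausible rule for illustration: cut the guide tree at a centroid edge (the edge that splits its taxa most evenly) and recurse until every subset has at most `max_size` taxa. The adjacency-map tree encoding, the function names and the `max_size` parameter are hypothetical and are not taken from the SATé software.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: an unrooted guide tree as an undirected adjacency map.
Tree = Dict[str, Set[str]]


def tree_from_edges(edges: Iterable[Tuple[str, str]]) -> Tree:
    """Build an adjacency map from (node, node) edge pairs."""
    tree: Tree = {}
    for a, b in edges:
        tree.setdefault(a, set()).add(b)
        tree.setdefault(b, set()).add(a)
    return tree


def centroid_edge(tree: Tree, taxa: Set[str]) -> Tuple[str, str]:
    """Return the edge (parent, child) whose removal splits `taxa` most evenly."""
    root = next(iter(tree))
    parent: Dict[str, Optional[str]] = {root: None}
    order: List[str] = []
    stack = [root]
    while stack:  # iterative DFS; each node is recorded after its parent
        x = stack.pop()
        order.append(x)
        for y in tree[x]:
            if y != parent[x]:
                parent[y] = x
                stack.append(y)
    below: Dict[str, int] = {}  # number of taxa in the subtree below each node
    for x in reversed(order):  # children are processed before their parents
        below[x] = int(x in taxa) + sum(below[y] for y in tree[x] if parent[y] == x)
    total = below[root]
    child = min((x for x in tree if parent[x] is not None),
                key=lambda x: abs(total - 2 * below[x]))
    u = parent[child]
    assert u is not None
    return u, child


def remove_edge(tree: Tree, u: str, v: str) -> Tuple[Tree, Tree]:
    """Delete edge (u, v) and return the two resulting components."""
    def component(start: str) -> Tree:
        nodes, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in tree[x]:
                if {x, y} != {u, v} and y not in nodes:
                    nodes.add(y)
                    stack.append(y)
        # Sorted keys make the root choice and tie-breaking in centroid_edge deterministic.
        return {x: {y for y in tree[x] if y in nodes} for x in sorted(nodes)}
    return component(u), component(v)


def decompose(tree: Tree, taxa: Set[str], max_size: int) -> List[List[str]]:
    """Split at centroid edges until every subset holds at most `max_size` taxa."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    here = {x for x in tree if x in taxa}
    if len(here) <= max_size:
        return [sorted(here)]
    u, v = centroid_edge(tree, here)
    left, right = remove_edge(tree, u, v)
    return decompose(left, taxa, max_size) + decompose(right, taxa, max_size)


# Hypothetical six-taxon caterpillar tree: internal nodes n1..n4, taxa A..F.
EDGES = [("A", "n1"), ("B", "n1"), ("n1", "n2"), ("C", "n2"), ("n2", "n3"),
         ("D", "n3"), ("n3", "n4"), ("E", "n4"), ("F", "n4")]
TAXA = {"A", "B", "C", "D", "E", "F"}

if __name__ == "__main__":
    print(decompose(tree_from_edges(EDGES), TAXA, max_size=3))
```

On this hypothetical tree, a limit of three taxa cuts the middle edge (n2, n3) and returns the clades A, B, C and D, E, F. Each subset would then go to the base alignment method that SATé boosts; how the subset alignments are combined is outside this sketch.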
Staff View
recordid: cdi_proquest_miscellaneous_912639937
title: SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees
format: Article
creator:
  • Liu, Kevin
  • Warnow, Tandy J
  • Holder, Mark T
  • Nelesen, Serita M
  • Yu, Jiaye
  • Stamatakis, Alexandros P
  • Linder, C. Randal
subjects:
  • Accuracy
  • Algorithms
  • Automation
  • Computer Simulation
  • Datasets
  • Deoxyribonucleic acid
  • DNA
  • Estimate reliability
  • Estimating techniques
  • Estimation methods
  • Evolution, Molecular
  • Likelihood Functions
  • Maximum likelihood method
  • Missing data
  • Modeling
  • Opal
  • Optimization algorithms
  • Phylogenetics
  • Phylogeny
  • Sequence alignment
  • Sequence Alignment - methods
  • Software
  • Taxa
  • Topology
ispartof: Systematic biology, 2012-01-01, Vol.61 (1), p.90-106
language: eng
source: Alma/SFX Local Collection
identifier: ISSN: 1063-5157
fulltext: fulltext
issn:
  • 1063-5157
  • 1076-836X
url: https://www.ncbi.nlm.nih.gov/pubmed/22139466


doi: 10.1093/sysbio/syr095
pmid: 22139466
rights:
  • Copyright © 2012 Society of Systematic Biologists
  • The Author(s) 2011. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
peer reviewed: yes
access: free to read
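
The abstract's first line of evidence is that, when gaps are treated as missing data, maximizing Jukes-Cantor likelihood jointly over alignments and trees cannot distinguish topologies. The Python sketch below does not reproduce the paper's proof. It only computes single-column likelihoods with Felsenstein's pruning algorithm, treating a gap as missing data, on two hypothetical four-taxon trees, to show two facts consistent with that claim: a column containing at least one nucleotide never scores above 1/4 (the equilibrium frequency of one base), and with all branch lengths set to zero a column whose non-gap entries are identical reaches exactly 1/4 on every topology. The tree encoding and the helper names (`partial`, `site_likelihood`, `quartet`) are illustrative assumptions.

```python
import math
from typing import Dict, Tuple, Union

STATES = "ACGT"

# Hypothetical encoding: a leaf is a taxon name; an internal node is a tuple of
# (child, branch_length) pairs, with lengths in expected substitutions per site.
Node = Union[str, Tuple[Tuple["Node", float], ...]]


def jc69(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability of state j after a branch of length t, starting from i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node: Node, column: Dict[str, str]) -> Dict[str, float]:
    """Felsenstein pruning: likelihood of the leaves below `node` for each state at `node`."""
    if isinstance(node, str):
        ch = column.get(node, "-")  # taxa absent from the column count as gaps
        if ch == "-":  # gap treated as missing data: compatible with every state
            return {s: 1.0 for s in STATES}
        return {s: 1.0 if s == ch else 0.0 for s in STATES}
    out = {s: 1.0 for s in STATES}
    for child, t in node:
        below = partial(child, column)
        for s in STATES:
            out[s] *= sum(jc69(s, x, t) * below[x] for x in STATES)
    return out


def site_likelihood(tree: Node, column: Dict[str, str]) -> float:
    """Likelihood of one alignment column, using the JC69 root frequencies of 1/4."""
    p = partial(tree, column)
    return sum(0.25 * p[s] for s in STATES)


def quartet(pairs: Tuple[Tuple[str, str], Tuple[str, str]], t: float) -> Node:
    """Rooted encoding of the unrooted quartet ((a,b),(c,d)); every branch has length t."""
    (a, b), (c, d) = pairs
    return ((((a, t), (b, t)), t), (((c, t), (d, t)), t))


if __name__ == "__main__":
    flat = {"t1": "G", "t2": "G", "t3": "-", "t4": "G"}   # identical bases plus a gap
    split = {"t1": "A", "t2": "A", "t3": "C", "t4": "C"}  # supports t1,t2 | t3,t4
    # At t = 0.0 both topologies score 0.25 on `flat`; at t = 0.1 `split` favors ab_cd.
    for t, col in ((0.0, flat), (0.1, split)):
        ab_cd = quartet((("t1", "t2"), ("t3", "t4")), t)
        ac_bd = quartet((("t1", "t3"), ("t2", "t4")), t)
        print(t, site_likelihood(ab_cd, col), site_likelihood(ac_bd, col))
```

With positive branch lengths the same code does favor one topology for an informative column such as A, A, C, C; the abstract's claim concerns the joint optimum over alignments and trees, not the likelihood of a fixed alignment.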