Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking

Journal Title: Systematic biology 2017-03-01, Vol.66 (2), p.218-231
Main Author: Bogusz, Marcin
Other Authors: Whelan, Simon
Format: Electronic Article
Language: English
Source: Alma/SFX Local Collection
Publisher: England: Oxford University Press
ID: ISSN: 1063-5157
subjects:
  • Algorithms
  • Alignment
  • Alignment-free
  • Benchmarking
  • Biological Evolution
  • Biological Sciences
  • Classification - methods
  • Comparative analysis
  • distance-based phylogenetics
  • Estimating techniques
  • Evolution, Molecular
  • Evolutionary Biology
  • Models, Genetic
  • Natural Sciences
  • pair Hidden Markov Models
  • phylogenetic inference
  • Phylogenetics
  • Phylogeny
  • Sequence Alignment
  • statistical alignment
  • Statistical inference
description: Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor-joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For close to moderately divergent sequence data we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences we find the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference.
Finally, we find that the accuracy of alignment-free methods tends to decline faster than standard two-step methods in the presence of alignment uncertainty, and identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods even in the presence of substantial alignment error.
issn:
  • 1063-5157 (print)
  • 1076-836X (electronic)


doi: 10.1093/sysbio/syw074
pmid: 27633353
rights:
  • Copyright © 2017 Society of Systematic Biologists
  • The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com
status:
  • Peer reviewed
  • Free to read
links:
  • View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27633353
  • View record from Swedish Publication Index: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-316533
institution: Uppsala universitet, Evolutionsbiologi (Uppsala University, Evolutionary Biology)
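The abstract describes PaHMM-Tree as a distance method: pairwise distances are estimated without conditioning on a single alignment, and a tree is then built from them by neighbor joining. The Python sketch below illustrates only that generic tree-building step, standard neighbor joining applied to a precomputed distance matrix. It is not the authors' code and does not estimate distances with pair hidden Markov models. The function name `neighbor_joining`, the list-of-lists matrix input, the first-found tie-breaking rule and the Newick output rounded to six significant digits are all assumptions made for this illustration.

```python
from itertools import combinations


def neighbor_joining(dist, labels):
    """Hypothetical helper (not part of PaHMM-Tree): build an unrooted
    neighbor-joining tree and return it as a Newick string.

    `dist` is a symmetric matrix (list of lists) of pairwise distances and
    `labels` gives one unique taxon name per row. At least three taxa are needed.
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Dict-of-dicts copy so merged clusters can be addressed by their Newick text.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each cluster from all other remaining clusters.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b);
        # on ties, the first pair encountered is kept.
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined clusters to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.6g},{b}:{lb:.6g})"
        # Distances from the new node to every remaining cluster.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three clusters at a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:.6g},{y}:{ly:.6g},{z}:{lz:.6g});"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    # Additive distances generated from the unrooted tree ((A:1,B:2):1,(C:3,D:4)).
    matrix = [
        [0, 3, 5, 6],
        [3, 0, 6, 7],
        [5, 6, 0, 7],
        [6, 7, 7, 0],
    ]
    print(neighbor_joining(matrix, taxa))  # prints (C:3,D:4,(A:1,B:2):1);
```

With an additive distance matrix, as in the four-taxon example, neighbor joining recovers the generating topology and branch lengths exactly. With estimated distances, errors in the matrix propagate into the tree, consistent with the abstract's observation that more accurate pairwise distances carry through to more accurate tree estimates.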