Generalized Bootstrap Supports for Phylogenetic Analyses of Protein Sequences Incorporating Alignment Uncertainty

Journal Title: Systematic biology 2018-11-01, Vol.67 (6), p.997-1009
Main Author: Chatzou, Maria
Other Authors: Floden, Evan W; Di Tommaso, Paolo; Gascuel, Olivier; Notredame, Cedric
Format: Electronic Article
Language: English
Publisher: England: Oxford University Press
ID: ISSN: 1063-5157
Staff View
recordid: cdi_hal_primary_oai_HAL_lirmm_02078444v1
title: Generalized Bootstrap Supports for Phylogenetic Analyses of Protein Sequences Incorporating Alignment Uncertainty
format: Article
creator:
  • Chatzou, Maria
  • Floden, Evan W
  • Di Tommaso, Paolo
  • Gascuel, Olivier
  • Notredame, Cedric
subjects:
  • Biodiversity
  • Bioinformatics
  • Bootstrap analysis
  • Classification - methods
  • Computer Science
  • Evolution
  • Life Sciences
  • Models, Genetic
  • Phylogenetics
  • Phylogeny
  • Populations
  • Populations and Evolution
  • Proteins - chemistry
  • Proteins - genetics
  • REGULAR ARTICLES
  • Sequence Alignment
  • Software
  • Systematics
  • Systematics, Phylogenetics and taxonomy
  • taxonomy
  • Uncertainty
ispartof: Systematic biology, 2018-11-01, Vol.67 (6), p.997-1009
description: Phylogenetic reconstructions are essential in genomics data analyses and depend on accurate multiple sequence alignment (MSA) models. We show that all currently available large-scale progressive multiple alignment methods are numerically unstable when dealing with amino-acid sequences. They produce significantly different output when changing sequence input order. We used the HOMFAM protein sequences dataset to show that on datasets larger than 100 sequences, this instability affects on average 21.5% of the aligned residues. The resulting Maximum Likelihood (ML) trees estimated from these MSAs are equally unstable with over 38% of the branches being sensitive to the sequence input order. We established that about two-thirds of this uncertainty stems from the unordered nature of children nodes within the guide trees used to estimate MSAs. To quantify this uncertainty we developed unistrap, a novel approach that estimates the combined effect of alignment uncertainty and site sampling on phylogenetic tree branch supports. Compared with the regular bootstrap procedure, unistrap provides branch support estimates that take into account a larger fraction of the parameters impacting tree instability when processing datasets containing a large number of sequences.
language: eng
identifier: ISSN: 1063-5157
fulltext: no_fulltext
issn:
  • 1063-5157
  • 1076-836X
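The description reports that changing the sequence input order alters on average 21.5% of the aligned residues. The abstract does not state the exact metric, so the sketch below assumes a sum-of-pairs style comparison. It collects the aligned residue pairs of one MSA and reports the fraction that a second MSA of the same sequences does not reproduce. The names `aligned_pairs` and `pair_instability` are hypothetical and are not part of the authors' software. Both MSAs must list the sequences in the same order.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of aligned residue pairs in an MSA.

    msa: list of equal-length gapped strings ('-' marks a gap), one row per
    sequence, in a fixed sequence order. Each pair is ((i, ri), (j, rj)) with
    i < j, meaning residue ri of sequence i shares a column with residue rj
    of sequence j (residues counted from 0, ignoring gaps).
    """
    next_residue = [0] * len(msa)  # ungapped residue index per sequence
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        # present is ordered by sequence index, so each pair has i < j
        pairs.update(combinations(present, 2))
    return pairs


def pair_instability(msa_a, msa_b):
    """Fraction of aligned residue pairs of msa_a that msa_b does not reproduce."""
    pairs_a = aligned_pairs(msa_a)
    if not pairs_a:
        return 0.0
    return len(pairs_a - aligned_pairs(msa_b)) / len(pairs_a)


if __name__ == "__main__":
    # The same two sequences (ACD, ACD) aligned two different ways.
    msa_a = ["AC-D", "A-CD"]
    msa_b = ["ACD", "ACD"]
    print(pair_instability(msa_a, msa_b))  # 0.0
    print(pair_instability(msa_b, msa_a))  # 0.3333333333333333
```

The measure is asymmetric: it is taken relative to the pairs of the first MSA, as the two calls in the example show.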
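The description says unistrap combines alignment uncertainty with site sampling but gives no procedure. The sketch below is one plausible reading under stated assumptions, not the published unistrap implementation.

- **Input:** several alternative MSAs of the same sequences. These could, for example, come from re-running a progressive aligner on shuffled input orders, a step not shown here.
- **Replicates:** each replicate takes the next alternative MSA in turn. It then resamples that MSA's columns with replacement, as in a standard site bootstrap.
- **Supports:** trees inferred from these replicates (inference not shown) would yield branch supports that reflect both sources of variation.

The names `site_bootstrap` and `combined_replicates` are hypothetical.

```python
import random


def site_bootstrap(msa, rng):
    """Standard site bootstrap: resample alignment columns with replacement.

    msa: list of equal-length gapped strings, one row per sequence.
    """
    n_cols = len(msa[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in cols) for row in msa]


def combined_replicates(alt_msas, n_replicates, seed=0):
    """Hypothetical unistrap-style sampler (an assumption, not the authors' code).

    Replicate k uses alternative MSA number k % len(alt_msas), so every
    alternative alignment is used equally often, then applies a site
    bootstrap to it. Alternative MSAs may differ in length.
    """
    rng = random.Random(seed)
    replicates = []
    for k in range(n_replicates):
        msa = alt_msas[k % len(alt_msas)]
        replicates.append(site_bootstrap(msa, rng))
    return replicates


if __name__ == "__main__":
    # Two toy alternative alignments of the same two sequences (ACGT, ACGA).
    alt = [["ACGT", "ACGA"], ["AC-GT", "ACG-A"]]
    for rep in combined_replicates(alt, n_replicates=4, seed=1):
        print(rep)
```

With a single MSA in `alt_msas`, `combined_replicates` reduces to the ordinary site bootstrap. Cycling through the alternatives rather than drawing them at random is an assumed design choice; it uses every alternative MSA equally often.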

jstor_id: 26582223
contributor: Halanych, Ken
doi: 10.1093/sysbio/syx096
pmid: 30295908
rights:
  • The Author(s) 2018
  • The Author(s) 2018. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2018
  • Distributed under a Creative Commons Attribution 4.0 International License
lds50: peer_reviewed
oa: free_for_read
backlink:
  • View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30295908
  • View record in HAL: https://hal-lirmm.ccsd.cnrs.fr/lirmm-02078444
addtitle: Syst Biol
collection:
  • MEDLINE
  • MEDLINE (Ovid)
  • PubMed
  • CrossRef
  • MEDLINE - Academic
  • Hyper Article en Ligne (HAL)
  • OpenAIRE (Open Access)
  • OpenAIRE