EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

Journal Title: Systematic biology 2019-03-01, Vol.68 (2), p.365-369
Main Author: Barbera, Pierre
Other Authors: Kozlov, Alexey M , Czech, Lucas , Morel, Benoit , Darriba, Diego , Flouri, Tomáš , Stamatakis, Alexandros
Format: Electronic Article
Language: English
Publisher: England: Oxford University Press
ID: ISSN: 1063-5157
Link: https://www.ncbi.nlm.nih.gov/pubmed/30165689
Staff View
recordid: cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6368480
title: EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences
format: Article
creator:
  • Barbera, Pierre
  • Kozlov, Alexey M
  • Czech, Lucas
  • Morel, Benoit
  • Darriba, Diego
  • Flouri, Tomáš
  • Stamatakis, Alexandros
subjects:
  • Algorithms
  • Classification - methods
  • Metabarcoding
  • metagenomics
  • microbiome
  • phylogenetic placement
  • phylogenetics
  • Phylogeny
  • Sequence Analysis, DNA
  • Software
  • Software for Systematics and Evolution
ispartof: Systematic biology, 2019-03-01, Vol.68 (2), p.365-369
description: Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.
language: eng
source:
identifier: ISSN: 1063-5157
fulltext: no_fulltext
issn:
  • 1063-5157
  • 1076-836X
url: Link
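
The scaling result in the description (1 billion reads placed in just under 7 h on 2048 cores) can be turned into throughput figures with a few lines of Python. This is a back-of-the-envelope sketch, not data reported by the authors: it assumes the run kept all 2048 cores busy for the full 7 h, so every rate it prints is a lower bound, and the function name `throughput_lower_bounds` is hypothetical.

```python
def throughput_lower_bounds(reads: float, hours: float, cores: int) -> dict:
    """Return throughput figures for a placement run (hypothetical helper).

    The source reports the run time as "just under" `hours`, so passing that
    upper bound on time yields lower bounds on every rate below.
    """
    core_hours = hours * cores                      # total compute consumed
    return {
        "core_hours": core_hours,
        "reads_per_hour": reads / hours,            # aggregate, whole cluster
        "reads_per_second": reads / (hours * 3600),
        "reads_per_core_hour": reads / core_hours,  # average per-core rate
    }


if __name__ == "__main__":
    # Figures quoted in the abstract: 1 billion reads, < 7 h, 2048 cores.
    stats = throughput_lower_bounds(reads=1e9, hours=7, cores=2048)
    for name, value in stats.items():
        print(f"{name:>20}: {value:,.1f}")
```

Running the script reports 14,336 core-hours, about 39,680 reads per second for the whole cluster, and about 69,750 reads per core-hour. Because the reported wall-clock time is an upper bound, the actual rates were somewhat higher.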


display
type: article
title: EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences
creator: Barbera, Pierre ; Kozlov, Alexey M ; Czech, Lucas ; Morel, Benoit ; Darriba, Diego ; Flouri, Tomáš ; Stamatakis, Alexandros
contributor: Posada, David
creatorcontrib: Barbera, Pierre ; Kozlov, Alexey M ; Czech, Lucas ; Morel, Benoit ; Darriba, Diego ; Flouri, Tomáš ; Stamatakis, Alexandros ; Posada, David
identifier:
  • ISSN: 1063-5157
  • EISSN: 1076-836X
  • DOI: 10.1093/sysbio/syy054
  • PMID: 30165689
language: eng
publisher: England: Oxford University Press
subject: Algorithms ; Classification - methods ; Metabarcoding ; metagenomics ; microbiome ; phylogenetic placement ; phylogenetics ; Phylogeny ; Sequence Analysis, DNA ; Software ; Software for Systematics and Evolution
ispartof: Systematic biology, 2019-03-01, Vol.68 (2), p.365-369
rights: The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Systematic Biologists. 2018
lds50: peer_reviewed
oa: free_for_read
addata
au:
  • Barbera, Pierre
  • Kozlov, Alexey M
  • Czech, Lucas
  • Morel, Benoit
  • Darriba, Diego
  • Flouri, Tomáš
  • Stamatakis, Alexandros
format: journal
genre: article
ristype: JOUR
atitle: EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences
jtitle: Systematic biology
addtitle: Syst Biol
date: 2019-03-01
risdate: 2019
volume: 68
issue: 2
spage: 365
epage: 369
pages: 365-369
issn: 1063-5157
eissn: 1076-836X
cop: England
pub: Oxford University Press
pmid: 30165689
doi: 10.1093/sysbio/syy054
oa: free_for_read