Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants

Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for...

Journal Title: Systematic biology 2017, Vol.66 (3), p.399-412
Main Author: Eaton, Deren A R
Other Authors: Spriggs, Elizabeth L , Park, Brian , Donoghue, Michael J
Format: Electronic Article
Language: English
Subjects:
DNA
Source: Alma/SFX Local Collection
Publisher: England: Oxford University Press
ID: ISSN: 1063-5157
Link: https://www.ncbi.nlm.nih.gov/pubmed/27798402
Staff View
recordid: cdi_proquest_miscellaneous_1834998119
title: Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
format: Article
creator:
  • Eaton, Deren A R
  • Spriggs, Elizabeth L
  • Park, Brian
  • Donoghue, Michael J
subjects:
  • Base Sequence
  • Computer Simulation
  • Conserved sequence
  • Coverage
  • Deoxyribonucleic acid
  • DNA
  • DNA sequencing
  • Enzymes
  • Flowering
  • Flowers & plants
  • Magnoliopsida - classification
  • Magnoliopsida - genetics
  • Models, Biological
  • Mutation
  • Phylogenetics
  • Phylogeny
  • Sequence Analysis, DNA
  • Stochasticity
  • Taxa
  • Trees
ispartof: Systematic biology, 2017, Vol.66 (3), p.399-412
description: Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10× the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies.
language: eng
source: Alma/SFX Local Collection
identifier: ISSN: 1063-5157
fulltext: fulltext
issn:
  • 1063-5157
  • 1076-836X
url: Link


PrimoNMBib
record
control
sourceid: jstor_opena
recordid: TN_cdi_proquest_miscellaneous_1834998119
sourceformat: XML
sourcesystem: PC
jstor_id: 26408940
sourcerecordid: 26408940
sourcetype: Open Access Repository
isCDI: true
recordtype: article
pqid: 1960999408
display
type: article
title: Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
source: Alma/SFX Local Collection
creator: Eaton, Deren A R ; Spriggs, Elizabeth L ; Park, Brian ; Donoghue, Michael J
creatorcontrib: Eaton, Deren A R ; Spriggs, Elizabeth L ; Park, Brian ; Donoghue, Michael J
identifier:
  • ISSN: 1063-5157
  • EISSN: 1076-836X
  • DOI: 10.1093/sysbio/syw092
  • PMID: 27798402
language: eng
publisher: England: Oxford University Press
subject: Base Sequence ; Computer Simulation ; Conserved sequence ; Coverage ; Deoxyribonucleic acid ; DNA ; DNA sequencing ; Enzymes ; Flowering ; Flowers & plants ; Magnoliopsida - classification ; Magnoliopsida - genetics ; Models, Biological ; Mutation ; Phylogenetics ; Phylogeny ; Sequence Analysis, DNA ; Stochasticity ; Taxa ; Trees
ispartof: Systematic biology, 2017, Vol.66 (3), p.399-412
rights:
  • Copyright © 2017 Society of Systematic Biologists
  • The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
lds50: peer_reviewed
oa: free_for_read
search
creatorcontrib:
  • Eaton, Deren A R
  • Spriggs, Elizabeth L
  • Park, Brian
  • Donoghue, Michael J
title:
  • Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
  • Systematic biology
addtitle: Syst Biol
issn:
  • 1063-5157
  • 1076-836X
fulltext: true
rsrctype: article
creationdate: 2017
recordtype: article
startdate: 20170501
enddate: 20170501
creator:
  • Eaton, Deren A R
  • Spriggs, Elizabeth L
  • Park, Brian
  • Donoghue, Michael J
general:
  • Oxford University Press
  • Oxford University Press (OUP)
scope:
  • CGR
  • CUY
  • CVF
  • ECM
  • EIF
  • NPM
  • AAYXX
  • CITATION
  • K9.
  • 7X8
  • CLFQK
sort
creationdate: 20170501
title: Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
author: Eaton, Deren A R ; Spriggs, Elizabeth L ; Park, Brian ; Donoghue, Michael J
facets
frbrtype: 5
frbrgroupid: cdi_FETCH-LOGICAL-1458t-7e7dea6819a9f9593f5da10a18d69f45e7318b08f113dfd12c436fc2d51b56a43
rsrctype: articles
prefilter: articles
language: eng
creationdate: 2017
toplevel:
  • peer_reviewed
  • online_resources
collection:
  • Medline
  • MEDLINE
  • MEDLINE (Ovid)
  • MEDLINE
  • MEDLINE
  • PubMed
  • CrossRef
  • ProQuest Health & Medical Complete (Alumni)
  • MEDLINE - Academic
  • OpenAIRE
jtitle: Systematic biology
delivery
delcategory: Remote Search Resource
fulltext: fulltext
addata
au:
  • Eaton, Deren A R
  • Spriggs, Elizabeth L
  • Park, Brian
  • Donoghue, Michael J
format: journal
genre: article
ristype: JOUR
atitle: Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
jtitle: Systematic biology
addtitle: Syst Biol
date: 2017-05-01
risdate: 2017
volume: 66
issue: 3
spage: 399
epage: 412
pages: 399-412
issn: 1063-5157
eissn: 1076-836X
cop: England
pub: Oxford University Press
pmid: 27798402
doi: 10.1093/sysbio/syw092
oa: free_for_read