PCA and Clustering Reveal Alternate mtDNA Phylogeny of N and M Clades

Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits t...

Journal Title: Journal of molecular evolution 2008-10-15, Vol.67 (5), p.465-487
Main Author: Alexe, G
Other Authors: Vijaya Satya, R , Seiler, M , Platt, D , Bhanot, T , Hui, S , Tanaka, M , Levine, A. J , Bhanot, G
Format: Electronic Article
Language: English
Subjects:
Publisher: New York: Springer-Verlag
ID: ISSN: 0022-2844
Link: https://www.ncbi.nlm.nih.gov/pubmed/18855041
Staff View
recordid: cdi_gale_infotracacademiconefile_A231688907
title: PCA and Clustering Reveal Alternate mtDNA Phylogeny of N and M Clades
format: Article
creator:
  • Alexe, G
  • Vijaya Satya, R
  • Seiler, M
  • Platt, D
  • Bhanot, T
  • Hui, S
  • Tanaka, M
  • Levine, A. J
  • Bhanot, G
subjects:
  • Analysis
  • Animal Genetics and Genomics
  • Animals
  • Article
  • Biomedical and Life Sciences
  • Cell Biology
  • Cluster Analysis
  • Computer science
  • Computer Simulation
  • Continental Population Groups - genetics
  • Databases, Nucleic Acid
  • DNA, Mitochondrial - genetics
  • Emigration and Immigration
  • Evolution
  • Evolution, Molecular
  • Evolutionary Biology
  • Genetics
  • Humans
  • Life Sciences
  • Microbiology
  • Mitochondrial DNA
  • Molecular biology
  • Molecular genetics
  • Mutation - genetics
  • Pan paniscus - genetics
  • Pan troglodytes - genetics
  • Phylogeny
  • Plant Genetics and Genomics
  • Plant Sciences
  • Polymorphism
  • Polymorphism, Genetic
  • Principal Component Analysis
  • Principal components analysis
  • Universities and colleges
ispartof: Journal of molecular evolution, 2008-10-15, Vol.67 (5), p.465-487
description: Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results to minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a “common” polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data or the same method applied to different datasets results in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2, 3, 4, …, k_max clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations, is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy.
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a “European R clade” containing the haplogroups H, V, H/V, J, T, and U and a “Eurasian N subclade” including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in nonnearest locations in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1–20%. From a comparison of our haplogroups to two chimp and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 ± 14,000 years before present.
language: eng
source:
identifier: ISSN: 0022-2844
fulltext: no_fulltext
issn:
  • 0022-2844
  • 1432-1432
url: Link


identifier:
  • ISSN: 0022-2844
  • EISSN: 1432-1432
  • DOI: 10.1007/s00239-008-9148-7
  • PMID: 18855041
rights:
  • Springer Science+Business Media, LLC 2008
  • COPYRIGHT 2008 Springer
lds50: peer_reviewed
collection:
  • MEDLINE
  • MEDLINE (Ovid)
  • PubMed
  • CrossRef
  • Academic OneFile (A&I only)
  • ProQuest Central (Corporate)
  • Bacteriology Abstracts (Microbiology B)
  • Calcium & Calcified Tissue Abstracts
  • Chemoreception Abstracts
  • Industrial and Applied Microbiology Abstracts (Microbiology A)
  • Neurosciences Abstracts
  • Virology and AIDS Abstracts
  • Health & Medical Collection
  • ProQuest Central (purchase pre-March 2016)
  • Biology Database (Alumni Edition)
  • Medical Database (Alumni Edition)
  • ProQuest Pharma Collection
  • Technology Research Database
  • ProQuest SciTech Collection
  • ProQuest Natural Science Collection
  • Hospital Premium Collection
  • Hospital Premium Collection (Alumni Edition)
  • ProQuest Central (Alumni) (purchase pre-March 2016)
  • Research Library (Alumni Edition)
  • ProQuest Central (Alumni Edition)
  • ProQuest Central Essentials
  • Biological Science Collection
  • ProQuest Central
  • Natural Science Collection
  • Environmental Sciences and Pollution Management
  • ProQuest Central Korea
  • Engineering Research Database
  • Health Research Premium Collection
  • Health Research Premium Collection (Alumni)
  • ProQuest Central Student
  • Research Library Prep
  • AIDS and Cancer Research Abstracts
  • SciTech Premium Collection
  • ProQuest Health & Medical Complete (Alumni)
  • ProQuest Biological Science Collection
  • Health & Medical Collection (Alumni Edition)
  • Medical Database
  • Research Library
  • Algology Mycology and Protozoology Abstracts (Microbiology C)
  • Biological Science Database
  • Research Library (Corporate)
  • Biotechnology and BioEngineering Abstracts
  • Research Library China
  • ProQuest One Academic Eastern Edition
  • ProQuest One Academic
  • ProQuest One Academic UKI Edition
  • ProQuest Central China
  • ProQuest Central Basic
  • Genetics Abstracts
  • MEDLINE - Academic
addata:
au:
  • Alexe, G
  • Vijaya Satya, R
  • Seiler, M
  • Platt, D
  • Bhanot, T
  • Hui, S
  • Tanaka, M
  • Levine, A. J
  • Bhanot, G
format: journal
genre: article
ristype: JOUR
atitle: PCA and Clustering Reveal Alternate mtDNA Phylogeny of N and M Clades
jtitle: Journal of molecular evolution
stitle: J Mol Evol
addtitle: J Mol Evol
date: 2008-10-15
risdate: 2008
volume: 67
issue: 5
spage: 465
epage: 487
pages: 465-487
issn: 0022-2844
eissn: 1432-1432
abstract: Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results to minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a “common” polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data or the same method applied to different datasets results in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2, 3, 4, …, k_max clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations, is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy.
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a “European R clade” containing the haplogroups H, V, H/V, J, T, and U and a “Eurasian N subclade” including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in nonnearest locations in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1–20%. From a comparison of our haplogroups to two chimp and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 ± 14,000 years before present.
cop: New York
pub: Springer-Verlag
pmid: 18855041
doi: 10.1007/s00239-008-9148-7