Poster (Scientific congresses and symposiums)
LD-based haplotype encoding scheme with iterative pruning principal component analysis (ipPCA) to retrieve population substructures
Chaichoompu, Kridsadakorn; Fouladi, Ramouna; Wangkumhang, Pongsakorn et al.
2014, The Human Genome Meeting (HGM 2014)
 

Files


Full Text
poster_ldippca_14_04_2014.pdf
Author preprint (7.77 MB)
Keywords :
LD; haplotype; principal component analysis; PCA; ipPCA
Abstract :
[en] Objective: To identify and differentiate subpopulations using a rich set of genetic markers. Reduced marker sets make this task challenging, especially when the populations come from similar geographic regions or when spurious patterns are likely to exist.

Method: Single Nucleotide Polymorphisms (SNPs) are commonly used to capture variation between populations, and genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns. Notably, haplotype composition and the pattern of LD between markers may vary between larger populations but may also play a role within more confined geographic regions. Indeed, knowledge about haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Here, we use iterative pruning principal component analysis (ipPCA) [1] to identify and characterize subpopulations in an unsupervised way. The input is either pruned genome-wide SNP data (PLINK 1.9 with the "indep-pairwise" option, window size = 100k, r² < 0.25) or multilocus haplotype information derived from the genome-wide SNP panel (haplotypes inferred with BEAGLE 3.3.2). Both approaches are applied to real-life data from 992 Thai individuals [2].

Result: Preliminary results indicate that ipPCA applied to pruned SNP data and ipPCA that explicitly uses multilocus information (haplotypes) give complementary information about population substructure in geographically confined populations such as the Thai samples in this study. The two methods address different aspects of population structure. Detailed simulation studies are needed to identify the optimal scenarios for haplotype-based ipPCA.

Conclusion: In this work, we propose to combine an LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructures. Despite the complexities associated with haplotype inference, added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata.

References:
1. Intarapanich, A., et al., Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics, 2009. 10: p. 382.
2. Wangkumhang, P., et al., Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure. PLoS One, 2013. 8(11): p. e79522.
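As an illustration of the LD-based pruning step described in the Method part of the abstract, the following is a minimal Python sketch of greedy pairwise pruning on r². It is not PLINK's implementation and is not taken from the poster: the function name `ld_prune`, the 0/1/2 genotype coding, the window counted in number of SNPs, and the tiny example matrix are all assumptions made for illustration.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_threshold=0.25):
    """Greedy LD pruning (illustrative sketch, not PLINK's algorithm).

    genotypes: array of shape (n_individuals, n_snps), coded 0/1/2,
               with SNPs ordered by genomic position (assumption).
    window:    how many SNP positions back a kept SNP is still compared.
    Returns the column indices of the SNPs that are kept.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(genotypes.shape[1]):
        # Monomorphic SNPs carry no information and have undefined r; drop them.
        if genotypes[:, j].std() == 0:
            continue
        redundant = False
        # Walk back through the kept SNPs, nearest first.
        for k in reversed(kept):
            if j - k >= window:
                break  # all remaining kept SNPs are outside the window
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r >= r2_threshold:
                redundant = True
                break
        if not redundant:
            kept.append(j)
    return kept


# Hypothetical toy data: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with both.
toy = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 0],
    [0, 0, 2],
    [1, 1, 2],
    [2, 2, 2],
])
kept_snps = ld_prune(toy, window=50, r2_threshold=0.25)
print(kept_snps)
```

In this toy example the duplicated SNP is removed and the uncorrelated one is retained. The retained columns would then serve as input to a PCA-based method such as ipPCA; the haplotype-based alternative described in the abstract would replace this step with phased haplotype information.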
Research center :
Systems and Modeling Unit
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Chaichoompu, Kridsadakorn; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Fouladi, Ramouna; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Wangkumhang, Pongsakorn; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Wilantho, Alisa; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Chareanchim, Wanwisa; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Tongsima, Sissades; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Sakuntabhai, Anavaj; Institut Pasteur, France > Functional Genetics of Infectious Diseases Unit
Van Steen, Kristel; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Language :
English
Title :
LD-based haplotype encoding scheme with iterative pruning principal component analysis (ipPCA) to retrieve population substructures
Publication date :
29 April 2014
Number of pages :
A0
Event name :
The Human Genome Meeting (HGM 2014)
Event organizer :
University of Geneva
Event place :
Geneva, Switzerland
Event date :
27-04-2014 to 30-04-2014
By request :
Yes
Audience :
International
Name of the research project :
Foresting in Integromics Inference
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
Available on ORBi :
since 16 May 2014

