Results from genome-wide association studies (GWAS) suggest that complex diseases are often affected by many variants with small effects, a phenomenon known as polygenicity. Bayesian methods provide attractive tools for identifying signal in data where the effects are small but clustered. For example, by incorporating biological pathway membership in the prior, they can integrate ideas from gene set enrichment to identify groups of biologically significant genetic variants. Accumulating evidence also suggests that genetic variants may affect multiple different complex diseases, a phenomenon known as pleiotropy.
In this work we propose frequentist and Bayesian statistical methods that leverage pleiotropic effects and incorporate prior pathway knowledge to increase statistical power and identify important risk variants. We offer novel feature selection methods for group variable selection in the multi-task regression problem. We develop methods using both penalised likelihood and Bayesian spike and slab priors to induce structured sparsity at the gene and SNP levels. We implement Gibbs sampling algorithms for the Bayesian analysis and an alternating direction method of multipliers (ADMM) algorithm for the penalised regression methods. The performance of the proposed approaches is compared with state-of-the-art variable selection strategies on simulated data sets.
The penalised likelihood approaches are computationally efficient thanks to the ADMM algorithm. These approaches perform reasonably well in variable selection, but they underestimate the reconstructed signal. The multivariate Bayesian sparse group selection with spike and slab priors performed best in terms of signal recovery. The Bayesian approach also provides a natural way to quantify the variability of the estimated coefficients. Simulation results suggest that the Bayesian estimators should be used whenever computationally feasible.
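The underestimation noted for the penalised approaches is a general consequence of shrinkage penalties. As a minimal sketch only, assuming a sparse group lasso-type penalty (an L1 term at the SNP level plus an L2 term at the gene level; the exact penalty is not specified here), the proximal step that an ADMM-style solver would typically apply to the coefficients of one gene can be written as follows. The function names and the tuning values `lam_l1` and `lam_group` are hypothetical and chosen for illustration.

```python
import math


def soft_threshold(x, t):
    """Elementwise soft-thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_prox(v, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||v||_1 + lam_group * ||v||_2.

    v holds the coefficients of the SNPs in one gene (one group).
    Illustrative assumption: a sparse group lasso-type penalty.
    """
    # SNP-level sparsity: shrink each coefficient individually.
    u = [soft_threshold(x, lam_l1) for x in v]
    norm = math.sqrt(sum(x * x for x in u))
    # Gene-level sparsity: drop the whole group if its norm is small.
    if norm <= lam_group:
        return [0.0] * len(v)
    # Otherwise shrink the surviving coefficients toward zero.
    scale = 1.0 - lam_group / norm
    return [scale * x for x in u]


# A gene with strong effects keeps its large coefficients, but shrunk.
strong_gene = sparse_group_prox([3.0, -4.0, 0.1], lam_l1=0.5, lam_group=1.0)
# A gene with only weak effects is removed entirely.
weak_gene = sparse_group_prox([0.3, -0.2, 0.1], lam_l1=0.5, lam_group=1.0)
```

In this sketch the retained coefficients are always smaller in magnitude than their inputs, which illustrates why selection can be good while the recovered signal is biased toward zero.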
The developed statistical approaches are applied to gain insight into the genetic mechanisms of thyroid and breast cancer. The analysed data come from case-control studies including 6677 SNPs from 618 genes in 10 non-overlapping gene pathways. The thyroid cancer data set includes 482 cases and 463 controls. The breast cancer data set includes 1172 cases and 1125 controls.
Keywords: Pleiotropy, Pathway, Sparsity, Bayesian, Penalized regression