Methods Outline

  • Genotyping and phenotyping of ~500,000 individuals performed by the UK Biobank. Accessed under application 31063.

  • Phenotypes processed for GWAS using PHESANT and by extracting ICD10 codes (letter + digits) and recreating FinnGen endpoints.

  • GWAS performed by the Neale Lab using Hail on the Google Cloud Platform. A full technical discussion of the limitations of this analysis and recommendations for interpreting the results based on the Round 1 results, as well as more general details about our analysis of UK Biobank, are available elsewhere. Code is also available on GitHub.

  • GWAS summary statistics for HapMap3 sites passing QC (MAF > 0.01, INFO > 0.9, excluding multiallelics and the MHC region) in UK Biobank extracted in the proper format for ldsc using Hail.

  • SNP Heritability estimated using partitioned LD Score regression (LDSR; Bulik-Sullivan et al. 2015, Finucane et al. 2015) run using the MTAG implementation of ldsc with the Baseline-LD v1.1 model (Gazal et al. 2017) on the Google Cloud Platform.

  • Restricted primary SNP heritability analysis to one GWAS per phenotype. This includes identifying sex-specific phenotypes, choice of normalization for continuous phenotypes, evaluation of the dilution fraction for biomarker phenotypes, and pruning of FinnGen endpoints that are identical within current UKB data as detailed here. LDSR results for all primary GWAS are available from the SNP heritability browser.

  • Results for each primary GWAS evaluated for confidence based on power and potential biases as a function of sample size, the expected magnitude of the standard error on the SNP heritability estimate, and potential limitations to interpretation based on sex differences or ordinal encodings in phenotyping.

  • Among results with acceptable confidence, tested statistical significance of the LDSR SNP heritability estimate accounting for the effective multiple testing burden. More detail on the evaluation of potential thresholds for statistical significance in this analysis is available here.

  • Partitioning results from LDSR reported for phenotypes with strong SNP heritability results (\(z > 7\)). See phenome-wide plots and access per-phenotype results from the SNP heritability browser.

  • All heritability results are reported on the liability scale, assuming that the population prevalence matches the prevalence in the UK Biobank analysis set. The significance of the estimate is not affected by this conversion, and the point estimate is only affected for binary phenotypes. The liability scale value may be unreliable if the outcome was limited to a subset of individuals, or if the UK Biobank study population is not representative for this phenotype (which is likely in many cases). Observed scale \(h^2_g\) estimates are available on the per-phenotype results pages accessible from the SNP heritability browser and are included in the downloadable results.

  • Full code for the LDSR analysis is available on GitHub.
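
The liability-scale reporting described above can be illustrated with a minimal sketch. It uses the standard observed-to-liability transformation for a binary trait and assumes, as stated in the outline, that the population prevalence equals the sample prevalence unless given otherwise. The function name and arguments are hypothetical and chosen for illustration; this is not taken from the analysis code.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, sample_prev, pop_prev=None):
    """Convert an observed-scale h2 estimate for a binary trait to the liability scale.

    h2_obs      -- observed-scale estimate (or its standard error)
    sample_prev -- proportion of cases in the analysis set
    pop_prev    -- population prevalence; defaults to sample_prev (assumption
                   mirroring the text above)
    """
    if pop_prev is None:
        pop_prev = sample_prev
    std_normal = NormalDist()
    # Liability threshold such that a fraction pop_prev of the population lies above it.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    density = std_normal.pdf(threshold)
    factor = (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        sample_prev * (1.0 - sample_prev) * density ** 2
    )
    return h2_obs * factor


# Hypothetical values: observed-scale estimate 0.10 with standard error 0.02,
# for a phenotype with 5% cases in the analysis set.
h2_liab = observed_to_liability(0.10, 0.05)
se_liab = observed_to_liability(0.02, 0.05)
```

Because the same multiplicative factor applies to the estimate and to its standard error, their ratio is unchanged, which is consistent with the statement that significance is not affected by the conversion.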

Additional LDSR Details

  • Sumstats files were extracted directly from the UKB results files rather than running munge_sumstats.py from ldsc. This includes directly extracting HapMap3 sites rather than using the precomputed w_hm3.snplist reference file.

  • LDSR was run using the implementation in MTAG because it provides a convenient interface to ldsc from within python rather than via the command line.

  • The partitioned LDSR analysis was run using v1.1 of the Baseline-LD annotations described by Gazal et al. 2017. The Baseline-LD v1.1 model contains 75 annotations including functional regions, histone marks, GERP scores, LD-related annotations of population genetics forces, and MAF bins. LD scores were computed from 1000 Genomes Phase 3 data from European ancestry populations, along with weights and allele frequencies. The LD scores and corresponding reference files are available from the ldsc reference downloads page.

  • Default settings were used with the exception of removing the maximum \(\chi^2\) filter in order to avoid bias in phenotypes that have extremely large \(\chi^2\) for loci with strong effect sizes in the large sample size of UKB.

  • We previously ran both univariate and partitioned heritability analyses. For the reasons written about here, we’re currently presenting the partitioned heritability LDSR results as the primary analysis in the summary browser. You can see how the current results compare to the Round 1 version here.
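
The direct extraction of sumstats described in this section, combined with the QC thresholds listed in the Methods Outline, can be sketched in plain Python as follows. This is an illustration only and not the Hail code used in the analysis. The record keys, the MHC window coordinates, the choice of the alternate allele as A1, and the output column layout (SNP, A1, A2, N, Z) are all assumptions made for the example.

```python
# Thresholds from the Methods Outline.
MAF_MIN = 0.01
INFO_MIN = 0.9
# Assumed MHC window; the source does not state the boundaries used.
MHC_CHROM, MHC_START, MHC_END = "6", 25_000_000, 34_000_000


def passes_qc(variant, hapmap3_ids):
    """Return True if a variant record (dict with hypothetical keys) passes all filters."""
    if variant["rsid"] not in hapmap3_ids:
        return False  # keep HapMap3 sites only
    if variant["maf"] <= MAF_MIN or variant["info"] <= INFO_MIN:
        return False  # require MAF > 0.01 and INFO > 0.9
    if variant["n_alleles"] != 2:
        return False  # exclude multiallelic sites
    in_mhc = (
        variant["chrom"] == MHC_CHROM
        and MHC_START <= variant["pos"] <= MHC_END
    )
    return not in_mhc  # exclude the MHC region


def to_sumstats_row(variant, n):
    """Build one sumstats row; Z is the effect estimate divided by its standard error."""
    return {
        "SNP": variant["rsid"],
        "A1": variant["alt"],  # assumption: effect allele is the alternate allele
        "A2": variant["ref"],
        "N": n,
        "Z": variant["beta"] / variant["se"],
    }


# Hypothetical records for illustration.
hapmap3 = {"rs1", "rs2"}
variants = [
    {"rsid": "rs1", "chrom": "1", "pos": 1_000_000, "ref": "A", "alt": "G",
     "maf": 0.20, "info": 0.98, "n_alleles": 2, "beta": 0.02, "se": 0.01},
    {"rsid": "rs2", "chrom": "6", "pos": 30_000_000, "ref": "C", "alt": "T",
     "maf": 0.30, "info": 0.99, "n_alleles": 2, "beta": 0.05, "se": 0.01},
    {"rsid": "rs3", "chrom": "2", "pos": 5_000_000, "ref": "G", "alt": "A",
     "maf": 0.25, "info": 0.95, "n_alleles": 2, "beta": 0.01, "se": 0.01},
]
rows = [to_sumstats_row(v, n=350_000) for v in variants if passes_qc(v, hapmap3)]
```

In this toy input only the first record is retained: the second falls inside the assumed MHC window and the third is not in the HapMap3 set.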

Site Design

Contact

If you want more information, please contact us at: nealelab.ukb@gmail.com.