The UKB Round 2 GWAS contains 11685 GWAS of 4236 unique phenotype codes (3011 PHESANT + 559 FinnGen + 633 ICD10 + 31 biomarkers + 2 covariates). For many of these phenotypes, however, there are multiple GWAS, due to:

- Consideration of both inverse rank-normal transformed (IRNT) and raw untransformed versions of 305 continuous phenotypes
- Consideration of whether to include the dilution factor as a covariate for the 31 biomarker phenotypes
- Available sex-stratified results for most phenotypes
- Redundancy among the FinnGen codes constructed from ICD diagnoses

Our first task for the LD Score regression analyses, then, is to get a “primary” result for each phenotype.

For the biomarker assays, UK Biobank has reported that some samples were unintentionally diluted during processing. Although an effort has been made to estimate the dilution fraction and correct assay values accordingly, the estimated dilution fraction has also been reported for potential use in additional modelling. For instance, other initial analyses of the biomarker data have opted to include the dilution fraction as a covariate in regression analyses.

For the Neale Lab GWAS, the analysis was performed both with and without the dilution fraction as a covariate (in addition to the age, sex, and PC covariates used for all phenotypes). Here we evaluate which of the two versions, with or without the dilution fraction covariate, to use as the primary GWAS for the ldsc \(h^2_g\) analyses.

If the dilution fraction covariate controls for substantial noise in the phenotype, we may anticipate stronger \(h^2_g\) results (higher point estimate, more significant) when the covariate is included.

*Takeaway:* \(h^2_g\) estimate and significance are not meaningfully affected by the addition of the dilution factor covariate.
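One way to make this kind of paired comparison concrete is sketched below. It is a minimal illustration, not the analysis code behind these results: the function names are hypothetical, significance is taken to be a one-sided z-test of the estimate against zero (an assumption), and the example numbers are made up.

```python
from math import erfc, sqrt

def z_and_p(estimate, se):
    """One-sided z-test that an estimate is greater than zero (assumed test)."""
    z = estimate / se
    p = 0.5 * erfc(z / sqrt(2))  # upper-tail probability of a standard normal
    return z, p

def compare_pair(with_cov, without_cov):
    """Compare two (h2, se) tuples for the same phenotype, with vs. without the covariate."""
    z_with, p_with = z_and_p(*with_cov)
    z_without, p_without = z_and_p(*without_cov)
    return {
        "h2_diff": with_cov[0] - without_cov[0],  # change in point estimate
        "z_diff": z_with - z_without,             # change in significance
        "p_with": p_with,
        "p_without": p_without,
    }

# Made-up numbers purely for illustration, not results from this analysis
example = compare_pair(with_cov=(0.20, 0.02), without_cov=(0.19, 0.02))
```

Differences in `h2_diff` and `z_diff` that stay close to zero across the 31 biomarkers would correspond to the "not meaningfully affected" conclusion above.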

If the dilution fraction is somehow correlated with the genetic data in a way that would lead to overall inflation of the GWAS results (e.g. some correlation with residual population structure), we may anticipate genome-wide inflation to be evident in the intercept results (higher point estimate, more significant) when the covariate is omitted.
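For reference, the role of the intercept can be read off the LD Score regression model as it is commonly written. The symbols below are not defined in the text above and are introduced only for illustration: \(N\) is the GWAS sample size, \(M\) the number of variants, \(\ell_j\) the LD score of variant \(j\), and \(a\) the contribution of confounding such as population stratification.

\[
\mathbb{E}\left[\chi^2_j\right] = \frac{N h^2_g}{M}\,\ell_j + N a + 1
\]

Under this model, inflation that does not scale with LD score is absorbed by the intercept \(N a + 1\), so an intercept above 1 suggests confounding, while polygenic signal appears in the slope.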

*Takeaway:* The intercept estimate and significance are not substantially affected by the addition of the dilution factor covariate. If anything, adding the dilution factor covariate reduces the stability of the intercept (lower significance) without affecting the point estimate.

Overall, the inclusion of the dilution factor covariate has minimal impact on the results for the biomarkers. Therefore, in the interest of simplicity, we treat the GWAS without the dilution factor covariate as the primary analysis for the biomarker phenotypes. (Results for the GWAS with the dilution factor covariate are still included in the complete results file.) The only remaining variation in covariates across the analyses is that sex is omitted as a covariate in sex-specific GWAS.
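As a rough sketch of how this choice could be applied to a table of results, the filter below drops the biomarker GWAS that included the dilution factor covariate and leaves all other rows untouched. The record layout (`source` and `dilution_covariate` fields) is hypothetical and is not the schema of the actual results file.

```python
def primary_biomarker_gwas(rows):
    """Drop biomarker GWAS that used the dilution covariate; keep every other row."""
    return [
        row for row in rows
        # 'source' and 'dilution_covariate' are assumed field names
        if not (row["source"] == "biomarkers" and row["dilution_covariate"])
    ]

# Toy records for illustration only
toy_rows = [
    {"phenotype": "biomarker_A", "source": "biomarkers", "dilution_covariate": True},
    {"phenotype": "biomarker_A", "source": "biomarkers", "dilution_covariate": False},
    {"phenotype": "trait_B", "source": "phesant", "dilution_covariate": False},
]
primary = primary_biomarker_gwas(toy_rows)
```

Filtering rather than deleting keeps the covariate-adjusted results available elsewhere, consistent with their continued presence in the complete results file.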

The Round 1 GWAS rank-normalized (IRNT) all continuous phenotypes. In Round 2, untransformed copies of the continuous phenotypes were also GWASed for the purposes of evaluating whether rank-normalizing was beneficial. Here we compare the raw and IRNT versions of all of the continuous phenotypes from PHESANT that were GWASed in `both_sexes` (i.e. that aren’t sex-specific).
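For readers unfamiliar with the transformation, a minimal sketch of an inverse rank-normal transform follows. It is an illustration rather than the code used for these GWAS: the rank offset of 0.5 and the use of average ranks for ties are assumptions, and missing values are not handled.

```python
from statistics import NormalDist

def irnt(values, offset=0.5):
    """Inverse rank-normal transform; ties receive their average rank.

    offset=0.5 is an assumed convention, not necessarily the one used here.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the block of values tied with position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to the matching standard normal quantile
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]

# A heavily skewed toy input becomes symmetric around zero
transformed = irnt([1.0, 10.0, 1000.0])
```

Because only the ranks are used, the transformed phenotype is approximately standard normal regardless of how skewed the raw values are, which is the property being compared against the raw versions here.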

Specifically, we evaluate whether:

- The IRNT or raw version provides better \(h^2_g\) results (i.e. higher \(h^2_g\), smaller \(h^2_g\) SE, and/or stronger \(h^2_g\) significance)
- The IRNT or raw version provides better control of stratification (i.e. lower intercept, and/or weaker intercept significance)

We first look at heritability:

*Takeaway:* \(h^2_g\) results are generally consistent, but with higher \(h^2_g\) for the IRNT versions of each phenotype.