Overview

The large sample size of UK Biobank, coupled with the large number of phenotypes only completed by a subset of participants and the number of rarer binary phenotypes with lower statistical power, provides an excellent opportunity to observe how LDSR results behave across GWAS sample sizes. We consider that relationship here, including providing approximate power curves for detecting \(h^2_g\) in LD Score regression (more details in Methods).

In looking at these trends we largely focus on effective sample size, defined for binary phenotypes as

\[N_{eff} = \frac{4}{\frac{1}{N_{cases}}+\frac{1}{N_{controls}}}\]

and as the standard sample size for non-binary phenotypes. We expect this effective N to better capture statistical power for rare binary phenotypes. We compare how \(N\) and \(N_{eff}\) relate to the significance of \(h^2_g\) results below.
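As a quick illustration of the definition above, the following minimal Python sketch computes \(N_{eff}\) from case and control counts. The function name `effective_n` and the example counts are hypothetical and are not taken from the analysis.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size for a binary phenotype: 4 / (1/N_cases + 1/N_controls)."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Illustrative counts only: a balanced design versus a rare phenotype.
balanced = effective_n(1000, 1000)   # equals the total N of 2,000
rare = effective_n(1000, 99000)      # far below the total N of 100,000
print(balanced, rare)
```

For a balanced design the effective N equals the total N, while for a rare phenotype it is driven almost entirely by the number of cases.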


Observed SNP Heritability

The relationship between the SNP heritability estimate and sample size is of interest given we previously observed evidence that \(h^2_g\) estimates may be downwardly biased at low sample sizes. We’ve revisited that question in our evaluation of confidence in the current results, and reproduce the relevant figures here.

Estimates

Effective N

Note: Gray line is loess fit of liability scale \(h^2_g\) as a function of effective sample size. Colors indicate our confidence rating as described in the Methods section.


Total N

Note: Colors indicate our confidence rating as described in the Methods section.


Split by Intercept

Note: Plot restricted to \(N_{eff} <\) 200,000 and \(h^2_g\) between -0.2 and 0.4 for visibility.

Under the expectation of the LDSR model that the intercept equals \(1 + N\alpha\), we estimate \(\alpha = \frac{\text{intercept}-1}{N}\) as the amount of confounding or model misspecification adjusted for sample size. Unlike the ratio, this quantity shouldn’t depend on the amount of polygenic signal contributing to the mean \(\chi^2\). Then, among phenotypes with at least some confidence in the LDSR results, we split the phenotypes according to deciles of the estimated \(\alpha\) and fit loess curves for the relationship between effective sample size \(N_{eff}\) and estimated SNP heritability \(h^2_g\). Read more about this analysis in the Methods section.
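The split described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, the decile cut points use the default method of `statistics.quantiles`, and the loess fit within each decile is omitted.

```python
from bisect import bisect_right
from statistics import quantiles


def estimate_alpha(intercept, n):
    """Per-sample confounding estimate: (intercept - 1) / N."""
    return (intercept - 1.0) / n


def decile_bins(values):
    """Assign each value a decile label from 1 (lowest) to 10 (highest)."""
    cuts = quantiles(values, n=10)  # nine cut points between the deciles
    return [bisect_right(cuts, v) + 1 for v in values]


# Illustrative inputs only: (intercept, N) pairs for three made-up phenotypes.
examples = [(1.05, 50000), (1.01, 300000), (0.99, 20000)]
alphas = [estimate_alpha(i, n) for i, n in examples]
print(alphas)
```

Each decile group would then get its own smoothed curve of \(h^2_g\) against \(N_{eff}\).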


Significance and Precision

As with most statistics, we expect potentially stronger significance and more precise estimates of the SNP heritability as the GWAS sample size increases. As shown below we generally observe that trend, but also note that phenotypes with higher \(h^2_g\) estimates tend to have less precise estimates. One possible explanation is that the regression weights used by LDSR are iterated with the \(h^2_g\) estimates and optimized assuming an infinitesimal model, and so LDSR estimates may become less efficient as causal variants become more sparse and have stronger effects.

We can also confirm here our expectation that the power of LDSR depends primarily on the effective sample size, rather than total sample size or the prevalence.

Inverse SE

Note: Colors reflect the magnitude of the \(h^2_g\) estimate truncated to values between 0 and 0.5.


Z by Total N

Note: Colors are proportional to effective sample size \(N_{eff}\). The utility of using \(N_{eff}\) instead of total N as a measure of power is especially evident when restricting this plot to binary variables using the filters on the right.


Z by Neff and Prevalence

Note: Plot restricted to binary variables. For visibility, the prevalence depicted in the color scaling is the smaller of {prevalence, 1-prevalence}. We observe no clear relationship between prevalence and the Z score for \(h^2_g\) at a given effective N (i.e. within vertical bands of the plot); restricting to higher or lower total N using the filters on the right does not change this result.


Estimated Power Curves

As discussed when evaluating confidence in the LDSR results, we can model the expected standard error of LDSR SNP heritability estimates as a function of effective sample size, and use that predicted standard error to provide a rough estimate of LDSR’s statistical power to detect \(h^2_g > 0\) as a function of \(h^2_g\) and effective sample size.

Note that these power curves are a rough estimate, and are specific to estimation with the baseline-LD v1.1 model used in the current analysis. They also do not account for the above relationship between \(h^2_g\) and the standard error that may relate to the genetic architecture of the phenotype.

Nevertheless, these curves may be useful for getting a ballpark estimate of the effective GWAS sample size required to detect significant SNP heritability (p < .05), robustly estimate genetic correlations (z > 4), or robustly test annotations in partitioned heritability (z > 7).
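To make the logic behind these thresholds concrete, here is a minimal Python sketch of the power calculation for a given standard error. It assumes the \(h^2_g\) estimate is approximately normal around the true value with the supplied standard error, and that p < .05 refers to a one-sided test of \(h^2_g > 0\). The function name `ldsr_power` and the example values of \(h^2_g\) and SE are hypothetical; the model that predicts the standard error from \(N_{eff}\) is described in the Methods and is not reproduced here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Assumption: p < .05 is treated as a one-sided test of h2 > 0.
Z_P05 = _STD_NORMAL.inv_cdf(0.95)


def ldsr_power(h2, se, z_threshold):
    """Approximate P(h2_hat / se > z_threshold) when h2_hat ~ Normal(h2, se^2)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return 1.0 - _STD_NORMAL.cdf(z_threshold - h2 / se)


# Illustrative values only: true h2 of 0.10 with a predicted SE of 0.02.
for label, z in [("p < .05", Z_P05), ("z > 4", 4.0), ("z > 7", 7.0)]:
    print(label, round(ldsr_power(h2=0.10, se=0.02, z_threshold=z), 3))
```

Evaluating this over a grid of \(h^2_g\) values, with the standard error predicted at each effective sample size, would trace out one curve per threshold.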

p < .05


z > 4


z > 7


Intercept

We may also be interested in the behavior of the LDSR intercept term across sample sizes. When the true intercept is non-null (due to confounding in the GWAS or misspecification of the LDSR model) we expect the intercept to scale linearly with sample size. In contrast, we don’t expect a relationship between sample size and the ratio estimate. Reassuringly, we see no clear sample size trend in the ratio here.
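A brief sketch may help show why the ratio should be free of a sample size trend. It uses the single-component form of the LDSR model, which is a simplification of the partitioned model used in the current analysis. The symbols \(M\) (number of SNPs), \(\ell_j\) (LD score of SNP \(j\)), and \(\bar{\ell}\) (mean LD score) are introduced here for illustration. With

\[E[\chi^2_j] = 1 + N\alpha + \frac{N h^2_g}{M}\ell_j\]

averaging over SNPs gives \(\text{mean}(\chi^2) \approx 1 + N\alpha + N h^2_g \bar{\ell}/M\), so that

\[\frac{\text{intercept}-1}{\text{mean}(\chi^2)-1} \approx \frac{N\alpha}{N\alpha + N h^2_g \bar{\ell}/M} = \frac{\alpha}{\alpha + h^2_g \bar{\ell}/M}\]

The factor of \(N\) cancels in the ratio, while the intercept itself, \(1 + N\alpha\), grows linearly in \(N\) whenever \(\alpha \neq 0\).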

Intercept

Note: Gray line is loess fit of the LDSR intercept as a function of effective sample size. Colors indicate our confidence rating as described in the Methods section.


Intercept Z score

Note: Colors indicate our confidence rating as described in the Methods section.


Intercept Ratio

Note: Ratio estimates restricted to phenotypes with mean \(\chi^2 > 1\). Clearer results are observed by restricting to high confidence results, continuous phenotypes, and/or higher effective sample sizes using the filter on the right. Colors indicate our confidence rating as described in the Methods section. The LDSR ratio estimate is:

\[\frac{\text{intercept}-1}{\text{mean}(\chi^2)-1}\]
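For reference, the ratio and its restriction to mean \(\chi^2 > 1\) can be written as a short Python sketch. The function name `ldsr_ratio` and the example inputs are hypothetical.

```python
def ldsr_ratio(intercept, mean_chi2):
    """LDSR ratio: (intercept - 1) / (mean chi2 - 1).

    Returns None when mean chi2 <= 1, mirroring the restriction applied
    to the plotted ratio estimates.
    """
    if mean_chi2 <= 1.0:
        return None
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Illustrative values only.
print(ldsr_ratio(1.02, 1.2))
print(ldsr_ratio(1.01, 0.99))
```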