Interpreting GWAS Results to Determine Risk Allele


Many GWAS provide their data in supplemental tables, and these data are collected in GRASP, the GWAS catalog, and GWASdb. GRASP is the most comprehensive of these, with over 2 million SNPs compared to closer to 20k for each of the other two. Unfortunately, it is also the only one of the three that does not include risk allele data.

I asked the GRASP mailing list at [email protected] if there is any way to get risk allele data from GRASP, and they confirmed that there is not:

GRASP Mailing List Response

In short, no we do not track modeled alleles. In our experience this information has been too inconsistent in the way it was reported (e.g., reference strand, terminology used) and/or often missing in the literature (particularly when you dig deeper into Supplemental Tables that contain many results as we have done). Thus, we made a decision early on not to struggle with this. This is potentially a major limitation depending on your intended use and other catalogs like NHGRI-EBI can be helpful in filling in these gaps, though also may miss many results that are in GRASP.

A few points that could be helpful depending on your end use:
1) The minor allele in the general European ancestry population may often (though certainly not always) be the one modeled
2) GRASP provides the source context of each result so if you are interested in a limited number of results it is generally not time-consuming to go back to the original source for additional information (where available)
3) We have started posting full GWAS results as a separate feature in GRASP. While we do not reformat these results they nearly always contain information on alleles:

Thanks for your question.

Andrew from [email protected]

Getting Risk Alleles from GRASP

To get risk alleles from GRASP, the fastest way is to pull the PubMed ID and paper location of the study for the rsID/trait of interest. If you need to query many SNPs, it is quicker to dump a pandas DataFrame of the SNPs containing only rsID, trait, PubMed ID, and paper location. You can then get the paper directly from PubMed by prepending the PubMed URL to the ID and following the links to the tables.
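As a rough sketch of that lookup list, the snippet below trims each record to the four fields and adds a PubMed link. It uses plain dictionaries rather than pandas so that it runs without extra packages. The field names (`rsid`, `trait`, `pmid`, `paper_loc`) are my own placeholders, not necessarily the column names in GRASP, and the base URL is the current public PubMed address, which I am assuming is equivalent to the link meant above.

```python
import csv
import io

# Assumption: current public PubMed address, not necessarily the original link.
PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"

# Hypothetical field names; adjust to match the real GRASP export.
FIELDS = ["rsid", "trait", "pmid", "paper_loc"]


def lookup_rows(records):
    """Keep only the fields needed to find each paper and add a PubMed URL."""
    rows = []
    for rec in records:
        row = {field: str(rec[field]) for field in FIELDS}
        row["url"] = PUBMED_BASE + row["pmid"] + "/"
        rows.append(row)
    return rows


def to_tsv(rows):
    """Serialize the lookup rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS + ["url"], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder record, not a real GRASP entry.
    example = [{"rsid": "rs0000001", "trait": "example trait",
                "pmid": 12345678, "paper_loc": "Table 2", "pvalue": 1e-9}]
    print(to_tsv(lookup_rows(example)))
```

With pandas, the same thing is a column subset plus one string concatenation; the point is only that each row ends up with everything needed to open the right paper and table.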

Important: If the paper_loc field is ‘FullData’, you can get the data file from the GRASP updates site.

Tables can be extracted from PDFs using tabula and can usually be copied directly into vim->libreoffice from there.

Because the extracted data will be intersected with the GRASP database, keep only the minimum needed to determine the risk allele, plus enough fields (rsID, trait, and PMID) to do the intersection with GRASP.
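A minimal sketch of that intersection, assuming both the GRASP rows and the hand-curated rows are lists of dictionaries: rows are matched on the (rsID, trait, PMID) triple, and only GRASP rows with a curated risk allele are kept. The function name and the field names, including `risk_allele`, are hypothetical.

```python
def attach_risk_alleles(grasp_rows, curated_rows):
    """Return GRASP rows that have a curated risk allele, with it attached.

    Rows are matched on (rsid, trait, pmid); PMIDs are compared as strings
    so that 12345678 and "12345678" are treated as the same study.
    """
    def key(row):
        return (row["rsid"], row["trait"], str(row["pmid"]))

    # Map each curated (rsid, trait, pmid) triple to its risk allele.
    risk_by_key = {key(row): row["risk_allele"] for row in curated_rows}

    matched = []
    for row in grasp_rows:
        k = key(row)
        if k in risk_by_key:
            matched.append({**row, "risk_allele": risk_by_key[k]})
    return matched
```

Matching on all three fields matters because the same rsID can appear under several traits and in several papers, and the risk allele is only meaningful for the specific result it was read from.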

Getting raw data from GRASP updates

GRASP hasn’t been updated since 2013, but the raw results from several papers are available on the GRASP updates webpage. Before downloading anything, check the readmes to make sure the study of interest has sufficient data to get the risk allele.

Getting Risk Alleles From GWAS Tables

In many cases it just isn’t possible to use data from published GWAS to find the risk allele. Many studies do not publish enough information, either skipping allele information altogether and publishing only p-values, or publishing an odds ratio or beta without stating how it was calculated.

However, there are a few tricks/heuristics that can be used to get the data:

  • Sometimes the way the OR was calculated (e.g. with respect to the minor allele) is given in the methods or table description. The allele they identify as the coded allele is then allele1 for the OR calculation and thus the risk allele is the coded allele if the OR is greater than 1.
  • A large number of studies provide an allele1, allele2, and OR. In these cases it is almost always the case that the OR was calculated with respect to allele1, and thus the risk allele is allele1 if the OR is greater than 1. However, as the authors give no explicit information in this case, you need to treat this data with a large grain of salt; you could easily have it backwards.
  • Many studies give the MAF for both cases and controls along with the minor allele, sometimes with an OR as well. In this case the risk allele is the minor allele if the MAF is higher in cases than in controls. When studies have both the MAFs and the OR, in my experience of a few dozen studies, the MAF comparison and the OR direction always match, which gives me more confidence in the previous heuristic (allele1, allele2, and OR with no stated coding).
  • Sometimes studies do not provide the minor allele, but just provide the MAFs. In these cases you can still get the risk allele if you can query the SNP on dbSNP. The key is that the population studied should be the same as the population data in dbSNP, and dbSNP should have only 1 alternate allele for that SNP. Sometimes there are multiple alternate alleles, and then you can never be confident that you have the right minor allele. Where there is obviously one major and one minor allele for your population, though, you can use that to pick the risk allele using the MAF comparison described above.
  • If the effect size is given and explicitly stated to be for one allele (usually A1), then a negative effect means A2 is the risk allele; A1 would thus be the risk allele given a positive effect.
  • Sometimes a direction column is given, usually in meta-analyses. This will have a format something like ‘++’, ‘+-’, ‘--’. This tells you the direction of effect in each of the meta-analyzed sub-components. Thus, a straight run of pluses means A1 is the risk allele, a straight run of minuses means A2 is the risk allele, and anything else requires more complex interpretation.
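The heuristics above can be written down as a few small functions. This is only a sketch under the same assumptions as the bullets: the SNP is biallelic, the OR or effect is calculated with respect to the first allele, and the function names are my own. Returning the second allele when the OR is below 1 is my extension of the OR rule, by analogy with the effect-size rule. Each function returns None when the heuristic cannot decide, and none of them can tell you whether the study really coded its alleles the way you assumed.

```python
def risk_from_odds_ratio(allele1, allele2, odds_ratio):
    """OR assumed to be calculated with respect to allele1."""
    if odds_ratio > 1:
        return allele1
    if odds_ratio < 1:
        return allele2
    return None  # OR of exactly 1 gives no direction


def risk_from_effect(a1, a2, effect):
    """Effect size (beta) explicitly stated to be for a1."""
    if effect > 0:
        return a1
    if effect < 0:
        return a2
    return None


def risk_from_maf(minor, major, maf_cases, maf_controls):
    """Minor allele is the risk allele if it is more frequent in cases."""
    if maf_cases > maf_controls:
        return minor
    if maf_cases < maf_controls:
        return major
    return None


def risk_from_direction(a1, a2, direction):
    """Direction string from a meta-analysis, e.g. '++', '+-', '--'."""
    signs = set(direction)
    if signs == {"+"}:
        return a1
    if signs == {"-"}:
        return a2
    return None  # mixed, empty, or containing other symbols: interpret by hand


if __name__ == "__main__":
    print(risk_from_odds_ratio("A", "G", 1.25))     # A
    print(risk_from_maf("T", "C", 0.12, 0.18))      # C
    print(risk_from_direction("A", "G", "+-+"))     # None
```

When a study reports both the case/control MAFs and an OR, running `risk_from_maf` and `risk_from_odds_ratio` on the same row and checking that they agree is a cheap sanity check on the assumed coding.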


