PGS: Quality Score (PGS:QS)

This page corresponds to the PGS:QS subsection in the manuscript Methods. Related pages: Imputation panel comparison (HRC vs TOPMed) · Missing SNPs and PGS-impute.

PGS:Quality Score (PGS:QS) is defined to provide a quantitative assessment of the proportion of variation of the score that is captured by the SNPs genotyped/imputed for the individual.

The SBayesRC output file includes estimates (posterior means) of effect sizes of each SNP, and also an estimate of the variance explained by each SNP for the trait.

The SBayesRC estimated effect size are joint effect sizes, hence the variances explained by SNPs are independent given the LD between causal variants and can be summed for each trait to give the total estimated variance captured by the PGS in the SNP list.

This R script can be applied to any SNP list (bim file as existing SNP list, or a nopred file from Plink profiling function as a missing list) and predictor file easily.

args=commandArgs(trailingOnly = TRUE)
snp.list.file=args[2]  ## this is the .nopred file generated by plink --score function 
predictor.file=args[1] ## this is the predictor file you used to profile PGS.

library(data.table)
pred = data.frame(fread(predictor.file))
snp.list = read.table(snp.list.file)

y = formatC(sum(pred$VarExplained), format="e", digits =2 ) 
print(paste0("Genetic variation of this trait is ", y, ", captured by the ", nrow(pred), " SNPs." ))

# Check the extension of snp.list.file and run the corresponding function
if (grepl("\\.nopred$", snp.list.file)) {
  
  pred.subset = pred[!pred$Name %in% snp.list$V2,]
  x = round(100 * sum(pred.subset$VarExplained) / sum(pred$VarExplained), digits = 2)
  print(paste0("The ",  nrow(pred.subset )  , " SNPs in the target data can capture ", x , "% of it." ))
  
} else if (grepl("\\.bim$", snp.list.file)) {
  
  pred$targetA1 = snp.list[match(pred$Name, snp.list$V2) , "V5"]
  pred$targetA2 = snp.list[match(pred$Name, snp.list$V2) , "V6"]
  pred.subset = pred[pred$Name %in% snp.list$V2,]
  pred.subset = pred.subset[which(pred.subset$A1==pred.subset$targetA1 | pred.subset$A1 == pred.subset$targetA2),]
  x = round(100 * sum(pred.subset$VarExplained) / sum(pred$VarExplained), digits = 2)
  print(paste0("The ", nrow(pred.subset)  , " SNPs in the target data can capture ", x , "% of it." ))
  
} else {
  stop("Error: second input must end with .nopred or .bim")
}

Also see: Imputation panel comparison · Missing SNPs and PGS-impute