Loh Lab



The HI-CNV software (Hujoel et al. 2022 Cell) calls copy-number variants (CNVs) from SNP-array genotyping probe intensity data in large cohorts. HI-CNV uses a haplotype-informed approach that improves power to detect shorter CNVs (spanning as few as two genotyping probes) by incorporating probe data from multiple individuals who share an extended SNP haplotype—and are thus likely to also share any CNVs co-inherited within these shared genomic segments. We have tested HI-CNV on the UK Biobank (N=500K) and BioBank Japan (N=200K) data sets.


The Eagle software estimates haplotype phase either using a phased reference panel or within a genotyped cohort. Eagle2 (Loh et al. 2016b Nat Genet) uses a new, very fast HMM-based algorithm that improves speed and accuracy over existing methods via two key ideas: a new data structure based on the positional Burrows-Wheeler transform and a rapid search algorithm that explores only the most relevant paths through the HMM. Compared to the Eagle1 algorithm (Loh et al. 2016a Nat Genet), Eagle2 has similar speed but much greater accuracy at sample sizes <50,000. Users of the Eagle software include the Haplotype Reference Consortium’s public phasing and imputation servers (at Sanger and Michigan), the NIH TOPMed and GTEx consortia, and the VA Million Veteran Program.
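For readers unfamiliar with the positional Burrows-Wheeler transform (PBWT), the following is a minimal sketch of its core ordering step on a toy panel of binary haplotypes. It illustrates only the generic PBWT idea of keeping haplotypes sorted by their reversed prefixes; it is not Eagle2's data structure or search algorithm, and the function name `pbwt_prefix_order` and the toy data are hypothetical.

```python
def pbwt_prefix_order(haplotypes):
    """Order haplotype indices by reversed prefix after sweeping all sites.

    haplotypes: list of equal-length lists of 0/1 alleles (toy input).
    At each site the current order is stably partitioned into carriers
    of allele 0 followed by carriers of allele 1, so haplotypes that
    match over a long stretch ending at the current site end up adjacent.
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for site in range(n_sites):
        zeros = [h for h in order if haplotypes[h][site] == 0]
        ones = [h for h in order if haplotypes[h][site] == 1]
        order = zeros + ones  # stable partition keeps earlier ordering within each group
    return order


# Toy panel: haplotypes 0 and 2 are identical and become neighbors in the ordering.
toy_panel = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
toy_order = pbwt_prefix_order(toy_panel)
```

Because identical or long-matching haplotypes sit next to each other in this ordering, candidate matches can be found by looking at neighbors rather than scanning the whole panel.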


The BOLT-LMM algorithm (Loh et al. 2015a Nat Genet) rapidly computes statistics for association between phenotype and genotypes using a linear mixed model (LMM). The BOLT-REML algorithm partitions SNP-heritability and estimates genetic correlations using a Monte Carlo algorithm for fast multi-component, multi-trait modeling (Loh et al. 2015b Nat Genet). By default, BOLT-LMM association analysis assumes a Bayesian mixture-of-normals prior for the random effect attributed to SNPs other than the one being tested. This model generalizes the standard “infinitesimal” mixed model used by existing mixed model association methods, providing an opportunity for increased power to detect associations while controlling false positives. The BOLT-LMM and BOLT-REML packages have been widely adopted for GWAS analyses of >50,000-sample cohorts, e.g., UK Biobank (Loh et al. 2018 Nat Genet).
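To make the contrast between a mixture-of-normals prior and the infinitesimal model concrete, here is a small sketch of the prior density on a single SNP effect. It assumes, for illustration only, a mixture of two zero-mean normal components; the function names, parameter names, and numeric values are hypothetical and are not taken from the BOLT-LMM software.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def mixture_prior_pdf(beta, p, var_large, var_small):
    """Density of a two-component zero-mean normal mixture (assumed form).

    With probability p the effect comes from the wide component and with
    probability 1 - p from the narrow one. Setting var_large == var_small
    collapses the mixture to a single normal, i.e. the infinitesimal model.
    """
    return p * normal_pdf(beta, var_large) + (1.0 - p) * normal_pdf(beta, var_small)


# Illustrative values: a few large-effect SNPs on top of many small-effect SNPs.
p, var_large, var_small = 0.1, 1.0, 0.01
total_variance = p * var_large + (1.0 - p) * var_small

# Compare prior density at a large effect size against a variance-matched single normal.
density_mixture = mixture_prior_pdf(1.0, p, var_large, var_small)
density_infinitesimal = normal_pdf(1.0, total_variance)
```

For the same total variance, the mixture places more prior mass on large effects than a single normal does, which is one way to see how a non-infinitesimal prior can accommodate a few large-effect loci.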


ALDER (Loh, Lipson et al. 2013 Genetics) is a software package that computes weighted linkage disequilibrium (LD) curves, which can be used to infer admixture parameters including dates, mixture proportions, and phylogeny. This package extends the methodology of the ROLLOFF software in ADMIXTOOLS (Patterson et al. 2012).
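As a rough illustration of how a decay curve can yield an admixture date, the sketch below fits a single exponential to a weighted LD curve. It assumes a simplified single-pulse model in which weighted LD decays as A * exp(-n * d), with d the genetic distance in Morgans and n the number of generations since admixture, and it ignores any affine offset and noise. The function name `fit_decay_rate` and the synthetic data are hypothetical; this is not ALDER's fitting procedure.

```python
import math


def fit_decay_rate(distances_morgans, weighted_ld):
    """Fit log(ld) = log(A) - n * d by ordinary least squares.

    Returns (A, n). Requires strictly positive weighted_ld values,
    which holds for the noise-free toy curve below but not for
    real data in general.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic curve: amplitude 0.002, admixture 50 generations ago (assumed values).
true_amplitude, true_generations = 0.002, 50.0
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [true_amplitude * math.exp(-true_generations * d) for d in distances]

amplitude, generations = fit_decay_rate(distances, curve)
```

On this noise-free input the fit recovers the decay rate used to generate the curve; on real data a nonlinear fit with an offset term and proper uncertainty estimates would be needed.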


MixMapper (Lipson, Loh et al. 2013 MBE) is a software package that analyzes allele frequency correlations among multiple populations simultaneously to build a tree (or “admixture graph”) of population relationships that incorporates the possibility of mixture. This package complements the qpGraph software in ADMIXTOOLS, with the key difference that it semi-automatically searches the space of possible admixture graph topologies to find the best fit for the data.