GWAS Dataset

Genome-Wide Association Study (GWAS) Dataset Released

This dataset includes genetic variations found in 882 poplar trees and provides useful information to plant scientists as well as researchers in biofuels, materials science, and secondary plant compounds.

For nearly 10 years, researchers with DOE’s BioEnergy Science Center (BESC), a multi-institutional organization headquartered at ORNL, have studied the genome of Populus, a fast-growing perennial tree recognized for its economic potential in biofuels production. This Genome-Wide Association Study (GWAS) dataset includes more than 28 million single nucleotide polymorphisms, or SNPs, derived from 17 trillion bases of sequence data generated from 882 undomesticated Populus genotypes. Each SNP represents a variation in a single DNA nucleotide, or building block, that can act as a biological marker and/or causal allele within a protein sequence, helping scientists locate genes associated with certain characteristics, conditions or diseases.

The results of this analysis have been used, among other things, to 1) seek genetic control of cell-wall recalcitrance, a natural characteristic of plant cell walls that prevents the release of sugars under microbial conversion and restricts biofuels production, and 2) identify the molecular mechanisms controlling deposition of lignin in plant structures. Lignin is a polyphenolic polymer that strengthens plant cell walls and acts as a barrier to microbial access to cellulose during saccharification, the process of breaking cellulose down into simple sugars for fermentation.

Although the dataset’s most immediate applications are in fundamental plant sciences, ORNL researchers plan to use the GWAS data to inform applied work in areas such as cleaner, sustainable transportation biofuels, carbon fiber for lightweight vehicles and alternatives to conventional plastics and building insulation materials.

Press Release

Download the GWAS Dataset


Papers using the Poplar GWAS dataset are requested to include the following text in their acknowledgments:

“Support for the Poplar GWAS dataset is provided by the U.S. Department of Energy, Office of Science Biological and Environmental Research (BER) via the Bioenergy Science Center (BESC) under Contract No. DE-PS02-06ER64304. The Poplar GWAS Project used resources of the Oak Ridge Leadership Computing Facility and the Compute and Data Environment for Science at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.”

Bioinformatic Resources for Populus Genome Editing

Populus VariantDB

Tools that enable variant-free gRNA or primer design for genome editing in Populus tremula x alba INRA 717-1B4, Populus deltoides WV-94, and Populus trichocarpa Nisqually-1.

Probe Search

BLAST Search

Gene Model Search
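
The idea behind variant-free design can be illustrated with a minimal sketch. It assumes that candidate gRNA or primer sites are given as 1-based, inclusive (start, end) coordinates on one sequence and that known variant positions are available as a list of integers. The function name `variant_free_sites` and the example coordinates are hypothetical and are not taken from Populus VariantDB, which may work differently.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def variant_free_sites(
    candidates: Iterable[Tuple[int, int]],
    variant_positions: Iterable[int],
) -> List[Tuple[int, int]]:
    """Keep candidate sites that contain no known variant position.

    Hypothetical helper for illustration: coordinates are 1-based and
    inclusive, and all positions refer to the same reference sequence.
    """
    variants = sorted(set(variant_positions))
    kept = []
    for start, end in candidates:
        # Find the first known variant at or after the site's start.
        i = bisect_left(variants, start)
        # The site is variant-free if no such variant exists or it lies past the end.
        if i == len(variants) or variants[i] > end:
            kept.append((start, end))
    return kept


# Made-up example: three 23-base candidate sites and three known SNP positions.
sites = [(100, 122), (150, 172), (200, 222)]
snps = [110, 173, 222]
print(variant_free_sites(sites, snps))
```

In this made-up example only the site spanning 150 to 172 is kept, because the other two sites each overlap a listed SNP position.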

AGEseq (Analysis of Genome Editing by Sequencing)

A stand-alone, self-executable program for efficient analysis of genome editing patterns from high-throughput sequencing of multiplexed samples. Sanger sequence data can also be used.

Download the program and instructions for PC or Mac installation, or use AGEseq on Galaxy for web-based analysis.

Additional Data

BESC genomes, BESC engineered nonselected candidate genes, other data and tools (CAT, dbCAN, DOOR, QUBIC and DOE KBase) are available at

Supported by the DOE Office of Science, Biological and Environmental Research