Original article
Mapping of hepatic expression quantitative trait loci (eQTLs) in a Han Chinese population
  1. Xiaoliang Wang1,
  2. Huamei Tang2,
  3. Mujian Teng3,
  4. Zhiqiang Li4,
  5. Jianguo Li5,
  6. Junwei Fan1,
  7. Lin Zhong1,
  8. Xing Sun1,
  9. Junming Xu1,
  10. Guoqing Chen1,
  11. Dawei Chen1,
  12. Zhaowen Wang1,
  13. Tonghai Xing1,
  14. Jinyan Zhang1,
  15. Li Huang1,
  16. Shuyun Wang1,
  17. Xiao Peng1,
  18. Shengying Qin4,
  19. Yongyong Shi4,
  20. Zhihai Peng1
  1. 1Department of General Surgery, Shanghai First People's Hospital, Medical College, Shanghai Jiaotong University, Shanghai, China
  2. 2Department of Pathology, Shanghai First People's Hospital, Medical College, Shanghai Jiaotong University, Shanghai, China
  3. 3Department of General Surgery, Shandong University Affiliated Qianfoshan Hospital, Jinan, China
  4. 4Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Bio-X Institutes, Ministry of Education, Shanghai Jiao Tong University; Shanghai genome Pilot Institutes for Genomics and Human Health, Shanghai, China
  5. 5Department of Hepatobiliary Surgery, Zhangzhou Hospital Affiliated to Fujian Medical University, Zhangzhou, China
  1. Correspondence to Dr Zhihai Peng, Department of General Surgery, Shanghai First People's Hospital, Medical College, Shanghai Jiaotong University. 100 Haining Road, Shanghai, 200080, The People's Republic of China; pengzhihai{at}sjtu.edu.cn Or Dr Yongyong Shi, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Bio-X Institutes, Ministry of Education, Shanghai Jiao Tong University; Shanghai genome Pilot Institutes for Genomics and Human Health. 50 West Guangyuan Road, Shanghai, 200030, The People's Republic of China; shiyongyong{at}gmail.com

Abstract

Background Elucidating the genetic basis of variability in hepatic gene expression is important for understanding disease aetiology and variation in drug metabolism. To date, no genome-wide expression quantitative trait loci (eQTLs) analysis has been conducted in the Han Chinese population, the largest ethnic group in the world.

Methods We performed genome-wide eQTL mapping in a set of Han Chinese liver tissue samples (n=64). The data were then compared with published eQTL data from a Caucasian population. We then tested whether these eQTLs overlapped with important pharmacogenes and with single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS), in particular those identified in Asian populations.

Results Our analyses identified 1669 significant eQTLs (false discovery rate (FDR) < 0.05). We found that 41% of Asian eQTLs were also eQTLs in Caucasians at the genome-wide significance level (p<10−8). Both cis- and trans-eQTLs in the Asian population were also more likely to be eQTLs in Caucasians (p<10−4). Enrichment analyses revealed that trait-associated GWAS-SNPs were enriched within the eQTLs identified in our data, as were the GWAS-SNPs specifically identified in Asian populations in a separate analysis (p<0.001 for both). We also found that hepatic expression of very important pharmacogenetic (VIP) genes (n=44) and of a manually curated list of major genes involved in pharmacokinetics (n=341) was more likely to be controlled by eQTLs (p<0.002 for both).

Conclusions Our study provided, for the first time, a comprehensive hepatic eQTL analysis in a non-European population, further generating valuable data for characterising the genetic basis of human diseases and pharmacogenetic traits.

  • Clinical genetics
  • Genetics
  • Genome-wide
  • Molecular genetics

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/

Introduction

The liver is a vital human organ for a variety of physiological processes, and plays a major role in drug metabolism. Elucidating the basis for the genetic variation in hepatic gene expression will significantly further our understanding of human diseases and pharmacogenomics.

Mapping of expression quantitative trait loci (eQTLs) is one of the most effective ways to discover gene regulation networks.1 The eQTLs method measures the variance in gene transcription and then maps the genetic loci affecting the expression of mRNA.2 To date, eQTLs mapping has been conducted in many species and in different tissues.3 Thus far, four genome-wide eQTLs studies in human liver tissue have been performed,4–7 and numerous eQTLs have been identified. However, no studies have been carried out in an East Asian population, one of the major ethnicities in the world. Detailed eQTLs mapping in different populations is crucial to understand the genetic heterogeneity in gene regulation, and the evolutionary predisposition to diseases. Notably, the hepatic metabolising capacity of Caucasian populations has already been shown to differ from that of Asians, as exemplified by variability in the metabolism of alcohol, testosterone, bilirubin, etc.8–10 Hepatic eQTLs studies in East Asians will also be crucial for understanding the genetic basis underlying various diseases and drug response variability, particularly in the East Asian population. For this reason, we have carried out a genome-wide eQTLs mapping in 64 normal livers of Han Chinese. Detailed comparison between the Asian and Caucasian eQTLs was conducted. Experimental validation in an independent sample set (n=54) was also performed. The relationship between Asian eQTLs and trait-associated single nucleotide polymorphisms (SNPs) as well as pharmacogene expression was investigated.

Materials and methods

Tissue sample collection

Normal (non-diseased) liver tissues were previously collected from 64 Chinese donors (all male) who provided informed consent. The average age of the subjects was 34.52±5.98 years. The independent sample set of liver tissue (n=54) (non-diseased healthy male donors, aged 35.65±7.34 years) was newly collected at Shanghai Jiao Tong University Affiliated First People's Hospital (Shanghai, China), Zhangzhou Hospital Affiliated to Fujian Medical University (Zhangzhou, China), and Shandong University Affiliated Qianfoshan Hospital (Jinan, China). This study was approved by the ethics committees of the medical faculties of Shanghai Jiao Tong University Affiliated First People's Hospital.

RNA sample preparation, hybridisation

Total RNA of the human liver tissue samples was extracted and 1.65 µg of each RNA sample was labelled and hybridised to the Agilent 44 K G4112F arrays (GPL4133) at Shanghai genomePilot Technology, Inc (Shanghai, China).

Gene expression microarray data preprocessing and normalisation

The dye-normalised and post-surrogate processed signal for the green channel, gProcessedSignal, obtained from Agilent's Feature Extraction Software was used for downstream analyses. The raw expression data for the 64 samples were evaluated for individual array quality (MA plots), array intensity distributions (box plots and density plots), and between-array differences (heat maps representing the distance between arrays) using the arrayQualityMetrics package. One outlier sample was dropped based on the arrayQualityMetrics default criteria.11 The quantile normalisation method was used to normalise the data,12 and the average values were obtained for the replicate spots. The expression intensity values were log2 transformed. The processes were implemented using Limma. After data collection, we ran a pipeline to re-annotate the Agilent G4112F microarray for the current reference assembly (NCBI Build 37.3). A total of 29 190 oligonucleotides on the array were validated for subsequent analyses.
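
As an illustration of the normalisation steps described above (not the authors' Limma code), the following Python sketch applies quantile normalisation to a probes-by-samples matrix and then log2-transforms it. The function name, the toy matrix, and the naive handling of ties are assumptions made for the example, and averaging of replicate spots is omitted.

```python
import numpy as np

def quantile_normalise(x):
    """Force every column (sample) of a probes x samples matrix
    to share the same empirical distribution."""
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest);
    # ties are broken by position, which is a simplification.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean of the k-th smallest values across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]

# Toy intensities: 4 probes x 3 samples (hypothetical values).
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.5, 6.0],
                [4.0, 2.0, 8.0]])
normalised = quantile_normalise(raw)
log2_expr = np.log2(normalised)
```

After this step every sample has the same set of intensity values, only assigned to different probes, so between-array differences in overall intensity are removed before association testing.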

DNA extraction, GWAS genotyping, and quality control

Total DNA of the human liver tissue samples was extracted using the RNA/DNA Mini Kit (Qiagen, Hilden, Germany). DNA was diluted to a working concentration of 50 ng/μL for SNP chip genotyping. The genome-wide scan was performed using the Affymetrix Genome-Wide Human SNP Array 6.0. Quality control (QC) filtering of the GWAS data was performed by excluding arrays with a contrast QC<0.4 from further data analysis. The sex of each sample was determined using Genotyping Console, and none mismatched the recorded sex. Genotype data were generated using the Birdseed algorithm.13 SNPs were further filtered based on annotation, call rate, Hardy-Weinberg equilibrium, and allele frequency information. Among the 909 622 SNPs initially genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0 platform, 4023 duplicated SNPs and 1175 SNPs without chromosomal annotation were removed first. We then removed 54 656 SNPs that had a genotyping call rate <95%, and 737 SNPs that deviated significantly from Hardy-Weinberg equilibrium (p<10−4). Considering the small sample size of our dataset, we removed a further 546 548 SNPs that had a minor allele frequency (MAF) <25%. In total, 302 483 SNPs remained for further analysis.
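
The filtering cascade can be checked with simple bookkeeping. The Python sketch below takes the starting and final SNP counts and the first four removal steps as reported, and derives the number of SNPs that the MAF filter must account for; the variable names are illustrative only.

```python
# Counts reported for the QC cascade (Affymetrix SNP 6.0).
genotyped = 909_622
remaining_after_qc = 302_483

removed = {
    "duplicated": 4_023,
    "no chromosomal annotation": 1_175,
    "call rate < 95%": 54_656,
    "HWE p < 1e-4": 737,
}

after_listed_steps = genotyped - sum(removed.values())
# Whatever still has to be removed to reach the final count
# must come from the MAF < 25% filter.
removed_by_maf = after_listed_steps - remaining_after_qc

print(after_listed_steps)  # 849031
print(removed_by_maf)      # 546548
```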

Quantification of ancestry and sample independence test

All the individuals are of self-reported Han Chinese ancestry. Multidimensional scaling (MDS) analysis of the genome-wide IBS (identity-by-state) pairwise distances was performed in PLINK V.1.07. All the samples resided in a single cluster, and no outlier was detected; thus all samples were used for analyses. Sample independence was also examined using the identity-by-descent (IBD) analysis in PLINK. The results showed that all individuals were independent of each other. No sample contamination, duplication, or significant family relationship was identified.
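
To make the IBS distance concrete, here is a minimal Python sketch of a pairwise identity-by-state distance between two samples. It assumes genotypes coded as minor-allele counts (0, 1, 2) with None for missing calls; the function name and toy genotypes are hypothetical, and PLINK's own implementation may differ in details such as the handling of missing data.

```python
def ibs_distance(g1, g2):
    """Identity-by-state distance between two samples.

    g1, g2: sequences of genotypes coded as minor-allele counts (0, 1, 2),
    with None for a missing call.
    Returns 1 - (mean number of alleles shared by state / 2).
    """
    shared = []
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either sample
        shared.append(2 - abs(a - b))  # 2, 1 or 0 alleles shared by state
    if not shared:
        raise ValueError("no SNPs genotyped in both samples")
    return 1.0 - sum(shared) / (2 * len(shared))

sample_a = [0, 1, 2, 2, None, 1]
sample_b = [0, 2, 2, 0, 1, 1]
d = ibs_distance(sample_a, sample_b)  # 7 of 10 alleles shared, so about 0.3
```

Computing this for every pair of samples gives the distance matrix that MDS projects into a few dimensions; samples of similar ancestry then fall into one cluster.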

eQTLs mapping

We tested all expression traits for their associations with each of the QC passed SNPs using PLINK, which correlates allele dosage with changes in the trait. Only the SNPs within ± 2 megabase (Mb) of the transcription start or stop of the corresponding gene were tested for putative cis-eQTL. All the rest of the SNPs were tested for association to each expression trait for trans-eQTLs. To correct for the number of tests performed, a false discovery rate (FDR) of 0.05 was used as a cut-off for statistical significance.
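
The cis/trans split and the multiple-testing step can be sketched as follows. This Python example is not the PLINK workflow itself: the ±2 Mb window comes from the text, whereas the use of the Benjamini-Hochberg procedure is an assumption (the text states only that an FDR of 0.05 was used), and the function names and p values are hypothetical.

```python
CIS_WINDOW_BP = 2_000_000  # +/- 2 Mb, as stated in the text

def is_cis(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end):
    """True if the SNP lies within the cis window around the transcript.
    gene_start <= gene_end are the transcript's genomic coordinates."""
    if snp_chrom != gene_chrom:
        return False
    return gene_start - CIS_WINDOW_BP <= snp_pos <= gene_end + CIS_WINDOW_BP

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return booleans marking which tests are significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

pvals = [1e-10, 0.2, 0.02, 0.5, 1e-4]  # hypothetical association p values
calls = benjamini_hochberg(pvals)
```

With hundreds of thousands of SNPs tested against tens of thousands of probes, the p value threshold implied by a 5% FDR becomes very small, which is why only strong associations survive.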

Demographic and clinical records were incomplete for these samples. In order to assess the effect of hidden cofactors on eQTLs mapping, we performed corrections for gene expression variance using PEER (probabilistic estimation of expression residuals).14 Interestingly, the majority of significant cis-eQTLs remained when using the PEER corrected residuals for gene expression, suggesting that covariates actually had a minimal effect on the gene expression variance in our data.

To assess the statistical power to detect eQTLs, a power analysis was performed using the GWAPower program.15 We found that using our 63 samples (one sample was removed after data cleanup), with a validation p value of 10−4 (a threshold at which we expect to be able to follow up with at least in silico replication studies), we have about 80% power to detect an eQTL accounting for 25% of the phenotypic variation in expression levels.

Comparison of eQTLs between Asian and Caucasian populations

To check the effect of ethnicity on eQTLs mapping, we compared our results with those from a Caucasian dataset (GEO accession: GSE26106) by Innocenti et al,5 which was generated with the same gene expression microarray platform (GEO#GPL4133). The GSE26106 dataset has 205 Caucasian liver samples with both genotype and gene expression data. The same preprocessing and eQTLs mapping methods used in the Asian dataset were applied to the raw gene expression and genotype data downloaded from GEO. We focused on 64 964 SNPs and 24 340 gene probes which passed the QC in both datasets and have genomic loci annotations.

To assess the eQTLs replication between the two populations more completely, we also performed a post-hoc genome-wide imputation to infer genotypic information for SNPs that had not been genotyped on our platform using the IMPUTE2 program16 after prephasing the genotypes with SHAPEIT.17 The 1000 Genomes Phase I data (NCBI build 37) were used as a reference panel and default parameters were used in prephasing and imputation. After imputation, we obtained imputed data for 37 574 750 and 38 049 377 SNPs in the Han Chinese and Caucasian populations, respectively. The imputed data were further filtered based on the imputation quality (>30%) and MAF (>25% for Chinese and >5% for Caucasians), after which information for 2 624 722 and 6 763 377 SNPs remained for Chinese and Caucasian datasets, respectively, for further analysis.

Statistical analysis for enrichment tests and population divergence comparison

Trait-associated SNPs were obtained from the NHGRI catalogue (http://www.genome.gov/26525384). We downloaded all catalogued SNPs associated with human traits at genome-wide significance (p≤10−8) (n=6437), among which we defined the Asian GWAS-SNPs (n=748) as the subset of SNPs identified in East Asian populations, mainly Chinese and Japanese populations. The very important pharmacogenetic (VIP) genes (n=49) were obtained from the Pharmacogenomics Knowledgebase (http://www.PharmGKB.org). The list of major pharmacokinetic genes (n=409) was obtained from a previous study.18

The enrichment significance of one dataset in another was calculated by the χ2 test, with all SNPs or genes from the Asian dataset as a background. A value of p<0.05 was considered significant. The binomial test was used for evaluating the enrichment of correlation with multiple gene probes in eQTLs hotspots.
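
A 2×2 χ2 enrichment test of this kind can be sketched in a few lines of Python. The counts below are hypothetical and serve only to show the layout of the table (eQTL status in rows, membership of the tested set in columns); whether a continuity correction was applied in the original analysis is not stated, so none is used here.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for
        [[a, b],
         [c, d]]
    Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a margin of the table is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = eQTL SNP yes / no, columns = GWAS SNP yes / no.
stat, p = chi2_2x2(20, 980, 100, 98_900)
```

The background matters: using all SNPs (or genes) that passed QC in the Asian dataset as the denominator, as described above, keeps the comparison restricted to variants that could have been detected as eQTLs.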

To assess the influence of allele frequency in eQTLs replication between populations, we compared the distribution of FST (fixation index) values for the significant eQTLs between overlapped and non-overlapped groups, using the χ2 test. An FST value of 0.5 was used as a cut-off to compare the difference in the number of SNPs with or without overlap between the two populations. FST values were obtained from the FstSNP-HapMap3 database.19
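
For orientation, the Python sketch below computes a simple Wright's FST for one biallelic SNP from allele frequencies in two equally weighted populations. This is an assumption-laden illustration: the FST values used in the study were taken from the FstSNP-HapMap3 database, which may use a different estimator, and the example frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic SNP and two equally weighted populations.
    p1, p2: frequency of the same allele in each population."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # allele fixed (or absent) in both populations
    return (h_total - h_within) / h_total

FST_CUTOFF = 0.5  # cut-off used in the text
example = fst_two_populations(0.1, 0.9)  # strongly diverged SNP, about 0.64
highly_diverged = example > FST_CUTOFF
```

SNPs above the cut-off differ sharply in allele frequency between the populations, so an association detectable in one population may be underpowered in the other.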

Experimental confirmation of the gene expression and SNP gene expression association

The mRNA levels of seven genes (DDT, ERAP2, MRPL43, FADS1, BRCA1, CCND2, and PTPRE) were quantified in the livers using quantitative PCR (qPCR). Primer sequences are provided in online supplementary table S5. SNPs that were significantly associated with the expression of these genes were also genotyped using PCR sequencing. Primer sequences are also listed in online supplementary table S5. The relationships between qPCR data and microarray data or SNP genotypes were determined using linear regression, with the significance cut-off set at 0.05.

Results

Gene expression profiling and SNP genotyping

We conducted a genome-wide eQTLs mapping in a set of Han Chinese liver tissue samples (n=63). SNPs were genotyped using the Affymetrix SNP 6.0 chip, and genome-wide gene expression levels were profiled using the Agilent G4112F array. After QC, 302 483 SNPs remained for analysis. Microarray expression probes were re-annotated using a previously reported pipeline.9 A total of 29 190 probes were considered to be valid for subsequent analyses.

eQTLs mapping

At the 5% FDR level (p<9.45×10−9), we identified a total of 1669 eQTLs with 1322 SNPs significantly associated with the expression of 282 genes. Among these eQTLs, 1465 were classified as cis-eQTLs including 1198 SNPs for 217 genes, and 204 as trans-eQTLs, with 178 SNPs significantly associated with 68 genes. The full list of association results at a study-wide significance level is provided in online supplementary table S1. Figure 1 shows the distribution of cis- and trans-eQTLs at the p<10−5 level, eQTLs hotspots, as well as the location of genes in the entire genome.

Figure 1

The combined plot depicting all eQTLs for hepatic gene expression traits with p<10−5 in our study. The bottom plot shows the genome-wide distribution of eQTLs results, with the SNP distribution on the x axis and expression probes on the y axis. Each dot represents a significant SNP-expression pair. Cis-eQTLs associations are shown in a diagonal direction and trans-eQTLs are shown in a vertical direction. Darker colour indicates more significant association. The middle plot shows hotspot eQTLs enrichment in correlation with expression probes, indicating the eQTLs as possible key regulators affecting expression of multiple genes. The size of the red circles indicates the number of gene expression traits correlated with the particular SNP. The top plot is the enrichment scale (–log p value), which shows the enrichment of correlation with multiple genes (–log p value from binomial test after Bonferroni correction). eQTL, expression quantitative trait loci; SNP, single nucleotide polymorphism.

We also studied the physical distance distribution (base pairs) of the most significant eQTLs for each gene to the gene's transcription start site (TSS). We found that the most significant eQTLs were enriched around TSS (figure 2), which is also consistent with previous reports.5 ,20

Figure 2

Histogram showing distances from each gene's best associated SNP to its TSS. Negative and positive values denote SNPs 5′ and 3′ of the TSS, respectively. The bin width was set at 10 kb. eQTL, expression quantitative trait loci; SNP, single nucleotide polymorphism; TSS, transcription start site.
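
The binning behind such a histogram can be sketched as follows in Python. The 10 kb bin width and the sign convention come from the caption; the strand handling, the function names, and the coordinates are assumptions introduced for the example.

```python
import math
from collections import Counter

BIN_WIDTH_BP = 10_000  # 10 kb bins, as in the figure

def signed_distance_to_tss(snp_pos, tss_pos, strand):
    """Negative = SNP lies 5' (upstream) of the TSS, positive = 3'.
    For genes on the minus strand the genomic offset is flipped."""
    offset = snp_pos - tss_pos
    return offset if strand == "+" else -offset

def bin_start(distance):
    """Left edge (in bp) of the 10 kb bin containing the distance."""
    return math.floor(distance / BIN_WIDTH_BP) * BIN_WIDTH_BP

# Hypothetical (snp_pos, tss_pos, strand) triples for each gene's best SNP.
best_snps = [(1_004_500, 1_000_000, "+"),
             (995_000, 1_000_000, "+"),
             (2_012_000, 2_000_000, "-"),
             (3_000_100, 3_000_000, "+")]
histogram = Counter(bin_start(signed_distance_to_tss(*t)) for t in best_snps)
```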

Comparison between Caucasian and Asian eQTLs

We set out to compare the eQTLs in the Han Chinese population with the previously published data in an American population.5 Given the different platforms used in the two studies, we focused our analyses only on the common SNPs (n=64 964) and gene probes (27 340) between the two datasets. We found that at the general genome-wide significance level (p<10−8), 41% (113 out of 277) of Asian SNP-gene association pairs were also significant pairs in the Caucasian population (table 1). To confirm whether there was a significant overlap in eQTLs between the two populations, we used a liberal p value cut-off of 10−4 for both populations, and found that an eQTL in the Asian population was also significantly more likely to be an eQTL in Caucasians (p<2.2×10−16). This enrichment remained significant for both cis- and trans-eQTLs (p<2.8×10−5 for both) (see online supplementary table S2).

Table 1

Overlap of eQTLs (p<10−8) between Asian and Caucasian populations

To further evaluate the eQTL overlapping between the two populations, we performed a genome-wide post-hoc imputation analysis to infer genotypes for the SNPs that had not been genotyped in both sample sets. After imputation, 19 703 SNPs were found to be significantly (p<9.45×10−9) associated with gene expression in the Han Chinese dataset, which included 18 186 cis-eQTLs and 1517 trans-eQTLs. After comparison with the Caucasian dataset, we found that 11 841 (60%) SNPs were also significantly associated with expression of the same genes at 10−4 level in the Caucasian dataset, which included 11 284 (62%) cis-eQTLs and 557 (36.7%) trans-eQTLs.
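
The replication proportions quoted for the genotyped and imputed comparisons follow directly from the reported counts, as this short Python check shows (the helper name is illustrative).

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Genotyped SNPs: SNP-gene pairs shared between populations at p < 1e-8.
genotyped_overlap = pct(113, 277)

# Imputed SNPs: (replicated in Caucasians at 1e-4, significant in Han Chinese).
imputed = {"cis": (11_284, 18_186), "trans": (557, 1_517)}
replicated_total = sum(r for r, _ in imputed.values())
tested_total = sum(t for _, t in imputed.values())

imputed_pct = {k: pct(r, t) for k, (r, t) in imputed.items()}
imputed_pct["all"] = pct(replicated_total, tested_total)

print(round(genotyped_overlap))        # 41
print(round(imputed_pct["all"]))       # 60
print(round(imputed_pct["cis"]))       # 62
print(round(imputed_pct["trans"], 1))  # 36.7
```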

Given the allele frequency difference for the SNPs between the two populations, it is possible that population divergence may have affected the identification and confirmation of eQTLs in different populations. To test this hypothesis, we calculated the FST (fixation index) value for the significant eQTLs overlapped and non-overlapped between the two populations. Interestingly, using an FST value of 0.5 as a cut-off, we found that about 4% of the overlapped eQTLs had an FST > 0.5 compared to about 7% in non-overlapped eQTLs. Despite a small number, this comparison was statistically significant (p=7.6×10−4), suggesting that the SNPs with less population divergence were more likely to be eQTLs shared by the two populations (data not shown).

Enrichment of trait-associated GWAS SNPs in Asian eQTLs

To test the hypothesis that the GWAS-identified SNPs significantly associated with human traits are also more likely to be eQTLs,21 we checked the enrichment of trait-associated GWAS SNPs in Asian eQTLs. We first focused our analysis on the genome-wide significant (p≤10−8) SNPs associated with any trait in all populations deposited in the National Human Genome Research Institute (NHGRI) database (n=6437). Among 1322 SNPs significantly (FDR<0.05) associated with gene expression in Han Chinese livers, 19 were also trait-associated SNPs (see online supplementary table S1), a significant enrichment compared to the SNPs that were not significant eQTLs (FDR>0.05) (p<1.4×10−8) (see online supplementary table S3). We further divided the GWAS-SNPs into those identified in Asian (Chinese and Japanese populations only) (n=748) and European (n=5689) populations. Out of these 19 SNPs, four were also trait-associated SNPs in the Asian population, again a significant enrichment in the Han Chinese eQTLs (p=0.0009) (see online supplementary table S3). Examples here included rs12506899 located in the α-fetoprotein gene (AFP), which was significantly associated with the cancer antigen 19-9 (CA19-9) in a GWAS conducted in a Han Chinese population.22 In our data this SNP was significantly associated with AFP gene expression (p=1.42×10−10). 
More interestingly, rs3077 located in the HLA-DPA1 gene—which exhibited a relatively low allele frequency (MAF=0.11) in Caucasians but was reported to be more common in Asians with an allele frequency of 0.62—was significantly associated with increased risk for hepatitis B virus (HBV) infection in an Asian population in a recent GWAS.23 Similarly, we also noticed rs9277378—a significant HLA-DPB1 eQTL which is in complete linkage disequilibrium (LD) with rs9277535 from HapMap Asian data—was significantly associated with HBV infection in Asian populations in two previously published GWAS.24 ,25 On the other hand, 15 out of the aforementioned 19 SNPs were GWAS SNPs in European populations, and were also significantly enriched in the Han Chinese eQTLs (p<0.0001) (see online supplementary table S1).

Association between Asian eQTLs and expression variability of pharmacogenes

The liver is the most important organ for drug metabolism. We hypothesised that genetic polymorphisms can explain inter-individual variability in pharmacogene expression, which would be of importance to pharmacogenetics. To test this, we examined the overlap between Asian eQTLs and the VIP genes (n=49), a list of genes drawing most attention in the pharmacogenetic research area and recently identified by the Pharmacogenomics Knowledgebase (http://www.PharmGKB.org). Of these, 44 genes were profiled on our platform. Four genes, BRCA1, CYP2D6, CYP3A5, and GSTT1, were significantly (FDR<0.05) associated with at least one eQTL in our dataset (see online supplementary table S1). Compared to genes outside this list, the expression of VIP genes was significantly more likely to be controlled by eQTLs (p=0.002) (see online supplementary table S4). To expand this analysis to other important pharmacogenes, we tested the association between eQTLs and 409 genes encoding major phase I/phase II drug metabolism enzymes, transporters, as well as nuclear factors regulating pharmacogene expression.18 Among these 409 genes, 341 were profiled in our study, and 17 genes (see online supplementary table S1) were found to be significantly associated with at least one eQTL. Similar to the VIP genes, this was also a significant enrichment (p=2.8×10−8) (see online supplementary table S4). Notably, a number of glutathione-S-transferase (GST) genes including GSTA4, GSTM1, GSTM2, GSTM2P1, GSTM4, GSTM5, GSTT1, and GSTT2 were found to be significantly associated with at least one eQTL (see online supplementary table S1).

Experimental validation of the gene expression and SNP gene expression associations

In order to confirm the findings using independent techniques, we quantified mRNA levels of seven randomly selected genes that were significantly associated with eQTLs, using qPCR. Among these seven genes, five (DDT, ERAP2, MRPL43, FADS1, and BRCA1) were significantly associated with cis-SNPs, and two (CCND2 and PTPRE) were associated with trans-SNPs. The qPCR measurements were then correlated with gene expression profiles in the microarray. We found that qPCR measurements of all seven genes were significantly correlated with their microarray profiles (p<5×10−4 for all). To confirm the SNP genotypes determined by the DNA chip, we also performed Sanger sequencing for the seven SNPs significantly associated with these genes’ expression. We found that the genotype of one SNP (rs1006771) had 100% concordance with the DNA chip data, while the remaining six SNPs all had 98% concordance, with a discrepancy between sequencing and DNA chip results in one sample for each SNP. Note that these discrepancies occurred in different samples, with no apparent pattern. Meanwhile, the qPCR measurements of six genes (DDT, ERAP2, MRPL43, FADS1, BRCA1, and PTPRE) were also significantly associated with the originally identified SNPs (p<0.03 for all), while no association between the qPCR level of CCND2 and the originally identified trans-SNPs was observed (p>0.18 for all) (table 2).

Table 2

qPCR validation of gene expression and SNP-gene correlation

To further validate the reliability of the eQTLs mapping in our study, we also experimentally validated the aforementioned seven SNP-gene pairs in an independent liver tissue set (n=54). Again, gene expression and SNP genotype were determined using qPCR and sequencing. We found that all five cis-eQTLs identified in the original sample set were also significantly associated with gene expression profiles in the new sample set (p<0.008 for all), while the two trans-eQTLs were not (p>0.4 for both).

Discussion

Previous studies on eQTLs mapping in human liver have demonstrated its power for understanding inter-individual variability in disease aetiology and response to therapeutic treatments.5 ,21 We have performed, for the first time, an eQTLs study in human livers of a Han Chinese population. This unique dataset may extend our understanding of the genetic basis underlying the variability in gene expression, aetiology of human diseases, and various traits in pharmacotherapy.

Although we have a relatively small sample size (n=64), our study was able to replicate a large proportion of hepatic eQTLs identified in previous studies, suggesting the high quality of our dataset. This is also confirmed by our experimental validation. Without considering the SNPs in linkage disequilibrium, 41% of significant eQTLs in Asians were consistent with those found in Caucasians. However, by including imputed genotypic data, this number increased to 60%, further indicating that the regulatory mechanism underlying many SNP-gene expression correlations is shared across human populations. By dividing the eQTLs into cis (±2 Mb) and trans, at the 10−4 level, about 30% of Asian cis-eQTLs were also Caucasian cis-eQTLs, while only 0.023% of Asian trans-eQTLs were consistent with Caucasian findings. This further highlighted the reliability of cis-eQTLs mapping as indicated in previous studies.5 This inference was also confirmed by our experimental validation studies conducted in an independent sample cohort. However, in spite of the small proportion of overlapping trans-eQTLs between the two populations, this overlap still represented a statistically significant enrichment, suggesting that these trans-eQTLs are likely to be true signals.

eQTLs information is considered important for establishing the causal role of genetic variants and genes involved in human disease.4 ,5 ,26–28 Our data further confirmed the previous findings that trait-associated GWAS-SNPs are more likely to be eQTLs.21 More importantly, we hypothesised that the Asian eQTLs may be particularly useful for understanding the mechanism underlying genotype–phenotype correlations in Asian populations. Indeed, we found that GWAS-SNPs identified in Asian populations were significantly enriched in our eQTLs. One interesting example is the SNPs associated with HBV infection. HBV is endemic in East Asian populations, and over 75% of the world's estimated 350 million carriers are located in Western Pacific and South East Asian countries.29 The prevalence rate of HBV infection in Han Chinese is extremely high (up to 12%).29 Previous GWAS identified two HLA-DP loci, rs3077 in HLA-DPA1 and rs9277378 in HLA-DPB1, that were significantly associated with increased risk for HBV infection in Asian populations.23–25 Our data revealed that these two SNPs were actually significant eQTLs for HLA-DPA1 and HLA-DPB1 gene expression. More interestingly, the allele frequencies of rs3077 and rs9277378 are significantly different between Caucasian, Asian, and African populations, with the rare alleles among Caucasians being common alleles in Asians and Africans according to the HapMap data (MAF for rs3077 is 0.11, 0.61, and 0.76, and for rs9277378 is 0.29, 0.54, 0.82 among the three populations, respectively). This may indicate that altered hepatic gene expression of these two genes among these populations confers differential risks for HBV infection. This is further supported by the epidemiological observation that both Asian and African populations have much higher HBV infection rates than Caucasians.29

We also observed that a few SNPs (rs174547, rs174548, and rs174549) consistently identified by GWAS in both European and Asian populations were associated with multiple metabolic perturbations and lipid metabolism traits.30–32 We found that these SNPs were significantly associated with gene expression of FADS1, one of the fatty acid desaturase genes. Our finding is further supported by studies in Caucasian populations.26 Both our original and validation studies in additional samples confirmed the significant association between rs174547 and FADS1 gene expression. This further indicates that mechanisms underlying expression regulation of many genes are actually shared by different populations, and consequently the associated diseases and traits may have a common natural history.

Besides diseases and trait-associated SNPs and genes, we also tested the association between Asian eQTLs and important pharmacogenes. As expected, four VIP genes and 17 major genes involved in pharmacokinetics were significantly associated with at least one eQTLs. This underscored the importance of our dataset in understanding the inter-individual difference in drug response and toxicities. We observed that hepatic expression levels of a group of GST genes were significantly affected by eQTLs. Expression of several important P450 genes including CYP2D6, CYP3A5, CYP3A7, and CYP4V2 were also found to be controlled by eQTLs. Although high impact polymorphisms from these genes have been identified in previous studies,33 our study provided new candidate polymorphisms that may be important to pharmacogenetics in Asian populations. CYP3A7, the most important P450 gene in fetal liver,34 was significantly associated with multiple SNPs; further investigations are therefore warranted to address the question of whether these polymorphisms confer susceptibility to the inter-patient differences in drug efficacy or toxicity in paediatric populations.

Replication of eQTLs results between populations has often been observed to be highly variable in different studies, which could be attributed to many reasons.4 We found in our study that differences in allele frequency can significantly affect eQTL replication in different populations. An SNP with a higher allele frequency in one population may also have greater power to show association with gene expression than the same SNP at a lower allele frequency in another population. In addition, as we calculated in the power analysis, the small sample size in our study might also limit the power for detecting eQTLs with moderate effect, which would further lead to non-replicable eQTLs. Our sample set was also limited by the incomplete covariate information (demographic, clinical, etc) collected during the sample procurement process. Our future goal is therefore to collect a larger sample set, as well as to perform multi-population meta-analyses to address these questions.

In conclusion, our first eQTLs analysis in the East Asian Han Chinese population revealed both homogeneity and heterogeneity in genetic variations in gene expression among different human populations. Many of our findings provided further supportive evidence for recent genomic discoveries in human diseases and pharmacogenetic traits, and more importantly fostered new rationales for continued investigation. Our data thus provide an additional valuable resource to the existing data in other populations.

Acknowledgments

We are deeply grateful to all the participants as well as to the doctors working on this project. This work was supported by the National Natural Science Foundation of China (No. 81000188, 81270557).

Footnotes

  • XW, HT, and MT contributed equally to the work.

  • Contributors ZHP supervised sample recruitment. XLW, HMT, MJT and ZQL conducted data analyses and drafted the manuscript. JWF, LZ, XS, JMX, GQC, DWC and ZWW recruited samples. THX, JYZ, LH, SYW, XP and SYQ performed or contributed to the main experiments. All authors critically reviewed the manuscript and approved the final version.

  • Funding The Natural Science Foundation of China.

  • Competing interests None.

  • Patient consent Obtained.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Additional data file The full eQTLs dataset is accessible in an online data resource (http://analysis2.bio-x.cn/SHEsisMain.htm). The data were also deposited in the NCBI GEO database (accession numbers: GSE53792). The following additional data are available. Supplemental file 1 contains table S1, which provides a full list for association results at a study-wise significance level (FDR <0.05) for gene expression and genotyping data. Supplemental tables S2–S5 include enrichment analyses and primer sequences.