In African-Americans (AAs), the SNP8 G/G genotype was never observed in any of the control subjects in Sample 1 (n = 48), the newly-recruited unrelated controls in Sample 2 (n = 125), or the additional related unaffected pedigree subjects (n = 128; see Table 3) (i.e., in total there were 0/301 observations); this genotype was, however, observed in unrelated cases (e.g. 4/370, for AD). We cannot completely exclude the possibility that the difference between cases and controls is attributable to sampling bias and the findings are false positive, but we conjecture that it is more likely that this genotype is related to phenotype in AAs, although there is not enough information in this observation alone to support such a conclusion. Because this genotype is so rare (1%), some association methods with insufficient power might not detect a true genotype-phenotype association (in fact, evaluated by exact test as a case-control genotype frequency distribution comparison, this comparison shows p = 0.391 and 0.225 for AD vs. controls and DD vs. controls, respectively). Thus, in the present study, several association methods with differing power and that take different views of the data were applied and compared. The most powerful HWD test suggests that the ADH4 variant (rs2226896) might play an important role in risk for substance dependence (including AD and DD) in AAs, probably via a recessive genetic mechanism. The association between this variant and phenotype is population-specific, that is, it appears in AAs, but not in EAs. This association herein first discovered in AAs is a complementary finding to a previous set of genotype-phenotype relations we described for other markers at this locus in EAs [6, 7]; based on this result, we can now provisionally conclude that ADH4 affects SD risk in both EAs and AAs, but different variants are important in the different populations. 
It would be of great interest to study this variant in other populations, e.g., Asians, to further characterize the population specificity we report here. This variant is independent of the other seven polymorphisms that were reported previously to have no association with substance dependence in AAs [6, 7].
ADH4 gene variation is thought to influence the risk for AD by modulating ethanol metabolism. However, we find that it is associated with DD too. This is reasonable because DD has many features in common with AD, which are reviewed above, and because the development of AD and DD might share some pathophysiological mechanisms. The ADH4 enzyme (π ADH) catalyzes the conversion of its substrates (which include, e.g., the norepinephrine aldehydes 3,4-dihydroxymandelaldehyde (DHMAL) and 4-hydroxy-3-methoxymandelaldehyde (HMMAL)) to the intermediary glycols of norepinephrine metabolism, 3,4-dihydroxyphenylglycol (DHPG) and 4-hydroxy-3-methoxyphenylglycol (HMPG), respectively. This catalysis is considerably more efficient via this isozyme than via any of the class I isozymes (α, β and γ ADHs), and class III ADH (χ ADH) does not have any detectable catalytic activity towards these substrates at all [37]. Increased π ADH activity (e.g., through genetic variation, such as, potentially, the SNP8 G/G genotype) could lead to increased levels of DHPG and HMPG and a very high turnover of norepinephrine aldehydes. To block the turnover of norepinephrine aldehydes, one might perhaps self-administer ethanol to compete with DHPG and HMPG, because ethanol is an external competitor for internal DHPG and HMPG on π ADH [37]. This mechanism could lead to AD. Cocaine, which partially functions as a norepinephrine re-uptake inhibitor, can activate the noradrenergic system [38]. Plasma epinephrine and norepinephrine concentrations were significantly increased in response to cocaine injection [38]. Intravenous opioids stimulate norepinephrine and acetylcholine release in cerebrospinal fluid [39]. Therefore, self-administration of drugs (cocaine and opiates) could elevate norepinephrine aldehydes too, which may lead to DD.
A family-based association study is robust to population stratification effects. Thus, in the present study, subjects of different ethnicities, including EA, AA, Hispanic, and others, together with their affected and unaffected parents, were combined in the analysis to increase statistical power. Allowing for the possible population specificity of the association, we also performed this analysis separately within EAs and AAs. However, the family-based association studies revealed no significant association between this ADH4 variation and substance dependence (in either Sample 3 or Sample 4). This is likely due to limited statistical power, given the small sample size (Sample 3: 115 affected offspring and 221 parents) and the fact that the SNP8 variant has a rare minor allele (Sample 3: frequency ≤ 0.049 in AAs and ≤ 0.077 in EAs; Sample 4: frequency ≤ 0.020 in AAs and ≤ 0.053 in EAs). Additionally, only heterozygous parents yield TDT information, which further limits the power of the family-based association studies.
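The scale of this limitation can be illustrated with a rough calculation. The sketch below is not part of the original analysis: it assumes Hardy-Weinberg proportions among parents, treats the reported upper bounds on minor allele frequency as point values, and applies each frequency to all 221 Sample 3 parents as if they came from a single population, which overstates the count for either group. The function name is illustrative.

```python
def expected_het_parents(n_parents: int, maf: float) -> float:
    """Expected number of heterozygous parents under HWE: n * 2pq."""
    return n_parents * 2.0 * maf * (1.0 - maf)

# Sample 3: 221 parents; reported MAF upper bounds of 0.049 (AAs) and 0.077 (EAs).
# These are assumptions used as point values purely for illustration.
for label, maf in [("AA", 0.049), ("EA", 0.077)]:
    n_het = expected_het_parents(221, maf)
    print(f"{label}: about {n_het:.1f} informative (heterozygous) parents at most")
```

Even under these generous assumptions, only a few dozen parents at most would be expected to contribute transmissions to a TDT.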
In the present study, our case-control sample (820 cases; 483 controls) has approximately five times the power of the family Sample 3 [10]. However, neither allelewise nor genotypewise case-control comparisons showed any significant association between ADH4 variation and substance dependence. The case-control design is theoretically vulnerable to population stratification that could result in false negative findings. We therefore used the structured association (SA) method [15] to exclude population stratification and admixture effects on associations. The results did not change substantively after controlling for population stratification and admixture effects, i.e., both were negative. Similarly, despite taking into account the potential confounding effects of age and sex via regression analysis, no association between this variation and phenotypes was detected. Additionally, the low detected admixture degrees in EAs (1.6%) and AAs (3.9%) (which may have appeared especially low, particularly in the AAs, because of lack of inclusion of an ancestral African population) suggest that admixture effects should not have substantially affected the analysis in this study. It is possible that the negative findings from the case-control association might, like the TDT analysis and the FBAT analysis, result from insufficient power.
For this particular marker, the allele frequency of the rare allele is higher for EAs than for AAs (0.077 vs. 0.046, for the control subjects). EAs are therefore expected to have a substantially higher frequency of rare homozygotes than AAs – 0.006 vs. 0.002, i.e. about three times as many. Therefore, we specifically considered possible European admixture in the four homozygous AA patients. We found that the European ancestry proportions in these four AA subjects were less than 0.72%, indicating these observations of the rare homozygote are unlikely to be related to the genomewide European admixture in these AA subjects.
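The expected homozygote frequencies quoted above follow directly from Hardy-Weinberg proportions (the square of the rare-allele frequency). A minimal check, using only the control allele frequencies given in the text (the function name is illustrative):

```python
def expected_rare_homozygote_freq(q: float) -> float:
    """Expected frequency of the rare homozygote under HWE (q squared)."""
    return q * q

ea = expected_rare_homozygote_freq(0.077)  # EA controls
aa = expected_rare_homozygote_freq(0.046)  # AA controls
print(f"EA: {ea:.3f}, AA: {aa:.3f}, ratio: {ea / aa:.1f}")
```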
HWD at SNP loci in the case sample could be an indicator of gene-phenotype association [7, 9, 35–42]. Cases are ascertained due to their "affected" status, so disease susceptibility genotypes or alleles should be present at high rates in the case sample, which might violate HWE. Further, because cases are not randomly sampled from the general population where there is random mating and 2N alleles among N subjects are independent, HWE of disease-related marker loci in cases could be violated, and 2N alleles could become dependent. Only when the marker has no LD with the disease locus, i.e., the marker genotype frequency distributions are independent of the diagnosis, can the case group and the control group have the same genotype frequency distributions, with both in HWE. Therefore, the HWD of SNP8 in AD and DD among AAs in the present study suggests an association between SNP8 and both AD and DD. Usually, susceptibility loci are in HWD in cases, but in HWE in controls [7], as observed for SNP8. This is because a much greater sample size is needed to detect HWD in controls than in cases [9]. If the predisposing effect of the disease susceptibility allele is strong enough and the sample size for controls is large enough, this locus could also be in HWD in controls, but with an excess of the protective genotype, the opposite of the situation for cases [9]. SNP8 does not, apparently, have a strong enough effect on risk to distort HWE in controls; alternatively, the size of the control sample is not large enough to detect HWD in that sample.
Additionally, substance dependence significantly increases mortality [43–45], leading to age cohort-related dropout of the disease-associated genotypes or alleles from the population (i.e., natural selection). Selection by mortality may violate an assumption for HWE and cause altered distribution of genotype frequencies (i.e., HWD) [46]. This dropout makes the risk genotype or allele rarer, but the risk genotype or allele is still more common in cases than in controls, consistent with what we observed for SNP8, and providing additional evidence that SNP8 might be a disease-associated locus.
The magnitude of the HWD test statistic varies with the distance between the marker loci and the disease locus; that is, deviations from HWE are greatest at trait susceptibility loci, and can also be detected for benign polymorphisms that are in LD with the susceptibility locus [35, 40]. In the present study, SNP8 is in HWD in cases, which suggests that the risk locus for substance dependence might be located in the SNP8-containing haplotype block or be SNP8 itself.
The association evident from the HWD test was not detected by case-control frequency comparison. This is because the HWD test as an association method is sometimes much more powerful than case-control comparison [7], and the present study provides an extreme example. One reason is that, from a statistical perspective, the HWD test in cases has one degree of freedom (df = 1), rather than df = 2 for case-control genotype frequency comparisons. Another reason for the greater power of the HWD test in the present study, from an epidemiological viewpoint, is the age difference between cases and controls, as discussed in our previous study [7]. In this study, the average age of controls was 29.7 years, about 10 years younger than that of cases (39.6 years). Many healthy controls, although presently unaffected, have not completely passed through the age of risk for manifesting AD or DD. The healthy controls have a probability (≈ lifetime prevalence of disease; less than the cumulative prevalence by the subject's age) of developing disease at some point in the future, and this probability increases with the residual prevalence, so that the case-control association design may be less powerful than a case-only study using the HWD test. That is, some associations that can be discovered by the use of a case-only study might not be detected using a case-control design. Meanwhile, because the dropout of disease-related genotypes or alleles increases with the age of cases (due to increasing mortality), an HWD test that reflects the dropout could be more sensitive in detecting this disease-related locus, especially when cases are much older than controls.
The HWD signal of a marker locus decays more rapidly with distance from a causative locus than the LD signal [35]. The closer a marker is to the causative locus, the greater the excess of power for the HWD test over the LD test. Nielsen et al. [40] demonstrated that the HWD method was more powerful than the LD method under certain conditions (recessive and additive models), which was also supported by many other studies [7, 9, 35, 47–49]. Kocsis et al. [50] demonstrated that even in the absence of significant differences in genotype frequency distribution between cases and controls, associations can be detected by HWD, as observed for SNP8 in the present study. This is particularly true for a trait-associated marker that acts via a recessive mode of inheritance, because the effect of the recessive allele (i.e., the disease-risk allele) can be "masked" by the dominant allele (i.e., the non-disease-risk allele), which yields negative results in case-control frequency comparisons [under HWD, the two alleles are dependent and affect each other]. From the formula of the goodness-of-fit χ² test for HWD, the χ² value (HWD statistic) is proportional to the sample size (N) and to the squared difference between the observed and expected genotype frequencies (Δ²), and inversely proportional to the expected genotype frequencies. Thus, even when Δ is small, one rare genotype could generate a high HWD statistic. If the expected count of this rare genotype is less than 5, we use an exact test; the exact p value is usually consistent with that from the goodness-of-fit χ² test, as seen in additional file 1. This is why the HWD test is especially sensitive to a marker with one rare genotype.
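The sensitivity of the goodness-of-fit statistic to a single rare genotype can be illustrated with a short sketch. The genotype counts below are hypothetical and are not the study's data; the function name is likewise illustrative. The p value uses the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(√(χ²/2)).

```python
import math

def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Goodness-of-fit chi-square test (df = 1) for HWE from genotype counts.

    Returns (chi2, p_value, expected_counts).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    return chi2, p_value, expected

# Hypothetical counts: common homozygotes, heterozygotes, rare homozygotes.
chi2, p_value, expected = hwe_chi_square(350, 16, 4)
print(f"expected rare homozygotes: {expected[2]:.2f}")
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

In this hypothetical case the expected count of rare homozygotes is well below 5, so, as the text notes, an exact test would be the appropriate choice in practice; the χ² value is shown only to illustrate how a small expected count in the denominator inflates the statistic.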
The HWD test might not be more powerful than the LD method in detecting gene-disease association when a trait-associated marker acts via a multiplicative mode of inheritance, because the HWD test would have very little power under this disease model [35, 40, 41, 47–49, 51]. However, SNP8 is unlikely to act via a multiplicative model in the present study (p ≤ 0.007, see additional file 1). A new method, the weighted average (WA) statistic test, has been reported to be even more powerful than the HWD test in detecting association between disease susceptibility and marker loci under many genetic inheritance models, including the recessive, additive and multiplicative models [49]. However, application of this method is beyond the scope of the present study.
The HWD test can not only detect gene-phenotype association, but can also reflect a genetic disease model, because the direction of HWD statistics (Δ, F and J) varies with the genetic model [9, 35, 36]. In the present study, the Δ, F and J for SNP8 are positive, suggesting that SNP8 appears to follow a recessive genetic disease model.
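As an illustration of what the sign of these statistics reflects, the sketch below computes Δ and F using their standard population-genetic definitions, which are assumed here rather than taken from this section: Δ is the observed rare-homozygote frequency minus q², and F = Δ/(pq). J is omitted because its definition is not given in this section. The counts are hypothetical.

```python
def hwd_delta_and_f(n_aa: int, n_ab: int, n_bb: int):
    """Return (delta, F) for a biallelic marker from genotype counts.

    delta = observed frequency of the bb homozygote minus q**2
    F     = delta / (p * q)
    Positive values indicate an excess of homozygotes relative to HWE.
    """
    n = n_aa + n_ab + n_bb
    q = (2 * n_bb + n_ab) / (2 * n)  # frequency of allele b
    p = 1.0 - q
    if p * q == 0:
        raise ValueError("marker is monomorphic; delta and F are undefined")
    delta = n_bb / n - q * q
    return delta, delta / (p * q)

# Hypothetical case sample with an excess of rare homozygotes.
delta, f = hwd_delta_and_f(350, 16, 4)
print(f"delta = {delta:.4f}, F = {f:.3f}")
```

Under these definitions, an excess of homozygotes gives positive Δ and F, which is the direction the text associates with a recessive model; an excess of heterozygotes would give negative values.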
We also identified the genetic disease model for SNP8 with the best fit to the genotypic proportions observed in patients and controls using the Mathematica Notebooks written by Wittke-Thompson et al. [9]. Consistent with the above inference, the "best-fit" model for SNP8 is a recessive model. This model-fitting method can not only identify the genetic model, but can also tell us that other explanations for the observed HWD, including chance, genotyping error, and/or violations of the requisite assumptions of HWE, are less likely, if one "best-fit" model can be identified [9]. However, it should also be noted that just because an observed HWD is consistent with a "best-fit" genetic model does not completely guarantee that errors, missing data patterns, or violations of HWE assumptions do not generate or contribute to the observed HWD. Actually, HWD can be attributable to a combination of factors [9]. But the following analyses further support the interpretation that non-disease factors underlying HWD are less likely to be important in explaining our data. We note, though, that it might take a very large sample to fully support this conclusion.
Signal intensity, background noise, and clustering properties all play a role in the ability to assign genotypes correctly, and in determining the types of errors that occur [52]. Genotyping error is one of the greatest concerns as a cause of spurious HWD. First, DNA contamination can result in the lack of one homozygote in the PCR product, which leads to a deficiency of observed homozygotes [52]. This runs counter to our data and thus cannot explain the present findings. Second, incomplete digestion of PCR product (relevant only when using the RFLP technique, not the TaqMan technique) or poor amplification of one of the alleles will lead to heterozygous genotypes being read as homozygous genotypes [53]. This kind of allele dropout can lead to an excess of apparent homozygous genotype observations, which does fit our data and thus needs to be considered. Also, when genotypes are read, heterozygous genotypes could, theoretically, be more ambiguous, and therefore more likely to be scored as "missing," than homozygous genotypes. To detect possible genotyping error in the family data, we checked for Mendelian inconsistencies using the program PEDCHECK [54]; none were detected. For all subjects, including family and case-control subjects, we also replicated the genotypes (the most accurate way to estimate genotyping error rates), and all genotypes matched. The rate of missing genotype data was not significantly different between cases and controls. Additionally, controls were tested for HWE and did not show the same direction of HWD statistics as cases. Together, the evidence suggests that genotyping error is an improbable explanation for the observed HWD.
Violation of one of the other HWE assumptions (besides selection of alleles by disease) can also cause HWD. First, genetic drift can cause HWD. Genetic drift is the effect of finite population size [55], such that the smaller the population, the more noticeable the effects of drift. All populations are finite and all genetic variation is subject to genetic drift. In a finite population, allele frequencies fluctuate by chance randomly and the fluctuation leads to deviations from HWE (in this context, this is "sampling error") [56]. If a population is small enough, the effects of drift may overwhelm the other forces described below, even selection. Our AA case sample is large enough such that genetic drift at the disease susceptibility locus and the marker locus can reasonably be ignored. Additionally, in our AA control sample, which is smaller than that of cases, HWE was not violated. Further, our AA samples are representative of the general population [57], which supports the interpretation that HWD in AA cases is probably not due to a sample size issue; however, considering the small number of homozygote observations that are critical in driving the finding, we cannot exclude this possibility. Second, inbreeding can cause HWD. Inbreeding is a type of positive assortative mating which is non-random. Most populations are geographically divided, and mating is local, so inbreeding could be common, but to varying extents. During inbreeding, individuals are more likely to mate with relatives than with non-related individuals. One common consequence of inbreeding is that the number of heterozygotes decreases and the number of homozygotes increases [58], which leads to HWD. Another common consequence of inbreeding is that the expression of deleterious recessive alleles in the population increases, which reduces average fitness and increases mortality ("inbreeding depression") [59], which, as described above, can also contribute to HWD. 
However, our case and control samples are ascertained as unrelated, and we have no evidence for the existence of overlapping generations, making inbreeding unlikely in the present study. Third, gene flow may result in HWD. Gene flow is the result of migration. Immigrants carrying new alleles into the population may change the genotype frequency distribution of that population with resulting HWD in that generation. Contrary to selection and genetic drift, gene flow eventually homogenizes allele frequencies among populations. Although gene flow occurs in most populations, its contribution to major shifts in allele frequencies is usually negligible. The AA population has been in the US for an average of about five generations and we do not have evidence of major immigration for the current AA generation, so gene flow resulting in HWD in our AA sample is unlikely. Fourth, mutation may increase the genetic variability due to genetic drift and might cause HWD. But because change in allele frequencies induced by mutation is so small from one generation to the next, we can safely ignore mutation as a factor in HWD. Unless mutation rates are abnormally high, for which we have no evidence in the present data, the change in allele frequencies is believed to be virtually nil. In conclusion, violation of one of the above HWE assumptions causing HWD is unlikely. However, the caution that these results are driven primarily by a small group of subjects, and that our conclusions would be different if just a few of them were omitted or somehow changed diagnosis, bears repeating; this reliance on a small number of subjects requires us to be very tentative in our conclusions.
In addition to the factors that have been discussed, other factors can also cause HWD. For example, population stratification and admixture can cause HWD, as demonstrated by Luo et al. [6]. However, we have demonstrated that the admixture degrees in our EA and AA samples are relatively low, suggesting that this factor can be ignored as a source of HWD in our case samples. In addition, the cases and the controls are drawn from the same populations, but the controls are in HWE for SNP8, reducing the possibility that HWD in cases results from an effect of admixture. Nonrandom patterns of missing data may also generate a relatively consistent pattern of HWD (e.g., disproportionate missing data in heterozygotes may lead to a consistent pattern of HWD, with an excess of homozygotes, as observed for SNP8). This possibility is common and inevitable and thus should not be ignored. However, our cases and controls were genotyped using the same genotyping systems, reducing the possibility that nonrandom patterns of missing data cause HWD only in cases but not in controls. Further, if the cases or controls are old enough, the marker can be in HWD because a specific mortality-related allele drops out due to death associated with advancing age [51]. However, the ages of our subjects range from 17 to 78 years, making this explanation unlikely. An unrecognized polymorphism in the primer sequences used in PCR may also lead to HWD, with an excess of homozygotes, as observed for SNP8, particularly when the primer polymorphism is in LD with the tested marker [9]. Finally, genomic duplications or deletions (a copy deletion could lead to hemizygosity) can also lead to HWD [9]. We believe that these explanations do not account for our data in the present study, but these factors can be excluded only through extensive sequence analysis.
In conclusion, the presence of HWD for SNP8 suggests that this polymorphism might be a risk locus for substance dependence in AAs, although our direct evidence for this conclusion is weak and a false-positive finding cannot be completely excluded. SNP8 is located in the putative 5' regulatory region of ADH4. It might indirectly modulate risk for disease via LD with an unknown nearby functional variant; e.g., according to the ABI and HapMap databases, it is 2.5 kb from, and in LD with, rs7434491, which could significantly alter the secondary structure of ADH4 mRNA (IDT SciTools: http://www.idtdna.com/SciTools/SciTools.aspx). It might also alter the transcription initiation site or the capacity of transcription factors to bind to the DNA sequence, and consequently directly affect transcription levels; or it might result in mRNA instability, altered translational efficiency, or even different protein expression levels in different tissues. Considering our sample size limitations, we believe that replication of these results is critical. Nevertheless, given our findings, we believe that it would be productive to study the effect of this variation directly on protein expression, in order to provide convergent validation of the findings reported here and to elucidate the specific mechanism underlying the association of SNP8 at ADH4 with both AD and DD.