Evidence of inflammatory immune signaling in chronic fatigue syndrome: A pilot study of gene expression in peripheral blood

Background Genomic profiling of peripheral blood reveals altered immunity in chronic fatigue syndrome (CFS) however interpretation remains challenging without immune demographic context. The object of this work is to identify modulation of specific immune functional components and restructuring of co-expression networks characteristic of CFS using the quantitative genomics of peripheral blood. Methods Gene sets were constructed a priori for CD4+ T cells, CD8+ T cells, CD19+ B cells, CD14+ monocytes and CD16+ neutrophils from published data. A group of 111 women were classified using empiric case definition (U.S. Centers for Disease Control and Prevention) and unsupervised latent cluster analysis (LCA). Microarray profiles of peripheral blood were analyzed for expression of leukocyte-specific gene sets and characteristic changes in co-expression identified from topological evaluation of linear correlation networks. Results Median expression for a set of 6 genes preferentially up-regulated in CD19+ B cells was significantly lower in CFS (p = 0.01) due mainly to PTPRK and TSPAN3 expression. Although no other gene set was differentially expressed at p < 0.05, patterns of co-expression in each group differed markedly. Significant co-expression of CD14+ monocyte with CD16+ neutrophil (p = 0.01) and CD19+ B cell sets (p = 0.00) characterized CFS and fatigue phenotype groups. Also in CFS was a significant negative correlation between CD8+ and both CD19+ up-regulated (p = 0.02) and NK gene sets (p = 0.08). These patterns were absent in controls. Conclusion Dissection of blood microarray profiles points to B cell dysfunction with coordinated immune activation supporting persistent inflammation and antibody-mediated NK cell modulation of T cell activity. This has clinical implications as the CD19+ genes identified could provide robust and biologically meaningful basis for the early detection and unambiguous phenotyping of CFS.

subsets are examined. For example Klimas et al. [2] report a significant expansion CD26+ (DPP-IV) activated T cells in CFS subjects. This multifunctional molecule plays a major role in the regulation, development, maturation and migration of T helper (Th) and natural killer (NK) cells as well as in B cell immunoglobulin switching [5]. Moreover abnormal expression of CD26+ is found in autoimmune diseases [6]. More recently CFS patients were also reported to have significantly fewer CD3+/CD25-T cells and significantly more CD20+/CD5+ B cells [7], a subset associated with auto-antibodies. Significantly fewer CD56+ NK cells were also observed in recent work by Racciatti et al. [8]. Though important, flow cytometry results such as these leave many questions regarding cellular state unanswered. Microarray profiling of gene expression on the other hand offers a glimpse of pathway activation in disease pathogenesis at molecular resolution. Microarray analysis of cDNA profiles in peripheral blood mononuclear cells (PBMC) have revealed altered expression in CFS of several immune genes [9,10] involved in response to oxidative stress, NK cell activity and elements of antigen processing. Instability in immune response and restructuring of immune cell signaling under exercise challenge has also been observed [11]. Unfortunately microarray profiling is commonly performed on mixed cell populations producing an average profile from which it is very difficult to dissect the contributions of relative cell abundance, cell activation state and cell-cell signaling. More importantly, this averaging can obscure significant changes in the state of minority cell subpopulations.
These challenges notwithstanding, a review of this evidence strongly suggests that CFS pathogenesis is likely to include a characteristic immunologic component in at least one subset of the patient population [12]. However the exact nature of this immunologic component remains the object of considerable debate at least in part because of an inability to cast gene expression profiles in the useful context of immune cell demographics. In an attempt to address this issue methods have been proposed to dissect global gene expression profiles into discrete elements assignable to biologic processes [13][14][15]. The assignment of genes to discrete modules or sets has been successful in several respects. A first contribution involves simply reducing the dimensionality of >55,000 gene expression measures to that of say 10 or so gene sets. The interpretability of results is further enhanced by associating sets with basic cellular functions. Finally the numerical robustness is greatly improved through the averaging of changes in expression over many genes. In addition gene sets are transportable across microarray platforms making it possible to compare studies based on different technologies.
In this work we explore the use of discrete gene sets in extracting useful information regarding immune dysfunc-tion in CFS from gene expression profiles of mixed lymphocyte populations. In particular we construct gene sets that capture elements of abundance and activity assignable to specific immune cell subsets thereby facilitating direct integration with flow cytometry results. Data from a large population-based study of CFS [16] is then examined for changes of immune set expression across two separate CFS classification approaches. In addition, patterns of coordinated expression linking these immune sets were investigated using simple correlation networks. These networks were examined for shifts in topology and point to patterns of immune signaling in CFS that are consistent with chronic inflammation. These observations could constitute a signature of CFS or a component thereof.

Subjects and diagnostic classes
Recently a dataset for a 2-day in-hospital study of CFS in the general population of Wichita Kansas was made available [16]. Referred to as the Wichita Clinical study, this investigation included a highly comprehensive spectrum of detailed clinical and laboratory measures and PBMC expression profiles for 20,000 genes. From this dataset a final analysis group of 111 female subjects was obtained by excluding the few male subjects and subjects with confounding medical or psychiatric conditions. Subjects in this dataset were classified as CFS using the CDC Symptom Inventory, Multidimensional Fatigue Inventory (MFI) and Short Form 36 (SF-36) instruments [17,18]. This classification will be referred to as "empiric" and resulted in 39 CFS, 37 non-fatigued (NF), and 35 subjects with insufficient symptoms or fatigue (ISF). A second classification proposed by Vollmer-Conna and colleagues [19] used latent class analysis (LCA) of 440 clinical and biological measurements to delimit 5 fatigue classes, a non-fatigued class and 2 unassigned individuals. Obese subjects with prominent post-exercise fatigue, hypnoea and disturbed sleep formed Class 1. Reasonably healthy subjects with few symptoms, low depression scores and good sleep composed Class 2. Subjects in Class 3 resembled those in class 1 but also displayed low heart rate variability during sleep and low 24-hour cortisol levels. Class 4 was populated with healthier, less depressed individuals having restful sleep but suffering muscle pain. Finally Classes 5 and 6 both captured less obese but highly symptomatic and depressed individuals with prominent postexercise fatigue. Individuals in Class 6 also displayed disturbed sleep with low heart rate variability and low cortisol. The patient demographics for each of these classification systems are summarized in Table 1 and the alignment between these systems is described in Table 2. The collection and processing of PBMCs including hybridization to MWG microarrays (MWG Biotech, Ebersberg, Germany) are described in Vernon and Reeves [16]. Details of the microarray data preprocessing including normalization, outlier detection and false discovery correction are available in Broderick et al. [9].

Gene set development
Extracting elements that represent the abundance and activity of a specific leukocyte subset was approached by identifying discrete sets of genes that are uniquely or predominantly expressed in a given cell type [20][21][22]. Currently discrete gene sets offer the simplest and most immediately accessible method for analysis across microarray technological platforms. We constructed a number of gene sets a priori for CD4+ T cells, CD8+ T cells, CD19+ B cells, CD14+ monocytes and CD16+ neutrophils using data collected on Affymetrix microarrays (Affymetrix, Santa Clara, CA, USA) by Lyons et al. [23]. Of the 12,022 genes surveyed, 2,641 were differentially expressed between individual lymphocyte subsets. Of these original 2,641 distinguishing genes, 268 were present on the MWG microarrays used in the Wichita Clinical study. We further dissected these subset-specific profiles into discrete non-overlapping sets composed of genes at least 2fold up-regulated or 2-fold down-regulated preferentially in each cell lineage. An additional gene set was defined for NK cell activity and regulatory T cell activity was estimated from the expression of the FoxP3 gene (AF277993). Individual MWG gene probes belonging to each immune gene set as well as NCBI gene annotation and PANTHER functional annotation [24,25] are listed in the supplementary data file [Additional file 1].

Statistical analysis
The aggregate expression G a of each gene set a was computed as the average of the Ln-transformed expression Ln(g i, a ) of each gene i across the k member genes in the set (Equation 1). In a first level of analysis a classical Wilcoxon non-parametric test was used to evaluate the differ-     ential expression of immune gene sets for both classification systems. As suggested by Efron and Tibshirani [15] the performance of these gene sets was also compared to that obtained with randomly populated sets of the same size. A null distribution was computed from the analysis of 1000 instances of random gene selections and 1000 random permutations of the diagnostic labels.
To examine the patterns of association linking immune gene sets simple linear association networks were constructed using the Pearson correlation coefficient r a, b as the metric describing similarity in the expression of gene set a with that of gene set b. Statistical significance of correlation was assessed using the t a, b statistic in Equation (2). This statistic has a Student's t-distribution with degrees of freedom n-2 under the null hypothesis of no correlation [26], where n is the number of microarray measurements. C a, b is the covariance in the expression of gene set a with gene set b and E() is the expected value operator or the mean.

Where,
A cutoff for the resulting probability p a, b (t > t a, b ), above which we accept the null hypothesis, can be established in a variety of ways [27]. It should be noted however that these require specific assumptions regarding network topology such as network edge sparseness or the appearance of highly cliquish disconnected sub-networks. As such they are generally more relevant to the study of large networks. Instead we examined the dependency of the network size S, or the sum of the edge weights w a, b on the choice of threshold p-value (Equation 3). We compared curves obtained for NF and CFS networks, identifying threshold p-values where networks differed primarily in structure from those where they differed in both structure and size.

Alignment of empiric and LCA classifications
A cross tabulation of the empiric classification and LCA classification is presented in Table 2. There was good alignment of non-fatigued subjects with 90% of empiric NF controls residing in LCA classes 0 (Well) and 2 (Unassigned). Together LCA classes 1 (40%) and 5 (26%) contained two thirds of the subjects assigned to the empiric CFS class. However ISF subjects were distributed almost equally across LCA classes 1 (23%), 3 (31%), and 4 (26%). Conversely most LCA class 3 and 4 subjects were identified as ISF and most subjects in LCA classes 1, 5 and 6 were assigned an empiric CFS classification.

Differential expression of a priori defined immune cell gene sets
In a first level of analysis the differential expression of immune gene sets across disease phenotypes and control groups for both classification systems was evaluated. Results in Table 3 show that the median expression of the CD19+ B cell up-regulated gene set was significantly lower in CFS (p = 0.01) and ISF (p = 0.05) subjects when compared to the NF group. Expression of this gene set was also significantly repressed in LCA class 3 (p = 0.04) and marginally so in LCA class 5 (p = 0.09) when compared to control subjects in LCA classes 0 and 2. Recall that 11 of 17 cases in LCA class 3 were also designated ISF. Similarly 10 of the 14 LCA class 5 cases were designated CFS. NK gene set expression was marginally increased in the CFS group (p = 0.07). Though not significant the null probability for NK cell expression was lowest among the LCA classes for LCA-3 (p = 0.11). Finally expression of the T regulatory set (FoxP3) was marginally repressed in LCA class 1 (p = 0.09) which contained 40% of the CFS subjects though no significant difference was found for the larger CFS group (p = 0.31).
The performance of these gene sets was compared to that obtained with randomly populated sets of the same size as well as by random classification assignment of each subject. Null distribution results indicated that both the CD19+ B cell (up-regulated) and NK cell gene sets performed significantly better than random sets of equivalent size in discriminating CFS from NF (p < 0.05) (Figure 1). Performance of the T regulatory gene set (FoxP3) was marginal at best (p~0.15) in terms of uniqueness in differential expression. In addition a detailed analysis of individual genes in the CD19+ up-regulated set indicated that no single gene was differentially expressed even though the parent set was expressed at the p = 0.01 level. This reaffirms that high levels of measurement noise can be effectively managed by aggregating genes into biologically relevant sets. Details of this analysis are listed in Table 4 and illustrated graphically in Figure 2 and Figure 3.

Emergence of characteristic patterns of association between immune gene sets in CFS
Conventional analysis of microarray data remains focused on the detection of differentially expressed genes or gene sets. However it is important to realize that genes expressed at similar levels across patient groups may still play an important role in the disease process. To examine the patterns of association linking immune gene sets simple linear correlation networks were constructed. Results in Figure 4 show network size for the empiric NF and CFS classes as a function of cutoff p-value for edge weight significance. Both NF and CFS networks were identical in overall size at a cutoff p-value of 0.05. Changes at this level of edge weight significance consisted therefore of a Cumulative probability plot of Δ differential expression of CFS versus NF for random gene sets similar in size to the CD19+ B cell up-regulated gene set and the NK cell gene set re-organization of edges only. The curves in Figure 4 diverge at p~0.10 and maintain a similar offset form one another as p-value increases. As a result comparisons of network topology conducted at the p < 0.10 level included edge re-assignment as well as the addition of new edges to the CFS network. Topologies emerging at both p < 0.05 and p < 0.10 thresholds were examined as they contain complementary information. Detailed results of pair-wise correlation between gene sets may be found in Table 5 for  Ln Transformed Gene Expression  Tables 6 and 7 for the LCA classification system.

3-Class (NF Controls
Heat maps depicting the edge weights ra, b linking gene sets are presented in Figure 5 for both empiric and LCA classes. The LCA control classes 0 and 2 exhibited a pattern of gene set co-expression at the p < 0.10 identical to that of the empiric NF group though in the latter this pattern was also retained at the p < 0.05 level. In both the NF and LCA-0/LCA-2 networks CD 19+ B cell up-regulated and down-regulated gene sets correlated tightly behaving as one set (r = 0.43, p = 0.008). T regulatory and NK cell gene sets both supported significant positive interaction with one or both of the CD19+ B cell sets. In addition CD8+ T cell activity and CD14+ monocyte activity were significantly antagonistic. In contrast the network obtained for the CFS subjects displayed a shift in interactions towards the upper left hand corner of the heat map. Indeed significant interactions appeared linking the expression of the CD14+ monocyte gene set with that of the B cell (CD19+ up-regulated) and the CD16+ neutrophil gene sets. The neutrophil set also shared significant co-expression with the CD8+ T cell gene set (p < 0.05) in CFS. Interestingly interactions with the neu-trophil gene set were completely absent in NF even at the p < 0.10 level. Also apparent in CFS was the emergence of a significant negative correlation between the expression of CD8+ and CD19+ up-regulated gene sets (p = 0.02). Moreover CD19+ B cells appeared altered with up and down-regulated sets no longer maintaining a strong direct correlation in ISF or CFS. Interaction with NK cell gene set expression was also a distinguishing feature in particular for ISF. Instead of appearing as a transitional state between NF and CFS, ISF exhibited a distinct co-expression pattern characterized by a significant interaction of NK cell and monocyte gene sets (p < 0.05). Contrary to NF, the NK and CD19+ down-regulated gene sets correlated negatively (p < 0.10) in ISF.
In much the same way as the ISF group, several of the LCA groups were characterized by a lack of coordinated activity between immune gene sets. Indeed no significant correlations existed for LCA-4 even at the p < 0.10 level. This was also true of LCA-3 and LCA-6 classes at the p < 0.05 level. Though also quite sparse, heat maps for LCA-1 and LCA-5 each recovered specific features of the CFS and ISF heat maps. LCA-1 demonstrated a significant positive correlation (p < 0.05) between CD14+ monocyte and CD16+ neutrophil sets, a CFS feature. At the p < 0.10 level the same heat map showed a negative correlation between the NK and CD19+ B cell down-regulated set, an ISF feature. Unique to LCA-1 was a positive correlation between T reg (FoxP3) and CD14+ monocyte gene set expression (p < 0. 10

Ln Transformed Gene Expression
Gene Set CD19+ Up-regulated correlation linking CD19+ B cell with CD14+ monocyte up-regulated sets (p < 0.05) along with a strong negative correlation linking the former with CD8+ gene set expression. These 2 features were not shared with the LCA-1 group. These results reaffirmed the strong links between the CFS subject group and LCA classes 1 and 5 in addition to suggesting that immune set co-expression might offer insight into the distinct nature of these apparent subclasses of CFS.

Discussion
In this work we dissected PBMC gene expression profiles into components that were preferentially expressed in several isolated lymphocyte subpopulations. We also used 2 systems to stratify subjects into illness groups. The LCA class structure was inferred directly from a comprehensive set of clinical and biological indicators. All indicators were equally weighted and contrary to common practice no subset was assigned greater relevance a priori. In contrast the empiric classification which was based on a consensus of opinions from expert clinicians. Results confirm strong links between both systems with the LCA classifica-tion providing additional insight into potential subclasses of CFS. The commonalities between these classification systems are readily observed in the patterns of gene set coexpression. Indeed the empiric CFS group seems to present an aggregation of the gene set co-expression patterns observed in LCA classes 1 and 5. However, the differential expression of gene sets only achieves statistical significance in the case of the coarser empiric classes with the larger group sizes providing better noise reduction. Specifically in the empiric CFS class we found a significant decrease in the median expression for a set of 6 genes preferentially up-regulated in isolated CD19+ B cells compared to non-fatigued controls. Expression of this CD19+ B cell up-regulated gene set also discriminated ISF from controls at 0.05 confidence level. In a recent study of CFS occurrence both in the presence and absence of viral infection Racciati et al. [8] found no significant differences in CD19+ cell abundance. Robertson et al. [7] recently reported significantly higher abundance of CD20+/CD5+ B cells, a subset associated with the production of autoantibodies, in patients with depression. These findings together with our observations of depressed CD19+ gene  correlation r a, b (null probability p a, b )   expression and altered association between up and downregulated B cell functions would suggest that the function of these cells might be compromised in CFS subjects. Cole et al. [28] reported a selective reduction of mature B lymphocyte function in subjects who experienced chronic high levels of social isolation including suppression of several transcription factors involved B cell differentiation such as Ikaros/ZNF1A1. Genes encoding for members of the zinc finger protein family were also identified in previous work by this group as prominent contributors to the CFS symptom space [9]. A closer look at the 6 genes that constitute the CD19+ up-regulated set showed that the PTPRK and TSPAN3 genes, both associated with immune cell adhesion and development, were the most suppressed. Down-regulation of PTPRK, a TGF-β target gene, is known to be down-regulated by the Epstein-Barr virus (EBV) [29], an infectious agent known to trigger CFS [30,31]. Down-regulation of TGF-β has been reported in CFS by Tomoda et al. [32].
NK cell activity is suppressed in CFS [33] and this decreased cytotoxity has been associated with reduced intracellular perforin [34]. In this work we observe an increased expression of the NK cell gene set. Of the 4 genes used to capture NK cell function the expression of NKG2A/C (NM 002260) was most increased. The binding of NKG2A to its natural ligand, human non-classic class I leukocyte antigen (HLA) E, is known to induce its immunoreceptor tyrosine-based inhibition motif (ITIM) and suppress cytotoxic cell effector activity [35]. Moreover NKG2A is also known to be co-expressed on activated Th2 but not Th1 lymphocytes [36]. A bias towards Th2-type immune response in CFS patients has also been suggested on the basis of intracellular T cell cytokine profiles by Skowera et al. [37]. Interestingly this also aligns with altered expression of the PTPRK gene mentioned above as Asano et al. [38] report impaired Th1 function with PTPRK deletion in rats. Therefore our observations supported findings of increased suppression of cytotoxic activity in CFS and hinted at increased Th2 activity though the latter were not specifically addressed in this analysis.
Neutrophils for their part are only found at trace and contaminating amounts in most PBMC preparations [39] so  correlation r a, b (null probability p a, b )  it is interesting to note that the neutrophil gene set arose as a core element in the emergence of coordinated immune activity. In particular the CD16+ neutrophil gene set and the CD14+ monocyte gene set shared significant co-expression. Not only do these arise from the same hematopoietic CD34+ progenitor cell [40] but since the immune community is highly integrated the presence or absence of neutrophils will also be mirrored in the state of the remaining cell population. The CD14+ monocyte set also shared significant co-expression with the CD19+ B cell gene sets. Together this neutrophil-monocyte-B cell immune interaction triad is highly consistent with a model of chronic inflammation proposed by Lefkowitz and Lefkowitz [41]. According to this model once an event initiates inflammation, neutrophils are among the first cells to arrive at the site. They degranulate releasing MPO into the microenvironment which together with iMPO binds to macrophage MMR receptor and induces release of TNF-α. The latter functions in an autocrine manner and along with iMPO initiates a cytokine cascade (IL-1, IL-6, IL-8, GM-CSF). IL-8 attracts more neutrophils and together with GM-CSF causes these to once again degranulate. With the corresponding release of additional MPO, the cycle starts once again. The TNF-α initiated cascade induces IL-6 which is used by B cells for maximum antibody secretion usually IgM. In addition to the present analysis, a preliminary examination of cytokine data collected in the Wichita study pointed to an increase in TNFα in CFS subjects (data not shown) as documented previously by Moss et al. [42].
In addition to this core network, we also observed that CD8+ T cell set expression correlated negatively with that of the NK and CD19+ up-regulated B cell sets. In one possible mechanism linking these three cell types, IgG antibodies binding to GD3 on the surface of CD4+ and CD8+ T cells could elicit signals for proliferation of these subsets and expression of the IL-2 receptor CD25. NK cells have been shown to selectively inhibit this antibody-mediated proliferation of CD8+ T cells by Claus et al. [43] perhaps  correlation r a, b (null probability p a, b )  through down-regulation of autologous mixed lymphocyte reaction (MLR). This basic analysis of immune gene set co-expression points therefore to the existence of immune signaling processes in CFS that adhere to at least one known mechanism of chronic inflammation and support possible antibody-mediated NK cell modulation of T cell activity. Furthermore association networks con-structed for LCA classes 1 and 5 suggested that B cell involvement in these processes may serve as factor for discriminating between distinct subsets of CFS subjects.
Although several very plausible immune response mechanisms were recovered by this analysis it must be emphasized that the use of discrete gene sets has several Heat maps of gene set co-expression expressed as linear correlation coefficient r a, b at cutoff significance p a,b <0.10 (ᮀ) and at cutoff significance p a,b <0.05 (l) for empiric classes for non-fatigued (NF) controls, insufficient fatigue symptoms (ISF) and CFS as well as for LCA control classes (LCA-0, 2) and for all LCA disease classes Figure 5 Heat maps of gene set co-expression expressed as linear correlation coefficient r a, b at cutoff significance p a,b <0.10 (ᮀ) and at cutoff significance p a,b <0.05 (l) for empiric classes for non-fatigued (NF) controls, insufficient fatigue symptoms (ISF) and CFS as well as for LCA control classes (LCA-0, 2) and for all LCA disease classes. limitations. In particular it becomes increasingly difficult to identify genes that are exclusively or even predominantly expressed in specific cell lineages when these share many commonalities of function and goal. This issue was reflected in by the small size of the gene sets identified in this work from lymphocyte subset expression profiles. An approach that promises to be more robust and more revealing still involves the direct use of the genome-wide expression for these cell populations. This remains an active area of research [44]. However, even this simple analysis points to dramatic differences in immune network topology and cell signaling in CFS and we expect these differences to be largely conserved in more elaborate analyses. Furthermore the methodology outlined and issues raised in this work demonstrate the importance of developing approaches that effectively integrate flow cytometry with cytokine and gene expression profiling. In particular it underscores the importance of looking beyond differential expression of individual components towards changes in their patterns of coordinated activity and formally recognizing the network properties of the immune system.