New complexities in helper T cell fate determination and the implications for autoimmune diseases

Abstract

Recently, new complexities in cell fate decision for helper T cells have emerged. One new lineage, which has come to be called Th17 cells, selectively produces proinflammatory cytokines including interleukin-17 (IL-17, A and F), IL-21, and IL-22. In conjunction with transforming growth factor ß-1 (TGFß-1), IL-6, IL-21, and IL-23, which activate the transcription factor signal transducer and activator of transcription 3 (Stat3), the expression of another transcription factor, retinoic acid-related orphan receptor-γt (RORγt), leads to the differentiation of Th17 cells in mice. Other cytokines including IL-2, IL-4, interferon-γ (IFN-γ) and IL-27 inhibit Th17 differentiation. However, IL-2 acting with TGFß-1 induces differentiation of naive CD4+ T cells to become regulatory T cells (Tregs). Th17 cells are now known to play an important role not only in the pathogenesis of inflammation and autoimmune diseases, but also in host defense against extracellular bacteria. Conversely, extensive data substantiate the role of Tregs as essential in the maintenance of peripheral tolerance. Selective targeting of Tregs and Th17 cells is likely to be an important strategy in the treatment of inflammatory and autoimmune diseases in humans.

Keywords: Th17 cells, autoimmune diseases, inflammation, IL-17, rheumatoid arthritis

Introduction

It is well established that the innate immune response leads to activation of adaptive immunity, shaping the proper response depending upon the pathogen eliciting the reaction. Conversely, it is also clear that the adaptive immune response, especially CD4+ or helper T cells, orchestrates responses to effectively eliminate pathogens. Classically, helper T cells were considered to differentiate into two lineages, T helper 1 (Th1) cells or Th2 cells (1).
In addition to ligation of the T-cell receptor (TCR) and co-stimulatory receptors, a suitable cytokine milieu is required for fate decision of helper T cells (1). In particular, interleukin-12 (IL-12), which is produced by activated antigen-presenting cells (APCs), promotes Th1 differentiation (2), whereas IL-4, produced by activated T cells and some innate immune cells, drives Th2 differentiation (1). While the Th1/Th2 paradigm provided invaluable insights for immunologists, its limitations were also clear. The most obvious was the inability of this dichotomy to explain autoimmune diseases. More recent studies have shown the existence of additional subpopulations of T helper cells that are also important in immunoregulation and host defense. One important subset is CD4+CD25+ regulatory T cells (Tregs) (3, 4). Tregs suppress the proliferation and function of effector T cells, and attenuate immune responses against self or nonself-antigens (3, 4). Another recently identified fate of T cells is the Th17 cell, which selectively produces IL-17 (A and F) (5, 6). In the past three years, discoveries pertaining to this newly identified T helper subset in humans and mice have accumulated with tremendous speed (7-10). It is also clear that the notion of a single type of Th17 cell may not be correct, as there seems to be further complexity in terms of the cytokines produced by these cells. In this review, we will discuss the historical developments that led to a new understanding of the role of T cells in autoimmune diseases and recent advances that have led to even newer insights into T cell biology.

Pathogenicity of IL-23, but not IL-12, in autoimmune diseases

Until a few years ago, we thought we knew which subset of helper T cells was responsible for host defense or exacerbation of autoimmune diseases, and we thought that answer fit well with the standard Th1/Th2 paradigm (11).
Th1 cells produce interferon-γ (IFN-γ) and lymphotoxin, and are responsible for protection against intracellular pathogens such as viruses, mycobacteria, and protozoa (11). Produced by activated APCs, IL-12 activates the transcription factor signal transducer and activator of transcription 4 (Stat4) (2). Stat4-dependent signaling in conjunction with TCR-dependent signals induces the expression of T-box-expressed-in-T-cells (T-bet), the so-called master regulator of Th1 differentiation (2). Recently, in humans, it has been shown that STAT4 polymorphisms are associated with rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) (12). IFN-γ also activates Stat1, which further induces T-bet to form an autocrine feedback loop for IFN-γ production (2). T-bet promotes not only IFN-γ production but also suppression of Th2 cytokine production by T cells (2). On the other hand, Th2 cells produce IL-4, IL-5, and IL-13 (1). IL-4 activates Stat6, which up-regulates the expression of GATA-binding protein 3 (GATA-3) (1). Th2 cytokines are potent activators of B-cell immunoglobulin E production, eosinophil recruitment, and mucosal expulsion mechanisms, and are essential for promoting host defense against helminths and other parasites (1). In addition, Th2 cells have been shown to mediate allergic diseases such as asthma, rhinitis, and atopic dermatitis (11). Traditionally, autoimmune diseases had been assumed to be associated with dysregulated Th1 responses (13). Because treatment with anti-IL-12p40 antibody was effective in Crohn's disease (CD) and psoriasis (14, 15), it was further assumed that IL-12-mediated IFN-γ production and the Th1 response were involved in the pathogenesis of autoimmunity. However, it was shown that IFN-γ deficiency exacerbated rather than ameliorated mouse models of autoimmune diseases such as experimental autoimmune encephalomyelitis (EAE) (16).
Subsequently, a new cytokine, which consists of IL-23p19 and IL-12p40, was identified and termed IL-23 (17). Interestingly, the IL-23 receptor (IL-23R) forms dimers with the IL-12Rß1 chain shared by IL-12 and IL-23 (18). IL-23R is predominantly expressed on T and natural killer (NK) cells (18, 19). IL-23 activates Janus kinase 2 (Jak2) and tyrosine kinase 2 (Tyk2), which in turn leads to activation of Stat1, Stat3, Stat4, and Stat5, among which Stat3 is the predominant factor phosphorylated by this cytokine (17, 18). Of note, IL-23p19-deficient mice or IL-12p40-deficient (IL-12/IL-23-deficient) mice were resistant to collagen-induced arthritis (CIA) and EAE (20, 21). However, IL-12p35-deficient mice had increased susceptibility to CIA and EAE (20, 21). These studies directly negated the pathogenic role of the IL-12/IFN-γ axis in autoimmunity. This was an important piece of information that led to the unraveling of the Th1/Th2 paradigm as an explanation of autoimmunity. Accordingly, additional studies on IL-23p19-deficient mice and the anti-IL-23p19 antibody revealed that IL-23, but not IL-12, was the culprit in autoimmunity, at least in mouse disease models (21-23). The pathogenic role of IL-23 was also confirmed by the phenotype of IL-23p19 transgenic mice, in which premature death, systemic inflammation, anemia, and elevated serum levels of inflammatory cytokines and acute phase proteins were observed (22). In human disease, the mRNA levels of both IL-23p19 and IL-12p40 were shown to be increased in skin lesions of psoriatic patients (24), suggesting that elevated IL-23 contributed to the pathogenesis of psoriasis. Therefore, IL-23 began to attract attention as a factor crucial to inflammation and autoimmune diseases.

The IL-23/IL-17 axis in inflammation

While it was becoming clear that IL-23 was the culprit in autoimmune and inflammatory diseases, the question remained: how was it acting?
Another inflammatory cytokine, IL-17, was noted to be produced by activated purified T cells in the presence of LPS-stimulated dendritic cells (DCs) (5). It was also found that IL-23 promoted IL-17 production from memory T cells and that this production was suppressed by treatment with anti-IL-12p40 antibody (5). IL-17 is now recognized as the founding member of a family of proinflammatory cytokines: IL-17 (IL-17A), IL-17B, IL-17C, IL-17D, IL-17E (known as IL-25), and IL-17F (6). IL-17A and IL-17F induce the production of various proinflammatory cytokines such as tumor necrosis factor-α (TNF-α), IL-1ß, and IL-6, and CXC chemokines from monocytes, airway epithelial cells, vein endothelial cells, and fibroblasts to mount defense against extracellular bacteria such as Klebsiella pneumoniae, Bacteroides fragilis, and Mycobacterium tuberculosis (6, 25, 26). These responses result in the recruitment, activation and migration of neutrophils to the sites of inflammation and infection (6). CD4+ T cells are major producers of IL-17, but in addition CD8+ T cells, neutrophils, γδ T cells, and invariant natural killer T (NKT) cells also produce this cytokine (26-28). Various reports associated IL-23 and IL-17 production with the onset or exacerbation of inflammation (20-23, 29). For instance, IL-23p19-deficient mice showed reduced incidence of CIA, with fewer IL-17-producing CD4+ T cells (20). Consistent with this report, treatment with anti-IL-23p19 antibody reduced serum levels of IL-17 and inhibited development of EAE (29). Moreover, IL-23 was found to be essential for production of IL-17 and IL-6 in colitis mouse models (22, 23). Regardless of exactly how IL-23 works, these reports supported the idea that an IL-23/IL-17 axis, but not an IL-12/IFN-γ axis, was a crucial pathway in the pathogenesis of various autoimmune diseases (21, 30).
Regulation of Th17 differentiation

Although many results linked IL-23 with IL-17 production in CD4+ T cells, less was known about how Th17 cells arose from naive CD4+ T cells. It was very clear that development of IL-17-producing T cells was not dependent on the cytokines and the transcription factors required for Th1 or Th2 differentiation. It was therefore proposed that IL-17-producing CD4+ T cells represented a new subset of helper T cells (Th17 cells), which were crucial in mediating inflammatory responses (30-32). While there was evidence that IL-23 stimulated CD4+ T cells to express various genes, including IL-17, and that IL-23-stimulated T cells were important in the pathogenesis of EAE (30), it was less clear whether IL-23 was the factor that drove naive CD4+ T cells to become IL-17 producers. It was especially puzzling that naive CD4+ T cells did not express IL-23R (18). In a search for the optimum culture conditions for Th17 differentiation, it was recognized that naive CD4+ T cells cultured with activated DCs produced IL-17 (16). It was then recognized that the combination of IL-6 and transforming growth factor ß-1 (TGFß-1), with appropriate stimulation of the TCR, was the optimum stimulus for inducing Th17 differentiation (Fig. 1a) (16). Accordingly, TGFß-1-deficient mice showed impaired Th17 development, whereas addition of TGFß-1 rescued the number of Th17 cells (7, 33). Thus it became clear that IL-23 was not the direct initiator of Th17 production (16), but rather was an important factor for expanding and maintaining Th17 cells in vivo (34).

New T cell lineages in mice (a) and humans (c). When stimulated by APCs such as DCs and macrophages, and cognate peptide, naive helper T cells differentiate into lineages determined by the cytokine milieu. TGFß-1 is a critical factor in the differentiation of two new lineages of mouse T cells.
In conjunction with TCR stimulation, the combination of TGFß-1 and IL-2 (and other γc cytokines) induces the expression of Foxp3, leading to the differentiation of naive CD4+ T cells into anti-inflammatory Tregs (a). However, in the presence of TGFß-1 with IL-6 or IL-21, which activates Stat3 and up-regulates RORγt, naive CD4+ T cells develop to become pro-inflammatory Th17 cells (a). Th17 differentiation is suppressed by IFN-γ, IL-4, IL-2, IL-25, IL-27, and retinoic acid in mice (b). Both Th17 cells and Tregs are also present in humans (c). TGFß-1, IL-1ß, and IL-2 in combination with IL-6, IL-21 or IL-23 are necessary for human Th17 differentiation (c). APCs, antigen-presenting cells; iTregs, inducible Tregs.

Autocrine regulation of Th17 differentiation by IL-21

In the classic Th1/Th2 paradigm, T cell subsets produce factors that promote their own differentiation and constrain the differentiation of the opposing lineage. It was of interest in this regard that Th17 cells were found to selectively produce IL-21 (35-38). IL-21 has structural homology with IL-2 and IL-15, and the initial reports showed the effects of IL-21 produced by CD4+ T cells and NKT cells on the proliferation and function of NK cells, B cells, and T cells (39, 40). Recently, though, it has also been found that the autocrine production of IL-21 by Th17 cells promotes Th17 differentiation while inhibiting IFN-γ production (35-38). Accordingly, the deficiency of IL-21 (or its receptor), or blocking of IL-21, attenuates the generation of Th17 cells (35-37).

Transcription factors for Th17 differentiation

To precisely understand the molecular mechanisms of Th17 differentiation, it is important to define the transcription factors that directly regulate IL-17 production. IL-6, IL-21, and IL-23 all activate the transcription factor Stat3 in T cells (38, 41, 42); accordingly, Th17 differentiation is abrogated in Stat3-deficient T cells (38, 42-45).
Moreover, conditional deletion of Stat3 in T cells also abrogates the development of autoimmune diseases such as EAE and autoimmune pneumonitis in mice (43). The role of Stat3 in Th17 differentiation seems to be remarkably direct: Stat3 binds to the Il17 and Il21 loci, as detected by chromatin immunoprecipitation (ChIP) assays (38, 41). In addition, IL-6 and IL-23 promote IL-23R expression, and this is also Stat3-dependent (10, 36, 37). Recently, the importance of Stat3 in human Th17 differentiation has been documented (46). Like other T cell subsets, Th17 cells also have a lineage-specific transcription factor, namely the retinoic acid-related orphan receptor-γt (RORγt) (8). Interestingly, RORγt-deficient T cells produce fewer Th17 cells, whereas overexpression of RORγt in T cells promotes IL-17 expression (8). Furthermore, RORγt-deficient mice are less susceptible to EAE, suggesting that RORγt is a key regulator of Th17 differentiation (8). Recent studies have shown that RORγt expression is significantly reduced in Stat3-deficient T cells (43-45). However, the precise molecular mechanisms of Stat3-mediated expression of RORγt are still unclear. Even so, RORγt could be a good candidate to target in treating inflammatory diseases (discussed in further detail below). It has also been shown that expression of another retinoic acid-related nuclear receptor, RORα, induces Th17 differentiation (47). RORα seems to synergize with RORγt to promote development of the Th17 lineage (47).

Counter-regulation of IL-17 and the Yin/Yang of Th17 cells and Tregs

Because of their potential for inducing immune-mediated damage to the host, it is perhaps not surprising that multiple mechanisms exist to inhibit the production of Th17 cells (Fig. 1b). Indeed, it has been recognized that the Th1-related cytokine IFN-γ inhibits Th17 differentiation (31, 32). Consistent with this concept, T-bet-deficient mice show enhanced Th17 differentiation (48).
Similarly, the Th2-related cytokine IL-4 has also been shown to down-regulate Th17 differentiation (49). However, how IFN-γ and IL-4 inhibit IL-17 production is not yet known. Interestingly, IL-27, an IL-12-related heterodimeric cytokine, also has an anti-inflammatory effect: it inhibits the generation of the Th17 lineage and thereby limits EAE (50). Like IFN-γ, IL-27 suppresses Th17 differentiation in a Stat1-dependent manner (51, 52). IL-25 (IL-17E) was also shown to inhibit IL-17 production by promoting Th2 differentiation (53). These reports suggest a cross-regulatory relationship among Th1, Th2, and Th17 differentiation. However, it still remains to be determined how Th1-related and Th2-related transcription factors negatively regulate IL-17 production. Another new subset of helper T cells that provides considerable insight into the pathogenesis of autoimmunity is the CD4+CD25+ regulatory T cells (Tregs) (3, 4, 54). Tregs suppress proliferation of effector T cells and maintain self-tolerance by down-regulating immune responses against self or nonself-antigens (3, 4, 54). The mechanism by which Tregs preserve peripheral tolerance is still not entirely clear; however, they preferentially express cytotoxic T-lymphocyte antigen 4 (CTLA-4) and immunosuppressive cytokines including TGFß-1 and IL-10 (55-57). Naturally arising Tregs (nTregs) are generated in the thymus, whereas inducible Tregs (iTregs) arise from naive T cells after antigen exposure and cytokine stimulation in the periphery, especially in the gut (58). Recently, it has been shown that CD4+CD25+ T cells express the transcription factor forkhead box protein 3 (Foxp3), which is necessary and sufficient for Tregs to develop and function (59, 60). In mice, IL-2 and TGFß-1 were able to differentiate CD4+CD25- naive T cells into CD4+CD25+Foxp3+ Tregs (61, 62). Therefore, the importance of TGFß-1 for both the Treg and Th17 lineages led to the speculation that these lineages were related.
This putative relationship also led to investigation of the importance of IL-2 in Th17 differentiation (44). In fact, IL-2 was found to inhibit the expression of RORγt while enhancing the expression of Foxp3 (44). Furthermore, blockade of IL-2 or deletion of Stat5 promoted Th17 differentiation, suggesting that IL-2 inhibited the Th17 lineage by affecting the RORγt-Foxp3 balance (44). Therefore, helper T cells seem to have a unique way of achieving a balance between Tregs and Th17 cells during ongoing immune responses. Recently, it has been reported that TGFß-1 can promote either Th17 or Treg lineage differentiation, depending on its local concentration (63). Interestingly, Foxp3 may inhibit RORγt activity on its target genes, at least in part, through direct interaction with RORγt (63).

Further complexities of Th17 cells

The roles of Th17 cells are complicated by the fact that these cells also have the ability to produce anti-inflammatory cytokines such as IL-10 (34). This cytokine has anti-inflammatory properties that suppress functions of DCs and macrophages (64). IL-10 can be produced by many cell types such as B cells, macrophages, DCs, Th2 cells, and distinct populations of Tregs (64). Recently, it has been reported that de novo generation of Th17 cells in the presence of IL-6 and TGFß-1 leads to the production of IL-10 along with up-regulation of RORγt and IL-17 (34). Furthermore, IL-1ß, but not IL-23, potentiates IL-10 production in Th17 cells along with IL-6 and TGFß-1 (34). Paradoxically, IL-6 and TGFß-1 seem to drive initial Th17 lineage commitment but also restrain the pathogenic potential of Th17 cells by inducing IL-10 production (34). Another cytokine produced by Th17 cells is IL-22 (25, 65, 66). IL-22 belongs to the IL-10 family of cytokines, which includes IL-19, IL-20, IL-24, and IL-26.
Expressed in activated T cells, Th1 cells and NK cells (67), IL-22 induces expression of various antimicrobial peptides such as ß-defensins in skin, respiratory, and digestive tissues, which express IL-22R (68). In fact, expression of IL-22 and ß-defensins is higher in skin from patients with psoriasis and atopic dermatitis than in that from healthy individuals (69). While it is clear that IL-22 and IL-17 can contribute to host defense and to the pathogenesis of various autoimmune diseases (25), it is notable that IL-22 also has important anti-inflammatory effects (70). Specifically, IL-22-expressing Th17 cells provided protection in models of hepatitis (70). Recently, IL-22 has also been shown not to be required for the development of EAE in mice (71). These facts indicate the complexity of Th17 cells, which produce multiple cytokines and have diverse functions during inflammation. As such, their relevance may vary among different autoimmune diseases.

Human Th17 differentiation

While a large body of evidence has documented that the best recipe for generating Th17 cells from murine naive CD4+ T cells is the combination of TGFß-1 and IL-6 or IL-21, several groups initially reported that the regulation of human Th17 differentiation was quite distinct (Fig. 1c) (9, 10, 66, 72-74). Specifically, TGFß-1 was initially considered to have an inhibitory effect on human IL-17 production; however, the role of TGFß-1 in human Th17 differentiation has recently become clearer. While TGFß-1 alone had an inhibitory effect on production of IL-17 by T cells, culturing human naive CD4+ T cells from cord blood in serum-free medium revealed a requirement of TGFß-1 for Th17 differentiation (72). TGFß-1 seems to have dual effects on human Th17 differentiation in a dose-dependent manner. While TGFß-1 is required for the expression of RORγt in human naive CD4+ T cells from cord blood, TGFß-1 can inhibit the function of RORγt at high doses (72).
However, inflammatory cytokines (IL-1ß, IL-6, and IL-21 or IL-23) relieve the latter effect of TGFß-1 (72). By using serum-free medium it has thus been clarified that the optimum conditions for human Th17 differentiation are TGFß-1, IL-1ß, and IL-2 in combination with IL-6, IL-21 or IL-23 (72-74). Of these latter cytokines, IL-23 seems to be the most effective (10, 66, 72, 73). A key aspect of Th17 regulation is the regulation of IL-23R (10). Naive CD4+ T cells express low levels of this receptor, but the combination of TGFß-1, IL-1ß, and IL-23 can strongly induce this receptor (10, 72). Thus, IL-23 promotes expression of its own receptor, allowing it to be an effective inducer of IL-17. It is of note that CCR6+IL-23R+ human memory CD4+ T cells are major producers of IL-17 (75, 76). Because CCR6+CD4+ T cells, but not CCR6-CD4+ T cells, secrete IL-17, and expression of IL-23R, CCR6, and RORγt is up-regulated during Th17 differentiation (72), these factors seem to be essential for development of the human Th17 lineage. Importantly, it has been shown that mutations presumed to underlie hyper-IgE syndrome (HIES, “Job's syndrome”) are found in the STAT3 gene (46). Purified naive CD4+ T cells from HIES individuals cannot differentiate into Th17 cells in vitro and have lower expression of RORγt (46), suggesting that Stat3 signaling is also a crucial factor for the generation of human Th17 cells.

Th17 cells in autoimmune diseases

Although Th17 cells are an important and essential subset participating in host defense against extracellular antigens, evidence implicating Th17 cells as a causal factor of autoimmune diseases is also mounting (25). Following extensive analysis of mouse disease models such as EAE, CIA, and inflammatory bowel disease (IBD), it has also become evident that IL-17 is a contributing factor in human autoimmune diseases. For instance, perivascular IL-17-producing T cells are present in brain lesions of patients with active multiple sclerosis (MS), and these cells are reduced in quiescent MS patients (77).
Moreover, DCs from MS patients secrete more IL-23 than those from healthy controls, and T cells from MS patients secrete more IL-17 than those from healthy controls (78). Also, elevated levels of IL-17 have been noted in the sera and in tissues of patients with RA, IBD, and psoriasis (25). More evidence confirming the roles of IL-17 in human diseases and inflammation is expected in the future. Recent genetic data have linked polymorphisms of the IL23R gene to various human diseases such as ankylosing spondylitis, CD, and psoriasis (79-81). Accordingly, treating CD and psoriasis patients with anti-IL-12p40 antibody has resulted in some success (14, 15). These results argue strongly for a role of the IL-23/IL-17 axis in human autoimmune diseases.

Th17 cells in rheumatoid arthritis

It is of particular interest to evaluate the contribution of Th17 cells to the pathogenesis of RA, because targeting pathogenic Th17 cells might be a reasonable strategy for treating RA. First, IL-17-deficient mice fail to develop CIA when challenged (82). Consistent with this, treatment with neutralizing IL-17 monoclonal antibody ameliorates joint inflammation in CIA models (83). Furthermore, it has been reported that active immunization using virus-like particles conjugated with recombinant IL-17 leads to high levels of anti-IL-17 antibodies in mice and reduces scores of disease severity in CIA (84). These results highlight the role of IL-17 as an exacerbation factor in rodent arthritis models. Because the expression of IL-17 and IL-23p19 was observed in the synovial fluid and serum of RA patients, these factors were presumed to be pathogenic (85). In particular, IL-17 up-regulates the production of IL-1ß and TNF-α from APCs in arthritic joints (86, 87). Moreover, IL-17 induces IL-6, IL-8, and IL-23 from RA synovial fibroblasts and chemokines such as CCL20/MIP3α from synoviocytes to induce migration of T cells and DCs into inflammatory lesions (86, 87).
IL-17 also promotes cartilage breakdown by inducing matrix metalloproteinase-1 (MMP-1) from RA synovium (86, 87). Additionally, IL-17 induces cyclooxygenase-2 (COX-2)-dependent prostaglandin E2 (PGE2) synthesis by osteoblasts, and it then induces expression of receptor activator of NF-κB ligand (RANKL) to differentiate osteoclast progenitors into mature osteoclasts (86, 87). Pro-oxidants such as nitric oxide are also produced by chondrocytes and osteoblasts in response to IL-17 (86, 87). Taken together, these multiple factors induced by IL-17 seem to promote bone resorption, extracellular matrix degradation, synovium proliferation, angiogenesis, and recruitment and activation of immune cells for bone erosion and articular destruction in RA joints. This being said, there is little evidence of T cell proliferation in RA and we still do not know for certain the cellular source of IL-17 in RA synovium (88). Nevertheless, IL-17 seems to be a rational target in the treatment of RA, and Phase I/II clinical trials of anti-IL-17 monoclonal antibody (AIN457) have commenced (http://www.clinicaltrials.gov/). It has been reported that combining inhibitors to block TNF-α and IL-17 has the synergistic effect of suppressing IL-6 production and collagen degradation in the synovium and bone of RA patients (89). Therefore, anti-IL-17 therapy may become an alternative for treating patients unresponsive to conventional anti-TNF therapy. It still remains to be seen whether blocking IL-17 will affect other proinflammatory cytokines, for example IL-6, IL-1, and IL-22, in RA patients. More studies will shed light on the validity of anti-IL-17 therapy. While the role of IL-6 in regulating IL-17 production has only recently been appreciated, the role of IL-6 as a proinflammatory cytokine has long been recognized. For this reason, targeting IL-6 in RA has been considered for some time.
Indeed, the efficacy of a humanized anti-IL-6R monoclonal antibody (tocilizumab/atlizumab) in the treatment of adults and children with arthritis has recently been reported (90). Also, in April 2008 tocilizumab was approved in Japan for treatment of RA, and Phase III trials have been completed in the US (http://www.clinicaltrials.gov/). To what extent its efficacy relates to inhibition of IL-17 will be important to ascertain. Similarly, in the US, recombinant human IL-1 receptor antagonist (anakinra) has been approved for use in the treatment of RA (91). More recently, additional IL-1-targeting therapies for RA patients have entered Phase II studies, including anti-IL-1ß monoclonal antibody (ACZ885), human monoclonal antibody to IL-1R (AMG108), and IL-1 Trap/rilonacept (http://www.clinicaltrials.gov/). Although IL-1 has been reported to contribute to Th17 differentiation in mouse and man, it remains to be determined whether therapeutic targeting of IL-1 will substantially affect IL-17 in RA. On the basis of the efficacy of anti-IL-12p40 therapy in human autoimmune diseases (14, 15), a humanized monoclonal anti-IL-12p40 antibody (ABT-874) was tested in RA (92). However, because of the trial design, this antibody was not shown to be effective (92). Another strategy for targeting IL-12/IL-23 is apilimod mesylate (STA-5326), an oral inhibitor of IL-12p35/p40, which has been tested in a Phase II study for RA (http://www.clinicaltrials.gov/). Whether selectively targeting IL-23p19 will be more efficacious or have a better safety profile remains to be determined. In principle, targeting IL-21 might also be efficacious in treating RA in which IL-17 is involved in immune-mediated damage. Importantly, the common gamma chain and JAK3 play a pivotal role in the signal transduction of IL-21 (25). In fact, a JAK3 inhibitor (CP-690550) is about to be tested in RA (Phase II trial) (http://www.clinicaltrials.gov/).
It will be important to know whether the efficacy of this drug is related to inhibition of IL-17.

Concluding remarks

The past four years have witnessed major revisions of our views on the lineage commitment of helper T cells. The conventional Th1/Th2 dichotomy has been replaced by more complex multi-lineage models, which do a much better job of explaining the pathogenesis of autoimmunity (Fig. 1). It now seems that Th17 cells are one of the more important villains in the pathogenesis of autoimmune diseases. However, it is of note that they are also essential for host defense against extracellular bacteria. Therefore, the challenge will be to keep Th17 function in check to control autoimmunity, while maintaining host defense at the same time. In this regard, we still need to learn which factors are most critical for Th17 differentiation. We also need to better understand which human autoimmune diseases are truly mediated by IL-17 and, by extension, which diseases will benefit most from anti-IL-17 therapy. With the current pace of investigations on Th17, this field has great potential for advances. IL-17 and the other proinflammatory cytokines are likely to be relevant therapeutic targets in the treatment of human autoimmune diseases.

Estrogen directly activates AID transcription and function

Abstract

The immunological targets of estrogen at the molecular, humoral, and cellular level have been well documented, as has estrogen's role in establishing a gender bias in autoimmunity and cancer. During a healthy immune response, activation-induced deaminase (AID) deaminates cytosines at immunoglobulin (Ig) loci, initiating somatic hypermutation (SHM) and class switch recombination (CSR). Protein levels of nuclear AID are tightly controlled, as unregulated expression can lead to alterations in the immune response. Furthermore, hyperactivation of AID outside the immune system leads to oncogenesis.
Here, we demonstrate that the estrogen–estrogen receptor complex binds to the AID promoter, enhancing AID messenger RNA expression, leading to a direct increase in AID protein production and alterations in SHM and CSR at the Ig locus. Enhanced translocations of the c-myc oncogene showed that the genotoxicity of estrogen via AID production was not limited to the Ig locus. Outside of the immune system (e.g., breast and ovaries), estrogen induced AID expression by >20-fold. The estrogen response was also partially conserved within the DNA deaminase family (APOBEC3B, -3F, and -3G), and could be inhibited by tamoxifen, an estrogen antagonist. We therefore suggest that estrogen-induced autoimmunity and oncogenesis may be derived through AID-dependent DNA instability.

Humoral immune responses triggered by foreign antigens require B cell activation. The activated B cell undergoes antibody affinity maturation, which in higher vertebrates includes somatic hypermutation (SHM), gene conversion of antigen-binding V regions, and class switching (1); all of these processes require activation-induced deaminase (AID) (2–5). AID initiates these events by deaminating deoxycytosine to deoxyuracil in DNA (for review see references [6–8]). The resulting dU:dG lesion can be recognized by several different DNA repair pathways to create the aforementioned antibody diversifications. This necessity for immune diversification and sufficiency for genome instability also highlights AID as an important pathogenic regulator. In the immune system, hyper- or hypoexpression of AID can alter autoimmune pathologies (9–11). Furthermore, SHM, by way of AID, may contribute to lymphomagenesis by mutating (proto-)oncogenes and tumor suppressor genes, or by promoting chromosomal translocations (12–14). There is strong evidence that AID is required for c-myc translocation, leading to tumorigenesis in a murine model for Burkitt's lymphoma (15, 16).
Other nonphysiological AID targets include BCL6, CD95/Fas, RHO/TTF, PAX-5, and PIM1 (12, 17–19). Outside the immune system, there are indications that systemic hyperexpression of AID can induce non–B cell cancers from lung (20), lymphatic (20), and liver (21) tissues. AID has also been implicated as a developmental epigenetic reprogramming factor, and its expression level in oocytes is almost equivalent to that in lymph nodes (22), suggesting that AID could be regulated by pathways other than B cell activation pathways (e.g., E-box proteins [23], NF-κB [24], and Pax5 [25]), with hormones being plausible candidates. Several clinical and epidemiological studies have indicated that females can have stronger and more rapid immune responses upon antigen encounter (26, 27). This gender bias is also reflected in the occurrence of pathogenic immune responses, as found in asthma and other autoimmune diseases (28–31). Several nonimmune pathologies are also strongly influenced by the activity of sex hormones, most notably certain types of cancer. Estrogen and its biological and synthetic derivatives are thought to be oncogenic for breast and ovarian tissue (32, 33), most often being associated with their growth-promoting and differentiating capacity. To further elucidate how AID can be regulated, both within and outside the immune system, and to determine which signaling pathways could use DNA deaminases as DNA instability factors, we analyzed the effect of estrogen on AID's expression and on downstream pathways such as SHM and class switch recombination (CSR). We show that AID can be up-regulated by estrogen, whereas tamoxifen (Tam) can inhibit this stimulation. This effect was most pronounced at, but not limited to, the level of transcription. Treatment with estrogen increased AID protein expression, enhanced CSR, augmented mutation frequency in Ig and non-Ig genes, and increased the translocation frequency of c-myc.
Estrogen-induced AID messenger RNA (mRNA) production was independent of other B cell stimulatory pathways and could be observed outside immune tissue. We were able to identify two potential estrogen response elements (EREs) near the AID promoter, and determined enhanced ERα binding to the promoter after estrogen treatment in vitro and in vivo. APOBEC3, the evolutionarily related DNA deaminases (34), were also responsive to estrogen treatment in different tissues and cell types. Our data indicate that the mutagenic DNA deaminases are potentially an important target for hormonal regulation. RESULTS Differential effect of sex hormones on AID mRNA Hormones such as estrogen and progesterone exert their biological effects through binding to their intracellular receptors and, upon entering the nucleus, act as transcription factors (35). To determine the effects of hormones on AID expression, we stimulated isolated murine splenic B cells with IL-4 and LPS, which are known to induce AID (24), while adding physiological amounts of progesterone and estrogen. Progesterone addition reduced AID mRNA levels by fivefold, as revealed by quantitative real-time PCR (qRT-PCR; Fig. 1 A). This and all subsequent qRT-PCR analyses were normalized to GAPDH expression, and all enhancements or repressions were analyzed as relative changes to DMSO. To observe a stimulatory effect of estrogen, cells were only treated for a short time period (8 h) rather than the usual 24–48 h. This was done to avoid a possible maximal induction of LPS/IL-4 caused by longer treatment. In contrast to progesterone, physiological amounts of estrogen were able to enhance AID expression threefold in these cells (Fig. 1 A). This antithetical effect of estrogen and progesterone indicated that AID gene regulation is embedded within a systemic sex hormone pathway. The effects of estrogen and progesterone on AID mRNA in murine splenic B-cells. 
(A) AID mRNA in response to estrogen and progesterone treatment in stimulated B cells. Isolated mouse spleen B cells were stimulated with LPS and IL-4 and treated with different physiological concentrations of estrogen for 8 h or progesterone for 24 h. Unless indicated, DMSO is set to 1, and treatments are represented as relative change to DMSO. (B) AID mRNA in response to estrogen treatment in unstimulated B cells after 8 h treatment with physiological concentrations of estrogen. (C) AID mRNA induction upon different treatment. Cells were treated with 1 nM estrogen and/or 50 nM Tam (Tam) for up to 8 h. DMSO at 0 h is set to 1. All qRT-PCR data are representative of three independent experiments, and error bars indicate standard deviations from the mean. Timelines of cell treatments are indicated next to the graphs. NT, not treated. For A and B, absolute values as compared with GAPDH mRNA are shown in Fig. S10, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1. We focused our subsequent experiments on the verification that the estrogen-induced stimulation on AID was analogous to another known estrogen response gene. Because we could determine that the expression of gene regulated in breast cancer 1 (a known estrogen response gene) (36) had a similar stimulation profile in B cells (unpublished data), it seemed likely that the AID gene could also be activated in uninduced B cells with estrogen. Stimulation of isolated splenic B cells with physiological amounts of estrogen produced a sevenfold increase in AID mRNA (Fig. 1 B). This induction began to plateau at 4 h, with the earliest indication of an increase detectable at ~2 h (Fig. 1 C). In most systems, the synthetic hormone Tam acts as an antagonist to estrogen stimulation, presumably by binding to the estrogen receptor (ER) and altering its DNA binding capacity (37). 
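The normalization described for the qRT-PCR data (AID signal normalized to GAPDH, with the DMSO sample set to 1) can be illustrated with a short sketch. It assumes the commonly used 2^-ΔΔCt calculation with roughly 100% primer efficiency; the text does not state which calculation was actually applied. The function name and the Ct values are hypothetical and chosen only for illustration; they are not measured data.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene versus a control sample (2^-ddCt).

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = ct_target - ct_reference             # normalize sample to reference gene
    d_ct_control = ct_target_ctrl - ct_reference_ctrl  # normalize control to reference gene
    dd_ct = d_ct_sample - d_ct_control                 # sample relative to control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (illustration only, not data from the study).
dmso = {"aid": 28.0, "gapdh": 18.0}
estrogen = {"aid": 26.0, "gapdh": 18.0}

fold_dmso = relative_expression(dmso["aid"], dmso["gapdh"], dmso["aid"], dmso["gapdh"])
fold_estrogen = relative_expression(estrogen["aid"], estrogen["gapdh"], dmso["aid"], dmso["gapdh"])

print(fold_dmso)      # 1.0 (control is set to 1 by construction)
print(fold_estrogen)  # 4.0 (two cycles earlier than control after normalization)
```

With this convention the DMSO sample always evaluates to 1, so every treatment is read directly as a relative change, which is how the figure legends present the data.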
The presence of Tam during the estrogen treatment of isolated splenic B cells inhibited the stimulatory activity, whereas Tam on its own had only a limited effect on AID mRNA expression at the concentration used (Fig. 1 C, Estrogen 1 nM/Tam and Tam, respectively). Interestingly, at very low concentrations, Tam can have a stimulatory effect on AID mRNA (Fig. S1 A, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1), which may reflect Tam's agonistic activity (see Discussion). Hormonal regulation of AID mRNA is predominantly via transcription Although we were able to observe an increase in AID mRNA at 2 h (Fig. 1), we needed to determine if this regulation was direct or indirect. We pretreated splenic B cells with the translational inhibitor cycloheximide (CHX), followed by stimulation with estrogen, and observed an increase in mRNA production equivalent to treatment without inhibitor (Fig. S1 B). This suggested that the effect of estrogen was directly mediated on AID's mRNA synthesis. Because qRT-PCR of cDNA is a readout of steady-state mRNA, we also tested if the increase by estrogen was caused by transcription or mRNA metabolism. Treatment of cells with transcription inhibitors (actinomycin D [ACT] and α-amanitin [AMA]) abrogated the effect of estrogen, indicating that this alteration in AID's mRNA was not caused by message stability (Fig. S1 B). As estrogen can affect pre-mRNA to mRNA processing (38), we designed qRT-PCR primers to span the complete transcription unit of AID (Fig. S2 A, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). When we compared the relative change in expression of the various pre-mRNA's exons and introns to that of the mature mRNA, we found only a minor effect caused by estrogen treatment (the 5'-most PCR unit, <100 bp away from the start of AID gene, was up-regulated to almost the same extent as the mature mRNA [Fig. S2 B]). 
The intron between exon 3 and 4 (7,538–7,689 bp) showed a marginal increase in response to estrogen over that of the mature RNA, indicating a potential region for estrogen-induced splicing of AID RNA. Because the relative change did not significantly alter the overall effect, we did not pursue this analysis further, although recent data suggests that alternative splicing may influence AID expression (39). The experiments substantiate the notion that estrogen's main mode of action on AID is through transcriptional regulation and not mRNA metabolism. Identification of hormone response elements in the AID promoter regions The rapid effect of the hormones on AID message via transcription suggested that the AID gene is a direct target for hormonal regulation. Using bioinformatic analysis (Fig. 2 A), we were able to identify putative EREs in the context of other response elements, such as NF-κB. We dissected the 1.5 kb upstream and the 2 kb downstream of the ATG regions for hormone-responsive elements in a heterologous transcription assay. The potential response regions were placed into a luciferase reporter construct and transfected into human SiHa cells, followed by treatment with the indicated hormone (or cotransfected with expression plasmids), and then analyzed for luciferase activity. As we were primarily interested in the effect of hormones on expression, we used relative change as a readout rather than absolute values, which provided a more direct evaluation of the hormone treatment but potentially obscured the individual effect of the various DNA elements. When compared with DMSO treatment, estrogen responsiveness was most significant with Fragment C, indicating that this contained the predominant estrogen-responsive DNA element. Comparable to the mRNA production of AID in B cells, Fragment C also responded in a dose-dependent manner to estrogen (Fig. 2 C). Aside from the putative ERE, Fragment C also harbored the two published NF-κB binding sites (24). 
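As an illustration of how putative EREs might be located in a promoter sequence, the sketch below scans for near matches to the canonical palindromic ERE consensus, 5'-GGTCAnnnTGACC-3'. This is not the bioinformatic method used for Fig. 2 A, which the text does not specify; the function name, the mismatch threshold, and the toy sequence are all hypothetical. Because the consensus is its own reverse complement, scanning one strand is sufficient.

```python
ERE_CONSENSUS = "GGTCANNNTGACC"  # N = any base (3-bp spacer between half-sites)


def find_ere_like_sites(sequence, max_mismatches=2):
    """Return (start, site, mismatches) for windows resembling the ERE consensus.

    Only the ten half-site positions are scored; the spacer is ignored.
    """
    seq = sequence.upper()
    width = len(ERE_CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(
            1 for c, w in zip(ERE_CONSENSUS, window) if c != "N" and c != w
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Toy sequence (hypothetical, not the AID promoter): one perfect ERE and
# one imperfect ERE with a single half-site mismatch.
toy_promoter = "ATATGGTCACAGTGACCTTTTTGGTCAAAATGTCCGCGC"

for start, site, mismatches in find_ere_like_sites(toy_promoter, max_mismatches=1):
    print(start, site, mismatches)
```

Allowing a small number of mismatches reflects the fact that many functional EREs are imperfect; candidates found this way would still need experimental confirmation, as was done here with the reporter, EMSA, and ChIP assays.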
As indicated by the qRT-PCR analysis in Fig. 1 A, estrogen and the LPS/IL-4–induced NF-κB stress-response pathway could act synergistically on AID mRNA production. To more directly stimulate the NF-κB pathway in SiHa cells, we used the cell-autonomous activator TNF-α (Fig. 2 C). Interestingly, aside from the synergy (e.g., 10⁻⁹ M), the two maxima of the dose response were offset (TNF-α treated, 10⁻⁹ M; untreated, 10⁻⁷ M), indicating a higher complexity of the two interacting pathways. To demonstrate independence of the two pathways, we analyzed the response of Fragment C to treatment with TNF-α and estrogen upon cotransfection of the dominant-negative mutant of IκBα (IκBα S32A/S36A dominant mutant [IκBα-mt]), which is known to inhibit the release of NF-κB from the cytoplasm into the nucleus after stimulation. As shown in Fig. 2 D, the TNF-α activation was inhibited in the presence of IκBα-mt, yet estrogen was able to independently activate the transcription. This indicated that in the AID promoter, the NF-κB site and its proximal ERE could act independently as well as synergistically. Human AID promoter analysis for hormone response elements. (A) Schematic representation of potential EREs (square) and NF-κB sites (circle) and their respective locations in the human promoter. The indicated promoter regions (marked A–E) were inserted into a luciferase reporter construct with a minimal promoter. The vectors were transfected into SiHa cells, incubated for 24 h, treated for 4 h with hormones or TNF-α, and analyzed for luciferase activity. (B) Relative luciferase activity after estrogen treatment. Cells were transfected with constructs containing AID promoter fragments and treated with estrogen for 4 h. (C) Effect of TNF-α and estrogen on the human AID promoter. An expression construct with an AID promoter region containing NF-κB sites and putative ERE (Fragment C) was transfected into cells, followed by TNF-α and/or estrogen treatment for 4 h.
(D) Estrogen can act independently from NF-κB. Cells were cotransfected with Fragment C and an IκBα-mt expression vector. After 24 h, cells were treated with TNF-α and/or 100 nM estrogen for 4 h. Timelines of cell treatments are indicated below the graphs. NT, not treated. ERE binding in B cell extracts The transient transfection assay indicated that a predicted ERE was subject to estrogen regulation. Thus, we analyzed ER binding to the AID promoter and focused on the more widely expressed receptor subtype ERα. To analyze the binding of ER to parts of the AID promoter in vitro, we prepared nuclear extracts from treated and untreated cells and performed electromobility shift assays (EMSA). For the biochemical analysis of the ER binding to the ERE, we focused on NF-κB proximal ERE of Fragment C. A 34-bp fragment containing the 5'-most proposed ER binding site was incubated with untreated and treated extract (Fig. 3 B, lanes 2 and 3). The estrogen treatment clearly induced a protein that could bind the fragment (arrow), which was not induced when treating the B cells with TNF-α before extract preparation (lane 4). Cotreatment with estrogen and TNF-α had the same effect as estrogen alone (Fig. 3 B, lane 5), indicating that the two pathways act through different nuclear proteins. Competition experiments with the ER binding site (Fig. 3 B, lanes 6–8) or a mutation of the proposed ER site (Fig. 3 B, lanes 9–11) showed a specific competition, indicating ER-like binding kinetics. Using antibodies to the DNA-binding domain of ERα strongly inhibited the formation of the estrogen-induced band (Fig. 3 C, lanes 2 vs. 3–5), but the shift was unaffected by a control antibody (Fig. 3 C, lanes 2 vs. 6–8, arrow). The anti-ERα antibody did induce the appearance of a high molecular weight complex (Fig. 3 C, triangle in lanes 4 and 5), which could be caused by either the supershift of a dimerized ERα or heterodimer ERα/ERß; however, this needs to be analyzed further. 
Identification of ER binding to human AID promoter by EMSA and ChIP. (A) Schematic representation of human AID promoter region (as in Fig. 2 A). The position of the oligonucleotide used for EMSA and the region amplified by qRT-PCR for ChIP are marked as a black line and a dashed line, respectively. (B) Estrogen (denoted as E) -induced oligonucleotide shift (marked with an arrow) in Ramos nuclear extracts. Cells were treated for 72 h in hormone-depleted serum, followed by 4-h treatment with 10 nM estrogen (lanes 3 and 5–11) and/or TNF-α (lanes 4 and 5), and nuclear extract preparation. Different concentrations of unlabeled competitors ER (lanes 6–8) and mutated ER mut (lanes 9–11) were added to the binding reaction. Open triangle, nonspecific DNA binding band. (C) EMSA with anti-ERα antibodies. Increasing concentrations of anti-ERα antibody and a nonspecific antibody were added to the binding reaction (see Materials and methods). The estrogen-induced band is marked with an arrow, and a super-shifted band appearing upon anti-ERα antibody addition is marked with a closed triangle. Open triangle, nonspecific DNA-binding band. (D) ERα binds to upstream region of human AID promoter. Cells were treated as in B. Data are representative of three independent experiments and error bars indicate standard deviations from the mean. ChIP was performed using anti-ERα or control antibodies, and the bound DNA was subjected to qRT-PCR. Estrogen and Tam treatments are marked with E1 (estrogen 1 nM), E10 (estrogen 10 nM), and Tam, respectively. (E) Estrogen can cooperate with TNF-α in recruiting NF-κB to AID promoter. ChIP is as in D, using anti–NF-κB or control antibodies. NT, not treated. Because estrogen and TNF-α co-stimulation did not alter ER binding to the ERE, we wanted to determine if the reciprocal of NF-κB binding to the NF-κB site, after the combined treatment, was also unaffected. To that end, we probed the published NF-κB site (24) with our extracts. 
Using cold competitors (Fig. S3 B, lane 4 vs. 6–11, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1), as well as anti–NF-κB antibodies (Fig. S3 C), we could demonstrate the specificity of the NF-κB binding site. As with the ER binding site, NF-κB binding was not altered by cotreatment with estrogen and TNF-α (Fig. S3 B, lanes 4 vs. 5). This indicated that the respective treatments did not alter the general DNA-binding properties of ER or NF-κB proteins. Because the distance, on the AID promoter, between the ER and NF-κB sites was larger than our EMSA probes (i.e., neither probe contained both binding sites), we could not study the effect of cooperative binding. ER binding to AID promoter in vivo Although the in vitro binding of the ER to the AID promoter indicated a direct binding, we also wanted to probe this interaction in vivo. To this end, we used the constitutive AID-expressing and mutating Burkitt lymphoma Ramos cells and performed chromatin immunoprecipitation (ChIP) assay, followed by qRT-PCR. Treated cells were fixed, lysed, DNA sheared, and ERα or NF-κB immunoprecipitated, and the DNA was released for PCR. As shown in Fig. 3 D, the anti-ER antibody specifically immunoprecipitated the AID promoter upon estrogen stimulation in a dose-dependent manner. A control antibody was unable to precipitate this region. Analogous to the EMSA assay, we did not detect a significant increase in ER binding to AID promoter when we co-stimulated with TNF-α, just as TNF-α treatment alone did not enhance ER binding. Again, cotreatment of cells with Tam and estrogen before ChIP did reduce the binding of ERα to the AID promoter, whereas Tam on its own did not significantly alter ERα binding (Fig. 3 D, E 10/Tam and Tam, respectively), providing further evidence for a direct binding of ERα to the AID promoter. Treatment of the cells with TNF-α increased the binding of NF-κB to the AID promoter by more than sevenfold (Fig. 3 E). 
Interestingly, co-stimulation of the cells with TNF-α and estrogen had a synergistic effect on the NF-κB binding; the binding increased to >30-fold above unstimulated treatment (4.5-fold above TNF-α alone). Estrogen up-regulates AID protein production To determine whether the effect of estrogen on AID would also extend to the protein level, we developed a quantitative approach for measuring AID protein. We generated a DT40 cell line (a cell line derived from a chicken B cell lymphoma that constitutively expresses AID and undergoes Ig diversification) to express a double tag (3xFLAG-2xTEV-3xc-Myc) fused to the C terminal exon of endogenous AID (FLAG-Myc–tagged AID protein [AID-FM]; Fig. S4, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). The modified DT40 express a WT AID protein and an AID-FM fusion protein, both transcribed from the endogenous AID locus. Comparing the qRT-PCR induction kinetics with that of the AID-FM expression, we determined that the steady-state levels of AID protein correlated to those of the mRNA (Fig. 4, A and B). Using the translation inhibitor CHX (Fig. 4 C), we showed that estrogen was not able to increase AID-FM expression in the absence of translation. In the presence of the proteasome inhibitor MG-132 (Fig. 4 C), estrogen increased AID-FM production above that of the MG-132 alone, indicating that estrogen was not acting on the proteasome to increase AID-FM activity. The aforementioned co-stimulatory effects of TNF-α were also observed at the level of protein production (Fig. 4, E 10/TNF-α). The effects of estrogen on AID protein in DT40. (A) Estrogen induces AID mRNA expression. AID-FM–tagged DT40 cells were treated with DMSO, 100 nM estrogen, and 10 nM estrogen with TNF-α, lysed, and analyzed for AID mRNA expression with qRT-PCR at various time-points. (B) Estrogen induces AID-FM fusion protein expression. Treatment as in A, but lysates were analyzed by quantitative Western blot.
For each sample, FLAG and Tubulin expression was quantitated. The graph is derived from correlating the FLAG expression to Tubulin expression, and then determining the ratio of estrogen-induced FLAG expression to untreated DMSO samples. (C) Estrogen does not affect AID-FM fusion protein stability. Cells were incubated with CHX or MG-132 for 2 h, followed by estrogen treatment for 4 h. Protein levels were determined by quantitative Western blot. For all experiments, cells were grown in hormone depleted media for 48 h. Results are normalized to control treatments as indicated on each graph. Timelines of cell treatments are indicated below the graphs. Hormonal regulation of CSR and SHM On the molecular level, CSR requires AID, yet regulation can be achieved on multiple levels, some of which may be hormonally influenced. To analyze CSR, isolated mouse splenic B cells were treated with different combinations of cytokines and hormones. B cell stimulation with LPS, LPS and IL-4, LPS and IFN-γ, or LPS and TGF-ß results in switching to IgG3, IgG1, IgG2a, or IgG2b/IgA, respectively. To ensure we could correlate AID activity precisely, and to avoid possible proliferative and antiapoptotic effects of estrogen on stimulated B cells, we monitored the early molecular events of class switching. One of the molecular intermediates during class switching is the generation of a looped-out circular DNA called a switch circle (Fig. S5, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). The circular DNA contains a recombined transcription unit that produces switch circle transcripts; it is generated from the promoter of the downstream switch region and the Igµ switch region. Using a previously described qRT-PCR approach (40), we were able to show enhanced switching to IgG1, IgG3, IgA, or IgE after hormone treatment of splenic B cells (Fig. 5 A). 
Presumably through the increased production of AID, estrogen was able to enhance switch circle formation for all subclasses tested. As with the AID production, this link was perturbed with the addition of Tam during estrogen treatment (Fig. 5 A, Estrogen 10 nM/Tam and Tam). Hormonal effects on Ig class switching, hypermutation, and translocation. (A) Estrogen induces isotype switching. Isolated mouse splenic B cells were stimulated for 48 h with LPS + IL-4 for switching to IgG1 and IgE, LPS + TGF-ß for switching to IgA, and LPS for switching to IgG3. Indicated amounts of estrogen and/or Tam were added to the cells together with cytokines. Relative efficiency of class switching was determined by detecting circle transcripts with qRT-PCR, and data are normalized to the control treatment with DMSO from three independent experiments (error bars indicate standard deviations). (B) Estrogen increases the mutation frequency in VH and CD95/Fas loci of Ramos, and in Sγ3 of splenic mouse cells. Ramos cells were grown in the presence of 100 nM estrogen for ~20 doublings, followed by sequencing of 341 bp from human VH or 750 bp from human CD95/Fas locus. Splenic mouse cells were treated for 6 d with LPS and 10 nM estrogen, and switch gamma3 loci amplified and sequenced (Fig. S7). Mutation frequencies are normalized to the control treatments with DMSO. A standard unpaired two-tailed Student's t test showed a significant difference in mutation frequency in the Sγ3 loci of DMSO- and estrogen-treated spleen cells. (C) Estrogen enhances the c-myc/IgH translocations in splenic B cells from p53+/- mice. In each experiment, 2 spleens per sample were treated with or without 50 nM estrogen in the presence of LPS for 72 h. More than 7 × 10⁷ cells were analyzed by long-range PCR (5 × 10⁴ cells/PCR; Fig. S8 and Supplemental materials and methods). Frequency was determined as c-myc/IgH translocation events per cell number analyzed.
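The frequency definition in this legend (translocation events per number of cells analyzed) amounts to simple arithmetic, sketched below. The cell numbers are taken from the legend (more than 7 × 10⁷ cells, 5 × 10⁴ cells per PCR) and the event counts from the Results text (five translocations with LPS/estrogen, none with LPS/DMSO). Whether the 7 × 10⁷ figure applies per treatment or to the whole experiment is not stated, so treating it as a per-treatment lower bound is an assumption, and the resulting value should be read as an approximate upper estimate only. The function and variable names are hypothetical.

```python
def translocation_frequency(events, cells_analyzed):
    """Translocation events per cell analyzed."""
    if cells_analyzed <= 0:
        raise ValueError("cells_analyzed must be positive")
    return events / cells_analyzed


# Values from the legend; assumed here to apply to each treatment (see lead-in).
cells_analyzed = 7 * 10**7   # reported as a lower bound ("more than")
cells_per_pcr = 5 * 10**4

# Minimum number of long-range PCR reactions implied by these numbers.
n_reactions = cells_analyzed // cells_per_pcr

freq_estrogen = translocation_frequency(5, cells_analyzed)  # LPS/estrogen
freq_dmso = translocation_frequency(0, cells_analyzed)      # LPS/DMSO

print(n_reactions)
print(f"{freq_estrogen:.2e} translocations per cell (upper estimate)")
print(freq_dmso)
```

Because the cell count is a lower bound, the true per-cell frequency in the estrogen-treated samples can only be lower than this estimate, which underlines how rare these events are.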
Statistics was performed on the results of the pooled experiments (two-tailed, unpaired Student's t test: P = 0.026). Fig. S7, Fig. S8, and Supplemental materials and methods are available at http://www.jem.org/cgi/content/full/jem.20080521/DC1. To determine the effect of estrogen-induced AID on SHM, we sequenced the VH region of in vitro–cultured Ramos cells, as well as the switch region of γ3 from ex vivo–stimulated splenic B cells. A 3-wk estrogen treatment of Ramos HS13 (41), which possesses a premature stop codon embedded within the AID target motif WRC, showed enhanced surface Ig expression and VH SHM (Fig. 5 B and Fig. S6, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). Because Ramos expresses constitutive amounts of AID, estrogen treatment did not enhance AID production substantially; thus, we used the ex vivo treatment of spleens as a means to detect mutations in the switch region (42). To this end, we treated LPS-activated splenic B cells with 10 nM estrogen for 6 d and sequenced the Sγ3 region. Because AID+/- mice show a haploinsufficiency effect (43, 44), we used F1 spleens from an AID-/- and BALB/c cross for analysis, hypothesizing that the estrogen effect would be more pronounced. As can be seen in Fig. 5 B and Fig. S7, there was a significant (P < 0.02) enhancement of mutation frequency in the switch region after estrogen treatment. Enhanced non-Ig loci targeting of AID It is known that AID can be mistargeted to non-Ig genes (12, 17–19). The effect of aberrant AID targeting can lead to somatic mutations or even translocations of protooncogenes or tumor suppressors, and subsequently to oncogenesis (16). To determine if the activity of hormones via AID can also lead to an alteration in non-Ig loci, we chose to look at the proapoptotic tumor suppressor CD95/Fas. CD95/Fas has been shown to be somatically hypermutated in human B cells, albeit 100–1,000-fold less frequently than the Ig genes (18). 
We analyzed the effects of estrogen on CD95/Fas mutations in Ramos HS13. As with the physiological B cell maturation events of SHM and CSR, estrogen was able to increase the potentially pathogenic mutation frequency in CD95/Fas (Fig. 5 B and Fig. S7 B). Because of the direct effect on AID function (mutation), these data provide evidence for a novel way in which estrogen can exert a direct genotoxic effect on oncogenes or tumor suppressors. Estrogen enhances c-myc IgH translocations The off-target effect of AID was also detected in recent work on chromosome translocations of the c-myc oncogene into the IgH locus. More importantly, the levels of AID protein directly influenced chromosome translocation frequency. This was determined from both the analysis of AID haploinsufficiency (44) and microRNA regulation of AID protein expression (45, 46). In both cases, reduced levels of AID had a direct bearing on the number of observed c-myc/IgH translocations. Using a previously described assay (16, 47), we were able to determine that estrogen treatment of isolated splenic B cells enhances c-myc/IgH translocations in p53+/- animals (Fig. 5 C and Fig. S8, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). Although the overall frequency of translocations was low, we did not observe any translocations in LPS/DMSO-treated spleen cells, whereas we observed five (two in one experiment and three in another) translocations in the LPS/estrogen treatment. This highlights the importance of regulating AID protein amounts within a cell, and the potential pathogenic consequences of unregulated AID expression. AID induction is not limited to B cells In the past, we demonstrated that AID mRNA expression is not limited to activated B cells, but can also be detected in oocytes (22). We therefore set out to determine if the increase in AID mRNA production by estrogen was also detectable in dissected tissues and various tumor cell lines.
Analysis of dissected organs and tissue from mice showed AID responsiveness to hormones outside the immune system in breast and ovarian tissue (Fig. 6). As we were primarily interested in the hormone sensitivity of the transcripts, we determined the relative fold stimulation compared with DMSO, which did not reflect the absolute AID mRNA in each tissue. Interestingly, in ovaries, AID mRNA was induced almost 25-fold, which is higher than in any other organ or tissue. Because isolated oocytes produce AID mRNA (22), we suggest that the increase was predominantly caused by oocytes or other ovarian-derived tissue, rather than infiltrating lymphocytes. Estrogen induces AID and Apobec3 transcription in mouse tissue. The red and blue colors indicate the results for mApobec3 and mAID, respectively. Tissues were treated with DMSO, 1 nM estrogen (E1), or 10 nM estrogen (E10). Gene expression is normalized to the control treatments with DMSO. The tissue expression profiles represent pooled data for the respective tissues from two experiments. Timelines of cell treatments are indicated below the graphs. Absolute values as compared with GAPDH mRNA are shown in Fig. S10, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1. Gross tissue dissection can provide a good indication of possible cell types that can induce AID mRNA upon estrogen treatment, but tissue complexity could also obscure potential targets. Although we did not detect a substantial mRNA increase in hepatocyte or cervix cell lines (Fig. S9, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1), we were able to show a significant increase (ranging from 2.5- to 22-fold) of AID in cell lines, including T cells, placenta, ovary, breast, and prostate. The induction observed in breast cells mimics that of the tissues isolated in Fig. 6 A, and confirms a previous report that AID has been detected in the breast cell line MCF-7 (48). 
Importantly, AID mRNA was not only present at basal levels but was also significantly up-regulated. Estrogen activates APOBEC3B, 3F, and 3G mRNA transcription AID is the ancestral member of the DNA deaminase family, and the APOBEC3 members are considered to have arisen from AID by gene duplication events (34). Within the DNA deaminase family, APOBEC3 members function predominantly in the cytoplasm and inactivate foreign DNA such as retroviruses and retrotransposable elements (49). We hypothesized that hormonal responsiveness may have been conserved among members of the APOBEC3 family. To this end, we designed qRT-PCR primers to detect APOBEC family member mRNAs in mouse and human cells. APOBEC2, a related member of the DNA deaminases without any apparent catalytic activity (34), was not affected by estrogen treatment; however, we were able to show that estrogen enhanced transcription of mouse Apobec3 from ovaries, spleen, and splenic B cells (Fig. 6). Expression of human APOBEC3B, 3F, and 3G family members was also enhanced upon estrogen treatment from several different cell lines (including those of T cell, ovarian, placental, and cervical origin; Fig. S9). DISCUSSION The physiological panpleiotropic effects of hormones are well documented in many aspects of development, although their effector mechanisms at the molecular level, as well as their pathogenic effects upon cancer and immunity are not well understood (i.e., the gender bias for autoimmunity [29, 31, 50] is as striking as the gender bias for some cancers [51, 52]). The correlation between the expression of hormones or their cognate receptors and development of pathologies has led to several different hypotheses, most of which involve hyperactivation of cell proliferation, unregulated differentiation, alteration in DNA repair, or repression of apoptosis (51). Here, we propose another means by which estrogen can be a genotoxin, by directly inducing DNA deaminases such as AID.
Our initial experiment using progesterone and estrogen identified AID as being part of a hormonally regulated system. Estrogen exerted its activity through the estrogen–ER complex, directly binding to the AID promoter, whereas our preliminary data suggests that progesterone acts through an estrogen-independent pathway (unpublished data). The consequences of estrogen activation on AID mRNA could also be detected as an increase in AID protein and enhanced downstream physiological effects, such as SHM and CSR. Interestingly, AID activation (mRNA) by estrogen was more pronounced than the effect on SHM. This could indicate that although AID production is necessary for SHM, other factors (e.g., lesion processing, AID targeting, etc) can play a significant role in overall SHM efficiency (53). Similar to SHM, CSR was enhanced upon estrogen treatment, but was lagging behind AID mRNA production. The most dramatic effect from estrogen-induced overexpression of AID was seen at the level of c-myc/IgH translocations. This is analogous to the recent observations that the effect of AID mRNA down-regulation (either by microRNA targeting or haploinsufficiency) was most pronounced at the level of translocations (44–46). Autoimmunity encompasses a broadly defined area of clinical pathologies that stem from abnormalities in numerous systemic, cellular, and molecular mechanisms, a subset of which are B cell–related pathologies (50). In systemic lupus erythematosus, abnormalities in B cell development and the production of autoreactive antibodies play an important pathological role. Overexpression of AID in autoimmune-prone mice induced a more severe systemic lupus erythematosus–like phenotype (10), whereas breeding AID-deficient mice with autoimmune-prone MRL/lpr mice significantly reduced the onset and extent of disease (11), indicating that alterations in AID can change the severity of B cell autoimmunity. 
Correlating our data with the known effects of estrogen on autoimmunity (28, 30, 50, 54), we propose that the effect of sex-hormones on autoimmunity could partially be through AID transcription and subsequent increase in genome instability. In addition to the direct binding of the estrogen–ER complex to the AID promoter, estrogen may also hyperstimulate AID production through the NF-κB pathway, as we were able to demonstrate that cotreatment of TNF-α and estrogen enhanced NF-κB binding to the AID promoter (Fig. 3 D). Whether this effect is caused by protein–protein interaction or estrogen-induced chromatin modification has yet to be determined, but the synergistic or even cooperative interaction of two important autoimmune modulator pathways on AID expression may have substantially pathogenic effects. There are several hypotheses on how unregulated AID can affect autoimmunity in addition to overstimulation of SHM and CSR, e.g., debilitating mutations in the signaling pathways, of tumor suppressors, or of proapoptotic genes, or alterations that activate oncogenes or antiapoptotic genes (for review see reference [55]). Mutation and subsequent loss of growth control is usually associated with oncogenesis, and this similarity with autoimmunity has previously been noted (55) (e.g., mutations in CD95/Fas have been associated with autoimmunity, as well as B cell lymphomas). Therefore, our data on estrogen-induced mutations in CD95/Fas (Fig. 5 and Fig. S7), derived from increased AID production, may provide a novel molecular mechanism that is important for both pathologies. AID's targeting outside the Ig locus, its apparently crucial involvement with germinal center–derived B cell malignancies (43, 56), and its ability to cause various malignancies when overexpressed in transgenic mice (20, 21, 57) are strong indicators for AID's oncogenic potential. 
Estrogen is one of the most important and thoroughly studied mitogenic agents in cancer, but it does not possess any direct DNA mutability, and induces transformation by proliferation. In vitro, high concentrations of estrogen derivatives (usually metabolites) can form DNA adducts or produce reactive oxygen species that damage DNA (for review see references [58–60]), but their role as physiological genotoxins is limited. Furthermore, because the estrogen derivative Tam (61) can inhibit estrogen's oncogenesis in breast cancer, it is unlikely that estrogen (or its derivatives) form DNA adducts under those conditions. It is interesting to note that the antagonistic activity of Tam has recently been used to inhibit some of the pathologies of estrogen-induced autoimmunity (54). On the other hand, because of the pharmacological action of Tam (binding and altering the ER DNA binding capacity), under certain circumstances Tam acts as an estrogen agonist (62). This activity leads to an increased risk of secondary (e.g., ovarian) cancer after Tam treatment for breast cancer (63). Thus, our findings that low concentrations of Tam induced AID mRNA (Fig. S1) also suggest that Tam acts as an agonist and indicate that the proposed usage of synthetic estrogen derivatives as a means to inhibit AID has to be carefully evaluated. Future work on identifying a potential role of AID in mouse models of hormonally induced cancers may provide further evidence on how AID can act as an environmentally stimulated oncogene. As our data indicate, AID's response to estrogen seems to have been evolutionarily conserved among the APOBEC3 family members, different tissues, and cell lines (Fig. 6 and Fig. S9); mouse Apobec3 and human APOBEC3B, -3F, and -3G (and -H to a lesser extent) mRNA were induced by estrogen treatment. In the past, we have hypothesized that the predominant function of AID and its evolved DNA deaminase family members was to inactivate foreign DNA in the cell (8, 34, 64, 65). 
It is therefore plausible that the observed estrogen response of AID, as well as its expression in oocytes (22), had served a purpose other than targeting AID to Ig genes in B cells. Data indicating that AID has retained some ability to inhibit retroviral elements have substantiated this hypothesis (66). Our work has highlighted that there is a novel pathway by which the nonmutagenic hormone estrogen can mediate genome instability via the activation of AID and other DNA deaminases, in turn possibly altering the predispositions, induction, or severity for cancer, autoimmunity, and viral infectivity. MATERIALS AND METHODS Unless indicated, mouse tissue samples and splenic B cells were derived from 8–12-wk-old female unplugged BALB/c mice, and prepared by standard protocol (see Supplemental materials and methods, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1). Human cell lines Jurkat, T47D, Ramos HS13 (Ramos), JAR, and mouse B cells were cultivated in RPMI-1640+GlutaMax medium (Invitrogen); the cell lines MCF7, JAMA2, HeLa, and HepG2 were cultivated in E4 medium; and PC3 was cultivated in HAMS F12 medium. Chicken DT40 cells were maintained in RPMI-1640+GlutaMax with 10% FCS, 1% chicken serum, 50 µM ß-mercaptoethanol, 100 U/ml penicillin, and 100 µg/ml streptomycin at 39°C with 10% CO2. For indicated experiments, cells were treated for 72 h in hormone-depleted serum (Opti-MeM Reduced Serum Medium; Invitrogen) supplemented with Charcoal Stripped Fetal Bovine Serum (Invitrogen); and nonessential amino acids (final concentration; Sigma-Aldrich), as follows: l-Alanine (8.9 µg/ml), l-Asparagine (15.0 µg/ml), l-Aspartic acid (13.3 µg/ml), l-Glutamic acid (14.7 µg/ml), Glycine (7.5 µg/ml), Proline (11.5 µg/ml), and l-Serine (10.5 µg/ml). B cell stimulation was performed using 25 µg/ml LPS (Sigma-Aldrich), 50 ng/ml mouse IL-4 (R&D Systems), 20 ng/ml human TNF-α (R&D Systems), and 2 ng/ml human TGF-ß1 (R&D Systems). 
Estrogen (17-ß-estradiol; Sigma-Aldrich) and progesterone (Sigma-Aldrich) were dissolved in DMSO at a concentration of 100 mM; this solution was then diluted in DMSO to give 1,000× stock solutions, and final dilutions were made in media (final DMSO concentration was never >0.1%). The final concentrations of estrogen are indicated in the text and figure legends. The concentration of Tam (Sigma-Aldrich) was 50 nM unless otherwise stated. The final concentration for CHX (Sigma-Aldrich) was 10 µg/ml, 20 nM for ACT (Calbiochem), 4 µg/ml for AMA (Calbiochem), and 10 µM for MG-132 (Sigma-Aldrich). qRT-PCR of mRNAs was based on a previous study (67), with modifications (see Supplemental materials and methods). The primers for qRT-PCR (Table S1, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1) were designed using PrimerExpress software. qRT-PCR analysis was performed using the QuantiTect SYBR Green PCR kit (QIAGEN) according to the manufacturer's instructions, Abi 7000 Sequence Detection System (Applied Biosystems), and Abi software. Tissue or cell line hormone responsiveness was determined by qRT-PCR of gene regulated by breast cancer 1, a known estrogen-responsive gene (36). Human AID promoter fragments were analyzed using pE1BLuc plasmid backbone (provided by G. Akusjärvi, Uppsala University, Uppsala, Sweden) containing a luciferase ORF and a minimal promoter region. Primers (Tables S2 and S3, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1) approximately every 500 bp from the human AID transcription start site were used for amplifying AID promoter regions. The obtained PCR products were cloned into pE1BLuc vector, and 3 µg DNA were transfected into SiHa cells using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocols. 
24 h after transfection, cells were treated with the indicated concentrations of hormones or TNF-α for 4 h and analyzed thereafter for luciferase signal using Dual Luciferase Reporter Assay System (Promega) and Glomax luminometer (Promega). The expression vector containing a dominant-negative mutant for IκBα (S32A S36A; a gift from Felix Randow, Medical Research Council-Laboratory of Molecular Biology, Cambridge, England, UK) was cotransfected with pE1BLuc constructs where indicated. 1 µg CMV promoter-driven Renilla luciferase expression vector (Promega) was included in all transfections and used as an internal control for transfection efficiency. Ramos cells were treated for 72 h in hormone-depleted serum before the 4-h hormone treatment. EMSA was performed by standard protocol (68) with minor modifications (for more detail see Supplemental materials and methods). 0.25 pmol of complementary oligonucleotides containing either NF-κB or putative ER binding elements were annealed, labeled, and added to 7 µg of Ramos (treated or untreated) nuclear extract. Reactions were incubated for 20 min at room temperature in the absence or presence of 1-, 3-, and 10-fold mass excess of unlabeled oligonucleotides or antibodies. Samples were electrophoresed at 4°C on 4.5% polyacrylamide gels for 120 min, followed by autoradiography on imaging plates (FujiFilm) and analysis with a FLA-5000 Scanner (FujiFilm). Ramos cells were treated for 72 h in hormone-depleted serum before the 4-h hormone treatment. ChIP procedure was done as previously published, with minor modifications (69) (for further details see Supplemental materials and methods). Approximately 10⁷ cells were fixed with 1% formaldehyde in the culture media for 10 min at 37°C, and then quenched with 0.125 M glycine for 5 min at RT. Cells were pelleted and washed twice with cold PBS. The cell pellet was resuspended in 300 µl SDS lysis buffer and incubated on ice for 10 min. 
Cell lysates were sonicated for 6 min with a 30-s on/off sonication cycle in 1.5 ml eppendorf tubes. Samples were centrifuged for 5 min at 13,000 rpm at 4°C, and supernatants were diluted to 1.5 ml with ChIP buffer. An aliquot was kept as input measurement. Samples were precleared with 60 µl Protein G beads (ProtG; Roche) and 1 µg salmon sperm DNA (Invitrogen) for 45 min at 4°C. Anti–NF-κB p65 antibody, rabbit anti-ERα HC-20 antibody (Santa Cruz Biotechnology, Inc.), or goat anti–mouse λ control antibody (see Supplemental materials and methods) was added to the supernatant at a concentration of 2 µg/assay and incubated for 12 h at 4°C. 60 µl of salmon sperm DNA (1 µg DNA/20 µl ProtG) was added to the samples and incubated for 1 h while rotating. The ProtG–antibody–protein complex was pelleted and washed with the following buffers: low-salt wash buffer, high-salt wash buffer, LiCl wash buffer, and standard TE buffer (twice). The sample was eluted with 250 µl of elution buffer and incubated for 15 min at room temperature with agitation. The beads were centrifuged and the elution was repeated once. Bound immunocomplexes were then reverse cross-linked with 200 mM NaCl by incubating it at 65°C for 12 h. Proteinase K was added to the sample and incubated for 1 h at 45°C. DNA was then extracted with phenol/chloroform and precipitated, resuspended in 50 µl of water and subjected to qRT-PCR, using CHIP1 and CHIP2 primers for amplifying the -1,189 to -1,039 region of the AID promoter. The surface IgM (sIgM) expression was investigated in Ramos HS13 cells (41), which contain a stop codon in the λ locus, with reversion mutations resulting in sIgM production. IgM stained (anti–human IgM-FITC [Sigma, UK]) and sorted single sIgM-negative cells were grown for ~20 doublings, with fresh hormone containing media added every 48 h. The cells were then stained (anti–human IgM-FITC) for surface expression of IgM and analyzed by flow cytometry. 
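As a rough guide to the scale of the sIgM reversion assay described above, the following sketch estimates how many cells descend from one sorted sIgM-negative cell after a given number of doublings. It assumes idealized growth with no cell death, and the function name `cells_after_doublings` is illustrative rather than part of the original methods.

```python
def cells_after_doublings(doublings: int, founders: int = 1) -> int:
    """Idealized population size after a number of doublings.

    Assumption: every cell divides at each doubling and none die.
    """
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return founders * 2 ** doublings


# ~20 doublings from a single sorted cell, as in the reversion assay
print(cells_after_doublings(20))  # 1048576
```

Under these assumptions, each clone is on the order of 10⁶ cells when it is stained, which helps explain how rare reversion events can become detectable by flow cytometry.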
VH and C regions were cloned and sequenced from hormonally treated (20 doublings) Ramos HS13 cells, as previously described (41). The human CD95/Fas locus was PCR amplified and sequenced from isolated genomic DNA, as previously described (18). Mouse Sγ3 switch regions were analyzed from 2 AID+/- spleens and treated with either DMSO alone or 10 nM estrogen for 6 d. AID+/- were derived from breeding AID-/- (gift from T. Honjo, Kyoto University, Kyoto, Japan) with BALB/c mice. Cloning and sequencing were as previously described (42). Sequencing was performed at the LRI sequencing facility. Further details are described in the Supplemental materials and methods. c-myc/IgH translocations were detected by PCR as previously described (47, 70). In brief, DNA was isolated from p53+/- B cells after 72 h of LPS stimulation in the presence or absence of 50 nM estrogen. Two rounds of PCR (for primers see Supplemental materials and methods) were performed using Expand Long Template PCR system (Roche) with primers MycIg1A and primers MycIg1B in the first round, and MycIg2A and MycIg2B in the second round. PCR products were separated on agarose gels, transferred to nylon membranes, and probed with γ-[P32]-ATP–labeled oligonucleotides IgH probe and c-myc probe. P values were calculated using two-tailed unpaired Student's t test. Ig class switching was investigated by detecting switch-circle transcripts in stimulated mouse spleen B cells (40) by qRT-PCR (71). Isolated splenic B-cells were stimulated for up to 72 h with LPS + IL-4 for inducing switching to IgG1 and IgE, LPS + TGF-ß for switching to IgA, and LPS + IFN-γ for switching to IgG3. Hormones were added to the cells together with LPS and cytokines in fresh media. Primers (Tables S4 and S5, available at http://www.jem.org/cgi/content/full/jem.20080521/DC1) that were used for detecting IgG1 (71), IgG3, IgA (40), and IgE (72), have been previously described. 
Mouse tissue was dissected (from two animals) and passed through cell strainers. Cells were incubated for 6 h in hormone-depleted media before estrogen treatment for 4 h. Total RNA was extracted, cDNA was synthesized, and gene expression was analyzed by qRT-PCR. All experiments were approved by the Cancer Research UK Animal Ethics Committee and the UK Home Office. The Supplemental materials and methods includes details for cells and reagents, qRT-PCR, promoter analysis, EMSA, ChIP, mutation analysis, and c-myc/IgH translocations. It also describes in detail the generation of endogenously tagged AID. Fig. S1 describes the effect of Tam treatment and translation inhibitors on AID expression. Fig. S2 shows the effect of estrogen on AID splicing. Fig. S3 demonstrates the effect of estrogen on NF-κB promoter binding. Fig. S4 provides details on how we generated the DT40 AID knockin allele. Fig. S5 is a schematic of class switching and switch circle formation. Fig. S6 shows the effect of estrogen on Ramos Ig diversification. Fig. S7 provides details on the mutation analysis of IgH and CD95/Fas after estrogen treatment. Fig. S8 provides details on the c-myc/IgH translocation analysis. Fig. S9 shows the effect of estrogen on the expression of DNA deaminase family members in various cell lines. Fig. S10 provides absolute values of qRT-PCR analysis from selected experiments. We also included primer sequences used in this study in five tables (Tables S1-S5). Online supplemental material is available at http://www.jem.org/cgi/content/full/jem.20080521/DC1. A Drosophila Model for EGFR-Ras and PI3K-Dependent Human Glioma Abstract Gliomas, the most common malignant tumors of the nervous system, frequently harbor mutations that activate the epidermal growth factor receptor (EGFR) and phosphatidylinositol-3 kinase (PI3K) signaling pathways. To investigate the genetic basis of this disease, we developed a glioma model in Drosophila. 
We found that constitutive coactivation of EGFR-Ras and PI3K pathways in Drosophila glia and glial precursors gives rise to neoplastic, invasive glial cells that create transplantable tumor-like growths, mimicking human glioma. Our model represents a robust organotypic and cell-type-specific Drosophila cancer model in which malignant cells are created by mutations in signature genes and pathways thought to be driving forces in a homologous human cancer. Genetic analyses demonstrated that EGFR and PI3K initiate malignant neoplastic transformation via a combinatorial genetic network composed primarily of other pathways commonly mutated or activated in human glioma, including the Tor, Myc, G1 Cyclins-Cdks, and Rb-E2F pathways. This network acts synergistically to coordinately stimulate cell cycle entry and progression, protein translation, and inappropriate cellular growth and migration. In particular, we found that the fly orthologs of CyclinE, Cdc25, and Myc are key rate-limiting genes required for glial neoplasia. Moreover, orthologs of Sin1, Rictor, and Cdk4 are genes required only for abnormal neoplastic glial proliferation but not for glial development. These and other genes within this network may represent important therapeutic targets in human glioma. Author Summary Malignant gliomas, tumors composed of glial cells and their precursors, are the most common and deadly human brain tumors. These tumors infiltrate the brain and proliferate rapidly, properties that render them largely incurable even with current therapies. Mutations in genes within the EGFR-Ras and PI3K signaling pathways are common in malignant gliomas, although how these genes specifically control glial pathogenesis is unclear. To investigate the genetic basis of this disease, we developed a glioma model in the fruit fly, Drosophila melanogaster. 
We found that constitutive coactivation of the EGFR-Ras and PI3K pathways in Drosophila glia gives rise to highly proliferative and invasive neoplastic cells that create transplantable tumor-like growths, mimicking human glioma. This represents a robust cell-type-specific Drosophila cancer model in which malignant cells are created by mutations in genetic pathways thought to be driving forces in a homologous human cancer. Genetic analyses demonstrated that EGFR-Ras and PI3K induce fly glial neoplasia through activation of a combinatorial genetic network composed, in part, of other genetic pathways also commonly mutated in human glioma. This network acts synergistically to coordinately stimulate cellular proliferation, protein translation, and inappropriate migration. Rate-limiting genes within this network may represent important therapeutic targets in human glioma. Introduction Malignant gliomas, neoplasms of glial cells and their precursors, are the most common tumors of the central nervous system (CNS). These tumors typically proliferate rapidly, diffusely infiltrate the brain, and resist standard chemotherapies, properties that render them largely incurable. One key to developing more effective therapies against these tumors is to understand the genetic and molecular logic underlying gliomagenesis. The most frequent genetic lesions in gliomas include mutation or amplification of the Epidermal Growth Factor Receptor (EGFR) tyrosine kinase. Glioma-associated EGFR mutant forms show constitutive kinase activity that chronically stimulates Ras signaling to drive cellular proliferation and migration [1],[2]. Other common genetic lesions include loss of the lipid phosphatase PTEN, which antagonizes the phosphatidylinositol-3 kinase (PI3K) signaling pathway, and activating mutations in PIK3CA, which encodes the p110α catalytic subunit of PI3K [1],[2]. Gliomas often show constitutively active Akt, a major PI3K effector [1],[2]. 
However, EGFR-Ras or PI3K mutations alone are not sufficient to transform glial cells; rather, multiple mutations that coactivate EGFR-Ras and PI3K-Akt pathways are sufficient to induce glioma [2]–[4]. Understanding the interplay of these mutations and the neurodevelopmental origins of these tumors could lead to new insights into the mechanisms of gliomagenesis. The mammalian brain contains multiple glial cell types that maintain proliferative capacities, including differentiated astrocytes, glial progenitors, and multipotent neural stem cells. EGFR-Ras and PTEN-PI3K signaling regulates many developmental processes in these cell types, particularly proliferation and self-renewal, which are also properties of glioma cells [1]. Although recent hypotheses favor that gliomas arise from multipotent stem cells, data from mouse models demonstrate that differentiated glia, glial progenitors, and stem cells can all produce gliomas in response to genetic lesions found in human gliomas [5],[6]. Thus, misregulation of these genetic pathways may confer unrestricted proliferative capacities to a range of glial cell types, but how this occurs remains unclear. While many of the same effectors are utilized by EGFR-Ras and PI3K in both glial development and cancer, constitutive activation of these pathways may deploy distinct outputs, not utilized in development, that allow particular cells to escape normal physiological cues that restrain proliferation and self-renewal. The identity of such outputs remains unclear. With these issues in mind, we developed a Drosophila glioma model to facilitate genetic analysis of glial pathogenesis. Drosophila offers many tools for precise manipulation of cell-type-specific gene expression and dissection of multigene interactions. Most human genes, including 70% of known disease genes, have functional Drosophila orthologs [7]. Among the most conserved genes are components of major signal transduction pathways, including many gliomagenic genes. 
Recently, Drosophila has emerged as a model system for human neurological diseases because the CNS shows remarkable evolutionary conservation in cellular composition and neurodevelopmental mechanisms [8]. Similarly, Drosophila have multiple glial cell types that require the EGFR pathway for their normal development, and these cells appear homologous to mammalian glia in terms of function, development, and gene expression [9]. These similarities between flies and humans make Drosophila an attractive system for modeling gliomas. Results Coactivation of EGFR-Ras and PI3K in Drosophila Glia Causes Neoplasia Since concurrent activation of EGFR-Ras and PI3K signaling in glial precursors induces glioma in the mouse [4], we sought to create mutant phenotypes by hyperactivation of these pathways in fly glia and glial precursors. Drosophila has a single functional ortholog each for EGFR(dEGFR), Raf (dRaf), PIK3CA(dp110), PTEN(dPTEN), and Akt(dAkt), and two functional orthologs for Ras(dRas85D, dRas64B) (www.flybase.org). A diagram of the specific mutant forms of dEGFR used in our assays can be found in Figure S1. We performed glial overexpression assays with the Gal4-UAS system [10], using the repo-Gal4 driver, which gives sustained UAS-transgene expression in almost all glia, from embryogenesis through adulthood. For glial-specific RNAi, we employed UAS-dsRNA constructs [11], which we verified with phenotypic tests and/or antibody staining (see Materials and Methods). Glial morphology was visualized with membrane-localized GFP (CD8GFP) [12]. Cell number was determined with staining for Repo, a homeobox transcription factor expressed by repo-Gal4 positive glia [13]. Glial-specific coactivation of EGFR-Ras and PI3K stimulated glial neoplasia, giving rise to CNS enlargement and malformation, neurologic defects, and late larval lethality. 
repo-Gal4-driven co-overexpression of activated dEGFR (dEGFRλ) and dp110 (dp110CAAX) induced progressive accumulation of ~50-fold excess glia (Figure 1A and 1B) [14],[15]. dEGFRλ is a constitutively active dEGFR variant in which a lambda dimerization domain replaces the extracellular domain [14] (Figure S1). Co-overexpression of combinations of dEGFRλ and core components of the PI3K pathway, such as dAkt, induced phenotypes similar to repo>dEGFRλ;dp110CAAX, although phenotypes varied somewhat depending on strength of pathway activation and transgene expression (Figure S2 and Table S1). Dramatic glial overgrowth also occurred upon co-overexpression of constitutively active dRas (dRas85DV12) or its effector dRaf (dRafgof) with dp110CAAX, dAkt, or a dPTENdsRNA, which partially knocked-down dPTEN (Figure S2 and Table S1). Finally, glial overgrowth in repo>dEGFRλ;dp110CAAX larvae was strongly suppressed by co-overexpression of dPTEN or more moderately by dominant negative dRas85D (dRas85DN17) (Figure 1D and 1E), indicating that Ras activity and excess phospho-inositols are essential for neoplasia. Coactivation of EGFR-Ras and PI3K in Drosophila glia causes neoplasia. In contrast, glial-specific activation of the EGFR-Ras pathway alone, through overexpression of dEGFRλ or dRafgof, induced 5–10-fold excess glia in the larval brain and later pupal lethality (Figure 1F and Table S1). dRas85DV12 overexpression induced approximately 5–10-fold excess glia, and these glia were smaller than wild-type or dEGFRλ;dp110CAAX glia (Figure 1G). dRas85DV12 may be more potent than dEGFRλ because dRas85DV12 can activate endogenous PI3K signaling [16]. 
Overexpression of dEGFRElp, a classical hypermorphic mutant form of dEGFR [17], induced excess glial proliferation and neural morphogenesis defects (Figure S2), but also caused early lethality which precluded examination of dEGFRElp-dp110 interactions. As in mouse models, overexpression of wild-type dEGFR failed to induce excess glia [6],[17], and instead retarded CNS growth (Figure S2). Unlike dEGFRλ, dEGFRWT and dEGFRElp have functional ligand-binding domains (Figure S1), and may cause additional defects by sequestering ligand otherwise required for normal development [17]. Glial-specific activation of the PI3K pathway alone, by overexpression of dp110CAAX, dp110wild-type, dAkt, or dPTENdsRNA, gave viable animals with relatively normal brains (Figure 1H, Figure S2, and Table S1). Therefore, the EGFR and PI3K pathways, when coactivated, synergize to produce much more severe phenotypes than would be expected if the effects of these pathways were additive. In repo>dEGFRλ;dp110CAAX brains, excess glia emerged in early larval stages and accumulated over 5–7 days. dEGFRλ;dp110CAAX glia severely disrupt the normal cellular architecture of the larval brain (Figure 1A and 1B and Figure 2A–C), lose normal stellate glial morphologies (Figure 2A–C), and generate multilayered aggregations of abnormal glia throughout the brain (Figure 2A–C); in these ways dEGFRλ;dp110CAAX glia are neoplastic [18]. Like neoplastic epithelial cells, dEGFRλ;dp110CAAX glia ectopically expressed an active form of the matrix metalloprotease dMMP1 (Figure S3), which can confer an invasive potential [19],[20], implying that abnormal dEGFRλ;dp110CAAX glia may be invasive within the brain. Unlike neoplastic epithelia, neoplastic neural cells, such as dEGFRλ;dp110CAAX glia, typically retain expression of genes that regulate neural cell fate, such as Repo [21],[22]. Coactivation of EGFR and PI3K cell-autonomously promotes cell cycle entry. 
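The additivity argument above can be made concrete with a small calculation. The sketch below compares the ~50-fold glial excess reported for coactivation with what a purely additive model would predict from the single-pathway results. The specific inputs are assumptions for illustration only: 7.5-fold is simply the midpoint of the reported 5–10-fold range for EGFR-Ras activation alone, and 1.0-fold stands in for the "relatively normal" brains seen with PI3K activation alone. The function names are hypothetical.

```python
def additive_expectation(fold_a: float, fold_b: float) -> float:
    """Expected fold change if each pathway's excess over wild type (1.0) simply adds."""
    return 1.0 + (fold_a - 1.0) + (fold_b - 1.0)


def is_synergistic(observed: float, fold_a: float, fold_b: float) -> bool:
    """True when the observed fold change exceeds the additive expectation."""
    return observed > additive_expectation(fold_a, fold_b)


egfr_alone = 7.5   # assumed midpoint of the reported 5-10-fold range
pi3k_alone = 1.0   # assumed: "relatively normal" brains, i.e. no excess glia
combined = 50.0    # reported ~50-fold excess glia upon coactivation

print(additive_expectation(egfr_alone, pi3k_alone))      # 7.5
print(is_synergistic(combined, egfr_alone, pi3k_alone))  # True
```

With these illustrative inputs, the observed excess is several times larger than the additive expectation, which is the sense in which the two pathways are described as synergistic.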
Relative to controls, many dEGFRλ;dp110CAAX glia showed BrdU incorporation, which marks S-phase cells (Figure 2D and 2E), indicating that neoplastic glia arise from overproliferation. repo>dEGFRλ;dp110CAAX animals also showed reduced BrdU in neuronal precursors (Figure 2E, data not shown), demonstrating that neoplastic glia disrupt neuronal development. The cell cycle is governed by CyclinD-Cdk4 and CyclinE-Cdk2 complexes, which phosphorylate and inactivate Rb proteins, to release E2F activators to stimulate G1-S-phase entry [23]. Kip-type (p21/p27/p57) and Ink-type cyclin-dependent kinase inhibitors antagonize proliferation by inhibiting CyclinE-Cdk2 and CyclinD-Cdk4, respectively. Cdc25 phosphatases and mitotic cyclins, including CyclinB, promote G2-M progression. Flies have single orthologs each for CyclinE, Cdk2, CyclinD, Cdk4, CyclinB, and p21/p27/p57 (Dap), E2F activators (E2F1) and two orthologs for Rb (Rbf1 and Rbf2) and Cdc25 (Stg and Twe) but no Ink ortholog [23]. dEGFRλ;dp110CAAX glia showed ectopic expression of dCyclinE and dCyclinB (Figure 2F–I), demonstrating that EGFR and PI3K activity upregulated proteins that promote cell cycle entry and progression. High-grade human gliomas contain highly proliferative anaplastic glia and enlarged pleiomorphic polyploid glia [24]. Similarly, repo>dEGFRλ;dp110CAAX larvae showed accumulation of small, highly proliferative glia that strongly expressed cyclins and labeled with BrdU. repo>dEGFRλ;dp110CAAX larval brains also displayed abnormal polyploid glia, as assessed by DAPI staining (data not shown), and these cells typically expressed only dCyclinE but not dCyclinB, and thereby likely underwent ectopic DNA replication without mitosis (Figure 2G and 2I, data not shown). 
However, overexpression of dCyclinE-dCdk2, dCyclinD-dCdk4, or dE2F1-dDp complexes and/or Rbf1 knock-down did not cause neoplasia, and instead either doubled glial cell numbers or resulted in embryonic lethality (Figure S4, data not shown). We next examined negative regulators of the cell cycle. dEGFRλ;dp110CAAX glia expressed Rbf1, but showed little Dap, a result we also observed in wild-type glia (Figure S5). Dap inhibits dCyclinE-cdk2 complexes [25], and is transiently expressed in neural progenitors to promote cell cycle exit as they begin differentiation [26]. Dap overexpression completely suppressed repo>dEGFRλ;dp110CAAX glial overgrowth (Figure 2L), demonstrating that glial neoplasia is cell-autonomous and requires dCyclinE-dCdk2. Similarly, overexpressed Rbf1 and dCyclinE mutations also reduced repo>dEGFRλ;dp110CAAX glial overproliferation (Figure 2M, data not shown). The gross neural morphogenesis defects observed in repo>dEGFRλ;dp110CAAX brains may be secondary to glial overproliferation since these defects were largely prevented by Dap or Rbf1 co-overexpression (Figure 2J–M). In repo>dEGFRλ;dp110CAAX animals, other mutant glia outside of the brain, such as peripheral glia, also became highly proliferative and invasive, and these defects, too, were corrected by Rbf1 or Dap overexpression (data not shown). In controls, Rbf1 or Dap overexpression in wild-type glia inhibited proliferation (Figure S4), reducing numbers of glia by approximately half. Together, these results suggest that repo-Gal4 glia undergo at least one round of cell division, consistent with published studies [27],[28], and this proliferation becomes prolonged by constitutive coactivation of EGFR and PI3K signaling. 
Coactivation of EGFR and PI3K Does Not Elicit Neoplasia in All Neural Cell Types The phenotype triggered by coactivation of EGFR and PI3K in glia is distinct from other Drosophila brain-overgrowth mutant phenotypes, which involve accretion of excess neurons or neuroblasts [21],[29]. repo>dEGFRλ;dp110CAAX cells were not transformed into neurons or neuroblasts as they lacked expression of the Elav and Miranda markers (Figure S6). Lineage-tracing with a Su(H)-lacZ neuroblast reporter showed that excess dEGFRλ;dp110CAAX glia did not express LacZ, and thus are not directly derived from larval neuroblasts (data not shown). Moreover, constitutive EGFR-Ras and PI3K signaling does not elicit overgrowth in all neural cell types, as assessed with defined cell-type specific Gal4-drivers (Table S2). For example, dRas85DV12 overexpression in fly neurons causes defects in fate specification, patterning, and apoptosis [30],[31]. Co-overexpression of dEGFRλ or dRas85DV12 with dp110CAAX in neurons (elav-Gal4, scratch-Gal4, OK107-Gal4, and Appl-Gal4) and neuroblasts/neuronal precursors (pros-Gal4, wor-Gal4, and 1407-Gal4) did not induce overgrowth (Figure S7, data not shown), even with increased transgene expression from Gal4 amplification [32]. In fact, broad co-overexpression of dEGFRλ and dp110CAAX in neuroblasts and neuronal precursors (pros-Gal4) reduced brain size, perhaps because signaling through these pathways stimulates precocious cell cycle exit of neuronal precursors, as in the developing eye [33]. Furthermore, transient expression of dRas85DV12 or dEGFRλ and dp110CAAX in embryonic glia (gcm-Gal4) also failed to promote glial overgrowth (Figure S7) [27], demonstrating that sustained activation of these pathways is required for glial overproliferation. 
Certain glial subtypes, such as oligodendrocyte-like neuropil glia (Eaat1-Gal4), some astrocyte-like cortex glia (Nrv2-Gal4), and peripheral perineurial glia (gli-Gal4), also failed to become neoplastic in response to EGFR-Ras and PI3K (Figure S7, data not shown) [34]–[36]. Therefore, neoplastic proliferation is not a uniform cellular response to EGFR and PI3K. Coactivation of EGFR and PI3K Inhibits Cell Cycle Exit in Glia Since repo>dEGFRλ;dp110CAAX animals die in 5–7 days, we assessed the proliferative potential of mutant glia using an abdominal transplant assay, a classic test of tumorigenicity in flies [37]. Brain fragments from repo>dEGFRλ;dp110CAAX and wild-type larvae were transplanted into young adults. Wild-type transplants grew and survived over 1–6 weeks, but produced few glia (Figure 3A and 3C). dEGFRλ;dp110CAAX mutant glia survived and proliferated into massive tumors that filled the hosts' abdomens, often causing premature death (Figure 3B). Tumors were composed of small glial cells with little cytoplasm (Figure 3D–F). Tumors also contained trachea embedded throughout their mass (Figure 3D and 3E and Video S1), suggesting that tumors stimulated growth of new trachea or enveloped existing trachea, perhaps in a process akin to tumor angiogenesis. The leading edges of the tumors harbored individual cells invading nearby tissues, such as the ovary (Figure 3F and Video S2), which is consistent with the ectopic expression of active dMMP1 observed in dEGFRλ;dp110CAAX glia in the larval brain (Figure S3). However, some tissues, such as the gut, did not contain metastases, implying some degree of selective invasion. Thus, once unconstrained by the larval life cycle, dEGFRλ;dp110CAAX glia fail to exit the cell cycle, continue to proliferate, and form highly invasive tumors, all properties of human cancer cells. 
Coactivation of EGFR-Ras and PI3K Creates Invasive Neoplastic Glia To explore the invasive potential of mutant glia, we used FLP-FRT clonal analysis, a technique in which discrete clones of mutant cells are induced in otherwise normal tissues, a situation analogous to somatic tumorigenesis. We used a heat-shock-driven FLP-recombinase to catalyze mitotic recombination between FRT-bearing chromosomes such that a daughter cell (and all of its clonal progeny) initiated expression of GFP and UAS-containing transgenes only in repo-Gal4-expressing cells [12]. Clones were induced late in development, from mitotic founder cells, and were examined in adults. We could not definitively determine if clones were derived from single cell events since our study of these clones was retrospective, but given the frequency of control clone induction, many mutant clones likely originated from single cells. In wild-type controls, we observed clones in 68% of brains examined (N=149). Of the brains with clones, 75% had 1–3 clones, and 83% of these clones consisted of 1–3 cells of the same glial subtype (Figure 4A and 4B). Glial clones overexpressing dRas85DV12, dEGFRλ, or dEGFRElp alone typically contained 2-fold more cells than wild-type (dRas85DV12 shown in Figure 4C). To examine PI3K signaling, we used a dPTEN null allele, which became homozygous in FLP-FRT clones. dPTEN-/- glia did not overgrow, but did show aberrant cytoplasmic projections (Figure 4D), perhaps reflecting dPTEN function in the cytoskeleton [38]. To coactivate EGFR-Ras and PI3K in glial clones, dRas85DV12, dEGFRElp, or dEGFRλ was overexpressed within dPTEN-/- glia, using repo-Gal4. We observed overgrown and invasive dRas85DV12;dPTEN-/- and dEGFRElp;dPTEN-/- clones (Figure 4E–G). dEGFR;dPTEN-/- clones were less affected (Figure 4F and 4G), consistent with the larval overexpression of dRas85DV12 giving more severe growth phenotypes. 
Tumor-like overgrowth was only observed in dEGFR-dRas85D;dPTEN-/- cells, illustrating that chronic EGFR-Ras activation and PTEN loss can cause cell-autonomous over-proliferation. Cells from these double mutant clones appeared to invade the brain, typically following fiber tracts, and sometimes induced the formation of trachea (Figure 4E). Tumor-like growths of dEGFR-dRas85D;dPTEN-/- cells often penetrated deep into the brain, as exemplified by Videos S3 and S4 which show an animated 88 µm thick confocal z-stack of the dRas85DV12;dPTEN-/- clone in Figure 4E compared to a 16 µm thick z-stack of a wild-type control clone in Figure 4A. These phenotypes were reminiscent of invasion and angiogenesis in human gliomas [24]. We more commonly observed smaller dEGFR-dRas85D;dPTEN-/- clones composed of relatively differentiated, enlarged glia with diffusely invasive projections (Figure 4G); these clones likely derive from glia that differentiated prior to achieving sufficient EGFR-Ras transgene expression, and are consistent with findings that not all glial subtypes become neoplastic. Since neoplastic larval glia were concentrated in the outer anterior central brain and developing optic lobe, they may be derived from glial progenitor cells present in these regions [27],[28],[39]. To create clones from a discrete subpopulation of glial progenitors, we used an eyeless(ey)-promoter driven FLP-recombinase (ey-FLP), which is active in ey-expressing glial progenitors in the optic lobe [27]. Single mutant ey-FLP clones of dPTEN-/-, dRas85DV12, or dEGFRElp cells contained a modest number of excess and abnormal glia relative to wild-type controls (Figure 4H–J and Figure S8). In contrast, the double mutant dRas85DV12;dPTEN-/-, dEGFRElp;dPTEN-/-, or dEGFRλ;dPTEN-/- ey-FLP clones, which emerge as approximately tens of cells in 3rd instar larval brains (Figure 4K), became large invasive tumors composed of hundreds to thousands of cells in adults (Figure 4L and Figure S8). 
EGFR;PI3K Induces Glial Neoplasia via Pnt and Stg Expression To address the function of EGFR-Ras and PI3K in glioma, we analyzed the genetic basis of glial pathogenesis in our repo>dEGFRλ;dp110CAAX model, as this model shows robust neoplasia, similarity to human tumor genotypes, and sensitivity to dEGFR and dPTEN gene dosage (data not shown). EGFR-Ras signaling can promote proliferation through Erk kinase-mediated induction of nuclear targets. In repo>dEGFRλ;dp110CAAX brains, mutant glia showed high levels of nuclear, activated di-phospho-Erk relative to wild-type glia in control brains (Figure 5A and 5B). In flies, Erk activity can induce expression of PntP1, an ETS-family transcription factor encoded by the pointed (pnt) locus [40],[41]. PntP1, which is expressed in embryonic glia and is required for their normal development [40], was upregulated in dEGFRλ;dp110CAAX glia (Figure 5C and 5D). High levels of PntP1 can be detected normally in neuronal progenitors (data not shown), suggesting that it promotes a proliferative progenitor state. In developing eye tissue, EGFR-Ras-Erk signaling induces Pnt proteins to stimulate G2-M cell cycle progression through direct upregulation of Stg (Cdc25 ortholog) expression [41]. Glial-specific RNAi knock down of pnt reduced Stg expression and completely suppressed dEGFRλ;dp110CAAX neoplasia (Figure 5E–H), demonstrating that Pnt proteins are required for both Stg expression and neoplastic overproliferation in dEGFRλ;dp110CAAX glia. Notably, in repo>pntdsRNA;dEGFRλ;dp110CAAX brains, glia maintained their fate, as evidenced by repo-Gal4 and Repo expression. Stg itself was rate limiting for glial neoplasia. Reduction of stg with a mutation or a stgdsRNA partially suppressed the repo>dEGFRλ;dp110CAAX phenotype, whereas overexpressed Stg synergistically enhanced neoplasia (Figure 5E and Figure S9, data not shown). 
In contrast, Stg overexpression alone increased glial cell numbers approximately 2-fold, and could not induce neoplasia when combined with PI3K effectors (data not shown). Thus, dEGFRλ;dp110CAAX induces neoplasia via coordinated stimulation of G1-S entry through dCyclinE, and G2-M progression through Stg, both of which are EGFR-Ras dependent outputs [41],[42]. EGFR;PI3K Glial Neoplasia Requires Akt and Tor Activation and FoxO Inactivation We sought to determine which PI3K effectors contribute to the repo>dEGFRλ;dp110CAAX phenotype. Genetic reduction of dAkt, a major target of PI3K signaling, with a dAktdsRNA or a mutant allele strongly suppressed repo>dEGFRλ;dp110CAAX glial neoplasia (Figure 6A–C, data not shown). Therefore, Akt is necessary for the outcome of EGFR-PI3K coactivation. Many Akt effectors are implicated in glioma, and we tested orthologs of these loci in our model (Table S3). Tor, a kinase that promotes cell growth and proliferation, is a key Akt target. In glioma models, coactivation of EGFR-Ras and PI3K stimulates Tor, and in humans, Tor activity is correlated with poor patient prognosis [43],[44]. We tested the single Drosophila Tor ortholog, dTor, by genetically reducing dTor activity in repo>dEGFRλ;dp110CAAX larvae with a viable combination of hypomorphic dTor alleles or by co-overexpression of dominant negative dTor [45],[46]. Both of these manipulations reduced glial overgrowth (Figure 6D, data not shown). In flies and mammals, Tor exists in two different signaling complexes, TORC1 and TORC2. TORC2, a complex including Tor and the Sin1 and Rictor regulatory proteins, directly phosphorylates Akt, creating a positive feedback loop that fully activates Akt [47]. In mouse, Sin1 and Rictor mutants die early due to extraembryonic defects, but dSin1 and dRictor mutant flies are viable as homozygous nulls, allowing us to remove TORC2 function genetically [47],[48]. 
dSin1-/-; repo>dEGFRλ;dp110CAAX larval brains showed a near-wild-type phenotype (Figure 6E). Results were similar with a dRictor null allele and a dSin1dsRNA (data not shown). dSin1-/- and dRictor-/- mutants display reduced Akt-dependent phosphorylation and inactivation of dFoxO [49], the single Drosophila ortholog of FoxO transcription factors. This suggests that TORC2 loss might antagonize glial neoplasia through dFoxO upregulation. However, excess dFoxOSA, which is resistant to dAkt phosphorylation [45], only partially suppressed repo>dEGFRλ;dp110CAAX glial overproliferation (Figure 6F), arguing that TORC2 has additional roles. Notably, on their own, dSin1-/- and dRictor-/- mutant flies did not show any detectable glial defects (Figure 6G, data not shown). Thus, TORC2 is dispensable for normal glial development, but is necessary for dEGFRλ;dp110CAAX glial neoplasia. TORC1, a complex including Tor and the Raptor regulatory protein, drives cellular growth by stimulating protein synthesis through its effectors S6 kinase and the eIF-4E translation initiation factor [47]. Akt and Erk stimulate TORC1 through phosphorylation and inactivation of the TSC1-TSC2 protein complex, which activates Rheb, and stimulates TORC1 kinase activity. We tested TORC1 function by glial-specific overexpression of dsRNAs for the single Drosophila orthologs of Raptor (dRaptor), S6-kinase (dS6K), and eIF4E (deIF4E); these all significantly reduced accumulation of dEGFRλ;dp110CAAX mutant glial cells, but only caused mild glial hypoplasia in controls (Figure 6I–K and Figure S10). Co-overexpression of d4EBP, a deIF4E antagonist and dFoxO target gene, also blocked glial neoplasia (Figure 6L). Glial-specific RNAi of dTSC1 enhanced repo>dEGFRλ;dp110CAAX glial overgrowth (Figure 6H). 
However, overexpression of dTSC1dsRNA, dRheb, activated dS6K (dS6Kact), or deIF4E alone did not produce glial overproliferation, even though these constructs can mimic TORC1 activation (Figure S10, data not shown) [47],[49]. Thus, TORC1 activity is necessary for EGFR-PI3K-driven glial neoplasia, but is not sufficient. Moreover, neither deIF4E nor dS6Kact produced glial neoplasia when co-overexpressed with dEGFRλ (data not shown), illustrating that additional dTor-dependent outputs synergize with dEGFR signaling to drive neoplasia. Myc and Max Drive EGFR;PI3K Glial Neoplasia via CyclinD-Cdk4 dTor coordinates increased translation, mediated by dS6K and deIF4E, with expression of cell cycle regulators and ribosomal components, through dMyc, the single Drosophila ortholog of the Myc bHLH transcription factors [49]. Within developing epithelial tissues, dMyc is required for TORC1-dependent growth and can substitute for dTor activity [49],[50]. Myc protein levels can also be posttranslationally upregulated by EGFR-Ras-Raf signaling [16],[51]. Thus, we suspected that dMyc might mediate signal integration between EGFR-Ras and PI3K. dEGFRλ;dp110CAAX glia showed high levels of nuclear dMyc compared to wild-type glia (Figure 7A and 7B). dMyc was also highly expressed in wild-type neuroblasts (Figure 7A), suggesting that dMyc promotes a proliferative progenitor state [21]. Genetic reduction of dMyc with a dsRNA or a single loss-of-function allele strongly suppressed dEGFRλ;dp110CAAX glial neoplasia (Figure 7D, data not shown). In fact, some dMyc+/-; repo>dEGFRλ;dp110CAAX animals were rescued to viability, indicating that dMyc is an essential rate-limiting output of EGFR-PI3K coactivation. Myc proteins activate transcription through heterodimerization with the bHLH Max. 
Max activity was also required for glial neoplasia; a dsRNA for dMax, the single Drosophila Max ortholog, strongly suppressed repo>dEGFRλ;dp110CAAX overgrowth (Figure 7E). dMyc-dMax heterodimers promote proliferation by activating expression of multiple cell cycle genes, including dCyclinD and dCdk4 [52]. dCyclinD expression, which is high in repo>dEGFRλ;dp110CAAX glia, was inhibited by a dMycdsRNA (Figure 7F–H), suggesting that dMyc reduction suppresses glial neoplasia through reduced dCyclinD-dCdk4 activity. To test this we used loss-of-function mutations in dCdk4, which are viable [23]. dCdk4-/-; repo>dEGFRλ;dp110CAAX larvae showed near complete absence of excess glia and rescue of glial morphogenesis defects (Figure 7I). Moreover, a dCdk4dsRNA and dCyclinDdsRNA suppressed repo>dEGFRλ;dp110CAAX (data not shown). Thus, dCyclinD-dCdk4 is essential for EGFR-PI3K glial neoplasia, although dCdk4 is not required for development (Figure 7J). Myc Overexpression or Rb Loss Synergizes with EGFR dMyc is necessary for glial neoplasia, but is not sufficient when overexpressed alone (Figure 8C). dMyc-overexpressing glia showed polyploidy (Figure 8C, data not shown), indicating that these cells undergo DNA replication without mitosis, but require additional signals for cell cycle progression. In contrast, co-overexpression of dMyc with dEGFRλ produced a phenotype on par with that of repo>dEGFRλ;dp110CAAX (Figure 8D), indicating that dMyc overexpression can substitute for PI3K activation and promote neoplasia when combined with EGFR signaling. Given that the dMyc targets dCyclinD-dCdk4 are required for dEGFRλ;dp110CAAX neoplasia, we tested whether dCyclinD-dCdk4 overexpression could cooperate with dEGFRλ. Additionally, we tested loss of Rbf1, the only known dCdk4 substrate that controls proliferation [23]. 
repo>dCyclinD;dCdk4;dEGFRλ and repo>Rbf1dsRNA;dEGFRλ animals showed glial overgrowth, but did not accumulate as many cells as repo>dEGFRλ;dp110CAAX animals (Figure 8E and 8F). Thus, glia likely require additional dp110 or dMyc effectors to undergo full neoplastic proliferation. Other known PI3K and dMyc-dMax target genes that promote proliferation include ribosomal proteins and translation regulators [49],[52], such as eIF4E, which is highly expressed in dEGFRλ;dp110CAAX glia and required for neoplasia (Figure 6 and Figure S11). Our data imply that PI3K, dMyc, and dCyclinD-dCdk4 exist in a linear system, in which Rbf1 inactivation by dCyclinD-dCdk4 is one direct output of dp110CAAX or dMyc. However, in high-grade glioma, Rb loss co-occurs with EGFR and PTEN mutations [2], implying that these mutations cooperate to promote gliomagenesis. To explore interactions between Rb and EGFR-PI3K, we created the triple mutant repo>Rbf1dsRNA;dEGFRλ;dp110CAAX. These animals displayed exacerbated glial neoplasia, with a substantial increase in small anaplastic-like glia throughout the CNS (Figure 8G). This synergistic interaction likely derives from derepression of dE2F1 upon Rbf1 loss, and concomitant increased expression of dE2F1 target genes, including Stg and dCyclinE [53]. Increased dCyclinE and Stg expression may accelerate cell cycle progression, perhaps through increased dCdk2 and dCdk1 activity and/or truncated G1 and G2 gap phases caused by constant dCyclinE and Stg protein levels [23]. Consistent with this, we observed increased dCyclinE expression in Rbf1dsRNA;dEGFRλ;dp110CAAX glia relative to dEGFRλ;dp110CAAX glia (Figure 8H and 8I), and co-overexpression of Stg or dCyclinE-dCdk2 with dEGFRλ;dp110CAAX synergistically exacerbated glial neoplasia, yielding phenotypes similar to repo>Rbf1dsRNA;dEGFRλ;dp110CAAX (Figure 5E and Figure S9, data not shown). 
To assess dCdk2 activity in repo>Rbf1dsRNA;dEGFRλ;dp110CAAX brains compared to repo>dEGFRλ;dp110CAAX brains, we stained for phospho-MPM2 (Figure S12), which detects nuclear foci in cells with active dCyclinE-dCdk2 complexes [54]. Phospho-MPM2 foci were present in glia of both genotypes, although repo>Rbf1dsRNA;dEGFRλ;dp110CAAX brains appeared to have a higher density of glia with phospho-MPM2 foci (Figure S12), suggesting that expanded expression of dCyclinE results in broader activation of dCdk2. Thus, while PI3K and Rbf1 act in a common genetic pathway linked by dCyclinD-dCdk4, Rbf1 loss nevertheless synergizes with mitogenic stimulation from combined EGFR and PI3K signaling, and this synergy emerges from increased expression of dCyclinE and Stg, rate-limiting regulators of the cell cycle. Discussion We show that constitutive coactivation of EGFR-Ras and PI3K signaling in Drosophila glia and glial precursors gives rise to neoplastic, invasive cells that create transplantable tumor-like growths, mimicking human glioma, and mirroring mouse glioma models. This represents a robust organotypic and cell-type specific Drosophila cancer model in which malignant cells are created by mutations in the signature genes and pathways thought to be driving forces in a homologous human cancer. This was not necessarily an expected result since fly and human glia show many biological differences despite displaying important similarities [9],[55],[56]. Through genetic analysis of our model, we identified crucial downstream effectors of EGFR and PI3K signaling, many of which are mutated and/or activated in human glioma. These effectors act in a combinatorial network to coordinately stimulate cell cycle entry and progression, block cell cycle exit, and promote inappropriate cellular growth and migration (Figure 8J). Pathways within this network, while interdependent, act synergistically, rather than additively. Thus, Drosophila shows evolutionary conservation of oncogene cooperation. 
At least four pathway circuits are necessary for glial neoplasia initiated by EGFR and PI3K signaling, including dRas and dMyc circuits, which induce dCyclinE and dCyclinD to drive cell cycle entry, a Pnt circuit, which induces Stg to promote cell cycle progression, and a Tor-eIF4E-S6K pathway, which provides protein translation necessary for proliferation and growth (Figure 8J). When activated individually, these pathways fail to elicit glial neoplasia, implying a requirement for coordinated stimulation of multiple effectors and inactivation of negative regulators. Orthologs for many of the genes within these pathways, such as dRictor, are implicated in human glioma, although specific roles for some, such as ETS transcription factors, have not been defined despite their expression in glioma [2],[57],[58]. While many of these genes are known EGFR and PI3K pathway components, we did not necessarily expect them to be required for EGFR and PI3K dependent glial neoplasia. Indeed, we have tested many other pathway components and outputs, such as Jun kinase, that did not significantly suppress repo>dEGFRλ;dp110CAAX phenotypes upon reduced function (unpublished data). Coactivation of EGFR and PI3K signaling upregulates dMyc, which is necessary for glial neoplasia. This is consistent with findings that, in flies and mammals, EGFR-Ras, PI3K, and Tor signaling upregulate Myc protein levels [16],[49],[51],[59],[60]. Myc oncogenes are well-known to cooperate with RTK-Ras signaling to drive neoplastic transformation [51], and we demonstrate that this property of Myc is conserved in flies. We also observed sensitivity to reduced Myc gene dosage in our glioma model, which has also been recently documented in a mouse model of PTEN-dependent glioma [61]. c-myc is commonly amplified in gliomas [62], implying that Myc is rate limiting, and c-myc amplification may be selected for this reason. 
D-cyclins, established Myc target genes, and Cdk4 are also commonly amplified and/or overexpressed in gliomas [1],[51]. We observed dMyc-dependent dCyclinD overexpression, and a requirement for dCyclinD-dCdk4 in repo>dEGFRλ;dp110CAAX neoplasia, although dCdk4 itself is not required for normal glial proliferation. Together with our analysis of TORC2, this illustrates that oncogenic EGFR-PI3K co-opts effectors that do not control normal glial development. Similarly, cdk4-/- mutant mice show normal proliferation in many tissues, but are resistant to ErbB-2-driven breast cancers [63],[64]. Our data argue that Cdk4 activity is a key tumor-specific rate-limiting output of EGFR and PI3K signaling in glioma as well. In contrast to glia, coactivation of EGFR-Ras and PI3K in neuroblasts, which are fly neural stem cells, does not promote unchecked proliferation, despite the fact that neuroblasts express dMyc and are capable of undergoing neoplastic transformation in response to other genetic mutations [21]. Thus, in Drosophila, neither a neural stem cell fate nor Myc activity confer competence to undergo EGFR-PI3K neoplastic transformation. Rather, our results suggest that neoplastic cells arise from committed glial progenitors: dEGFR-dRas85D;dPTEN-/- clones derived from progenitor cells produce large tumors, and anaplastic cells in repo>dEGFRλ;dp110CAAX brains are concentrated in regions enriched for glial progenitors. Notably, regulated developmental signaling through the EGFR pathway promotes proliferation of normal Repo-expressing glial progenitors [27], and our results show that constitutive EGFR and PI3K signaling prolongs this proliferative progenitor state. Further studies of Drosophila glial progenitors and glioma-like cells may illuminate the cellular origins of human gliomas, which are thought to arise from progenitor-like glial cells. Moreover, our results argue that cell-type specific factors govern glial neoplasia. 
One such factor may be Dap, the single p21/p27 ortholog, which is normally expressed in only 5% of all glia (Figure S4). Perhaps glial progenitors do not express Dap, whereas neuronal progenitors do [26], and this underlies susceptibility to transformation by EGFR-Ras and PI3K. Dap is highly regulated in a cell-type specific manner [26], and studies of Dap regulation in glia may further illuminate the genetic origins of glioma, especially given that lack of p21 expression may underlie the tumorigenic response of mammalian glial progenitors to constitutively active EGFR [65]. While EGFR-Ras and PI3K are commonly upregulated in gliomas and experimental models demonstrate that these pathways are required for tumorigenesis, therapies that target EGFR and PI3K signaling have proven disappointing. This discrepancy between clinical and experimental data has many possible explanations. For example, recent studies have demonstrated that EGFR inhibitors are attenuated by particular mutations found in glioma cells, such as PTEN loss or RTK co-amplification [2]. Addressing these and other possibilities remains a challenge that dictates a need for new experimental models. The results presented here establish Drosophila as a viable model system for the study of glioma, offering a complex organismal system for rapidly identifying and evaluating therapeutic targets using genetic approaches. Such a system may be especially useful for distinguishing those genetic mutations and pathways that drive tumorigenesis from the large number of genes that show mutations and altered expression in glioblastomas uncovered by recent genomic analyses of patient samples [66],[67]. Our studies have already identified key rate-limiting genes, such as dCyclinE, Stg, and dMyc, and genes only required for abnormal neoplastic glial proliferation, such as dSin1, dRictor, and dCdk4, which may represent important therapeutic targets in human gliomas. 
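The genetic interactions reported in the Results can be collected into a simple lookup table for reference. The following Python snippet is a minimal sketch, not part of the original analysis: the dictionary name `INTERACTIONS`, the helper `group_by_effect`, and the three informal effect labels are illustrative assumptions, and the labels collapse the graded descriptions in the text (for example, "completely" and "strongly" suppressed are both recorded as "suppressed").

```python
from collections import defaultdict

# Effect of each manipulation on the repo>dEGFRλ;dp110CAAX glial phenotype,
# transcribed from the Results text. The labels are informal simplifications
# (an assumption of this sketch), not quantitative measurements.
INTERACTIONS = {
    "Dap overexpression": "suppressed",
    "pnt dsRNA": "suppressed",
    "stg reduction": "partially suppressed",
    "Stg overexpression": "enhanced",
    "dAkt reduction": "suppressed",
    "dSin1 null": "suppressed",
    "dFoxO-SA overexpression": "partially suppressed",
    "dTSC1 dsRNA": "enhanced",
    "dMyc reduction": "suppressed",
    "dMax dsRNA": "suppressed",
    "dCdk4 null": "suppressed",
    "Rbf1 dsRNA": "enhanced",
}


def group_by_effect(interactions):
    """Return a dict mapping each effect label to a sorted list of manipulations."""
    grouped = defaultdict(list)
    for manipulation, effect in interactions.items():
        grouped[effect].append(manipulation)
    return {effect: sorted(names) for effect, names in grouped.items()}


if __name__ == "__main__":
    for effect, names in sorted(group_by_effect(INTERACTIONS).items()):
        print(f"{effect}: {', '.join(names)}")
```

Grouping the entries this way separates the modifiers that suppress the phenotype from those that enhance it, which is the distinction the Discussion draws on when nominating candidate therapeutic targets.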
Materials and Methods Fly Stocks, Genetics, and Culture Conditions Flies were cultured at 25°C. All genotypes were established by standard genetics. To assess larval brain overgrowth phenotypes, embryos were collected for 6–24 hrs, grown for 120–140 hrs, and wandering 3rd instar larvae were selected for dissection. Stocks were obtained from the Bloomington Stock Center unless otherwise noted. Other than UAS-PTENdsRNA lines from Bloomington, all UAS-dsRNA lines were obtained from the VDRC stock center [11]. The following stocks were obtained from other investigators: UAS-dEGFRλ (T. Schubach), UAS-dEGFRElp, UAS-dEGFRwild-type (N. Baker), UAS-dPTEN, FRT40A dPTEN2L117, UAS-dFoxOSA (S. Oldham), UAS-dap (I. Hariharan), UAS-dp110wild-type, UAS-dCycD, UAS-dCdk4, UAS-dMyc, dMyc4 (B. Edgar), UAS-Rbf1 (N. Dyson), appl-Gal4 (K. Finley), pros-Gal4 (B. Ohlstein), wor-Gal4 (C. Doe), gcm-Gal4 (V. Hartenstein), dTor2L7, dTorl(2)k17004 (R. Bodmer), dRictorΔ2 (S. Cohen), and stgCB03726 (A. Spradling). UAS-dsRNA Validation The UAS-dMycdsRNA and UAS-TSC1dsRNA lines were validated in prior publications [48]. UAS-dsRNA lines were crossed to actin-Gal4, ey-Gal4, and GMR-Gal4 to assess phenotypes. Lines that showed phenotypes inconsistent with known phenotypes for their target genes were excluded from analysis. Gene knock-down in repo-Gal4 glia was verified with immunohistochemical stains for the following constructs: UAS-dMycdsRNA, UAS-dAktdsRNA, UAS-dS6KdsRNA, UAS-eIF4EdsRNA, UAS-Rbf1dsRNA, and UAS-pntdsRNA (Figure S11). Abdominal Transplants Larval brains were dissected into sterile PBS, washed, and cut into fragments. Abdominal incisions were made in virgin female hosts and single brain fragments were inserted. Hosts were cultured for 1–6 weeks, dissected and fixed in 4% paraformaldehyde, incubated in 10% sucrose and embedded in O.C.T. Thick 50 µm sections were stained as described below. Clonal Analysis For hs-FLP clones, genotypes are indicated in figure legends. 
Flies were initially grown at 18°C or 20°C to minimize spontaneous clones, which occurred at a low frequency during late larval-pupal stages. 3rd instar larvae, 0–48 hr pupae, or 0–2 day old young adults were treated with heat shock to induce clones and subsequently cultured at 25°C for 1–4 weeks. For ey-FLP clones, flies were cultured at 25°C continuously. Immunohistochemistry and Imaging Larval tissue was fixed for 30–50 minutes in 1×PBS 4% paraformaldehyde. Adult brains were fixed for 1–2 hr in 1×PBS 4% paraformaldehyde or in PLP with 2% paraformaldehyde. For BrdU labeling, larvae were cultured in food with 1 mg/ml BrdU for 4–6 hrs, and fixed larval brains were treated with 2 N HCl for 30 minutes followed by DNase for 1 hr. Stains were performed in 1×PBS 10% BSA with 0.3% Triton-X100 for larval brains and 0.5% Triton-X100 for adult samples. The following antibodies were obtained from the Developmental Studies Hybridoma Bank and diluted 1:5–1:10: 8D12 anti-Repo, anti-dMMP1, anti-dCyclinB, anti-Elav, and 40-1a anti-lacZ. Larval and/or adult brains were also stained with rabbit anti-Repo (G. Technau, 1:500), rat anti-dCyclinE (H. Richardson, 1:100), anti-BrdU (BD, 1:100), rat anti-Miranda (C. Doe, 1:100), mouse anti-diphospho-Erk (Sigma, 1:200), mouse anti-Rbf1 (N. Dyson, 1:5), mouse anti-Dap (I. Hariharan, 1:10), rabbit anti-PntP1 (J. Skeath, 1:500), rabbit anti-eIF4E (P. Lasko, 1:100), rabbit anti-dMyc (D. Stein, 1:1000), and anti-phospho-MPM2 (Upstate Biotechnology, 1:200). Anti-HRP-Cy5 and anti-HRP-Cy3 (Jackson Labs) were used at 1:250–1:500. Secondary antibodies were conjugated to Cy3 (Jackson Labs) or Alexa-488 or Alexa-647 (Molecular Probes). Actin was visualized with Rhodamine-labeled phalloidin (Invitrogen). Brains were imaged as whole mounts on a Zeiss LSM 510 confocal system. Images were analyzed in Zeiss LSM Image Browser and processed in Photoshop CS3. 
For experiments in which protein levels were compared between genotypes, all sample preparation, histochemistry, imaging, and image processing was performed in parallel in the same manner. Supporting Information Figure S1 Diagram of Drosophila EGFR mutant forms. Proteins are shown as horizontal bars along which functional domains are indicated, and the locations of alterations in mutant forms are noted. Wild-type human EGFR and Drosophila EGFR (dEGFR) show extensive conservation, with 55% homology in the kinase domain and 41% homology in the ligand-binding portion of the extracellular domain, which includes extensive cysteine repeats. The signal peptide is labeled in yellow. Following the signal peptide, the entire cytoplasmic domain is replaced with the lambda dimerization domain in dEGFRλ, which causes constitutive activation. Both human EGFR and dEGFR show extensive cysteine repeats in the extracellular domain, indicated in blue. The tyrosine kinase domain is labeled in magenta. The Elp mutant form of dEGFR contains an A887T substitution in the N-lobe of the kinase domain, which causes constitutive activation. (0.23 MB TIF) Figure S2 Coactivation of Ras-Raf and PI3K in Drosophila glia causes neoplasia. 2 µm optical sections of representative larval brain hemispheres from wandering 3rd instar larvae (A–H) and an early 2nd instar larval brain (I), all displayed at the same scale. 20 µm scale bars. Frontal sections; midway through brains. Anterior up; midline to left. Glial cell nuclei are labeled with Repo (red). CD8GFP (green), driven by the repo-Gal4 driver, labels glial cell bodies and membranes. An anti-HRP counterstain (blue) reveals neuropil (neuronal fiber tracts) at high intensity and some cell bodies of neurons and neuronal precursors at low intensity, and this varies slightly according to exact plane of section and mutant phenotype. 
repo>dEGFRλ;dp110CAAX (A), repo>dRas85DV12;dPTENdsRNA (B), and repo>dRafgof;dp110CAAX (C) brains show increased numbers of glia relative to wild-type (D), repo>dRas85DV12 alone (E), or repo>dPTENdsRNA alone (F). Compared to wild-type (D), repo>dEGFRwild-type (H) brains show reduced neurons (HRP, low intensity blue), which renders glia more densely packed and the entire brain smaller than normal. However, repo>dEGFRwild-type brains (H) do not show a substantial increase in the number of glia compared to wild-type (D). Brains of repo>dEGFRElp 2nd instar larvae (I) show substantial neuron loss, brain malformation, and excess glia for that stage of development. Genotypes: (A) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/+ (B) UAS-CD8GFP/+; repo-Gal4/UAS-dRas85DV12 UAS-dPTENdsRNA (C) UAS-dp110CAAX; repo-Gal4 UAS-CD8GFP/UAS-dRafgof (D) repo-Gal4 UAS-CD8GFP/+ (E) repo-Gal4 UAS-CD8GFP/UAS-dRas85DV12 (F) UAS-CD8GFP/+; repo-Gal4/UAS-dPTENdsRNA (G) UAS-dEGFRλ/+; repo-Gal4 UAS-CD8GFP/+ (H) UAS-CD8GFP/+; repo-Gal4/UAS-dEGFRwild-type (I) UAS-CD8GFP/+; repo-Gal4/UAS-dEGFRElp. (10.31 MB TIF) Figure S3 dMMP1 expression in wild-type and dEGFRλ;dp110CAAX glia. (A,B) 3rd instar larval brains. Frontal sections, showing medial regions enriched for proliferating glia. Anterior up; midline to left. 4.5 µm optical projections, matched in scale. 20 µm scale bars. Expression of the active form of dMMP1 (red) in wild-type (A) and repo>dEGFRλ;dp110CAAX brains (B), shown alone (left panels) and overlaid with an HRP (blue) neuronal label and a CD8GFP (green) glial label (right panels). In wild-type brains, glia (‘G’) rarely express active dMMP1, although some glia on the surface of the brain show low levels of dMMP1 staining (arrow). In repo>dEGFRλ;dp110CAAX brains (B), some neoplastic glia (‘G’) express active dMMP1 (red in right panel, yellow in overlay), which is largely membrane-localized in individual cells (arrow). 
Genotypes: (A) UAS-CD8GFP/+; repo-Gal4/+ (B) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+. (3.93 MB TIF) Figure S4 Glial-specific Rbf1 knock-down or overexpression of G1 Cyclin-Cdks, Dap, and Rbf1 affects glial proliferation. (A–F) 2 µm optical sections of larval brain hemispheres from wandering 3rd instar larvae displayed at the same scale. 20 µm scale bars. Frontal sections; midway through brains. Anterior up; midline to left. Glial cell nuclei are labeled with Repo (red). Glial cell bodies and membranes are labeled with CD8GFP (green) driven by repo-Gal4. HRP counter-staining (blue) reveals neuropil at high intensity and some cell bodies of neurons and neuronal precursors at low intensity. Rbf1 knock-down in repo>Rbf1dsRNA (B) does not significantly alter glial cell numbers (red nuclei) compared to wild-type (A). Ectopic expression of dCyclinE-dCdk2 (C) or dCyclinD-dCdk4 in repo-Gal4-glia promotes an approximate doubling of glial cell numbers (red nuclei), even when combined with Rbf1dsRNA (D). Continuous expression of Dap (E) or Rbf1 (F) in otherwise wild-type repo-Gal4-glia substantially reduces glial cell numbers, showing that repo-Gal4 glia normally undergo proliferation controlled by dCyclinE-dCdk2 and Rbf1-E2F1. Genotypes: (A) UAS-CD8GFP/+; repo-Gal4/+ (B) UAS-CD8GFP/+; repo-Gal4/UAS-Rbf1dsRNA (C) UAS-CD8GFP/+; repo-Gal4/UAS-dCyclinE UAS-dCdk2 (D) UAS-CD8GFP/UAS-dCyclinD UAS-dCdk4; repo-Gal4/UAS-Rbf1dsRNA (E) UAS-dap/+; UAS-CD8GFP/+; repo-Gal4/+ (F) UAS-CD8GFP/+; repo-Gal4/UAS-Rbf1. (8.27 MB TIF) Figure S5 Dap and Rbf1 expression in wild-type and dEGFRλ;dp110CAAX glia. 3rd instar larval brains. Frontal sections, showing superficial dorsal regions enriched for Dap- and Rbf1-expressing cells. Anterior up; midline to left. 7 µm optical projections, all matched in scale. 20 µm scale bars. 
(A,B) Dap expression (red) in wild-type (A) and repo>dEGFRλ;dp110CAAX brains (B), shown alone (left panels) and overlaid with Repo (blue) and repo>CD8GFP (green) glial markers (right panels). In both genotypes, Dap is primarily expressed in neuroblasts (‘NB’) and ganglion mother cell neuronal precursors (‘NP’), rarely in glia (‘G’), and almost never in neurons (‘N’). In repo>dEGFRλ;dp110CAAX brains, Dap is rarely expressed in glia as seen by the lack of substantial overlap between Dap expression and glial markers (B, right panel). White arrows denote rare Dap-positive glia in both wild-type and repo>dEGFRλ;dp110CAAX brains; these glia show lower levels of Dap protein than neighboring neuroblasts and neuronal precursors. Dap-positive glia were counted in 3 repo>dEGFRλ;dp110CAAX brains and an average of 5% of all glia expressed Dap (87 glia expressed Dap out of 1738 total glia counted). (C,D) Rbf1 expression (red) in wild-type (C) and repo>dEGFRλ;dp110CAAX brains (D), shown alone (left) and overlaid (right) with the CD8GFP (green) glial marker and HRP (blue) neuronal marker. In both genotypes, Rbf1 is highly expressed in glia (‘G’) and neuroblasts (‘NB’), which were identified by their characteristic positions and large cell bodies. Neurons (‘N’) show lower expression. Genotypes: (A,C) UAS-CD8GFP/+; repo-Gal4/+ (B,D) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+. (7.96 MB TIF) Figure S6 dEGFRλ; dp110CAAX glia do not express neuronal and neuroblast markers. Late 3rd instar larval brains. Medial anterior regions shown. Anterior up; midline to left. 2 µm optical sections. CD8GFP (green), driven by the repo-Gal4 driver, labels glial cell bodies and membranes. An HRP counter-stain (blue) reveals neuropil at high intensity and cell bodies of neurons and some neuronal precursors at low intensity. (A,B) Repo expression (red) in glial cell nuclei in wild-type (A) and repo>dEGFRλ;dp110CAAX brains (B). 
In wild-type brains, neuronal fibers and neurons (blue) are enveloped by glial processes (green) to give the brain a honeycombed appearance. In repo>dEGFRλ;dp110CAAX brains, neurons (‘N’) lack GFP or Repo expression, in contrast to glia (‘G’). (C–D) Elav expression (red) in wild-type (C) and repo>dEGFRλ;dp110CAAX brains (D), shown alone (left panels) and overlaid (right panels) with the CD8GFP (green) glial marker and HRP (blue) neuronal marker. In both genotypes, Elav is highly expressed in neurons (‘N’) but absent from glia (green cell bodies indicated by white arrows). (E–F) Miranda expression (red) in wild-type (E) and repo>dEGFRλ;dp110CAAX (F) brains, shown alone (left panels) and overlaid (right panels) with CD8GFP (green) glial marker and HRP (blue) neuronal marker. In both genotypes, Miranda is highly expressed in neuroblasts (‘NB’), which are identified by their characteristic positions and large cell bodies, but is absent from glia (green cell bodies, indicated by white arrows). Genotypes: (A,C,E) UAS-CD8GFP/+; repo-Gal4/+ (B,D,F) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+. (9.24 MB TIF) Figure S7 dEGFRλ and dp110CAAX do not induce neoplasia from neurons, neuroblasts, or certain glia. Late 3rd instar larval brain hemispheres. Medial anterior regions shown. Anterior up; midline to left. 3.5 µm optical projections. Repo expression (red) in glial cell nuclei in all genotypes. An HRP counter-stain (blue) reveals neuropil at high intensity and cell bodies of neurons and some neuronal precursors at low intensity, and this varies slightly according to variance in exact section plane. (A–D) Wild-type (A), dEGFRλ; dp110CAAX overexpressed using the 1407-Gal4 neuroblast driver (B), dEGFRλ; dp110CAAX overexpressed using the pros-Gal4 neural driver (C), and dEGFRλ;dp110CAAX overexpressed using the gcm-Gal4 embryonic glial driver (D). 
In all cases, dEGFRλ;dp110CAAX overexpression did not induce neoplastic overgrowth of glial or neuronal cell types, as determined by brain size and cell-type specific stains. (E,F) Cytoplasmic GFP (green), driven by the Nrv2-Gal4 driver, labels cell bodies and cytoplasmic processes, as seen in wild-type (E). dEGFRλ;dp110CAAX overexpressed using the Nrv2-Gal4 glial driver (F). Nrv2-Gal4 is expressed by post-mitotic cortex glia, which proliferate somewhat in response to dEGFRλ;dp110CAAX to cause slight brain enlargement, but do not become neoplastic like repo>dEGFRλ;dp110CAAX glia. Genotypes: (A) +/CyO (B) UAS-dEGFRλ UAS-dp110CAAX/+; 1407-Gal4/UAS-Gal4 (C) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-Gal4/+; pros-Gal4/+ (D) UAS-dEGFRλ UAS-dp110CAAX/+; gcm-Gal4/+ (E) Nrv2-Gal4 UAS-GFP/+; Nrv2-Gal4 UAS-GFP/+ (F) UAS-dEGFRλ UAS-dp110CAAX/+; Nrv2-Gal4 UAS-GFP/UAS-Gal4; Nrv2-Gal4 UAS-GFP/+. (7.96 MB TIF) Figure S8 Coactivation of EGFR and PI3K in glial progenitors creates invasive neoplastic glia. (A–E) FLP/FRT clones in adult brains derived from a population of ey-FLP and repo-Gal4-expressing cells. CD8GFP (green) marks cell bodies and membranes of glial clones derived by FLP/FRT mitotic recombination (see text). Repo (red) marks all glial cell nuclei, in both clones and surrounding normal tissue. 8.5 µm confocal optical sections through brains of similarly aged adults, all matched to scale. 20 µm scale bars. Each panel shows half brains, including a whole optic lobe and adjacent central brain. dEGFRElp (C) and dPTEN-/- (B) clones are composed of 2–5-fold more cells than wild-type controls (A). In contrast, dEGFRλ;dPTEN-/- (D) and dEGFRElp;PTEN-/- (E) double mutant clones form large tumors visible in adult brains. As with hs-FLP clones, dEGFRλ;dPTEN-/- and dEGFRElp;dPTEN-/- clones are less cellular and more invasive than dRas85DV12;PTEN-/- clones (see Figure 4). 
Genotypes: (A) ey-flp/+; FRT40A tubGal80/FRT40A; repo-Gal4 UAS-CD8GFP/+ (B) ey-flp/+; FRT40A tubGal80/FRT40A PTEN2L117; repo-Gal4 UAS-CD8GFP/+ (C) ey-flp/+; FRTG13 tubGal80/FRTG13 UAS-CD8GFP; repo-Gal4/UAS-dEGFRElp (D) ey-flp/UAS-dEGFRλ; FRT40A tubGal80/FRT40A PTEN2L117; repo-Gal4 UAS-CD8GFP/UAS-dEGFRλ (E) ey-flp/+; FRT40A tubGal80/FRT40A PTEN2L117; repo-Gal4 UAS-CD8GFP/UAS-dEGFRElp. (7.84 MB TIF) Figure S9 Stg overexpression exacerbates EGFR-PI3K glial neoplasia. 2 µm optical sections of larval brain hemispheres from late 3rd instar larvae, displayed at the same scale. 20 µm scale bars. Frontal sections, midway through brains. Anterior up; midline to left. Glial cell nuclei are labeled with Repo (red); glial cell bodies and membranes are labeled with CD8GFP (green) driven by repo-Gal4. An HRP counter-stain (blue) reveals neuropil at high intensity and neuronal cell bodies at low intensity. HRP stains varied between the two samples according to effects of mutant glia and slight variance in section plane. Stg co-overexpression with dEGFRλ;dp110CAAX (B) enhances neoplasia compared to dEGFRλ;dp110CAAX (A), leading to a dramatic increase in aberrant glia and yielding a phenotype similar to that of repo>Rbf1dsRNA;dEGFRλ;dp110CAAX (see Figure 8G). Genotypes: (A) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/+ (B) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-stg/+; repo-Gal4 UAS-CD8GFP/+. (6.27 MB TIF) Figure S10 Knock-down and/or overexpression of dTor and dTor effectors. 2 µm optical sections of larval brain hemispheres from wandering 3rd instar larvae, approximately 130 hr AED, all displayed at the same scale. 20 µm scale bars. Anterior up; midline to the left. Frontal sections; midway through brains. Repo (red) marks glial cell nuclei; CD8GFP (green), driven by repo-Gal4, labels glial cell bodies and membranes. 
An HRP counter-stain (blue) reveals neuropil (neuronal fiber tracts) at high intensity and some cell bodies of neurons and neuronal precursors at low intensity, and this varies slightly according to exact plane of section, brain orientation, and mutant phenotype. repo>dAktdsRNA brains (B) are smaller and have fewer glia than wild-type controls (A). All other genotypes have relatively normal-sized brains. By gross examination, repo>TorTED (C), repo>dRaptordsRNA (E), repo>deIF4EdsRNA (F) and repo>dS6KdsRNA (G) display an estimated 10–20% reduction in glial cell numbers. repo>dTSC1dsRNA (D), repo>Rheb (H) and repo>deIF4E (I) brains contain normal numbers of glia. Genotypes: (A) repo-Gal4 UAS-CD8GFP/+ (B) UAS-CD8GFP/+; repo-Gal4/UAS-dAktdsRNA (C) UAS-TorTED/UAS-CD8GFP; repo-Gal4/+ (D) UAS-CD8GFP/+; repo-Gal4/UAS-dTSC1dsRNA (E) UAS-dRaptordsRNA/UAS-CD8GFP; repo-Gal4/+ (F) UAS-CD8GFP/+; repo-Gal4/UAS-deIF4EdsRNA (G) UAS-dS6KdsRNA; UAS-CD8GFP/+; repo-Gal4/+ (H) UAS-CD8GFP/+; repo-Gal4/UAS-Rheb (I) UAS-CD8GFP/UAS-deIF4E; repo-Gal4/+. (8.83 MB TIF) Figure S11 Validation of dsRNA constructs. 2 µm optical sections of larval brain hemispheres from wandering 3rd instar larvae. Frontal sections. Anterior up, midline to the left. Each individual staining pattern is shown alone (left panels) and with overlaid glial or neuronal markers (right panels). Repo (blue) in (A–J) marks glial cell nuclei. Glial cell bodies and membranes are labeled with CD8GFP (green) driven by repo-Gal4. In (K,L) an HRP counter-stain (blue) reveals neurons. Red marks histochemical stains for each indicated protein in repo>dEGFRλ;dp110CAAX (A,C,E,G,I,K) and repo>dEGFRλ;dp110CAAX with each indicated dsRNA construct (B,D,F,H,J,L). Each dsRNA construct was expressed with repo-Gal4 to yield glial-specific knock-down, which left protein expression in surrounding neuronal tissue intact. 
For the nuclear protein PntP1, knock-down was verified by the lack of glial-specific staining in the presence of the pntdsRNA (B), although PntP1 is present within neighboring neurons (‘N’). For dAkt, deIF4E, and dS6K, gene knock-down was confirmed using antibodies for total protein. Glial-specific reduction in gene expression by dAktdsRNA and deIF4EdsRNA is highlighted by white outlines in (D) and (F). dAkt protein is higher in neuronal tissue (‘N’) in both repo>dEGFRλ;dp110CAAX (C) and repo>dAktdsRNA;dEGFRλ;dp110CAAX (D). deIF4E is high in dEGFRλ;dp110CAAX glia (E) and low in neurons (E,F). repo>deIF4EdsRNA;dEGFRλ;dp110CAAX brains (F) have reduced glial eIF4E. Reduced glial dS6K protein, which is diffusely cytoplasmic, is observed in repo>dS6KdsRNA;dEGFRλ;dp110CAAX brains (H) compared to repo>dEGFRλ;dp110CAAX (G). Reduced glial dMyc expression in repo>dMycdsRNA;dEGFRλ;dp110CAAX brains (J) is noted by white outlines, and the absence of purple nuclei in overlay (J, right panel), relative to repo>dEGFRλ;dp110CAAX (I, right panel). dMyc is highly expressed in neuroblasts (‘NB’) in both samples (I, J), which do not express the dMycdsRNA. Glial Rbf1 protein is absent in repo>dRbf1dsRNA;dEGFRλ;dp110CAAX brains (L) in contrast to neighboring neurons (‘N’) in repo>dEGFRλ;dp110CAAX brains (K). Genotypes: (A,C,E,G,I,K) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+ (B) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/UAS-pntdsRNA (D) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/UAS-dAktdsRNA (F) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/UAS-deIF4EdsRNA (H) UAS-dEGFRλ UAS-dp110CAAX/UAS-dS6KdsRNA; repo-Gal4 UAS-CD8GFP/+ (J) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-dMycdsRNA/+; repo-Gal4 UAS-CD8GFP/+ (L) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/UAS-Rbf1dsRNA. (9.04 MB TIF) Figure S12 Phospho-MPM2 reveals increased S-phase and M-phase glia in repo>dRbf1dsRNA;dEGFRλ;dp110CAAX brains. 
(A–C) phospho-MPM2 expression (red) in wild-type brains (A), repo>dEGFRλ;dp110CAAX brains (B), and repo>dRbf1dsRNA;dEGFRλ;dp110CAAX brains (C). 20 µm scale bars. Anterior up, midline to the left. 6 µm optical projections showing representative superficial dorsal regions enriched for mitotic glia. Phospho-MPM2 shown alone (left panels) and overlaid with the Repo (blue, middle panels) and CD8GFP (green, right panels) glial markers. Phospho-MPM2 nuclear foci are present in S-phase cells (arrows note examples). S-phase glia are clearly visible in the middle panel as purple cells with MPM2 foci, which appear enriched in repo>dRbf1dsRNA;dEGFRλ;dp110CAAX brains (C) relative to repo>dEGFRλ;dp110CAAX brains (B). In all genotypes, high levels of phospho-MPM2 are also expressed in mitotic cells (asterisks note examples). Mitotic glia show low levels of Repo (middle panels) but are clearly GFP-positive (right panels), suggesting that Repo protein expression is reduced upon mitosis in glia. repo>dRbf1dsRNA;dEGFRλ;dp110CAAX brains (C) also show increased density of phospho-MPM2-positive mitotic glia compared to repo>dEGFRλ;dp110CAAX brains (B). In wild-type (A), the majority of phospho-MPM2-positive cells are not glial and are neuroblasts or neuronal precursors, as revealed by the lack of overlap between phospho-MPM2 staining (middle panel) and the Repo and CD8GFP markers. Genotypes: (A) repo-Gal4 UAS-CD8GFP/+ (B) UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+ (C) UAS-dEGFRλ UAS-dp110CAAX/+; repo-Gal4 UAS-CD8GFP/UAS-Rbf1dsRNA. (8.99 MB TIF) Table S1 Overexpression of EGFR-Ras and PI3K-Akt pathways in Drosophila glia. (0.08 MB PDF) Table S2 Coactivation of EGFR and PI3K does not elicit neoplasia in all neural cell types. (0.08 MB PDF) Table S3 Genetic analysis of Akt, Tor, Myc, and CyclinD-Cdk4 pathways in neoplastic and normal glia. 
(0.10 MB PDF) Video S1 An animated 7.5 µm thick confocal z-stack of dEGFRλ; dp110CAAX tumor cells derived from transplanted larval glia, pictured in Figure 3D. The depth of each frame is noted in the upper left-hand corner. 20 µm scale bar for x/y-axis. Transplanted mutant glia are labeled with membrane-bound CD8GFP (green) and the Repo nuclear protein (blue). Actin staining (red, phalloidin) reveals abdominal anatomy of the host. Asterisks indicate trachea embedded in the pictured dEGFRλ;dp110CAAX tumor, visible as hollow actin-positive (red) tubules running through tissues. Genotypes: Hosts were w1118 virgin females. Transplanted glia were UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+. (1.54 MB MOV) Video S2 An animated 27 µm thick confocal z-stack of dEGFRλ; dp110CAAX tumor cells derived from transplanted larval glia, pictured in Figure 3F. The depth of each frame is noted in the upper left-hand corner. 20 µm scale bar for x/y-axis. Transplanted mutant glia are labeled with membrane-bound CD8GFP (green) and the Repo nuclear protein (blue). Actin staining (red, phalloidin) reveals abdominal anatomy of the host. Arrowheads indicate dEGFRλ;dp110CAAX glial cells invading an ovary, distinguished by its characteristic actin staining (bright red). An asterisk indicates trachea embedded in this dEGFRλ;dp110CAAX tumor, visible as a hollow actin-positive (red) tubule running through the tissue. Genotypes: Hosts were w1118 virgin females. Transplanted glia were UAS-dEGFRλ UAS-dp110CAAX/+; UAS-CD8GFP/+; repo-Gal4/+. (1.16 MB MOV) Video S3 An animated 88 µm thick confocal z-stack of a hs-FLP/FRT clone of dPTEN-/-;dRas85DV12 cells pictured in Figure 4E. The depth of each frame is noted in the upper left-hand corner. 20 µm scale bar for x/y-axis. Approximately half of the affected brain is displayed, with the midline to the left. 
The dPTEN-/-;dRas85DV12 mutant cells are labeled with membrane-bound CD8GFP (green) and penetrate through the entire depth of the brain, some of which is visible as red background. These dPTEN-/-;dRas85DV12 cells appear to originate in the optic lobe (right) and extend variable tendrils into the central brain (middle, left). Genotype: hs-flp1/+; FRT40A tubGal80/FRT40A PTEN2L117; repo-Gal4 UAS-CD8GFP/UAS-dRas85DV12. (1.06 MB MOV) Video S4 An animated 16 µm thick confocal z-stack of a hs-FLP/FRT clone of wild-type cells pictured in Figure 4A. The depth of each frame is noted in the upper left-hand corner. 20 µm scale bar for x/y-axis. Midline to the left. This clone of normal glia is labeled with membrane-bound CD8GFP (green) and shows cells in a cell-body rich region of the central brain adjacent to the optic lobe. The clone is composed of 2 cells with extensive cytoplasmic projections. Genotype: hs-flp1/+;FRTG13 tubGal80/FRTG13 UAS-CD8GFP; repo-Gal4/+. (0.24 MB MOV) Kinesin-5–dependent Poleward Flux and Spindle Length Control in Drosophila Embryo Mitosis Abstract We used antibody microinjection and genetic manipulations to dissect the various roles of the homotetrameric kinesin-5, KLP61F, in astral, centrosome-controlled Drosophila embryo spindles and to test the hypothesis that it slides apart interpolar (ip) microtubules (MT), thereby controlling poleward flux and spindle length. In wild-type and Ncd null mutant embryos, anti-KLP61F dissociated the motor from spindles, producing a spatial gradient in the KLP61F content of different spindles, which was visible in KLP61F-GFP transgenic embryos. The resulting mitotic defects, supported by gene dosage experiments and time-lapse microscopy of living klp61f mutants, reveal that, after NEB, KLP61F drives persistent MT bundling and the outward sliding of antiparallel MTs, thereby contributing to several processes that all appear insensitive to cortical disruption. 
KLP61F activity contributes to the poleward flux of both ipMTs and kinetochore MTs and to the length of the metaphase spindle. KLP61F activity maintains the prometaphase spindle by antagonizing Ncd and another unknown force-generator and drives anaphase B, although the rate of spindle elongation is relatively insensitive to the motor's concentration. Finally, KLP61F activity contributes to normal chromosome congression, kinetochore spacing, and anaphase A rates. Thus, a KLP61F-driven sliding filament mechanism contributes to multiple aspects of mitosis in this system. INTRODUCTION Mitosis, the process by which identical copies of the replicated genome are distributed to the products of each cell division, involves a highly dynamic sequence of coordinated motility events, mediated by a bipolar protein machine, the mitotic spindle (Karsenti and Vernos, 2001 ; Mitchison and Salmon, 2001 ; Gadde and Heald, 2004 ; Wadsworth and Khodjakov, 2004 ; Mogilner et al., 2006 ; Brust-Mascher and Scholey, 2007 ; Walczak and Heald, 2008 ). These motility events are driven by molecular-scale forces generated by mitotic kinesins and dyneins, together with dynamic microtubules (MTs), whose activities are controlled by a network of regulatory proteins, e.g., mitotic kinases, phosphatases, and proteolytic enzymes (Sharp et al., 2000c ; Bettencourt-Dias et al., 2004 ; Maiato and Sunkel, 2004 ; Rogers et al., 2005 ; Goshima et al., 2007 ). Among these mitotic proteins, the kinesin-5 motor is thought to play a key role (Cottingham et al., 1999 ; Valentine et al., 2006a ; Civelekoglu-Scholey and Scholey, 2007 ). Purified kinesin-5 is a slow, modestly processive, plus-end–directed bipolar homotetramer capable of cross-linking adjacent MTs and sliding apart antiparallel MTs in motility assays (Sawin et al., 1992 ; Cole et al., 1994 ; Kashina et al., 1996a ; Kapitein et al., 2005 ; Tao et al., 2006 ; Valentine et al., 2006b ; Krzysiak et al., 2008 ; Van den Wildenberg et al., 2008 ). 
In yeast cells the homotetrameric structure of kinesin-5 appears to be essential for mitosis (Hildebrandt et al., 2006 ), and in Drosophila embryos KLP61F displays dynamic properties consistent with an association with spindle MTs (Cheerambathur et al., 2008 ) and forms presumptive MT–MT cross-bridges (Sharp et al., 1999a ). These results suggest that ensembles of multiple kinesin-5 motors could serve as dynamic cross-links that organize spindle MTs into bundles and drive a sliding filament mechanism that pushes apart antiparallel spindle MTs (Sharp et al., 1999a ; Brust-Mascher et al., 2004 ), although alternative mechanisms of action for kinesin-5 motors have also been proposed (Kapoor and Mitchison, 2001 ; Tsai et al., 2006 ; Johansen and Johansen, 2007 ; Gardner et al., 2008 ). The most obvious and frequently observed consequence of loss of activity of kinesin-5 motors, induced by loss-of-function mutation, antibody inhibition, small molecule inhibition, or RNA interference, is the formation of abnormal monoastral spindles (Enos and Morris, 1990 ; Saunders and Hoyt, 1992 ; Sawin et al., 1992 ; Heck et al., 1993 ; Saunders et al., 1997 ; Cottingham et al., 1999 ; Mayer et al., 1999 ; Sharp et al., 1999b ; Sharp et al., 2000a ; Goshima and Vale, 2003 ). This suggests that kinesin-5 may normally contribute to spindle bipolarity by sliding apart interpolar (ip) MTs to drive spindle pole separation during mitotic spindle assembly, elongation, and function (Saunders and Hoyt, 1992 ; Sawin et al., 1992 ; Heck et al., 1993 ; Saunders et al., 1997 ; Straight et al., 1998 ; Walczak et al., 1998 ; Sharp et al., 2000a ; Brust-Mascher and Scholey, 2002 ; Brust-Mascher et al., 2004 ; Cheerambathur et al., 2007 ). It is clear that this activity is deployed differently in different systems, however. 
For example, kinesin-5 activity is required for initial spindle assembly in yeast but not in Drosophila embryos, whereas it is required for spindle maintenance in both systems (Saunders and Hoyt, 1992 ; Sharp et al., 1999b ). Inhibition of kinesin-5 in different systems has also resulted in defects in poleward flux (Miyamoto et al., 2004 ; Cameron et al., 2006 ) and spindle pole organization (Gaglio et al., 1996 ). However, in Drosophila S2 cells, perturbation of kinesin-5 levels did not interfere with metaphase spindle length (Goshima et al., 2005b ). In Caenorhabditis elegans, kinesin-5 was also found to exert a braking effect on the spindle midzone that governs the rate of spindle elongation mediated by cortical motors (Saunders et al., 2007 ). It is unclear if these findings are relevant to all spindles and if they all reflect the antiparallel MT sliding activity of kinesin-5, or if kinesin-5 functions differently in different systems as implied by the proposal that kinesin-5 functions as a MT depolymerase rather than a sliding motor to control chromosome congression in Saccharomyces cerevisiae (Gardner et al., 2008 ). To distinguish these possibilities, therefore, systematic studies of kinesin-5 function throughout mitosis are needed, but this is difficult because the most frequently observed effect of perturbing kinesin-5 function, spindle collapse, obscures later effects in mitosis. In Drosophila embryos, where the centrosomes dictate the assembly of remarkably dynamic, fast-acting spindles (Megraw et al., 2001 ; Cheerambathur et al., 2007 ), inhibiting the function of the kinesin-5, KLP61F produced collapsed monoastral spindles at a specific stage during prometaphase, and this was rescued by loss of function of the kinesin-14, Ncd (Sharp et al., 1999b , 2000a ). 
Biochemical evidence supports the idea that the prometaphase spindle is maintained by a KLP61F-Ncd force balance (Tao et al., 2006 ), but it is unclear if the monoastral spindles form by the drawing together of focused poles due to Ncd-induced inward MT sliding or by the disorganization of spindle MTs combined with the fusion of microtubule organizing centers (Goshima and Vale, 2003 ). We previously proposed that the KLP61F-driven outward sliding of ipMTs drives pre-anaphase B poleward flux and anaphase B spindle elongation (Brust-Mascher and Scholey, 2002 ; Brust-Mascher et al., 2004 ). However, reductionist modeling (Brust-Mascher et al., 2004 ) and system level modeling (Wollman et al., 2008 ) surprisingly predicted that the rate of anaphase B pole–pole separation is insensitive to the KLP61F concentration and is controlled instead by the unloaded KLP61F MT sliding velocity together with a pole-associated MT depolymerase. Although experimental evidence supports a requirement for KLP61F activity in anaphase B (Sharp et al., 2000a ), we had not tested the effect of changing its concentration on the rate of anaphase B or its contribution to poleward flux and metaphase spindle length. Furthermore, it is unclear if some of these activities also require the activity of the overlying cortex, e.g., via cortical motors pulling astral MTs outward. In the current study we addressed some of these uncertainties by undertaking a systematic investigation of the spectrum of mitotic functions of kinesin-5, through antibody-induced dissociation of KLP61F from Drosophila syncytial blastoderm-stage embryo spindles, using an experimental system where the extent of dissociation can be directly observed. This was complemented by studies of the effects of varying the KLP61F gene dosage in embryos and by the investigation of mutant spindles after depletion of the maternal load of KLP61F. 
We find that kinesin-5 contributes to various aspects of mitosis in this system, some of which are novel and perhaps unexpected. MATERIALS AND METHODS Drosophila Stocks Flies were maintained and embryos were collected as described (Sharp et al., 1999 ). Experiments were performed on embryos expressing green fluorescent protein (GFP)::tubulin or GFP::CNN (provided by Dr. Thomas Kaufman, Indiana University Bloomington) or GFP::CID, the Drosophila CENPA homolog (Henikoff et al., 2000 ), which is a stable and specific kinetochore marker (provided by Dr. Steven Henikoff, Fred Hutchinson Cancer Research Center), or KLP61F::GFP (Cheerambathur et al., 2008 ), as well as on Claret nondisjunctional (cand) and klp61f mutant embryos. Generation of klp61f Mutant Fly Lines Expressing GFP-Tubulin Standard genetic techniques were used to generate flies with GFP::tubulin on the second chromosome and these were then crossed into different alleles of klp61f mutants for genetic analyses. Briefly, we first crossed y[1] w[*]; P{w[+mC] = UASp-GFPS65C-alphaTub84B}14-6-II/P{w[+mC] = UASp-GFPS65C-alphaTub84B}14-6-II with yw;P{Act5C-Gal4}/CyO,y+ or w*; P{w[+mC] = matalpha4-GAL-VP16}V2H. We selected P{w[+mC] = UASp-GFPS65C-alphaTub84B}14-6-II/P{Act5C-Gal4} or P{w[+mC] = UASp-GFPS65C-alphaTub84B}14-6-II/P{w[+mC] = matalpha4-GAL-VP16}V2H females and crossed them with w; In(2LR)Gla,wgGla-1Bc1/CyO males. We selected progeny in which recombination between the UAS-GFP-tubulin and the Gal4 driver had occurred by picking fluorescent larvae and used individual flies to make balanced stocks. All stocks were from the Bloomington Stock Center. 
To cross these balanced lines expressing GFP-tubulin under the Gal4 driver with klp61f mutants (provided by Margarete Heck, University of Edinburgh; Andrea Pereira, University of Massachusetts; and Patricia Wilson, Georgia State University), we first crossed both lines with w; Sp/CyORoi; Ly/TM3Sbe flies (provided by Dr. Jeanette Naetzle, University of California Davis), in which we had replaced the TM3 balancer with KrGFPTM3 or TM6 to distinguish homozygous from heterozygous embryos or larvae, respectively. The klp61f alleles used (7012 and 7415) are null or severe hypomorphic, late larval lethals, obtained by P-element insertion in the untranslated region of KLP61F (Heck et al., 1993 ). Antibody Generation KLP61F antibody was generated against amino acids 900-1066, which includes the BimC box containing a mutation in which the putative phosphorylation site, threonine 933, was mutated to aspartic acid. To generate this mutant, two sets of primers were used in three sequential PCR steps: two anchor primers, flanking the region of interest and containing BamHI and EcoRI restriction sites, and two mutagenic primers containing the T933D mutation. To produce KLP61F antibody, PCR-amplified KLP61F (nt 2698-3201) containing the T933D mutation was subcloned into pGEX-JDK and the expressed protein was purified using glutathione-agarose affinity chromatography. Two rabbit polyclonal antibodies were generated using standard procedures and affinity-purified on GST-KLP61F columns. The antibody was acid eluted, neutralized in Tris buffer, dialyzed into PBS buffer, and concentrated. For embryo injection, the concentration in the needle was 7–15 mg/ml. Time-Lapse and Fluorescent Speckle Microscopy of Embryos Embryos were injected with rhodamine-labeled tubulin (Cytoskeleton, Denver, CO). 
Time-lapse images were acquired with an Olympus (Melville, NY) microscope equipped with an Ultra-View spinning disk confocal head (Perkin Elmer-Cetus, Boston, MA) and a 100× 1.35 NA objective at time intervals of 1–3 s. Images were analyzed with Metamorph Imaging software (Universal Imaging, West Chester, PA). No neighbors deconvolution or sharpen high filters followed by a low-pass filter were applied to all images. Pole-to-pole distance as a function of time was measured from the position of the poles in each image. Kymography was used to quantify speckle movement. Calculations and statistical analyses were done in Microsoft Excel (Redmond, WA). To quantify the amount of motor on the spindle, we measured the GFP and rhodamine intensities on the spindle, subtracted the background intensity from each, and took the ratio of the two corrected values. This allowed us to compare the amount of motor remaining on spindles within single embryos, but we cannot control the amount of injected rhodamine tubulin precisely enough to permit quantitative comparisons between different embryos. Time-Lapse Analysis of Mitosis in Late Embryos and Larval Brains Embryos were collected for 1 h and imaged after at least 7 h. Brains were dissected from wild-type larvae expressing G147-GFP (Bloomington Stock Center) or GFP-tubulin and from klp61f mutant second and third instar larvae expressing GFP-tubulin in D-22 (pH 6.7) insect medium (U.S. Biological, Swampscott, MA) supplemented with 7.5% fetal bovine serum and 0.5 mM ascorbic acid (Sigma, St. Louis, MO). The explanted brain was then moved carefully in a drop of medium onto a No. 1 coverslip placed on a special stainless steel slide with an indentation to allow observation of the sample. The brain was pushed carefully into direct contact with the coverslip, and an additional coverslip was placed on top of the stainless steel slide to keep the medium in place and prevent dehydration of the sample. 
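The motor quantification described under Time-Lapse and Fluorescent Speckle Microscopy of Embryos (background-subtracted GFP intensity divided by background-subtracted rhodamine intensity) can be illustrated with a minimal sketch. The function names and the choice to normalize to the largest ratio within one embryo are assumptions made for illustration only; the measurements themselves were made in Metamorph and Excel, and the exact normalization reference is not stated in the text.

```python
def motor_ratio(gfp, rhodamine, gfp_background, rhodamine_background):
    """Background-subtracted GFP / rhodamine intensity ratio for one spindle.

    Hypothetical helper: argument names are illustrative, not from the paper.
    """
    tubulin = rhodamine - rhodamine_background
    if tubulin <= 0:
        raise ValueError("rhodamine signal must exceed its background")
    return (gfp - gfp_background) / tubulin


def normalize_within_embryo(ratios):
    """Scale ratios so the largest ratio in the embryo equals 1.

    Assumption: normalization to the spindle retaining the most motor.
    Ratios from different embryos are not comparable, because the amount
    of injected rhodamine tubulin varies between embryos.
    """
    top = max(ratios)
    if top <= 0:
        raise ValueError("at least one spindle must retain measurable motor")
    return [r / top for r in ratios]
```

For example, a spindle with a GFP intensity of 150 over a background of 50 and a rhodamine intensity of 220 over a background of 20 gives a ratio of 100/200 = 0.5.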
Late embryos and brains were imaged as above except that stacks of 14–18 confocal planes spaced 0.5 µm apart were taken at time intervals of 30 s. Images were analyzed as above. Online Supplemental Material Supplemental Video 1 shows a time-lapse movie of an embryo expressing KLP61F-GFP and injected with rhodamine tubulin and anti-KLP61F (see Figure 1). Supplemental Video 2 shows a time-lapse movie of an embryo injected with rhodamine tubulin and anti-KLP61F (see Figure 2, A and C). Supplemental Video 3 shows a time-lapse movie of an embryo expressing GFP-CNN and injected with rhodamine tubulin and anti-KLP61F (see Figure 2, E–G). (A) Microinjection of an anti-KLP61F antibody (designed to dissociate KLP61F from spindles) results in a gradient of antibody concentration and produces a gradient in the KLP61F content of different spindles. Images from time-lapse movie of an embryo expressing KLP61F-GFP and injected with rhodamine tubulin and anti-KLP61F (see Supplemental Video 1). Time in each frame is given in seconds from NEB. Bar, 10 µm. The injection site was close to the top of the embryo (arrow); there, KLP61F-GFP forms immunoprecipitates and most KLP61F-GFP is depleted from the spindles. These spindles collapse, as seen at 247 s. Toward the bottom of the embryo, KLP61F-GFP is still present on the spindles and consequently these spindles assemble, though they may exhibit defects. (B) Graph of pole–pole distance as a function of time (left) and quantification of KLP61F remaining on these spindles (right). The normalized ratio of KLP61F-GFP to rhodamine tubulin is used to compare the amount of motor remaining on each spindle at different time points. Spindles that collapse have practically no motor on them, whereas spindles that do not collapse or recover from partial collapse have at least 40% remaining. (A) Two time points from a wild-type embryo injected with anti-KLP61F antibody showing a gradient of defects (see Supplemental Video 2). Bar, 10 µm. 
(B) Two time points from an Ncd null embryo injected with anti-KLP61F antibody showing more disorganized spindles due to the loss of two motors. Bar, 10 µm. (C) Graph of pole–pole distance as a function of time for the embryo shown in A. Close to the injection site spindles collapse (see s1 and s3), and other spindles shorten but remain small (see s2). Further away from the injection site, spindles do not elongate during prometaphase (s5), and the most distal spindles exhibit anaphase elongation (s6, s7, and s8). For comparison, we show graphs from a wild-type embryo (purple) and from the most severe effect (an embryo injected with enough antibody to collapse all spindles within the field of view; red). (D) Spindle length as a function of time for the Ncd null embryo shown in B. For comparison, graphs of wild-type embryos (purple), Ncd null embryos (dark blue), and Ncd null embryos injected with a high concentration of anti-KLP61F (red) are shown. The gradient of concentration of antibody leads to a gradient of spindle pole lengths. Strong inhibition of KLP61F leads to spindle collapse; lower concentrations lead to less severe defects. At a certain concentration the spindle maintains a steady length (light purple). (E) Anti-KLP61F antibody causes spindle collapse at prometaphase without apparent disruption of centrosomes or spindle poles. Images from time-lapse movie of an embryo expressing GFP-CNN (green) and injected with rhodamine tubulin (red) and anti-KLP61F (see Supplemental Video 3). Time in each frame is given in seconds. Bar, 10 µm. Note how the centrosomes move toward each other, but do not appear disorganized. (F) Tubulin fluorescence in the central spindle increases as the spindle collapses. Normalized spindle length (filled symbols) and fluorescent tubulin intensity at the equator (empty symbols) as a function of time for wild-type (black) and KLP61F inhibited spindles (red).
After KLP61F inhibition spindles collapse with a correlated increase in tubulin intensity, suggesting that MTs are sliding inward and not depolymerizing. The increase is much higher than that observed in wild-type spindles, indicating that it is not due to a normal increase in tubulin. (G) Linescans of fluorescent tubulin intensity along a single ipMT bundle at different time points during the collapse. The intensity increases as the spindle collapses. RESULTS Injection of Anti-KLP61F Antibody into the Drosophila Syncytial Embryo Causes a Gradient of Mitotic Defects We raised new rabbit affinity-purified polyclonal antibodies to the C-terminal tail domain of KLP61F. Studies using transient transfection with Eg5 mutants in cultured Xenopus cells (Sawin and Mitchison, 1995 ), with KLP61F mutants in live Drosophila embryos (Cheerambathur et al., 2008 ) and using antibodies raised against phospho- and unphospho-epitopes in Drosophila embryos (Sharp et al., 1999a ) suggest that the cyclin-dependent kinase-mediated phosphorylation of a conserved Thr-933 residue in the bimC box of kinesin-5 is critical for its localization to the spindle. We reasoned that antibodies to the phosphorylated form of this domain would competitively inhibit the binding of native phosphorylated KLP61F to the spindle and, by displacing the motor from its site of action, would inhibit its function. Accordingly antibodies were raised against a tail domain fragment in which the Thr-933 residue was changed to an aspartate phospho-mimic by site-directed mutagenesis (Materials and Methods). These antibodies react with a single polypeptide of appropriate molecular weight on immunoblots of embryonic extracts and have been shown to stain mitotic spindles by immunofluorescence (our unpublished results and Silverman-Gavrila and Wilde, 2006 ), consistent with our previous results (Sharp et al., 1999a ; Cheerambathur et al., 2008 ). 
As reported previously, the microinjection of affinity-purified antibody into the syncytial blastoderm embryo blocked mitosis and produced monoastral arrays of microtubules (Sharp et al., 1999b , 2000a ) similar to those seen in fixed loss-of-function klp61f mutants after depletion of the maternal load in larvae (Heck et al., 1993 ; see below). To directly visualize the extent of the anti-KLP61F induced dissociation of KLP61F from mitotic spindles in the injected embryos, we coinjected fluorescent tubulin and anti-KLP61F into stable transgenic strains of Drosophila that express functional KLP61F-GFP (Cheerambathur et al., 2008 ; Figure 1; Supplemental Video 1). The antibody rapidly displaced KLP61F from spindles and formed visible, fluorescent immunoprecipitates in the adjacent cytoplasm (Figure 1, arrow). Moreover this experiment provided clear visual evidence of the formation of a spatial gradient of inhibition of the target antigen throughout the syncytial blastoderm, as had been inferred indirectly from the effects of inhibitory antibody microinjection in previous studies (Sharp et al., 2000b ; Blower and Karpen, 2001 ; Kwon et al., 2004 ). Proximal to the antibody microinjection site, where the anti-KLP61F concentration is highest, KLP61F is completely dissociated from spindles and consequently spindles form monoasters. The injected antibody diffuses through the cytoplasm, setting up a concentration gradient, highest at the injection site and decreasing at increasingly distal sites. Accordingly, moving away from the injection site, the extent of depletion of KLP61F from spindles decreases, and increasingly higher concentrations of fluorescent motor are retained on spindles (Figure 1). Thus at distal sites, spindles do not form monoasters but display less severe defects, producing a range of “phenotypes” analogous to an allelic series of genetic mutants (Figure 1). 
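The motor quantification described in Materials and Methods (background-subtracted GFP intensity divided by background-subtracted rhodamine intensity, compared only within a single embryo) can be sketched as follows. This is a minimal illustration, not the analysis actually performed in Metamorph and Excel; the function names and all intensity values are hypothetical, and scaling to the spindle with the most motor is an assumption based on the statement that maximal KLP61F-GFP levels are assigned a value of 1.0.

```python
import numpy as np


def motor_ratio(gfp, rhodamine, gfp_background, rhodamine_background):
    """Background-subtracted GFP / rhodamine intensity ratio for each spindle."""
    gfp_signal = np.asarray(gfp, dtype=float) - gfp_background
    rhodamine_signal = np.asarray(rhodamine, dtype=float) - rhodamine_background
    return gfp_signal / rhodamine_signal


def normalize_within_embryo(ratios):
    """Scale ratios so that the spindle retaining the most motor equals 1.0.

    Ratios from different embryos should not be pooled, because the amount of
    injected rhodamine tubulin varies from embryo to embryo.
    """
    ratios = np.asarray(ratios, dtype=float)
    return ratios / ratios.max()


# Hypothetical mean intensities (arbitrary units) for four spindles in one
# embryo, ordered from closest to the injection site to most distal.
gfp = [120.0, 320.0, 520.0, 920.0]
rhodamine = [1100.0, 1100.0, 1100.0, 1100.0]

ratios = motor_ratio(gfp, rhodamine, gfp_background=100.0, rhodamine_background=100.0)
normalized = normalize_within_embryo(ratios)
print(np.round(normalized, 2))
```

In this made-up example the spindle nearest the injection site retains only a few percent of the motor relative to the most distal spindle, which is the kind of within-embryo comparison the ratio is meant to support.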
We quantified the amount of KLP61F present on spindles at different sites within this gradient by measuring the fluorescence intensity ratio of KLP61F-GFP to injected rhodamine tubulin, which allows us to compare spindles within a single embryo, but not between multiple embryos, due to small variations in the amount of injected tubulin. As shown in Figure 1B, the severity of spindle pole separation defects correlates reasonably well with the extent of depletion of KLP61F from the spindle, so that spindles containing maximal levels of KLP61F-GFP (arbitrarily assigned a value 1.0) displayed spindle pole separation profiles similar to uninjected controls, spindles containing only 0–10% residual KLP61F-GFP collapsed into monoasters, and those containing intermediate levels of the motor (50–70%) displayed spindle length defects but completed mitosis nonetheless. Spindle Pole Separation Defects Produced by Varying Levels of KLP61F Depletion in Wild-Type Versus Ncd Null Mutant Embryos In previous studies, we reported that prometaphase spindles collapse into monoasters after KLP61F inhibition and are “rescued” in Ncd null mutants, but the behavior of spindles containing intermediate levels of KLP61F was not systematically examined. Results presented in the previous section now permit us to do so. In wild-type embryos, we observed that, at increasing distances from the injection site, the extent of perturbation of spindle length, as characterized by changes in pole–pole spacing versus time, decreases in parallel with the decrease in extent of depletion of KLP61F from spindles (Figures 1B and 2C). The most severe effect, prometaphase spindle collapse, occurred at a rate of 0.06 µm/s (Figure 1B, spindles s5 and s6), although some spindles close by retained sufficient levels of KLP61F to slow down the rate of collapse (Figure 2C, spindles s1–s4). 
Further away, spindles were observed to shorten somewhat, but did not collapse and instead maintained a short prometaphase length, and then elongated at anaphase B (Figures 1B, s3 and s4, and 2C, s5). Even further away, spindles did not elongate during prometaphase, or did so only slightly, but then elongated at approximately wild-type rates during anaphase B sometimes after a slight delay (Figures 1B, s1 and s2, and 2C, s6–s8). Thus, particularly noteworthy features of the spindles that do not collapse after partial loss of KLP61F are their short lengths during metaphase and anaphase A and their relatively normal rates of anaphase B spindle elongation. The latter result was intriguing because we had previously proposed that KLP61F-driven ipMT sliding drives anaphase B, yet in studying the gradient of intermediate phenotypes (Figures 1B and 2C), we did not observe spindles that circumvented the prometaphase collapse but then failed to elongate normally during subsequent anaphase B. This is, in fact, consistent with mathematical modeling predicting that the rate of anaphase spindle elongation is independent of the number of motors, at least above a low threshold number (Brust-Mascher et al., 2004 ; Wollman et al., 2008 ). We hypothesized that spindles containing sufficient KLP61F motor to circumvent collapse and progress through prometaphase might also contain enough motor to elongate the spindle at normal rates during anaphase B. We further reasoned that, because Ncd generates an inward force capable of antagonizing KLP61F (Sharp et al., 2000a ; Tao et al., 2006 ), in the absence of Ncd function, we might improve our chances of encountering spindles with sufficient KLP61F to proceed through prometaphase but below the threshold level of KLP61F needed to drive normal rates of anaphase B. In such spindles, a decreased rate of anaphase B spindle elongation should be observed. 
Accordingly, we inhibited KLP61F in Ncd null mutant embryos, which totally lack kinesin-14 function, in order to circumvent prometaphase spindle collapse and to study the role of KLP61F during spindle elongation. In uninjected mutants, pole–pole separation occurred prematurely from prometaphase onward, supporting our hypothesis that Ncd serves as a “brake” that exerts an inward force on spindle poles, antagonistic to KLP61F (Figure 2D, dark blue). Pole–pole separation in Ncd null embryos occurs shortly after nuclear envelope breakdown (NEB), earlier than in wild-type embryos, suggesting that normally Ncd's braking activity is turned on throughout prometaphase. Anti-KLP61F injection into Ncd null mutant embryos produced a gradient of defects (Figure 2D), where the most severe effect was spindle collapse. This was not observed in our previous studies (Sharp et al., 1999b , 2000a ) because more antibody is needed to collapse the spindle in an Ncd null background than in a wild-type background. This result suggests that, when KLP61F and Ncd function are both missing, another inward force-generator must drive pole–pole collapse. In addition, spindles lacking the function of both motors are disorganized (Figure 2B), suggesting that these motors contribute to spindle organization by cross-linking MTs throughout the spindle, in agreement with their localization in these cells. Significantly, at more distal sites where spindles retain partial KLP61F function but Ncd function is missing, the spindles do not collapse and either do not elongate at all (Figure 2D, s1) or elongate after a significant delay (Figure 2D, s2–s6). This failure to elongate was not observed in anti-KLP61F–injected wild-type embryos, and it supports the hypothesis that KLP61F is necessary for anaphase B. 
The Collapse Observed after Strong Inhibition of KLP61F Is Due to the Drawing Together of Intact, Organized Spindle Poles by Inward MT Sliding We wanted to better understand if the spindle length defects we observe after loss of kinesin-5 function reflect defects in spindle pole separation (Saunders et al., 1997 ; Sharp et al., 2000a ) or in spindle pole focusing (Gaglio et al., 1996 ; Goshima and Vale, 2003 ). To simultaneously observe the behavior of MTs and spindle poles and see if anti-KLP61F has any detectable effect on pole organization, we microinjected anti-KLP61F with rhodamine-labeled tubulin into embryos expressing GFP-centrosomin (CNN), an integral component of the mitotic centrosome (Megraw et al., 2002 ; Figure 2E; Supplemental Video 3). During collapse, intact spindle poles steadily moved toward one another at a rate of ~0.06 µm/s, and the appearance of focused, radial sets of fluorescent MTs projecting from discrete centrosomes persisted throughout the collapse (Figure 2E). There was no obvious disorganization of the spindle poles, although some MT bundles appeared to “splay” laterally, projecting from the pole into the adjacent cytoplasm. Injection of a high concentration of antibody caused all observed spindles to collapse, and this was accompanied by a concomitant increase in the fluorescent intensity of tubulin in the central spindle (Figure 2F) as seen in linescans along the pole–pole axis of the collapsing spindles (Figure 2G). This indicates that ipMT bundles become thicker during the collapse, consistent with spindle shortening being due to an inward sliding of adjacent, antiparallel ipMTs, although serial section electron microscopy (EM) done at time intervals during the collapse, which is probably not feasible, would be required to establish this. 
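As an illustration of how a collapse rate such as the ~0.06 µm/s quoted above can be obtained from pole positions, the sketch below computes the pole-to-pole distance in each frame and fits a straight line to distance versus time. It is a minimal example under stated assumptions: the pole coordinates are synthetic, the function names are hypothetical, and a simple least-squares line is assumed rather than taken from the original analysis.

```python
import numpy as np


def pole_to_pole_distance(pole_a_xy, pole_b_xy):
    """Euclidean distance (µm) between the two spindle poles in every frame."""
    a = np.asarray(pole_a_xy, dtype=float)
    b = np.asarray(pole_b_xy, dtype=float)
    return np.linalg.norm(a - b, axis=1)


def separation_rate(time_s, distance_um):
    """Least-squares slope of distance versus time (µm/s).

    Negative values mean the poles are approaching each other (collapse);
    positive values mean the spindle is elongating.
    """
    slope, _intercept = np.polyfit(time_s, distance_um, 1)
    return slope


# Synthetic track: one frame every 3 s, poles start 8 µm apart and each
# moves inward at 0.03 µm/s along the x axis.
time_s = np.arange(0.0, 60.0, 3.0)
pole_a_xy = np.column_stack((-4.0 + 0.03 * time_s, np.zeros_like(time_s)))
pole_b_xy = np.column_stack((4.0 - 0.03 * time_s, np.zeros_like(time_s)))

distance_um = pole_to_pole_distance(pole_a_xy, pole_b_xy)
rate = separation_rate(time_s, distance_um)
print(f"pole-pole separation rate: {rate:.3f} µm/s")
```

With real data the fit would be restricted to the collapsing phase of the track, since a single line does not describe a trace that shortens and then holds a steady length.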
Genetic Evidence for a Role of KLP61F in Driving Spindle Pole Separation Loss-of-function mutants provide a complementary method to antibody inhibition for probing motor protein function (Scholey, 1998 ; Figure 3). Homozygous klp61f mutants undergo mitosis in embryos due to the maternal load of KLP61F but display late larval lethality due to mitotic defects that arise when the maternal load of KLP61F is depleted (Heck et al., 1993 ). Fixed larval neuroblasts were previously found to contain abnormal monoastral spindles, consistent with our antibody inhibition results (Heck et al., 1993 ). However, in the absence of real-time studies, it is unclear if these monoasters arise from an initial failure in spindle pole separation, from a collapse of assembled bipolar spindles, or from the fixation of transiently collapsing structures. To study this problem, we created klp61f mutant flies (Heck et al., 1993 ) expressing GFP-tubulin using laborious genetic crosses (Materials and Methods). (A) Still images from real-time recordings of spindle pole separation dynamics in klp61f7012 mutant embryos expressing GFP-tubulin. Top row, spindle in wild-type embryo. Second row, a monoastral spindle that becomes bipolar. Third row, a spindle that collapses to form a permanent monoaster. Arrows mark the poles. Fourth row, monoaster that does not change during observation time. Bar, 5 µm. (B) Still images from real-time recordings of spindle pole separation dynamics in wild-type and klp61f7415 mutants expressing GFP-tubulin. Top row, a wild-type neuroblast. Second row, a mutant neuroblast with separated spindle poles that transiently collapse, then recover and separate fully. Third row, a mutant larval spindle that collapses to form a permanent monoaster. Bar, 5 µm. (C) Differences in spindle pole separation due to varying KLP61F gene dosage. Left, embryos from heterozygous mothers have a shorter metaphase length than wild-type embryos. 
Right, klp61f mutant embryos rescued with two copies of a KLP61F-GFP transgene have a longer metaphase spindle length than those rescued with one copy. In one allele (klp61f7012), larvae have smaller brains than wild types and spindles are rare. In these mutants, spindles start collapsing in mitotic domains of the late embryo (Figure 3A), where we observed that some spindles collapse into stable monoasters that fail to complete mitosis. Other monoasters seem to form monoastral bipolar spindles, which may assemble via a chromosome-directed pathway in which KLP61F activity is required to focus the anastral spindle pole as proposed for S2 cells (Goshima and Vale, 2003 ). Such monoastral bipolar spindles were not observed in our antibody inhibition experiments in syncytial embryos, but they have previously been observed in fixed larval brains and gonial cells of klp61f mutants (Wilson et al., 1997 ). These observations are consistent with the idea that KLP61F is required for the separation of spindle poles in the rapid, centrosome-dominant mitoses of the syncytial embryo, but as development proceeds, mitosis slows down (see below) and centrosomes become less dominant, so that KLP61F plays increasingly important roles in anastral spindle pole focusing. In a different allele (klp61f7415), homozygous larvae had brains of normal size, and we could follow the progression of living mutant cells through mitosis (Figure 3B). We observed collapsing spindles with similar pole–pole separation defects to those seen in anti-KLP61F–injected embryos. In wild-type brains we never observed such collapsing or shortening spindles. Figure 3B shows the dynamics of spindle pole separation in a wild-type neuroblast spindle (Figure 3B, top row), a mutant spindle that initially separates its poles then collapses into a stable monoaster (Figure 3B, bottom), and a mutant spindle that transiently collapses and then recovers (Figure 3B, center). 
Careful 3D observations and measurements confirm that we were indeed visualizing spindle pole separation dynamics and not simply changes in the orientation of the spindle. The results are consistent with a complete and partial depletion of the maternal load of KLP61F giving rise to permanent and transient spindle collapse, respectively. Because the mutants examined are recessive lethal (Heck et al., 1993 ), this result makes it extremely unlikely that the collapse of prometaphase spindles seen after severe antibody inhibition is due to dominant effects of the antibody. Interestingly, the rate of spindle pole separation in late embryos and neuroblasts is slower than in syncytial embryos; the time from NEB to telophase takes 10–15 min in late embryos and ~20 min in neuroblasts compared with 4–6 min in the syncytial embryo. However, whether this difference in time reflects differences in the balance of forces operating in the spindles of different cell types or some other unknown factors is unclear. Further evidence in support of the idea that KLP61F exerts outward forces on spindle poles was obtained by examining embryos carrying varying “doses” of the KLP61F gene (Figure 3C). For example, during prometaphase wild types harboring two copies of the KLP61F gene separated their spindle poles further than klp61f mutant heterozygotes containing a single copy, suggesting that the predicted increase in the concentration of the motor increases the outward force that pushes apart the poles (Figure 3C, left). In addition, the viability of klp61f mutants could be rescued with one or two copies of a KLP61F-GFP transgene that express the expected amount of protein based on immunoblotting (Cheerambathur et al., 2008 ). 
Measurement of pole–pole distances in these embryos also showed that two copies of the GFP transgene supported faster prometaphase spindle elongation and the formation of longer metaphase spindles than did a single copy of the transgene (Figure 3C, right), again consistent with the gradient observed after antibody inhibition. KLP61F Is Required for Poleward Flux and Normal Metaphase Spindle Length The gradient of defects elicited by antibody inhibition experiments revealed additional roles for KLP61F in embryos. On the basis of comparisons of the rates of poleward flux within ipMT bundles and the rates of MT sliding induced by purified KLP61F, as well as the observation that poleward flux stops at the onset of anaphase B, we previously proposed that KLP61F drives poleward flux in Drosophila embryos, but this had not been tested in vivo (Cole et al., 1994 ; Brust-Mascher and Scholey, 2002 ; Brust-Mascher et al., 2004 ; Tao et al., 2006 ). If we classify the gradient induced by KLP61F antibody injection by analyzing spindle pole length, we find as stated above that the most severe effect is spindle collapse. The next most severe effect is partial spindle shortening followed by a steady-state length that is smaller than the pole spacing at NEB. Other spindles maintained the characteristic NEB pole spacing until anaphase B or displayed only a small prometaphase elongation. In these spindles we measured the steady-state length and the rate of poleward flux during these periods of steady spindle length maintenance (i.e., metaphase and anaphase A). The average metaphase length decreased from 11.8 ± 0.08 to 8.8 ± 1.4 µm (Table 1), and poleward flux was significantly inhibited both within ipMTs and kinetochore (k) MTs (Figure 4). The average rate of flux was 0.01 ± 0.02 µm/s in KLP61F inhibited embryos compared with 0.05 ± 0.02 µm/s in control embryos (Figure 4; Table 1). 
This indicates that KLP61F-driven MT sliding is necessary to achieve the wild-type metaphase spindle length and that it also contributes to poleward flux in both ipMTs and kMTs. Interestingly, 17% of our flux measurements were negative (Figure 4C). A possible source of negative flux numbers, especially close to zero, is noise in the measurements. A more interesting possibility is that they represent speckles on ipMTs associated with one pole that slide inward as the spindle transiently shortens, bringing them closer to the pole in the opposite half spindle. Such a speckle would flux toward the first pole but away from the second, now closer, pole, and this would be measured as negative flux.
Partial inhibition of KLP61F decreases the rates of flux and anaphase A chromosome-to-pole movement as well as the metaphase spindle length and kinetochore–kinetochore distance.
Poleward flux requires the function of KLP61F. (A) In wild-type embryos kymographs show that tubulin speckles flux toward the poles during the metaphase–anaphase A steady state, but move at the same rate as the poles during anaphase B (Brust-Mascher and Scholey, 2002). (B) In embryos injected with anti-KLP61F antibody, kymographs obtained from spindles that did not collapse but maintained a steady length show that tubulin speckles do not flux toward the pole, indicating that the function of KLP61F is required for this movement (see Supplemental Video 2 for speckled movie). (C) Histogram of all the flux data obtained on spindles that did not collapse after KLP61F inhibition.
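The sign convention behind the negative flux values discussed above can be made concrete with a small sketch. Here flux is taken as the rate at which the speckle-to-pole distance shrinks, so a speckle moving toward the reference pole gives a positive rate. This is an assumed, simplified stand-in for kymograph analysis: the function names, the one-dimensional coordinates along the pole–pole axis, and all numbers are hypothetical.

```python
import numpy as np


def flux_rate(time_s, speckle_pos_um, pole_pos_um):
    """Poleward flux rate (µm/s) of one speckle relative to one pole.

    Positions are measured along the pole-pole axis. The rate is the negative
    slope of speckle-to-pole distance versus time, so movement toward the
    reference pole is positive and movement away from it is negative.
    """
    speckle = np.asarray(speckle_pos_um, dtype=float)
    pole = np.asarray(pole_pos_um, dtype=float)
    distance = np.abs(speckle - pole)
    slope, _intercept = np.polyfit(time_s, distance, 1)
    return -slope


def fraction_negative(rates):
    """Fraction of flux measurements that are below zero."""
    return float(np.mean(np.asarray(rates, dtype=float) < 0))


# Synthetic speckle moving at 0.05 µm/s toward the pole at +5 µm while both
# poles stay fixed (steady-state spindle length).
time_s = np.arange(0.0, 22.0, 2.0)
speckle_pos_um = 1.0 + 0.05 * time_s
near_pole = np.full_like(time_s, 5.0)
far_pole = np.full_like(time_s, -5.0)

rate_near = flux_rate(time_s, speckle_pos_um, near_pole)  # scored against its own pole
rate_far = flux_rate(time_s, speckle_pos_um, far_pole)    # scored against the opposite pole

# Hypothetical set of measured rates, used only to show the bookkeeping.
example_rates = [0.03, 0.01, -0.01, 0.02, 0.0]
negative_share = fraction_negative(example_rates)
print(f"{rate_near:.3f} {rate_far:.3f} {negative_share:.2f}")
```

The same speckle gives +0.05 µm/s when scored against the pole it is moving toward and -0.05 µm/s when scored against the opposite pole, which is the situation proposed above for speckles that have slid into the opposite half spindle.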
Congression Is Defective, Kinetochore Spacing and the Rate of Anaphase A Are Reduced, and the Relative Timing of Anaphase A and B Is Perturbed in Spindles Partially Depleted of KLP61F In those spindles that retain sufficient KLP61F to maintain pole–pole spacing and resist prometaphase spindle collapse, we were also able to study congression and measure the subsequent rate of anaphase A in transgenic embryos expressing GFP-CID that were injected with rhodamine tubulin and anti-KLP61F (Figure 5). In spindles that shorten or do not undergo prometaphase-to-metaphase elongation, but then proceed through anaphase, we find that congression is defective. Kinetochores occupy a larger area around the equator compared with wild-type spindles, revealing subtle defects in chromosome positioning on the metaphase plate (Figure 5B). Interestingly, kinetochore–kinetochore distance is reduced from 1.31 ± 0.22 µm in wild type to 0.77 ± 0.20 µm in KLP61F-inhibited spindles (Table 1), indicating a reduction in the magnitude of poleward forces acting on the kinetochores. Subsequently, the average rate of kinetochore-to-pole movement is 0.06 ± 0.02 µm/s (144 kinetochores in 37 spindles) compared with 0.11 ± 0.02 µm/s for wild-type anaphase A movement (Figure 5; Table 1). Some spindles that maintained a reduced but steady pole–pole spacing displayed no kinetochore-to-pole movement and a single nucleus reformed around the unsegregated parental chromosomes. Such spindles were excluded from our calculations of the average rate. Chromosome congression and anaphase A are perturbed by partial KLP61F inhibition. (A) Time-lapse images of one spindle in an embryo expressing GFP-CID injected with rhodamine tubulin and anti-KLP61F antibody. The spindle shortens slightly (compare 215 with 126) but does not collapse after KLP61F inhibition. Kinetochores move to the poles albeit slower than in control spindles. Time in each frame is given in seconds from NEB. Bar, 5 µm. 
(B) Positions of the center of kinetochore pairs in wild-type (black) and KLP61F-inhibited (red) spindles. In anti-KLP61F–inhibited spindles, kinetochores do not congress as tightly as in controls, and they exhibit a larger displacement around the equator. Triangles mark the average positions of the poles. (C) Pole–pole distance and kinetochore-to-pole distances for the spindle shown in A (averages in Table 1). Significantly, the lower rate of 0.06 µm/s parallels the observed lack of poleward flux (previous section) and supports the idea that anaphase A is driven by a combined “flux-pacman” mechanism in Drosophila embryos (Brust-Mascher and Scholey, 2002 ; Maddox et al., 2002 ; Rogers et al., 2004 ). Interestingly, we noticed that in many of these spindles, kinetochore-to-pole movement occurs at the same time as pole–pole separation, whereas in wild-type spindles, anaphase A always precedes anaphase B. How loss of kinesin-5 function could lead to changes in the temporal relationship between chromosome-to-pole motility and spindle elongation during anaphase is unclear. As predicted (Sharp et al., 2000a ; Brust-Mascher et al., 2004 ) the rate of anaphase B is relatively insensitive to KLP61F concentration to a certain point: spindles that exhibited a partial collapse and had a metaphase length of <7 µm elongated at a slower rate, but spindles that maintained at least the NEB length elongated at approximately wild-type rates. Influence of Cortical Organization on KLP61F-dependent Mitotic Events Cortical pulling forces could complement the outward sliding of ipMTs to control pole–pole spacing. To examine this, we measured spindle pole dynamics and MT speckle movements in two mutants that have severely disrupted actin cortices surrounding the nucleus. In the Drosophila syncytial embryo, actin caps are normally formed over each nucleus and a furrow is formed around the spindle during metaphase (Cytrynbaum et al., 2005 ). 
The cortex-defective Scrambled (sced; Stevenson et al., 2001 ) and Sponge (spg; Postner et al., 1992 ) mutants have highly reduced actin caps and no metaphase furrows (Supplemental Figure S1). If cortical dynein or cortical MT depolymerases play a significant role in pulling spindle poles apart by sliding astral MTs outward relative to the actin cortex, this activity should be reduced or eliminated due to the depletion of actin caps and furrows in these mutants. Plots of pole–pole separation dynamics (Supplemental Figure S1) show that the rate of separation during prometaphase is slower in these mutants, but during anaphase B there is no significant difference. Using fluorescence speckle microscopy, we examined the rate of movement of tubulin speckles away from the equator during metaphase and anaphase. This could reflect the sliding apart of MTs either by outward ipMT sliding by KLP61F or, during anaphase B, by outward pulling by cortical forces. In wild-type spindles, speckles move at a rate of 0.05 ± 0.02 µm/s, in spg mutants they move at 0.05 ± 0.02 µm/s, and in sced mutants at 0.06 ± 0.02 µm/s (Supplemental Figure S1), indicating that outward pulling by cortical forces is not significant throughout metaphase and anaphase. Together these results are consistent with the hypothesis that cortical forces augment ipMT sliding to contribute to spindle elongation during the prometaphase-to-metaphase transition, for example, via cortical dynein activity (Sharp et al., 2000a ). However, cortical pulling forces apparently do not play a significant role after metaphase onset, whereupon KLP61F becomes the major force generator for the outward sliding of ipMTs that underlies poleward flux and anaphase spindle elongation. 
DISCUSSION This study exploits our ability to create and directly visualize the KLP61F concentration gradient produced by the microinjection of an antibody to the motor's spindle targeting domain in wild-type and Ncd null mutant Drosophila syncytial blastoderm embryos in order to examine the spectrum of functions of kinesin-5. The work suggests that MT–MT crosslinking and sliding by KLP61F contributes to 1) the maintenance of prometaphase spindles as chromosomes are captured; 2) poleward flux in ipMTs and kMTs; 3) the bundling and organization of MTs throughout the spindle; 4) the control of pre-anaphase B spindle length; 5) tight congression, kinetochore–kinetochore spacing and the rate of anaphase A chromatid-to-pole motility; and 6) the elongation of the anaphase B spindle in a manner that is relatively insensitive to the KLP61F concentration (Figure 6). The hypothesis that KLP61F drives MT sliding from NEB through anaphase B to push apart spindle poles was also supported by gene dosage experiments in embryos and real-time fluorescence microscopy of transgenic late embryonic/larval klp61f mutants expressing GFP-tubulin. The latter studies also revealed monoastral bipolar spindles that were never seen in syncytial embryos, suggesting that chromosome-directed spindle assembly becomes more evident as embryonic and larval development proceeds and mitosis slows down. Model for the multiple roles of KLP61F in Drosophila syncytial embryo mitosis. We propose that kinesin-5 motors function as ensembles of dynamic, transient, MT crosslinkers throughout the spindle (Sharp et al., 1999a ; Cheerambathur et al., 2008 ), organizing parallel MTs into bundles (inset i) and sliding apart antiparallel MTs (inset ii), in accordance with the biochemical properties of KLP61F (e.g., Cole et al., 1994 ; Kashina et al., 1996a ; Tao et al., 2006 ; Van den Wildenberg et al., 2008 ). 
The results suggest that KLP61F-driven MT–MT sliding is required to oppose an inward force (ii, brown arrows) and maintain the spindle during early prometaphase, as well as to elongate it to the metaphase steady-state length. In pre-anaphase B (i.e., metaphase and anaphase A) spindles, KLP61F-driven ipMT outward sliding (blue solid arrows) is required for poleward flux in both kMT and ipMT bundles and for anaphase A chromosome movement. The kMTs may be slid poleward (dashed blue arrows) via crosslinks to the actively sliding antiparallel ipMTs. Finally, KLP61F drives spindle elongation during anaphase B. This model is proposed only for Drosophila embryos (see Discussion).
Experimental Strategy
Several methods are currently available for inhibiting target proteins in cells, including mutants, injection of dominant negative protein fragments, chemical inhibitors, RNA interference, and inhibitory antibody microinjection. In the current study, microinjected antibody to the spindle-targeting phospho-tail domain on kinesin-5 (Sawin and Mitchison, 1995; Sharp et al., 1999a; Cheerambathur et al., 2008) rapidly dissociated KLP61F from mitotic spindles. By utilizing stable transgenic embryos that express GFP-tagged KLP61F, we were able to directly visualize the extent of dissociation of the target antigen from spindles in different regions of the syncytium relative to the injection site. This in turn allowed us to correlate the extent of KLP61F depletion with the mitotic perturbation observed, which permitted an exploration of the range of mitotic functions of KLP61F, in addition to the unseparated/collapsed spindle pole phenotype seen after severe inhibition or loss of function (Heck et al., 1993; Sharp et al., 1999b).
Our results with respect to spindle pole spacing were confirmed by the observation of similar defects in klp61f mutants, including spindle collapse into monoasters after severe loss of KLP61F function and the observation that spindles depleted of ~30–50% of KLP61F by antibody injection, like those in embryos carrying half the normal dose of the corresponding gene, are shorter than normal at metaphase, but complete anaphase successfully (Figures 1 and 3).
Relation to Previous Experimental Studies of Kinesin-5 in Drosophila Syncytial Embryos
The results support previous studies suggesting that KLP61F is dispensable for the assembly of astral syncytial embryo spindles but is essential for prometaphase spindle maintenance, where it is antagonized by Ncd (Sharp et al., 1999b; Sharp et al., 2000a; Tao et al., 2006), and they further implicate the action of an additional, unknown, inward motor, possibly the MT depolymerase, KLP10A. Previously, we had predicted that KLP61F has a role in driving poleward flux and that the rate of anaphase B is insensitive to changes in the concentration of KLP61F (Brust-Mascher et al., 2004; Wollman et al., 2008). Here, we confirmed these predictions experimentally and also obtained evidence for new roles in controlling the length of the metaphase spindle, the tightness of chromosome congression, the distance between sister kinetochores, the rate of anaphase A chromatid-to-pole motility, and the relative timing of anaphase A and B, none of which have been reported previously. For example, in spindles lacking Ncd function, spindle collapse driven by the inward force-generators can be circumvented and the results, specifically the failure or delay of spindle elongation, then reveal a role for KLP61F-driven sliding in anaphase B. In the spindles that fail to elongate, we propose that there is enough KLP61F to prevent spindle collapse but not enough to drive anaphase B (Figure 2).
An outward, KLP61F-driven ipMT sliding, coupled to MT depolymerization at the poles, could also underlie KLP61F's contribution to poleward flux as predicted, but not previously tested (Brust-Mascher and Scholey, 2002 ). Perhaps surprisingly, after partial KLP61F inhibition, flux is slowed down in all MT bundles, suggesting that all bundles are interconnected. We propose that the sliding apart of ipMTs by KLP61F would also push kMTs poleward, if the kMTs are mechanically coupled to the sliding ipMTs by KLP61F (or other) MT–MT cross-linkers (Figure 6; Sharp et al., 1999a ; Cheerambathur et al., 2008 ). Depolymerization of ipMTs and kMTs at the poles would then create KLP61F-dependent poleward flux. The slowing down of anaphase A chromatid-to-pole motility resulting from KLP61F inhibition could be explained by the loss of flux because it has been shown that in Drosophila embryos, anaphase A is driven by a combined flux-pacman mechanism (Rogers et al., 2004 ). Surprisingly, we also observed a loss of tension between sister kinetochores and a loss of congression. This may also be due to the loss of flux; i.e., normally as MTs slide outward, they generate a pulling force on the kinetochores. This force maintains the tension between sister kinetochores so that the loss of this outward sliding leads to a decrease in the distance between sisters. Relation to Experimental Studies of Kinesin-5 Function in Other Systems A comprehensive review of the diverse roles of kinesin-5 in different systems is beyond the scope of this article (see Valentine et al., 2006a ; Civelekoglu-Scholey and Scholey, 2007 ), but a brief comparison between Drosophila syncytial embryos and other systems is appropriate. 
Live imaging of klp61f mutant late embryos and larvae (Figure 3) revealed that bipolar spindles collapse into monoasters, as observed in syncytial embryos, and also revealed the formation of functional bipolar, monoastral spindles, as observed previously in studies of fixed larval brains and testis (Wilson et al., 1997 ) but not in anti-KLP61F–injected embryos. It is plausible that this reflects the formation of bipolar monoastral spindles by chromosome-directed, KLP61F-mediated acentrosomal pole focusing as proposed for Drosophila S2 cells, based on RNA interference (RNAi) experiments (Goshima and Vale, 2003 ). We hypothesize that the syncytial embryo spindles that we study are adapted for rapid mitoses (Civelekoglu-Scholey et al., 2006 ) and consequently their assembly and function is dominated by the centrosome-directed pathway so that, unlike in larvae, S2 cells and many other cells, the chromosome-directed spindle assembly pathway plays a lesser role (Heald et al., 1997 ; Megraw et al., 2001 ). The relative importance of centrosome- versus chromosome-directed spindle organization may be a key factor influencing how kinesin-5 is deployed in different spindles, e.g., in spindle pole focusing versus spindle pole separation, though it is obviously not the only factor. Our results showing that KLP61F appears to contribute to the length of pre-anaphase B spindles in living embryos differ from the results obtained in fixed S2 cells where, above the critical concentration required for bipolar spindle assembly, increasing concentrations of KLP61F did not influence metaphase spindle length (Goshima et al., 2005b ). Although we cannot rule out an influence of technical differences between the two studies, we suspect that they reflect genuine system-specific differences, akin to the different consequences of loss of Ncd function in embryos versus cultured cells of Drosophila (Sharp et al., 2000a ; Goshima et al., 2005a ; Morales-Mulia and Scholey, 2005 ). 
The Drosophila embryo spindle differs from that of S. cerevisiae in using only one, rather than two kinesin-5 motors. In both these systems, kinesin-5 function is required to prevent kinesin-14–induced prometaphase spindle collapse, but only in yeast is it essential for de novo spindle assembly (Saunders and Hoyt, 1992 ). As in Drosophila embryos, the kinesin-5 motors appear to contribute to metaphase spindle length (Saunders et al., 1997 ) and to anaphase spindle elongation (Saunders et al., 1995 ; Straight et al., 1998 ). In some respects, therefore, the manner in which kinesin-5 is deployed in Drosophila embryos resembles yeast more closely than S2 cells, again possibly reflecting the relative dominance of centrosomes and spindle pole bodies. In other respects the two systems may differ significantly; for instance, although kinesin-5 contributes to congression in Drosophila embryos and yeast, the proposed underlying mechanisms are very different (Figure 6 and Gardner et al., 2008 ). The kinesin-5 inhibitor, monastrol, unfortunately does not interfere with purified kinesin-5 motility or mitosis in Drosophila (our unpublished observations), but it has been used extensively to probe the diverse functions of vertebrate kinesin-5. This work, together with complementary approaches, has revealed multiple roles for kinesin-5 in early spindle assembly, in spindle maintenance (in Xenopus but not cultured cells), in the sorting out of antiparallel chromosome nucleated MTs, and in the proper biorientation of chromosomes (Gaglio et al., 1996 ; Walczak et al., 1998 ; Kapoor et al., 2000 ; Uteng et al., 2008 ). Roles for kinesin-5 in driving poleward flux and controlling metaphase spindle length were proposed previously for Xenopus extract spindles (Miyamoto et al., 2004 ; Shirasu-Hiza et al., 2004 ) as we report here for Drosophila embryo spindles. 
In vertebrate cultured cells, in contrast, the inhibition of Eg5 function led to only a 25% decrease in the rate of poleward flux while spindle length remained normal, suggesting only minor contributions from kinesin-5 (Cameron et al., 2006 ). The latter authors also reported a small decrease in kinetochore–kinetochore spacing after Eg5 inhibition, supporting the notion that kinesin-5 may contribute to the exertion of poleward tension on kinetochores in some systems, including the Drosophila embryo (Figure 5). Thus kinesin-5 motors have diverse, often essential mitotic functions in many systems including the Drosophila embryo. However, in Dictyostelium amoebae and C. elegans embryos, kinesin-5 motors play relatively minor mitotic roles, and they appear to be dispensable for mitosis (Saunders et al., 2007 ; Tikhonenko et al., 2008 ). Thus, mitotic motors can be deployed to carry out very different functions in different types of spindles, presenting a challenge for the formulation of general, integrated models of kinesin-5 function. Relation to Studies of the Mechanism of Kinesin-5 Action The multiple mitotic functions of KLP61F in Drosophila syncytial embryos can be explained by a model in which KLP61F cross-links adjacent MTs throughout the spindle (Figure 6), in accordance with the biochemical properties of the purified native embryonic or recombinant baculovirus-expressed motor (Cole et al., 1994 ; Kashina et al., 1996a ; Kashina et al., 1996b ; Tao et al., 2006 ; Van den Wildenberg et al., 2008 ), its immunolocalization (Sharp et al., 1999a ), and its localization and dynamics in living cells (Cheerambathur et al., 2008 ). By cross-linking parallel MTs, KLP61F would “zip” together adjacent MTs to bundle and organize spindle MTs but would generate no force on them. 
By cross-linking and moving toward the plus ends of antiparallel ipMTs, however, it would persistently slide them apart to drive poleward flux when ipMT sliding is balanced by ipMT depolymerization at spindle poles, or to exert outward forces on spindle poles when ipMT depolymerization is turned off (McIntosh et al., 1969; Sharp et al., 1999a; Van den Wildenberg et al., 2008). In the current study, this KLP61F-driven, outward ipMT sliding was directly visualized by monitoring fluorescent tubulin speckle behavior (Brust-Mascher and Scholey, 2002; Brust-Mascher et al., 2004), and it appeared to be antagonized by the inward sliding of ipMTs during prometaphase. In support of this, the decrease in pole–pole separation after KLP61F inhibition occurs without spindle pole disorganization, as documented in embryos expressing GFP-CNN, and is accompanied by an increase in microtubule density, as revealed by an increase in fluorescent tubulin intensity at the central spindle, consistent with the proposed sliding filament mechanism (Figure 2). The results obtained with sponge and scrambled mutants, which have severely disrupted cortices, suggest that cortical organization, and by inference, the pulling activity of cortical force generators, do not play a significant role in spindle pole separation after NEB. This is consistent with the hypothesis that KLP61F-driven outward sliding of ipMTs, augmented at specific stages by other ipMT-bound proteins such as KLP3A (Kwon et al., 2004), plays a central role. We do not want to extrapolate our sliding filament model for kinesin-5 function in Drosophila syncytial embryos (Figure 6) to other systems where the molecular mechanism of action of kinesin-5 may be different (e.g., Kapoor and Mitchison, 2001; Gardner et al., 2008).
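The kinematic consequence of this sliding filament picture can be illustrated with a minimal Python sketch. It assumes, as a simplification that the text does not state as an equation, that each half-spindle lengthens at the ipMT sliding rate minus the rate of depolymerization at the pole, so that the pole–pole distance changes at twice that difference. The function names and the numerical rates are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def pole_separation_rate(v_sliding, v_depoly):
    """Rate of change of pole-pole distance (um/s).

    Assumption: each half-spindle grows at (v_sliding - v_depoly),
    so the two poles separate at twice that difference.
    """
    return 2.0 * (v_sliding - v_depoly)


def spindle_length_after(initial_length, v_sliding, v_depoly, duration):
    """Spindle length (um) after `duration` seconds at constant rates."""
    return initial_length + pole_separation_rate(v_sliding, v_depoly) * duration


if __name__ == "__main__":
    v = 0.05  # hypothetical sliding rate in um/s, for illustration only

    # Sliding balanced by depolymerization at the poles:
    # poleward flux occurs, but the spindle length stays constant.
    print(pole_separation_rate(v, v))

    # Depolymerization switched off: the same sliding now elongates the spindle.
    print(spindle_length_after(10.0, v, 0.0, 40.0))
```

In this picture the same motor activity produces either flux at constant length or spindle elongation, depending only on whether depolymerization at the poles is active.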
The slow, plus-end–directed motility of kinesin-5 was first observed using recombinant motor domain subfragments of Eg5 (Sawin et al., 1992 ), and subsequently, in elegant, pioneering work, full-length recombinant Eg5 was shown to cross-link and slide adjacent MTs (Kapitein et al., 2005 ). The dynamic behavior of Eg5 in spindles is also consistent with it driving MT–MT sliding (Uteng et al., 2008 ), but to our knowledge, the ultrastructure of Eg5 and the properties of the purified native protein have not been studied. The yeast kinesin-5, Kip1, has a bipolar ultrastructure, like the Drosophila embryo protein, but its motility properties have not been described (Gordon and Roof, 1999 ). Elsewhere, many kinesin-5 motors have not been purified and studied biochemically, so it is possible that some kinesin-5 motors may act differently, e.g., serving as MT depolymerases (Gardner et al., 2008 ) or sliding MTs relative to a spindle matrix (Kapoor and Mitchison, 2001 ). Purified kinesin-5 motors have been shown to slide MTs over inert surfaces but not to depolymerize MTs (Sawin et al., 1992 ; Tao et al., 2006 ). Further biochemical studies on the mechanism of action of purified kinesin-5 from multiple systems would obviously be fruitful. Relation to Quantitative Force-Balance Models for Mitosis System level and reductionist models have been used to describe spindle pole separation after NEB in Drosophila (Brust-Mascher et al., 2004 ; Goshima et al., 2005b ; Wollman et al., 2008 ; Civelekoglu-Scholey, Tao, Wollman, Brust-Mascher, and Scholey, unpublished data). In Wollman et al. (2008) a system level search was undertaken to find the integration of force generator activity that best explains spindle pole separation versus time in embryos, using experimental data to constrain the number of plausible models. 
Although ~1000 models emerged from the search, these models fall into only six groups, which fit the experimental data, including most of the data on spindle pole separation presented here, remarkably well. Some discrepancies remain, e.g., the timing of activation of Ncd, but this is perhaps not surprising, given the inherent assumptions in such a broad theoretical approach. Reductionist models were developed for Drosophila S2 cells (Goshima et al., 2005b ) and for embryos (Brust-Mascher et al., 2004 ; Civelekoglu-Scholey, G., Tao, L., Wollman, R., Brust-Mascher, I., and Scholey, J.M., unpublished data). In the former model, which complemented novel high throughput microscopy and RNAi experiments, coupling of outward ipMT sliding to MT depolymerization at the poles was introduced to account for the lack of effect of KLP61F concentration on spindle length. We now understand that this model contains a physically unrealistic assumption in which an inward motion of the poles through increased depolymerization, coupled to decreased sliding, occurs in the absence of an associated force to drive this movement. Of course, this imperfection would remain undetected if the same mechanism had been described only qualitatively, but it shows that this model requires further refinement to explain the KLP61F concentration versus spindle length data. The latter models were based closely on experimental data, including the dynamics, physical properties, and geometry of embryo spindles as well as the biochemical properties of MTs and mitotic motors. The Anaphase B (Brust-Mascher et al., 2004 ) model made several useful predictions, for example, that although KLP61F acts on dynamically unstable ipMTs to drive steady spindle elongation, the actual rate of anaphase B is insensitive to KLP61F concentration, in agreement with the experimental data presented here. 
It also predicts a spatial gradient of MT catastrophe events at anaphase B onset, which would be difficult to formulate using purely qualitative arguments and which is now being tested (Cheerambathur et al., 2007 ). In the prometaphase model, the proposed KLP61F and Ncd force balance was subjected to a focused reductionist analysis of individual spindles and purified proteins (Civelekoglu-Scholey, Tao, Brust-Mascher, Wollman, and Scholey, unpublished data). This work supports the idea that the KLP61F/Ncd force balance engages immediately after NEB as dynamic ipMT bundles start to form, whereas the subsequent elongation phase requires additional outward acting force-generators, for example, KLP3A on chromosome arms and cortical dynein (Sharp et al., 2000a ; Kwon et al., 2004 ). Although all modeling approaches are useful and merit further refinement and exploration, reductionist models that pay close attention to both experimental evidence and physical chemical principles provide the most useful and realistic descriptions of mitosis. Our own force-balance models for spindle assembly and function are based on the premise that antagonistic motors mediate the relative sliding (e.g., kinesin-5 and -14) or depolymerization (e.g., kinesin-13) of ipMTs, but other scenarios are also possible. For example, a new “slide-and-cluster” model for anastral spindle assembly proposes that kinesin-5 transports MTs that assemble around chromosomes to the spindle poles where they are focused by minus-end–directed motors (Burbank et al., 2007 ). This model apparently generates a force-balance capable of maintaining a robust steady-state spindle length, and it provides a plausible description of chromosome-directed, anastral spindle assembly in Xenopus extracts and elsewhere, possibly including Drosophila late embryos/larvae and S2 cells. 
However, in Drosophila syncytial embryo spindles, where MT assembly is nucleated predominantly at centrosomes, MTs turn over very quickly (half-time, 5 s), and it seems unlikely that kinesin-5 would have sufficient time to transport MTs to the spindle poles for focusing there (Cheerambathur et al., 2007 ). This suggestion again implies that kinesin-5 is deployed in different ways in different systems. Summary This comprehensive study has revealed the spectrum of mitotic functions of the kinesin-5 motor, KLP61F, in a single system, the Drosophila embryo. We conclude that KLP61F has multiple mitotic functions, all of which are consistent with its MT–MT cross-linking and antiparallel MT sliding activity. Distinct Roles for FOXP3+ and FOXP3- CD4+ T Cells in Regulating Cellular Immunity to Uncomplicated and Severe Plasmodium falciparum Malaria Abstract Failure to establish an appropriate balance between pro- and anti-inflammatory immune responses is believed to contribute to pathogenesis of severe malaria. To determine whether this balance is maintained by classical regulatory T cells (CD4+ FOXP3+ CD127-/low; Tregs) we compared cellular responses between Gambian children (n=124) with severe Plasmodium falciparum malaria or uncomplicated malaria infections. Although no significant differences in Treg numbers or function were observed between the groups, Treg activity during acute disease was inversely correlated with malaria-specific memory responses detectable 28 days later. Thus, while Tregs may not regulate acute malarial inflammation, they may limit memory responses to levels that subsequently facilitate parasite clearance without causing immunopathology. Importantly, we identified a population of FOXP3-, CD45RO+ CD4+ T cells which coproduce IL-10 and IFN-γ. These cells are more prevalent in children with uncomplicated malaria than in those with severe disease, suggesting that they may be the regulators of acute malarial inflammation. 
Author Summary While Tregs have been implicated in regulation of the immune response to chronic infections, their potential in determining disease outcome in acute infections is unclear. In this study we have found that Tregs are unable to control the florid inflammation during acute, severe P. falciparum malaria infections, suggesting that this component of the immunoregulatory arsenal may be rapidly overwhelmed by virulent infections. Further, we identified, for the first time in an acute human infection, a population of IL-10-producing Th-1 effector cells and found that IL-10-producing Th-1 cells were associated with development of uncomplicated as opposed to severe malaria, leading us to suggest that such “self-regulating” Th-1 cells may contribute to clearing malaria infections without inducing immune-mediated pathology. In addition, we found evidence that malaria-induced Tregs may limit the magnitude of malaria-specific memory responses detectable 28 days later, which may reduce the risk of immune-mediated pathology upon reinfection and may explain how immunity to severe disease can be gained after as little as one or two infections. We conclude that vaccines designed to induce cell-mediated responses should be assessed for their ability to induce IL-10 producing Th-1 cells and Tregs. Introduction The clinical spectrum of P. falciparum infection ranges from asymptomatic parasite carriage to a febrile disease that may develop into a severe, life-threatening illness. The factors that determine disease severity are not completely understood but are likely to include both parasite and host components [1]–[3]. Ultimately, the interplay between the parasite and the immune response likely determines the outcome of the infection [4]. 
Although sterile immunity - completely preventing re-infection - is hardly ever seen and protection against clinical symptoms of uncomplicated disease is only acquired after repeated infections [5], immunity to severe disease and death may be acquired after as few as one or two infections [6], suggesting that different immune mechanisms underlie these different levels of immunity. While there is a growing consensus that killing of malaria parasites or malaria-infected red blood cells requires the synergistic action of antibodies and cell-mediated immune responses [5],[7], the mechanisms conferring protection against severe disease are less clear. Given that pathology of severe disease has repeatedly been linked to sustained and/or excessive inflammatory responses [4], acquiring the ability to regulate these responses adequately may be a key determinant of immunity that protects against severe disease [8]. Thus, while an early inflammatory response is needed to control parasite replication in human P. falciparum malaria [9]–[11], excessive levels of pro-inflammatory cytokines such as TNF-α [12]–[15], IFN-γ [16],[17], IL-1ß and IL-6 [14],[18],[19] are associated with severe pathology. Conversely, low levels of regulatory cytokines such as TGF-ß have been associated with acute [20] and severe malaria [21],[22], a relative deficiency in IL-10 was seen in those who succumbed to severe malaria [23], significantly lower ratios of IL-10 to TNF-α were found in patients with severe malarial anaemia [24],[25], and high ratios of IFN-γ, TNF-α and IL-12 to TGF-ß or IL-10 were associated with decreased risk of malaria but increased risk of clinical disease in those who became infected [26].
In summary, therefore, immunity against severe malaria may depend upon the host's ability to regulate the magnitude and timing of the cellular immune response, allowing the sequential induction of appropriate levels of inflammatory and anti-inflammatory cytokines at key stages of the infection. Given these associations between severe disease and exacerbated immune pathology, a number of studies have explored the role of CD4+CD25hiFOXP3+CD127-/lo regulatory T cells (Tregs) in determining the outcome of malaria infection. Induced and/or activated in response to malaria infection [27], Tregs may be beneficial to the host in the later part of the infection - when parasitaemia is being cleared - by down-regulating the inflammatory response and thereby preventing immune-mediated pathology. On the other hand, if Tregs mediate their suppressive effects too early, this could hamper the responses required for initial control of parasitaemia, permitting unbridled parasite growth which may also lead to severe disease. Malaria-specific induction of Tregs has been observed in a variety of experimental malaria infections in mice [28]–[31], but their role in preventing severe malarial pathology is unclear. Thus, in BALB/c mice infected with a lethal strain of P. yoelii, ablation of Treg activity by depletion of CD25+ cells either allowed mice to control parasitaemia and survive [32] or had no impact on the course of disease [33]. Depleting CD25+ cells of BALB/c mice infected with either P. berghei NK65 [34] or P. berghei ANKA [35] reduced neither parasitaemia nor mortality, but increased the severity of symptoms in the diseased mice, suggesting at least some benefit from Tregs in this model. Rather oddly, infection of CD25+ T cell-depleted BALB/c mice with P. chabaudi adami DS led to increased parasitaemia and more severe anaemia [30].
Finally, CD25+ T cell depletion around the time of parasite inoculation reduced the incidence of experimental cerebral malaria in C57BL/6 mice infected with P. berghei ANKA in two independent studies [29],[31], but not when CD25+ T cells were depleted 30 days prior to infection [31]. Whilst various explanations have been offered for these discrepant results, including differences in the various strains of mice and parasites employed, the microbial microenvironment in which the mice are kept, and the precise CD25 depletion protocols employed, these studies are currently not very helpful when trying to understand the role of Tregs in human malaria infections. Malaria naive individuals undergoing experimental P. falciparum sporozoite infection showed an increase in FOXP3 mRNA expression and expansion of Tregs 10 days after infection; Treg induction correlated with high circulating levels of TGF-ß, low levels of pro-inflammatory cytokines and rapid parasite growth [27] suggesting - but not proving - that Treg activation early in infection may inhibit the development of effective cellular immunity. More recently, we have observed that Treg populations appear to be transiently expanded and activated during the malaria transmission season in individuals from a malaria endemic community [36], again suggesting that naturally acquired malaria infection can drive the expansion and activation of Tregs. However, although Tregs have been implicated in IL-10-mediated down-regulation of Th1-like responses in the placenta of malaria-infected women [37] and reduced Treg frequencies and function have been linked to enhanced anti-parasite immunity in certain ethnic groups in West Africa [38] the potential for Tregs to influence the clinical outcome of malaria infections is still unclear. To investigate the role of Tregs during clinical malaria infection, we have compared cellular immune responses of children with either severe or uncomplicated malaria. 
Interestingly, although we did not observe any significant differences in Treg numbers or function between severe and uncomplicated malaria cases, our data do indicate that malaria-induced Tregs may limit the magnitude of malaria-specific Th1 memory responses and thus moderate pro-inflammatory responses to subsequent infections, providing a possible explanation for the very rapid acquisition of immunity to severe malaria. Moreover, we have identified a population of FOXP3-, CD45RO+, CD4+ T cells which co-produce IL-10 and IFN-γ and which are more prevalent in children with uncomplicated malaria than in those with severe disease. We suggest that these IL-10 producing effector T cells may contribute to clearing malaria infection without inducing immune-mediated pathology. Results Immune responses of 59 Gambian children with severe P. falciparum malaria were compared with those of 65 children with uncomplicated clinical malaria and with 20 healthy (control) children of similar age and recruited from the same study area at the same time (Table 1). On admission, only 12 (9.4%) patients had a white blood cell count (WBC) above the age-specific norm and there was no significant difference in median WBC count between uncomplicated and severe cases, suggesting that few if any children had a concomitant systemic bacterial infection. No difference was observed in the differential WBC between the two groups. As expected [39]–[41], numbers of lymphocytes, CD3+ and CD4+ T cells were significantly lower during the acute disease than during convalescence in both the severe and the uncomplicated groups (Table 1A). Parasite density on admission was two-fold higher in patients with severe malaria than in those with uncomplicated malaria; severely ill children also had significantly lower hemoglobin levels and were on average 2.4 years younger than children with uncomplicated disease. The number of P.
falciparum clones per clinical isolate ranged from 1 to 4 (with an overall mean of 2 (CI95%: 1.8–2.1)), and – as has been observed previously [42] - did not differ significantly between the three groups (p=0.3, Table 1B). Other factors potentially confounding immune responses, such as the degree of malnutrition or intestinal helminth infections were of similarly low prevalence in both severe and uncomplicated malaria cases and were not associated with severity of disease (Table 1B). Characteristics of study participants (A) and distribution of potential confounding factors (B). For statistical analysis, patients were classified as uncomplicated or severe cases, with the latter being further subdivided into those patients suffering from cerebral malaria (CM), severe anaemia (SA) or severe respiratory distress (SRD) (grouped together as SA) and those suffering only from severe prostration (grouped as SB). Data were analysed using linear regression, with a random effect to allow for the within subject measurements over time, adjusting for age, sex, duration of prior symptoms and numbers of clones causing the infection. Due to the multiplicity of comparisons that were made within the model, resulting from multiple responses and multiple comparisons within response, hypotheses rejected with a probability of less than 0.012 have a false discovery rate of 5% [43]. Similar numbers and proportions of Tregs among uncomplicated and severe malaria cases, increasing during convalescence We hypothesized that children with severe malaria would have fewer circulating Tregs than children with uncomplicated malaria, or that Tregs of severely ill children would be less active than those of children with uncomplicated disease. 
However, the proportion of cells expressing a Treg phenotype (defined by flow cytometry as CD3+CD4+ lymphocytes being FOXP3+ and CD127-/low; Figure 1A, 1B, and 1C) was similar in the acute (D0) and the convalescent phase (D28) for both uncomplicated and severe cases (SA+SB) and in healthy control children; on average, 2–3% of CD4+ T cells expressed the regulatory phenotype (Figure 1D). However, when the number of cells expressing a Treg phenotype was calculated using lymphocyte and monocyte counts from the differential WBC, we found that the absolute numbers of Tregs (per litre of blood) were significantly and similarly elevated in both severe and uncomplicated malaria cases during convalescence when compared to the acute phase (p<0.001) or when compared to the control group of healthy children (p=0.037) (Figure 1E). Similar kinetics were observed for FOXP3 mRNA levels (Figure 1F). Similar number and proportion of Tregs in uncomplicated and severe disease, increasing during convalescence. Although not supportive of our original hypothesis, these observations are consistent with the notion that acute malaria infection drives expansion of Treg populations which then persist for some weeks to maintain immune homeostasis during the contraction phase of the effector response [36]. In accordance with this notion, and in agreement with our previous observation that increased levels of Tregs were associated with faster parasite growth during the early stages of blood stage infection [27], we observe here - in children with either severe or uncomplicated malaria infections - a significant positive correlation between parasite density and the frequency of Tregs within the CD4+ T cell population (p=0.002, Figure 2). Proportion of T cells expressing a Treg phenotype correlates with parasitaemia.
Tregs display an activated memory phenotype during acute disease Since fully differentiated Tregs predominantly express an activated/memory phenotype [44], T cells from children with severe and uncomplicated malaria were analysed for expression of CD45RO (Figure 3A and 3B). In both uncomplicated and severe cases, the proportion of all T cells expressing CD45RO was significantly higher (p<0.001) during acute infection than during convalescence (data not shown). Irrespective of disease severity, more than 90% of Tregs expressed CD45RO during the acute phase of infection but expression of this marker decreased significantly (to approximately 70%) during convalescence (Figure 3C). Likewise, the median fluorescence intensity (MFI) of CD45RO staining was 1.5-fold higher (p=0.0025) during acute disease than during convalescence (Figure 3D). Taken together, these data indicate that in both uncomplicated and severe cases of malaria Tregs are predominantly of a memory phenotype and are activated during acute malaria infection. Tregs display an activated memory phenotype during acute disease. Similar Treg function in uncomplicated and severe malaria patients Three different indicators were used to assess the regulatory potential of Tregs during acute malaria infection. Firstly, using a classical anti-CD25 depletion assay, we assessed the ability of Tregs to suppress P. falciparum schizont extract (PfSE)-driven lymphocyte proliferation. Anti-CD25 treatment removed approximately half (geometric mean 48.8%; CI95%: 41–58%) of the CD4+ T cells that were FOXP3+CD127-/low and this was associated with a 1.76-fold and 1.57-fold (geometric means) increase in PfSE-induced lymphoproliferation in severe and uncomplicated cases, respectively, with no significant difference between the groups (p=0.343, Figure 4A). Tregs from severe and uncomplicated cases have similar functional capacity.
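The geometric-mean fold increases quoted for the depletion assay can be illustrated with a minimal Python sketch. It assumes that each patient contributes one proliferation readout with and one without anti-CD25 treatment, and that the fold increase is their ratio; the function names and the example values are hypothetical and are not data from this study.

```python
import math


def fold_increases(depleted, undepleted):
    """Per-sample ratio of proliferation with vs. without anti-CD25 depletion."""
    if len(depleted) != len(undepleted):
        raise ValueError("paired readouts are required")
    return [d / u for d, u in zip(depleted, undepleted)]


def geometric_mean(values):
    """Geometric mean, computed as the exponential of the mean log value."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical proliferation readouts (arbitrary units) for three samples.
    with_depletion = [2000.0, 4500.0, 1200.0]
    without_depletion = [1000.0, 3000.0, 1000.0]
    folds = fold_increases(with_depletion, without_depletion)
    print(folds)
    print(geometric_mean(folds))
```

A geometric mean is a natural summary for ratios because a two-fold increase and a two-fold decrease average to no change, which an arithmetic mean of the ratios would not give.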
Next, since reduced expression of SOCS-2, a member of the suppressors of cytokine signaling family confined to Tregs [45], has been linked to impaired Treg function in Africans [38], we compared SOCS-2 mRNA levels among severe and uncomplicated cases. While SOCS-2 levels were found to be significantly reduced during acute disease compared to convalescence (p=0.0076), no difference was observed between those with severe and those with uncomplicated malaria (data not shown). Finally, since high concentrations of TNF-α have been reported to impair Treg activity (by upregulating and then signaling via TNFR2, leading to decreased FOXP3 mRNA and protein expression [46]), and the functional impairment of Tregs observed in rheumatoid arthritis patients can be reversed by anti-TNF-α antibodies [47], we considered the hypothesis that the high levels of TNF-α seen in severe malaria patients [13],[15] might upregulate TNFR2 and impair Treg function. TNFR2 expression on Tregs was assessed by flow cytometry (Figure 4B and 4C). However, although a significantly higher proportion of Tregs expressed TNFR2 (Figure 4D) - with higher MFI (data not shown, p=0.028) - during acute disease compared to convalescence, no difference was seen in TNFR2 expression on Tregs between severe and uncomplicated cases (Figure 4D). Moreover, there was no correlation between plasma levels of TNF-α and TNFR2 expression on Tregs, either among severe or uncomplicated cases or among all cases combined; nor did we observe any inverse correlation between TNF-α concentration and FOXP3 expression. Rather, the MFI of TNFR2 on Tregs was positively correlated with the MFI of FOXP3 in Tregs (r: 0.476; p<0.0001). Thus, our data seem to be more in line with data from mice suggesting that the interaction of TNF-α with TNFR2 on Tregs promotes their expansion and upregulation of FOXP3 [48] than with the data from studies of human rheumatoid arthritis.
Stronger Th-1 responses observed in severe compared to uncomplicated cases are balanced by IL-10 Since the balance of T-effector to Treg responses is likely to be as important, or more important, than the absolute levels of either [26], we compared the ratio of the levels of mRNA for the Th1 transcription factor T-BET with those for FOXP3, currently considered the best marker for Tregs, and the ratio of T-effector cells (defined as CD3+CD4+CD25+FOXP3-, T-effector) over Tregs among the various groups. As shown in Figure 5A, in all groups the T-BET/FOXP3 ratio was significantly higher during acute disease than during convalescence and a similar, albeit not significant, trend was observed for the ratio of T-effector/Tregs (data not shown). Moreover, since the absolute number of circulating T-effector cells was significantly higher in severe cases than in uncomplicated cases (p=0.01, data not shown), the T-effector/Treg ratio tended to be higher among severe cases than uncomplicated cases on day 0 (p=0.039) and a similar trend was seen for the T-BET/FOXP3 ratio (p=0.058). The ratio of FOXP3 to GATA-3 (Th2 lineage factor) mRNA was similar for both time points in all groups (data not shown) but the Th1/Th2 ratio (T-BET/GATA-3 mRNA) was significantly higher during acute disease in children with CM, SA or SRD compared to those suffering from severe prostration (p=0.0075), indicating that the expansion of the T-effector population is biased towards Th1 responses. [Figure caption: Ratios of Th1 over Treg cells and pro-inflammatory over regulatory cytokines.] These data confirm previous studies indicating a shift towards a more inflammatory response during acute and severe malaria but, significantly, our data extend the previous observations by revealing that this inflammation is not balanced by a commensurate increase in Treg function.
Indeed, our data strongly suggest that the potent inflammation induced during an acute malaria infection overwhelms the normal homeostatic capacity of the immune system and, in particular, that the Treg response in children with severe malaria is insufficient to balance a much stronger Th1 effector response. To investigate further the dynamics of pro-inflammatory/regulatory responses during clinical malaria, plasma concentrations and mRNA transcripts of inflammatory (IFN-γ, TNF-α) and regulatory cytokines (IL-10) were assayed. In accordance with previous observations, plasma concentrations of IFN-γ, TNF-α and IL-10 were all significantly higher during acute disease than during convalescence, with significantly higher levels in severely ill children compared to uncomplicated cases (Figure 5B, 5C, and 5D). Levels of mRNA transcripts for IL-10 and IFN-γ were also significantly elevated in all groups during acute disease, but there was no significant difference between severity groups (Figure S1A and S1B). For both severe and uncomplicated cases, levels of IFN-γ mRNA were highly correlated with levels of IL-10 mRNA during the acute phase (severe: r=0.833 p<0.001, uncomplicated: r=0.693 p<0.001), suggesting that IFN-γ production is being balanced by IL-10 production. Interestingly, IFN-γ mRNA levels on day 0 correlated with FOXP3 mRNA on day 0 for both severe (r: 0.39 p=0.003) and uncomplicated cases (r: 0.44 p=0.0001), suggesting that IFN-γ may also be driving FOXP3 expression. The balance of pro- and anti-inflammatory cytokine responses clearly changed with time but, somewhat surprisingly, there were no marked differences in cytokine ratios between children with differing levels of disease severity. Thus, the ratios of TNF-α or IFN-γ to IL-10 on day 0 were similar in all disease severity groups (Figure 5E and 5F) and ratios of IFN-γ to IL-10 mRNA were similar in all disease severity groups both on day 0 and day 28 (Figure S1F).
However, IFN-γ mRNA levels were on average only 3.2-fold higher on day 0 than day 28 but IL-10 mRNA levels were 29-fold higher on day 0, resulting in a significantly lower IFN-γ/IL-10 mRNA ratio on day 0 than day 28 (Figure S1F). CD25-FOXP3- CD45RO+ T cells but not Tregs are the major source of IL-10 during acute malaria IL-10 is a crucial immunoregulatory cytokine in both human [23] and murine [33],[49] malaria; we have recently identified CD4+ T effector cells as a major source of IL-10 [33], but the source of IL-10 in human malaria infection is unknown. In other protozoal infections of mice, CD4+ effector T cells that co-produce IFN-γ and IL-10 have been identified [50]–[52]. We therefore cultured freshly isolated PBMCs from 30 children with acute malaria (17 severe, 13 uncomplicated) and 20 healthy control children, with or without PMA and Ionomycin (PI), for 5 hours and analyzed them for the presence of intracellular IL-10 and IFN-γ by flow cytometry (Figure 6A). No cytokine production was observed in unstimulated cells (data not shown), and PBMC from healthy children failed to produce any IL-10 in response to PI (data not shown), indicating that stimulation with PI predominantly induces cytokine production from recently activated cells. [Figure caption: Phenotypic characterization of cytokine-producing cells.] By contrast, distinct populations of IL-10+ and IFN-γ+ cells were seen among the PI-stimulated cells from children with acute severe or uncomplicated malaria, with a small but easily distinguishable population of cells (approx 1% of all PBMC) producing both cytokines simultaneously (Figure 6A, right plot). In both severe and uncomplicated cases, IL-10 producing cells were predominantly CD45RO+ CD4+ T cells (Figure 6B and 6C) and were almost exclusively FOXP3- and CD25- (Figure 6D).
Moreover, although a transient increase of FOXP3 in activated human T-effector cells has been reported [53], in our hands less than 1% (median 0.97%, CI95%: 0.67–1.27%) of IFN-γ producing cells were FOXP3+ (Figure 6E). Overall, among children with acute malaria, approx 4% of PI-stimulated CD4+ T cells produced IL-10 and approx 8% produced IFN-γ and neither the proportions of cells producing one or the other cytokine (Figure 6F) nor the ratio of IFN-γ/IL-10 producing cells (data not shown) differed significantly between severe and uncomplicated cases. However, intriguingly, the proportion of CD4+ T cells simultaneously producing IL-10 and IFN-γ was three fold higher in uncomplicated cases than severe cases (geometric mean 5.2% vs 1.6%, p=0.041, Figure 6F). Moreover, the proportion of IFN-γ+ CD4+ T cells that also produce IL-10 was almost twice as high among uncomplicated cases as among severe cases (p=0.045, Figure 6G). Taken together, these data indicate that during acute, uncomplicated or severe, malaria infections IL-10 producing cells are overwhelmingly T effector cells and that Th1 effector cells that also produce IL-10 are more prevalent in children with uncomplicated malaria than in children with severe malaria. The frequency of Tregs during acute disease is negatively associated with the magnitude of subsequent malaria-specific IFN-γ memory responses It has been reported that Tregs present at the time of infection [54] or vaccination [55] may restrict the development of subsequent Th1 memory responses. To determine whether Tregs present during acute malaria infection might similarly affect the induction of immunological memory, we compared FOXP3 mRNA levels on day 0 with malaria specific IFN-γ memory responses (as assessed by PfSE-specific cultured ELISPOT) among PBMC collected from 34 of our convalescent malaria patients (19 severe and 15 uncomplicated) on Day 28. 
As shown in Figure 7A, cells from uncomplicated and severe cases mounted similarly strong IFN-γ memory responses following culture with PfSE. When plotted against FOXP3 mRNA levels measured on day 0, a linear by linear hyperbolic fit revealed that higher levels of FOXP3 mRNA on day 0 were highly significantly (p=0.009) associated with lower malaria-specific IFN-γ memory responses on Day 28, suggesting that Tregs induced during the acute infection may limit the magnitude of subsequent Th1 responses (Figure 7B). For neither group could a significant effect of parasitaemia on the memory response be observed (r=-0.12 p=0.962 for severe and r=0.015, p=0.957 for uncomplicated cases). [Figure caption: Tregs induced during the clinical episode under study limit the magnitude of subsequent malaria-specific IFN-γ memory responses.] Discussion We hypothesized that the balance of inflammatory to regulatory immune responses would be biased towards a more inflammatory response in children with severe malaria than in children with uncomplicated malaria, that this balance would be restored during convalescence and, crucially, that this would be associated with differences in the proportion, absolute number or function of circulating classical (CD4+ CD127-/lo FOXP3+) regulatory T cells. In partial support of these hypotheses, the number of cells expressing a Treg phenotype and FOXP3-mRNA levels were both significantly higher during convalescence than during the acute clinical episode and the ratio of the Th1 transcription factor T-BET to the Treg transcription factor FOXP3 was significantly higher during acute disease than during convalescence in both severe and uncomplicated cases, compatible with the notion that Tregs fail to sufficiently regulate pro-inflammatory responses which might contribute to the onset of symptomatic malaria infection.
Given our previous observation of Treg expansion during the pre-patent phase of malaria infection [27], we suggest that Tregs are induced/activated shortly after parasite emergence from the liver, that their numbers in peripheral blood then decline as a result of sequestration of CD4+ T cells during acute disease [40],[41],[56] and then, as has been described for other T cell subsets [39],[57], Tregs regain access to the circulation after malaria is cured. The significant positive correlation of Treg numbers with parasitaemia, as well as the correlation between FOXP3 mRNA and IFN-γ mRNA levels in acute samples, further supports the notion that the initial infection induces a proportional increase in Tregs, attempting to balance the effector T cell response, and is in line with the recently proposed concept that antigenic challenge will give rise to an antigen specific Treg response, proportional in size to the inflammatory response [58]. Moreover, the Tregs circulating during acute malaria infections almost exclusively expressed an activated memory phenotype suggesting that they have expanded from a pre-existing pool of memory T-cells. This interpretation would be in line with recent elegant work in humans demonstrating that Tregs are derived by rapid turnover of memory populations in vivo [59], and with data from murine studies where, after CD25-depletion, malaria infection very rapidly drives differentiation of Tregs from circulating mature CD4+ T cells [60]. Obviously, it would be of interest to study the relationship between Tregs and effector T cell kinetics and parasite biomass, which is not readily measurable. Future studies may explore the usefulness of P. falciparum Histidine Rich Protein 2 in this context, which has recently been suggested as a surrogate marker for parasite biomass [61]. 
However, despite clear evidence of Treg induction and reallocation during acute malaria infection, we could not find any robust differences in Treg parameters between children with severe and uncomplicated disease. Thus, neither Treg numbers nor FOXP3 mRNA levels differed significantly between children with uncomplicated malaria and those with severe malaria, and three different indicators of Treg function - their capacity to suppress lymphoproliferation, their expression of SOCS-2 [45] and TNFR2 [46],[48] were all similar in severely ill children and children with uncomplicated disease. Furthermore, the similar distortion in the T-BET/FOXP3 mRNA ratio during acute disease and the lack of any marked differences between the two groups in ratios of inflammatory to anti-inflammatory cytokines, as well as the close correlation between IFN-γ and IL-10 in both groups which is in line with previous observations in experimental human malaria infections [62], suggests that the systemic shift towards a pro-inflammatory immune response is similar in children with either severe or uncomplicated disease. At first glance, these data do not appear to support the hypothesis that deficiencies in Treg function underlie the tendency of some children to develop severe, life threatening malaria. However, we did observe significantly higher Th1 effector responses (more T-effector cells, higher concentrations of IFN-γ and TNF-α) in severely ill children than in children with uncomplicated disease, suggesting that the classical FOXP3+ Treg response that develops during acute malaria infection may be insufficient to balance the florid effector T cell response that develops particularly in children with severe disease. This would be in line with evidence showing that as the strength of the inflammatory stimulus increases, the suppressive capacity of human Tregs declines and the resistance of T-effector cells to regulation increases [63]. 
The situation observed during an acute, clinical malaria infection is thus in clear contrast to the situation in healthy, malaria-exposed individuals where Treg numbers closely track numbers of T-effectors, precisely maintaining an apparently optimal T-effector:Treg ratio [36]. IL-10 is well-established as a vital homeostatic regulator of malaria-induced inflammation that prevents immune-pathology in mice [49],[64], promotes the necessary switch from early Th1 to subsequent Th2 responses [65],[66], and has been linked to protection from severe malaria anaemia [24],[25], and death [23] in humans. However, the cellular source of IL-10 in human malaria cases was, until now, ill defined. Contrary to our expectations, but in striking agreement with observations in P. yoelii-infected mice [33], CD45RO+ CD4+ T cells (that are CD25- and FOXP3-) and not classical Tregs are the only substantial source of IL-10 during acute malaria infection. This observation is reminiscent of that made by Nylen [67] in patients with acute visceral leishmaniasis. Moreover, in our patients, a significant proportion of IL-10 producing CD4+ T cells were simultaneously producing IFN-γ, identifying them as Th1 cells. Although IL-10 secreting Th1 cells have been described recently in two murine models, of toxoplasmosis [50] and cutaneous leishmaniasis [51], as far as we are aware, this is the first demonstration of IL-10 producing Th1 cells during human infections. Intriguingly, the proportion of these cells within the total CD4+ T cell population was significantly higher in children with uncomplicated malaria than in children with severe malaria, suggesting that in human P. falciparum infection, as in murine T. gondii infections [50], IL-10 producing Th1 cells, activated by a strong inflammatory stimulus, may act as anti-parasitic effector cells with a “built-in” control mechanism to prevent the onset of immune pathology.
If so, then the ability of these self-regulating effector cells to localize to sites of parasite sequestration in tissues, where they mediate parasite killing whilst simultaneously blocking tissue damage, may be key to clinical immunity to malaria. Thus, our data strongly suggest that the percentage of IL-10-producing Th1 effector cells, rather than the cocktail of circulating cytokines, may be the most relevant biomarker of effective immunity to severe malaria. Although Tregs may not seem to determine the outcome of current P. falciparum infections, we did find evidence that they affect the magnitude of the malaria-specific memory response induced by the current infection. A similar observation has been made in P. berghei ANKA-infected mice; animals that were depleted of CD25+ cells prior to infection and drug-cured on day 5 developed significantly stronger IFN-γ memory responses on day 14 than did intact infected/cured mice, and these mice also developed much more severe, and frequently fatal, clinical symptoms upon reinfection, despite more efficient parasite clearance [35]. Thus, malaria-specific Tregs acquired during a primary infection may limit the magnitude of Th1 effector responses to subsequent infections to a level that allows parasite clearance without causing immunopathology. Future studies should be designed to test the hypothesis that Tregs may contribute to the very rapid development of resistance to severe malaria. In summary, our data indicate that classical FOXP3+ Tregs are unable to control the florid inflammation that accompanies acute malaria infections and this component of the immunoregulatory arsenal is rapidly overwhelmed in children with either mild or severe malaria.
Importantly however we have identified, for the first time in an acute human infection, a population of IL-10 producing Th1 effector cells which appear to be a major source of this key anti-inflammatory cytokine during acute malaria infection, and which are associated with development of uncomplicated as opposed to severe malaria. We propose that IL-10-producing Th1 cells may be the essential regulators of acute infection-induced inflammation and that such “self-regulating” Th1 cells may be essential for the infection to be cleared without inducing immune-mediated pathology. Moreover, we have found evidence in support of the hypothesis that Tregs limit the magnitude of the Th1 memory response raising the intriguing possibility that they may play an important role in the rapid evolution of clinical immunity to severe malaria. Materials and Methods Subject recruitment, study design, and study procedures A case-control study was conducted in Gambian children with severe or uncomplicated malaria, resident in a peri-urban area within a 40 km radius south of the capital, Banjul, with low levels of malaria transmission [68],[69]. Patients were enrolled at Brikama Health Centre, the MRC Fajara Gate Clinic or the Jammeh Foundation for Peace Hospital in Serekunda between September 2007 and January 2008, after written informed consent was obtained from the parents or guardians. Uncomplicated disease was defined as an episode of fever (temperature >37.5°C) within the last 48 hours with more than 5000 parasites/µl detected by slide microscopy. Severe disease was defined using modified WHO criteria [70]: SA, defined as Hb<6 g/dl; SRD defined as serum lactate >7 mmol/L; CM defined as a Blantyre coma score ≤2 in the absence of hypoglycaemia, with the coma lasting at least for 2 hours. 
To avoid the confounding effects of other pathogens in children with concomitant systemic bacterial infections [71], children with clinical and/or laboratory evidence of infections other than malaria were not enrolled into the study. For some experiments, healthy children of the same age and recruited from the same area at the same time of the year were enrolled as controls. In total, 59 severe, 65 uncomplicated and 20 control cases were enrolled. On admission (D0) and after 4 weeks (D28±3 days), 1 ml of blood was collected in RNA stabilizing agent (PAXgene™ Blood RNA system, Pre-AnalytiX) and a maximum of 4 ml of blood (mean: 3.2 ml; CI 95%: 3.1–3.3 ml) was collected into heparinized vacutainers® (BD). All patients received standard care according to the Gambian Government Treatment Guidelines, provided by the health centre staff. The children's health was reviewed 7 days after admission. The study was reviewed and approved by the Joint Gambian Government/MRC Ethics Committee and the Ethics Committee of the London School of Hygiene & Tropical Medicine (London, UK). P. falciparum parasites were identified by slide microscopy of 50 high power fields of a thick film. Full differential blood counts were obtained on days 0 and 28 using a Medonic™ instrument (Clinical Diagnostics Solutions, Inc); the presence of intestinal helminths was assessed by microscopy from stool samples collected into BioSepar ParasiTrap® diagnosis system, following the manufacturers' instructions. Sickle cell status was determined by metabisulfite test and confirmed on cellulose acetate electrophoresis [72]. Cell preparation Blood samples were processed within 2 hours of collection. Plasma was removed, stored at -80°C and replaced by an equal volume of RPMI 1640 (Sigma-Aldrich). PBMC were isolated after density centrifugation over a 1.077 Nycoprep (Nycomed, Sweden) gradient (800 g, 30 min) and washed twice in RPMI 1640.
Cells were either stained for flow cytometry directly ex vivo, or cultured in RPMI 1640 containing 10% human AB+ serum, 100 µg/ml streptomycin, 100 U/ml penicillin (all Sigma-Aldrich), and 2 mM L-glutamine (Invitrogen Life Technologies), referred to as complete growth medium (GM). Flow cytometry Fresh PBMC were stained using the following fluorochrome-labeled mouse or rat anti-human antibodies: FITC anti-TNF-receptor II (R&D), PE anti-FOXP3 (clone PCH101), Pacific Blue anti-CD3, APC-Alexa Fluor 750 anti-CD127 (all eBioscience), APC anti-CD25, PerCP anti-CD4 (BD systems), ECD anti-CD45RO (Beckman-Coulter), and appropriate isotype controls. IL-10 and IFN-γ production by PBMC from 30 children with acute P. falciparum (D0) was assessed after 5 hours stimulation in GM containing PMA (50 ng/ml) and Ionomycin (1000 ng/ml) or GM alone. Cells were stained with FITC anti-IFN-γ, PE anti-FOXP3, PE-Cy7 anti-CD25, APC-AF750 anti-CD8, Pacific Blue anti-CD3 (all eBioscience), PerCP anti-CD4, APC anti-IL-10 (both BD), and ECD anti-CD45RO (Beckman-Coulter). To ascertain specificity of the intracellular cytokine staining (ICS), aliquots of some samples were incubated with saturating amounts of purified non-labelled antibody of the same clone prior to staining with the fluorochrome-labeled ICS antibody. The FOXP3 staining buffer set (eBioscience) was used following the manufacturer's protocol. Samples were acquired on a 3 laser/9 channel CyAn™ ADP flow cytometer using Summit 4.3 software (Dako). Analysis was performed using FlowJo (Tree Star Inc.). All flow cytometric analysis was performed at the MRC laboratories, The Gambia, on freshly isolated cells. Multiplex analysis of plasma cytokine concentration Plasma concentrations of IFN-γ, TNF-α and IL-10 were determined for each subject and time point on the Bio-Plex® 200 system, using X-Plex™ assays (both Bio-Rad Laboratories), according to the manufacturer's instructions. Data were analysed using the Bio-Plex® Manager software.
The detection limit was defined as the concentration corresponding to a fluorescence value above the mean background fluorescence in control wells plus 3 SD, being 8.76 pg/ml for IFN-γ, 5 pg/ml for TNF-α and 0.57 pg/ml for IL-10. Values below this threshold were set to these levels. Plasmodium falciparum culture and schizont antigens P. falciparum parasites (3D7 strain) were cultured in vitro as described [73] and were routinely shown to be mycoplasma free by PCR (Bio Whittaker). Schizont-infected erythrocytes were harvested from synchronized cultures by centrifugation through a Percoll gradient (Sigma-Aldrich). PfSE was prepared by two rapid freeze-thaw cycles in liquid nitrogen and a 37°C water bath. Extracts of uninfected erythrocytes (uRBC) were prepared in the same way. Proliferation assay PBMC from 10 severe and 10 uncomplicated malaria cases collected on day 0 were depleted of CD25hi cells or mock depleted using magnetic beads (Dynal Biotech, UK), at a bead to PBMC ratio of 7:1, and cell proliferation was determined by [3H]-thymidine (Amersham, UK) incorporation after 6 days in culture with PfSE, uRBC (RBC:PBMC ratio equivalent to 2:1), GM, or 2 days culture with PMA (10 ng/ml)+ Ionomycin (100 ng/ml), as described [27]. Cultured ELISpot Cultured ELISpots were performed to assess malaria-specific IFN-γ memory responses, adapting an established method [74]. Up to 1 million PBMCs collected on day 28 were cultured in 24 well plates for 6 days in 1 ml GM and stimulated with either PfSE, uRBC (RBC:PBMC ratio equivalent to 2:1), or GM respectively. At day 3, half the medium was exchanged and rIL-2 (final concentration 20 IU/well) was added. On day 6 cells were harvested, washed three times, and 1.5×10^5 cells seeded into duplicate wells onto Millipore MAIP S45 plates and restimulated overnight with PfSE, uRBC (concentrations as above), GM or PHA-L (5 µg/ml). IFN-γ ELISpot was performed using MabTech antibodies according to the manufacturer's instructions.
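As an illustration of the detection-limit rule given above for the multiplex cytokine assay, the following is a minimal Python sketch and not the Bio-Plex Manager procedure itself. It assumes that the fluorescence cut-off is computed from control-well background using the sample standard deviation, and that any concentration below the reported limit is replaced by that limit. The function names and the example readings are hypothetical.

```python
from statistics import mean, stdev

# Detection limits reported in the text (pg/ml).
DETECTION_LIMIT_PG_ML = {"IFN-gamma": 8.76, "TNF-alpha": 5.0, "IL-10": 0.57}


def background_cutoff(background_fluorescence):
    """Fluorescence cut-off: mean background plus 3 SD (sample SD is an assumption)."""
    return mean(background_fluorescence) + 3 * stdev(background_fluorescence)


def floor_at_limit(concentrations, cytokine):
    """Set every concentration below the cytokine's detection limit to that limit."""
    limit = DETECTION_LIMIT_PG_ML[cytokine]
    return [max(value, limit) for value in concentrations]


# Hypothetical IL-10 readings (pg/ml); the first lies below the 0.57 pg/ml limit.
print(floor_at_limit([0.10, 0.57, 12.4], "IL-10"))
```

Flooring at the limit, rather than discarding low readings, keeps every sample in the paired day 0 versus day 28 comparison and keeps the values positive for log transformation.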
Spot forming cell numbers were counted using an ELISPOT plate reader (AutoImmuneDiagnostica, Vers. 3.2). Results are expressed as spot forming units (SFU) per million PBMC after subtraction of individual background values (GM for PHA-L, uRBC for PfSE). Assays were discounted if the positive control (PHA-L) was <50 SFU, or the negative control was >30 SFU. RT-PCR For quantitative reverse transcription-polymerase chain reaction (RT-PCR), total RNA was extracted from PAX tubes following the manufacturer's instructions and reverse transcribed into cDNA using TaqMan® reagents for reverse transcription (Applied Biosystems), following the manufacturer's protocol. Gene expression profiles for FOXP3, IL-10, SOCS-2 and IFN-γ were measured by RT-PCR on a DNA Engine Opticon® (MJ Research) with QuantiTect SYBR Green PCR kits (Qiagen Ltd) using primers (all Sigma Genosys) previously described: IFN-γ, IL-10; FOXP3 designed by [75], and SOCS-2 designed by [76]. T-BET and GATA-3 gene expression was determined using the TaqMan® Probe kit using the primers (all Metabion) designed by [77]. 18S rRNA, amplified using a commercially available kit (rRNA primers and VIC labeled probe, Applied Biosystems), was used as an internal control. Data were analysed using Opticon Monitor 3™ analysis software (BioRad) and are expressed as the ratio of the transcript number of the gene of interest over the endogenous control, 18S rRNA. Parasite genotyping Genomic DNA from each parasite isolate was genotyped by sequencing the highly polymorphic block 2 region of the msp1 gene to assess the number of clones infecting each patient [78]. Statistical analysis Analysis was performed using linear regression, with a random effect to allow for the within subject measurements over time, where the response variables were log transformed to improve the normality and constant variance assumptions.
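The ELISpot read-out described earlier in this Methods section (background-subtracted SFU per million PBMC, with an acceptance rule on the controls) can be sketched as follows. This is a hedged illustration only: it assumes duplicate wells are averaged before subtraction, that the 1.5×10^5 cells seeded per well is the scaling denominator, and that the <50 and >30 SFU thresholds apply on the same scale as the values passed in. None of these details is stated explicitly in the text, and the function names are hypothetical.

```python
CELLS_PER_WELL = 150_000  # 1.5 x 10^5 cells seeded per well, as stated in the text


def sfu_per_million(stimulated_wells, background_wells, cells_per_well=CELLS_PER_WELL):
    """Mean spots in stimulated wells minus mean background, scaled to 10^6 PBMC.

    Background is uRBC for PfSE wells and GM for PHA-L wells.
    Averaging duplicates before subtraction is an assumption.
    """
    net_spots = (sum(stimulated_wells) / len(stimulated_wells)
                 - sum(background_wells) / len(background_wells))
    return net_spots * 1_000_000 / cells_per_well


def assay_accepted(positive_control_sfu, negative_control_sfu):
    """Discard the assay if PHA-L gives < 50 SFU or the negative control > 30 SFU."""
    return positive_control_sfu >= 50 and negative_control_sfu <= 30


# Hypothetical duplicate wells: PfSE-stimulated versus uRBC background.
pfse_response = sfu_per_million([30, 36], [3, 3])
print(pfse_response, assay_accepted(positive_control_sfu=400, negative_control_sfu=10))
```

The sketch does not clip negative net values at zero, because the text does not say how such wells were handled.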
Significance (measured at the 5% level) tests for the effects of malaria group (uncomplicated, SA or SB), time (day 0 and day 28) and their interaction were adjusted for the possible confounding effects of age, sex, duration of prior symptoms and numbers of clones causing the infection. Where there was no significant malaria group and time interaction, p-values for the overall comparison of day 0 vs. day 28 are given. Within day 0, comparisons of severe vs. uncomplicated and the two groups of severely ill patients (SA vs SB) were adjusted for any malaria group and time interactions. To allow for the multiplicity of tests resulting from multiple responses and multiple comparisons within a response performed in the model, a false discovery rate (FDR) of 5% was assumed. Using the Benjamini and Hochberg approach [43], only tests with a p-value below 0.012 have an FDR of ≤5%. Due to the large number of tests, family-wise error rate correction methods were too conservative. Analyses were performed using Stata version 9 and Matlab version R2008a. Supporting Information Figure S1. mRNA levels for IFN-γ, IL-10 and Th-1 as well as Th-2 lineage transcription factors in severe and mild disease.

Myeloid Neoplasia: CBFß is critical for AML1-ETO and TEL-AML1 activity Abstract AML1-ETO and TEL-AML1 are chimeric proteins resulting from the t(8;21)(q22;q22) in acute myeloid leukemia, and the t(12;21)(p13;q22) in pre-B-cell leukemia, respectively. The Runt domain of AML1 in both proteins mediates DNA binding and heterodimerization with the core binding factor ß (CBFß) subunit. To determine whether CBFß is required for AML1-ETO and TEL-AML1 activity, we introduced amino acid substitutions into the Runt domain that disrupt heterodimerization with CBFß but not DNA binding.
We show that CBFß contributes to AML1-ETO's inhibition of granulocyte differentiation, is essential for its ability to enhance the clonogenic potential of primary mouse bone marrow cells, and is indispensable for its cooperativity with the activated receptor tyrosine kinase TEL-PDGFßR in generating acute myeloid leukemia in mice. Similarly, CBFß is essential for TEL-AML1's ability to promote self-renewal of B cell precursors in vitro. These studies validate the Runt domain/CBFß interaction as a therapeutic target in core binding factor leukemias. Introduction RUNX1 and CBFB are very frequent targets of mutations and gene rearrangements in leukemia.1,2 Among the most commonly identified rearrangements are the t(8;21)(q22;q22) in 12% of adult acute myeloid leukemia (AML) and the t(12;21)(p13;q22) in 25% of pediatric acute lymphocytic leukemia (ALL), both of which involve the RUNX1 gene.3 The molecular consequence of these translocations is the generation of fusion proteins, AML1-ETO and TEL-AML1, respectively.4–7 AML1-ETO contains the N-terminal 177 amino acids of the Runx1 DNA-binding subunit fused in frame with nearly all of ETO (821; encoded by RUNX1T1).4,5 Runx1 is required for normal hematopoiesis.8 The function of ETO itself is not particularly well understood. 
Its Drosophila homolog, Nervy, interacts directly with the transcription factor daughterless, and represses the activity of enhancers normally activated by the achaete-scute complex in the sensory organ precursor cell.9 Homozygous disruption of Runx1t1 in mice resulted in gastrointestinal defects, but no hematopoietic deficiencies.10 AML1-ETO has 5 conserved domains, one from Runx1 (the Runt domain) and 4 from ETO.11,12 The eTAFH (or NHR1) domain interacts with the nuclear hormone receptor corepressor (N-CoR),13 and also with the activation domain of E proteins (E2A and HEB).14 The HHR (NHR2) domain forms an α-helical tetramer that mediates oligomerization of AML1-ETO with itself and other ETO proteins and interacts with the corepressor Sin3, Gfi1, and histone deacetylases 1 and 3.12,15–20 Nervy (NHR3) is an α-helical domain that interacts with the regulatory subunit of type II cyclic AMP-dependent protein kinase. 21,22 Myeloid-Nervy-DEAF-1 (MYND or NHR4) is a zinc-chelating domain structurally homologous to the PHD and RING finger domains and mediates interactions with N-CoR, the silencing mediator of retinoid and thyroid hormone receptor (SMRT), and the DNA binding protein SON. 
16,23–26 A foreshortened splice variant of AML1-ETO lacking both the Nervy and MYND domains (AML1-ETO9a), or full-length AML1-ETO with a mutation predicted to disrupt the structure of the MYND domain and shown to impair SON binding had increased leukemogenic potential.23,27,28 The Runt domain from Runx1 mediates both DNA binding and heterodimerization with CBFß.29 Both of these properties are essential for normal Runx1 function in vivo.30 CBFß enhances the affinity of the Runt domain for DNA by 7- to 10-fold through quenching conformational exchange in several dynamic regions 31–36 and may additionally inhibit Runx1 degradation mediated by the ubiquitin proteasome pathway.37 An R174Q mutation in Runx1 found in AML of the M0 subtype and in familial platelet disorder with predisposition for AML (FPD/AML) that disrupts only DNA but not CBFß binding generated a weakly dominant negative Runx1 allele.30,38–40 Single amino acid substitutions in Runx1 that impaired CBFß but not DNA binding resulted in hypomorphic alleles that diminished but did not eliminate Runx1's in vivo activity.30 The most commonly accepted models of AML1-ETO–mediated leukemogenesis posit that it functions either as a dominant inhibitor of CBF function41–43 or as a constitutive repressor of CBF target gene expression.44 Inhibition or repression is presumably mediated by the recruitment of corepressor complexes interacting with ETO domains to genomic targets recognized by the Runt domain.11,12 However, increasing evidence in cell line models suggests that the real situation may be substantially more complex.14,45–48 For example, it was proposed that AML1-ETO may function by sequestering transcription factors (such as E-box proteins), SON, or other regulatory proteins into subnuclear compartments13,23 and that its DNA binding function may not be necessary.13 It has also been proposed that AML1-ETO functions by dysregulating the expression of E-box protein targets.14 The essential contribution of DNA 
binding by the Runt domain to overt leukemia was, however, recently and unequivocally established by demonstrating that an R174Q mutation that disrupts DNA but not CBFß binding abolished AML1-ETO's activity.30,49 On the other hand, the role of CBFß binding in AML1-ETO's leukemogenic activity has not been assessed. TEL-AML1 contains the N-terminal, non-DNA binding region of TEL (translocation-ETS-leukemia; encoded by ETV6) fused to nearly all of Runx1.7,50 The components of TEL retained in the fusion protein include the Pointed domain, which forms a head-to-tail polymeric structure, as well as a corepressor binding domain, which together mediate interaction with N-CoR, SMRT, and mSin3a.6,51–55 The predominant model of TEL-AML1 function posits that it recruits corepressor complexes to CBF target genes in a Runt domain-dependent manner, resulting in inappropriate gene regulation.52,56,57 Mechanisms invoking off-DNA activities including sequestration of transcriptional complexes and disruption of wild-type TEL function have also been proposed.58–60 However, recent work showed that the equivalent of the AML1-ETO R174Q substitution in the Runt domain, which specifically disrupted DNA binding, resulted in a loss of TEL-AML1 activity in B-cell precursors, demonstrating the centrality of this function.61 These mechanistic uncertainties underscore the importance of characterizing the specific contributions of each domain in AML1-ETO and TEL-AML1 to their in vivo activities. Herein we analyzed the importance of CBFß binding by the Runt domain for both AML1-ETO and TEL-AML1 function. 
The residues involved in DNA binding and heterodimerization are on distinct surfaces of the Runt domain, and extensive mutational analysis has been performed on both interfaces.30,34,36,62–65 Importantly, the majority of mutations in the Runt domain destabilize its structure, and negatively impact both DNA and CBFß binding.30,63,65 We used the available biophysical information to introduce and carefully characterize amino acid substitutions that selectively target CBFß binding. We incorporated these mutations into AML1-ETO and TEL-AML1 and obtained evidence that CBFß binding is critical for their respective activities. Methods Cloning, expression, and protein purification A retroviral vector expressing TEL-PDGFßR plus human CD4 (hCD4) was made by cloning a blunted EcoRI fragment from the TEL-PDGFßR cDNA into the HpaI site of the MSV2.2-IRES-hCD4 vector. The MigR1-AML1-ETO-IRES-EGFP, pMSCV-IRES-EGFP (MSCV-EGFP), and pMSCV-TEL-AML1-IRES-EGFP (MSCV-T/A) vectors were described previously.15,66 CBFß (amino acids 1-141) and the wild-type Runt domain (residues 41-190) were cloned between the BamHI and EcoRI sites and the NcoI and XhoI sites of the pHis parallel vector,67 respectively. The proteins were expressed in Rosetta(DE3) cells (Novagen, San Diego, CA) by inducing with 0.8 mM isopropyl-ß-D-thiogalactopyranoside (IPTG) at 25°C (for CBFß) or 20°C (for Runt domain) when optical density (OD)600 nm reached 0.5 to approximately 0.7, and purified on a Ni-NTA column (QIAGEN, Valencia CA). The 6 × His tag was cleaved by AcTEV (Invitrogen, Carlsbad, CA), and proteins were further purified on an S-100 column (CBFß) or SP-Sepharose column (Runt domain; GE Healthcare, Piscataway, NJ). All mutations were generated using the QuikChange XLII Site-Directed Mutagenesis kit (Stratagene, La Jolla, CA). 
Urea denaturation, electrophoretic mobility shift assays, and isothermal titration calorimetry measurements All assays were performed as previously described.65,68 Retroviral transduction Bone marrow (BM) harvested from C57BL/6J mice treated 7 days previously with 150 mg/kg 5-fluorouracil (5-FU; Sigma-Aldrich, St Louis, MO) was suspended in 1 mL transplant media per mouse hind-leg [RPMI, 20% fetal calf serum (FCS), 100 IU/mL penicillin, 100 µg/mL streptomycin (Mediatech, Herndon, VA), 10 ng/mL interleukin-3 (IL-3), 20 ng/mL IL-6, 20 ng/mL stem cell factor (SCF; R&D Systems, Minneapolis, MN)] and incubated overnight at 37°C in ultra-low attachment plates (Costar, Corning, NY). One milliliter of each virus (MigR1-EGFP, AML1-ETO, and mutants thereof, TEL-PDGFßR, or MSCV-hCD4) was spun (1000g for 90 minutes at 31°C) into a well of an ultra-low attachment 6-well plate precoated with Retronectin (100 µg/well; Takara, Madison, WI), and the viral supernatant was subsequently removed. Cells were then resuspended in fresh transplant media with cytokines at 6 × 10^6 cells/3 mL and centrifuged (1000g for 90 minutes) onto the virally preloaded plates with an additional 0.5 mL each virus. A second spinfection (150g for 10 minutes) was performed 12 hours later onto new virally precoated plates, and cells were incubated at 37°C for 3.5 hours. Hematopoietic assays and flow cytometry Granulocyte differentiation, BrdU staining, serial replating, and c-Kit+ B-cell progenitor colony forming assays were performed as described previously.15,25,61,66 Antibodies used were: allophycocyanin (APC), phycoerythrin (PE), or PerCP-Cy5.5 conjugated anti–Mac-1 (clone M1/70), APC or PerCP-Cy5.5 conjugated anti–Gr-1 (clone RB6-8C5), APC-conjugated anti-Sca1 (clone D7), PE-Cy5.5 conjugated anti-CD117 (clone 2B8), and PE-conjugated anti–human CD4 (clone L3T4; eBioscience, San Diego, CA). 
Transplantation assays Retrovirally transduced cells were washed and resuspended in phosphate-buffered saline (PBS). C57BL/6J males (5 to 6 weeks old) were lethally irradiated with a split dose of 5.5 Gy 3 to 4 hours apart and injected intravenously with 10^6 infected BM cells and 2 × 10^5 splenocytes from a C57BL/6J donor. Mice were monitored for development of leukemia through peripheral blood analysis for infected cells by fluorescence-activated cell sorting (FACS) and by observation of physical symptoms characterized by lethargy, anemia, and splenomegaly. Sick mice were euthanized, and all remaining mice were killed at 63 days. All animal procedures were approved by the Institutional Animal Care and Use Committee at Dartmouth College. Western blot analysis AML1-ETO and TEL-AML1 proteins were detected in retrovirally transduced NIH3T3 cells as described previously.15,25,61,66 Splenocyte extracts were prepared by lysing approximately 5 × 10^5 cells in 3× sodium dodecyl sulfate (SDS) sample buffer (New England Biolabs, Ipswich, MA) with 0.1 M dithiothreitol for 10 minutes on ice and boiling for 5 minutes at 95°C; the proteins were then resolved through NuPAGE Novex 4%-12% Bis-Tris mini gels (Invitrogen) and transferred onto a polyvinylidene fluoride (PVDF) membrane. Blots were probed with primary anti-ETO Ab-1 (2.5 µg/mL, 2 hours; Calbiochem, San Diego, CA) and anti-actin (1:1000, 1 hour; Sigma-Aldrich) antibodies, and secondary goat anti–rabbit IgG (H + L; 1:100000, 45 minutes; Caltag Laboratories, Carlsbad, CA) horseradish peroxidase-conjugated antibody in PBS/0.2% Tween/5% nonfat milk. Band intensities were determined using ImageQuant 5.0 software. Results Generating Runt domain mutations that independently disrupt DNA and CBFß binding We compared the contribution of DNA versus CBFß binding to AML1-ETO–mediated leukemogenesis by introducing amino acid substitutions at both interfaces. 
An R174Q substitution caused a greater than 40000-fold reduction in DNA binding by the isolated Runt domain without perturbing heterodimerization with CBFß, its structure, or its thermodynamic stability.30 A T161A mutation, on the other hand, caused a 40-fold reduction in CBFß binding while having no effect on DNA binding.65 The T161A mutation did not perturb the Runt domain's structure as assessed in 15N-1H HSQC spectra,65 although it did moderately decrease its thermodynamic stability.30 Since the T161A mutation, when introduced into the Runx1 locus in mice, did not completely eliminate Runx1's in vivo function,30 we sought to disrupt CBFß binding further by combining it with a second mutation. The majority (approximately two-thirds) of alanine substitutions in the Runt domain perturb its fold, particularly those within the ß-barrel, one surface of which comprises the CBFß interface.30,63,65 A Y113A mutation was one of only a few that decreased the affinity for CBFß (by approximately 5-fold), while causing only minimal, local effects on the Runt domain fold as assessed in 15N-1H HSQC spectra.65 We showed that the Y113A/T161A dual mutations did not further decrease the Runt domain's thermodynamic stability compared with the T161A mutation alone (Figure 1B). The dissociation constant (K2; Figure 1C) of the Y113A/T161A mutant Runt domain for DNA was unaltered (Figure 1D). Since DNA binding is highly sensitive to mutations in the Runt domain that perturb its structure,30,63 we reasoned that the Y113A/T161A mutant Runt domain's fold is intact despite its modestly decreased thermodynamic stability. Amino acid substitutions in the Runx1 Runt domain that specifically disrupt DNA and CBFß binding. (A) Structures of the Runt domain and CBFß are shown in gray and blue, respectively, and the DNA is purple.36 The R174 side chain in the Runt domain is green, and the T161 and Y113 side chains are orange. 
(B) Urea denaturation monitored by tryptophan fluorescence for the wild-type Runt domain and point mutants thereof. Plotted is the fraction of unfolded Runt domain in the presence of increasing concentrations of urea. Data from the wild-type Runt domain, R174Q, and T161A are from Matheny et al.30 (C) Diagram of the potential interactions between the Runt domain, CBFß, and DNA. (D) EMSA measuring the affinity of the Y113A/T161A Runt domain for DNA (K2). Shown is a representative example of 3 experiments. Triangles indicate decreasing concentrations of the Runt domain (2 × 10^-6 to 4 × 10^-15 M). Arrow indicates the lane in which the Runt domain concentration approximates K2. (E) Isothermal titration calorimetric measurements of CBFß binding to the wild-type, T161A, and Y113A/T161A mutant Runt domains. Wild-type Runt domain (45 µM) was titrated with 440 µM CBFß at 26°C, and 42 µM T161A or 38 µM Y113A/T161A were titrated with 396 µM CBFß at 22°C. In each panel, the top portion is the raw data, and the bottom portion is a plot of the binding corrected for dilution enthalpy. Experimental data (squares) are fit to a one-site binding model (line). The equilibrium dissociation constants (K1) are indicated in the plots. The average values from 2 independent measurements (± standard deviation [SD]) are given. We measured the affinity of the T161A and Y113A/T161A mutant Runt domains for CBFß by isothermal titration calorimetry (Figure 1E). The dissociation constant (K1 in Figure 1C) for the T161A mutant Runt domain was increased by 63-fold, and for the Y113A/T161A mutant by 430-fold (Figure 1E). Whereas binding of the wild-type Runt domain to CBFß is exothermic (release of heat upon binding), that of the mutant Runt domains is endothermic (absorption of heat upon binding). For this reason, the raw heat data in Figure 1E switch from being negative for the wild-type to positive for the T161A and Y113A/T161A mutant Runt domains. 
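The fold changes in K1 reported above can be translated into an approximate loss of binding free energy with the standard relation ΔΔG = RT ln(fold change in K1). The short Python sketch below is illustrative only and is not part of the original analysis: the function name is hypothetical, and a single temperature of 22°C is assumed for simplicity even though the wild-type titration was run at 26°C. Only the 63-fold and 430-fold values are taken from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_change, temp_celsius=22.0):
    """Return the change in binding free energy (kcal/mol) for a fold increase in Kd.

    Uses ddG = R * T * ln(fold_change); a positive value means weaker binding.
    The default temperature is an assumption made for this illustration.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(fold_change)


# Fold increases in K1 reported for the two mutant Runt domains.
for name, fold in [("T161A", 63), ("Y113A/T161A", 430)]:
    print(f"{name}: ~{ddg_from_fold_change(fold):.1f} kcal/mol")
```

Under these assumptions the T161A and Y113A/T161A substitutions correspond to a loss of roughly 2.4 and 3.6 kcal/mol of binding free energy, respectively.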
Apparently, the interactions mediated by T161 contribute strongly to the enthalpy of binding, and loss of these interactions results in a binding process that is no longer enthalpically favorable but remains entropically favorable, albeit of weaker affinity. Disruption of DNA or CBFß binding impairs AML1-ETO's ability to repress granulocyte differentiation. We introduced the R174Q, T161A, and Y113A/T161A mutations into AML1-ETO (Figure 2A) and expressed AML1-ETO and its mutated derivatives in lineage negative (CD5-, B220-, Mac-1-, Gr-1-, Ter119-) mouse BM cells (Lin- BM). The retroviruses also expressed the enhanced green fluorescent protein (EGFP) from an internal ribosomal entry site. We cultured cells following transduction in the presence of IL-3, IL-6, and SCF for 2 days, then in the same cytokines plus granulocyte colony-stimulating factor (G-CSF) for an additional 7 days to induce granulocyte differentiation, and then analyzed EGFP+ cells for Mac-1 and Gr-1 expression (Figure 2B,C). Approximately 32% of cells expressing EGFP alone were Mac-1+Gr-1+ (Figure 2C) versus 10% of those expressing AML1-ETO. Disruption of DNA binding (R174Q) or impairment of CBFß heterodimerization (Y113A/T161A) increased the percentage of Mac-1+Gr-1+ cells. A triple Y113A/T161A/R174Q mutation in AML1-ETO did not further increase the percentage of Mac-1+Gr-1+ cells compared with the individually mutated AML1-ETO proteins, indicating that the residual inhibitory effect of AML1-ETO resides in more C-terminal ETO sequences, and is not mediated through CBF binding sites. The mutated AML1-ETO proteins accumulated to similar steady state levels in transduced NIH3T3 cells (Figure 2E). AML1-ETO function is impaired by mutations that disrupt DNA or CBFß binding. (A) Diagram of AML1-ETO and location of the mutations affecting DNA (R174Q) and CBFß binding (T161A and Y113A/T161A). 
(B) Lin- BM cells infected with MigR1 retroviruses expressing AML1-ETO and mutated derivatives after 7 days of culture in the presence of IL-3, IL-6, SCF, and G-CSF. EGFP+ gated cells (not shown) were analyzed for Gr-1 and Mac-1 expression. (C) Summary of data illustrated in panel B compiled from 2 experiments, each with triplicate samples. **P ≤ .01 compared with AML1-ETO (#); ***P ≤ .001 (Dunnett test and analysis of variance [ANOVA]). (D) Serial replating analysis. Graphs represent the average number of colonies from each round of replating in the presence of IL-3, IL-6, and SCF. Week 1 represents colony numbers per 10^3 cells plated; weeks 2 and 3 are from 10^4 plated cells. Numbers are averaged from 2 experiments, each containing triplicate samples. (E) Western blot analysis probed with an antibody to the Runt domain, demonstrating expression of AML1-ETO and its mutated derivatives in MigR1-transduced NIH3T3 cells. (F) BrdU incorporation 48 hours after transduction of Lin- BM cells with AML1-ETO and its mutants. EGFP+ cells were analyzed after a 1 hour BrdU pulse for BrdU and 7-AAD incorporation. Shown is a representative of 3 experiments. (G) Average percentages of gated BrdU+ (S phase) cells from scatter plots in panel F (n = 3). Error bars indicate 95% confidence intervals. **P ≤ .01 compared with AML1-ETO (Dunnett test and ANOVA). Disruption of DNA binding or CBFß binding abrogates AML1-ETO's ability to promote self-renewal and inhibit proliferation. We assessed the importance of DNA binding and heterodimerization with CBFß for AML1-ETO's ability to confer increased self-renewal capacity to hematopoietic progenitors in vitro.69–71 Cells expressing AML1-ETO could be propagated for at least 3 weeks in methylcellulose cultures (Figure 2D) and yielded primarily immature myeloid lineage cells and a smaller percentage of differentiated macrophages15,70 (not shown). 
Impaired DNA or CBFß binding caused by the R174Q or Y113A/T161A mutations, respectively, completely abolished AML1-ETO's ability to sustain this clonogenic activity (Figure 2D). Interestingly, it appears that the threshold level above which AML1-ETO cannot tolerate disruption of CBFß binding and maintain clonogenicity lies above 60-fold, as the T161A mutant retains activity for at least 3 weeks in this assay (Figure 2D). AML1-ETO inhibits the short-term proliferation of human and mouse primary BM cells.70,71 The R174Q, T161A, and Y113A/T161A mutations alleviated this repressive effect, although the percentage of cells in S phase was still significantly lower than in MigR1 transduced cells (Figure 2F,G). CBFß binding is required for AML1-ETO's leukemogenic activity. Numerous murine models have demonstrated that full-length AML1-ETO alone is not capable of causing AML.69,72–76 Activating mutations in the platelet-derived growth factor family of receptor tyrosine kinases, including FLT3 and KIT, and in RAS are common in core binding factor leukemias,77–81 and correspondingly, AML1-ETO was shown to be capable of cooperating with activated forms of both platelet-derived growth factor receptor-b (TEL-PDGFßR) and FLT3 to generate AML in mice.82,83 Both aforementioned mouse studies reported that mutation of leucine 148 to aspartic acid (L148D) within the Runt domain of AML1-ETO, which disrupts DNA binding16 and presumably causes a severe disruption of domain stability,30,65 resulted in a loss of cooperation with either oncogene.82,83 These studies underscored the importance of the Runt domain for AML1-ETO's cooperative activity, but did not address whether disruption of DNA binding, CBFß heterodimerization, or both were responsible for the observed effect. We retrovirally transduced AML1-ETO (and derivative mutants) and TEL-PDGFßR into BM cells from 5-FU–treated mice (Figure 3A). 
The retrovirus expressing AML1-ETO produced EGFP from an internal ribosome entry site, and the TEL-PDGFßR virus produced hCD4. The percentages of doubly transduced cells (EGFP+hCD4+) were similar for AML1-ETO and various mutants thereof at 2 days after spinfection (Figure 3B). We transplanted lethally irradiated primary recipient animals with 10^6 retrovirally transduced, unsorted cells and monitored the animals for disease. The AML that develops in mice transplanted with BM cells expressing both AML1-ETO and TEL-PDGFßR is marked clinically by splenomegaly, hepatomegaly, thrombocytopenia, and anemia with an accumulation of myeloid blasts in the BM and an infiltration thereof in the spleen and liver.82 TEL-PDGFßR will independently generate a chronic myeloproliferative disorder (CMPD) marked by splenomegaly, hepatomegaly, and neutrophilic leukocytosis.84 We predicted that if mutations in AML1-ETO impaired its leukemogenic activity, the disease would more closely resemble the CMPD caused by TEL-PDGFßR alone. Mutations that disrupt DNA or CBFß binding impair AML1-ETO's leukemogenic activity. (A) Schematic of transplantation scheme. BM mononuclear cells harvested from 5-FU–treated C57BL/6 mice were co-infected with MigR1 expressing AML1-ETO (and its mutated derivatives) and TEL-PDGFßR. Internal ribosome entry site (IRES)–mediated expression of EGFP marks AML1-ETO expressing cells, while hCD4 marks TEL-PDGFßR expressing cells. Retrovirally transduced cells (10^6) were transplanted along with 2 × 10^5 normal BM cells into lethally irradiated mice. (B) FACS plots of BM cells 2 days after transduction to show that all were equivalently infected. (C) Kaplan-Meier survival curve of mice after transplantation with retroviruses expressing AML1-ETO (AE) or its mutated derivatives and TEL-PDGFßR (TP). The study end point was 63 days. The number of mice in each group is indicated. (D) Spleen weights of mice upon sacrifice. 
Significant differences from AML1-ETO plus TEL-PDGFßR recipients (#) are indicated with * (Dunnett test and ANOVA). (E) Representative plots of BM cells in transplant recipients upon sacrifice demonstrating the presence of EGFP+ and hCD4+ cells. The relatively high EGFP fluorescence in the AML1-ETO (Y113A/T161A) plus TEL-PDGFßR BM cells was observed in 19 of 24 mice. (F) Average percentage of EGFP- hCD4+ and EGFP+ hCD4+ BM cells in transplant recipients upon sacrifice. Error bars indicate 95% confidence intervals. Differences relative to AML1-ETO plus TEL-PDGFßR (#) for each group were determined using Dunnett test and ANOVA. (G) Cytospin preparations of BM cells from an AML1-ETO plus TEL-PDGFßR leukemic mouse purified based on hCD4 and EGFP expression. Mice transplanted with BM cells expressing both AML1-ETO and TEL-PDGFßR developed a completely penetrant lethal leukemia, as reported previously,82 and succumbed to their disease within 4 weeks (Figure 3C). In contrast, most control mice transplanted with cells expressing AML1-ETO alone (8/8), or TEL-PDGFßR alone (7/8) survived for 63 days, at which point they were killed for analysis. Diseased AML1-ETO plus TEL-PDGFßR transplanted mice presented with anemia, splenomegaly, and pale livers (not shown). Their BM contained blast cells characterized by a high nucleocytoplasmic ratio, heterochromatic nuclei, occasional cytoplasmic vacuoles, irregular nuclear contours, and prominent nucleoli (not shown). Blast cells were also present in peripheral blood (not shown). However the BM contained many more differentiated myeloid cells such as neutrophils with ring-shaped nuclei (characteristic of CMPD), presumably arising from the large population of EGFP-hCD4+ cells expressing TEL-PDGFßR alone (Figure 3E,F). As a result of the mixed diseases in these mice, BM differentials and histology were not useful diagnostic aids. 
We therefore isolated EGFP-hCD4+ and EGFP+hCD4+ cells from the BM of leukemic AML1-ETO plus TEL-PDGFßR mice and analyzed their morphology. The EGFP+hCD4+ population (expressing AML1-ETO plus TEL-PDGFßR) contained a markedly greater percentage of blast cells, consistent with AML, in comparison to the EGFP-hCD4+ population (Figure 3G). We were able to transplant AML1-ETO plus TEL-PDGFßR expressing cells into secondary recipients and reconstitute the disease, but only using large numbers (> 2 × 10^6) of freshly isolated splenocytes, and not with frozen cells (not shown), indicating that the percentage of leukemia initiating cells was low. Most mice transplanted with BM expressing TEL-PDGFßR plus either the non-DNA binding or non-CBFß binding AML1-ETO mutants (R174Q and Y113A/T161A, respectively) survived until the study end point of 63 days (Figure 3C). The majority of these mice had evidence of disease as manifested by enlarged spleens [15/22 for AML1-ETO (R174Q) and 22/24 for AML1-ETO (Y113A/T161A)]. Their average spleen sizes were similar to those of mice transplanted with BM expressing AML1-ETO plus TEL-PDGFßR or TEL-PDGFßR alone, and were significantly greater than those of mice transplanted with BM expressing AML1-ETO alone (Figure 3D), indicating that TEL-PDGFßR contributed significantly to the splenomegaly. A minority of the mice were anemic as evidenced by pale livers [2/22 for AML1-ETO (R174Q) and 6/24 for AML1-ETO (Y113A/T161A)], whereas anemia was more prevalent in the AML1-ETO plus TEL-PDGFßR mice (16/22). The BM of all groups contained EGFP-hCD4+ and EGFP+hCD4+ cells, but relatively few cells expressing EGFP (AML1-ETO) alone, suggesting a competitive advantage for cells that expressed TEL-PDGFßR (Figure 3E). Interestingly, most mice (19/24) transplanted with cells expressing the AML1-ETO (Y113A/T161A) mutant contained an EGFPhihCD4+ BM population, presumably resulting from a selection for cells expressing high AML1-ETO (Y113A/T161A) levels (Figure 3E). 
We speculate that the Y113A/T161A mutant has residual activity and that more highly expressing cells partially overcome the effects of the mutations and thus are positively selected. Despite having more EGFP+hCD4+ cells and higher AML1-ETO levels, the AML1-ETO (Y113A/T161A) plus TEL-PDGFßR group survived longer than the AML1-ETO plus TEL-PDGFßR group, thus AML1-ETO deficient for CBFß binding has greatly attenuated activity. We were unable to detect AML1-ETO protein in spleen samples from any of the transplanted mice using antibodies that recognize either ETO or the Runt domain from Runx1, although we could detect the protein in Kasumi-1 cells which contain the t(8;21), and actin was clearly visible in all of the samples (not shown). We suspect our inability to detect AML1-ETO was due to the relatively low percentage of AML1-ETO expressing spleen cells (Figure 3F). EGFP+hCD4+ BM cells (expressing AML1-ETO plus TEL-PDGFßR) from the AML1-ETO plus TEL-PDGFßR recipients contained significantly higher percentages of c-kit+ Sca-1+ cells than those expressing either of the mutated AML1-ETO proteins or TEL-PDGFßR alone (Figure 4A,B). The percentage of Gr1+Mac1+ cells in the EGFP+hCD4+ BM population was correspondingly lower in the AML1-ETO plus TEL-PDGFßR group than in any other, reflective of impaired granulocyte differentiation (Figure 4A,C). Interestingly, the FACS profiles of EGFP+hCD4+ BM cells from the AML1-ETO (Y113A/T161A) plus TEL-PDGFßR group were not identical to those of the AML1-ETO (R174Q) plus TEL-PDGFßR group and appeared to be intermediate between CMPD and AML. For instance, there were fewer Gr1hiMac1hi cells in the AML1-ETO (Y113A/T161A) plus TEL-PDGFßR group, suggesting that granulocyte differentiation was somewhat more impaired (Figure 4A). Flow cytometric analysis of BM from primary transplant recipients. 
(A) Whole BM of diseased mice was gated based on forward and side scatter characteristics and subsequently analyzed for expression of hCD4 (TEL-PDGFßR) and EGFP (AML1-ETO). EGFP+hCD4+ cells were then analyzed for the expression of myeloid and progenitor markers. Plots are from a single animal that is representative of the experimental group. (B) Differences in the percentage of EGFP+hCD4+ BM cells that were positive for both c-kit and Sca-1. Error bars indicate 95% confidence intervals. AE + TP, n = 21 mice; R174Q + TP, n = 22; Y113A/T161A, n = 24; TP, n = 8; AE, n = 8. Differences relative to AML1-ETO plus TEL-PDGFßR (#) for each group were determined using Dunnett test and ANOVA. (C) Differences in the percentage of EGFP+hCD4+ BM cells that were positive for both Gr-1 and Mac-1. Error bars indicate 95% confidence intervals. Differences relative to AML1-ETO plus TEL-PDGFßR (#) for each group were determined using Dunnett test and ANOVA. The percentage of Gr-1+Mac-1+ cells in mice transplanted with AML1-ETO (Y113A/T161A) plus TEL-PDGFßR transduced cells was significantly lower (P < .001) than in mice transplanted with BM expressing AML1-ETO (R174Q) plus TEL-PDGFßR or TEL-PDGFßR alone. In summary, cells expressing AML1-ETO plus TEL-PDGFßR gave rise to a rapid AML as reported previously.82 Mutations in AML1-ETO that impaired DNA binding significantly weakened its activity, resulting in an extended latency and disease specificity more closely resembling that caused by TEL-PDGFßR alone. Loss of CBFß binding also weakened AML1-ETO's activity, resulting in an extended disease latency, selection of higher AML1-ETO expressing cells, and a cell surface phenotype intermediate between AML and CMPD. These results indicate that the mutated AML1-ETO proteins were unable to appropriately regulate a subset of genes required for the development of AML. Disruption of CBFß binding by the Runt domain of TEL-AML1 abrogates its ability to enhance self-renewal of B-cell progenitors. 
TEL-AML1 is critically dependent upon DNA binding to enhance the self-renewal of B-cell progenitors in culture.61 To determine whether disrupting CBFß binding above a specific threshold results in functional impairment, we introduced mutations analogous to T161A and Y113A/T161A (T188A and Y140A/T188A) into TEL-AML1. As the Runt domains of AML1-ETO and TEL-AML1 are identical, we would expect the biophysical characteristics of the mutated proteins to be comparable. We transduced c-Kit+ fetal liver cells from embryonic day 12 C57BL/6 mice with retroviruses expressing TEL-AML1 or its mutated derivatives and EGFP,61,66 and serially replated them in methylcellulose under conditions that promote B-cell differentiation. Expression of similar steady-state levels of the TEL-AML1 mutant proteins in transduced NIH3T3 cells was confirmed by Western blot analysis, and retroviral transduction of c-kit+ cells was measured by EGFP fluorescence (Figure 5A,B). Cells expressing TEL-AML1 generated increased numbers of colonies and cells, which were B220 enriched, over multiple rounds of replating relative to cells expressing EGFP alone (Figure 5C-E). A reduction in CBFß heterodimerization by greater than 400-fold through introduction of the Y140A/T188A mutations severely impaired TEL-AML1's ability to enhance the clonogenic potential of these B-cell precursors, as the numbers of cells and colonies were very similar to those of EGFP-transduced cells. Cells expressing the T188A TEL-AML1 mutant, which is expected to have an approximately 60-fold reduction in CBFß binding, produced numbers of colonies and cells similar to those of TEL-AML1–expressing cells during secondary replating; however, the numbers from the third round of replating were significantly decreased and therefore intermediate between those of the TEL-AML1 and TEL-AML1 (Y140A/T188A) expressing cells. 
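One way to picture why an approximately 60-fold and a greater than 400-fold loss of affinity could have different functional consequences is a simple 1:1 binding model with CBFß in excess, in which the fraction of Runt domain bound is [CBFß]/([CBFß] + K1). The Python sketch below is a hedged illustration and not the authors' analysis: the free CBFß level, expressed as a multiple of the wild-type K1, is a hypothetical value chosen only for demonstration, and the fold changes are the 63-fold and 430-fold values measured for the isolated mutant Runt domains.

```python
def fraction_bound(cbfb_over_wt_kd, kd_fold_increase=1.0):
    """Fraction of Runt domain bound to CBFß in a 1:1 model with CBFß in excess.

    cbfb_over_wt_kd: free CBFß concentration divided by the wild-type Kd.
    kd_fold_increase: mutant Kd divided by the wild-type Kd (1.0 for wild-type).
    """
    if cbfb_over_wt_kd < 0 or kd_fold_increase <= 0:
        raise ValueError("concentration must be >= 0 and fold increase > 0")
    return cbfb_over_wt_kd / (cbfb_over_wt_kd + kd_fold_increase)


# Hypothetical free CBFß level (100x the wild-type Kd); not a measured value.
ASSUMED_CBFB_LEVEL = 100.0

for label, fold in [("wild-type", 1), ("63-fold weaker", 63), ("430-fold weaker", 430)]:
    print(f"{label}: fraction bound = {fraction_bound(ASSUMED_CBFB_LEVEL, fold):.2f}")
```

With this arbitrary CBFß level, the modeled occupancy falls from about 0.99 for the wild-type to about 0.61 and 0.19 for the two mutants. The actual intracellular concentrations are not reported in this study, so the numbers indicate only the shape of the relationship.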
It appears that, as was the case for AML1-ETO, a decrease in CBFß binding from 60- to 400-fold is accompanied by a progressive impairment in TEL-AML1's clonogenic activity. As the thermodynamic stability and DNA binding affinity of the T188A and Y140A/T188A mutant Runt domains are identical, the decreased activity can only be explained by more severely impaired CBFß binding. Disrupting heterodimerization with CBFß substantially abrogates TEL-AML1's ability to enhance the clonogenic potential of c-kit+ B-cell precursors. (A) Detection of hemagglutinin (HA)–tagged TEL-AML1 and the T188A and Y140A/T188A mutants in NIH3T3 cells 48 hours after retroviral (murine stem cell virus [MSCV]) transduction. Vertical lines have been inserted to indicate a repositioned gel lane. (B) Flow cytometric analysis of EGFP expression in c-kit+ fetal liver cells 48 hours after infection with viral supernatants. (C) Transduced c-kit+ cells grown in methylcellulose serial replating assays in conditions promoting B-cell differentiation (IL-7, SCF, and FLT3 ligand). The number of colonies harvested from each culture, per 10^4 cells plated, is shown for the first, second, and third rounds of plating. Plots show means and SDs of duplicate cultures. The data shown are representative of 5 independent experiments. (D) Total number of cells, as in panel C. (E) Number of B220+ cells as in panel C. Discussion Assessing which protein-protein interactions mediated by oncogenic transcription factors are essential for their leukemogenic activity, and determining the structures of the domains responsible for those interactions will greatly facilitate the development of specific inhibitors by structure-aided drug design. 
Here, we show that ablation of DNA binding by 40000 fold30,49 or reduction of CBFß heterodimerization by greater than 400-fold partially impaired AML1-ETO's ability to inhibit granulocyte differentiation, almost completely reversed its acute inhibition of proliferation, eliminated its ability to confer serial replating to myeloid progenitors, and severely compromised its cooperation with TEL-PDGFßR to cause AML, and thus impaired or delayed AML disease. Similarly, inhibition of both DNA61 and CBFß binding by TEL-AML1 eliminated its ability to enhance the self-renewal of B-cell progenitors. We are optimistic that targeted disruption of either function may provide an effective therapeutic modality. The more promising of the 2 interactions for small molecule inhibitors is that between the Runt domain and CBFß, since the Runt domain-DNA interface is highly charged and may be difficult to target with a small molecule. The Runt domain-CBFß interface, on the other hand, contains a large hydrophobic area,36,62 which should be more conducive for binding small molecules with drug-like properties. Although small molecules targeting protein-protein interactions are relatively limited thus far, there has been an increase in activity and reported success in developing these inhibitors.85 For example, we recently developed a small molecule that binds CBFß and allosterically inhibits CBFß binding to the Runt domain, supporting the notion that disrupting this interaction with drug-like molecules is feasible.86 A potential drawback to this approach is that the inhibitors would also further impair the function of the normal core binding factors, in which case there would need to be a therapeutic window for them to be clinically useful. Alternatively, combined therapies using inhibitors targeting other functions specific for AML1-ETO (eg, oligomerization 15,49,87) may provide specificity. 
We find it encouraging that both AML1-ETO and TEL-AML1 rely on CBFß for their activity, and therefore a single compound may prove efficacious in both disease settings. In fact, we predict that all the Runx1 fusion proteins will require CBFß for their activity and that an inhibitor of the Runt domain-CBFß interaction may be useful for targeting all of these proteins. Although CBFß is clearly important for AML1-ETO and TEL-AML1 function, the nature of its activity is not completely understood. It has long been known that CBFß increases the Runx proteins' affinity for DNA,88,89 and it was more recently shown that it protects them from ubiquitin-mediated proteolysis.37,90 However, genetic data in both flies and worms hint at other functions. For example, overexpression of a Drosophila Runt protein that could bind DNA but not CBFß dominantly inhibited its wild-type counterpart, suggesting that it was occupying its DNA binding sites but forming nonfunctional complexes.91 More recently, it was shown that ectopic overexpression of CBFß (BRO-1) could partially restore a Runx (RNT)–deficient defect in Caenorhabditis elegans.92 Thus we speculate that CBFß participates in the formation of protein complexes that are necessary for AML1-ETO and TEL-AML1 function in vivo. If this is the case, and if the nature of those complexes differs from those formed by the wild-type Runx1-CBFß heterodimer then other protein-protein interactions specific for the fusion protein complexes could lend themselves to targeting with small molecules. It has been suggested that much of AML1-ETO's activity may be mediated in a manner that does not require binding to Runx-dependent promoters.13 Our data support the notion that some of AML1-ETO's activity indeed does not require either DNA or CBFß binding, in that ectopic (retroviral) expression of a mutant defective in both functions could still significantly repress granulocyte differentiation of BM cells in vitro. 
Although repression of granulocyte differentiation is a consistent feature of AML1-ETO overexpression,15,25,70,73,75 it is not entirely clear how important it is to AML1-ETO's leukemogenic function. We found several AML1-ETO mutants that were able to partially repress granulocyte differentiation but had no leukemogenic activity, nor was a granulocyte differentiation defect observed in a conditional AML1-ETO knock-in model.69 What is clear, however, from our data and that of Yan et al,49 is that DNA binding by AML1-ETO is essential for its leukemogenic activity and that this observation must be incorporated into any mechanistic model for AML1-ETO function.

O-Linked N-Acetylglucosaminylation of Sp1 Inhibits the Human Immunodeficiency Virus Type 1 Promoter

Abstract

Human immunodeficiency virus type 1 (HIV-1) gene expression and replication are regulated by the promoter/enhancer located in the U3 region of the proviral 5' long terminal repeat (LTR). The binding of cellular transcription factors to specific regulatory sites in the 5' LTR is a key event in the replication cycle of HIV-1. Since transcriptional activity is regulated by the posttranslational modification of transcription factors with the monosaccharide O-linked N-acetyl-d-glucosamine (O-GlcNAc), we evaluated whether increased O-GlcNAcylation affects HIV-1 transcription. In the present study we demonstrate that treatment of HIV-1-infected lymphocytes with the O-GlcNAcylation-enhancing agent glucosamine (GlcN) repressed viral transcription in a dose-dependent manner. Overexpression of O-GlcNAc transferase (OGT), the sole known enzyme catalyzing the addition of O-GlcNAc to proteins, specifically inhibited the activity of the HIV-1 LTR promoter in different T-cell lines and in primary CD4+ T lymphocytes. Inhibition of HIV-1 LTR activity in infected T cells was most efficient (>95%) when OGT was recombinantly overexpressed prior to infection.
O-GlcNAcylation of the transcription factor Sp1 and the presence of Sp1-binding sites in the LTR were found to be crucial for this inhibitory effect. From this study, we conclude that O-GlcNAcylation of Sp1 inhibits the activity of the HIV-1 LTR promoter. Modulation of Sp1 O-GlcNAcylation may play a role in the regulation of HIV-1 latency and activation and links viral replication to the glucose metabolism of the host cell. Hence, the establishment of a metabolic treatment might supplement the repertoire of antiretroviral therapies against AIDS.   Human immunodeficiency virus type 1 (HIV-1) gene expression is regulated by the long terminal repeat (LTR) promoter, which is composed of three regions: U3 (unique 3' end), R (repeated), and U5 (unique 5' end). The U3 region contains an upstream regulatory element including binding sites for several cellular transcription factors such as the nuclear factor of activated T cells 1 (NFATc1) and the activator protein 1 (AP-1), an enhancer with two binding sites for the nuclear factor κB (NF-κB), and the core promoter composed of three tandem binding sites for specificity protein 1 (Sp1) and a TATA box. While NF-κB is a strong enhancer of HIV-1 transcription (8), Sp1 is essential for basal transcription and Tat-mediated activation of HIV-1 (25, 69). This is concordant with the fact that Sp1 is upregulated in activated T cells (43), which compose the primary reservoir for HIV-1 replication (68). Importantly, deletion of all three Sp1-binding sites reduces viral replication in human T-cell cultures (58). In eukaryotic cells, myriad cytoplasmic and nuclear proteins are posttranslationally modified at the hydroxyl groups of specific serine and threonine residues by a single monosaccharide, N-acetyl-d-glucosamine (GlcNAc) (26, 80). 
The O-glycosidic linkage of GlcNAc to proteins is a highly dynamic and reversible posttranslational modification and differs from other glycosylation events in that it occurs in the cytosol and the nucleus rather than in the Golgi apparatus or the endoplasmic reticulum. Protein O-linked N-acetyl-d-glucosaminylation (O-GlcNAcylation) is catalyzed by O-GlcNAc transferase (OGT) and reversed by O-GlcNAc hexosaminidase (O-GlcNAcase) (30). The substrate for protein O-GlcNAcylation is UDP-GlcNAc. Therefore, O-GlcNAcylation is regulated by the cellular levels of OGT and O-GlcNAcase and by the availability of UDP-GlcNAc (81). As UDP-GlcNAc is synthesized de novo from glucose via the hexosamine biosynthetic pathway, glucose flux through the hexosamine biosynthetic pathway increases O-GlcNAcylation of proteins. Furthermore, compounds like glucosamine (GlcN), streptozotocin, O-(2-acetamido-2-deoxy-d-glucopyranosylidene)amino-N-phenylcarbamate (PUGNAc), and 2-deoxyglucose enhance O-GlcNAcylation of proteins either by increasing the availability of UDP-GlcNAc or by inhibiting the enzyme O-GlcNAcase (23, 37, 63). Protein O-GlcNAcylation has been shown to modulate (i) enzyme activity, (ii) protein-protein interactions, (iii) DNA-binding affinity, (iv) subcellular localization, and (v) the half-life and proteolytic processing of proteins (81). Furthermore, O-GlcNAcylation often plays an antagonistic role in relation to phosphorylation (36). The dynamic interplay between O-GlcNAcylation and phosphorylation has therefore been proposed to control protein-protein interactions and protein functions. Furthermore, O-GlcNAc plays a pivotal role in the regulation of gene expression via modification of RNA polymerase II (12, 39) and associated transcription factors (31, 32). Thus, O-GlcNAc serves as a nutrient sensor to couple the metabolic status to cellular processes, such as protein degradation, signal transduction, and gene transcription.
Some of the transcription factors involved in HIV-1 gene regulation are modified by O-GlcNAc, including AP-1 (67), yin-yang 1 (YY1) (28), NFATc1 (21), NF-κB (21), and Sp1 (31, 62). Sp1 was the first transcription factor identified as being O-GlcNAcylated (31). Sp1 is ubiquitous and belongs to the Sp1-like/Krüppel-like family of transcription factors characterized by three C-terminal Cys2His2 zinc finger motifs (34) and a DNA-binding affinity to GC-rich sites (16). Two decades ago, Tjian and coworkers showed that Sp1 bears at least eight O-GlcNAc sites and that the O-GlcNAcylated form of Sp1 was more active than the non-O-GlcNAcylated protein (31). Since then, other studies have revealed that O-GlcNAcylation of Sp1 can also result in a reduction of transcriptional activation (62, 77), indicating that the specific effect of O-GlcNAcylation on Sp1 activity strongly depends on the targeted promoter. Since Sp1 is critical for HIV-1 transcription (25) and Sp1 activity is modulated by O-GlcNAc (42), we investigated the effect of Sp1 O-GlcNAcylation on the activity of the HIV-1 LTR promoter. Here we show that an increased O-GlcNAc level represses the HIV-1 LTR promoter activity. This required O-GlcNAcylation of Sp1 and the presence of Sp1-binding sites in the HIV-1 LTR. Thus, modulation of Sp1 O-GlcNAcylation may be useful as a potential therapeutic approach for the inhibition of HIV-1 replication.

MATERIALS AND METHODS

Jurkat and T1 (174 × CEM.T1) cells were maintained in very-low-endotoxin-Roswell Park Memorial Institute (RPMI) 1640 medium (Biochrom, Berlin, Germany) supplemented with 10% fetal calf serum (FCS; Biochrom) and 2 mM l-glutamine (PAA, Cölbe, Germany) at 37°C in a humidified atmosphere at 5% CO2. CD4+ T cells were isolated using anti-CD4 magnetically activated cell sorting beads (Miltenyi, Bergisch Gladbach, Germany) according to the manufacturer's instructions.
CD4+ T cells were cultured in MLPC medium consisting of RPMI 1640, 10% human serum, 2 mM l-glutamine, 20 mg/liter gentamicin, 10 mM HEPES, 1 mM sodium pyruvate, and 1% minimum essential medium nonessential amino acids (all from PAA, Cölbe, Germany) at 37°C under 5% CO2. HeLa-Tat-III/LTR/d1EGFP cells were obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH, from Masahiko Satoh (57). These cells stably express d1EGFP (which was derived from d2EGFP, a destabilized, redshifted variant of the enhanced green fluorescent protein [EGFP], and has a half-life of approximately 1 hour) under the control of the HIV-1 LTR promoter. HeLa-Tat-III/LTR/d1EGFP cells were cultivated in Dulbecco's modified Eagle's medium (PAA) supplemented with 10% FCS, 2 mM l-glutamine, and 1 mg/ml G418 at 37°C under 8.5% CO2. HEK 293T cells were cultivated in Dulbecco's modified Eagle's medium supplemented with 10% FCS and 2 mM l-glutamine at 37°C under 8.5% CO2. Pseudoviruses were produced by cotransfection of HEK 293T cells with pNL4-3LucR-E- (obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH, from Nathaniel Landau [13, 27]) and a vesicular stomatitis virus G protein (VSV-G)-encoding plasmid by using the calcium phosphate technique. Pseudovirus-containing supernatants were harvested 2 days after transfection, and cellular debris was removed by centrifugation at 300 × g for 5 min. HIV-1 p24 concentration was estimated by antigen enzyme-linked immunosorbent assay. Pseudovirus-containing supernatants were stored in 1-ml aliquots at -80°C until use. Jurkat and T1 cells were cultured to a density of 1.5 × 10^6 cells/ml and split 1:2 on the day prior to infection. Cells were infected with VSV-G env pseudotyped HIV-1NL4-3LucR-E- for 4 h at 37°C, using a concentration of 30 ng p24 per 1 × 10^6 cells.
Cells were centrifuged at 300 × g for 5 min to remove excess pseudovirus and resuspended in 2 ml very-low-endotoxin-RPMI 1640 containing 2 mM l-glutamine and 10% FCS supplemented with the respective concentration of GlcN (Sigma-Aldrich, Munich, Germany) in triplicate. CD4+ T cells were stimulated for 6 days with 10 U/ml interleukin-2 (Roche, Mannheim, Germany) and 10 µg/ml phytohemagglutinin P (Sigma-Aldrich) prior to infection. Cells were infected with reporter HIV-1 for 4 h at 37°C, using a concentration of 30 ng p24 per 2 × 10^6 cells. Cells were centrifuged at 250 × g for 10 min to remove excess pseudovirus and resuspended in 1.5 ml MLPC medium supplemented with the respective concentration of GlcN in triplicate. After incubation for 24 h, cells were collected, washed once with phosphate-buffered saline (PBS; Biochrom), lysed in 1× passive lysis buffer (Promega, Mannheim, Germany), and assayed for luciferase activity as described below. Cell viability was measured using the CellTiter 96 nonradioactive cell proliferation assay (MTT [3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2H-tetrazolium bromide] assay; Promega) according to the manufacturer's protocol. Briefly, 100-µl cell suspensions containing either 1 × 10^5 HIV-1-infected Jurkat or T1 cells or 2 × 10^5 HIV-1-infected CD4+ primary T lymphocytes were applied for the assay. The coding sequence of OGT was cloned by amplifying 820 bp corresponding to the N terminus of nucleocytoplasmic OGT, GenBank accession number NM_181672 (24, 41), from human cDNA with specific primers containing the restriction sites for BamHI and EcoRV (underlined) and a Kozak sequence (italics) (forward primer, 5'-TAGGGATCCGATATCGCCACCATGGCGTCTTCCGTGGGCAACG-3') and the restriction site for StuI (underlined) (reverse primer, 5'-AGATCTATCAGGCCTTGCTCATAG-3').
The PCR fragment was then ligated with the StuI restriction site to the part of the Lv4F fragment (also termed mitochondrial OGT; GenBank accession number U77413 [46, 47]) which corresponds to the C terminus of nucleocytoplasmic OGT and harbors a NotI restriction site at the 3' end. The complete sequence for the human nucleocytoplasmic OGT was subsequently inserted into pcDNA4/myc-His B (Invitrogen, Karlsruhe, Germany) at the EcoRV and NotI sites. The rescue mutant resOGT was created by mutagenesis PCR using pcDNA-OGT as template and the QuikChange XL site-directed mutagenesis kit (Stratagene, Amsterdam, The Netherlands) according to the manufacturer's instructions. The mutagenic primers were designed by the QuikChange primer design program: forward primer, 5'-GAGGCACGGCAACCTGTGCTTAGATAAGATCAACGTGCTGCACAAGCCACCATATGAACATCCAAAAGA-3'; reverse primer, 5'-TCTTTTGGATGTTCATATGGTGGCTTGTGCAGCACGTTGATCTTATCTAAGCACAGGTTGCCGTGCCTC-3' (nucleotide changes compared to the wild-type sequence are in bold). The expression plasmid for Sp1 was generated by subcloning the XbaI/SmaI fragment of the pBS-Sp1-f1 vector (GenBank accession number AF252284 [35]), representing the open reading frame of Sp1, into the XbaI/EcoRV-digested pcDNA3.1(-) vector (Invitrogen). The reporter constructs pXP1-LTR-κB-Sp1wt-Luc, pXP1-LTR-κB-Sp1mut-Luc, and pXP1-LTR-Sp1wt-Luc were kindly provided by Manuel López-Cabrera (2, 22). The plasmid pcDNA4-LTRwt-d2EGFP (destabilized, redshifted variant of the EGFP with a half-life of approximately 2 hours) was cloned by insertion of d2EGFP into pcDNA4/myc-His B plasmid (Invitrogen) via XhoI and XbaI. HIV-1 wild-type LTR (LTRwt) was amplified from pNL4-3, obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH, from Malcolm Martin (1), using the following primers: forward primer, 5'-GACAATTGAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTC-3'; reverse primer, 5'-ACCTCGAGTTGGCTCACTGCAACCTCTACCTCCTGGGTGCT-3'. 
The amplified LTRwt DNA was digested with MfeI and XhoI (underlined) and ligated into pcDNA4-d2EGFP. The plasmid pcDNA4-LTRmutSp1-d2EGFP contained the Sp1-binding-site-mutated LTR (LTRmutSp1) and was created by QuikChange XL site-directed mutagenesis PCR using the forward primer 5'-GGGGACTTTCCAGGGATTCGTGGCCTGTTCGGGACTGGTTAGTGGCGA-3' and the reverse primer 5'-CTCGCCACTAACCAGTCCCGAACAGGCCACGAATCCCTGGAAAGTCCCC-3' (nucleotide changes compared to the wild-type sequence are in bold). The Tat expression construct pCEP4-Tat was obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH, from Lung-Ji Chang (9). The control reporter plasmid pEF1α-Luc was constructed by subcloning the elongation factor 1 alpha (EF1α) promoter from pEF1/Myc-His vector (Invitrogen) into the promoterless pGL3-Basic vector (Promega). pcDNA4-EF1α-d2EGFP was generated by replacing the cytomegalovirus promoter in the plasmid pcDNA4-d2EGFP with the EF1α promoter. All cloned constructs were confirmed by full-length sequencing and expression experiments. The plasmid pGEM4Z-OGT-64A was generated by subcloning the OGT gene from pcDNA4-OGT into the RNA production vector pGEM4Z-5'UTR-sig-huSurvivin-DC.LAMP-3'UTR-64A (kindly provided by Kris Thielemans [4]), replacing the huSurvivin-DC.LAMP. The pGEM4Z-EGFP-64A vector was provided by I. Tcherepanova, Argos Therapeutics, Durham, NC (66). The constructs were confirmed by restriction digest and expression experiments. HeLa-Tat-III/LTR/d1EGFP cells were treated with the respective GlcN concentrations for 5 h in triplicate. HEK 293T-LTRwt-d2EGFP and HEK 293T-LTRmutSp1-d2EGFP cells were treated with 16 mM GlcN for 20 h in duplicate. The cells were harvested with trypsin-EDTA (PAA), washed twice with FACS-PBS (PBS containing 5% FCS and 0.1% sodium azide [Sigma-Aldrich]) and resuspended in FACS-PBS. 
Viable GFP-positive cells (1 × 10^4) were analyzed for each sample using a FACSCalibur flow cytometer with CellQuest Pro software (both from BD Biosciences, Heidelberg, Germany). The relative fluorescence intensity is given as the ratio of the geometric mean of GlcN-treated cells to that of untreated cells. The geometric mean of untreated cells was defined as 100%. In vitro transcription of RNA was performed as described previously (65). In short, the pGEM4Z-OGT-64A and pGEM4Z-EGFP-64A vectors were linearized with NotI and SpeI enzyme, respectively, purified by phenol-chloroform extraction and ethanol precipitation, and used as DNA templates. The in vitro transcription was performed with T7 RNA polymerase (mMessage mMachine T7 Ultra kit; Ambion/Applied Biosystems, Austin, TX) according to the manufacturer's instructions. After DNase I (Ambion) digestion and poly(A) tailing (Ambion), the transcribed RNA was recovered on RNeasy columns (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. RNA quality was verified by agarose gel electrophoresis, and the RNA concentration was measured spectrophotometrically. The protein concentration was determined in a microplate reader model 680 (Bio-Rad, Munich, Germany) at 750 nm using a detergent-compatible (DC) protein assay kit (Bio-Rad) according to the manufacturer's protocol. Cells were lysed as described under “Cell transfection” and “HIV-1 infection and GlcN treatment.” Samples were prepared in 6× sodium dodecyl sulfate (SDS) buffer (350 mM Tris-HCl, pH 6.8, 30% glycerol, 10% SDS, 0.6 M dithiothreitol, 0.012% bromophenol blue), boiled for 10 min at 99°C, and loaded on 7.5% acrylamide-bisacrylamide gels or 4 to 20% Tris-glycine gradient gels (Anamed, Gross-Bieberau, Germany).
Gel-separated proteins were transferred to a polyvinylidene fluoride membrane (Roth, Karlsruhe, Germany), which was blocked with 5% nonfat dry milk in PBS-Tween (PBS containing 0.1% Tween 20) for 2 h, washed with PBS-Tween, and incubated with the primary antibody in 2.5% nonfat dry milk in PBS-Tween at room temperature for 2 h. After washing, the membranes were incubated for 45 min at room temperature with the secondary antibody in 2.5% nonfat dry milk in PBS-Tween (71, 76). In order to prevent cross-reactions with milk proteins, Western blot staining with the O-GlcNAc-recognizing antibody was performed as follows: the membranes were blocked with 10% Western blocking reagent (Roche) in PBS-Tween and the antibody was diluted in 0.5% Western blocking reagent (Roche). Subsequent to the final washing step, chemiluminescence was detected according to the manufacturer's protocol (Pierce Biotechnology, Rockford, IL). The following primary antibodies were used: polyclonal rabbit anti-human Sp1 Pep2 (1:500; Santa Cruz Biotechnology, Heidelberg, Germany), polyclonal goat anti-human lamin A/C N-18 (1:1,000; Santa Cruz Biotechnology), polyclonal rabbit anti-human OGT TI-14 (1:1,000; Sigma-Aldrich), polyclonal rabbit anti-human actin (1:1,000; Sigma-Aldrich), monoclonal mouse anti-human histone H1 (1:1,000; Santa Cruz), monoclonal mouse anti-GFP (1:10,000, Roche); monoclonal mouse anti-eukaryote O-GlcNAc CTD110.6 (1:1,000; Hiss Diagnostics, Freiburg, Germany), and monoclonal mouse anti-human glyceraldehyde-3-phosphate dehydrogenase (GAPDH; 1:70,000, Millipore, Schwalbach, Germany). The monoclonal rat anti-Tat immunoglobulin G2a (IgG2a) antibody (clone 1C9; 1:150) was produced by immunization of LOU/C rats with His-tagged purified recombinant Tat protein (50 µg) according to a previously described procedure (48). 
The secondary, horseradish peroxidase-coupled antibodies goat anti-rat IgG, donkey anti-rabbit IgG, sheep anti-mouse IgG, and rabbit anti-goat IgG (all from Dako, Hamburg, Germany) were diluted 1:5,000. Fractionation was performed with a nuclear/cytosol fractionation kit (BioVision, Wiesbaden, Germany) according to the manufacturer's protocol (64). For gel shift experiments, cells were counted, and the nuclear fractions were lysed in 10 µl nuclear extraction buffer per 1 × 10^6 HEK 293T cells in order to concentrate the fractions. The protein concentration was determined as described above. Equal protein amounts of each fraction were loaded onto the gel and analyzed by Western blotting or applied to gel shift experiments. HEK 293T cells (2 × 10^6) were seeded in 9 ml medium in 10-cm cell culture dishes. The cells were transfected 24 h later (via the calcium phosphate method) with pcDNA4, pcDNA-Sp1, or pcDNA-Sp1 and pcDNA-OGT. The total transfected DNA amount was adjusted to 30 µg with the control plasmid pcDNA4. At 24 h after transfection, cells were washed with PBS and subsequently lysed in IP lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 1% Igepal [all purchased from Sigma-Aldrich]), supplemented with one tablet of Complete Mini, EDTA-free protease inhibitor cocktail (Roche) per 10 ml. The protein concentration was determined, and 1 mg total protein in a maximal volume of 1 ml was used for the immunoprecipitation. The lysates were precleared by incubation with 100 µl Sepharose CL-6B (Sigma-Aldrich) at 4°C for 1 h. The Sp1 protein complex was immunoprecipitated at 4°C for 3 h with 30 µg anti-Sp1 Pep2 antibody covalently coupled to agarose beads (Santa Cruz Biotechnology). Bead-coupled protein complexes were washed twice with 1 ml IP lysis buffer and three times with IP wash buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 0.1% Igepal).
The immunoprecipitates were analyzed by SDS-polyacrylamide gel electrophoresis and Western blotting as described above. HEK 293T cells were seeded 24 h prior to transfection in 1.5 ml medium at a density of 3 × 10^5 cells per well in a six-well plate. Cells were transiently cotransfected (via the calcium phosphate method) in triplicate with 0.5 µg pXP1-LTR-Sp1wt-Luc or 0.5 µg pEF1α-Luc, along with 0.75 µg pcDNA-Sp1 and/or 0.25 µg pCEP4-Tat and 1 µg pcDNA-OGT (unless otherwise indicated). The total DNA amount was adjusted to 2.5 µg with the control plasmid pcDNA4. For RNA interference experiments, 0.22 µg (amount corresponds to 10 nM) short interfering RNA (siRNA) oligonucleotides specific for human OGT (hsOGT_7 SI02665131; Qiagen), human Sp1 (hsSP1_1 SI00150976; Qiagen), or human GAPDH (Silencer GAPDH siRNA; Ambion, Darmstadt, Germany), or a nontargeting control siRNA (Silencer Negative Control no. 1 siRNA; Ambion) was cotransfected. Twenty-four hours later, cells were lysed in 200 µl 1× passive lysis buffer (Promega) and assayed for luciferase activity as described below. Stable transfection of HEK 293T cells with pcDNA4-LTRwt-d2EGFP and pcDNA4-LTRmutSp1-d2EGFP was performed via the calcium phosphate method. Stably transfected cells were selected with 600 µg/ml Zeocin (Invitrogen). After 10 days, single colonies were transferred to new culture plates. Fluorescence intensity of clones was analyzed by flow cytometry. For the final experiment, three clones of each construct, which expressed comparable fluorescence intensities, were selected. Electroporation of Jurkat cells was performed with the Nucleofector II (Amaxa/Lonza, Cologne, Germany). For this purpose, cells were resuspended in 100 µl Nucleofector solution V at a final concentration of 2 × 10^7 cells/ml and mixed with 50 µg/ml DNA (pcDNA4-EF1α-d2EGFP or pcDNA4-OGT). Electroporation was carried out with 100 µl of the cell-DNA suspension using program X-001.
Immediately after transfection, cells were transferred to prewarmed RPMI 1640 medium supplemented with 10% FCS and 4 mM l-glutamine. At 48 h posttransfection, stably transfected Jurkat cells were selected with 400 µg/ml Zeocin (Invitrogen). After initial selection, cells were separated into single clones by limiting dilution in 96-well plates. After 13 days, d2EGFP-expressing Jurkat clones were pooled, while OGT-expressing Jurkat clones were cultivated separately. T lymphocytes were electroporated as described previously (65). Briefly, cells were washed once with pure RPMI 1640 and once with OptiMEM without phenol red (Invitrogen) (all at room temperature). The cells were resuspended in OptiMEM at a concentration of 1 × 10^8 cells/ml. RNA was transferred to a 4-mm cuvette (Molecular Bioproducts, San Diego, CA) at a final concentration of 150 µg/ml. A volume of 100 µl of cell suspension was added and immediately pulsed in a Gene Pulser Xcell (Bio-Rad). Pulse conditions for CD4+ T lymphocytes were square-wave pulse, 500 V, and 5 ms. Immediately after electroporation, cells were transferred to prewarmed MLPC medium. Cell lysates were centrifuged at 10,000 × g at 4°C for 3 min. Firefly luciferase activity of the supernatants was measured using the luciferase assay system (Promega) in a Luminoskan Ascent instrument (Thermo Fisher Scientific, Langenselbold, Germany). Each luciferase reporter gene assay was performed in triplicate (except for that shown in Fig. 3).

OGT inhibits HIV-1 LTR promoter activity in infected primary CD4+ T cells. Primary CD4+ T cells from two different donors were infected with VSV-G env pseudotyped HIV-1NL4-3LucR-E-. At 36 h postinfection, cells were electroporated with in vitro-transcribed polyadenylated mRNA encoding either EGFP or OGT. (A) At 8 h postelectroporation, luciferase activity was measured. (B) The O-GlcNAcylation pattern and overexpression of EGFP and OGT were verified by Western blotting. Staining of GAPDH demonstrates equal loading of proteins.
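As a rough check on the electroporation conditions given in the methods, the amounts per cuvette follow directly from the stated concentrations and the 100-µl volume. The sketch below is illustrative only: the helper name `amount_in_volume` is hypothetical, and it assumes that the nucleic acid solution adds negligible volume to the 100 µl of cell suspension.

```python
def amount_in_volume(concentration_per_ml, volume_ul):
    """Return the amount contained in volume_ul at a per-millilitre concentration."""
    return concentration_per_ml * volume_ul / 1000.0

# Assumption: the pulsed volume is 100 µl and the nucleic acid volume is negligible.
PULSE_VOLUME_UL = 100.0

# Primary T lymphocytes: 1 x 10^8 cells/ml and 150 µg/ml RNA (values from the methods).
t_cells_per_cuvette = amount_in_volume(1e8, PULSE_VOLUME_UL)
rna_ug_per_cuvette = amount_in_volume(150.0, PULSE_VOLUME_UL)

# Jurkat cells: 2 x 10^7 cells/ml and 50 µg/ml DNA (values from the methods).
jurkat_cells_per_pulse = amount_in_volume(2e7, PULSE_VOLUME_UL)
dna_ug_per_pulse = amount_in_volume(50.0, PULSE_VOLUME_UL)

print(f"T lymphocytes per cuvette: {t_cells_per_cuvette:.0f}")
print(f"RNA per cuvette (µg): {rna_ug_per_cuvette}")
print(f"Jurkat cells per pulse: {jurkat_cells_per_pulse:.0f}")
print(f"DNA per pulse (µg): {dna_ug_per_pulse}")
```

Under these assumptions, each T-lymphocyte electroporation uses about 1 × 10^7 cells with 15 µg RNA, and each Jurkat electroporation about 2 × 10^6 cells with 5 µg DNA.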
HEK 293T cells were transfected with the expression constructs for Sp1, OGT, or both, and nuclear proteins were isolated as already described. The electrophoretic mobility shift assay was essentially performed as described previously (54, 56). Briefly, binding reaction mixtures (final volume, 15 µl) contained 10 mM Tris-HCl (pH 7.5), 80 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol, 5% glycerol, 3 µg of poly(dI-dC) · poly(dI-dC) (GE Healthcare), 10 µg of nuclear extract, and 4 × 10^4 cpm of the 32P-end-labeled double-stranded oligonucleotide probe. The oligonucleotides corresponded either to the wild-type HIV-1 LTR region containing the three binding sites for Sp1 (wt LTR-Sp1, 5'-GGATCGGGAGCGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCC-3'; Sp1-binding sites are underlined) or to the Sp1-binding-site-mutated HIV-1 LTR (mut LTR-Sp1, 5'-GGATCGGGATTCGTGGCCTGTTCGGGACTGGTTAGTGGCGAGCCC-3'; nucleotide substitutions in bold). After incubation for 45 min on ice, the protein-DNA complexes were resolved on nondenaturing 5% polyacrylamide gels run in 1× Tris-borate-EDTA buffer containing 89 mM Tris, 89 mM boric acid, and 2 mM EDTA (pH 8.0). For competition experiments, a 10-, 20-, or 50-fold molar excess of unlabeled competitor oligomers was added to the gel shift mixtures prior to the addition of the 32P-labeled oligonucleotide probe and incubated for 30 min at 4°C. For supershift assays, antibodies directed against Sp1, O-GlcNAc, or OGT were added to the gel shift mixture and incubated for 30 min at 4°C prior to the addition of the labeled probe. Supershift antibodies were Sp1 Pep2 sc-59X (Santa Cruz Biotechnology), O-GlcNAc CTD110.6 (Hiss Diagnostics), and OGT TI-14 (Sigma-Aldrich), as well as IgG (NF-κB p50 sc114X [Santa Cruz Biotechnology]) and IgM (heparan sulfate 370255 [Seikagaku Corporation]) isotype controls. All experiments were repeated three times. One representative gel shift of each assay is shown.
RNA was isolated with RNeasy columns (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. RNA quality was verified by agarose gel electrophoresis, and RNA concentration was measured spectrophotometrically. Reverse transcription of total RNA and amplification of OGT and GAPDH cDNA were performed as described previously (53, 72). PCR was carried out in a total reaction volume of 25 µl with 1 µl of either undiluted (1) or diluted (1:5 and 1:10) cDNA. Primers specifically amplifying recombinant OGT were selected: the forward primer recognized specifically the vector-encoded 5' untranslated region of the recombinant OGT mRNA (5'-TCCAGTGTGGTGGAATTCTG-3'); the reverse primer was homologous to a sequence corresponding to the N-terminal region of both endogenous and recombinant OGT (5'-TTGCGTCTCAATTGCTTTCA-3'). Amplification of GAPDH was performed using the forward primer 5'-AGCCACATCGCTCAGAACAC-3' and the reverse primer 5'-GAGGCATTGCTGATGATCTTG-3' as described previously (44). Statistical significances were calculated with the Student t test for paired samples using the SPSS 15.0 and 16.0 software for Microsoft Windows (SPSS Inc., Chicago, IL). P values smaller than 0.05 were considered statistically significant (*), P values smaller than 0.01 were considered highly significant (**), and P values smaller than 0.001 were considered the most significant (***).

RESULTS

In order to evaluate whether O-GlcNAcylation affects HIV-1 replication, Jurkat, T1, and primary CD4+ T lymphocytes were infected with HIV-1NL4-3LucR-E-, a replication-deficient HIV-1 clone pseudotyped with the envelope protein VSV-G. Upon integration, this recombinant virus expresses the firefly luciferase gene driven by the HIV-1 LTR promoter. Consequently, the luciferase activity in infected cells correlates with the rate of HIV-1 gene transcription.
Reporter-HIV-1-infected cells were either left untreated (0 mM) or treated for 24 h with increasing concentrations of GlcN (0.25 mM, 1 mM, 4 mM, and 16 mM), and luciferase activity was measured. GlcN significantly inhibited the HIV-1 gene transcription by more than 60% in Jurkat cells and 50% in T1 cells (Fig. 1A and B) without significantly decreasing cell viability (Fig. 1A and B). Western blot analyses using an antibody which recognizes O-GlcNAcylated proteins demonstrated increased O-GlcNAcylation upon GlcN treatment, whereas expression of OGT remained unchanged (Fig. 1C). Detection of GAPDH and actin demonstrated that equal amounts of protein were loaded. Similar results were obtained in primary CD4+ T lymphocytes from two different donors (Fig. 1D and E), although HIV-1 replication in donor 2 (Fig. 1E) was inhibited to a lesser extent than was that in donor 1 (Fig. 1D). This difference is also reflected in smaller amounts of O-GlcNAcylated proteins in the lysates of donor 2 than in lysates of donor 1 (compare donor 1 and donor 2 in Fig. 1F, upper panel).

GlcN inhibits HIV-1 transcription in lymphocytes. Jurkat cells (A), T1 cells (B), and primary CD4+ T cells from two different donors (D and E) were infected with VSV-G env pseudotyped HIV-1NL4-3LucR-E- and cultured in the absence (0 mM) or presence of different concentrations of GlcN (0.25 mM, 1 mM, 4 mM, and 16 mM) for 24 h. Subsequently, HIV-1 LTR-driven luciferase activity was measured (solid lines). The effect of GlcN on cytotoxicity and proliferation was monitored by MTT assay (dashed lines). The results are presented in terms of percent activities of untreated control cells. The means ± standard deviations from triplicate determinations are indicated. P values are calculated in comparison with control: *, P ≤ 0.05; **, P ≤ 0.01. (C and F) Western blot analyses of O-GlcNAcylated proteins and OGT in Jurkat and T1 cells (C) and in primary CD4+ T lymphocytes (F). Actin and GAPDH served as loading controls.
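The normalization described in this legend (percent activity of untreated control cells, mean ± standard deviation from triplicates, and star notation for P values) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names and the readings are hypothetical, the P value itself would come from a paired t test performed elsewhere (the study used SPSS), and the thresholds follow the "≤" convention of the figure legends.

```python
from statistics import mean, stdev

def percent_of_control(treated, control):
    """Express each treated reading as a percentage of the mean control reading."""
    control_mean = mean(control)
    return [100.0 * value / control_mean for value in treated]

def summarize(treated, control):
    """Return (mean, sample standard deviation) of the percent-of-control values."""
    percents = percent_of_control(treated, control)
    return mean(percents), stdev(percents)

def significance_stars(p_value):
    """Map a P value to the star notation used in the figure legends."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "n.s."

# Hypothetical relative light units for one triplicate; not data from the study.
control_rlu = [1000.0, 1100.0, 900.0]
treated_rlu = [400.0, 380.0, 420.0]

avg, sd = summarize(treated_rlu, control_rlu)
print(f"{avg:.1f} ± {sd:.1f} % of control")
```

The same normalization applies to the MTT viability readings, which the legend also reports relative to untreated cells.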
To exclude the possibility that reduced HIV-1 gene transcription is caused by impaired nuclear transport of the preintegration complex or reduced proviral integration, we evaluated the impact of GlcN treatment on the HIV-1 LTR promoter activity. To this end, HeLa-Tat-III/LTR/d1EGFP cells, which stably express d1EGFP under the control of the HIV-1 LTR, were either left untreated (0 mM) or treated with increasing concentrations of GlcN (0.25 mM, 1 mM, 4 mM, and 16 mM). Subsequently, fluorescence intensity was measured by flow cytometry. GlcN treatment inhibited the HIV-1 LTR-triggered d1EGFP expression in a dose-dependent manner, as detected by the shift of the fluorescence emission peak to a lower fluorescence intensity (Fig. 2A, inset). Quantification of the different values demonstrated that GlcN significantly decreased the HIV-1 LTR activity by more than 60% (Fig. 2A, bar diagram). Comparable results were obtained after treatment of HeLa-Tat-III/LTR/d1EGFP cells with the GlcNAc analog streptozotocin (data not shown), which inhibits the O-GlcNAcase activity and thereby increases O-GlcNAcylation (63).

GlcN inhibits HIV-1 LTR promoter activity and increases O-GlcNAcylation in HeLa cells. (A) HeLa-Tat-III/LTR/d1EGFP cells were either left untreated (0 mM) or stimulated for 5 h with increasing concentrations of GlcN (0.25 mM, 1 mM, 4 mM, and 16 mM). The HIV-1 LTR activity was assessed by measuring the d1EGFP fluorescence intensity (FI) with flow cytometry analyses. The shift of the fluorescence emission peak (inset) and the bar diagram of the respective geometric mean values compared to those for unstimulated control cells are shown. The means ± standard deviations calculated from triplicate determinations are indicated. P values are given for comparison with control: n.s., not significant; **, P ≤ 0.01; ***, P ≤ 0.001.
(B) Nuclear and cytosolic fractions of unstimulated (control) or stimulated (16 mM GlcN) HeLa-Tat-III/LTR/d1EGFP cells were analyzed by Western blotting to detect O-GlcNAcylated proteins. Arrowheads mark proteins whose O-GlcNAcylation patterns are increased upon GlcN treatment. Staining of OGT (an example of an O-GlcNAcylated protein) and histone H1 (non-O-GlcNAcylated) shows that protein expression levels were not altered. Staining of GAPDH and lamin A/C demonstrates successful fractionation. To demonstrate O-GlcNAcylation of cellular proteins in the presence of GlcN, cytosolic and nuclear fractions of control-treated versus GlcN-treated (16 mM) HeLa-Tat-III/LTR/d1EGFP cells were analyzed by Western blotting (Fig. 2B). In both fractions, several proteins showed increased O-GlcNAcylation upon GlcN treatment (Fig. 2B, upper panels), whereas the expression of OGT (example of an O-GlcNAc-modified protein) and of histone H1 (non-O-GlcNAc-modified protein) remained unchanged (Fig. 2B, middle panels). Successful fractionation of cytoplasmic and nuclear proteins and loading of equal protein amounts were verified by staining of lamin A/C and GAPDH (Fig. 2B, lower panels). Altogether, these results demonstrate that GlcN treatment enhances O-GlcNAcylation of cytoplasmic and nuclear proteins and that this correlates with decreased gene expression from the HIV-1 LTR promoter. Since OGT is the only known enzyme to mediate O-GlcNAcylation, we investigated the effect of OGT on HIV-1 transcription in primary lymphocytes. To this goal, primary CD4+ T lymphocytes from two different donors were infected with pseudotyped reporter HIV-1. At 36 h postinfection the cells were transfected with in vitro-transcribed polyadenylated mRNA encoding either EGFP or OGT. OGT expression levels were maximal at 8 h after transfection (data not shown), and the cells were harvested. 
Reporter analyses showed that overexpression of OGT inhibited HIV-1 transcription to an extent similar to that of treatment with 16 mM GlcN (compare Fig. 3A with Fig. 1D and E). The inhibition was slightly stronger in the cells of donor 2 than in the cells of donor 1, which correlated well with the increased OGT expression and O-GlcNAcylation level observed in the cells of donor 2 (Fig. 3B, upper panels; compare donor 1 and donor 2). Assuming that O-GlcNAcylation alters the activity of transcription factors, Sp1 and NF-κB were prime candidates for the O-GlcNAc-mediated inhibition of the HIV-1 LTR. Binding sites for both factors constitute the core promoter/enhancer of HIV-1, and both have been reported to be modified by O-GlcNAc (21, 31). In order to prove this hypothesis, luciferase reporter assays were performed using LTR reporters lacking the upstream modulatory region but containing the NF-κB- and/or Sp1-binding sites (either combined or isolated), as well as a TATA box and the trans-activation response (TAR) element for Tat-mediated activation (Fig. 4A, LTR-κB-Sp1wt, LTR-κB-Sp1mut, and LTR-Sp1wt). HEK 293T cells were cotransfected with the reporter constructs and a Tat-encoding plasmid, along with an OGT-encoding plasmid or a control plasmid (Fig. 4B). Overexpression of OGT inhibited the Tat-induced activity of the promoters containing functional Sp1-binding sites (Fig. 4B, LTR-κB-Sp1wt and LTR-Sp1wt). In contrast, the activity of the promoter containing mutated Sp1-binding sites was increased upon overexpression of OGT (Fig. 4B, LTR-κB-Sp1mut). The transient expression of Tat itself was not significantly affected by overexpression of OGT (Fig. 4B, lower panels). These findings suggest that the presence of functional Sp1-binding sites is crucial for the inhibitory effect of OGT on the HIV-1 LTR. The presence of Sp1-binding sites in the HIV-1 LTR promoter is required for OGT-mediated inhibition. 
(A) Schematic representation of the promoter constructs used in the luciferase assay in panel B. The truncated promoters lack the upstream modulatory region but contain the NF-κB- and/or Sp1-binding sites (either combined or isolated), as well as a TATA box and the TAR element for Tat-mediated activation. Quartered circles represent mutated Sp1-binding sites. Numbers reflect the positions on the wild-type LTRLAI. (B) Promoter activities were detected by measuring the luciferase activity in the cell lysates of HEK 293T cells transiently transfected with the reporter constructs (LTR-κB-Sp1wt, LTR-κB-Sp1mut, and LTR-Sp1wt) together with plasmids coding for OGT and Tat as indicated. The total DNA amount was adjusted with pcDNA4. The values were normalized to the total amount of protein and are presented in terms of percentages of the corresponding Tat-induced promoter activity. The means ± standard deviations from triplicate determinations are shown (upper panel). Western blot analyses of Tat and OGT in transfected cells are shown in the lower panel. GAPDH staining demonstrates that equal amounts of protein were loaded. (C) Schematic representation of the full-length wild-type and full-length Sp1-mutated promoter constructs used for the stable transfection of HEK 293T cells. (D) HEK 293T cells were stably transfected with d2EGFP either under the control of the wild-type HIV-1 LTR (LTRwt) or under the control of the LTR containing mutated Sp1-binding sites in the basal promoter (LTRmutSp1). The effect of GlcN (16 mM) on the HIV-1 LTR was analyzed by flow cytometry, and the mean fluorescence intensity is represented in the bar diagram. The means ± standard deviations were calculated from duplicate determinations of three independent clones with similar fluorescence intensities. P values are given for comparison with control: *, P ≤ 0.05; n.s., not significant. 
Western blot analyses of one representative clone verify the increased O-GlcNAcylation pattern upon GlcN treatment and constant OGT expression. GAPDH staining demonstrates that equal amounts of protein were loaded. In a next step, the role of the Sp1-binding sites in the full-length HIV-1 LTR integrated into the cellular genome was analyzed. Toward this goal, HEK 293T cells were stably transfected with plasmids expressing d2EGFP either under the control of the full-length wild-type HIV-1 LTR (Fig. 4C, LTRwt) or under the control of the full-length LTR containing mutated Sp1-binding sites (Fig. 4C, LTRmutSp1). Three independent HEK 293T cell clones with similar fluorescence intensities were either left untreated or stimulated with 16 mM GlcN. Subsequently, the promoter activities were measured by flow cytometry. Mutation of the Sp1-binding sites strongly decreased HIV-1 LTR activity (Fig. 4D, compare fluorescence intensities in the bar diagram). However, GlcN treatment significantly decreased the fluorescence intensity only in cells expressing d2EGFP under the control of the wild-type HIV-1 LTR and had no significant effect on cells expressing d2EGFP under the control of the Sp1-mutated HIV-1 LTR (Fig. 4D, bar diagram). Western blot analyses demonstrated that GlcN treatment similarly increased the O-GlcNAcylation pattern in the two cell types, that OGT expression was not altered, and that equal amounts of proteins were loaded (Fig. 4D, right panels). Altogether these results demonstrate that the Sp1-binding sites are required for the GlcN-mediated inhibition of the full-length HIV-1 promoter integrated into the cellular genome. Since the presence of the Sp1-binding sites is crucial for the inhibitory effect of OGT on the HIV-1 LTR promoter, we investigated whether this is also the case for the Sp1 protein itself. To this goal, the sensitivity of the Tat-induced LTR-Sp1wt promoter to OGT was analyzed in HEK 293T cells in the absence and presence of Sp1. 
Sp1 was recombinantly overexpressed, and its expression was inhibited with a specific siRNA. A nontargeting siRNA was used as control. Cotransfection of the control siRNA did not affect the OGT-mediated inhibition of the LTR-Sp1wt (Fig. 5A, compare bars 3 and 4). Tat-mediated activation of the LTR-Sp1wt was less pronounced after knockdown of Sp1 (Fig. 5A, compare protein content-normalized relative light unit values given above bars 3 and 7). However, OGT clearly did not inhibit LTR-Sp1wt promoter activity under these conditions (Fig. 5A, compare bars 7 and 8). The expression of OGT and Tat was not altered upon depletion of Sp1 expression (Fig. 5B). Thus, in addition to the Sp1-binding sites, the presence of the Sp1 protein itself is also necessary for the OGT-mediated inhibition of the HIV-1 LTR activity. Sp1 is crucial for the inhibitory effect of OGT on the HIV-1 LTR. (A) HIV-1 LTR activity was measured by luciferase assay after cotransfection of HEK 293T cells with an LTR-Sp1wt reporter construct along with Tat-and Sp1-encoding plasmids in the absence or presence of an OGT-encoding vector together with 10 nM control siRNA or an siRNA specifically targeting Sp1. Total DNA amount was adjusted with pcDNA4. The relative light units normalized to the total protein content indicate the means ± standard deviations from triplicate determinations (values above the bars). The results were normalized for each siRNA data set individually and are presented in terms of percentages of control (Tat- and Sp1-induced LTR-Sp1wt activity). P values are given for comparison with control: **, P ≤ 0.01; n.s., not significant. (B) Western blot analyses of the lysates verify the knockdown of Sp1 by siRNA and prove that the expression of OGT and Tat remains unaffected upon silencing of Sp1. The GAPDH staining demonstrates that equal amounts of protein were loaded. 
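The legend above states that results were normalized for each siRNA data set individually and expressed as percentages of control. A minimal Python sketch of such a per-set normalization follows. The condition names, the function name, and all numbers are hypothetical and chosen only for illustration; they are not values from this study.

```python
def normalize_within_sets(data, control_key):
    """Scale every condition in each data set to that set's own control (= 100%)."""
    normalized = {}
    for set_name, conditions in data.items():
        reference = conditions[control_key]
        normalized[set_name] = {
            name: 100.0 * value / reference for name, value in conditions.items()
        }
    return normalized


# Hypothetical protein-normalized relative light units (invented for illustration).
data = {
    "control siRNA": {"Tat+Sp1": 8000.0, "Tat+Sp1+OGT": 3200.0},
    "Sp1 siRNA": {"Tat+Sp1": 2000.0, "Tat+Sp1+OGT": 2100.0},
}

result = normalize_within_sets(data, "Tat+Sp1")

for set_name, conditions in result.items():
    for name, pct in conditions.items():
        print(f"{set_name:>14} | {name:<12} | {pct:6.1f} % of control")
```

Normalizing each set to its own control lets the relative effect of OGT be compared between the two siRNA conditions even when the absolute promoter activity differs, which is why the absolute values are reported separately above the bars.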
To investigate the role of Sp1 in the OGT-mediated inhibition of the HIV-1 LTR in more detail, we explored whether overexpression of OGT enhances O-GlcNAcylation of Sp1 (Fig. 6A). For this purpose, Sp1 was expressed alone or in combination with OGT (Fig. 6A, input). Immunoprecipitation experiments demonstrated that O-GlcNAcylation of Sp1 was substantially increased when OGT was overexpressed (Fig. 6A, Sp1-IP). Coimmunoprecipitations revealed that OGT physically interacts with Sp1 under these conditions (data not shown). O-GlcNAcylation of Sp1 selectively inhibits HIV-1 LTR promoter activity in a dose-dependent manner. (A) HEK 293T cells were transfected with control vector (CV), Sp1, or Sp1- and OGT-expressing vectors. Total DNA amount was adjusted with pcDNA4. Expression of transfected plasmids and O-GlcNAcylation were assessed via Western blotting of cell lysates (input). Immunoprecipitation of Sp1 (Sp1-IP) and O-GlcNAcylation of Sp1 were verified by Western blotting of Sp1-IPs with anti-Sp1 and anti-O-GlcNAc antibodies, respectively. (B) The effects of OGT on the LTR-Sp1wt and the control promoter EF1α in HEK 293T cells cotransfected with the reporter and an Sp1-encoding plasmid were analyzed by luciferase assay. (C) Luciferase activity of HEK 293T cells transiently cotransfected with the LTR-Sp1wt reporter construct along with Tat- and Sp1-encoding plasmids and increasing concentrations of an OGT-encoding vector (0.05 µg, 0.2 µg, and 1 µg). The total DNA amount in panels B and C was adjusted with pcDNA4. The values were normalized to the total amount of protein and are presented in terms of percentages of control (Sp1 induced [B] and Tat- and Sp1-induced [C] promoter activity). The means ± standard deviations from triplicate determinations are indicated. P values are given for comparison with control: n.s., not significant; *, P ≤ 0.05; **, P ≤ 0. 
In order to determine whether O-GlcNAcylation of Sp1 inhibits HIV-1 LTR activity specifically or transcription in general, luciferase reporter assays were performed using LTR-Sp1wt and an EF1α promoter construct. The latter triggers the expression of the housekeeping gene EF1α and harbors Sp1-binding sites. Unlike the LTR-Sp1wt promoter, the EF1α promoter has no TAR element. In order to ensure comparable induction levels, both reporter promoters were activated with Sp1 alone (Fig. 6B). In agreement with the results above, overexpression of OGT significantly inhibited the Sp1-triggered expression of LTR-Sp1wt (Fig. 6B, LTR-Sp1wt). In contrast, the activity of the EF1α promoter was not repressed but significantly increased by OGT (Fig. 6B, EF1α), suggesting that OGT selectively inhibits the HIV-1 LTR promoter and does not generally repress Sp1-regulated gene expression. Furthermore, we evaluated by luciferase reporter assay whether the inhibitory effect of OGT on the Sp1-regulated HIV-1 LTR promoter is dose dependent. To this end, the reporter LTR-Sp1wt was cotransfected with Sp1- and Tat-encoding constructs in HEK 293T cells together with increasing amounts of OGT-encoding plasmid (0.05 µg, 0.2 µg, and 1 µg). OGT clearly inhibited the HIV-1 LTR promoter activity in a dose-dependent manner (Fig. 6C). The inhibitory effect of OGT on the HIV-1 LTR may rely on impaired nuclear translocation or decreased DNA-binding affinity of Sp1 upon O-GlcNAcylation. The impact of OGT on the nuclear translocation of Sp1 was investigated by comparing the nuclear amounts of Sp1 in cells expressing endogenous or increased levels of OGT. The amount of nuclear Sp1 was not decreased upon overexpression of OGT (Fig. 7A) but rather increased, suggesting that the observed decrease in the HIV-1 LTR promoter activity was not due to impaired nuclear translocation of Sp1.

OGT does not interfere with Sp1 expression and DNA binding of Sp1.
(A) HEK 293T cells were transfected with an Sp1-encoding plasmid, along with an OGT-encoding plasmid or control plasmid. Protein expression was detected in nuclear protein extracts via Western blotting. Staining of lamin A/C was used as a loading control. (B) Electrophoretic mobility shift assays were carried out with 32P-end-labeled double-stranded oligonucleotides corresponding to the Sp1-binding sites in the HIV-1 LTR promoter (wt LTR-Sp1) or with oligonucleotides containing mutated Sp1-binding sites in order to prevent binding (mut LTR-Sp1). One representative gel shift assay out of three is shown. Reactions were performed either without nuclear extracts (free) or with lysates from cells transfected with an Sp1-encoding plasmid alone (Sp1) or in combination with an OGT-encoding plasmid (Sp1+OGT). The Sp1-oligonucleotide complex is indicated as Sp1-C. (C) Competition experiments were carried out with 10-, 20-, and 50-fold molar excesses of unlabeled wild-type (wt LTR-Sp1) or mutated (mut LTR-Sp1) oligonucleotides. (D) Supershift analyses were performed using 1 µg, 2 µg, or 5 µg anti-Sp1, anti-O-GlcNAc, or anti-OGT antibodies as well as 5 µg anti-IgG or anti-IgM as an isotype control. To investigate whether OGT interferes with the DNA-binding affinity of Sp1, gel shift assays were carried out with templates corresponding to the Sp1-binding sites in the HIV-1 LTR promoter (Fig. 7B, wt LTR-Sp1). No decrease in the formation of protein-oligonucleotide complexes was detected upon overexpression of OGT (Fig. 7B, compare lane 3 and 5). As a specificity control, gel shift experiments were performed using templates containing mutated Sp1-binding sites (Fig. 7B, mut LTR-Sp1). The specific Sp1 complex (Fig. 7B, Sp1-C) was not detectable with these oligonucleotides (Fig. 7B, lanes 4 and 6). The specificity of Sp1-C was confirmed by competition experiments with increasing molar excess of unlabeled wt LTR-Sp1 (Fig. 7C, lanes 2 to 4) and mut LTR-Sp1 oligonucleotides (Fig. 
7C, lanes 5 to 7). In addition, the presence of O-GlcNAc-modified Sp1 in Sp1-C was confirmed by supershift analyses using increasing amounts of anti-Sp1 (Fig. 7D, lanes 2 to 4) and anti-O-GlcNAc (Fig. 7D, lanes 5 to 7) antibodies. Both antibodies shifted Sp1-C almost completely (Fig. 7D, lanes 4 and 7), suggesting that most of the DNA-bound Sp1 is O-GlcNAcylated in OGT-overexpressing cells. Addition of an anti-OGT antibody did not shift Sp1-C (Fig. 7D, lanes 8 to 10). Isotype control anti-IgG (Fig. 7D, lane 11) and anti-IgM (Fig. 7D, lane 12) antibodies had no effect on Sp1-C. These results demonstrate that increased O-GlcNAcylation of Sp1 inhibits neither the ability of Sp1 to translocate into the nucleus nor the DNA-binding affinity of Sp1. To confirm that the inhibition of the HIV-1 LTR is dependent on Sp1 O-GlcNAcylation, a specific siRNA was used to deplete the expression of OGT. The specificity of the siRNA was verified with an OGT rescue mutant (resOGT)-encoding plasmid, which contains six silent nucleotide exchanges in the siRNA-binding site (Fig. 8A). Western blot analyses proved that overexpression of wild-type OGT (wtOGT) was inhibited in HEK 293T cells cotransfected with the wtOGT-specific siRNA (Fig. 8B, top panel, lanes 2 and 4), whereas expression of resOGT was not affected (Fig. 8B, top panel, lanes 3 and 5). Staining of O-GlcNAc-modified proteins served as an additional control for the functionality of resOGT (Fig. 8B, middle panel). Sp1 O-GlcNAcylation is necessary for the inhibition of the HIV-1 LTR. (A) Generation of a plasmid encoding an OGT rescue mutant in order to escape silencing by the siRNA targeting wild-type OGT: six silent mutations (bold) were introduced into the siRNA-binding sequence of OGT. (B) Western blot analyses of wild-type (wt) and rescue mutant (res) OGT were performed after transfection with the siRNA specifically targeting wtOGT. 
Detection of O-GlcNAcylation levels was used as a control for the functionality of the rescue mutant. Immunodetection of GAPDH demonstrates that equal amounts of protein were loaded. (C) HEK 293T cells were transfected with the reporter construct LTR-Sp1wt along with plasmids encoding Sp1 and OGT (wild type or rescue mutant). Total DNA amount was adjusted with pcDNA4. Control siRNA or siRNA targeting wtOGT (10 nM) was cotransfected, and the luciferase activity was measured. The values were adjusted to the total amount of protein and indicate the means ± standard deviations from triplicate determinations. The results are presented in terms of percentages of control (Sp1-induced LTR activity). Applying these tools to test the effect of Sp1 O-GlcNAcylation on HIV-1 LTR, we showed that the promoter activity (Fig. 8C, bar 4) was strongly reduced after overexpression of wtOGT or resOGT (Fig. 8C, bars 5 and 6). Cotransfection of a control siRNA targeting GAPDH had no impact on the HIV-1 LTR activity (Fig. 8C, bars 7 and 8). Knockdown of wtOGT restored the HIV-1 LTR activity (Fig. 8C, bar 9), whereas resOGT escaped silencing and was still able to suppress the promoter activity (Fig. 8C, bar 10). These findings demonstrate that O-GlcNAcylation of Sp1 is required for the inhibition of the HIV-1 LTR promoter by OGT. In all previous experiments the O-GlcNAcylation level was increased after infection with HIV-1. Thus, we aimed to investigate whether increased OGT expression prior to infection may amplify the inhibitory effects on HIV-1 transcription in lymphocytes. Jurkat cells were stably transfected with OGT or, as a control, with d2EGFP. Expression of recombinant OGT in selected clones was confirmed by RT-PCR (Fig. 9A). The stably transfected cells were infected with VSV-G env pseudotyped HIV-1NL4-3LucR-E-, and HIV-1 LTR activity was determined by luciferase assay. 
Of note, HIV-1 LTR activity was 40-fold lower in both OGT-overexpressing Jurkat cell clones than in control cells expressing d2EGFP (Fig. 9B). These results demonstrate that OGT is a potent inhibitor of HIV-1 LTR activity.

OGT overexpression prior to infection amplifies the inhibitory effect on HIV-1 replication. (A) Jurkat cells were stably transfected with plasmids encoding either d2EGFP or OGT. Expression of recombinantly expressed OGT mRNA was confirmed by RT-PCR (upper panel) using specific primers. Amplification of GAPDH served as a loading control (lower panel). Cellular cDNA was subjected to the amplification reactions in increasing dilutions: undiluted (1), 1:5, and 1:10. Control reactions were carried out in the absence of reverse transcriptase (-RT) with undiluted cDNA template. (B) Jurkat cells stably expressing d2EGFP or OGT were infected with VSV-G env pseudotyped HIV-1NL4-3LucR-E-. Luciferase activity was measured 48 h after infection.

DISCUSSION

Regulation of gene expression by nutrients like glucose and glucosamine is well established (20, 37, 52, 74) and demonstrates that cellular transcription adapts to environmental and metabolic changes. Viruses critically depend on the host cell metabolism (29). This prompted us to investigate the impact of the monosaccharidic metabolite O-GlcNAc on HIV-1 gene expression. We showed that increased O-GlcNAc levels inhibit HIV-1 transcription in human T-cell lines as well as in human primary CD4+ T lymphocytes. This effect appeared to be mediated by the transcription factor Sp1, as supported by several lines of experimental evidence. First, we demonstrated that O-GlcNAc-mediated inhibition of HIV-1 transcription required the presence of Sp1-binding sites in the HIV-1 LTR promoter.
The Sp1-binding sites are well conserved throughout the lentiviral HIV-1 LTRs (18), and mutation of one or more Sp1-binding sites in the basal HIV-1 LTR promoter leads to a strong delay in replication in human peripheral blood lymphocytes and in T-cell lines (45, 58). This is also reflected in our observations that mutation of the Sp1-binding sites decreases HIV-1 LTR activity in transiently and stably transfected cells. But although Sp1-mutated HIV-1 LTR activity was still measurable and inducible, no significant O-GlcNAc-mediated inhibition of promoter activity was observed. Second, the presence of the Sp1 protein was crucial for the inhibitory effect of O-GlcNAc on the HIV-1 LTR. Inhibition of Sp1 expression with siRNA completely abolished OGT-mediated inhibition of HIV-1 LTR activity. Interestingly, although Sp1 is implicated in the activation of a large number of genes, depletion of the ubiquitous transcription factor did not induce cytotoxicity. This is in agreement with the observation of Philipsen and coworkers, who disrupted the mouse Sp1 gene and found that Sp1-deficient embryonic stem cells were viable and showed normal growth characteristics (50). This might be attributed to the compensation of Sp1-triggered gene transcription by other Sp family members, such as Sp3, which is also ubiquitously expressed and has the potential to activate transcription of certain otherwise Sp1-regulated promoters (70). Third, Sp1 was found to be O-GlcNAcylated by OGT, and this decreased its ability to activate the HIV-1 LTR promoter. These findings are in accordance with the results of Kudlow and colleagues, who showed that O-GlcNAcylation decreases the capability of Sp1 to activate GC-box-containing promoters (77). But it has to be emphasized that O-GlcNAcylation of Sp1 does not generally inhibit transcription. We have shown here that the human EF1α promoter, which also contains Sp1-binding sites (55, 73, 75), is activated by O-GlcNAcylation of Sp1. 
Additionally, others have shown that the expression of plasminogen activator inhibitor 1 (14, 19, 20), calmodulin (38), and argininosuccinate synthetase (6, 7) is also increased by Sp1 O-GlcNAcylation. Thus, O-GlcNAcylation of Sp1 differentially modulates gene expression. These differential effects may depend not only on the individual composition of the transcription complexes at different promoters but also on differential Sp1 O-GlcNAcylation patterns in different transcription complexes. The latter is supported by the fact that transcriptional corepressors recruit OGT to promoters (78), indicating that O-GlcNAcylation is modulated directly at the promoter site. Furthermore, it has been shown that phosphorylation of distinct residues differentially regulates Sp1 activity (5, 10, 11). As O-GlcNAcylation appears to have a yin-yang relationship with phosphorylation, the same may apply for O-GlcNAcylation. Several mechanisms have been suggested for how O-GlcNAcylation of Sp1 inhibits promoter activity. For example, O-GlcNAcylation reduces the ability of Sp1 to homomultimerize and to interact with transcriptional coactivator TATA-binding protein-associated factor II 110 (TAFII110) (62, 77). These mechanisms are in accordance with our findings that O-GlcNAcylated Sp1 localizes at the HIV-1 LTR promoter and does not exhibit a decreased DNA-binding activity. Furthermore, O-GlcNAcylated Sp1 seems to act additionally as a repressor of HIV-1 transcription, as the presence of the Sp1-binding sites in the HIV-1 LTR is crucial for the inhibitory effect of OGT. The HIV-1 LTR promoter containing only functional NF-κB-binding sites but mutated Sp1-binding sites (LTR-κB-Sp1mut) was activated by OGT overexpression, indicating that O-GlcNAcylation of NF-κB increases the transcription rate from the LTR-κB-Sp1mut promoter. Accordingly, O-GlcNAcylation of NF-κB has been described and it was suggested to enhance the nuclear translocation of NF-κB (21). 
However, the transcription of the wild-type HIV-1 LTR (LTR-κB-Sp1wt) containing both NF-κB- and Sp1-binding sites was inhibited by O-GlcNAcylation of Sp1. This indicates that O-GlcNAcylation of Sp1 inhibits the activity of the HIV-1 LTR in a dominant manner. Unlike the NF-κB-binding sites, the Sp1-binding sites are well conserved throughout primate lentiviral LTRs (18, 79), suggesting a conserved regulation of Sp1-mediated gene transcription for lentiviral LTRs, which is probably dominant over NF-κB regulation. Furthermore, the dominant effect of Sp1 over NF-κB might be attributed to the synergistic mode of action of both transcription factors on the HIV-1 LTR (59, 60). A potential molecular mechanism for the dominant repressive effect of O-GlcNAcylated Sp1 on the HIV-1 LTR relies on the ability of Sp1 to recruit corepressors to the promoter. It has been reported that Sp1 can act as an anchor site for a repressor complex consisting of histone deacetylases and mSin3a (82). Intriguingly, Sp1 has also been detected in a DNA-protein complex with c-Myc and histone deacetylase 1 at the HIV-1 LTR, and it has been shown that this complex is involved in the establishment of proviral latency (33). Thus, O-GlcNAcylation of Sp1 might quickly put initiated viral replication into a state of latency. Most importantly, increased O-GlcNAcylation of Sp1 prior to HIV-1 infection might efficiently prevent the onset of viral replication. This is supported by the fact that stable overexpression of OGT strongly inhibited replication of HIV-1 in de novo-infected lymphocytes. Altogether, these results suggest that O-GlcNAcylated Sp1 efficiently inhibits gene expression from the HIV-1 LTR and may be involved in the regulation of the viral life cycle. Many reports have evaluated the effect of highly active antiretroviral therapy (HAART) on glucose metabolism (15, 17, 49) and the appearance of diabetes as a consequence of HAART (40, 51). 
But up to now, no studies have investigated the effect of glucose metabolism on HIV-1 replication. Our results indicate that the O-GlcNAc level, and thus the glucose metabolism, may influence the life cycle of HIV-1. Accordingly, inducers of Sp1 O-GlcNAcylation such as GlcN and 2-deoxyglucose, which are approved for the clinical treatment of osteoarthritis and human genital herpes infections, respectively (3, 61), may support HAART. The establishment of a metabolic treatment may supplement the repertoire of antiretroviral therapies against AIDS.

Inactivation of Drosophila Huntingtin affects long-term adult functioning and the pathogenesis of a Huntington’s disease model

SUMMARY

A polyglutamine expansion in the huntingtin (HTT) gene causes neurodegeneration in Huntington’s disease (HD), but the in vivo function of the native protein (Htt) is largely unknown. Numerous biochemical and in vitro studies have suggested a role for Htt in neuronal development, synaptic function and axonal trafficking. To test these models, we generated a null mutant in the putative Drosophila HTT homolog (htt, hereafter referred to as dhtt) and, surprisingly, found that dhtt mutant animals are viable with no obvious developmental defects. Instead, dhtt is required for maintaining the mobility and long-term survival of adult animals, and for modulating axonal terminal complexity in the adult brain. Furthermore, removing endogenous dhtt significantly accelerates the neurodegenerative phenotype associated with a Drosophila model of polyglutamine Htt toxicity (HD-Q93), providing in vivo evidence that disrupting the normal function of Htt might contribute to HD pathogenesis.

INTRODUCTION

Huntington’s disease (HD) is an autosomal dominant, progressive neurodegenerative disorder characterized clinically by deteriorating choreic movements, psychiatric disturbances and cognitive deficits (Gusella and MacDonald, 1995; Martin and Gusella, 1986; Vonsattel et al., 1985).
HD is caused by an abnormal expansion of a polyglutamine (polyQ) tract at the N-terminus of a large cytoplasmic protein, huntingtin (Htt) (The Huntington’s Disease Collaborative Research Group, 1993). The polyQ tract contains between 6 and 35 repeats in the wild-type Htt protein, whereas it is expanded to beyond 36 repeats in HD (The Huntington’s Disease Collaborative Research Group, 1993). Numerous studies have demonstrated that mutant Htt containing an expanded polyQ tract is toxic to neurons (Cattaneo et al., 2001; Gusella and MacDonald, 2000). PolyQ expansion is also linked to at least eight other neurodegenerative disorders, collectively referred to as polyQ diseases (Riley and Orr, 2006; Zoghbi and Orr, 2000). Although Htt is ubiquitously expressed in the brain, HD mainly affects medium-sized spiny neurons in the striatum and to a lesser extent cortical pyramidal neurons that project to the striatum, suggesting that other cellular factors also contribute to pathogenesis (Cattaneo et al., 2001; Vonsattel and DiFiglia, 1998). Recent studies indicate that an alteration of wild-type Htt function might contribute to this specificity and to subsequent disease progression (Cattaneo et al., 2001). For example, mutant Htt can sequester wild-type Htt into insoluble aggregates, thereby exerting a dominant negative effect (Huang et al., 1998; Kazantsev et al., 1999; Narain et al., 1999; Preisinger et al., 1999; Wheeler et al., 2000). In addition, wild-type Htt can suppress the cell death induced by mutant polyQ-expanded Htt in vitro (Leavitt et al., 2001; Van Raamsdonk et al., 2005). Furthermore, wild-type Htt is proposed to have a neuroprotective role as expression of Htt can protect cultured striatal neurons from stress- and toxin-mediated cell death (Rigamonti et al., 2000). Since its identification, the normal function of Htt has been subject to extensive investigation (Cattaneo et al., 2001; Harjes and Wanker, 2003). 
The murine Htt homolog (also known as Hdh) is essential during early mouse development, as Htt-null mice die during gastrulation at embryonic day 7.5 (Duyao et al., 1995; Nasir et al., 1995; Zeitlin et al., 1995). Chimeric analysis demonstrated that the early embryonic lethality is the result of a crucial role of Hdh in extraembryonic membranes, as this lethality can be rescued by providing wild-type Hdh function in extraembryonic tissue (Dragatsis et al., 1998). Conditional knockout of Hdh in the mouse forebrain at postnatal or late embryonic stages causes a progressive neurodegenerative phenotype, lending support to the hypothesis that depletion of normal Htt activity during disease progression contributes to HD pathogenesis (Dragatsis et al., 2000). A more recent study in zebrafish, which used morpholino oligos to transiently knockdown endogenous Htt, suggests that Htt has a role in normal blood function and iron utilization (Lumsden et al., 2007). Currently, little is known about the normal biological function of wild-type Htt (Cattaneo et al., 2005). Htt encodes a large cytoplasmic protein of 350 kDa. Structural analysis of Htt proteins identified the presence of many HEAT (huntingtin, elongation factor 3, the A subunit of protein phosphatase 2A and TOR1) repeats, which are approximately 40-amino acid (a.a.) long structural motifs, composed of two anti-parallel helices, of unknown function (Andrade and Bork, 1995). No other domains have been identified in Htt to suggest a biological function for the protein. 
Functional studies in mammalian systems, mainly from protein interaction assays, have associated Htt with diverse cellular processes including: endocytosis; modulation of synapse structure and synaptic transmission; transcriptional regulation, especially of the brain-derived neurotrophic factor (BDNF) which is essential for the survival of the striatal neurons affected in HD; axonal transport of BDNF and vesicles; and apoptosis (Cattaneo et al., 2001; Cattaneo et al., 2005; Harjes and Wanker, 2003; Zuccato et al., 2001; Zuccato and Cattaneo, 2007). Importantly, only a few of these proposed functions of Htt have been directly tested in vivo owing to the early embryonic lethality associated with Htt-null mutant mice. In an extensive search for Htt homologs in other species, Li et al. (Li et al., 1999) identified a single HTT homolog in Drosophila (htt, hereafter referred to as dhtt). By sequence comparison, the homologous regions between Drosophila and human Htt proteins are mainly located within five discrete areas, including three relatively large continuous regions and two small segments, that cover about one-third of the total protein length (supplementary material Fig. S1A). At the amino acid level, these homologous regions, which are comprised of about 1200 a.a. residues in Drosophila Htt (dHtt), share around 24% identity and 49% similarity (Li et al., 1999). In addition to the sequence similarity, other shared features support the proposal that the Drosophila gene identified in the study by Li et al. is indeed the fly homolog of human Htt (Li et al., 1999). For example, in terms of the protein size, both the Drosophila and human Htt proteins are unusually large and contain 3583 and 3144 a.a. residues, respectively (Li et al., 1999). In addition, the regions with a relatively high level of conservation are not only clustered in large continuous stretches, but are also located in the same order and distributed over the entire length of the proteins. 
Moreover, dhtt and mammalian HD genes share similar patterns of gene expression (Li et al., 1999) (Fig. 1). Interestingly, although an HTT homolog exists in Drosophila, no HTT-like gene has been found in other less complex eukaryote species such as C. elegans or the yeast S. cerevisiae (Li et al., 1999). Ubiquitous expression of dhtt in Drosophila. (A–F) dhtt is widely expressed at a low level during Drosophila development, as revealed by whole-mount in situ hybridization. (A,B) Stage 15 Drosophila embryos stained with digoxigenin (DIG)-labeled dhtt antisense probes revealed the low-level and ubiquitous expression of the dhtt transcript (A); control embryos at the corresponding stage, which were stained with dhtt sense probes, showed only minimal background signals (B). All embryos are lateral views, anterior to the left and dorsal side up. (C–F) Third instar larval tissues hybridized with dhtt antisense probes. Low-level and ubiquitous dhtt expression was observed in the brain (C), wing and leg (D) and eye imaginal discs (E). (F) A negative in situ control – an eye imaginal disc from a dhtt deletion mutant stained in parallel – showed only minimal background signals. (G–K) The dHtt protein predominantly localizes to the cytoplasm. (G,H) In transfected Drosophila S2 cells, ectopically expressed dHtt protein (green) was found predominantly in the cytoplasm; it was also found on cellular protrusions, but was mostly excluded from the nucleus. (G) Overlaying images of the S2 cells co-stained with phalloidin (red), which detects F–actin, and the DNA dye DAPI (blue) reveal the overall cell morphology and the cell nuclei, respectively. (I–K) Cytoplasmic localization of the dHtt protein (green) ectopically expressed in Drosophila third instar larval imaginal disc tissues. 
(I) The anti-dHtt antibody can recognize overexpressed dHtt in the patched expression domain driven by patched-Gal4 and shows the characteristic striped dHtt expression pattern in the middle of the wing and leg imaginal discs. Genotype: patched-Gal4/+>UAS-dhtt/+. (J,K) High-magnification view of an eye imaginal disc with ectopically expressed dHtt protein [green (J)], which shows a mainly cytoplasmic localization. (K) Overlaying images of the same eye disc region co-stained with DAPI (red) to reveal the cell nuclei. Genotype: GMR-Gal4/+>UAS-dhtt/+. The identification of a Drosophila Htt homolog provides a unique opportunity to evaluate the role of Htt in this well-established genetic model system. Several cellular processes implicated in Htt function, including axonal transport and synapse formation, have been well-characterized in Drosophila, allowing an in vivo evaluation of their relationship with Htt. Further, as fly models of HD have been well-established, this model allows an in vivo examination of the function of endogenous Htt in HD pathogenesis (Marsh and Thompson, 2006; Steffan et al., 2001). In this study, we report the isolation of a dhtt mutant and describe its phenotype. Further, we examine how the removal of endogenous dhtt affects several cellular processes that have previously been implicated with Htt, and test how the loss of endogenous dhtt affects the pathogenesis associated with an established Drosophila model of polyQ toxicity (HD-Q93). RESULTS HEAT repeats in dHtt Considering the limited sequence homology between mammalian and fly Htt, it is important to examine the extent of the structural similarity between these proteins. In the Htt family proteins, the HEAT repeat is the only identifiable structural motif (Andrade and Bork, 1995; Cattaneo et al., 2005). A previous phylogenetic study identified 16 HEAT repeats in human Htt and, notably, 14 of these 16 repeats were also found in insect Htt proteins including dHtt (Tartari et al., 2008). 
A less stringent structural analysis predicted up to 40 HEAT repeats (including the AAA, ADB and IMB subgroups) in human Htt (see Methods). Interestingly, using the same parameter, 38 HEAT repeats could be identified in dHtt (see supplementary material Fig. S1 for details of the predicted HEAT repeats). Further, these HEAT repeats span the entire length of each protein and have a similar distribution, clustering in four groups at the N-, middle- and C-terminal regions, which have a large overlap with their segments of homologous sequences (supplementary material Fig. S1B). Although further studies are needed to elucidate the structure of the Htt proteins, this rather remarkable similarity raises the possibility of a conserved secondary structure among Htt family proteins and that both human and Drosophila Htt proteins are composed largely of repeated HEAT motifs. Ubiquitous expression of dhtt in Drosophila Previous analysis has shown that dhtt is widely expressed during all developmental stages from embryos to adults (Li et al., 1999). We confirmed the expression of the dhtt transcript in adults using reverse transcription (RT)-PCR (data not shown) (Fig. 5C). To examine the tissue-specific dhtt expression, we performed whole-mount RNA in situ hybridization. Staining with two DIG-labeled antisense probes, targeting different regions of dhtt, revealed similar ubiquitous dhtt expression at different stages of fly embryogenesis and in larval tissues (Fig. 1A–F and data not shown), whereas a positive control performed in parallel gave rise to robust in situ signals (supplementary material Fig. S2). Importantly, of the two negative controls included in the assay, one using sense dhtt RNA probes on wild-type samples (Fig. 1B) and the other using the same set of antisense dhtt RNA probes against tissues from a dhtt deletion mutant that had been generated subsequently (Fig. 1F), both produced much weaker background signals. 
Together, these data indicate that dhtt is widely expressed at low levels during all stages of Drosophila development.

Compromised mobility and viability of aging dhtt-ko mutants. (A) Spontaneous locomotion assay. dhtt-ko mutants show normal mobility at day 15 but older animals show significantly reduced mobility. (B) Age-dependent survival rate of adult animals. dhtt-ko mutants have a reduced life span. Both the mobility and viability defects in dhtt-ko mutants were rescued by the presence of a dhtt genomic minigene construct (‘dhtt-ko Rescue’). Flies were collected from at least three different batches. The total numbers of flies counted for viability quantification were: wild type, n=659; dhtt-ko, n=1573; dhtt-ko Rescue, n=804; elav-Gal4/+; dhtt-ko, n=550. The difference between wild-type and dhtt-ko flies is statistically significant, P=0.0001, Student’s t-test. The difference between wild-type and dhtt-ko Rescue flies is not statistically significant, P=0.36. The difference between wild-type and elav-Gal4/+; dhtt-ko flies is statistically significant, P<0.00001. The data in (A,B) are presented as the means±s.e.m. (C) RT-PCR analysis confirmed that expression of endogenous dhtt was lost in dhtt-ko mutants (lane 3) but was restored by the presence of the dhtt genomic minigene rescue construct (lane 4), similar to that in wild-type controls (lane 2). RT-PCR was performed on total RNA samples extracted from adult animals of each of the indicated genotypes. Primers for RT-PCR were located in adjacent exons in the control rp49 gene (the group of four wells on the left) or in neighboring exons at the N-terminal (targeting exons 5 and 6, dhtt-N), middle (targeting exons 13 and 15, dhtt-M) and C-terminal (targeting exons 23 and 24, dhtt-C) regions of the dhtt gene (see Methods).
In each group, lane 1 contains control PCR products amplified from a wild-type genomic DNA template with the same primer pair; these products are longer than the corresponding RT-PCR products because the genomic template retains the introns that are spliced out of the transcript, confirming that the RT-PCR products were indeed amplified from transcribed RNA templates. w1118: wild-type control.

dHtt is a cytoplasmic protein

Mammalian Htt proteins are largely cytoplasmic with a widespread expression pattern (DiFiglia et al., 1995; Gutekunst et al., 1995; Sharp et al., 1995). To determine the expression and subcellular localization of the endogenous dHtt protein, we developed an affinity-purified polyclonal antibody against dHtt. The specificity of this antibody was confirmed by its ability to recognize ectopically expressed dHtt protein in transfected Drosophila S2 cells (Fig. 1G,H) and larval tissues (Fig. 1I–K). When the dHtt protein was ectopically expressed from a UAS-dhtt transgene by using a patched-Gal4 driver, our anti-dHtt antibody could easily detect the striped pattern of dHtt expression in the middle of imaginal discs, which is the characteristic domain of Patched expression (Fig. 1I). In wild-type animals, use of the anti-dHtt antibody resulted in low-level ubiquitous staining in embryos, larval and adult tissues, with no specific pattern of protein expression (data not shown). At the subcellular level, ectopically expressed dHtt was found predominantly in the cytoplasm in transfected S2 cells and in larval tissues (Fig. 1G–K and data not shown), suggesting that dHtt, similar to its human counterpart, is mainly a cytoplasmic protein.

Creating a dhtt deletion

Similar to human Htt, dhtt encodes an unusually large protein of 3583 a.a. residues. The cDNA for the dhtt gene is 11,579 base pairs (bp) long and is derived from 29 exons in a 38 kb transcribed genomic region at cytological interval 98E2 (Li et al., 1999) (Fig. 2A,C). No null mutations in dhtt have been isolated previously.
To generate a null mutant for dHtt, we selected two FRT-bearing insertion lines surrounding dhtt: p-element d08071, which is inserted at the 5' end of the neighboring CG9990 gene, and piggyBac insertion f05417, which is located inside the intron between dhtt exon 27 and exon 28 near the 3' end of the gene (Fig. 2A). Using flipase (FLP)–FRT-mediated recombination (Parks et al., 2004), we generated a precise deletion of 55 kb between the two insertions (see Methods). This deletion allele, termed Df(98E2), removed most of the CG9990 gene and 34 kb of the 38 kb genomic-coding region for dhtt, with only the last two exons of the dhtt gene remaining (Fig. 2A–C). The deletion was confirmed by inverse PCR from extracted genomic DNA and by DNA sequencing (data not shown). Genomic organization of the dhtt locus and the dhtt-ko deletion. (A) The genomic structure of cytological region 98E2. The scale bar on top indicates the gene size (in base pairs). The position and transcriptional direction of dhtt and nearby genes (open boxes) is shown; introns (dashed lines) and exons (colored filled boxes) are labeled, as well as the FRT insertions (triangles) used for generating the Df(98E2) deficiency. (B) dhtt-ko was generated by an FRT-mediated precise deletion [Df(98E2)] of a 55 kb genomic region covering most of CG9990 and dhtt. A genomic transgene covering the entirety of CG9990 (open box) was reintroduced as a transgenic construct in the Df(98E2) background. (C) Detailed genomic structure of the dhtt gene. The scale bar, with predicted BamHI fragments, is drawn at the top. The exons of the dhtt gene are depicted as red arrows and squares, whereas introns are highlighted as red dashed lines, both are drawn to scale. (D) The dhtt-ko removes all but the last two of the 29 exons in dhtt, as verified by Southern blotting using BamHI digestion of genomic DNA. DNA extracted from control and dhtt-ko adult animals was hybridized with a DNA probe targeting all exons of dhtt (see Methods). 
The two BamHI fragments (1.95 kb and 1.59 kb) that contain the last two remaining exons in dhtt-ko mutants are highlighted in (C). Drosophila containing the Df(98E2) deletion, which removes both CG9990 and dhtt, are homozygous lethal at the embryonic stage. CG9990 encodes a previously uncharacterized protein belonging to the ABC transporter superfamily (Dean et al., 2001). To separate the mutant phenotype of dhtt from CG9990, we generated transgenic flies carrying a CG9990 genomic rescue transgene in the Df(98E2) background (see Methods). These lines, referred to as dhtt-ko (dhtt-knockout only), carry both the CG9990 rescue transgene and the Df(98E2) deletion, and thus are mutant only for dhtt (Fig. 2B–D). dhtt is dispensable for Drosophila development The dhtt-ko allele removes 27 of the 29 exons of dhtt (Fig. 2). Since Htt homozygous knockout mice die during early embryogenesis (Duyao et al., 1995; Nasir et al., 1995; Zeitlin et al., 1995), we expected that loss of dhtt in the fly would be associated with prominent developmental defects. However, dhtt-ko flies are homozygous viable, demonstrating that the lethality observed following the Df(98E2) deletion is caused by loss of CG9990. To verify that the dhtt gene is indeed deleted in dhtt-ko flies, we extracted genomic DNA from homozygous dhtt-ko adults and performed a Southern blot analysis. As shown in Fig. 2D, all genomic DNA containing the dhtt gene was removed in dhtt-ko flies except for the final two 3' exons. Further analyses demonstrated that homozygous dhtt-ko flies develop at a similar rate to wild-type flies and give rise to fertile adults with no discernible morphological abnormalities. The progeny derived from homozygous dhtt-ko flies did not show a reduction in viability or display other obvious developmental defects, suggesting that maternally contributed dhtt has no significant effect on animal development or function. 
To determine whether developmental defects are present in dhtt-ko animals, we characterized dhtt-ko mutants using a variety of cellular and neuronal markers. These studies failed to reveal any obvious developmental abnormalities during embryogenesis, or during larval and adult stages (Fig. 3 and data not shown). In particular, the embryonic central nervous system (CNS) (Fig. 3A) and muscles, and the larval muscles, CNS, eye and other imaginal discs all appeared normal (Fig. 3B and data not shown). Further, in aged adults (40-day-old flies), the external eye morphology was normal and the eight neuronal photoreceptor cells in each ommatidium were clearly present, together with their accessory cells (Fig. 3C,D). We note that our finding is different from a recent study that examined dhtt function using RNA interference (RNAi), which implicated a role for dhtt in axonal transport and eye integrity (Gunawardena et al., 2003). The exact nature of this phenotype discrepancy is not clear. Given the experiment was carried out at 29°C, it is possible that the observed RNAi phenotypes were the result of cellular toxicity caused by the high-level expression of Gal4 (Gunawardena et al., 2003). The more severe phenotypes might also be the result of the non-specific RNAi off-target effects caused by the knockdown of unrelated genes (Kulkarni et al., 2006). Nevertheless, our results suggest that dhtt is dispensable for normal Drosophila development. This result is in contrast to the essential role of Htt in mouse (Duyao et al., 1995; Nasir et al., 1995; Zeitlin et al., 1995). These phenotypic differences are probably the result of differences in mouse and fly embryogenesis, as the early lethality of Htt-null mice is due to the function of Hdh in extraembryonic membranes, for which there are no equivalents in Drosophila (Dragatsis et al., 1998) (see Discussion). dhtt is dispensable for Drosophila development. 
(A) Normal development of the CNS during embryogenesis in dhtt-ko flies (A3,A4) compared with wild-type controls (A1,A2), as revealed by anti-Armadillo staining. (A2,A4) Enlarged views of the ventral nerve cord in wild-type (A2) and dhtt-ko (A4) embryos show its regular ladder-like structure. (B) Differentiation and patterning of the eye during the third instar larval stage in dhtt-ko flies (B3,B4) is indistinguishable from wild-type controls (B1,B2), as revealed by anti-Elav staining (red) (B2,B4) to label differentiated neurons and phalloidin staining for F-actin (green) (B1,B3) to reveal the overall cytoskeleton organization. (C,D) Eye images of 40-day-old dhtt-ko mutants. Both the overall external eye morphology (C) and the internal organization of neuronal photoreceptors (D) are normal, even in the 40-day-old dhtt-ko adults. (E–I) Synaptic development is normal in dhtt mutants. Low (E1,E5) and high (E2–E4,E6–E8,F1–F8) magnification confocal images of glutamatergic NMJs in the abdominal segment A3 of third instar muscles 6 and 7. (E) NMJs are double-labeled with the neuronal membrane marker anti-HRP (red) and anti-Dlg (green), which reveal the well-defined presynaptic and postsynaptic NMJ structures in wild-type (E1–E4) and dhtt-ko mutants (E5–E8). (E2–E4,E6–E8) Magnified views of the areas highlighted in (E1,E5). (F) NMJs are labeled with anti-HRP (white) (F1,F5), the periactive zone marker anti-FasII (green) (F2,F6) and the active zone marker nc82 (red) (F3,F7), revealing the normal periactive zone and active zone organization in wild-type (F1–F4) and dhtt-ko flies (F5–F8). (F4,F8) Overlayed images of (F1–F3,F5–F7), respectively. WT: w1118 wild-type control. Bar, 5 µm (in all panels). (G–I) Quantitative analysis of NMJs in muscle 6 and 7 of the abdominal segment A3 for wild-type control (WT, blue) and dhtt-ko mutant (red) flies. 
(G) Average number of type 1b boutons: WT control=34.5±1.6 (n=24), dhtt-ko mutants=33.5±2.4 (n=21); the difference is not statistically significant, P=0.72 (Student’s t-test). (H) Average number of total boutons: WT control=63.1±1.8 (n=24), dhtt-ko mutants=62.3±2.0 (n=21); P=0.76. (I) Total number of branches: WT control=18.0±0.6 (n=28), dhtt-ko mutants=17.4±0.6 (n=34); P=0.52. The data in (G–I) are presented as the means±s.e.m. (standard error of the mean).

Normal synapse organization in dhtt-ko

Htt has been reported to interact with a diverse group of proteins whose functions have been directly or indirectly linked to synapse organization and synaptic activity (Harjes and Wanker, 2003), including proteins that regulate cytoskeleton dynamics and clathrin-mediated endocytosis [e.g. HIP1 (Sla2p), HIP12, PACSIN/syndapin 1 and endophilin 3] (Chopra et al., 2000; Higgins and McMahon, 2002; Kalchman et al., 1997; Modregger et al., 2002; Seki et al., 1998; Singaraja et al., 2002; Sittler et al., 1998; Wanker et al., 1997); axonal vesicle transport (e.g. HAP1) (Engelender et al., 1997; Gunawardena et al., 2003); and dendritic morphogenesis and synaptic plasticity (e.g. the postsynaptic density protein DLG4/PSD95 and the adaptor proteins GRB2 and TRIP10/CIP4) (Holbert et al., 2003; Liu et al., 1997; Sun et al., 2001). To test whether dhtt plays a role in synapse organization, we examined the formation of glutamatergic neuromuscular junctions (NMJs) in third instar larvae, a well-characterized system for studying synapse formation and function in Drosophila (Budnik and Gramates, 1999). Examination with a panel of synapse markers showed that axonal pathfinding, muscle innervation and overall synapse structure are normal in dhtt mutants (Fig. 3 and data not shown). Double-labeling with the axonal membrane marker anti-HRP and the postsynaptic density marker anti-Dlg (the Drosophila PSD95 homolog) revealed well-organized presynaptic and postsynaptic structures (Fig.
3E1–E8), as well as an enrichment of synaptic vesicles at the synapses, with no obvious synapse retraction phenotype (data not shown). Moreover, dhtt mutants showed normal organization of the presynaptic microtubule (MT) cytoskeleton, with the presence of stable MT bundles traversing the center of NMJ branches and a dynamically reorganized MT network at distal boutons (data not shown). Quantification of NMJ bouton number and axonal branching did not reveal a significant difference between wild-type controls and dhtt mutants (Fig. 3G–I). Finally, dhtt mutants displayed the stereotypical complementary pattern of active zones surrounded by the honeycomb-like organization of periactive zones, and examination of multiple synaptic proteins such as nc82 (an active zone marker) and fasciclin II (FasII, a periactive zones marker) showed normal synaptic localization (Fig. 3F1–F8). As a control, we also examined synapse organization in CG9990 transgenic animals, which displayed similar well-organized NMJ structures (supplementary material Fig. S3 and data not shown). Together, these results suggest that dhtt is not essential for synapse formation and organization at NMJs. dhtt is not essential for axonal transport Htt has also been proposed to regulate axonal vesicle transport because one of its binding partners, HAP1, interacts directly with p150Glued, which is an essential subunit of the dynactin complex involved in regulating dynein-mediated retrograde axonal transport (Engelender et al., 1997; Gunawardena et al., 2003; Harjes and Wanker, 2003). In Drosophila, individuals that are defective for essential components of the axonal transport machinery often display characteristic mobility phenotypes, such as tail flipping during larva crawling, and progressive lethargy and paralysis (Gindhart et al., 1998; Gunawardena and Goldstein, 2001; Martin et al., 1999). 
Such mutant animals also develop an axonal swelling phenotype, owing to the abnormal accumulation of synaptic vesicles along axons, and fail to properly deliver and localize synaptic components to the termini (Gindhart et al., 1998; Gunawardena and Goldstein, 2001; Martin et al., 1999). dhtt mutants showed normal crawling behavior during larval stages. Furthermore, the distributions of the synaptic vesicle markers anti-synaptotagmin (Syt) and anti-cysteine string protein (CSP) were normal, revealing no obvious accumulation or axonal swellings (Fig. 4 and data not shown). Further, immunostaining with other synaptic components, including synapsin (a reserve vesicle pool marker), FasII (a periactive zone marker) and Dlg, demonstrated that synaptic components were properly delivered to synapses (Fig. 3E–F and data not shown). Similarly, examination of CG9990 transgenic animals failed to reveal discernible larval mobility or axonal transport defects (supplementary material Fig. S3). Together, these results suggest that dhtt does not play an essential role in axonal transport.

Normal axonal transport in dhtt-ko mutants. Confocal images of wild-type control (A–C) and dhtt-ko (D–F) NMJs and neighboring axons (white arrows) of the larval peripheral nervous system (anti-HRP, red). Synaptic vesicles (green, anti-Syt) are properly delivered to NMJs and show no obvious accumulation in the axons of dhtt-ko mutants (D–F), similar to the wild-type controls (A–C). Signals for Syt were overexposed in (B,C,E,F) in order to reveal any possible abnormal accumulations of synaptic vesicles within axons (white arrows). (C,F) Overlayed images of double staining with anti-HRP and anti-Syt. WT: w1118 wild-type control. Bars, 10 µm.

dhtt is crucial for aged adults

We next investigated whether dhtt might function in adult animals.
We examined whether newly emerged dhtt adults were hypersensitive to stress tests including prolonged heat and cold exposure, vortexing and feeding with the oxidative stress compound paraquat. In these tests, dhtt mutants showed similar responses to wild-type controls (data not shown). Thus, unlike flies that are mutated for Parkinson’s disease genes such as parkin and pink1, which are sensitive to multiple stress challenges (Clark et al., 2006; Greene et al., 2003; Park et al., 2006), loss of dhtt does not render young adult animals more vulnerable to environmental stresses. Next, we followed the activity and viability of dhtt animals throughout the adult life cycle. Although no discernible difference in the activity and survival rate was observed between dhtt-ko and wild-type young adults, we observed striking defects in older adult dhtt-ko flies. dhtt-ko animals showed similar spontaneous locomotion to that of wild-type controls at day 15 and earlier (Fig. 5A; supplementary material Movie 1). However, as flies aged, dhtt-ko mutants displayed a rapidly declining mobility, which was evident by day 25. By day 40, almost all dhtt mutants showed severely compromised mobility (Fig. 5A; supplementary material Movie 2) and their viability declined quickly. Whereas, on average, half of the wild-type controls died around day 59 but could live for up to 90 days, half of the dhtt-ko flies died around day 43 and almost all by day 50 (Fig. 5B). To verify that the late-onset mobility and viability defects were because of loss of dhtt, we constructed a dhtt minigene that expressed full-length dhtt under the control of its endogenous regulatory region (see Methods). In the presence of this mini-dhtt transgene, the expression of the dhtt gene was restored in dhtt-ko mutants, as confirmed by RT-PCR (Fig. 5C). Both the late-onset mobility and viability phenotypes observed in dhtt-ko mutants were rescued by the dhtt transgene (Fig. 5A,B; supplementary material Movie 3).
Importantly, introduction of an unrelated transgene construct, such as elav-Gal4 (Fig. 5B) or UAS-eGFP (data not shown), into the dhtt-ko background could not rescue the mobility and viability phenotypes of dhtt-ko mutants, confirming that the rescue was because of the restored expression of dhtt. Thus, although dhtt is not essential for normal Drosophila development, its function is important in maintaining the long-term mobility and survival of adult animals. dhtt-ko is dispensable for normal neurotransmission The observed mobility and viability phenotype could be because of an underlying neurotransmission defect in dhtt mutants as several proposed Htt functions, such as axonal vesicle transport and clathrin-mediated endocytosis in neurons, are essential for the delivery and recycling of synaptic vesicles at nerve terminals to ensure effective neurotransmission (Cattaneo et al., 2001; Eaton et al., 2002; Harjes and Wanker, 2003; Hinshaw, 2000; Slepnev and De Camilli, 2000). To test the neuronal communication in dhtt mutants, we measured synaptic physiology at the well-characterized third instar larval NMJ. We quantified the amplitude of evoked excitatory junctional potentials (EJPs), resting membrane potential and paired-pulse facilitation (PPF) (Fig. 6A–D and data not shown). In adult animals, we recorded electroretinogram (ERG) responses in the eye and quantified DLM (dorsal longitudinal flight muscle) bursting activity in the giant fiber flight circuit (Fig. 6E and data not shown). Following exhaustive analysis, we found no significant defect in either synaptic transmission or short-term plasticity in dhtt mutants. Together, these data suggest that dhtt is not essential for neurotransmission. Normal synaptic transmission in dhtt-ko mutants. (A–F) Electrophysiological analyses of dhtt mutants (see Methods). (A) Voltage traces of evoked EJPs recorded from muscle fiber 6 in dhtt mutant or control third instar larvae. (B) Measurements of evoked EJP amplitude. 
The dhtt-ko mutants exhibited no significant difference in synaptic vesicle release following stimulation (P=0.18, Student’s t-test). Resting membrane potential was unchanged in animals lacking dhtt. (C) Voltage traces and (D) quantification of PPF in control and dhtt mutant third instar larvae. No change in the amplitude of PPF was observed in dhtt mutants, indicating that short-term plasticity is intact. (E) ERGs recorded from controls and dhtt-ko mutants aged 40–45 days or from dhtt-ko mutants aged 1–3 days at 20°C (top graphs) or at 37°C (bottom graphs); heat pulses were given at regular intervals. The black bar below the trace indicates the test light pulses. (F) Percentage of adult animals aged 40–45 days with a loss of phototransduction at 37°C. dhtt-ko mutants showed a more severe temperature-induced loss of phototransduction than control flies, suggesting that photoreceptors in aged dhtt-ko mutants were stress-sensitive compared with the controls. The dhtt-ko Rescue animals were used as controls in the above electrophysiological analyses to ensure a consistent genetic background (A–F). The data in (B) and (D) are presented as the means±s.e. To investigate whether the observed mobility and viability phenotypes in aged dhtt mutants are correlated with neuronal communication defects, we screened for electrophysiological phenotypes in animals aged 40–45 days, when behavioral motor abnormalities are prominent. Aged dhtt mutant adults did not display abnormal seizure activity in extracellular recordings from DLM flight muscles in the giant fiber escape pathway, as has been observed in temperature-sensitive Drosophila mutants that have altered synaptic transmission (Guan et al., 2005). Aged mutants also showed normal visual transduction and synaptic transmission in the visual system at room temperature (Fig. 6E and data not shown), consistent with the observation that dhtt does not play a crucial role in neurotransmission. 
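Most comparisons in this study are summarized as means±s.e.m. with a P value from Student’s t-test. As a rough consistency check, a t statistic can be recomputed from such summary values alone. The Python sketch below is illustrative and is not the authors' analysis: it assumes a two-sample, pooled-variance t-test, converts each s.e.m. to a standard deviation as s.e.m.×√n, and approximates the two-sided P value with a normal distribution instead of the exact t distribution. The function name is hypothetical; the example inputs are the type 1b bouton counts reported for Fig. 3G.

```python
import math


def t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Pooled-variance two-sample t statistic from means, s.e.m. and n.

    Returns (t, degrees of freedom, approximate two-sided P).
    Assumption: P is approximated with a standard normal distribution,
    which is only a rough stand-in for the t distribution.
    """
    sd1 = sem1 * math.sqrt(n1)  # s.e.m. = s.d. / sqrt(n)
    sd2 = sem2 * math.sqrt(n2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# Type 1b bouton counts from the Fig. 3G legend: WT vs dhtt-ko.
t, df, p = t_from_summary(34.5, 1.6, 24, 33.5, 2.4, 21)
print(f"t = {t:.3f}, df = {df}, approximate P = {p:.2f}")
```

With these inputs the approximate P value falls close to the reported P=0.72. Because the normal approximation slightly understates P for small samples, an exact t distribution should be used for anything beyond a sanity check.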
Previous studies have shown that mutations affecting synaptic function often manifest a more prominent phenotype under temperature-induced stress (Atkinson et al., 1991; Coyle et al., 2004). To assess whether phototransduction can be maintained under an elevated temperature in dhtt mutants, we performed ERG recordings at 37°C. Although dhtt mutants aged 1–3 days displayed normal ERGs at both 20°C and 37°C, aged dhtt mutants displayed abnormal sensitivity to 37°C, with over 60% of aged mutants losing light-induced phototransduction and the on/off transients at 37°C compared with only 20% of control aged adults (Fig. 6E,F), indicating an important role of dhtt in maintaining phototransduction under temperature-induced stress. Given that aged dhtt mutant adults did not show additional motor defects when placed at 37°C, we hypothesize that the loss of photoreceptor depolarization found in ERG recordings reflects temperature-sensitive defects in the phototransduction cycle, rather than in synaptic transmission. These results suggest that aged dhtt mutants are more sensitive to stress than young dhtt animals or aged controls. Reduced axon terminal complexity in dhtt-ko brains To further test for dhtt function in the adult brain, we examined dhtt brain morphology using a series of cellular markers. The overall patterning and gross morphology of dhtt-ko brains appeared normal, as shown by staining with antibodies such as the glial cell marker anti-Repo, the pan-neuronal marker anti-Elav and the synaptic vesicle marker anti-CSP (data not shown). In addition, within the ventral nerve cord, the neuropile was similarly enriched with synaptic vesicles. Further, the overall axonal morphology of dhtt-ko mutants appeared normal with no clear axonal blebbing or defasciculation phenotypes (data not shown). 
Interestingly, examination of the mushroom bodies (MBs), which are involved in learning and memory in flies, by anti-FasII staining revealed that the signal intensity was weaker in dhtt mutants than in wild-type controls, despite appearing morphologically normal (Fig. 7J,M). Quantification of MB size and FasII staining signals revealed that, although there was a slight reduction in the average size of MBs in dhtt-ko mutants compared with wild-type controls (Fig. 7P) (average area covered by each MB: wild type=13,352±156 µm2, dhtt-ko mutant=11,927±310 µm2; P=0.0003), the average signal intensity of FasII signals in the MBs of dhtt-ko mutants was decreased by ~50% (Fig. 7Q) (relative signal intensity of MBs: wild type=100±5.8/µm2, dhtt-ko mutant=47.7±6.1/µm2; P<0.0001; total number of MBs quantified: wild-type control, n=13; dhtt-ko mutants, n=10). Reduced complexity of axonal termini in dhtt-ko brains. (A–C) Brain morphology of a 40-day-old wild-type adult fly as revealed by anti-FasII staining (red) (A), which strongly labels the MBs. In the same brain, A307-positive neurons and their axonal projections are revealed by using a membrane-bound mCD8-eGFP reporter (green) (B). (C) An overlay of images (A) and (B) showing the relative positions of MBs and A307-positive neurons in the brain. A307-positive neurons with prominent axonal projections are highlighted [white arrows in (C)]. The white dashed lines delineate the region magnified in the following pictures (D–O). (D–I) Axonal projection patterns and axon terminal structures of A307-positive neurons in 40-day-old wild-type (D–F) and dhtt-ko mutant (G–I) brains. For clear visualization, the posterior (D,G) and anterior (E,H) of the brains are projected separately. (D,G) A307-positive neurons have a similar axonal projection pattern (white arrows) in dhtt-ko mutant (G) and control brains (D). (E,H) Anterior views of the same brain regions showing the axon terminal structure. 
The white dashed lines border one axon terminus in each brain, which are further magnified in (F) and (I), respectively. Notice the significant reduction in branching and varicosities at the axonal termini of the dhtt-ko mutant (H,I). (J–O) The MBs (J,M) and axonal termini of A307-labeled neurons (K,N) in another pair of 40-day-old wild-type (J–L) and dhtt-ko mutant (M–O) brains. (M) The overall morphology of the MBs is normal but the signal intensity is weaker in the dhtt-ko mutant. The white dashed lines border one axon terminus in each brain. Note the reduced complexity of axonal termini in the dhtt-ko mutant. The top of each image is the dorsal end of the brain. WT=w1118 wild-type control. Bars, 10 µm (F,I) and 30 µm (all other panels). (P,Q) Quantification of the average area covered by each MB (P) and the relative signal intensity of MBs (Q) between wild-type controls (WT, blue) and dhtt-ko mutants (red), as revealed by anti-FasII staining. When calculating the relative signal intensity of MBs, the value for the average signal intensity from the wild-type controls was set as 100, (s.e.m.=5.8); the relative signal intensity of the dhtt-ko mutants=47.7±6.1; P<0.0001. (R) Quantification of the average total area covered by each A307-positive axonal terminus. The genotypes of each fly line tested are indicated under each chart. WT=w1118 wild-type control. The data in (P–R) are presented as the means±s.e. To examine the effect of dhtt loss-of-function on the detailed structure of individual neurons in the brain, we used the A307-Gal4 line, which labels the pair of giant fiber (GF) neurons and a small number of other neurons of unknown identity in the adult brain, to examine axonal projection patterns and the fine axonal terminal structure of individual neurons (Phelan et al., 1996) (Fig. 7). 
Among the A307-Gal4 labeled neurons, one pair, located at the dorsal-lateral edge of the brain, projects a prominent axon tract along the dorsal-posterior surface to the dorsal-central region of the brain, forming extensive dendritic connections (Fig. 7B–D; supplementary material Fig. S4). These neurons further extend their projections anteriorly, establishing a complex axon terminal structure with extensive varicosities and fine branches above the antennal lobe region of the brain (Fig. 7E,F,K; supplementary material Fig. S4). In dhtt-ko mutants, the axonal projections of A307-positive neurons follow the same path and their axons terminate at similar locations to those in wild-type flies; this is consistent with the observation that dhtt does not affect axonal integrity or pathfinding (Fig. 7G,H). Interestingly, axonal termini from both wild-type and dhtt-ko flies show a similar age-dependent maturation process, in which the axonal termini in young adult brains (3-day-old flies) are mainly composed of a network of variable thin branches with no clearly recognizable synaptic boutons (supplementary material Fig. S4C,F). Mature boutons develop as the animals age, with many prominent boutons being easily identifiable in the brains of 40-day-old flies (Fig. 7F,I; compare with supplementary material Fig. S4C,F). Owing to the significant variation in their structure and the lack of recognizable boutons, it is difficult to directly quantify and compare the size of these axonal termini in young adults. However, in aged dhtt-ko mutants, it is apparent that the axonal termini contain a significantly reduced number of varicosities and branches (Fig. 7H,I,N). Quantification of the total area covered by each axonal terminus revealed that the A307-positive axonal termini in 40-day-old dhtt-ko mutants cover about half of the area compared with controls (Fig. 
7R) (average area covered by each axonal terminus: wild-type control=168.7±8.0 µm2, dhtt-ko mutant=86.1±7.2 µm2; P<0.0001; total number of A307-positive axonal termini quantified: wild-type control, n=18; dhtt-ko mutants, n=17). To rule out the possibility that this reduced complexity was because of the accelerated aging process or a secondary effect associated with the reduced mobility of dhtt-ko mutants, we examined the axonal termini of the A307-positive neurons in 83-day-old wild-type flies, as animals at this age are near the end of their life span and have severely reduced mobility. The structure of the axonal termini in these flies is similar to that of 40-day-old flies and shows no obvious reduction in terminal complexity (supplementary material Fig. S5). Using the membrane-bound mCD8-eGFP reporter driven by the GF-specific A307-Gal4 line, we further analyzed the axonal projection and terminal morphology of the GF neurons in dhtt-ko mutants. The GF neurons are a pair of large interneurons located in the central brain and project their prominent axons, which are unbranched, to the mesothoracic neuromere (T2) in the ventral nerve cord, where they bend laterally and synapse with other interneurons and motor neurons (Phelan et al., 1996). In both young (3-day-old) and aged (40-day-old) dhtt-ko mutants, GF neurons project normally to the T2 neuromere and form the characteristic terminal bends, resembling those observed in wild-type controls (supplementary material Fig. S6). In 3-day-old animals, the signal intensity of the mCD8-eGFP reporter at the GF axonal termini was similar between dhtt-ko mutants and wild-type controls (supplementary material Fig. S6B,C) (total number of 3-day-old GF axonal termini examined: wild-type control, n=10; dhtt-ko mutants, n=12). However, in 40-day-old animals, the GF axonal termini in wild-type controls showed a much stronger enrichment for the mCD8-eGFP reporter than in dhtt mutants (supplementary material Fig. 
S6D–G) (total number of 40-day-old GF axonal termini examined: wild-type control, n=16; dhtt-ko mutants, n=20). The exact nature behind such a difference remains to be clarified, but might represent subtle alterations in axonal transport of membrane proteins in aged neurons. Nevertheless, these results suggest that dhtt does not affect axonal pathfinding and overall brain organization, but has a functional role in regulating the complexity of axonal termini in the adult brain. Loss of dhtt enhances the pathogenesis of HD flies Earlier studies in cell culture and Htt mutant mice have shown that wild-type Htt has a protective role for CNS neurons (Dragatsis et al., 2000; O’Kusky et al., 1999; Rigamonti et al., 2000; Van Raamsdonk et al., 2005). Further, existing evidence suggests that normal Htt activity can be inactivated by mechanisms such as abnormal sequestration into insoluble aggregates (Huang et al., 1998; Kazantsev et al., 1999; Narain et al., 1999; Preisinger et al., 1999; Wheeler et al., 2000). These and other observations have led to the hypothesis that the perturbation of endogenous Htt function, such as by late-onset inactivation of endogenous Htt, contributes to HD pathogenesis (Cattaneo et al., 2001). Attempts to test this directly in mouse Hdh mutants have been complicated by the crucial role of Hdh in early mouse embryogenesis and in the later stages of brain development (Auerbach et al., 2001; Dragatsis et al., 2000; Duyao et al., 1995; Leavitt et al., 2001; Nasir et al., 1995; Van Raamsdonk et al., 2005; White et al., 1997; Zeitlin et al., 1995). However, we have a unique opportunity to examine this in Drosophila because dhtt is not essential for fly embryogenesis and dhtt mutants appear to behave normally at young ages, with only a mild axon terminal defect in the brain. 
We used a well-established fly HD model for polyQ toxicity (HD-Q93), in which the human HTT exon 1, with 93 glutamine repeats, is expressed in all neuronal tissues (genotype: elav-Gal4/+; UAS-Httexon1-Q93/+) (Steffan et al., 2001). The HD-Q93 flies develop age-dependent neurodegenerative phenotypes in adults, which manifest as initial hyperactivity followed by a gradual loss of coordination and a decline in locomotor ability, with eventual death at around 20 days of age (Fig. 8F–H). The HD-Q93 flies also develop a progressive degeneration of both the brain and other neuronal tissues, most prominently in the photoreceptor cells in the eye (Steffan et al., 2001). Loss of endogenous dhtt enhances the mobility and viability phenotypes of HD flies. (A–D) Retinal organization, as revealed by pseudopupil imaging of 7-day-old adult eyes. Both wild-type (A) and dhtt-ko flies (B) had well-patterned ommatidia with seven rhabdomeres in each. Adult eyes from HD-Q93 (C) and HD-Q93; dhtt-ko (D) flies both showed an extensive loss of photoreceptors (arrows highlight ommatidia with only three or four rhabdomeres). (E) Histogram showing the number of remaining photoreceptors per ommatidium in 11-day-old adults. A similar profile of degeneration was observed for HD-Q93 flies (red) and HD-Q93; dhtt-ko flies (blue). (F–H) Quantification of climbing ability (F), spontaneous locomotion (G) and age-dependent survival rate (H). The genotypes for each of the fly lines tested are provided within each chart. HD flies with a background dhtt-ko mutation show an accelerated loss of mobility (F,G) and earlier lethality (H). The data in (E–H) are presented as the means±s.e. After introducing HD-Q93 into the dhtt mutant background (‘HD-Q93; dhtt-ko’ flies; genotype: elav-Gal4/+; UAS-Httexon1-Q93/+; dhtt-ko/dhtt-ko), we examined the eye degeneration phenotype by quantifying the number of rhabdomeres in each ommatidium as the animals aged. 
The loss of photoreceptor cells in HD-Q93 flies was not significantly enhanced in the absence of endogenous dhtt (Fig. 8A–E). For example, at 11 days of age, approximately 40% of ommatidia lost three photoreceptors in both HD-Q93 and HD-Q93; dhtt-ko flies (Fig. 8E) (number of ommatidia with four photoreceptors: HD-Q93=42.6±2.9%, n=562, eight adult eyes analyzed; HD-Q93; dhtt-ko=37.6±4.1%, n=266, seven adult eyes analyzed; the difference was not statistically significant, P>0.5). Further, the overall profile of the remaining photoreceptors per ommatidium was also similar between these flies (Fig. 8E). Interestingly, although these HD-Q93; dhtt-ko flies showed normal mobility at the beginning of their adult life, they displayed a declining mobility that could be detected as early as 5 days of age and that rapidly deteriorated over the next few days (supplementary material Movies 4–8). During this time, the flies became progressively uncoordinated, showing an increasing frequency of both faltering while walking and falling during climbing (Fig. 8G; supplementary material Movies 4–8). In a standard climbing assay, almost all 3-day-old HD-Q93; dhtt-ko flies could successfully climb to the top of a vial, which is similar to that observed for wild-type controls. However, by day 11, only around 10% of viable HD-Q93; dhtt-ko flies could make it to the top, whereas the success rate was 61% for HD-Q93 flies, and more than 94% for wild-type and dhtt-ko flies (Fig. 8F) [based on an average of at least five independent assays of 20 flies for each genotype at the given age; compared with wild-type controls on day 11, the results from dhtt-ko flies were not significant (P=0.30), whereas the results from HD-Q93 and HD-Q93; dhtt-ko flies were significant (P<0.0001)]. 
Similarly, when spontaneous locomotion was tested, the HD-Q93; dhtt-ko flies showed a more rapid decline in motility over time, and by day 9, these flies were less than half as active as those in the other groups (Fig. 8G) (activity was measured by the number of spontaneous turns performed every 4 minutes; on day 9, HD-Q93; dhtt-ko=26.4±5.9, n=9; HD-Q93=66.5±5.0, n=8; elav-Gal4; dhtt-ko control=60.6±4.3, n=9; UAS-Httex1Q93; dhtt-ko control=66.2±3.0, n=9; the HD-Q93; dhtt-ko results were statistically significant when compared with UAS-Httex1Q93; dhtt-ko control flies, P=0.00002). Furthermore, the life span of HD-Q93; dhtt-ko flies was shortened significantly, with half of them dying by day 8 and almost all of them by day 14. In contrast, only about 7% of the HD-Q93 flies died at day 8 and half of them by day 14 (Fig. 8H) [viability at day 8: HD-Q93; dhtt-ko=45.0±3.2%, n=409 (from four different crosses); HD-Q93=93.1±1.7%, n=1082 (from eight different crosses); the difference between HD-Q93 and HD-Q93; dhtt-ko flies was statistically significant, P=0.0003]. Notably, at day 15, dhtt-ko mutants were healthy and displayed similar mobility to wild-type controls (Fig. 5A,B). To understand the underlying pathology of these phenotypes, we further examined the brain structure of these flies. Compared with HD-Q93 flies of the same age, the brains of HD-Q93; dhtt-ko flies had already developed a more severe pathology at 5 days. Notably, the MBs appeared less organized – the characteristic bulged tip of the vertical α-axonal lobes, which were prominent in HD-Q93 flies (Fig. 9C, white arrows) (n=14) and other controls, was largely unrecognizable in 95% of HD-Q93; dhtt-ko flies (n=17/18) (Fig. 9H, white arrows) (supplementary material Figs S7 and S8). The clear separation, along the midline, between the pair of medially projected ß-lobes, which was obvious in HD-Q93 flies (Fig. 9C, white arrow) (supplementary material Figs S7 and S8) (n=14) and wild-type flies (Fig. 
7J), also became less distinct and often appeared to be merged in HD-Q93; dhtt-ko flies (n=13/18) (Fig. 9H, white arrow) (supplementary material Figs S7 and S8). In addition, the anti-FasII staining of the MBs was not as strong in HD-Q93; dhtt-ko flies as in HD-Q93 flies (Fig. 9C,H; supplementary material Figs S7 and S8). Quantification of MB size and FasII staining signals revealed that there was approximately a 7% reduction in the average size of MBs in HD-Q93; dhtt-ko mutants compared with HD-Q93 controls (Fig. 9K). The γ-lobe signals were too weak to be reliably tracked in both HD-Q93 and HD-Q93; dhtt-ko flies, so only the α- and ß-lobes in each MB were measured (average MB size: HD-Q93=11,137±314 µm2, n=6; HD-Q93; dhtt-ko=10,412±212 µm2, n=7; P=0.04). The average signal intensity of FasII signals in the MBs of HD-Q93; dhtt-ko mutants was also decreased, by around 31% (Fig. 9L) (relative signal intensity of α- and ß-lobes in the MBs: HD-Q93=100±6.7/µm2; HD-Q93; dhtt-ko=69.3±4.0/µm2; P=0.0004). Furthermore, the HD-Q93; dhtt-ko brains showed larger areas that were devoid of neuronal cells (compare Fig. 9A and 9D with Fig. 9F and 9I, respectively; also see supplementary material Figs S7 and S8). The results obtained from quantifying the brain size of these flies suggested that, although the overall size of the brains was similar between HD-Q93 and HD-Q93; dhtt-ko flies (Fig. 9M) (average brain size: HD-Q93=119,279±2015 µm2, n=7; HD-Q93; dhtt-ko=115,389±2412 µm2, n=7; P=0.003), the total area that was devoid of neuronal cells was increased by about 25% (Fig. 9N) (total area devoid of neuronal cells: HD-Q93=26,324±1869 µm2, n=7; HD-Q93; dhtt-ko=32,945±2494 µm2, n=7; P=0.03). Together, these results suggest that there is increased disorganization and an increase in neuronal loss in the brains of HD-Q93; dhtt-ko flies. Thus, loss of endogenous dhtt renders animals more vulnerable to the toxicity associated with polyQ-expanded Htt. 
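The percentage changes quoted above follow directly from the reported means, and the reported means±s.e. also allow a rough check of the size of each difference relative to its uncertainty. The following is a minimal sketch, not the analysis used in this study: it assumes the ± values are standard errors of the mean (as stated in the figure legends), it uses a Welch-type t statistic as an illustrative choice, and the function names are hypothetical. No P values are computed.

```python
import math


def percent_change(control_mean, mutant_mean):
    """Signed change of the mutant mean relative to the control mean, in percent."""
    return 100.0 * (mutant_mean - control_mean) / control_mean


def welch_t_from_summary(mean1, se1, n1, mean2, se2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are group means, standard errors of the mean and sample sizes.
    Because se = sd / sqrt(n), the variance of the difference is se1^2 + se2^2.
    """
    var_of_diff = se1 ** 2 + se2 ** 2
    t = (mean1 - mean2) / math.sqrt(var_of_diff)
    df = var_of_diff ** 2 / (se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1))
    return t, df


# Values copied from the text: (mean, s.e., n) for HD-Q93 and HD-Q93; dhtt-ko.
measurements = {
    "MB size (um^2)": ((11137, 314, 6), (10412, 212, 7)),
    "area devoid of neuronal cells (um^2)": ((26324, 1869, 7), (32945, 2494, 7)),
}

for label, (control, mutant) in measurements.items():
    change = percent_change(control[0], mutant[0])
    t, df = welch_t_from_summary(*control, *mutant)
    print(f"{label}: change = {change:+.1f}%, t = {t:.2f}, df = {df:.1f}")
```

With these inputs, the sketch gives a reduction of about 7% in MB size and an increase of about 25% in the area devoid of neuronal cells, matching the figures quoted in the text.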
Loss of endogenous dhtt affects the pathogenesis of HD flies. (A–J) Loss of endogenous dhtt causes enhanced brain pathology in HD flies. Confocal images of adult brains showing the distribution of neuronal cells (anti-Elav, green), cell nuclei (DAPI, white) and MBs (anti-FasII, red) in 5-day-old HD-Q93 (A–E) and HD-Q93; dhtt-ko (F–J) brains. As cells in the fly brain are mainly localized at its surface, the anterior (A,B,F,G) and posterior (D,E,I,J) halves of the brains are projected separately for better visualization of the distribution pattern. Note the enlarged regions devoid of neuronal cells in the HD-Q93; dhtt-ko brain (F) (the areas within the white lines) compared with the corresponding regions in the HD-Q93 brain (A). The arrows in (I) indicate the areas lacking neuronal cells at the posterior of the brain. (C,H) The MB in the HD-Q93; dhtt-ko brain (H) is also less well organized and shows weaker signal intensity than in the HD-Q93 brain (C). The white arrowheads highlight the clear separation, along the midline, between the two medial ß-lobes in the HD-Q93 brain (C); this is less distinct and appears merged in the HD-Q93; dhtt-ko brain (H). White arrows indicate the bulged tip of the vertical α-lobes, which become less distinct in HD-Q93; dhtt-ko brains (H). WT=w1118 wild-type control. Bars, 50 µm (all panels). (K,L) Quantification of the average MB size (K) and relative signal intensity (L) of MBs in HD-Q93 (blue) and HD-Q93; dhtt-ko mutants (red), as revealed by anti-FasII staining. In both HD-Q93 and HD-Q93; dhtt-ko flies, the FasII signals for the γ-lobe in MBs were too weak to be tracked reliably (see supplementary material Figs S7 and S8); therefore, only the α- and ß-lobes in each MB were measured. (M,N) Quantification of the average brain size (M) and the total size of the regions devoid of neuronal cells in the anterior brain (N), as revealed by using neuronal-specific staining with an antibody against Elav. 
The data in (K–N) are presented as the means±s.e. DISCUSSION Htt has been characterized extensively in mammalian cell culture and in mouse systems (Cattaneo et al., 2001; Harjes and Wanker, 2003). However, only a few functional studies have been performed on its homologs in other model organisms. By deleting the dhtt gene, we demonstrate that Htt is not required for normal development of Drosophila, but instead has an essential role for the long-term mobility and survival of adult animals. Subsequent analyses revealed that loss of dhtt mildly affects the integrity of adult brains. Further, in the absence of endogenous dhtt, the neurodegenerative phenotypes associated with a Drosophila model of polyQ toxicity were enhanced significantly. The role of Htt homologs in animal development Earlier studies showed that mice lacking Htt die during early embryogenesis (Duyao et al., 1995; Nasir et al., 1995; Zeitlin et al., 1995). It is surprising to find that dhtt is dispensable during Drosophila development. As no other Htt homolog exists in the fly genome (Li et al., 1999), such a mild phenotype is unlikely to be caused by a functional redundancy resulting from another Htt-like gene in Drosophila. Considering the evolutionary distance between Drosophila and mammals, such an observation might indicate that the Drosophila and mammalian Htt proteins, with their relatively restricted sequence homology, are not functionally conserved. However, this phenotypic discrepancy might also reflect intrinsic differences during mouse and fly embryogenesis. In a chimeric analysis of Htt mutant mice, Dragatsis et al. showed that the early embryonic lethality of Htt-null mice was primarily the result of a crucial role of Hdh in extraembryonic membranes (Dragatsis et al., 1998). It is probable that Drosophila does not require equivalent tissues, such as extraembryonic membranes, to support its early development and that its embryogenesis can proceed normally in the absence of dhtt. 
Although an Htt homolog is found in Drosophila and vertebrates, no Htt homolog has been found in yeast or C. elegans (Li et al., 1999). The absence of an Htt homolog in C. elegans suggests that Htt does not have a function that is essential for the development of invertebrates in general, which is in agreement with our observation that dhtt is dispensable for normal development of Drosophila. Interestingly, a phylogenetic comparison of Htt proteins from different species postulates that Htt in the protostome (which includes Drosophilids) might be dispensable because, when compared with the deuterostome branch, evolution of the Htt genes along the protostome branch is more heterogeneous (Tartari et al., 2008). Further studies will be required to determine whether the mammalian Htt gene can rescue Drosophila dhtt-null phenotypes. Analyses of dhtt-ko mutants suggest that loss of dhtt does not affect synapse formation, neurotransmission or axonal transport (Figs 3–6), which is in agreement with the absence of an Htt homolog in the worm, in which the essential components of these cellular processes are conserved. It is possible that Htt still regulates these cellular processes, but with a minor role. Extrapolating, there is a possibility that the function of Htt is associated with a novel cellular process and/or animal function that has been acquired during evolution. For example, although dhtt is dispensable for Drosophila development, it is important for maintaining the long-term mobility and viability of adult animals (Figs 5 and 6). Compared with the worm, Drosophila has a relatively long life span, a more complex nervous system and a more active life cycle, raising an intriguing possibility that the function of Htt might be directly related to these higher functions in adult animals. 
The role of Htt in adult brain Although dhtt is dispensable for normal Drosophila development, dhtt mutants show significantly reduced mobility and viability as they age, indicating an important role of dhtt in maintaining the long-term functioning and survival of adult animals (Fig. 5). Analysis of dhtt mutants revealed a mild abnormality in MB structure and a reduction in the complexity of axonal termini in the brain (Fig. 7). Similarly, Htt mutant mice with a reduced level of Htt expression display severe brain abnormalities, even though non-neuronal tissue forms normally (White et al., 1997). Targeted inactivation of Hdh in the mouse forebrain also causes a progressive neurodegeneration phenotype, suggesting that Hdh is required in the development and survival of neuronal cells (Dragatsis et al., 2000). It remains to be determined whether common underlying molecular mechanisms are responsible for the observed brain phenotypes in the fly and mouse. Nonetheless, considering the evolutionary distance between Drosophila and the mouse, the existence of structural defects in the adult brain in both species could indicate a conserved role of Htt in maintaining neuronal integrity. The axonal terminal phenotype in dhtt mutants is also reminiscent of that observed in a mouse knockout model for Htt (Hdhex5), in which exon 5 has been deleted (Nasir et al., 1995). Although mice homozygous for this Hdh deletion are early embryonic lethal, Hdhex5 heterozygotes survive to adulthood and display increased neuronal loss, motor and cognitive deficits, and a significant loss of synapses in specific regions of the brain (Nasir et al., 1995; O’Kusky et al., 1999). Synapse complexity can be modulated by many factors, including neuronal activity and membrane and cytoskeleton dynamics. 
Many questions remain to be answered, such as the significance of this axonal terminal phenotype, the exact function of Htt in axonal termini complexity and the possibility of a causal link between this brain defect and the observed adult mobility and viability phenotypes. The role of endogenous dhtt in HD pathogenesis Extensive studies on HD and other polyQ diseases have demonstrated that an expanded polyQ tract can itself be neurotoxic (Zoghbi and Orr, 2000). Given that the distinctive neuronal loss observed in each polyQ disease is caused by otherwise unrelated disease genes that are widely expressed, it has been hypothesized that other cellular factors affect disease pathogenesis (Cattaneo et al., 2001; Zoghbi and Orr, 2000). In HD, accumulating evidence from cell culture and mutant mouse studies suggests that wild-type Htt has a neuroprotective function (Cattaneo et al., 2001). When endogenous wild-type Hdh is replaced by yeast artificial chromosomes (YACs) containing full-length human Htt with an expanded polyQ tract (YAC46, YAC72 and YAC128) (Leavitt et al., 2001; Van Raamsdonk et al., 2005), the mice develop massive cell death in the testes that can be suppressed by the wild-type Hdh gene, suggesting that the normal function of Htt might mitigate the cellular toxicity associated with the polyQ-expanded mutant Htt protein (Leavitt et al., 2001; Van Raamsdonk et al., 2005). In addition, a genome-wide study of HD animal models and postmortem tissues has shown that neuronal genes regulated by the transcriptional repressor REST/NRSF, including the gene encoding BDNF, could be similarly repressed by either the presence of the polyQ-expanded Htt protein or by depleting endogenous wild-type Htt (Zuccato et al., 2007). Together, these and other studies support the hypothesis that loss of normal Htt function affects HD pathogenesis. 
We were able to examine the effect of removing endogenous dhtt on the pathogenesis of an established Drosophila HD model for polyQ toxicity (HD-Q93). Our results show that the phenotypes of HD-Q93 flies, including mobility, viability and brain pathology, are significantly exacerbated in the absence of endogenous dhtt, providing in vivo evidence that loss of normal Htt function can accelerate HD pathogenesis. It should be noted that our results do not directly demonstrate that loss of the endogenous Htt protein specifically affects HD pathogenesis, because the enhanced pathology might be the result of an additive effect of two detrimental factors in the animals, namely the presence of a toxic polyQ tract and the disturbance of the normal function of Htt. Although dhtt-ko mutants exhibit only a mild age-dependent adult phenotype, it is possible that, in the absence of endogenous dhtt, some undetected cellular defects might develop that render the animals more susceptible to other cellular attacks. In the presence of a toxic polyQ tract, this vulnerability might be exposed further, leading to additive phenotypes. Our result is in agreement with the early hypothesis that HD might be caused by the combination of an acquired toxicity, conferred by the expanded polyQ tract in the mutated Htt, and an incurred neuronal vulnerability, arising from the loss of endogenous Htt function (Cattaneo et al., 2001). It is important to note that previous studies have demonstrated that expansion of polyQ tracts within the Htt protein does not abolish its endogenous function, as full-length Htt with an expanded polyQ tract can fully support the development of Htt-null mice (Leavitt et al., 2001; Van Raamsdonk et al., 2005; White et al., 1997). Recently, RNAi-mediated depletion of the polyQ-expanded Htt protein has been proposed as a therapeutic approach against HD (Farah, 2007). 
Our data, together with previous mammalian studies, suggest that the normal function of Htt has a conserved neuroprotective role in the brain and that its depletion could render neuronal cells more vulnerable to the toxicity associated with the polyQ tract (Auerbach et al., 2001; Cattaneo et al., 2001; Dragatsis et al., 2000; Leavitt et al., 2001). Application of such an RNAi-based strategy requires a consideration that the normal Htt function be preserved. Given the devastating consequence of fully progressed HD, together with the observations that Htt-null neurons can develop and survive in the adult mouse brain and that dhtt-ko animals are largely normal with only mild adult brain defects, our data indicate that the benefit gained from balanced administration of RNAi knockdown therapy in the adult brain might justify the relatively mild loss caused by depletion of the Htt-associated neuroprotective function. METHODS Drosophila stocks and genetics Flies were maintained at 25°C and raised on standard Drosophila medium unless specified otherwise. To establish transgenic animals, DNA constructs were injected into w1118 embryos together with the pπ25.7wc helper plasmid to generate germ line transformation; transformants were then selected in the next generation, according to standard procedures. Unless specified otherwise, flies of the genotype w1118/w1118 were used as controls in all assays because the dhtt-ko mutant allele was generated from this w1118 genetic background during the genetic crosses. The p-element line d08071, piggyBac insertional line f05417 and the FLP recombinase transgene line [genotype: y, w, pCaspeR-(hs-flipase)] were from Exelixis (Parks et al., 2004). To analyze the A307-Gal4 labeled neurons in adult brains, flies with genotypes of A307-Gal4/+; dhtt-ko/TM6C, Tb and UAS-mCD8-eGFP/+; dhtt-ko/TM6C, Tb were generated and crossed together; their adult progenies carrying A307-Gal4/UAS-mCD8-eGFP; dhtt-ko/dhtt-ko were selected and analyzed. 
The controls were the progenies from crossing the A307-Gal4 line with the UAS-mCD8-eGFP line. HD-Q93 flies were obtained by crossing the pan-neuronal line elav-Gal4 (C155) with the UAS-Httexon1-Q93 line (P468) provided generously by L. Thompson and J. L. Marsh, respectively (Steffan et al., 2001). To generate ‘HD-Q93; dhtt-ko’ flies, flies with genotypes of elav-Gal4(c155)/+; dhtt-ko/TM6C, Tb and UAS-Httexon1-Q93/Cyo; dhtt-ko/TM6C, Tb were established and then crossed together; adult progenies with the genotype of elav-Gal4/+; UAS-Httexon1-Q93/+; dhtt-ko/dhtt-ko were then selected and analyzed. Control flies had the following genotypes: (1) elav-Gal4 (C155)/+; UAS-Httexon1-Q93/+, (2) elav-Gal4 (C155)/+; dhtt-ko/dhtt-ko and (3) UAS-Httexon1-Q93/+; dhtt-ko/dhtt-ko. Genetics and molecular cloning to generate the dhtt-ko mutant allele The Df(98E2) deficiency, in which both CG9990 and dhtt are deleted, was generated by following the Flp-FRT-based procedure, as described previously (Parks et al., 2004). Briefly, the p-element line d08071 (inserted at the 5' end of the neighboring CG9990 gene) was crossed to virgin flies carrying an FLP recombinase transgene [genotype: y, w, pCaspeR-(hs-flipase)]. For the next generation (F1), the male progenies carrying both the d08071 and the hs-flipase lines were selected and mated with virgins carrying the f05417 line, inserted near the 3' end of the dhtt gene. The progenies (i.e. the F1 flies) from the above crosses were heat-shocked at 37°C for 1 hour, 48 hours after egg-laying, and were then heat-shocked for a further 1 hour during each of the following 4 days. In the third generation (F2), the virgin females were collected and crossed to males carrying balancer chromosomes (w; TM3, Sb/TM6B, Tb, Hu). 
In the fourth generation (F3), progeny flies carrying the Df(98E2) deletion from the above crosses (between F2 flies) were selected based on their darker eye color, and crossed with flies carrying balancer chromosomes to establish individual fly lines. The presence of the deletion in these lines was confirmed by PCR analysis of extracted genomic DNA and by DNA sequencing. To generate the genomic rescue construct for the CG9990 gene, a 24.7 kb genomic DNA fragment, covering the CG9990 genomic region, was isolated from the bacterial artificial chromosome (BAC) clone BACR10P23 (CHORI) following double digestion with the restriction enzymes XbaI and XmaI. This 24.7 kb genomic DNA fragment starts at an XbaI site near the end of the neighboring CG9989 gene, 2.66 kb from the inserted site of the p-element line d08071, and ends at the XmaI site within the second exon of the dhtt gene, thus covering the whole genomic region of the CG9990 gene (Fig. 1B). To generate the CG9990 rescue transgene, this 24.7 kb genomic DNA fragment was cloned into the NotI and SmaI sites in the pCaspeR-4 transgenic vector. The DNA for the pCaspeR-4–CG9990 genomic rescue construct was injected into w1118 embryos and transformants were then selected following standard procedures. Three independent CG9990 transgenic lines were established and then tested by crossing them into the Df(98E2) deletion. Flies with the Df(98E2) deletion, which removes both CG9990 and dhtt, are homozygous lethal at the embryonic stage. Through genetic crossings, we reintroduced the CG9990 genomic rescue transgene back into the Df(98E2) deletion background, generating fly lines that are defective at the molecular level for only the dhtt gene (referred to as dhtt-ko flies). Three independent genomic CG9990 transgenic lines were tested by crossing them into the Df(98E2) deletion; all of them rescued the embryonic lethality of the Df(98E2) deletion, producing viable adults. 
Thus, dhtt-ko flies, which carry both the Df(98E2) and the CG9990 transgene, are homozygous viable and can fully develop into adulthood, demonstrating that the lethality observed with the Df(98E2) deletion is caused by the loss of CG9990. To verify further that the dhtt gene is indeed deleted, as expected, in this dhtt-ko allele, we extracted genomic DNA from the homozygous dhtt-ko and wild-type control adults and performed Southern blots. Genomic DNA was extracted from the dhtt-ko adults or from control w1118 adults and digested with the restriction enzyme BamHI. DNA fragments were separated on a 1% agarose gel and transferred onto a nitrocellulose membrane, according to the standard protocol for Southern blotting. DNA fragments specifically targeting all exons of the dhtt gene were labeled with 32P-dCTP using the Klenow polymerase and random-hexamer primer method (Amersham) and used as probes. Hybridization was performed overnight at 65°C and the membranes were subsequently washed with SDS/SSC buffers, following standard procedures for Southern blotting. The size of the genomic region covering dhtt is about 43 kb (Li et al., 1999), which is too large to be cloned into a transgenic vector by conventional approaches. Therefore, we engineered a dhtt minigene rescue construct that expresses full-length dhtt under the control of its own endogenous regulatory region. In Drosophila, the regulatory elements controlling the endogenous expression pattern of a gene are normally located at the 5' region of the gene, within the upstream 5' untranscribed region and in the first few introns. In addition, the expression level of a gene is also affected by its 3' untranslated region. As a result, we isolated a 14.9 kb genomic DNA fragment from the BAC clone BACR10P23 (CHORI) following double digestion with the restriction enzymes XbaI and AscI. 
The XbaI site was located at the end of the CG9990 gene and the AscI site was within the tenth exon of the dhtt-coding region, thus the DNA fragment covered all of the 5' untranscribed region of the dhtt gene together with the first ten introns and exons of the gene. We also isolated a 1.2 kb genomic DNA fragment, by double digestion with StuI-EcoRI, that covers the 3' end of the dhtt gene, including the last exon of dhtt and the remaining polyA sites and untranscribed region at the 3' end of the gene. Next, we assembled and isolated a 6.6 kb dhtt cDNA fragment, by digestion with AscI and StuI, that covered most of the 3' cDNA region of dhtt, from the single AscI site within the tenth exon to the StuI site within the last exon. The dhtt minigene rescue construct was generated by ligating and fusing, in frame, the three fragments that cover all of the dhtt regulatory and coding regions: (1) the 5' 14.9 kb genomic DNA fragment that covered all of the 5' untranscribed region of the dhtt gene, as well as the first ten introns and exons of dhtt; (2) the middle 6.6 kb of the dhtt cDNA fragment that covered from exon 10 to exon 29; and (3) the 3' 1.2 kb genomic DNA fragment that covered exon 29 and the polyA sites at the 3' end of the gene, as well as the remaining 3' untranscribed region. The 22.7 kb dhtt minigene was cloned into the NotI and XbaI sites in the pCaspeR-4 transgenic vector. After generating transgenic flies with this dhtt minigene rescue construct, we crossed these animals into the dhtt-ko mutant background to test whether the mobility and viability phenotypes of dhtt-ko mutants could be rescued. Based on the published dhtt cDNA sequence (Li et al., 1999), we isolated overlapping fragments covering the full-length of the dhtt cDNA by PCR amplification using Pfu polymerase. 
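As a quick consistency check on the minigene assembly described above, the three fragment sizes can be summed and compared with the stated 22.7 kb total. The following Python sketch is illustrative only: the fragment sizes come from the text, while the dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Fragment sizes in kb, as stated in the text; the labels are illustrative.
FRAGMENTS_KB = {
    "5' genomic (XbaI-AscI)": 14.9,
    "middle cDNA (AscI-StuI)": 6.6,
    "3' genomic (StuI-EcoRI)": 1.2,
}


def minigene_length_kb(fragments):
    """Sum the fragment sizes, rounding to 0.1 kb to avoid float noise."""
    return round(sum(fragments.values()), 1)


total_kb = minigene_length_kb(FRAGMENTS_KB)
print(f"Assembled minigene: {total_kb} kb")
```

The sum agrees with the 22.7 kb minigene reported for cloning into pCaspeR-4, which is far smaller than the roughly 43 kb endogenous dhtt locus.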
We then assembled a full-length dhtt cDNA construct using these sequencing-verified dhtt fragments, cloned it into a pUASP vector, and generated transgenic fly lines by following standard protocols. Total RNA samples were isolated, using Trizol reagent (Invitrogen), from adult animals of the following genotypes: w1118 wild-type control, homozygous dhtt-ko mutants and dhtt-ko Rescue. RT-PCR reactions were performed using Superscript one-step RT-PCR with Platinum Taq (Invitrogen) following the manufacturer’s instructions. PCR primers were designed as follows: rp49 control: forward (in exon 1 of rp49) 5'-ACCATCCGCCCAGCATACAGG-3', reverse (in exon 2 of rp49) 5'-TTGGCGCGCTCGACAATCTCC-3'; dhtt-N: forward (in exon 5 of dhtt) 5'-GCCAATGTAGCCAGAGTCTG-3', reverse (in exon 6 of dhtt) 5'-CGCATTCGCTGATGCTGCGTG-3'; dhtt-M: forward (in exon 13 of dhtt) 5'-AAGCTATTCGAGCCGATGGTC-3', reverse (in exon 15 of dhtt) 5'-GCACCAGGAATCTCAGCATGG-3'; dhtt-C: forward (in exon 23 of dhtt) 5'-TCGGGAATTGACTTTCGCAGC-3', reverse (in exon 24 of dhtt) 5'-TGCAGTTTGAGGCAGCGTTCC-3'. Using the motif-predicting program developed by M.A. Andrade at the EMBL (http://www.embl-heidelberg.de/~andrade/papers/rep/search.html), both dHtt and human Htt were analyzed for the presence of HEAT repeats. Using the default parameters at the site and including the AAA, ADB and IMB groups of HEAT repeats (Andrade et al., 2001), a total of 40 and 38 HEAT repeats were identified for Htt and dHtt, respectively. Sample preparation and staining for embryos, larval tissues and adult fly eyes have been described previously (Sullivan et al., 2000). To analyze NMJs, wandering third instar larvae were dissected in Ca2+-free saline (128 mM NaCl, 2 mM KCl, 0.1 mM CaCl2, 4 mM MgCl2, 35.5 mM sucrose, 5 mM Hepes at pH 7.2, 1 mM EGTA) and stained, as described (Sullivan et al., 2000). Thin sections (1–2 µm) of embedded adult eyes were cut using a microtome and imaged directly without further dye staining. 
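For readers who want to sanity-check the RT-PCR primers listed above, basic properties such as length, GC content and reverse complement are easy to compute. The Python sketch below is not part of the original methods: the primer sequences are copied from the text, while the dictionary keys and helper names are hypothetical and chosen only for this illustration.

```python
# Primer sequences (5'->3') copied from the text; the keys are illustrative.
PRIMERS = {
    "rp49_F": "ACCATCCGCCCAGCATACAGG",
    "rp49_R": "TTGGCGCGCTCGACAATCTCC",
    "dhtt-N_F": "GCCAATGTAGCCAGAGTCTG",
    "dhtt-N_R": "CGCATTCGCTGATGCTGCGTG",
    "dhtt-M_F": "AAGCTATTCGAGCCGATGGTC",
    "dhtt-M_R": "GCACCAGGAATCTCAGCATGG",
    "dhtt-C_F": "TCGGGAATTGACTTTCGCAGC",
    "dhtt-C_R": "TGCAGTTTGAGGCAGCGTTCC",
}

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of G or C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {100 * gc_fraction(seq):.1f}%")
```

Each forward/reverse pair sits in different exons, so products amplified from spliced mRNA can be distinguished from products amplified from contaminating genomic DNA by size.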
To stain and image the adult brains, flies were dissected in 1× PBS or in Drosophila M3 medium, to remove cuticles and external eye tissues, then fixed in 4% formaldehyde in 1× PBS for 1 hour at room temperature (RT), washed six times with 1× PBT for 1 hour, and stained with primary and secondary antibodies. Collection and fixation of Drosophila embryos, larval tissue dissection and fixation, as well as RNA in situ hybridization were carried out according to standard procedures (Hauptmann and Gerster, 2000). The DIG-labeled RNA probes were generated according to the manufacturer’s instructions (Roche, Indianapolis, IN). The dhtt exon 8 and exon 20 sequences were used to generate dhtt-specific in situ probes, which gave similar in situ results. The samples for in situ hybridization were analyzed with a Zeiss Axiophot 2 compound microscope. To generate the antibody against dHtt, a cDNA fragment corresponding to the N-terminal 459 amino acids of dHtt was cloned into the pGEX-4T1 vector; the corresponding GST-fusion protein was purified according to the manufacturer’s instructions (Promega) and used to produce polyclonal antibody in a rabbit (Covance). The antiserum was affinity-purified and used at a final dilution of 1:200. 
Primary antibodies were applied at 4°C overnight at the following dilutions: rabbit anti-GFP (1:1000, Molecular Probes); rabbit anti-fasciclin II (1:400) and anti-DIG (1:40,000), both generously provided by Mary Packard and Vivian Budnik (Budnik et al., 1996; Koh et al., 1999); rabbit anti-synaptotagmin (1:500) (Littleton et al., 1993); mouse anti-α-tubulin (1:10,000, Sigma); monoclonal mouse anti-fasciclin II (1D4, 1:20), anti-DIG (1:20), anti-CSP (1:20), anti-synapsin (1:50), anti-α-spectrin (1:20), anti-Armadillo (1:100), nc-82 (1:50) and anti-Futsch (22C10, 1:100) (all from Developmental Studies Hybridoma Bank) (Hummel et al., 2000; Roos et al., 2000); goat anti-HRP (1:500, Jackson Labs); Alexa488-, Alexa594- and Alexa647-conjugated secondary antibodies (all used at 1:500, Molecular Probes). Rhodamine Red X-conjugated goat anti-HRP and CY5-conjugated secondary antibodies were used at 1:200 (both from Jackson Labs). DAPI (0.2 µg/ml, Molecular Probes) and TRITC-conjugated phalloidin (10 ng/ml, Sigma) were applied in PBS-Tween (PBST) for 30 minutes to label nuclei and F-actin, respectively. Image analysis To quantify MBs, adult brains were prepared for imaging as described above. Dissected adult brains from both wild-type controls and dhtt-ko mutants were stained side by side in a 12-well plate with mouse anti-Fas II antibody (1D4, 1:20) overnight at 4°C. After washing six times with 1× PBT for 2 hours, samples were stained with Alexa-594-conjugated secondary antibodies (1:500) at RT for 2 hours, followed by extensive washing with 1× PBST for 2 hours at RT. To ensure that the stained samples had a similar background signal during imaging analysis, brains from both wild-type controls and dhtt-ko mutants were mounted onto opposite ends of the same slide. The samples were photographed at 20× magnification, under the same optical parameters, using a Zeiss fluorescent microscope (Axioskop 2 Mot Plus). 
The boundaries of MBs were traced manually using AxioVision Rel 4.5 software (Zeiss), and both the overall size of each MB and the total intensity of FasII staining signals within the MB boundary were computed using the same software. FasII staining signals from outside the MB boundary were also measured and used as a reference to subtract out the background signals. Since we could not confidently measure the thickness of the MBs in these samples, only the overall area covered by each MB was measured. The average FasII signal intensity in each MB was calculated by dividing the ‘total intensity of FasII staining signals within the MB boundary’ by ‘the overall size of the MB’. To calculate the relative signal intensity, the average signal intensity of FasII signals from all wild-type brains was calculated and its mean was set as a reference point of 100. To measure the overall brain size and the regions that were devoid of neurons in the brain, adult flies were dissected and stained with rat anti-Elav antibody (1:40), which labels all neuronal cells. The boundary of the whole protocerebrum in the brain and the regions devoid of neurons were tracked manually and quantified with AxioVision Rel 4.5 software, as described above. To quantify the size of the A307-positive axonal terminals, adult brains from both wild-type controls and dhtt-ko mutants were dissected, fixed and stained side by side with an anti-GFP antibody, as described above, then mounted on the two sides of the same slide to ensure that they had a similar background for imaging analysis. Both the wild-type control and dhtt-ko mutant samples were imaged by confocal microscopy at 40× magnification under the same optical parameters. A Z-serial section of confocal images covering the entire depth of each axonal terminus was collected using a Leica TCS confocal microscope and then projected into one merged image (using the Leica software) to generate the whole picture of each axonal terminus. 
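The FasII quantification described above (total signal within the traced MB boundary divided by MB area, background subtracted, then scaled so that the wild-type mean equals 100) can be sketched in Python as follows. This is a minimal illustration and not the AxioVision procedure itself: the function names and example numbers are hypothetical, and the sketch assumes the background is expressed as signal per unit area and subtracted after dividing by area, a detail the text does not specify.

```python
from statistics import mean


def average_intensity(total_signal, area, background_per_area=0.0):
    """Background-subtracted mean FasII signal per unit area of one MB.

    Assumption: background is measured outside the MB boundary and
    expressed per unit area before subtraction.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    return total_signal / area - background_per_area


def relative_intensities(samples, reference):
    """Scale values so that the mean of the reference group equals 100."""
    ref_mean = mean(reference)
    return [100.0 * value / ref_mean for value in samples]


# Hypothetical measurements: (total signal, area) per mushroom body.
wild_type = [average_intensity(t, a, 20.0) for t, a in [(1000, 10), (1200, 10)]]
mutant = [average_intensity(t, a, 20.0) for t, a in [(700, 10), (800, 10)]]

wt_relative = relative_intensities(wild_type, wild_type)
mutant_relative = relative_intensities(mutant, wild_type)
print(mean(wt_relative), mean(mutant_relative))
```

Normalizing to the wild-type mean makes intensities comparable across slides, which is why controls and mutants were mounted on the same slide and imaged under identical settings.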
Since the branches and boutons in the brain were too small to distinguish clearly, only the overall area covered by each axonal terminus was measured. Fluorescent images were analyzed and captured by fluorescent microscopy (Axioskop 2 Mot Plus, Zeiss) or by confocal microscopy (Leica TCS SP2 AOBS system). Confocal images were analyzed and projected using Leica confocal software (LCS). Viability assay Drosophila viability was measured by placing 30 newly hatched female flies of each genotype into individual vials, containing fly food, at 25°C. The number of dead flies was recorded daily. Flies were transferred into a new food vial every 3–4 days to prevent them from sticking to old food. For each genotype, at least two independent cohorts of flies, raised at different times from independent crosses, were tested and the results were averaged. Mobility tests of adult flies and videos Female flies were chosen in all the mobility tests. Fly videos were captured using a Sony digital camcorder (DCR-TRV140) and edited in Apple’s iMovie program. Climbing assays were performed as described previously (Ganetzky and Flanagan, 1978; Le Bourg and Lints, 1992). Briefly, 20 flies of a specified age were knocked to the bottom of a plastic vial. The number of flies that could climb to the top of the vial after 18 seconds was counted. The test was repeated 6–7 times for each genotype at the specified age. The assay was performed as described previously (Feany and Bender, 2000; Joiner and Griffith, 1999; Wang et al., 2004). For each genotype, 6–14 flies of a specific age were tested. Flies were fed 30 mM of methyl viologen (Sigma) in instant Drosophila medium (Carolina), providing a dosage of paraquat that kills about 50% of wild-type flies after 48 hours of exposure. Control flies were fed instant medium only. Adult flies that had eclosed within the last 24 hours were kept in 30 mM paraquat or in drug-free medium for 48 hours after eclosion. 
The number of survivors was counted at 48 hours after the start of paraquat treatment. An electrophysiological analysis of wandering-stage third instar larvae was performed in Drosophila HL3.1 saline (NaCl, 70 mM; KCl, 5 mM; MgCl2, 4 mM; CaCl2, 0.2 mM; NaHCO3, 10 mM; trehalose, 5 mM; sucrose, 115 mM; HEPES-NaOH, 5 mM; pH 7.2) using an Axoclamp 2B amplifier (Axon Instruments) at 22°C. Recordings were performed at muscle fiber 6/7 of segments A3 to A5 under current clamp. PPF was measured by determining the ratio of the peak amplitude responses (P2/P1) to two stimuli separated by the indicated latency. All error bars are s.e.m. In adults, extracellular field potentials were recorded by placing a sharp glass electrode near the longitudinal flight muscles after piercing the cuticle, with a reference electrode placed in the fly head. ERGs were performed as described previously (Rieckhof et al., 2003). Temperature shifts were performed by heating mounting clay, which encompassed the fly, to the desired temperature with a Peltier heating device. For measurements of evoked EJP amplitude (Fig. 6A,B), the numbers of NMJs examined were: control, n=8; dhtt-ko, n=23. Voltage traces of evoked EJPs were recorded from muscle fiber 6 in third instar larvae in 0.2 mM extracellular calcium. Average resting potential: 60.2±1.2 mV in control animals and 62.4±0.8 mV in dhtt mutants. Average EJP amplitude: 19.5±1.2 mV in control animals and 17.8±1.0 mV in dhtt mutants. Measurements of voltage traces of PPF (Fig. 6C) were performed at 25 millisecond intervals in control (rescued) and dhtt mutant third instar larvae in 0.2 mM extracellular calcium. Quantification of PPF (the amplitude of EJP 2 divided by the amplitude of EJP 1) was performed in control and dhtt mutants for 25-, 50- and 75-millisecond intervals (Fig. 6D). The numbers of preparations analyzed were: control, n=9; dhtt-ko, n=8. 
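The paired-pulse facilitation (PPF) measure and the s.e.m. error bars described above can be illustrated with a short Python sketch. It assumes only what the text states, namely that PPF is the amplitude of EJP 2 divided by the amplitude of EJP 1 and that errors are reported as the standard error of the mean; the function names and the amplitude values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def ppf_ratio(ejp1_mv, ejp2_mv):
    """Paired-pulse facilitation: second EJP amplitude over the first."""
    if ejp1_mv <= 0:
        raise ValueError("first EJP amplitude must be positive")
    return ejp2_mv / ejp1_mv


def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return stdev(values) / sqrt(len(values))


# Hypothetical (EJP1, EJP2) amplitude pairs in mV, one pair per preparation.
recordings = [(10.0, 15.0), (12.0, 16.8), (9.0, 12.6)]
ratios = [ppf_ratio(p1, p2) for p1, p2 in recordings]
print(f"PPF = {mean(ratios):.2f} +/- {sem(ratios):.2f} (n={len(ratios)})")
```

A ratio above 1 indicates facilitation; comparing the mean ratio between genotypes at each inter-stimulus interval mirrors the comparison summarized for Fig. 6D.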
The GF flight circuit can be activated by stimulation of the brain, and extracellular recordings can be made from the DLMs. Neither the control nor the dhtt-ko mutants displayed abnormal activity in the DLM flight muscles (data not shown). To measure ERGs at the elevated temperature (37°C), flies were rapidly heated from 20°C to 37°C, with test light pulses (black bar below trace in Fig. 6E) given at regular intervals. A total of ten dhtt-ko mutants, aged 1–3 days, were tested and all showed normal ERGs at 20°C and 37°C (Fig. 6E). The number of preparations for the 40–45-day-old animals analyzed was: control, n=10; dhtt-ko, n=16. Evidence of a tick RNAi pathway by comparative genomics and reverse genetics screen of targets with known loss-of-function phenotypes in Drosophila Abstract Background The Arthropods are a diverse group of organisms including Chelicerata (ticks, mites, spiders), Crustacea (crabs, shrimps), and Insecta (flies, mosquitoes, beetles, silkworm). The cattle tick, Rhipicephalus (Boophilus) microplus, is an economically significant ectoparasite of cattle affecting cattle industries worldwide. With the availability of sequence reads from the first Chelicerate genome project (the Ixodes scapularis tick) and extensive R. microplus ESTs, we investigated evidence for putative RNAi proteins and studied RNA interference in tick cell cultures and adult female ticks targeting Drosophila homologues with known cell viability phenotypes. Results We screened 13,643 R. microplus ESTs and I. scapularis genome reads to identify RNAi-related proteins in ticks. Our analysis identified 31 RNAi proteins including a putative tick Dicer, RISC-associated proteins (Ago-2 and FMRp), RNA-dependent RNA polymerase (EGO-1) and 23 homologues implicated in dsRNA uptake and processing. We selected 10 R. microplus ESTs with >80% similarity to D. melanogaster proteins associated with cell viability for RNAi functional screens in both BME26 R. 
microplus embryonic cells and female ticks in vivo. Only genes associated with proteasomes had an effect on cell viability in vitro. In vivo RNAi showed that 9 genes had significant effects, either causing lethality or impairing egg laying. Conclusion We have identified key RNAi-related proteins in ticks which, along with our loss-of-function studies, support a functional RNAi pathway in R. microplus. Our preliminary studies indicate that tick RNAi pathways may differ from those of other Arthropods such as insects. Background The understanding of gene function in a poorly studied Arthropod such as the cattle tick Rhipicephalus (Boophilus) microplus (subphylum Chelicerata: order Acari: suborder Ixodida) can benefit from the knowledge generated by genome-wide resources of the model insect Drosophila melanogaster (subphylum Mandibulata: order Hexapoda: suborder Insecta). The genome of the fruit fly D. melanogaster was among the first eukaryotic genomes to be sequenced and assembled [1]. D. melanogaster and R. microplus evolved from a common ancestor ca. 500 million years ago [2]. In comparison to the existing comprehensive genome resources for the fruit fly D. melanogaster, the cattle tick genome resources are limited to approximately 45,000 EST sequences [3]. In addition, the tick genome size of 7.1 Gbp [2], compared to the D. melanogaster genome of 139 Mbp [4], will likely delay the generation of a complete R. microplus genome sequence [5]. A genome project for the related tick species, Ixodes scapularis, with an estimated genome size of 2.1 Gbp, is currently underway [6]. Although many invertebrate genomes have been completed, including worms, nematodes, beetle, wasp, honey bee, flies, and mosquitoes (http://www.genome.gov/), I. scapularis will be the first Chelicerate:Arachnida genome sequence available representing mites, ticks, scorpions and spiders. 
Among the many methods available for reverse genetic studies, RNA interference (RNAi) has gained popularity because of its demonstrated efficient post-transcriptional gene-silencing effects in plants, fungi, nematodes, flies and cultured mammalian cells (reviewed by [7-11]). RNA-mediated gene silencing is a widely conserved mechanism in eukaryotes and can be categorized into two partially overlapping pathways, the RNAi pathway and the microRNA (miRNA) pathway. The RNAi pathway is triggered by exogenous or endogenous dsRNAs that are recognized by Dicer RNase III proteins which 'dice' these molecules into double-stranded small interfering RNAs (siRNAs) of 21–23 nt in length [12]. A typical eukaryotic Dicer consists of 2 helicase domains, a PAZ domain, 2 RNase domains and a dsRNA-binding domain (dsRBD) [12,13]; however, some variations in this domain structure have been noted for insect Dicers [14]. D. melanogaster has 2 Dicer enzymes, Dcr-1 and Dcr-2, which are responsible for miRNA and siRNA production, respectively [15]. By contrast, most other animals contain a single Dicer that generates both siRNAs and miRNAs. The next phase in the RNAi pathway involves the loading of siRNAs into RNA-induced silencing complexes (RISCs). dsRNA binding motif proteins (dsRBM), such as D. melanogaster R2D2 and Caenorhabditis elegans Rde-4, help siRNAs to be loaded properly into silencing complexes [16,17]. Using the siRNAs as a guide, RISCs find target mRNAs and cleave them. Argonaute (Ago) family proteins are the main components of silencing complexes, mediating target recognition and silencing [18-20]. Most organisms have multiple members of the Ago protein family; for example, both insect species D. melanogaster and Tribolium castaneum (beetle) have 5, whereas C. elegans (nematode) has 27 [14,21-25]. In Drosophila, Ago-1 and Ago-2 are known to be associated with RISC [21]. In C. 
elegans, the primary siRNAs processed by Dicer can also trigger the amplification of siRNAs through an RNA-dependent RNA polymerase (RdRP) to produce secondary dsRNAs in a two-step mechanism involving secondary Argonaute proteins [26-28]. This mechanism has not been demonstrated in other animals to date and is commonly found in plants rather than animals. An additional phenomenon identified in plants and C. elegans is the systemic spread of RNAi from cell to cell throughout the organism and its potential systemic transfer to subsequent generations through the germ-line [29-31]. Proteins related to this phenomenon in C. elegans include Sid-1, a multi-transmembrane domain protein thought to act as a channel for dsRNA uptake, and RNAi spreading defective proteins (Rsd-2, Rsd-3 and Rsd-6) shown to be required for the systemic RNAi response [32,33]. Originally, systemic RNAi was thought to be unique to C. elegans among animals; however, preliminary evidence suggests that silkworm, honeybee, wasp and beetle utilize a Sid-1-like (sil) protein not found in mosquitoes or flies (reviewed by Tomoyasu et al [14]). Furthermore, over 20 genes identified as necessary for dsRNA uptake in Drosophila cultured cells have also been identified in other insect species [14,34-36]. The specific mechanisms associated with dsRNA uptake and systemic RNAi in Arthropods including some insect species are thus currently undefined. Of the above described proteins associated with RNAi pathways, only one RNAi tick protein has been identified to date, a putative R. microplus Ago-2 [37]. RNAi pathways in Arthropods other than fruit flies and mosquitoes are beginning to demonstrate that there are evolutionary variations in these pathways with a higher level of divergence within the Arthropoda than previously thought [14,38-41]. Similarly, long dsRNAs have been successfully applied in R. microplus [42] and other tick species (e.g. Amblyomma, Ixodes, Haemaphysalis, and Dermacentor spp.) 
for targeted gene knockdown to demonstrate the function of tick-specific genes in various tick life stages, with some studies producing evidence of systemic RNAi spread into subsequent stages (reviewed by de la Fuente et al [37]). With the advent of increasing Arthropod genome resources, it may be feasible to identify more putative tick homologues of essential RNAi pathway-associated proteins to better elucidate the tick RNAi mechanism. Improving the understanding of the mechanisms used by ticks for gene knockdown will assist in developing specific tick RNA interference reagents and improved techniques for gene functional studies. In this study we provide evidence for the presence of RNAi pathway-associated proteins in R. microplus ESTs and I. scapularis genome reads, including tick homologues of Dicer, Argonaute proteins, RdRP and proteins associated with dsRNA uptake and processing, and thus propose a putative tick RNAi pathway. We then determined whether targeting genes in the cattle tick which are homologous to D. melanogaster genes with known RNAi in vitro phenotypes [43] would similarly result in abnormal phenotypes. We identified 10 candidate genes and conducted in vitro and in vivo (female tick injections) RNAi loss-of-function assays. Interestingly, only proteasomal genes impaired tick cell viability in vitro, whilst 9 candidates impaired tick egg and larval development in vivo. Results Evidence of putative RNAi pathway in R. microplus A Dicer homologue was not confirmed in R. microplus; however, conserved domains commonly found in Dicer proteins of higher eukaryotes were identified in the R. microplus BmiGI2 EST database ([3]; summarized in Additional File 1). A single R. microplus EST sequence (TC9337) was identified as containing an ORF of 250 amino acids (aa) encoding a putative RNase III (Pfam:PF00636) domain. The pairwise alignment of the ORF with the amino acid sequence of C. 
elegans Dcr-1 [GenBank:NP_498761] showed 24% identity and an e-value of 8e-12 (Additional File 1). A Dicer tick homologue with the expected domain structure for a eukaryotic Dicer was identified in I. scapularis in a recently assembled supercontig [GenBank: DS643033] from the Ixodes Genome Project (IGP) [6]. This supercontig represents a 350 kb region of the I. scapularis tick genome and we identified a 22.3 kb genomic region containing a single gene that has 14 exons (the annotation described here has been submitted to the IGP). The predicted I. scapularis Dicer protein is 1799aa long and has 31% similarity to the predicted Dicer-1 isoform 4 from the dog Canis lupus familiaris [GenBank:XP_868526]. Furthermore, Figure 1a shows that the predicted I. scapularis Dcr-1 homologue has the same domain composition as its counterparts in D. melanogaster and C. elegans. The identified I. scapularis Dicer homologue clusters with the Dicer protein from the bovine Bos taurus (Figure 1b). The schematic domain structure of Dicer proteins. (a) Comparison of the conserved domain structures of D. melanogaster Dicer-1 and Dicer-2, C. elegans DCR-1 and our predicted I. scapularis Dicer-1 protein. Names and IDs of the conserved domains are given as stored in the Pfam database. * = The Pfam search did not detect a signal for this domain in the sequence of the Dicer-1 protein of D. melanogaster. (b) Phylogenetic analysis of full-length Dicer proteins (Bos = B. taurus, Cele = C. elegans, Dmel = D. melanogaster, Iscap = I. scapularis, Tcas = T. castaneum). The analysis of R. microplus ESTs and I. scapularis genome reads identified putative tick homologues of D. melanogaster Argonaute-1 and 2 proteins in both species (Table 1, Figures 2 and 3). Putative tick RNAi candidate homologues Domain structure and phylogenetic tree of tick Argonaute-1 proteins. (a) Schematic structure of the Argonaute-1 proteins from D. melanogaster and our predictions of the structures of the I. scapularis and R. 
microplus Argonaute-1 homologues. (b) Phylogenetic analysis of Argonaute-1 proteins. (Bos = B. taurus, Cat = R. microplus, Cele = C. elegans, Dmel = D. melanogaster, Iscap = I. scapularis, Tcas = T. castaneum). Domain structure and phylogenetic tree of tick Argonaute-2 proteins. (a) The structure of the predicted Argonaute-2 proteins from I. scapularis and R. microplus in comparison to the structure of Argonaute-2 in D. melanogaster. The predicted structural property of both tick Argonaute-2 candidates is similar to the structure of the fruit fly Argonaute-2 protein. No R. microplus ORF with a Piwi domain similar to Argonaute-2 Piwi was identified. (b) Phylogenetic analysis of Argonaute-2 proteins (Bos = B. taurus, Cat = R. microplus, Cele = C. elegans, Dmel = D. melanogaster, Iscap = I. scapularis, Tcas = T. castaneum). Figure 2a summarizes the domain structure of the identified tick Ago-1 proteins. The cattle tick Argonaute-1 protein (Cat-Ago-1) is partially encoded by two ESTs (Figure 2a). TC13769 encodes an ORF of 352aa containing the DUF1785 domain located from aa 162 to 213 and a PAZ domain from aa 226 to 350. A pairwise alignment using blastp showed 43% identity with the respective domains of D. melanogaster Argonaute-1 protein. Another EST, TC6448 encodes the putative Piwi domain of the Cat-Ago-1 protein, which has 46% identity with the Piwi domain of D. melanogaster Ago-1. The Piwi domain encoded in TC6448 is located from aa 120 to 430. The I. scapularis contig ABJB010128003.1 (Iscap-Ago-1) encodes an ORF of 967aa containing all three known domains of Ago-1 proteins (Figure 2a). The PAZ domain of Iscap-Ago-1 has 41% and 47% similarity with the PAZ domains of Dmel-Ago-1 and Cat-Ago-1, respectively. Interestingly, the Piwi domain of Iscap-Ago-1 shows a higher sequence similarity, being 52% and 57% identical to its counterpart in Drosophila and cattle tick. 
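The percent identity values quoted for the domain comparisons above come from pairwise alignments. As a rough illustration of how such a figure is derived from an already aligned pair of sequences, consider the Python sketch below. It is not the blastp implementation: the function name and the toy sequences are hypothetical, and it assumes identity is counted as matching columns over the full alignment length, with gap columns counting as mismatches (definitions vary between tools).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: the denominator is the alignment length, so columns
    containing a gap count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        a == b and a != gap
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical residues, not taken from the tick sequences).
identity = percent_identity("ACDE-G", "ACDFHG")
print(f"{identity:.1f}% identity")
```

In this toy case four of six alignment columns match. Because the denominator choice changes the result, identities reported by different tools or over different regions (whole ORF versus a single domain) are not directly comparable.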
Based on the multiple sequence alignments of the DUF1785 and PAZ of the Argonaute-1 proteins a clustering of I. scapularis and R. microplus is observed (Figure 2b). The second cluster consists of the sequences from the insect species T. castaneum and D. melanogaster. The sequences of the Argonaute-1 proteins from C. elegans and B. taurus form two separate outlying groups. Figure 3a summarizes the domain structure of the identified tick Ago-2 proteins. TC8091 represents a putative cattle tick Argonaute-2 protein (Cat-Ago-2) encoding an ORF of 269aa harboring the DUF1785 and PAZ domains located from aa 81 to 134 and aa 135 to 269 (27% identity), respectively. The R. microplus Ago-2 homologue TC984 (TC9244/TC16832, BmiGI2) identified by de la Fuente et al [37] was also confirmed in our search and appears to encode a Piwi domain. Interestingly, the pairwise alignment using blastp with both D. melanogaster Argonaute proteins showed an overall identity of 42% with Argonaute-1 and 39% with Argonaute-2. The I. scapularis contig ABJB010009424.1 (Iscap-Ago-2, Figure 3a) encodes an ORF of 896aa consisting of a DUF1785 domain from aa 304 to 357, a PAZ domain from aa 358 to 498 and a Piwi domain located in the region from aa 640 to 896. Sequence comparison of the putative Iscap-Ago-2 with the Dmel-Ago-1 ([GenBank:NP_725341.1], 27% identity) and Dmel-Ago-2 ([GenBank:NP_730054.1], 41% identity) proteins revealed a higher homology between Iscap-Ago-2 and Dmel-Ago-1, nevertheless the predicted domain structure of this putative I. scapularis Argonaute was more similar to Dmel-Ago-2 (Figure 3a). The phylogenetic tree of the Argonaute-2 proteins shows three clusters (Figure 3b). The I. scapularis and R. microplus sequences group together, the second cluster consists of the Argonaute-2 proteins from D. melanogaster and T. castaneum, and in the third group the sequences from C. elegans and B. taurus are clustered. Available R. microplus ESTs and I. 
scapularis genomic contigs were screened for homologous genes to C. elegans proteins involved in RNAi systemic spread (Rsd-2, Rsd-3, and Rsd-6) and dsRNA uptake (Epn-1) (Table 1). We identified putative hits to Rsd-3 and Epn-1 in both R. microplus and I. scapularis with 45% and 48% (Rsd-3) and 71% and 43% (Epn-1) identities, respectively. Screening for tick homologues against the 30 D. melanogaster proteins implicated in dsRNA uptake and processing [35] identified 14 and 16 homologues in R. microplus and I. scapularis, respectively, at varying levels of similarity for the 23 hits (22–91%). The highest similarity was observed with dsRNA uptake homologues associated with vesicle mediated transport, intracellular transport, oogenesis, endosome transport and ATPase for both tick species (Table 1). Similarity searches with the protein sequences of the putative RNA helicases Armitage and Rm62, involved in the assembly of RISC, resulted in best hits on R. microplus sequences TC9347 and TC14966 respectively but no homologues of Spindle E were found (Table 1). Searches using D. melanogaster protein sequences for FMRp and TudorSN returned best hits on the R. microplus ESTs BEAE145TR (53% identity) and BEAFW62TR (46% identity), respectively. It must be noted that only one SN domain was identified in the putative homologue which either indicates that it is not a true TudorSN homologue or that the consensus sequence is currently incomplete. However, a recent GenBank submission indicates the presence of a putative I. scapularis TudorSN identified simultaneously with this study ([GenBank:EEC18716.1] Ixodes scapularis Genome Project Consortium). The EGO-1 protein from C. elegans has RNA-directed RNA polymerase (RdRP) activity and is associated with the C. elegans transitive RNAi pathway by amplifying the trigger dsRNA and/or siRNAs [28]. The EST BEAEL55TR exhibited a 41% identity with the C. elegans RdRP – EGO-1 (Table 1). Nine putative I. 
scapularis RdRP accessions have been deposited into GenBank by the Ixodes scapularis Genome Project Consortium simultaneously with this study. A total of 4 of these I. scapularis RdRP sequences share conserved regions with the partial R. microplus RdRP and thus the new Accessions EEC04985.1, EEC05952.1, EEC12509.1 and EEC12909.1 were utilized for the I. scapularis RdRP sequences in the consensus tree presented in Figure 4. All 5 tick RdRPs demonstrate a close phylogenetic relationship with the partial R. microplus RdRP clustering with I. scapularis EEC12909.1. I. scapularis sequences EEC12509.1 and EEC04985.1, and EEC05952.1, form separate branches respectively. The RdRP proteins from C. elegans form another distinct cluster with the tick RdRPs branching between C. elegans and those from fungi, plants and protists (Figure 4). Apart from the Armitage homologue, all R. microplus hits associated with RISC components and RdRP above were confirmed in I. scapularis genome reads in this study (Table 1). Phylogenetic tree constructed from the multiple sequence alignment of the partial R. microplus RdRP domain (Cat-RdRP) and hypothetical I. scapularis RdRP proteins (Iscap) to RdRP sequences from selected plants (Nicotiana tabacum, Hordeum vulgare, Arabidopsis thaliana, Solanum lycopersicum), fungi (Schizosaccharomyces pombe, Neurospora crassa and Aspergillus fumigatus), protists (Tetrahymena thermophila and Dictyostelium discoideum) and the metazoan C. elegans (Cele-ego-1, Cele-rrf-1/3). The branch labels display the consensus support in %. Figure 5 shows a schematic diagram of a putative tick RNAi pathway. Putative proteins identified in R. microplus ESTs have been described using the 'Cat' (Cattle tick) prefix. dsRNA is taken up by tick cells and the RNAi effect spreads to subsequent tick stages by an unknown mechanism [42]. 
It is not yet confirmed whether a SID-1 or Sil-1 homologue exists in ticks; however, it is feasible that RdRP and associated proteins are involved in germ-line spread similar to the C. elegans RdRP pathway [28]. Here we postulate the potential amplification of both trigger dsRNA and secondary siRNAs through the involvement of a Cat-RdRP. A R. microplus Dicer was not identified, although a homologue was identified in the I. scapularis genome reads as described above. A definitive dsRNA binding protein (such as D. melanogaster R2D2 or C. elegans Rde-4) potentially associated with Dicer was not found using the current tick sequence resources. A tick Rde-1 was not identified; where present, Rde-1 has been associated with the RdRP pathway [25]. Dicer guides the siRNA to the RISC, the structure of which has been adapted from the diagram published by Sontheimer [44]. The RISC structure includes homologues for a cattle tick Drosophila Fragile X protein (Cat-FMRp), tick TudorSN (Ixodes scapularis Genome Project Consortium) and the Cat-Ago-2 described above [44-46]. Other proteins putatively associated with dsRNA uptake, systemic or germ-line RNAi listed in Table 1 were not included in this diagram.

Schematic representation of a putative tick RNAi pathway. Cattle tick homologues are indicated using a 'Cat' prefix for proteins where Rhipicephalus (Boophilus) microplus homologues are identified in this study (GenBank Accessions are listed in Additional File 6). The proposed activity of the Cat-RdRP (RNA dependent RNA polymerase, EGO1-like) is indicated as amplifying trigger dsRNA or cleaved siRNAs. Long dsRNAs are recruited to Dicer (putative tick Dicer identified in I. scapularis genome reads) via a yet to be identified dsRNA Binding Protein. The RNA-Induced-Silencing Complex (RISC) includes a Cat-Ago-2 (Argonaute-2 homologue), tick TudorSN (I. scapularis tudor-staphylococcal nuclease, GenBank EEC18716.1) and a Cat-FMRp (representing the D.
melanogaster orthologue of the fragile-X mental-retardation protein essential to RISC). Homologues for a tick RNA unwinding protein and a vasa intronic gene (associated with RISC) were not identified. The schematic diagram was partly adapted from Sontheimer 2005 [44] and was drawn using Solid Edge Version 20 (Siemens PLM Software, TX, USA).

Selection of R. microplus conserved homologues for RNAi gene silencing

To further validate a putative functional RNAi pathway in R. microplus we conducted RNAi-mediated loss-of-function assays in vitro and in vivo. We first selected tick RNAi targets based on their homology to Drosophila genes known to display an RNAi phenotype [43]. Of the 438 Drosophila genes known to affect growth and viability, 40 were identified in the I. scapularis genome reads and 37 in the R. microplus BmiGI2 database, with 31 hits common between the tick species (results not shown). These results were based on blastn searches with an e-value cut-off of <1e-10. To select the most conserved sequences for tick in vitro studies, we used high stringency searches (>80% identity, e-value <1e-50), which identified 11 R. microplus ESTs in the BmiGI2 database as homologous to D. melanogaster genes with RNAi phenotypes affecting growth and viability at z scores >3 [43] (Table 2). An additional 2 highly conserved homologues were selected as negative controls: one with a lower z score (Drosophila string of pearls) and one with a nil z score and thus a nil effect on cell culture growth and viability (Drosophila Tat-binding protein-1) (Table 2). The putative functions of these 13 R. microplus ESTs were then assigned by retrieving the annotated InterPro domains of their Drosophila counterparts (Table 2).
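The high stringency selection described above can be sketched as a filter over BLAST tabular output. This is only an illustration under stated assumptions: the input is assumed to be standard 12-column tabular BLAST output (percent identity in column 3, e-value in column 11), the identity cut-off is read as "at least 80%", and the function names and sequence identifiers are hypothetical rather than taken from the study.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str       # D. melanogaster query identifier
    subject: str     # tick EST/TC identifier
    identity: float  # percent identity reported by BLAST
    evalue: float    # expectation value reported by BLAST


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular BLAST lines into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def select_conserved(hits: Iterable[Hit],
                     min_identity: float = 80.0,
                     max_evalue: float = 1e-50) -> Dict[str, Hit]:
    """Keep, per query, the lowest e-value hit that passes both cut-offs."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity or hit.evalue >= max_evalue:
            continue  # fails the stringency thresholds
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return best
```

Keeping a single best hit per query is a design choice of this sketch; a screen that needs every qualifying EST would collect all passing hits instead.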
Evaluation of the assigned functional information revealed that 5 sequences putatively have a role in ribosome and protein synthesis (TC5762, TC9037, TC12306, TC12372, TC12393), 4 in proteasome and ubiquitinylation (TC6372, TC9852, TC10417, TC13930), 3 in DNA binding (TC6116, TC12182, TC9417), and one in energy and metabolism (TC5823). The controls used were the Drosophila string of pearls (TC5762) and Tat-binding protein-1 (TC13930) homologues respectively.

R. microplus homologues with high conservation (≥ 80% identity) to 13 D. melanogaster proteins following RNAi knockdown in vitro (11 associated with significant cell viability z scores at >3 and 2 controls at <3).

Although all primers for target amplification prior to RNA transcription were designed by targeting conserved consensus regions as described in the methods, amplification of TC5823 (energy and metabolism/ATP biosynthesis), TC9417 (DNA binding) and TC12372 (ribosome and protein synthesis) was inconsistent, with poor yields that were inadequate for RNA transcription (not shown). dsRNAs were transcribed successfully for the remaining 10 target genes (8 high z scores, 2 controls) used for RNAi experiments in cultured R. microplus BME26 cells and adult female ticks.

Gene silencing in cultured tick cells

None of the dsRNA treatments had significant effects on tick cell viability (Figure 6a) compared to controls and compared to z scores >3 as described for the same targets in Drosophila cells (Table 2, [43]). However, TC6372 (Ubiquitin-63E homologue) knockdown demonstrated the most severe effect on growth and viability (inverse z score 2.1) (Figure 6), also confirmed by microscopic examination of the cells (not shown). An additional 2 target genes (Rpt1 TC10417, and Tat-binding protein-1 TC13930) demonstrated a slight reduction in cell viability with z scores 0.8 and 1.0 respectively.
It is feasible that the TC13930 treatment may have had a stronger viability phenotype if the knockdown had been more effective (only 31% compared to other treatments ~>79%). Collectively the effects are less significant in tick cells than for their Drosophila counterparts in Drosophila cells, these 3 treatments are all associated with proteins involved in proteasome and ubiquitin function. Quantitative RT-PCR analysis confirmed that all RNAi targeted genes resulted in a substantial reduction of the corresponding target mRNA (79.9 – 100%) except for TC13930 at 31% (Figure 6b). Cell culture knockdown. (a). Growth and viability RNAi phenotypes expressed as inverse z-scores of genes involved in ribosome and protein synthesis (TC5762, TC9037, TC12306, TC12393), encoding proteasome components and participating in ubiquitinylation (TC6372, TC9852, TC10417, TC13930), and having DNA binding functions (TC6116, TC12182). A positive z-score indicates reduced cell growth and viability. (b). Effect of dsRNA-induced knockdown on RNAi targets measured by quantitative RT-PCR and presented as % of gene expression levels relative to the housekeeping gene. Gene silencing in adult female ticks (reproduction phenotype) The same treatments were tested in live adult female ticks to measure any in vivo effects of gene silencing on tick survivability, egg output and larval hatching. Eggs laid by ticks from the control groups showed no obvious morphological changes (see Figure 7a for eggs from "no treatment" group). The average egg mass weight was 0.118 g for the control dsRNA group, 0.134 for the tick actin dsRNA group, 0.107 g for the PBS injection control group and 0.128 g for the negative control group (nil injection). Eggs from the control groups showed a normal embryonic development time to larval hatching at 27 days. The larval hatching rates for control treatments ranged between 62.0–69.8% (Table 3). Differences in egg morphologies following treatment of R. 
microplus adult female ticks with dsRNA. (a) Eggs from untreated females approximately 15 days after laying. (b) Eggs from females treated with TC6372 (D. melanogaster Ubiquitin 63E-like transcript) dsRNA approximately 15 days after laying. (c) Eggs from females treated with dsRNA targeting TC12306 (D. melanogaster ribosomal protein L8-like transcript) approximately 15 days after egg laying.

Effect of tick in vivo dsRNA gene knockdown treatments on female tick survival and subsequent reproduction fecundity by targeting Drosophila homologues described in Table 2.

Ubiquitin-63E dsRNA treatment had the most significant effect on adult tick survival (average 10 days, approximately 5 days less than the controls, Table 3). dsRNA targeting genes associated with ribosome/protein synthesis (TC5762/string of pearls; TC12306/Rpl-8; TC9037/Ribosomal protein L11; TC9852/Proteasome 26S subunit; TC12393/Ribosomal protein S13) and proteasome/ubiquitin (TC6372/Ubiquitin-63E) had the most lethal effect on tick reproduction: eggs laid by injected R. microplus females showed deformed morphology and no larvae hatched (Table 3). RNAi targeting of TC12306 (Rpl-8) and TC6372 (Ubiquitin-63E) generated the greatest reductions in average egg output, with Ubiquitin-63E (TC6372) treated ticks again being significantly affected compared with the controls (Table 3). Figures 7b and 7c show examples of phenotypic effects on embryo development due to TC6372 (dehydrated in appearance, embryo not visible) and TC12306 (embryo smaller in size) knockdown respectively. Down-regulation of TC13930/Tat-binding protein-1, also associated with proteasome/ubiquitin function, impaired embryo development, leading to a poor larval hatching rate (0.4%).
In contrast, another gene associated with proteasome/ubiquitin (TC10417/rpt 1) had no effect on egg development and hatching rates compared with controls, although this target demonstrated a decrease in cell viability in the cell culture experiments above. The 2 ESTs associated with DNA binding (TC6116 and TC12182/histone H3.3A) both induced slower embryo development and reduced egg hatching (3.1% and 35.2% respectively). A single (sixth) tick was harvested to confirm the relative reduction of transcript levels (% knockdown determined by qRT-PCR) for both the adult ticks and eggs (where applicable) for each Drosophila homologue treatment. All treatments except TC9037 demonstrated high knockdown of transcripts in adult tick viscera (≥ 94%), with knockdown also confirmed in eggs tested (≥ 76%). As the viscera from only one tick per treatment were harvested for qRT-PCR, it is feasible that the TC9037 treatment on this single tick was not delivered successfully. However, none of the 5 ticks injected produced viable eggs, indicating a knockdown phenotype for the TC9037 treatment overall. Thus 9 of the 10 Drosophila tick homologues demonstrated either a lethal effect on reproduction or a reduction in larval hatching rates when using dsRNA targeted knockdown in vivo. The significant effects associated with the Ubiquitin-63E homologue TC6372 treatments correlate with the highest z score for the same target in Drosophila cell viability out of the sub-set of targets used here (Table 2, [43]). Data from the FlyBase website http://flybase.org/ indicated that dsRNA injection into Drosophila embryos in vivo was lethal for both Rpt1 and RpS13 (corresponding to TC10417 and TC12393 above). However, Rpt1 (TC10417) was the only treatment which did not have an effect in vivo for ticks. Fly in vivo knockdown studies for the remaining 8 targets were not found.
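The percentage knockdown values reported above are derived from qRT-PCR measurements normalised to a housekeeping gene. The calculation is not spelled out in this section, so the following is a minimal sketch that assumes the widely used comparative Ct (2^-ΔΔCt) approach with 100% amplification efficiency. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold expression of the target in treated vs control samples (2^-ddCt).

    Assumes both assays amplify with 100% efficiency (a doubling per cycle).
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalise to housekeeping gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


def percent_knockdown(ct_target_treated: float, ct_ref_treated: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Reduction of target transcript relative to the control, in percent."""
    fold = relative_expression(ct_target_treated, ct_ref_treated,
                               ct_target_control, ct_ref_control)
    return (1.0 - fold) * 100.0


# Hypothetical Ct values: the target appears 3 cycles later after dsRNA treatment
# while the housekeeping gene is unchanged, i.e. one eighth of the transcript remains.
example = percent_knockdown(27.0, 20.0, 24.0, 20.0)
```

With these illustrative inputs the remaining expression is 12.5% of the control, which corresponds to a knockdown of 87.5%.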
Discussion

The lack of tick genome sequence resources has limited the ability to mine for RNAi protein homologues; however, research to date has suggested that ticks utilize a dsRNA-mediated RNAi similar to that described in insects such as flies and mosquitoes [47,48]. De la Fuente and colleagues [37] postulated a model for tick dsRNA-mediated RNAi following the identification of a putative Ago-2 protein in the R. microplus EST database. Our results support the diagram presented in de la Fuente et al [37], demonstrating evidence for putative tick Dicer, RISC associated proteins and dsRNA uptake homologues. In addition, we identified a R. microplus homologue of EGO-1, a protein implicated in RNA-directed RNA polymerase activity that was previously not identified in animals other than C. elegans. We also identified a Cat-Ago-2 at higher similarity than the tick homologue identified by de la Fuente et al [37], which exhibited higher similarity to the Argonaute-1 protein of D. melanogaster in this study. This is the first comprehensive analysis of RNAi sequence domains for a tick species and for the Chelicerate Arthropods. Compared to the vast insect genome resources (flies, mosquitoes, beetle, silkworm and wasp, to name a few), there is currently only one Chelicerate genome available, with the I. scapularis tick genome project nearing completion. To provide an evolutionary perspective on relationships within the Ecdysozoan infraphylum, it is important to note that their common ancestors may have existed over 1 billion years ago [49]. Comparative genomics between these phyla is in its infancy, and descriptions of pathways such as RNA interference and gene regulation have to date been based on the fruit fly D. melanogaster as the model organism. As differences in RNAi pathway mechanisms between C. elegans and D. melanogaster are evident, it is feasible that Chelicerates could also vary from insects, although their evolutionary distance is smaller (~500 million years) [2].
Indeed, definitive hits for the domains and proteins here were not exclusive to the Arthropoda, with the putative tick RNAi proteins matching homologues in diverse species such as insects (beetle, silkworm, wasp), nematodes, and mammals (data not shown). While only a single Dicer protein is present in mammals and in C. elegans, in D. melanogaster siRNAs and miRNAs are produced by distinct Dicer enzymes [15]. In this study we identified one putative tick Dicer in the I. scapularis genome reads; however, evidence for more than one Dicer cannot at this stage be confirmed. This I. scapularis Dicer was found to be most similar to a Dicer-1 from a mammal. A R. microplus Dicer could not be confirmed; however, an I. scapularis homologue was also not previously identified using EST data alone. Our preliminary evidence points to a 'single' tick Dicer with as yet unconfirmed structure, though until complete tick genome resources are available the presence of more than one Dicer cannot be entirely dismissed. The RISC structure contains the following essential proteins: D. melanogaster R2D2 or C. elegans Rde-4 [17], the D. melanogaster homologue of the Fragile X mental retardation protein (FMRP) dFXR [45], Vasa Intronic gene (VIG) and a Tudor Staphylococcal nuclease [46,50]. We were able to confirm the presence of putative tick FMRp but found no significant hits for TudorSN or VIG homologues and no significant similarity to known RNA binding proteins using the current tick resources. However, a concurrent study has identified a putative I. scapularis TudorSN (GenBank, Ixodes scapularis Genome Project Consortium) yet to be confirmed in R. microplus. The Argonaute (Ago) family of proteins contains 2 distinct RNA-binding domains, PAZ and PIWI (PPD), required to bind the siRNA and to slice the cognate RNA to be degraded, respectively, and thus is essential to RISC [51,52]. Our study confirmed the presence of tick Ago-1 and Ago-2 in the I.
scapularis genome reads and found evidence for a complete Ago-1 protein in the R. microplus EST database. A R. microplus sequence containing a partial Ago-2 protein was also identified. The functions of these tick Argonaute proteins remains to be confirmed and further research is required to identify the full complement of tick PPD proteins. Flies and mosquitoes do not possess C. elegans Sid-1 homologues known to be responsible for systemic and germ-line RNAi. Tick RNAi observed in this study and the literature demonstrate that a systemic RNAi silencing mechanism is active in ticks [42]. Although a tick Sid-1 was not found, we did however identify a tick homologue of the C. elegans EGO-1, an RNA dependent RNA polymerase (RdRP) known to amplify trigger dsRNA (transitive RNAi) and systemic RNAi [27,53]. RdRP is otherwise absent in flies, mosquitoes and other animals. Perhaps an RdRP-based RNAi amplification mechanism within the Ecdysozoans (including ticks and C. elegans) is common, but lost in insect species? Mechanisms for cell to cell dsRNA uptake within ticks requires further investigation, as well as the confirmation of the activity of the tick RdRP and Rsd-3 homologues identified here. Further research to identify putative tick secondary Argonautes associated with the transitive RNAi pathway in C. elegans is also warranted. These mechanisms have not been studied in spiders, mites or ticks to date, thus confirmation of RNAi mechanisms within the Cheliceromorpha will assist to confirm potentially new evolutionary mechanisms previously not defined and which cannot be based on pathways observed in insect species. An additional aim of our study was to investigate whether fruit fly RNAi screens of conserved genes could be associated with similar tick phenotypes and tick gene function. 
We used a stringent search to enable the selection of the most similar sequences to maximize the probability of selecting a tick sequence which following dsRNA mediated knockdown could also affect growth and viability in vitro. With the exception of proteasome/ubiquitin protein homologues, the RNAi experiments with cultured R. microplus BME26 cells did not replicate the effects observed by Boutros and co-workers in D. melanogaster cells [43] for all targets. However, in vivo knockdown confirmed a lethal effect for 6 of the 10 targets, with only one demonstrating nil effects on tick reproduction. The ubiquitin-63E homologue which demonstrated the highest z score and impact on Drosophila cell viability exhibited the strongest effects on viability in our tick study both in vitro and in vivo. However the effects on tick cell growth and viability from the remaining 9 (including 2 negative controls) dsRNA targets tested did not correlate well with Drosophila demonstrating poor statistical significance at least under our in vitro conditions. Kurtti et al [54] found that cationic lipid-based reagents greatly improved the transfection of I. scapularis cultured tick cells as well as subsequent silencing of transgenes by dsRNAi. Perhaps uptake of nucleic acids by cultured tick cells is less efficient than with Drosophila cells. In addition, although tick genome resources are currently incomplete, we did not identify tick homologues of Scavenger receptors (Eater and SrCI) known to be required for dsRNA uptake in Drosophila cell culture [36]. This suggests the recruitment of different receptors for dsRNA uptake in tick cells compared to those described in Drosophila. It is also possible that the cell line types utilized in D. melanogaster and R. microplus are not directly comparable and functionally different. 
Boutros and co-workers [43] compared the effects of targeting the 438 genes in 2 embryonic cell lines: Kc167, which is an 'early' embryonic cell line, and S2R+, which is a 'late' embryonic cell line and potentially more comparable to BME26 cells, which are 'late' embryonic in origin [55-57]. However, the BME26 average cell size is smaller at 15–20 µm compared to S2R+ cells at 50 µm [56,58]. The BME26 cell line also has a doubling time of 7 days, considerably slower than the Drosophila cell lines. Our in vivo studies were more convincing, demonstrating lethal and inhibitory effects on tick reproduction for 9 (including the 2 controls) of the 10 targets. In vivo studies in C. elegans showed that 47% of the C. elegans orthologues of the 438 genes associated with the Drosophila RNAi phenotypes exhibited developmental phenotypes [43,59]. Perhaps the fact that we chose highly conserved homologues increased the probability of success in our in vivo experiments compared with C. elegans. However, it is clear that tick in vitro RNAi analysis using BME26 cannot be directly correlated to available Drosophila in vitro data, unless perhaps only target genes with higher z scores (as demonstrated here for ubiquitin-63E) are studied to increase the probability of a phenotype correlation. The challenges encountered during the initial PCR amplification of tick template DNA (results not shown) prompted re-design of conserved primer sets for targets amplified and transcribed in this study. Other tick dsRNA studies have used cDNA clones [42] as templates for amplification and subsequent transcription, suggesting that tick genomic DNA templates may not be amenable to the high throughput gene amplification required for RNAi functional screens. The tick genome is large (7.1 Gbp) with a high ratio of repetitive and exonic sequences [2], as also seen here in the I. scapularis putative Dicer genomic sequence structure with 14 exons.
The presence of complex intronic/exonic structure can inhibit satisfactory PCR amplification of gDNA, possibly due to poor primer binding. This was mostly overcome in this study by improving primers through targeting conserved ORFs across several arthropod species; however, amplification was not always consistent (not shown). It may be feasible to develop short interfering RNA treatments, which would be simpler to prepare than long dsRNA treatments for difficult templates such as the tick. To date siRNAs have not been applied in R. microplus loss-of-function assays. Long dsRNA gene silencing can also lead to off target effects and false positive RNAi phenotypes [60,61]. Until complete annotated tick genome resources are available, false positive knockdown resulting from long dsRNA treatments and the specificity of tick RNAi reagents (siRNAs) cannot be confirmed.

Conclusion

We utilized the existing R. microplus BmiGI2 database (13,643 ESTs) and the I. scapularis genome reads to identify 31 putative tick RNAi proteins, which confirmed the presence of putative Dicer, RISC associated, dsRNA uptake and RdRP proteins in ticks, and constructed a putative tick RNAi pathway. Apart from proteasome/ubiquitinylation homologues, it was not feasible to replicate D. melanogaster embryonic cell culture RNAi functional data in R. microplus BME26 embryonic cells. This could be attributed to transfection/uptake issues and/or a difference in cell types in the fly and tick embryonic cell lines. We did demonstrate a correlating in vivo effect on embryogenesis for 9 of the 10 D. melanogaster tick homologues. The findings in this manuscript suggest that the Chelicerates may not be amenable to modeling based on insect pathways (Subphylum Mandibulata), as might be expected for Arthropods.
With the evidence of a tick RdRP and the propensity for systemic or germ-line RNAi, it will be better to compare gene function and RNAi pathways between members of the Arachnida and the Superphylum Nemathelminthes (C. elegans). Until more tick and related genomes (mites and spiders) are available, such comparative studies within these Subphyla are not feasible. Clearly the RNAi pathways warrant further elucidation, and tick specific genome and functional data will be beneficial for tick research and for the development of improved tick control measures. Methods Sources of input sequence data 13,643 ESTs (9,403 Tentative Consensus/TC and 4,240 singleton) sequences for R. microplus were obtained from the Boophilus microplus Gene Index (BmiGI) at http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=b_microplus (last accessed: 17/6/2008)[3]. I. scapularis (black legged tick) genome project (IGP) data was accessed through http://www.ncbi.nlm.nih.gov/sites/entrez? Db=genomeprj&cmd=ShowDetailView&TermToSearch=16233 (last accessed: 13/6/2008); and 38,276 I. scapularis EST sequences were obtained from the Ixodes scapularis Gene Index (ISGI) at http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=i_scapularis (last accessed: 17/6/2008). All other nucleotide and amino acid sequences were obtained from the Entrez nucleotide and protein databases http://www.ncbi.nlm.nih.gov/sites/entrez. Identification of conserved genes in R. microplus Key RNAi pathway-associated proteins from D. melanogaster and C. elegans described previously [28,32,35,36,45,46,50,73,74] were screened against the available tick ESTs (BmiGI2) [3] and I. scapularis genome contig reads obtained from the NCBI whole genome project database (project ID 16233) using BLAST [62]. For the BLAST searches an initial e-value of <1e-05 was set as a threshold. The best hits from the R. microplus and I. 
scapularis sequences were then used as query sequences in a second round of BLAST searches against the D. melanogaster and C. elegans subsets of the NCBI non-redundant protein database. Results of this reciprocal BLAST search validated the sequence similarity between the key RNAi pathway-associated proteins from D. melanogaster and C. elegans and the two ixodid tick species; sequences which did not return the corresponding RNAi protein were subsequently disregarded. Further confirmation was obtained by performing searches against the InterPro database using InterProScan (data not shown) [63]. All searches were performed with the BLAST default settings. Specific approaches for Dicer, Argonautes and RdRP homologues are described below. Multiple sequence alignments for the domains typical for proteins of the Dicer family were retrieved from the Pfam website. Specifically these were the alignments for the Helicase conserved C terminal domain (Pfam:PF00271), double-stranded RNA binding domain (Pfam:PF03368), PAZ domain (Pfam:PF02170), RNase3 domain (Pfam:PF00636), and the double-stranded RNA binding motif (Pfam:PF00035). Hidden Markov Models (HMMs) were constructed locally using hmmbuild of the HMMER2 package [64] with default settings. The programs estwisedb and genewisedb of the Wise2 package [65] were used to perform searches with each HMM as a query in a local copy of the BmiGI2 database and the I. scapularis sequences obtained from NCBI whole genome sequencing projects. The best hit sequences from these searches were retrieved from the respective databases and a conceptual translation of encoded open reading frames (ORFs) was performed using the program getorf, part of the EMBOSS package of computational biology tools. The ORFs were then used as the query sequence for a blastp search against the NCBI Reference Sequence protein database to verify the validity of the initial search results.
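The reciprocal BLAST filter described above, in which a tick sequence is retained only if it returns the original RNAi protein in the second search, can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: hits are assumed to be already parsed into (query, subject, e-value) tuples, the 1e-05 threshold is applied in both directions as an assumption, and all identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

HitTuple = Tuple[str, str, float]  # (query id, subject id, e-value)


def best_hits(hits: Iterable[HitTuple], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its lowest e-value subject below the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # hit is too weak to consider
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_confirmed(forward: Iterable[HitTuple],
                         reverse: Iterable[HitTuple]) -> Dict[str, str]:
    """Keep protein-to-tick pairs whose tick hit returns the original protein.

    forward: model-organism proteins searched against tick sequences.
    reverse: best tick sequences searched back against model-organism proteins.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {protein: tick for protein, tick in fwd.items() if rev.get(tick) == protein}
```

Tick sequences whose best reverse hit is a different protein are dropped by the final comparison, which mirrors the step of disregarding sequences that did not return the corresponding RNAi protein.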
Further confirmation of the search results was achieved by screening the ORFs against the Pfam database using the global search model. For the prediction of gene models and the identification of the exon/intron structure, the program genewise from the Wise2 package was used to map the detected ORFs to the genomic sequences. I. scapularis expressed sequence tags from the I. scapularis ISGI2 database were used in blastn searches to verify the validity of the predicted exon/intron structure by genewise. The sequences of the ORFs were also screened against a local copy of the Pfam database using the program hmmpfam of the HMMER2 package to reveal the sequence structure of the conserved domains. Multiple sequence alignments of amino acid sequences stored in the Pfam database were obtained for following protein domains and domain families: Domain of unknown function (DUF)1785 (Pfam:PF08699), PAZ domain (Pfam:PF02170) and Piwi (Pfam:PF02171). HMMs were built from the multiple sequence alignments using the program hmmbuild with default settings. The BmiGI2 database and I. scapularis sequences were searched with the programs estwisedb and genewisedb using the HMMs as query sequences. The program getorf was used to conceptually translate the ORFs of the best hits. The validity of the initial search results was verified by blastp searches against the NCBI Reference Sequence protein database. Further confirmation of the search results was achieved by screening the ORFs against the Pfam database using the global search model. A comparison between known Argonaute proteins from D. melanogaster and C. elegans was performed using the program bl2seq, which uses the BLAST algorithm for a pairwise comparison. Multiple sequence alignments of the protein sequences and ORFs were performed using Clustalw [66] with default program settings. 
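The conceptual translation step performed with getorf can be illustrated with a small six-frame translation routine. This sketch is a simplified stand-in written for illustration only and does not reproduce the EMBOSS implementation or its options: it uses the standard genetic code, reports stretches between stop codons in all six reading frames, and the minimum length parameter is an arbitrary assumption.

```python
from typing import List, Tuple

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq: str, min_aa: int = 10) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, peptide) for stop-free stretches of >= min_aa residues."""
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            for peptide in protein.split("*"):  # stop codons delimit the ORFs
                if len(peptide) >= min_aa:
                    orfs.append((strand, frame, peptide))
    return orfs
```

The peptides returned by such a routine would then serve as queries for the protein-level searches (blastp, Pfam) described in the text.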
In addition to the Dicer sequences illustrated in Figure 1a, the following protein sequences were included in the construction of the phylogenetic tree (Figure 1b): Bos taurus Dicer-1 [GenBank:NP_976235.1] and T. castaneum Dicer-2 [GenBank:NP_001107840.1]. The phylogenetic trees (Figures 2b and 3b) for Argonaute-1 and Argonaute-2 were constructed using additional sequences from C. elegans [GenBank:NP_510322.2] and [GenBank:NP_871992.1], T. castaneum [GenBank:XP_971295.2] and [GenBank:NP_001107842.1] and B. taurus [GenBank:NP_991363.1] and [GenBank:AAS21301.1]. For both Argonaute-1 and 2 proteins, the phylogenetic trees were based on the alignments of the DUF1785 and PAZ domains. The phylogenetic tree (Figure 4) for the partial R. microplus RdRP protein (Cat-RdRP) was constructed using additional RdRP domain (Pfam:PF05183) sequences from the metazoans: C. elegans (Ego-1 [GenBank:NP_492132.1], rrf-1 [GenBank:NP_492131.1] and rrf-3 [GenBank:NP_495713.1]) and I. scapularis ([GenBank:EEC04985.1], [GenBank:EEC05952.1], [GenBank:EEC12509.1], [GenBank:EEC12909.1]); plants: Arabidopsis thaliana [GenBank:NP_172932], Hordeum vulgare [GenBank:ACH53360.1], Nicotiana tabacum [GenBank:CAR47810.1], and Solanum lycopersicum [GenBank:ABI34311.1]; fungi: Aspergillus fumigatus [GenBank:EDP48577.1], Neurospora crassa [GenBank:XP_964248.2] and Schizosaccharomyces pombe [GenBank:NP_593295.1]; and protists: Dictyostelium discoideum [GenBank:XP_636093.1] and Tetrahymena thermophila [GenBank:XP_001026321.1]. The conserved RdRP domains were extracted from these sequences and then used for the alignment to the partial R. microplus RdRP domain. The multiple sequence alignments for all 3 studied proteins were visually inspected and the phylogenetic trees were constructed using Geneious 3.8.5 (http://www.geneious.com, last accessed on 12/08/08). Pairwise distances were calculated based on the BLOSUM62 matrix and the respective trees were constructed using Neighbor-Joining.
No outgroups were selected and the consensus trees were built using bootstrapping with 5,000 samples. Identification of R. microplus homologues for known Drosophila RNAi viability phenotypes Raw experimental results of the genome wide RNAi screen of D. melanogaster are publicly available at the website http://www.flyrnai.org [67]. The gene ontology data for the identified RNAi targets were retrieved from http://www.flybase.org [4]. Genes of interest were selected based upon a phenotypic z score > 3 [43]. For these genes, corresponding translations were retrieved from FlyBase and used for the subsequent amino acid similarity searches. The D. melanogaster cDNA sequences from the selected RNAi targets were used to search the 13,643 R. microplus ESTs and TCs for highly conserved genes using blastn [62]. Sequences with a similarity of at least 80% and an e-value less than 1e-50 were selected, conceptually translated and their putative function was further analyzed by assigning GO terms using InterProScan [63,68,69]. Additional Files 2 and 3 describe this selection process and the GO terms utilized, respectively. Tick cell culture and sources of ticks for dsRNA treatment studies BME26 was derived in 1985 from R. microplus embryonated eggs in the USA [57] and supplied by Dr. Munderloh (Department of Entomology, University of Minnesota, St. Paul, Minnesota 55108) to the Queensland Department of Primary Industries & Fisheries in Australia. Cell culture protocols to maintain and passage the cell line (obtained at passage 55) have been previously described [70]. N strain adult female ticks were obtained from the DPI&F Animal Research Institute tick cell colony [71]. DNA and RNA extraction methods DNA from BME26 cells was prepared using the QIAamp DNA mini kit (QIAGEN Sciences, MD, USA) – protocol for cultured cells as described by the manufacturer. 
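The homologue selection described above (fly genes with a phenotypic z score > 3, then tick sequences passing the similarity and e-value thresholds) can be sketched as a two-stage filter. This is a minimal illustration and not the authors' pipeline: the `BlastHit` record, its field names and the toy data are hypothetical, and the e-value cutoff is taken as 1e-50, which is an assumption about how the threshold is written.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the e-value cutoff is read as 1e-50 (assumption).
Z_SCORE_CUTOFF = 3.0
MIN_SIMILARITY_PCT = 80.0
MAX_EVALUE = 1e-50


@dataclass(frozen=True)
class BlastHit:
    """One blastn hit; field names are hypothetical, not a real parser's schema."""
    fly_gene: str
    tick_sequence: str
    similarity_pct: float
    evalue: float


def select_fly_genes(z_scores):
    """Return the fly genes whose phenotypic z score exceeds the cutoff."""
    return {gene for gene, z in z_scores.items() if z > Z_SCORE_CUTOFF}


def select_conserved_hits(hits, genes_of_interest):
    """Keep hits for selected genes with >= 80% similarity and e-value < 1e-50."""
    return [
        h for h in hits
        if h.fly_gene in genes_of_interest
        and h.similarity_pct >= MIN_SIMILARITY_PCT
        and h.evalue < MAX_EVALUE
    ]


# Made-up toy data, for illustration only.
z_scores = {"geneA": 4.2, "geneB": 2.1, "geneC": 3.5}
hits = [
    BlastHit("geneA", "TC0001", 86.0, 1e-72),  # passes every filter
    BlastHit("geneA", "TC0002", 78.0, 1e-80),  # similarity too low
    BlastHit("geneB", "TC0003", 91.0, 1e-90),  # fly gene z score too low
    BlastHit("geneC", "TC0004", 84.0, 1e-30),  # e-value too high
]
selected = select_conserved_hits(hits, select_fly_genes(z_scores))
```

Keeping the thresholds as named constants makes it easy to check how sensitive the candidate list is to each cutoff.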
RNA for qRT-PCR analysis was extracted from BME26 cells, adult tick viscera, and tick eggs using TRIzol® reagent (Invitrogen, CA, USA) following the manufacturer's instructions. For RNA extractions from larvae, the larvae were first ground in liquid nitrogen using a mortar and pestle prior to TRIzol® reagent extraction following the manufacturer's instructions (Invitrogen, CA, USA). dsRNA synthesis methods Sequences from Anopheles gambiae, D. melanogaster, I. scapularis, and R. microplus (Additional File 4) were aligned using AlignX (Invitrogen Vector NTI, CA, USA) to identify conserved regions for primer design. Primers were subsequently designed using Invitrogen Vector NTI to amplify the corresponding conserved region in R. microplus (Additional File 5). T7 promoter sequences were added to the 5'-ends of the primers to allow for subsequent RNA transcription as described in the manufacturer's instructions (Ambion MEGAScript RNAi kit, Applied Biosystems, CA, USA). PCR products were amplified from 20 ng DNA prepared from BME26 cells as template using 10 pM each primer, 10 pmol dNTPs, HotStart Taq Plus enzyme and the buffer provided by the manufacturer (QIAGEN Sciences, MD, USA) in a 20 µl reaction volume. The optimal annealing temperature for each assay was determined using gradient PCR and a temperature gradient of 55°C to 70°C in twelve discrete steps in a G-storm GS-1 thermocycler (Geneworks Technologies Pty Ltd, SA, Australia). The PCR thermal profile was as follows: 95°C 2 min, followed by 35 cycles at 95°C 10 s, annealing temp 30 s, 72°C 1 min (annealing temperatures for each primer pair described in Additional File 5), and a final extension at 72°C 7 min. The sizes of the PCR products (Additional File 5) were confirmed by gel electrophoresis using 1.5% Agarose in TAE Buffer (Tris acetate 40 mM, EDTA 2 mM, pH 8.5) after 45 minutes at 90 V. 
The PCR products were purified using the QIAquick kit (QIAGEN Sciences, MD, USA) following the manufacturer's protocol. Long dsRNA were synthesized from the purified PCR products (5 pooled 20 µl reactions per gene target) using the MEGAScript RNAi kit as described by the manufacturer (Ambion, Applied Biosystems, CA, USA). Purified dsRNAs were stored in elution buffer at -70°C until further use. Actin (TC12168) and the dsRNA control supplied by the manufacturer (MEGAScript, Ambion, Applied Biosystems, CA, USA) were prepared as tick specific and non-specific dsRNA treatments, respectively. Transfection of BME26 tick cells In vitro transfection methods for the dsRNA treatment of tick cells were modified from D. melanogaster methods originally described by Boutros et al [43]. BME26 cells at passage 57 were grown in 96-well plates freshly seeded with 48,000 cells/40 µl per well. Cells were transfected with 800 ng dsRNA and incubated at 31°C for 60 mins prior to the addition of complete medium (final total well volume of 120 µl). Treatments were incubated for 4 days at 31°C and each well was supplemented with 80 µl complete medium at Day 2. Each treatment contained 6 replicates to provide 3 replicates for viability assay and 3 for qRT-PCR. On Day 4 (96 hrs post treatment), 3 wells were subjected to cell viability testing using the Cell Glow kit as per manufacturer's instructions (Promega Corporation, WI, USA) and 3 wells were subject to RNA extraction for qRT-PCR screening. Controls included nil treatment (media only) and the dsRNA control from the Ambion MEGAScript RNAi kit (non-specific dsRNA treatment). Impairment of growth and viability relative to the nil treatment control was statistically determined by calculating inverse z-scores for every treatment [43]. Injection of R. 
microplus ticks with dsRNA, monitoring and statistical analysis of mortality and egg output Female adult ticks fed to repletion were collected within 24 hrs of dropping from the bovine host for dsRNA injection. Six ticks per treatment (10 Drosophila homologues, no injection control, PBS injection control, tick actin dsRNA and the MEGAScript dsRNA control) were injected with 1–2 × 10^12 dsRNA molecules using a micro-injector (World Precision Instruments Inc., Florida, USA) as described previously by Nijhof and colleagues [42], except that ticks were first pierced using a 30 G needle rather than 27 G. Five out of the 6 ticks per treatment were monitored daily for effects on mortality, egg output and larval hatching rates until all ticks had died [42]. Statistical analyses were conducted using GenStat 10 (VSN International). The following variables were subjected to analysis of variance assessing the effect of replicates and treatments: 1. total wt of eggs produced; 2. days ticks survived post injection; 3. days from laying to larval hatch; and 4. percent larvae hatched. A protected least significant difference (LSD) procedure was used to compare treatment means using a significance level of 0.05. RNA was extracted from the viscera and from the eggs collected from the 6th replicate tick per treatment for qRT-PCR analysis on days 6 and 14 respectively (see below). Quantitative RT-PCR gene expression analysis Primer sequences, PCR product and annealing temperatures for all targets are described in Additional File 5. 
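For the BME26 transfection screen described earlier, impairment of viability was scored with inverse z-scores relative to the nil treatment control. The text does not give the formula, so the sketch below rests on an assumption: each treatment reading is compared with the mean and sample standard deviation of the nil-treatment wells, and the sign is flipped so that lower viability gives a higher score. The function name and the luminescence readings are hypothetical.

```python
from statistics import mean, stdev


def inverse_z_scores(treatment_readings, control_readings):
    """Score each treatment reading against the nil-treatment control wells.

    Assumed definition: (control mean - reading) / control standard deviation,
    so a reading below the control mean yields a positive score.
    """
    control_mean = mean(control_readings)
    control_sd = stdev(control_readings)  # sample standard deviation (n - 1)
    return [(control_mean - r) / control_sd for r in treatment_readings]


# Made-up viability readings for three control wells and three treated wells.
nil_control = [100.0, 110.0, 90.0]
treated = [70.0, 80.0, 100.0]
scores = inverse_z_scores(treated, nil_control)
```

With these toy numbers the control mean is 100 and the control standard deviation is 10, so a reading of 70 gives a score of 3. Other reference distributions, such as all wells on a plate, are equally plausible readings of the method.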
cDNA was synthesized using a cDNA synthesis kit (Bioline International, London, UK), and triplicate qPCRs (50 ng per reaction) of BME26 cells were undertaken using the SensiMixPlus SYBR kit (Quantace Ltd, Watford, UK) in the Corbett RotorGene 3000 (Corbett, Sydney, Australia) using the following profile: 95°C 10 min; 40 cycles of 95°C 15 s, 55°C 30 s, 72°C 30 s, followed by a melt analysis 72–90°C, 30 s on the first step, 5 s holds for subsequent steps, according to the manufacturer's instructions for SYBR green detection. All results are reported as relative quantification by the 2^-ΔΔCt method [72], using R. microplus actin (Additional File 5) as an internal control gene. Viscera from the 6th replicate tick of each Drosophila homologue group were homogenized in TRIzol® to extract total RNA. The semi quantitative analysis of the samples was undertaken using the QuantiTect SYBR green RT-PCR Kit® (QIAGEN, Australia) as recommended by the manufacturer. The expression profiles were normalised against R. microplus actin as above. Reactions contained 125 ng of total RNA, 12.5 µl of 2× QuantiTect SYBR Green RT-PCR Master mix, 10 pmol of each primer and 0.25 µl QuantiTect RT Mix in a final reaction volume of 25 µl. RT-PCR reactions were conducted on the Rotor-Gene 3000 under the following conditions: reverse transcription at 50°C for 30 min, PCR initial activation at 95°C for 15 min, followed by 40 cycles of 94°C for 15 s, 55°C for 30 s and 72°C for 30 s. Percent gene expression and knockdown (average of 3 triplicate reactions) were calculated by the comparative CT method for relative quantification as described above. R. microplus EST Accessions GenBank Accessions describing the R. microplus ESTs identified in this study have been appended as Additional File 6. 
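The comparative CT calculation used above for percent expression and knockdown can be written out in a few lines. The sketch below follows the standard 2^-ΔΔCt arithmetic with actin as the reference gene. The function names and Ct values are hypothetical, and averaging replicate Ct values before taking differences is an assumption, since the text does not say at which step replicates were averaged.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Return 2^-ddCt for a target gene normalised to a reference gene.

    Each argument is a list of replicate Ct values. Replicates are averaged
    first (assumption), then dCt = Ct(target) - Ct(reference) is computed for
    the treated and control samples, and ddCt = dCt(treated) - dCt(control).
    """
    dct_treated = mean(target_treated) - mean(ref_treated)
    dct_control = mean(target_control) - mean(ref_control)
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


def percent_knockdown(rel_expr):
    """Convert relative expression (1.0 = control level) to percent knockdown."""
    return (1.0 - rel_expr) * 100.0


# Made-up Ct values: target gene and actin in dsRNA-treated and control samples.
rel = relative_expression(
    target_treated=[27.0, 27.1, 26.9],
    ref_treated=[18.0, 18.0, 18.0],
    target_control=[25.0, 25.0, 25.0],
    ref_control=[18.0, 18.0, 18.0],
)
knockdown = percent_knockdown(rel)
```

In this toy case ΔΔCt is 2 cycles, which corresponds to 25% of control expression, or 75% knockdown. The calculation assumes roughly 100% amplification efficiency for both primer pairs.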
Abbreviations dsRNA: double-stranded RNA; cDNA: complementary DNA; EST: expressed sequence tags; GO: Gene Ontology; PCR: polymerase chain reaction; RISC: RNA-induced silencing complex; RNAi: RNA interference; siRNA: small interfering RNA; Positional Cues in the Drosophila Nerve Cord: Semaphorins Pattern the Dorso-Ventral Axis Abstract During the development of neural circuitry, neurons of different kinds establish specific synaptic connections by selecting appropriate targets from large numbers of alternatives. The range of alternative targets is reduced by well organised patterns of growth, termination, and branching that deliver the terminals of appropriate pre- and postsynaptic partners to restricted volumes of the developing nervous system. We use the axons of embryonic Drosophila sensory neurons as a model system in which to study the way in which growing neurons are guided to terminate in specific volumes of the developing nervous system. The mediolateral positions of sensory arbors are controlled by the response of Robo receptors to a Slit gradient. Here we make a genetic analysis of factors regulating position in the dorso-ventral axis. We find that dorso-ventral layers of neuropile contain different levels and combinations of Semaphorins. We demonstrate the existence of a central to dorsal and central to ventral gradient of Sema 2a, perpendicular to the Slit gradient. We show that a combination of Plexin A (Plex A) and Plexin B (Plex B) receptors specifies the ventral projection of sensory neurons by responding to high concentrations of Semaphorin 1a (Sema 1a) and Semaphorin 2a (Sema 2a). Together our findings support the idea that axons are delivered to particular regions of the neuropile by their responses to systems of positional cues in each dimension. Author Summary Axons and dendrites of synaptic partners must be targeted to a common region of the developing neural network so that appropriate connections can be formed. 
The mechanisms underlying this targeting are incompletely understood. We showed previously that a positional cue (Slit) acting in the medio-lateral axis of the Drosophila nerve cord controls the position of sensory terminals independently of their synaptic partners. This work revealed that there might be additional cues operating in a similar fashion in the dorso-ventral axis of the nerve cord. Here we report the discovery of a dorso-ventral system of positional cues, in the form of a gradient of secreted Semaphorin 2a acting at right angles to the Slit gradient, and membrane bound Semaphorin 1a differentially distributed across the neuropile. The two Semaphorins dictate the termination positions of sensory axons in the dorso-ventral axis. Together with a third signal acting in the antero-posterior axis, Semaphorins and Slit deliver axons to appropriate volumes of the neural network. These studies support a model in which axons branch and terminate, independently of synaptic partners, in response to pervasive systems of volumetric positional cues. Introduction During the development of neural circuitry, neurons of different kinds must establish specific synaptic connections by selecting appropriate targets from large numbers of different alternatives. The range of these alternative targets is reduced by well organised patterns of growth, termination, and branching that deliver the terminals of appropriate pre- and postsynaptic partners to restricted regions of the developing nervous system. The mechanisms that control the coordinate projection of pre- and postsynaptic neurites to a common region are incompletely understood. Although there has been substantial progress in identifying molecular mechanisms of axon growth and guidance, far less is known about the way in which appropriate target areas are identified, leading to termination and branching [1]–[4]. 
The extent to which these processes depend on target specific signals as opposed to pervasive guidance cues, to which many different neurons can respond, is far from clear. We have used the axons of embryonic Drosophila sensory neurons as a model system in which to study the way in which growing neurons are guided to terminate in a specific region of the developing nervous system. These neurons have their cell bodies in the periphery of the embryo, either close to or embedded in the body wall. Their axons grow into a central ganglion where they terminate in a neuropile that consists of a dense meshwork of interweaving axons and dendrites. Anatomically the neuropile shows few overt signs of organisation apart from clear regularities such as the commissures that cross the midline and a set of longitudinal axon bundles at stereotyped positions that provide a series of landmarks with respect to which other structures can be mapped [5]. Functionally however the neuropile is an obviously well-organised structure, with, for example, motor neuron dendrites and the endings of sensory neurons terminating and branching in distinct and characteristic domains. Thus it is clear that there must be cues operating in the neuropile that deliver terminals to these specific destinations within the forming network. In the case of the sensory neurons, it is clear that specific types of neurons serving particular modalities terminate in well ordered and characteristically different parts of the neuropile. These termination zones together with the overall structure of the neuropile are shown in diagrammatic form in Figure 1. Because the sensory neurons provide us with an accessible set of cells whose terminals grow to different parts of the forming neuropile, we can readily use these neurons to investigate the guidance mechanisms that operate to determine these distinctive patterns of growth and termination. 
Different neuron classes project to different medio-lateral domains and to different dorso-ventral layers of neuropile. We previously showed that Slit secreted at the midline and acting through its Robo receptors constitutes a repellent gradient to which sensory neurons respond by terminating and branching at specific positions in the medio-lateral axis of the neuropile [6]. Expression of a particular Robo receptor by a sensory axon is necessary and sufficient to determine the distance from the midline at which that axon will terminate. Thus, in the medio-lateral axis at least, the position at which an axon terminates within the forming neuropile is determined not by some putative signals from its postsynaptic target, but by the presynaptic neuron's response to a pervasive cue secreted from the midline. However, the neuropile is a 3-D structure and there must therefore be additional cues that determine the dorso-ventral and antero-posterior termination domains for each axonal and dendritic arbor. Our previous study provided evidence for at least one further signal that operates to determine positions in the dorso-ventral axis. Sensory terminals that are shifted experimentally along the medio-lateral axis of the neuropile maintain their characteristic dorso-ventral location in their new position, suggesting that the factor that determines this position may be a “dorso-ventral” patterning cue that is present at different positions in the medio-lateral axis. This additional finding led us to propose a general model for the cues that delineate domains within a neuropile in which presynaptic axons and their postsynaptic partners terminate and form connections [6]. In this model, termination sites depend on the response of axons to a system of positional cues that dictate the behaviour and final location of many, perhaps all terminals within a developing network of pre- and postsynaptic neurons. 
Specific locations are given not by the target, but by the set of receptors for these positional cues that each neuron expresses. Here, we test and augment this model by using a genetic screen to identify cues and their receptors that guide terminating axons in the dorso-ventral axis of the neuropile. We find that dorso-ventral layers of neuropile contain different levels and combinations of semaphorins. We demonstrate the existence of a central to dorsal and central to ventral gradient of Sema 2a, perpendicular to the Slit gradient. We show that a combination of Plexin A (Plex A) and Plexin B (Plex B) receptors specifies the ventral projection of sensory neurons by responding to high concentrations of Semaphorin 1a (Sema 1a) and Semaphorin 2a (Sema 2a). These signals together with the Slit/Robo system acting in the medio-lateral axis limit the arborisations of sensory axons to specific termination domains within the neuropile. Since these are the domains within which specific functional sets of connections will be formed, the terminating sensory axons, by responding to pervasive positional cues, are able to lay out part of the characteristic functional architecture of the forming network. Results The Drosophila Embryonic Neuropile Can Be Divided into Four Dorso-Ventral Layers, Which Are Likely to Have Distinct Functions in Information Processing Previous studies have shown that the axons of sensory neurons project to distinct medio-lateral, dorso-ventral, and antero-posterior domains in the neuropile in correlation with their modality and dendritic morphology [7]–[9]. We have extended these studies using Fasciclin II (Fas II) positive tracts as reference points (Figure 1A) [6],[10]. We divide the neuropile into three medio-lateral domains and four dorso-ventral layers (Figure 1C). With the exception of the chordotonal (ch) neurons, sensory axons terminate in the medial domain of the neuropile. Ch axons terminate and branch in the intermediate domain. 
There is very little sensory input to the dorsal-most layer (layer 1) where motor neurons establish their dendritic arbors. The proprioceptive dorsal bipolar dendritic (dbd) and class I md (multidendritic) neurons terminate in the upper central layer (layer 2) [6],[9],[11]. The ch neurons terminate in the lower central layer (layer 3) [6], whereas nociceptive class IV md neurons terminate in the ventral-most layer (layer 4). Class IV md neurons can be identified with ppkEGFP, which labels one intersegmental nerve (ISN) and two segmental nerve (SN) neurons in each hemisegment [9],[12]. The position of termination in the neuropile does not correlate with the nerve route by which the sensory neurons reach the neuropile (see Figure S1). Sensory axons whose cell bodies are located ventrally in the body wall travel in the SN, whereas axons whose cell bodies are located dorsally or laterally in the body wall travel in the ISN [13]. Sensory axons running in the SN and ISN terminate in layers 2, 3, or 4, in correlation with their modality and dendritic morphology [9],[11]. Since each of the three modality-specific sensory termination domains contains some neurons that have travelled through the SN, and others that have travelled through the ISN, differences in axon routing to the neuropile cannot account for differences in termination within the neuropile. Expressing plex B or plex A in Sensory Neurons Shifts Their Terminals away from Central and Dorsal Layers of the Ventral Nerve Cord To investigate mechanisms that confine sensory projections of different modalities to different dorso-ventral layers of the neuropile, we carried out a gain-of-function screen for trans-membrane proteins, which, when expressed selectively in sensory neurons, shift sensory terminals with respect to Fas II tracts. We used PO163GAL4, UAS-n-synaptobrevin-GFP flies to target gene expression selectively to sensory neurons and simultaneously to visualise their terminals (Figure 2A) [14]. 
As a test of our method, we confirmed that expressing the Robo 3 receptor for Slit in sensory neurons shifts their terminals away from the medial domain of neuropile (Figure 2B) [6]. Effects of altering levels of receptors for Slit, Sema 2a, or Sema 1a in sensory neurons and the distribution of cues in neuropile. We screened 418 lines with UAS inserts in front of trans-membrane protein coding genes (see Materials and Methods and Table S1 for detailed results of the screen) by systematically expressing them in sensory neurons and analysing the pattern of sensory terminals in abdominal segments (A1–A7) at 21-h after egg laying (AEL). We identified 11 genes (2.6%) that change the pattern of sensory terminals, without altering the number of neurons or preventing sensory axons from reaching the central nervous system (CNS) (Table S1). Of the 11 genes expressed, two produced obvious shifts along the dorso-ventral axis. Both belong to the same family: plex B and A. If plex B is expressed in all sensory neurons, sensory terminals are excluded from layer 2 (Figure 2C). If plex A is expressed, terminals are excluded from the intermediate regions of layer 3 and from layer 1 (Figure 2D). We also co-expressed Robo3 and Plex B in sensory neurons and found that this produces a “combination” of Robo 3 and Plex B expression phenotypes. In these embryos sensory terminals are now mostly confined to the lateral-most portion of layers 3 and 4 (Figure 2E). Sema 2a and Sema 1a Are Expressed in Central and Dorsal Layers of the Ventral Nerve Cord The Plexins are receptors for the Semaphorins (Semas), a diverse family of secreted and membrane-associated proteins [15]–[19]. In Drosophila there are two Plexins (A and B) and five Semas: 1a, 1b, and 5c (transmembrane) and 2a and 2b (secreted). Plex B binds Sema 2a and mediates the Sema 2a-dependent repulsion of motor and sensory axons in the periphery and the fasciculation of longitudinal tracts in the ventral nerve cord (VNC) [20],[21]. 
Plex A binds strongly to Sema 1a and Sema 1b and mediates the Sema-dependent repulsion of embryonic motor axons in the periphery and the repulsion of adult olfactory receptor axons by Sema 1a in the antennal lobes [22]–[24]. The Plexin overexpression phenotypes suggested that their Sema ligands might act as cues to position the terminals of neurons along the dorso-ventral axis of the forming neuropile. We therefore used antibody labelling to analyse the expression of Semas 2a and 1a in the CNS at different stages of embryogenesis: prior to sensory axon ingrowth (11-h AEL), at stages when sensory axons form their terminal arbors (13-h AEL), and several hours after sensory axons have completed their terminal arbors (21-h AEL). Sema 2a expression first becomes detectable at 11 h as the outgrowth of sensory axons begins, persists strongly until 16 h, but has disappeared by 21 h, when the embryo is mature and ready to hatch. At 13 h, when sensory axons are forming their terminal arbors, the highest levels of Sema 2a are in layer 2 in the centre of the neuropile (Figure 2F and 2H). Strikingly, the protein forms gradients of expression in the neuropile that extend dorsally and ventrally from layer 2 (Figure 2L), at right angles to the mediolateral gradient of Slit (Figure 2F, 2G, 2J, and 2K). There is no detectable expression in layer 4. Our experiments show that the effect of overexpressing Plex B in sensory neurons is to shift their terminals away from regions with high Sema 2a levels. This effect is still detectable at 21 h when there is no Sema 2a expression and we conclude that misplaced terminals do not compensate by delayed growth into central neuropile (Figure 2C). Sema 1a expression is present at 10-h AEL, before sensory axons have entered the neuropile and persists throughout embryogenesis (unpublished data). 
By 13 h the highest levels of Sema 1a are in the lateral and intermediate portions of layers 1 and 3, at lower levels in layer 2, and not detectable in layer 4 (Figure 2F and 2I). In addition to differences in the levels of Sema 1a expression in different dorso-ventral layers of the neuropile, we also find an apparent decrease in concentration from intermediate (high) to medial (low) in layers 1 and 3. At 21 h Sema 1a is still strong in intermediate portions of layers 1 and 3. The effect of overexpressing Plex A is to exclude sensory terminals from these high levels of Sema 1a expression (Figure 2D). We also analyzed the distributions of Sema 1a and Sema 2a in the antero-posterior axis, at the time of sensory axon ingrowth into the CNS, and found they appear uniform (Figure S2A and S2B). To confirm that Sema 2a and Sema 1a act as the ligands for Plex B and Plex A in our experiments, we tested the sema 2a and sema 1a dependence of the Plex B and Plex A overexpression phenotypes in sensory neurons. We analysed patterns of sensory terminals in sema 2a03021 loss of function embryos [25] and in embryos in which plex B was overexpressed in sensory neurons in a sema 2a03021 background. In sema 2a03021 embryos we find ectopic sensory terminals in layer 2 (Figure 3A). Overexpression of Plex B in sensory neurons in a sema 2a03021 background fails to exclude sensory terminals from central and dorsal neuropile (compare Figures 2C and 3B). The pattern of sensory terminals in these embryos is similar to their pattern in sema 2a03021 mutants (compare Figure 3A and 3B). We conclude that Sema 2a is the functional ligand for Plex B in this system. sema 2a and sema 1a mutations suppress the phenotypes of Plex B and Plex A overexpression in sensory neurons. We recombined the UAS-plex A-HA [24] transgene with the sema 1aP1 [26] mutation to express Plex A in a sema 1a mutant background. 
We analysed patterns of sensory terminals in sema 1aP1 mutant embryos and in embryos in which Plex A was overexpressed in sensory neurons in a sema 1aP1 background. In sema 1aP1 embryos, we found ectopic sensory terminals in layers 1 and 3 (Figure 3C). Overexpression of Plex A in sensory neurons in a sema 1aP1 background failed to exclude them from layers 1 and 3 (compare Figures 2D and 3D). The pattern of sensory terminals in these embryos is strikingly similar to their pattern in sema 1aP1 mutants (compare Figure 3C and 3D). We conclude that Sema 1a is the functional ligand for Plex A in this system. We were able to identify potential cellular sources of the transmembrane Semaphorin Sema 1a by looking for neuronal populations that project to layers 1 and 3 (Figure S3). One such population is the motor neurons, most of which project dendrites to layer 1 (Figures 1 and S3A). Using OK371-GAL4, we targeted the expression of the cell death gene reaper and of the CD8GFP reporter (OK371-GAL4, UASCD8GFP;UAS-reaper) to the motor neurons [27]. This resulted in the death of most motor neurons by the early first instar larval stage (as judged both by the onset of larval paralysis and by the loss of GFP signal) (Figure S3C). Immunofluorescence visualisation of Sema 1a shows a significant reduction in Sema 1a levels in layer 1 in animals that lack motor neurons, compared to animals with intact motor neurons (Figure S3B, S3D, and S3E). We conclude that the motor neuron dendrites are likely to be a source of Sema 1a in the dorsal neuropile. Another cell population that projects to layer 1, as well as to layer 3, is the GABAergic interneurons (Figure S3F). We used GADGAL4 [28],[29] to visualise and kill both the motor neurons and the GABAergic interneurons and found that this resulted in nearly complete loss of Sema 1a staining from both layers 1 and 3 (Figure S3G, n=10 embryos). 
We conclude that the GABAergic interneurons are likely to be a significant source of Sema 1a in layer 1 and also in layer 3. We have so far been unable to identify cellular populations that project exclusively to layer 2, but since the expression is continuous across the midline (see Figure S2A), at least some midline cells could be involved. One possibility is that the recently described extensions of midline glial cells, the gliopodia [30], might provide a vehicle by which high levels of Sema 2a are deployed across the developing neuropile. Interestingly these extensions of the glial cells have a limited life span, becoming much reduced late in embryogenesis and we find that Sema 2a expression also declines in these late stages. The VNC of embryos that lack midline glial cells (for example in single minded mutants; [31]) are too fragile and disorganized to allow analysis of levels of Sema 2a along the dorso-ventral axis. Instead we restored Sema 2a expression in the midline glial cells using the single mindedGAL4 line [31], in an otherwise sema 2a mutant background (sema 2a, UAS-sema 2a;single-mindedGAL4) (see Figure S4A–S4C for details of these experiments). We were able to restore Sema 2a expression in the neuropile (Figure S4B), in layers 1, 2, and 3 (Figure S4C), but in a pattern that appeared broader than the endogenous stripe in layer 2. Thus a source of the Sema 2a gradients could potentially be a subset of the midline cells, although we cannot exclude the possibility that some other cells are the endogenous source of this cue in the CNS. Ventrally Projecting Sensory Neurons Terminate in Regions of Low Sema 1a and Low Sema 2a Expression Levels To investigate the role of the Sema/Plexin system in determining the position at which axons terminate within the layered structure of the neuropile we decided to focus our experiments on a single class of sensory cells with well defined terminal branches, the nociceptive class IV md neurons. 
Class IV md neurons can be identified with ppkEGFP [9],[12], which labels one ISN and two SN neurons in each hemisegment. The axons of these cells terminate medially in the ventral-most part of the neuropile, layer 4 (Figure 1C), where they branch asymmetrically in the antero-posterior axis (Figure S7A). By examining the location of these ppkEGFP-expressing axons with respect to Sema expression we confirmed that at 13-h AEL these axons terminate in a region of low Sema 2a (Figure 4A) and just below regions of high Sema 1a expression levels (Figure 4B). At 21-h AEL the class IV terminals remain in a region with low Sema 1a expression (Figure 4C). Class IV md neurons terminate in a neuropile region with low levels of Sema 1a and Sema 2a. sema 1a and sema 2a Are Required to Exclude Ventrally Projecting Class IV md Neurons from Dorsal and Central Neuropile We now asked whether sema 1a and sema 2a are required to confine class IV projections to layer 4. In embryos mutant for sema 1aP1 [26], sema 2a03021 [25], and in sema 1aP1, sema 2a03021 double mutants, the class IV axons have aberrant patterns of termination and/or growth in the dorso-ventral axis (compare Figure 5A with 5B, 5C and 5D; see also Figure S6 for details of the effects of these mutations on the dorso-ventral position of Fas II tracts). We make a distinction between growth and termination phenotypes of class IV axons (For details of this distinction and examples of different kinds of growth and termination phenotypes see Figure S5). sema 1a and sema 2a are required to exclude Class IV axons from dorsal and central neuropile. We found a significant increase in the percentage of hemisegments with aberrant terminals in sema 1aP1, sema 2a03021, and sema 1aP1, sema 2a03021 double mutants with respect to sema 1aP1/+ controls (Figure 5A–5H). 
Moreover, the percentage of hemisegments with aberrant termination in sema 1aP1, sema 2a03021 double mutants was significantly higher than in either sema 1aP1 or sema 2a03021 single mutants (Figure 5E). We found that in sema 1aP1 mutants aberrant class IV axons tend to terminate in layer 1 more often than in layer 2 (Figure 5F). Conversely, in sema 2a03021 mutants, we found that aberrant class IV axons tend to terminate in layer 2 more often than in layer 1 (Figure 5G). In sema 1aP1, sema 2a03021 double mutants (Figure 5H) class IV axons terminate with roughly equal probability in layers 1, 2, or 3. Sema 1a appears to play a more important role in preventing termination in layer 1, followed by layer 3, and a minor role in preventing termination in layer 2. Sema 2a appears to play a more important role in preventing termination in layer 2, and a minor role in preventing termination in layers 1 and 3. Our results suggest that Sema 1a and Sema 2a are instructive for termination of class IV axons along the dorso-ventral axis. We also assessed the potential roles of Sema 1a and Sema 2a in controlling the termination of class IV axons in the antero-posterior axis by analysing their projections in a top-down view of the neuropile in wild type and in sema 1a, sema 2a double mutants (Figure S7). We chose the sema 1a, sema 2a double mutant for this analysis, because it exhibited the strongest phenotypes in the dorso-ventral axis. Wild-type class IV axons grow asymmetrically, within their normal ventral and medial termination domain, forming thicker terminals in the anterior than in the posterior portion of the segment (Figure S7A). We did not observe a significant loss of this asymmetry in the sema 1a, sema 2a double mutant compared to wild type (Figure S7B and S7C). In top-down view, class IV terminals do appear disorganized compared to wild type, but we assume this disorganization is a consequence of the major defects in growth and termination in the dorso-ventral axis. 
Thus Sema 1a and Sema 2a do not appear to play a major role in confining class IV terminals to the anterior portion of the segment. Also consistent with this idea is our finding that the distributions of Sema 1a and Sema 2a appear uniform in the antero-posterior axis at the time of sensory axon ingrowth into the CNS (Figure S2A and S2B). sema 1a Is Required Non-Cell-Autonomously to Exclude Class IV Axons from Dorsal Neuropile In some cases membrane-bound Sema 1a acts as a receptor [18],[32]. Thus, rather than a requirement to act as a cue, the class IV mutant phenotypes could reflect a cell-autonomous requirement for Sema 1a in the sensory neurons themselves. To resolve this, we performed two kinds of rescue experiments. First, we restored sema 1a expression to sensory neurons in sema 1aP1 mutant embryos using PO163GAL4. Antibody labelling confirms that Sema 1a is successfully targeted to embryonic sensory terminals using this driver (compare Figure 6A, 6C, and 6E) and shows that in the mutant a large fraction of Sema 1a-expressing sensory neurons aberrantly project to the dorsal part of the neuropile (Figure 6E). We then analysed specifically the projections of class IV neurons in embryos where sema 1a expression had been restored to sensory neurons in the sema 1aP1 mutant background (compare Figure 6B, 6D, and 6F). There was no rescue of the dorsal termination phenotype of class IV axons in these embryos. Quantification revealed no significant reduction in dorsal termination of class IV axons, compared to sema 1aP1 mutants (Figure 6I). Thus, sema 1a is not required in class IV neurons themselves to exclude their terminals from dorsal neuropile. sema 1a is required non-cell-autonomously in dorsal neuropile for the ventral projection of Class IV axons. In a second set of experiments, we selectively restored Sema 1a to dorsal neuropile in an otherwise sema 1aP1 mutant background, by using HB9GAL4 to drive its expression in a subset of motor neurons [33]. 
We used the HB9GAL4 line for this rescue experiment, because it is expressed before sensory neurons grow into the neuropile, unlike GADGAL4 or OK371GAL4, which are expressed later. We confirmed that Sema 1a is selectively present in dorsal neuropile in these experiments (compare Figure 6A, 6C, and 6G), and we observed a significant reduction in the dorsal termination of class IV axons compared to sema 1aP1 mutants (Figure 6H and 6I). Thus in an otherwise mutant background, the mutant phenotype of class IV axons can be partially rescued by expressing sema1a dorsally in the dendrites of motor neurons. We also asked whether restoration of Sema 2a expression in the midline glial cells using the single mindedGAL4 line in an otherwise sema 2a mutant background (sema 2a, UAS-sema 2a;single-mindedGAL4, ppkeGFP) rescues the sema 2a mutant phenotype of class IV axons (see Figure S4 for details of these experiments). We observed a significant reduction in the aberrant termination of class IV axons in layer 2 compared to sema 2a mutant embryos (Figure S4D and S4E). plex A and plex B Are Both Required to Exclude Class IV Terminals from Regions with High Sema 1a and Sema 2a Expression Levels The experiments we describe suggest that both Sema 1a and Sema 2a are required as cues to confine class IV sensory axons to ventral neuropile. We also find that expressing either plex A or plex B in sensory neurons is sufficient to shift their terminals away from regions with high levels of Sema 1a and 2a. Thus, a combination of Plexins could be required in ventrally projecting sensory neurons to exclude them from dorsal and central neuropile. By in situ hybridization we confirmed previous reports [20] that ch, dbd, and class I–IV md neurons express plex B at the time that sensory axons grow into and terminate in the VNC (Figure S8D). 
By double labelling with anti-Plex A and anti-horseradish peroxidase we confirmed that Plex A is expressed in sensory neuron cell bodies at 13-h AEL (Figure S8A), and by double labelling with anti-Plex A and anti-GFP showed that Plex A is strongly expressed in the ppk-expressing class IV neurons (Figure S8B). Unfortunately, none of these experiments allows us to draw quantitative conclusions about levels of expression in different cells. High background levels also prevented a reliable analysis of Plex A expression along the dorso-ventral axis of the CNS. However antibody labelling against Plex A does reveal expression in the neuropile at 13-h AEL (Figure S8C). To show whether both Plexins are required to exclude the ventrally projecting sensory neurons from central and/or dorsal neuropile, we analysed the projection pattern of class IV axons in plex A and B mutants. In plex ADf(4)C3 mutants, class IV axons project aberrantly to central and/or dorsal neuropile (Figure 7A and 7B). Quantification reveals significantly more terminals in dorsal and central neuropile, compared to wild type (Figure 7G). Plex A and Plex B are required in sensory neurons for the ventral termination of Class IV axons. In plex BKG00878 mutants class IV axons also project to dorsal or central neuropile (Figures 7D and 7E). Quantification reveals significantly more terminals in dorsal and central neuropile compared to wild type (Figure 7G). We also quantified the proportion of terminals in each of the different layers of the neuropile (Figure 7H and 7I). We found that in plex B mutants Class IV axons terminate with roughly equal probability in layers 1, 2, or 3. This suggests that Plex B may normally have a role in preventing termination in layers with high levels of Sema 2a or Sema 1a and may therefore be a functional receptor for both ligands. 
We also observed that embryos transheterozygous for plex B and either sema 1a (sema 1a/+; plex B/+), sema 2a (sema 2a/+; plex B/+), or plex A (plex B/plex A) all exhibit class IV termination phenotypes, indicating a genetic interaction between these mutations (unpublished data). To further test whether Plex B functions to prevent termination in regions with high Sema 1a levels we analysed the patterns of sensory terminals that overexpress Plex B in a sema 1a mutant (Figure S9). In these embryos we observed a striking expansion of sensory terminals into the intermediate region of layer 1 that normally contains the highest Sema 1a levels within this layer (compare Figure S9A and S9B). In a wild-type background Plex B-overexpressing sensory terminals remain confined to the most medial portion of layer 1, even though they become excluded from layer 2 (Figures 2C, S9A). A similar expansion is also observed in plex ADf(4)C3 mutants, indicating that Plex A function is required to prevent termination in regions with the highest levels of Sema 1a (Figure S9C). Interestingly, we found that in the absence of plex A, Plex B overexpression in sensory neurons is sufficient to prevent their expansion into regions with the highest Sema 1a levels (in PO163GAL4, UAS-plex B; plex ADf(4)C3 embryos) (Figure S9D). Rescuing plex A and plex B in Sensory Neurons Alone Is Sufficient to Prevent the Aberrant Projection of Class IV Neurons to Dorsal and Central Neuropile To exclude (a) the possibility that plex A and B are required in the central targets of sensory neurons, in which case the mutant phenotypes might be a result of aberrations in normal target-directed growth, and (b) the possibility that Plex A and B are acting as guidance cues, we restored their expression selectively to sensory neurons using the PO163GAL4 driver. Restoration of Plex B expression selectively in sensory neurons in a plex BKG00878 mutant background rescues the phenotype of class IV md neurons (Figure 7E). 
Quantification reveals significantly fewer aberrant class IV terminals in plex B-rescue embryos as compared to plex B mutants (Figure 7G). The Fas II tracts continued to exhibit mutant phenotypes in these experiments, as would be expected if the rest of the neuropile, other than the sensory neurons, remained mutant (Figure S10). Likewise, restoration of Plex A expression in sensory neurons in a plex ADf(4)C3 mutant background, rescued the phenotype of class IV md neurons (Figures 7C). Quantification reveals significantly fewer aberrant class IV terminals in plex A-rescue embryos compared to plex A mutants (Figure 7G). We conclude that both plex A and plex B are required in sensory neurons for the appropriate targeting of class IV axons to the ventral neuropile. Discussion In the work we report here, we have addressed the general issue of how neuronal termination is regulated within a complex meshwork of differentiating axons and dendrites. Connections are formed within a central neuropile from which cell bodies are excluded. As first noted by Cajal [34], removing cell bodies to the periphery, while multiple connections are formed within a central core, maximises the economy with which the network is wired together. As a consequence of this organisation the processes of neurons of all kinds—motor, sensory, and interneurons—grow into a common volume of the nervous system within which connections will be formed. This growth is patterned and consistent, so that the forming network is partitioned into different domains within which limited subsets of neurons terminate and form characteristic arborisations. In the VNC of Drosophila embryo, for example, motor neurons place their dendrites in the most dorsal domain of the neuropile where they arborise to form a myotopic map that represents centrally the distribution of innervated muscles in the periphery [35]. 
While these dendritic maps are forming dorsally, the axons of sensory neurons are growing into the same neuropile and terminating in other characteristic and consistent domains. Here too, each modality is targeted to a particular volume of the neuropile where the terminals form a characteristic pattern of arborisations [7]. In a system in which neither the axonal nor dendritic terminals are constrained by the cell bodies or by the position of entry of the main dendrite or axon trunk into the neuropile, we envisage that connectivity develops in stages with an initial phase in which growing axons and dendrites are both delivered to appropriate volumes of the neuropile, followed by a phase of targeted connection between appropriate pre- and postsynaptic partners. A pattern of growth of this kind would resemble that seen in the developing olfactory system of the adult fly where coarse targeting to particular regions of the antennal lobe is followed by precise recognition and matching between axons and dendrites [36],[37]. Alternatively it may be that the mechanisms that pattern the growth of fibres representing a single modality such as olfaction are different from those required to organise the distribution of terminals within a highly heterogeneous network such as that seen in the VNC. In our view the initial delivery and restriction of fibres to particular subvolumes of the VNC neuropile is likely to be by individual growth responses to generalised systems of cues that operate to pattern the network as it develops. In a previous paper we were able to demonstrate the operation of one such cue, Slit, which, acting through its Robo receptors dictates the different positions in the mediolateral axis at which specific sensory axons will terminate and arborise. Here we have shown that membrane bound and secreted Semas acting through their receptors the Plexins restrict growing axons and their terminals to particular dorso-ventral layers of the forming neuropile. 
Evidence for Sema/Plexin Signalling Acting in the Dorso-Ventral Axis of the Neuropile Our initial approach of using a misexpression screen targeted to all sensory axons was sufficient to reveal the existence of the Semas as putative cues in the dorso-ventral axis by showing that there were generalised redistributions of sensory endings in this axis when the cells concerned were forced to express either of the two Sema receptors, Plex A or Plex B. These shifts were readily detectable when the nervous system was viewed in a plane at right angles to the neuraxis. Viewing the nervous system in this plane also reveals the largely complementary patterns of expression for the two Semas. The membrane bound Sema 1a is distributed in an alternating pattern across the neuropile with high levels in both layers 1 and 3. The secreted Sema, Sema 2a, on the other hand, is expressed at high levels in a central strip that extends across the midline and in gradients that decline ventrally and dorsally orthogonal to the Slit gradient (Figure 2). A gradient of Sema 2a has also been described in the embryonic limb of the grasshopper [38]. There it contributes to the polarized growth of pioneer sensory axons away from the region of highest Sema 2a expression at the tip. In the developing Drosophila embryo selective overexpression of the putative receptors for Sema 1a and Sema 2a in sensory neurons acts in a predictable fashion to exclude sensory axons and terminals from those regions where the ligands are highly expressed: overexpression of Plex A excludes projections from high levels of Sema 1a expression in layers 1 and 3. Overexpression of Plex B shifts sensory terminals further away from the central layer of the neuropile. These findings suggest that Sema 2a and Sema 1a provide guidance cues to the growth cones of sensory neurons that express Plex A and Plex B. 
It is consistent with this idea that in the absence of Sema 1a, Plex A overexpression in sensory neurons does not exclude their terminals from regions that normally contain high Sema 1a levels. Similarly, in the absence of Sema 2a, Plex B overexpression in sensory neurons does not exclude their terminals from the central layer of the neuropile. The manipulations of the pattern of sensory terminals in the dorso-ventral axis found with Plexin overexpression are analogous to the manipulations in the medio-lateral axis that are found with Robo3 misexpression. In both dimensions the position at which sensory neurons form their terminals is determined by their expression of receptors for positional cues. Sema/Plexin Signalling Guides Termination in the Dorso-Ventral Axis The most ventrally located sensory terminals, those of the ppk-expressing md neurons, are derived from axons that actually enter the nervous system dorsally and grow downwards, skirting alternative neuropile regions before turning inwards to reach their characteristic medial, ventral domain of termination. A consequence of Sema signalling is that these ventrally targeted axons are excluded from more dorsal regions of the neuropile and channelled instead through a limited lateral region where the expression of both proteins is low, so that their inward migration towards the midline is blocked until they reach the most ventral region. In the absence of either of the Semas or their Plexin receptors, ppk-expressing axons aberrantly enter and terminate in more dorsal regions of the neuropile. This suggests that the growth cones of these cells are attracted towards the midline (we assume by Netrins) [39] as soon as they enter the CNS, but that entry and termination in the more dorsal region of the neuropile is prevented by high levels of Sema 1a in layer 1. In vertebrates genetic studies show that proprioceptive axons are excluded from the superficial dorsal horn by Sema 6D/6C signalling mediated by Plex A1. 
Loss of Plex A1 allows proprioceptive collaterals to invade the superficial dorsal horn although most succeed in projecting through it to their normal more ventral target zones [19]. In an analogous (though not topologically equivalent) fashion, ventrally projecting afferents in Drosophila require Sema signalling through Plex A for their proper exclusion from the most dorsal neuropile. Loss of plex A appears to affect class IV terminals less strongly than loss of sema 1a. One explanation could be that Plex B might also function as a receptor for Sema 1a in this system. Our observation that in plex B mutants class IV axons aberrantly terminate in layers 1, 2, or 3 supports this possibility. We also find that Plex B overexpression in sensory neurons in plex A mutant embryos, prevents aberrant expansion of sensory terminals into intermediate portion of layer 1, which contains very high levels of Sema 1a (Figure S9). Such an expansion occurs in both plex A and sema 1a mutant embryos. High levels of Plex B signalling thus appear to be able to substitute for the absence of Plex A signalling and prevent expansion into regions with high Sema 1a levels. These findings could be explained if Plex B were to function as a lower affinity receptor for Sema 1a, as well as a high affinity receptor for Sema 2a. Sema 1a and Sema 2a are unlikely to be the only cues that operate in the dorso-ventral axis. The incomplete penetrance of the termination phenotype in the sema 1a, sema 2a double mutant suggests that additional factors may operate to control the ventral targeting of class IV axons. There may be long range ventral attractants or local substrate bound attractive cues for these axons in the neuropile. It is also likely that dorsally and centrally located sensory and interneuron terminals, as well as dendrites of motor neurons may require additional signals to exclude them from ventral neuropile. Such signals could be the other Semas. 
Alternatively, by analogy with the optic tectum, where Wnt signalling drives dorsal projections and Ephrins dictate ventral projections, it is possible that some other signalling system may operate with Semas to confine dorsally projecting neurons to dorsal neuropile [3],[40],[41]. Type-Specific Repulsion in the VNC In the fly antennal lobe, during the formation of the olfactory map, Sema 1a expression on the surfaces of antennal olfactory receptor neuron (ORN) axons excludes Plex A expressing maxillary palp ORN axons from inappropriate glomeruli [22],[23]. Our findings suggest that much of the Sema 1a expression in the neuropile of the VNC is on the surfaces of motor neuron dendrites and on the projections of the GABAergic interneurons. Thus, there appear to be two kinds of positional cues in the neuropile. Slit and Sema 2a are examples of secreted and possibly glia-mediated positional cues. Sema 1a on the other hand is presented on membranes of particular neuronal classes (GABAergic interneurons and motorneurons) and is a repellent for the axons of at least one other type of neuron (class IV md neurons). Thus, the presentation of repellent molecules on the surfaces of subsets of neurons can act to exclude specific classes of axons from particular regions of the neuropile. Positional Cues Subdivide the Neuropile into Different Termination Domains within Which Connections Form Theoretical models for gradient-guided axonal growth and targeting during the formation of 2-D neural maps, such as the retinotopic projections, require at least one gradient in each of the two—not necessarily Cartesian—dimensions [42]. These ideas have been borne out by experimental findings. Gradients of attractants and repellents in one dimension have been implicated in providing positional information for terminating sensory axons during the formation of both continuous and discrete neural maps [18],[43],[44]. 
Furthermore, a recent study has shown that two orthogonal systems of graded cues operate to specify position of termination along each axis of a somatotopic map in the optic tectum. Our findings address the larger issue of how termination of distinct neuron classes is regulated within a complex meshwork of differentiating axons and dendrites. They suggest that similar mechanisms that are used for the establishment of neural maps, only involving generalized positional cues in each dimension, control targeting of many different classes of neurons to specific termination domains within a complex neuropile. Although the evidence we provide here suggests that positional cues can specify particular domains for the termination of sensory neurons, we do not suppose that the control of termination and branching by a pervasive system of positional cues would necessarily be sufficient to allow connections to form selectively and specifically between appropriate pre- and postsynaptic partners. What such a system does provide is a framework of signals that could regulate simultaneously the growth of axons and dendrites of many different neurons and induce their termination and branching in appropriate parts of the developing network. Within these restricted regions it is likely that further, localised mechanisms, including competitive interactions, patterns of activity, and target derived cues might all be required to control synaptogenesis and determine the emergence of precise patterns of connectivity within a termination domain. Coordinate Positioning of Pre- and Postsynaptic Terminals by the Same Cues? If the pattern of sensory axon termination within the neuropile is controlled by a system of positional cues, most likely, in three dimensions, it may well be that the location of their postsynaptic dendrites is determined in a similar fashion. 
If this were the case, the matched expression of receptors for the same system of signals by pre- and postsynaptic neurites would guide them to a common volume as a prelude to the formation of synaptic connections between them. Recent studies that show that developing motor neuron dendrites respond to some of the same cues as terminating sensory axons provide indirect evidence for common systems of positional cues leading to the coordinate targeting of presynaptic axons and postsynaptic dendrites [45],[46]. A direct test of this hypothesis, however, must await the identification of the postsynaptic interneurons with which developing sensory neurons form connections. It will then be possible to make a direct investigation of the molecular mechanisms that control the termination and branching of pre- and postsynaptic endings and thereby lay out a ground plan for connectivity within the developing neuropile. Materials and Methods Fly Stocks For mutant analyses sema 2a03021 [25], sema 1aP1 [26], plex ADf(4)C3 [47], and plex BKG00878 [21],[48] were crossed into the ppkEGFP [9] stock. Stocks were made using GFP balancers [49]. Homozygous mutant embryos were identified by lack of GFP. For misexpression we used the following stocks: UAS-robo3 [50],[51] inserts on second and third chromosome, UAS-plexB [21] and UAS-plexA-HA [47], UAS-robo2 [51], UAS-ephrin [52],[53], UAS-eph [52], UAS-unc5 [54], UAS-frazzled [55], UAS-drl-DN [56], UAS-comm [57], UAS-robo, 410 EP-lines from the Rorth collection [58],[59]. For the misexpression screen the UAS-lines were crossed into the PO163GAL4, UAS-n-syb-GFP stock [14],[60]. For rescue experiments the following embryos were analysed: UAS-sema 1a, sema 1aP1; PO163GAL4, ppkEGFP [26], UAS-sema 1a, sema 1aP1; HB9GAL4, ppkEGFP; UAS-plexA-HA/+; PO163GAL4, ppkEGFP/+; plex ADf(4)C3, and UAS-plex B/PO163GAL4, ppkEGFP; plex BKG00878. We also used OK371GAL4 (gift of M. 
Landgraf), GADGAL4 [29], single mindedGAL4 [31], UAS-reaper [27] and wnt5D7 stocks [56]. Dissection Embryos were staged and VNCs dissected out of embryos as previously described [6],[61],[62]. For overexpression experiments embryos were grown at 29°C. VNCs were mounted with brain lobes down and VNC up to allow rapid, high-resolution, confocal imaging of transverse planes, perpendicular to the neuraxis. Immunocytochemistry We used the following primary antibodies: anti-Sema 2a (MAb 19C2, developed by C. Goodman), anti-Slit (MAb C555.6D, developed by S. Artavanis-Tsakonas), anti-Fas II (MAb 1D4, developed by C. Goodman), and anti-Repo (MAb 8D12, developed by C. Goodman) supplied by the Developmental Studies Hybridoma Bank (1:10 dilution); anti-Sema 1a (1:1,000 dilution, kindly provided by A. Kolodkin) [26]; anti-Plex A (1:500 dilution, kindly provided by L. Luo) [23]; and Cy5-conjugated goat anti-horseradish peroxidase (1:100 dilution; Jackson ImmunoResearch). Secondary antibodies were used at 1:500 dilution: Alexa488-conjugated donkey anti-goat, Alexa488-conjugated goat anti-rabbit, Alexa633-conjugated goat anti-mouse, Alexa633-conjugated rabbit anti-mouse (Molecular Probes). Standard immunocytochemical procedures were followed [63], and immunofluorescence was visualised with Leica SP1 and Zeiss LSM confocal microscopes. Images are maximum projections of confocal z series processed with Adobe Photoshop software. Quantification Procedures For quantification of Sema 2a gradients at 13-h AEL, nine VNCs stained for Sema 2a were randomly chosen and A7 was imaged using a Leica SP1. A confocal section was randomly chosen from each stack, the dorso-ventral axis was manually drawn, the neuropile was divided into nine equal dorso-ventral stripes perpendicular to the midline, and the average fluorescence intensity in each stripe was calculated. Values from different nerve cords were normalized such that the average intensity from each nerve cord was 1. 
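The stripe-based normalisation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used in this study: it assumes that the intensity along the manually drawn dorso-ventral axis has already been extracted as a one-dimensional list of values, and the function names (stripe_means, normalise_to_unit_mean) are hypothetical.

```python
from statistics import mean


def stripe_means(profile, n_stripes=9):
    """Split a dorso-ventral intensity profile into n_stripes contiguous,
    near-equal stripes and return the mean intensity of each stripe.

    `profile` is a hypothetical 1-D list of fluorescence values sampled
    along the dorso-ventral axis (an assumption of this sketch).
    """
    n = len(profile)
    if n < n_stripes:
        raise ValueError("profile must contain at least one value per stripe")
    # Integer edges guarantee that every stripe holds at least one sample.
    edges = [i * n // n_stripes for i in range(n_stripes + 1)]
    return [mean(profile[edges[i]:edges[i + 1]]) for i in range(n_stripes)]


def normalise_to_unit_mean(stripes):
    """Rescale stripe intensities so that their average is 1, which makes
    values from different nerve cords comparable."""
    overall = mean(stripes)
    return [s / overall for s in stripes]


# Example with made-up numbers: 18 samples give nine stripes of two samples.
profile = [float(v) for v in range(1, 19)]
normalised = normalise_to_unit_mean(stripe_means(profile))
```

After this step, each nerve cord contributes a profile whose mean is 1, so profiles from different cords can be averaged stripe by stripe.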
For quantification of the Slit gradient at 13 h, 12 VNCs stained for Slit were randomly chosen, and A7 was imaged using a Leica SP1. A confocal section was randomly chosen from each stack, a line on either side of the midline was manually drawn, and the neuropile on either side of the midline was divided into four mediolateral stripes. The average fluorescence intensity in each stripe was calculated and normalised as above. For a statistical analysis of defects in the pattern of sensory terminals (visualized with PO163GAL4, UAS-n-syb-GFP) along the medio-lateral axis we quantified the normalised surface area occupied by sensory terminals (sensory area, SA) in the medial domain of the neuropile (SAM/T=SA[medial]/SA[medial+intermediate+lateral]) in randomly chosen transverse confocal sections from 30 different hemisegments for each genotype. Within a single embryo, we selected every tenth section (all confocal sections were 1-µm thick so that the analyzed sections were 10 µm apart from each other). A Student's t-test was used to compare the mean SAM/T for the different genotypes. For a statistical analysis of expansion or exclusion of sensory terminals (visualized with PO163GAL4, UAS-n-syb-GFP) into different dorso-ventral layers we compared SA in layer 2 (SA2/h=SA(layer 2)/[hemisegment surface area]) or SA in layers 1, 3, and 4 (SA1+3+4/h=SA[layer 1+3+4]/[hemisegment surface area]) in randomly chosen transverse confocal sections from more than 30 different hemisegments for each genotype. Within a single embryo, we selected every tenth section (all confocal sections were 1-µm thick so that the analyzed sections were 10 µm apart from each other). A Student's t-test was used to compare the mean SA2/h or SA1+3+4/h for the different genotypes. 
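As an illustration of the ratios and the comparison described above, the following Python sketch computes SAM/T and a two-sample Student's t statistic. It is a sketch under stated assumptions rather than the procedure actually used: the text does not specify which variant of the t-test was applied, so the classical equal-variance (pooled) form is assumed here, and the function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def medial_fraction(sa_medial, sa_intermediate, sa_lateral):
    """SAM/T = SA[medial] / SA[medial + intermediate + lateral]."""
    return sa_medial / (sa_medial + sa_intermediate + sa_lateral)


def layer_fraction(sa_in_layers, hemisegment_area):
    """SA in the chosen layer(s) divided by the hemisegment surface area,
    e.g. SA2/h for layer 2 or SA1+3+4/h for layers 1, 3 and 4."""
    return sum(sa_in_layers) / hemisegment_area


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance (assumed
    variant). Returns (t, degrees_of_freedom); the p-value would then be
    read from the t distribution with that many degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Made-up per-hemisegment SAM/T values for two genotypes.
control = [medial_fraction(2.0, 1.0, 1.0), 0.45, 0.55]
misexpression = [0.20, 0.25, 0.30]
t_value, dof = student_t(control, misexpression)
```

Working with ratios rather than raw areas removes differences in overall terminal size between hemisegments, so the test compares only how the terminals are distributed.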
For a statistical analysis of termination defects of class IV md axons in the dorso-ventral axis, we quantified the percentage of hemisegments with aberrant terminals (Hat) in layers 1, 2, and 3 per embryo (per 14 hemisegments): Hat(1, 2, or 3)=(n hemisegments with terminals in 1, 2, or 3/14)×100 and the total percentage of hemisegments with aberrant terminals per embryo [Hat(1+2+3)=Hat(1)+Hat(2)+Hat(3)]. A Student's t-test was used to compare the mean Hat for the different genotypes. In some cases we also quantified the average (per embryo) relative proportion of hemisegments with aberrant terminals in each layer: Hat(1)/Hat(1+2+3), Hat(2)/Hat(1+2+3), and Hat(3)/Hat(1+2+3). We counted as “aberrant” only those hemisegments with terminals in layers 1, 2, or 3 (Figure S5A and S5B). We did not count as “aberrant” those axons that exhibit aberrant growth but normal termination (Figure S5C). Supporting Information Figure S1 Sensory neuron termination does not correlate with nerve route and position of entry into the neuropile. (A) Diagram showing the pathways in the neuropile taken by sensory neurons that run in the ISN (magenta) and the SN (green) en route to their termination domains (yellow) in wild type (21-h AEL), with respect to Fas II tracts (red). Diagram represents a projection of a confocal z series of transverse sections through an abdominal segment. Dorsal up. White arrowhead shows midline. Magenta lines indicate the pathways taken by ISN neurons in the neuropile. Green lines indicate the pathways taken by SN neurons in the neuropile. 2, 3, and 4 indicate sensory neuron termination domains in layers 2, 3, and 4, respectively. Scale bar: 10 µm. Sensory axons whose cell bodies are located ventrally in the body wall join the SN nerve, whereas axons whose cell bodies are located dorsally or laterally in the body wall join the ISN nerve. 
There is no correlation between the nerve that axons travel in and the position of their termination in the neuropile. Sensory axons running in the SN terminate in layers 2, 3, or 4, in correlation with their modality and dendritic morphology [9],[11]. For example, the ventral class IV neuron (vdaB) terminates in layer 4, the ventral ch neurons terminate in layer 3, and the ventral proprioceptive class I neuron vpda terminates in layer 2 (M. Zlatic, unpublished data) [9],[11]. Similarly, those sensory neurons that travel in the ISN terminate in layers 2, 3, or 4, depending on their modality and dendritic morphology. Dbd and the ddaE and ddaD class I md neurons, terminate in layer 2, the lateral ch neurons terminate in layer 3, and the dorsal class IV neuron ddaC terminates in layer 4 [9],[11]. Each of the three characteristic modality-specific sensory termination domains, therefore, contains some neurons that have travelled through the SN, and others that have travelled through the ISN. Thus, the position of termination in the neuropile does not seem to depend on the nerve through which the sensory neurons travel towards the neuropile nor on the position of entry into the neuropile. (B) Images show the patterns of growth and termination of md axons labelled with 109(280)GAL4, UAS-CD8GFPGFP (white) in 21-h embryos. Dorsal is up. Red arrowheads show the midline. Magenta arrows indicate pathways taken by ISN neurons in the neuropile. Green arrows indicate pathways taken by SN neurons in the neuropile. 2 and 4 indicate md neuron termination domains in layers 2 and 4, respectively. Scale bar: 10 µm. Upper: a single section from a confocal z series through an abdominal segment showing the ISN pathways. Lower: a single section from the same series showing the SN pathways. Both ISN and SN md neurons terminate in layers 2 or 4, depending on their dendritic morphology. 
(0.93 MB TIF) Figure S2 Distributions of Sema 2a and Sema 1a along the antero-posterior axis of the neuropile. (A and B) Immunofluorescence visualisation of Sema 2a (A) and Sema 1a (B) (white) in ppkeGFP embryos (13-h AEL). Upper images show projections of confocal z series of longitudinal sections through the VNC. Central and lower images show single more dorsal and more ventral sections from the stack, respectively. Anterior is to the left. Red arrowheads show midline. a and p indicate the position of anterior (a) and posterior (p) commissures in each segment. Scale bar: 35 µm. (A) Levels of Sema 2a are uniform along the antero-posterior axis. Thus Sema 2a is unlikely to provide instructive information for controlling neurite termination along this axis. Both in more dorsal (layer 2) and in more ventral (layer 3) longitudinal sections, levels of Sema 2a appear uniform along the antero-posterior axis. (B) Levels of Sema 1a are uniform along the antero-posterior axis. Thus Sema 1a is unlikely to provide instructive information for controlling neurite termination along this axis. Both in more dorsal (layer 1) and in more ventral (layer 3) longitudinal sections, levels of Sema 1a appear uniform along the antero-posterior axis. (1.34 MB TIF) Figure S3 Cellular sources of Sema 1a in the neuropile. (A–E) Sema 1a is brought into dorsal neuropile, in part, by motor neurons. (A, C) Immunofluorescence visualisation of motor neuron dendrites labelled with OK371GAL4, UAS-CD8GFP (white) in OK371GAL4, UAS-CD8GFP control embryos (A) and in OK371GAL4, UAS-CD8GFP, UAS-reaper embryos (C) at 21-h AEL. (B, D) Immunofluorescence visualisation of the Sema 1a pattern (white) in OK371GAL4, UAS-CD8GFP control (B) and OK371GAL4, UAS-CD8GFP, UAS-reaper (D) embryos at 21-h AEL. Dorsal is up. Arrowheads indicate the midline. Magenta lines, neuropile boundaries. Red lines, layer boundaries. Numbers indicate layers. Scale bar: 10 µm. (A) 
In control embryos processes of motor neurons labelled with OK371GAL4, UAS-CD8GFP are readily detectable and they are located in layer 1, which normally contains high levels of Sema 1a. (B) In control embryos Sema 1a is present at high levels in layers 1 and 3. (C) Motor neuron dendrites are not detectable in OK371GAL4, UAS-CD8GFP, UAS-reaper embryos. (D) Sema 1a expression in the same animal as in (C). Sema 1a levels in layer 1 are reduced relative to layer 3 in the absence of motor neuron dendrites. (E) Quantification of Sema 1a levels in layer 1 relative to layer 3 in the same hemisegment, for OK371GAL4, UAS-CD8GFP control and OK371GAL4, UAS-CD8GFP, UAS-reaper 21-h-old embryos. For this purpose, embryos of the two genotypes were stained with antibody against GFP (to distinguish between embryos with and without motor neurons). In each embryo seven sections from seven different hemisegments were chosen at random, and for each section the ratio of the pixel intensity (PI) for the channel showing Sema 1a staining in layer 1 relative to layer 3 (PI1/3=PI[Sema 1a in layer 1]/PI[Sema 1a in layer 3]) was calculated. A significant decrease (p=2×10⁻⁶; Student's t-test) in pixel intensity in layer 1 relative to layer 3 was observed in OK371GAL4, UAS-CD8GFP, UAS-reaper embryos (average PI1/3=0.73; standard deviation [SD]=0.09, n=65 hemisegments) compared to OK371GAL4, UAS-CD8GFP controls (average PI1/3=0.81; SD=0.08, n=48 hemisegments). (F and G) Sema 1a is brought into dorsal and ventral neuropile, in part, by GABAergic interneurons. (F and G) Immunofluorescence visualisation of motor neurons and GABAergic processes labelled with GADGAL4, UAS-CD8GFP (white) (F) and of the Sema 1a pattern (white) in GADGAL4, UAS-CD8GFP, UAS-reaper (G) 21-h-old embryos. Dorsal is up. Arrowheads indicate the midline. Magenta lines, neuropile boundaries. Red lines, layer boundaries. Numbers indicate layers. M, medial; I, intermediate; L, lateral domains. Scale bar: 10 µm. 
(F) Processes of GABAergic interneurons and motor neurons together (white) cover the dorsal and central regions of the neuropile, which normally contain high levels of Sema 1a. (G) Sema 1a (white) levels appear highly reduced in embryos that lack both GABAergic interneurons and motor neurons. The characteristic Sema 1a distribution pattern is no longer detectable. (1.73 MB TIF) Figure S4 Restoration of sema 2a in midline cells partially rescues the aberrant central projection of Class IV axons. (A–C) Immunofluorescence visualisation of Sema 2a (white), in sema 2a mutant (A) and in sema 2a, UAS-sema 2a; single-mindedGAL4, ppkeGFP (B and C) embryos (21-h old). (D) Projections of class IV axons (green) and Fas II tracts (red) in sema 2a,UAS-sema 2a;single-mindedGAL4,ppkeGFP embryos (21-h old). (A and B) Image shows projections of a confocal z series of longitudinal sections through the VNC. Anterior is to the left. Red arrowheads show midline. Magenta line: neuropile boundary. Scale bar: 14 µm. (C and D) Images show projections of a confocal z series of transverse sections through A7. Dorsal is up. Arrowheads show midline. Magenta line (C): neuropile boundary. Red (C) and white (D) lines, layer boundaries. Numbers indicate layers: M, medial; I, intermediate; L, lateral domains. Scale bar: 9 µm. (A) Sema 2a expression is not detectable above background levels in the neuropile of 21-h-old sema 2a mutant embryos. (B) High levels of Sema 2a expression are detectable in sema 2a, UAS-sema 2a; single-mindedGAL4, ppkeGFP embryos. (C) Transverse view of Sema 2a expression in sema 2a, UAS-sema 2a; single-mindedGAL4, ppkeGFP embryos. Sema 2a expression in midline cells, in an otherwise mutant background, results in its distribution throughout the neuropile, with lower levels detectable on the lateral edges of the neuropile and in the ventral-most neuropile. 
(D) Restoration of Sema 2a expression in midline cells alone reduces the aberrant central projection of class IV neurons (compare to Figure 5C). However, this rescue was accompanied by additional defects in the medio-lateral axis, with increased lateral termination of class IV axons. Thus the source of the Sema 2a gradients could potentially be a subset of the midline cells, although we cannot exclude the possibility that some other cells are the endogenous source of this cue in the CNS. (E) Chart shows average percentage of hemisegments per embryo (per 14 abdominal hemisegments) with aberrant class IV terminals in layer 2 (Hat(2)), in sema 2a/+, sema 2a mutant and sema 2a,UAS-sema 2a;single-mindedGAL4,ppkeGFP embryos. Hat(2) is significantly higher in sema 2a mutants (p=4×10⁻⁴; Student's t-test; average Hat(2)=3.3%; n=24 embryos, 336 hemisegments) than in sema 2a/+ controls (average Hat(2)=0%, n=30 embryos, 420 hemisegments). sema 2a, UAS-sema 2a; sim-GAL4, ppkeGFP embryos show a significant reduction in Hat(2) (red **, p=0.006; Student's t-test; average Hat(2)=0.7%, n=29 embryos, 406 hemisegments) compared to sema 2a mutant embryos (average Hat(2)=3.3%). (1.28 MB TIF) Figure S5 Examples of growth and termination errors of class IV axons in mutant embryos. Projections of class IV axons (white) in mutant embryos (21-h AEL). Images show projections of a confocal z series of transverse sections through A7. Dorsal is up. Arrowheads show midline. Magenta arrows point to aberrant (dorsal or central) terminals of class IV axons. Green arrows point to class IV axons that initially grow normally (ventrally) in the neuropile, but afterwards turn dorsally, and terminate in aberrant layers (1, 2, or 3). Red arrows point to class IV axons that grow aberrantly in dorsal or central neuropile. We define terminals as large structures that form at the tips of axons (although sometimes they form along the axon path, on either side of the main axon trunk, which continues growing). 
These structures are thicker than the axon itself and we assume they contain presynaptic specialisations. Class IV axons exhibit several kinds of phenotypes in sema and plex mutant embryos. The most striking is normal initial growth with aberrant termination, where the axon initially grows appropriately towards its target area in the ventral medial neuropile, but then makes a sharp dorsal turn, and terminates in layers 1, 2, or 3 (for examples see Figures 5D, left axons, 7E, right axon, and S5A). In the case of aberrant growth with aberrant termination, the misrouted axon grows through and forms terminals in layers 1, 2, or 3 (for examples see Figures 5B, 6F, 7B, 7D, right axon, and S5B). Some of these axons never reach their wild-type layer 4 (Figures 5B, 6F, 7B, and S5B) while others send a branch ventrally, after they have formed a terminal dorsally or centrally (right axon in Figure 7D). In the case of aberrant growth with normal termination, the misrouted axon turns ventrally after growing through dorsal or central layers and terminates in its wild-type layer 4 (for example see Figures 7D, left axon, and S5C). For all our statistical analyses of termination defects (see below) we counted as “aberrant” only those hemisegments that exhibit the aberrant termination phenotypes (Figure S5A and S5B), but we counted as “normal” those hemisegments that exhibit the aberrant growth with normal termination phenotype (Figure S5C). (A) Examples of normal initial growth with aberrant termination, where class IV axons grew in normally, and afterwards aberrantly turned dorsally or centrally and terminated there, in sema 1a, sema 2a double mutant embryos (21-h AEL). These examples show that the position of entry does not determine the position of termination. Despite the fact that these axons initially grow appropriately in the ventral neuropile, they afterwards alter direction of growth and invade aberrant neuropile layers, where they terminate. 
(B) Example of a class IV axon showing both aberrant growth and aberrant termination in a sema 1a mutant embryo. The axon initially grows in dorsal neuropile, where it also forms a terminal at the midline. (C) Example of a class IV axon showing aberrant growth, but normal termination, in a plex B mutant embryo. The axon grows through dorsal neuropile, but turns ventrally at the midline, without forming a terminal. It forms a terminal once it reaches the ventral neuropile, in its appropriate location. (1.35 MB TIF) Figure S6 Defective positioning of Fas II tracts in different mutant backgrounds. Graphs show percentages of segments (n=175) in which L1 (blue), I2 (green), I3 (yellow), M1 (black), and M2 (white) tracts project aberrantly in sema 1aP1/+ (A), sema 1aP1 (B), sema 2a (C), and sema 1aP1, sema 2a double mutant (D) 21-h embryos. (A) In sema 1aP1/+ control embryos Fas II tracts grow normally in the dorso-ventral axis. In both sema 1aP1 and sema 2a embryos Fas II tracts are affected (B and C) and the disruption is more severe in double mutants (D). (0.29 MB TIF) Figure S7 Role of Semas in patterning the antero-posterior axis of the neuropile. We assessed the potential role of Sema 1a and Sema 2a in controlling termination of class IV axons in the antero-posterior axis by analysing their projections in a top-down view of the neuropile in wild type and in sema 1a, sema 2a double mutants (see Figure S5A and S5B). We chose the sema 1a, sema 2a double mutant for this analysis, because it exhibited the strongest phenotypes in the dorso-ventral axis. (A and B) Projections of class IV axons labelled with ppkEGFP (white) in sema 1a/+; ppkEGFP control (A) and sema 1a, sema 2a; ppkEGFP (B) 21-h embryos. Images show projections of confocal z series of longitudinal sections of the VNC (from T1 to A4). Anterior is to the left. Arrowheads show midline. a, anterior half of the segment; p, posterior half of the segment. Scale bar: 16 µm. 
(A) Top-down view of wild-type class IV projections in T1–A4. Wild-type class IV axons grow asymmetrically, within their normal ventral and medial termination domain, forming a thick anterior branch and very thin processes that extend posteriorly. (B) Top-down view of class IV projections in T1–A4 in sema 2a, sema 1a double mutants. Note that while the class IV terminals appear disorganised compared to wild type, they still appear asymmetric and largely confined to the anterior portion of the segment. We assume the observed disorganisation is a consequence of the major defects in growth and termination in the dorso-ventral axis. (C) Quantification of the average surface area occupied by class IV terminals in the posterior half of the hemisegment, relative to the total surface area covered by class IV terminals in a hemisegment [SAp/(p+a)=SA(posterior)/SA(posterior+anterior)]. Quantification of SAp/(p+a) does not reveal a significant increase (p=0.09; Student's t-test; average SAp/(p+a)=0.22; SD=0.16; n=50 hemisegments) for the double mutants with respect to wild-type embryos (average SAp/(p+a)=0.19; SD=0.1; n=55 hemisegments). For comparison we analysed the antero-posterior distribution of class IV projections in embryos mutant for the gene wnt 5, which has previously been implicated in controlling axon projections in the antero-posterior axis [56]. Wnt 5 is secreted by neurites in the posterior commissure of each segment. We also analysed class IV projections in embryos in which the dominant negative form of Derailed (Drl) has been selectively targeted to sensory neurons (PO163GAL4,UAS-DN-drl). The Wnt 5 receptor Drl is present on the growth cones and axons of neurons crossing in the anterior commissure and is required to prevent these cells from crossing aberrantly in the posterior commissure [56]. 
In contrast to the sema 1a, sema 2a double mutants, when we analysed class IV projections in wnt 5D7 mutants and in PO163GAL4,UAS-DN-drl embryos, we did find a significant increase in SAp/(p+a) compared to wild type (for wnt 5, p=1.8×10⁻⁷; Student's t-test; average SAp/(p+a)=0.34; SD=0.12; n=16 hemisegments; for PO163-DN-drl, p=2.2×10⁻⁷; Student's t-test; average SAp/(p+a)=0.33; SD=0.12; n=30 hemisegments). Thus Sema 1a and Sema 2a do not appear to play a major role in confining class IV terminals to the anterior portion of the segment. (1.15 MB TIF) Figure S8 Plexin expression in sensory neurons (A, B, and D) and in the CNS (C). (A and B) Immunofluorescence visualisation of sensory neuron cell bodies labelled with antibody against horseradish peroxidase (HRP) (A) or PPK-EGFP (red) (in B) and Plex A (A and B) at 13-h AEL. Dorsal is up. (A) Plex A expression (white in ii and green in iii) in dorsal (d) and lateral (l) clusters of sensory neurons (white in i and red in iii). Strong Plex A expression is visible in sensory neuron cell bodies in both clusters. Scale bar: 15 µm. (B) Plex A protein (white in ii and green in iii) is strongly expressed in class IV md neuron cell bodies (white in i and red in iii). Scale bar: 10 µm. (C) Immunofluorescence visualisation of Plex A protein (white in ii and green in iii) in a transverse section of the neuropile labelled with HRP (white in i and red in iii) at 13-h AEL. Image shows a projection of a confocal z series of 1-µm-thick transverse sections through abdominal segment A7. Dorsal is up. Arrowheads show the midline. Outlines indicate neuropile boundaries. Scale bar: 5 µm. (D) In situ hybridisation showing plex B mRNA expression in dorsal (d) and lateral (l) clusters of sensory neurons in the embryonic body wall. Dorsal is up. Scale bar: 20 µm. 
In situ hybridization protocol: DIG-labelled RNA antisense and sense probes were generated with the Ambion Megascript kit and DIG-UTP (purchased from Roche), following the manufacturer's instructions. In situ hybridization was performed according to a protocol kindly provided by Nipam Patel (University of California, Berkeley). DNA templates for in vitro transcription: DNA fragments were amplified by PCR with Primer1 (GCGCGCGTAATACGACTCACTATAGGG) and Primer2 (GCGCGCAATTAACCCTCACTAAAGGG) from pBluescript(SK)-PlexinB-CK00213 (AA142091) (EST, ~1.7-kb insert) using the following key PCR parameters: annealing at 66°C, 5 min extension at 72°C, 30 cycles; Primer1 and Primer2 include the T7 and T3 promoter sequences. In vitro transcription: plex B: T3 (antisense), T7 (sense). (3.90 MB TIF) Figure S9 Plex B and Plex A prevent expansion of sensory terminals into regions with high Sema 1a levels. (A–D) Representative images of sensory terminals labelled with PO163GAL4, UAS-n-syb-GFP (white) in 21-h embryos (left) and diagrams showing patterns of sensory terminals superimposed for different genotypes (right). In all cases images show projections of a confocal z series of transverse sections through A7. Dorsal is up. Arrowheads show midline. White lines, layer boundaries. Numbers indicate layers: M, medial; I, intermediate; L, lateral domains. Scale bar: 10 µm. (A) Expressing Plex B in sensory neurons in a wild-type background results in exclusion of sensory neuron terminals from neuropile layer 2 (see also Figure 2C). Quantification of SA2/h (SA2/h=SA(layer 2)/[hemisegment surface area]) reveals a significant decrease (***, p=4×10⁻¹⁷; Student's t-test; average SA2/h=0.003; SD=0.006; n=30 hemisegments) with respect to wild-type embryos (average SA2/h=0.06; SD=0.02; n=30 hemisegments). However, in these embryos, ectopic sensory terminals in layer 1 still remain largely excluded from intermediate and lateral portions of layer 1, which contain highest Sema 1a levels. 
Right: Diagram showing the pattern of Plex B expressing sensory terminals (green) superimposed on the wild-type pattern (yellow). (B) Expressing Plex B in sensory neurons in a sema 1a mutant background still excludes sensory terminals from neuropile layer 2. However, in these embryos ectopic sensory terminals invade the entire layer 1 and are no longer excluded from its lateral portions, which normally contain highest Sema 1a levels. This results in an overall increase in the surface area occupied by sensory neuron terminals in layers 1 and 3. Right: Diagram showing the patterns of Plex B expressing sensory terminals in sema 1a mutant (green) and wild-type (yellow) backgrounds, superimposed. Quantification reveals a significant increase in SA1+3+4/h (SA1+3+4/h=SA[layer 1+3+4]/[hemisegment surface area]) (***, p=4×10⁻⁵; Student's t-test) when Plex B is overexpressed in sensory neurons in a sema 1a mutant background (average SA1+3+4/h=0.6 and SD=0.08, n=22 hemisegments), compared to embryos in which Plex B is overexpressed in wild-type background (average SA1+3+4/h=0.5; SD=0.06; n=32 hemisegments). (C) In plex A mutant embryos, sensory terminals aberrantly invade neuropile layers 1, 3, and to a lesser extent layer 2. As in the case of sema 1a mutant embryos, this results in an overall increase in the surface area occupied by sensory neuron terminals in layers 1 and 3. Right: Diagram showing the patterns of sensory terminals in plex A mutant (green) and wild-type (yellow) backgrounds, superimposed. Quantification reveals a significant increase of SA1+3+4/h (***, p=6×10⁻³⁴; Student's t-test) in plex A mutants (average SA1+3+4/h=0.7 and SD=0.08, n=62 hemisegments), with respect to wild type (average SA1+3+4/h=0.3; SD=0.05; n=32 hemisegments). (D) Expressing Plex B in sensory neurons in a plex A mutant background is sufficient to exclude sensory terminals from lateral and intermediate portions of neuropile layer 1. 
In these embryos ectopic sensory terminals do not invade the regions of layer 1 that contain highest Sema 1a levels. As a result, sensory neuron terminals in layers 1 and 3 occupy a smaller surface area than in plex A mutants. Thus, in the absence of Plex A, Plex B is sufficient to exclude sensory terminals from regions of highest Sema 1a expression. Plex B may therefore function as a receptor for Sema 1a. Right: Diagram showing superimposed patterns of sensory terminals with and without Plex B expression in plex A mutants. Terminals with Plex B expression: green. Terminals without Plex B expression: yellow. Quantification reveals a significant decrease of SA1+3+4/h (***, p=4×10⁻⁹; Student's t-test) when Plex B is expressed in sensory terminals in a plex A mutant background (SA1+3+4/h=0.5 and SD=0.1, n=55 hemisegments), compared to plex A mutants (SA1+3+4/h=0.7 and SD=0.08, n=62 hemisegments). (E and F) Bar charts show the average SA1+3+4/h under different conditions as indicated. (E) The average SA1+3+4/h (average SA1+3+4/h=0.6, SD=0.08, n=22 hemisegments) is significantly higher (***, p=4×10⁻⁵; Student's t-test) when Plex B is expressed in a sema 1a mutant background, compared to its expression in wild-type background (average SA1+3+4/h=0.5; SD=0.06; n=32 hemisegments). (F) The average SA1+3+4/h (average SA1+3+4/h=0.5 and SD=0.1, n=55 hemisegments) is significantly lower (***, p=4×10⁻⁹; Student's t-test) for sensory terminals that express Plex B in a plex A mutant background, compared to plex A mutants (average SA1+3+4/h=0.7 and SD=0.08, n=62 hemisegments). In contrast, we did not observe a significant difference (p=0.6; Student's t-test) between the SA1+3+4/h of sensory terminals that express Plex B in a plex A mutant background (average SA1+3+4/h=0.5 and SD=0.1, n=55 hemisegments) and those that express Plex B in wild-type background (SA1+3+4/h=0.5; SD=0.06; n=32 hemisegments). 
(1.24 MB TIF) Figure S10 Fas II defects are not rescued by selective restoration of Plex B expression in sensory neurons. Graphs show percentage of segments (n=175) in which L1 (blue), I2 (green), I3 (yellow), M1 (black), M2 (white) tracts project aberrantly. (A) In ppkEGFP; plexB embryos Fas II tracts are severely affected. (B) When Plex B expression is selectively restored in sensory neurons alone, in UAS-plexB;PO163GAL4,ppkEGFP;plexB embryos, Fas II tracts continue to exhibit the mutant phenotype. (0.19 MB TIF) Table S1 Results of the misexpression screen. We identified 11 genes (2.6%) that change the pattern of sensory terminals, without producing pronounced changes in neuron number or preventing sensory axons from reaching the CNS. In these experiments, sensory terminals shift independently of Fas II tracts, which remain in their wild-type position and relation to each other. Of the 11 genes, two produced specific shifts along the dorso-ventral axis. The table gives the list of 11 genes, which, when misexpressed in sensory neurons alone, produce specific shifts in the dorso-ventral, medio-lateral or antero-posterior axes. (0.04 MB DOC) Time warping of evolutionary distant temporal gene expression data based on noise suppression Abstract Comparative analysis of genome-wide temporal gene expression data has a broad potential area of application, including evolutionary biology, developmental biology, and medicine. However, at large evolutionary distances, the construction of global alignments and the consequent comparison of the time-series data are difficult. The main reason is the accumulation of variability in the expression profiles of orthologous genes in the course of evolution. Results We applied Pearson distance matrices, in combination with other noise-suppression techniques and data filtering, to improve alignments. 
This novel framework enhanced the capacity to capture the similarities between temporal gene expression datasets separated by large evolutionary distances. We aligned and compared the temporal gene expression data in budding (Saccharomyces cerevisiae) and fission (Schizosaccharomyces pombe) yeast, which are separated by ~400 million years of evolution. We found that the global alignment (time warping) properly matched the duration of cell cycle phases in these distant organisms, which was measured in prior studies. At the same time, when applied to individual ortholog pairs, this alignment procedure revealed groups of genes with distinct alignments, different from the global alignment. Conclusion Our alignment-based predictions of differences in the cell cycle phases between the two yeast species were in good agreement with the existing data, thus supporting the computational strategy adopted in this study. We propose that the existence of alternative alignments, specific to distinct groups of genes, suggests the presence of different synchronization modes between the two organisms and possible functional decoupling of particular physiological gene networks in the course of evolution. Background Comparative analysis of evolutionary changes in distant organisms at the level of gene expression requires cross-matching (alignment) of temporal microarray data covering developmental time courses or cell cycles. Alignment of time series data, or time warping, allows side-by-side comparison of orthologous gene expression on a relative time scale [1-5]. Time warping produces non-linear alignment paths, which help estimate the relative duration of similar steps in the life cycle of diverged species. In addition, aligned temporal datasets can reveal concordantly and discordantly expressed pairs of orthologous genes or groups of genes. Currently available time-warping algorithms [6] stem from early methods of speech recognition [7]. 
Benchmarking tests show that the existing methods underperform on noisy datasets and require adaptation to temporal expression data from organisms separated by large evolutionary distances [see Additional file 1 - Figures S1-S3] (and at UC Berkeley online resource: http://flydev.berkeley.edu/cgi-bin/GTEM/yeast_analysis/similarity_matrices.html). Here, we tested several noise suppression techniques in order to optimize the global alignment between time series data from two species separated by ~400 million years of evolution, budding (Saccharomyces cerevisiae) and fission (Schizosaccharomyces pombe) yeasts. Traditionally, the yeast cell cycle has served as a model system to study regulation of periodic gene expression, replication, and cell division [8,9]. Evolution of periodic gene expression in yeast has been explored with classification approaches using temporal gene expression data [10-12], where an individual periodicity score was assigned to each ortholog and these periodicities and phases were then compared. In contrast to the classification approach, time warping captures information from all orthologous profiles in a single test. Exploration of alignments constructed for S. cerevisiae and S. pombe using available methods and programs [1,6] [see Additional file 1 - Figures S1-S3] (and the UC Berkeley online resource) revealed the presence of long gaps and noisy alignment paths. In this study, we introduced and thoroughly tested a novel data treatment and alignment framework, based on noise-suppression methods and elements of the Kruskal-Liberman alignment algorithm [6]. The framework allowed us to override interspecific noise and to construct a global alignment for the two yeast species. This global alignment supported previously observed differences in the duration of the G1 and G2 cell cycle phases [13]. 
In order to explore alternative alignments, the pairs of orthologous expression profiles were aligned individually and the resulting individual alignment paths were clustered using common clustering algorithms [14]. Using this approach, we found gene groups or "time clusters," in which the relative synchronization modes (alignment paths characteristic of each time cluster) were different from the global alignment path. Our analysis suggested that evolutionary shifts in the durations of the G1/G2 cell cycle phases are manifested in the expression timing of replication machinery and ribosomal genes. In contrast, gene expression in mitochondria was desynchronized or evolutionarily "disconnected" from the replication and housekeeping genes, owing to the high autonomy of that organelle. Results and discussion Data selection and noise removal The success of cross-species time warping critically depends on the level of noise in the time series expression data ("internal noise") and on the evolutionary variability in gene expression between the two species ("external noise"). The internal noise arises, for instance, from measurement errors between different microarrays (time points) and from desynchronization of the cell culture over time; the external noise results from differences in orthologous expression accumulated during evolution, or from differences in expression caused by experimental conditions, such as the choice of cell culture synchronization method. From this perspective, problems connected with alignment construction are largely problems of noise reduction and noise overriding. The ability to judge the quality of alignments critically depends on the input data; data selection helps to find the least noisy and most reliable datasets. Therefore all 70 pairwise combinations of publicly available S. cerevisiae and S. 
pombe datasets [10-13,15-17] were explored using the Kruskal-Liberman algorithm [6] based on either Euclidean or Pearson distance matrices (see Methods and UC Berkeley online resource). We adopted Pearson distance matrices to produce highly informative comparisons between time series and to cope with the external (evolutionary) noise (see Methods). The distance matrices revealed discernible periodic patterns similar to those observed in simulated periodic datasets [see Additional file 1 - Figure S1]. Notably, alignments based on the Pearson distance matrices tolerated much higher external noise and were capable of capturing even subtle similarities between orthologous datasets [see Additional file 1 - Figures S1-S3] (and the matrices for all 72 pairwise comparisons, available from UC Berkeley web resource). Judging by the quality of the observed periodic patterns, we selected two pairs of datasets for detailed analysis: S. cerevisiae synchronized by α-factor [12], and S. pombe synchronized either using the cdc25 temperature-sensitive mutant or elutriation [17]. Based on the amplitude of gene expression in the course of the life cycle, all genes in the yeast genome can be conditionally separated into (i) cell-cycle dependent (oscillating), (ii) constitutively expressed, and (iii) inducible (not expressed or expressed constitutively in our datasets). Weakly oscillating and constitutive genes contribute little or no information to the global alignment; moreover, their actual expression dynamics can be masked by the internal noise. Therefore, we removed the low-variance genes from the selected datasets to improve the sensitivity of the method. In the prior studies [10,17], Fourier analysis was used to eliminate the low-cycling genes. However, several factors, such as biased contribution of synchronization approaches, short duration of the datasets (two cell cycles), and high internal noise, can make Fourier analysis fail for many genes which, in fact, do cycle significantly. 
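The Pearson distance matrix mentioned above can be illustrated with a minimal sketch. It assumes, since the Methods are not reproduced here, that the distance between time point i in one species and time point j in the other is 1 minus the Pearson correlation computed across matched ortholog pairs; the function name and the array layout are hypothetical.

```python
import numpy as np

def pearson_distance_matrix(x, y):
    """Distance between every pair of time points of two expression datasets.

    x: array (n_orthologs, t1), y: array (n_orthologs, t2); row k of x and
    row k of y are assumed to be a matched ortholog pair (an assumption of
    this sketch). Returns a (t1, t2) matrix of 1 - Pearson r, so 0 means
    perfectly correlated time points and 2 means perfectly anti-correlated.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of ortholog rows")
    # Centre each time-point column across genes, then scale to unit length.
    # A column with zero variance would divide by zero; filter such columns first.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    xn = xc / np.linalg.norm(xc, axis=0)
    yn = yc / np.linalg.norm(yc, axis=0)
    # The dot product of unit-length centred columns is the Pearson correlation.
    return 1.0 - xn.T @ yn
```

Because the correlation is invariant to per-column shifts and rescaling, a matrix built this way is insensitive to array-wide intensity differences between time points, which is one plausible reason it tolerates interspecific noise better than Euclidean distance.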
In the case of the yeast cell cycle, biological replicates were not available, and standard ANOVA-based filtering could not be applied. Therefore, we designed an SNR (Signal-to-Noise Ratio) filter to eliminate noisy and low-cycling profiles. The SNR filter is analogous to ANOVA, but requires no replicates [see Additional file 1 - Figures S4, S5] (and Methods section). The statistical model for SNR assumes that periodically expressed genes gradually increase or decrease their expression level from one time point to the next. In mathematical terms, the variance of point-to-point changes in a gene expression profile should be less than the variance of the data itself. The two selected datasets were filtered using the SNR method to remove the low-variance profiles and smoothed using a Gaussian method to minimize the internal noise (see Methods section). The SNR filtering reduced the initial number of orthologous profile pairs in the selected datasets (α-factor - S. cerevisiae vs. cdc25 - S. pombe) to 3193, corresponding to 2518 genes in S. pombe and 2169 genes in S. cerevisiae (see ortholog matching in Methods). An apparent contradiction between the previously reported number of cycling genes (500) [18] and the number of genes investigated in this study (>2000) is explained by the fact that all significantly changing (high-variance) profiles were scored in our study, even if they displayed moderate Fourier power at the desired period [see Additional file 1 - Figure S5]. We believe that in most instances low Fourier scores reflected biased expression at the start of the cell culture synchronization, desynchronization of cells after several cycles, and/or measurement errors. After removal of the 500 previously reported best-cycling profiles from our dataset, the distance matrix and the alignment did not change significantly [see Additional file 1 - Figure S6]. 
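The idea behind the SNR filter and the smoothing step can be sketched as follows. This is only an illustration of the stated principle (variance of point-to-point changes compared with the variance of the profile itself); the exact score, the threshold, and the smoothing width used by the authors are described in their Methods and are not reproduced here, so `snr_score`, `gaussian_smooth`, and the default `sigma` are hypothetical.

```python
import numpy as np

def snr_score(profile):
    """Ratio of the profile variance to the variance of successive differences.

    A smoothly changing (e.g. periodic) profile scores well above 1, whereas
    uncorrelated noise scores around 0.5, because differencing independent
    values doubles their variance.
    """
    p = np.asarray(profile, dtype=float)
    signal_var = np.var(p)
    step_var = np.var(np.diff(p))
    if step_var == 0.0:
        # Perfectly constant steps: flat profiles carry no signal at all.
        return np.inf if signal_var > 0.0 else 0.0
    return signal_var / step_var

def gaussian_smooth(profile, sigma=1.0):
    """Smooth a profile with a normalised Gaussian kernel (edges are padded)."""
    p = np.asarray(profile, dtype=float)
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(p, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

In such a scheme a profile would be kept when its score exceeds a chosen cut-off (for instance 1, matching the statement that the step variance should be less than the data variance), and the retained profiles would then be smoothed before the distance matrix is computed.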
This test provided evidence that periodicity is present in the additional genes, as compared to prior studies, although it might be masked by the noise or conditions of the culture synchronization. We also found that Gaussian smoothing significantly improved detection of periodicity in the expression profiles (data not shown). Time warping results Distance matrices for the full datasets, each spanning approximately 2 cell cycles, were inspected to identify data ranges corresponding to a single cell cycle in each species (see Figure 1B). A global alignment path was constructed by time warping with the Kruskal-Liberman algorithm [6] applied to Pearson distance matrices (Figure 1A, B). Variations in the data treatment and the Pearson matrix construction parameters produced several possible paths; nevertheless, all successful alignments (gapless or with minimal gaps) followed nearly identical paths [see Additional file 1 - Figure S7]. Data ranges corresponding to a single cell cycle were selected based on the periodic patterns observed on the distance matrix and on standard cell cycle markers characteristic of specific phases of the cell cycle [10,17]. The selected data ranges were aligned according to the global alignment path shown in Figure 1C. Time warping for a single cell cycle clearly showed that the relative duration of cell cycle phases differs between the two species (see Figure 2E), which is in good agreement with the existing data [13]. According to the global alignment path, S. pombe has a longer G2 phase and a shorter G1 phase. This result supported our computational strategy, selected with respect to the high divergence between the analyzed species. Time warping and concordantly expressed genes. (A, B) Orthologous expression profiles for genes from the two yeast species are superimposed on the absolute timescale, before alignment. The selected data ranges correspond to ranges marked in Figure 1B.
(C, D) Profiles for the same genes, adjusted according to the global alignment path (relative timescale). (E) Correspondence between the cell cycle phases in S. cerevisiae and S. pombe, established based on the global alignment. Notice the collapse of the M-G1 phase and the expansion of the S-G2 phase region in S. pombe. Inspection of orthologous profile pairs in the aligned datasets revealed instances of both concordantly and discordantly expressed genes. 518 expression profiles, corresponding to approximately 400 genes, had very similar or nearly identical expression phasing in both organisms. Figure 2 shows the expression profiles of two orthologous pairs, which appear discordant on the absolute timescale (unaligned) and are nearly identical on the relative timescale, after time warping. DIA2 (Figure 2A, C) is the origin-binding F-box protein that plays a critical role in DNA replication and maintaining genome integrity. S. cerevisiae strains with DIA2 deletions have a high rate of endogenous DNA damage and are defective in S-phase progression [19,20]. Gene pof3, the ortholog of DIA2 in S. pombe, has a similar function. In pof3 mutants the telomeres are substantially shortened and the normal telomere transcription/silencing is disrupted [21]. Another example, gene RPS17B (Figure 2B, D), encodes the ribosomal protein 51 (rp51) of the small (40S) subunit and displays discordant expression in the unaligned data sets as well. However, after time warping, the ribosomal protein has identical expression phasing in both yeast species. The majority of other ribosomal genes and genes involved in ribosome biogenesis also displayed identical phasing of expression in the aligned datasets [see Additional file 1 - Figure S8] (and Figure 3). Alternative alignment paths and heterochrony Along with the global alignment, alignments for each individual pair of orthologs from the selected datasets were constructed and explored as well.
This analysis has shown that the majority of concordantly expressed orthologous genes produced pairwise alignment paths similar to the global one (see examples in Figure 2 and Figure 3C). At the same time, individual alignments of discordantly expressed ortholog pairs often produced alignment paths different from the global (Figure 3A, B). We explored the pairwise alignments to see whether there are alternative alignment paths, common to certain groups of genes expressed discordantly with respect to the global alignment (idea first proposed by J. Aach and G. Church [1]). In order to reveal the alternative paths of time warping, all ortholog pairs were aligned one by one, using the Gene-warp program and the resulting individual alignment paths were clustered using k-means clustering with arbitrarily selected number of clusters k = 10 [14] (see Methods section). Figure 3 shows "time clusters" produced by the clustering of individual alignment paths. Technically, each time cluster corresponds to a group of individual profile pairs with similar pairwise time warping. From the biological standpoint, the time clusters correspond to synchronization groups; within each group, the expression synchrony among all genes is maintained during evolution, while synchrony between the groups (time clusters) is apparently lost (see Figure 3A- C). The concordantly expressed genes (regardless of the phasing of their expression) formed the largest time cluster or the largest synchronization group (Figure 3C, D), containing 518 expression profiles. As expected, the average alignment path of this largest synchronization cluster was close to the global alignment path. Other time clusters revealed different levels of desynchronization with the global path, varying from moderate (see cluster 8 in Figure 3D) to extreme (clusters 3 and 4 in Figure 3D). 
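The path-clustering step described above can be illustrated with a minimal sketch. It is not the Cluster 3.0 implementation used in the study: it is a plain Lloyd's k-means written for illustration, the function names are hypothetical, and it assumes that each individual alignment path has been encoded as an equal-length numeric vector (for example, the time index in one species aligned to each time point in the other). The deterministic choice of starting centroids is also an assumption made so that the example is reproducible.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_paths(paths, k, max_iter=100):
    """Group alignment paths into k 'time clusters' with Lloyd's k-means.

    paths: list of equal-length numeric vectors (assumed path encoding).
    Returns (labels, centroids).
    """
    if not 1 <= k <= len(paths):
        raise ValueError("k must be between 1 and the number of paths")
    # Deterministic start: k evenly spaced paths serve as initial centroids.
    step = len(paths) // k
    centroids = [list(paths[c * step]) for c in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each path joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in paths
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean path of its members.
        for c in range(k):
            members = [p for p, lab in zip(paths, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two near-diagonal paths and two paths shifted by about two time points.
example_paths = [[0, 1, 2, 3], [0, 1, 2, 4], [2, 3, 4, 5], [2, 3, 5, 5]]
example_labels, example_centroids = kmeans_paths(example_paths, k=2)
```

In this toy input the two near-diagonal paths fall into one cluster and the two shifted paths into the other, and the centroid of each cluster plays the role of the average alignment path of a synchronization group.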
To explore why certain gene groups maintained synchronization in evolution, expression data, composing individual time clusters, have been explored further in S. cerevisiae dataset by consequent clustering of the expression profiles and GO-terms enrichment analysis. It has been found that the largest S. cerevisiae time cluster (Figure 3C) matched the global alignment path and contained expression profiles for many ribosomal and replication-related genes (Figure 3I, J) [see also Additional file 1 - Table S1]. Synchronization between the replication machinery, ribosomal and housekeeping genes suggests coordination between the cell division and the cell growth [22] in both yeast species. We also inspected the composition of the time clusters deviating from the global alignment path (Figure 3D, clusters 0 and 5). According to the results of the GO-term enrichment assays, some of these time clusters contained genes involved in respiration and protein synthesis in mitochondria (see Figure 3E-H). Such desynchronization or heterochrony observed between the mitochondrial and the ribosomal/replication genes can be attributed to the semi-autonomy of mitochondria and the mitochondrial gene expression [23]. Mitochondrial biogenesis in S. pombe is more similar to higher eukaryotes [24], so it is quite possible that the independent synchronization of some mitochondrial genes is maintained in higher eukaryotes as well. According to endosymbiotic theory [25], mitochondria entered eukaryotic cells nearly a billion years ago. Apparently, since then, some of the mitochondrial pathways (respiration, ribosome biogenesis) maintained their own, internal, synchronization of gene expression. 
The phase shift in expression of ribosomal and mitochondrial ribosomal genes detected in this study (see Figure 3G-J) appears to support the hypothesis of decoupling of oxidative and reductive biochemical pathways in the yeast cell cycle [26,27], and possibly represents an example of heterochrony [28] at the level of gene expression. Time clustering, combined with the subsequent profile clustering, helps in superimposing orthologous profiles and interpreting the evolutionary changes in gene expression. Without time clustering [see Additional file 1 - Figure S9], superposition of the orthologous profiles is much less informative (compare Figure 3E-J with Figure S9 D, E). Conclusion Desynchronization of gene expression in evolution The data selection/filtering and noise suppression strategy made it possible to build a global alignment between very diverse temporal expression data for yeast species separated by ~400 million years of evolution. Specifically, we found that the Pearson metric, in the context of Kruskal-Liberman time warping, enables the alignment of very diverse time series data. Alignments constructed for the yeast species have been validated using prior biological knowledge. Recent studies in the evolutionary genomics field suggested the presence of conservation between subdomains of large gene networks [29-31]. Preceding genome-wide studies of conserved genetic interactions in S. cerevisiae and S. pombe demonstrated conservation of genetic interactions between particular sets of genes, corresponding to protein complexes or pathways [32]. In this work, comparative analysis of gene expression dynamics has shown that parts of large gene networks (presumably corresponding to time clusters) maintained substantial temporal synchrony in the course of evolution. Time warping, in combination with path and profile clustering, allowed tracing of synchronization for some housekeeping, structural and replication genes.
However, analysis of regulatory components of the cell cycle, such as cyclins, revealed no such synchronization or other shared evolutionary trends (data not shown). One possible reason for this is the dramatic rewiring of regulatory pathways during evolution. The mathematical strategies described in the current work can be applied to comparative analysis of expression data in any pair of organisms separated by hundreds of millions of years of evolution. The following factors may limit the area of application: (i) variability in gene expression under different experimental conditions (synchronization method) [see Additional file 1 - Figure S10 and Table S2]; (ii) strikingly different alternative alignment paths, specific to large groups of genes (heterochrony, see Figure 3); (iii) distortion of gene expression profiles as the result of normalization and Gaussian smoothing. Methods Microarray data and low-level data processing All available microarray time-series data sets for S. cerevisiae and S. pombe [10-13,15-17] were downloaded from the authors' sources [33] or the NCBI GEO database [34]; ortholog tables were obtained from Valerie Wood (Sanger Institute) [35]. In cases where a single gene in one species corresponded to multiple orthologs in another species, multiple profile pairs were generated by duplicating the single expression profile and superimposing it on all matching orthologs. Alignment of data obtained using different methods of cell synchronization has shown that there is 50-80% consistency between dataset pairs displaying clear periodic patterns [see Additional file 1 - Figure S10 and Table S2]. Low-level data processing included the following steps: signal-to-noise filtering (SNR), upsampling, Gaussian smoothing and Z-score normalization. Z-score normalization was done using standard methods [36]. Upsampling and Gaussian smoothing were performed in order to reduce noise and improve the quality of alignments.
Upsampling is a standard signal-processing step; the new sample rate should be at least 2× the highest frequency in the original signal (Nyquist-Shannon sampling theorem). Accordingly, all input datasets (37 time points maximum) were upsampled in this study to 100 time points. A Gaussian filter was used to remove frequencies much higher than those related to the cell cycle, on the assumption that such high frequencies are noise. Together, upsampling and Gaussian filtering might have effects similar to the interpolated time warping described earlier [1]. Attempts to filter out non-periodic profiles using Fourier methods [10] eliminated too many profiles with high variance, which is not surprising given the very small number of periods in the microarray data (~2.5). Therefore, the Fourier filter was replaced by a signal-to-noise (SNR) filter [see Additional file 1 - Figure S1]. The original SNR filter, based on non-parametric statistics, was designed for the analysis of microarray time series data that lack biological replicates. Consider the local point-to-point variation Δx between the neighboring time points i and i+1 in the jth expression profile: $\Delta x_{i,j} = x_{i+1,j} - x_{i,j}$. If the variance of this variation is not smaller than the variance of the profile itself, $\sigma^2(\Delta x_j) \geq \sigma^2(x_j)$, the noise is high and the profile (probe) needs to be excluded from consideration (see Additional file 1). Each expression profile in every data set was scored using a log-ratio (see eq. S1-S6 in Additional file 1). In that formula, σ²(Δx) (pseudocount) is the average variance of the point-to-point variation (noise) taken over all profiles of the entire data set. Similarity matrices and Time warping Euclidean similarity matrices take into account only the levels of gene expression at a given time point [1]. We found that this commonly used method failed in the case of alignment between S. cerevisiae and S. pombe [see Additional file 1 - Figure S2]. To improve the sensitivity of time warping, we replaced Euclidean matrices by Pearson similarity matrices.
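The low-level processing steps described in this Methods section can be sketched as follows. This is a minimal illustration under stated assumptions, not the AVF-filer or RZ-smooth code: the function names are hypothetical, the score is a simple log-ratio of the profile variance to the variance of its point-to-point changes (the exact formula of the study, including how the pseudocount enters, is given in the additional file and is not reproduced here), upsampling is done by linear interpolation, and the Gaussian kernel width `sigma` is an arbitrary illustrative value.

```python
import math
from statistics import pvariance


def snr_score(profile, pseudocount=1e-9):
    """Illustrative log-ratio: variance of the profile over the variance of
    its point-to-point changes. Positive scores mean gradual change; the
    tiny pseudocount only guards against division by zero (an assumption)."""
    diffs = [b - a for a, b in zip(profile, profile[1:])]
    return math.log((pvariance(profile) + pseudocount)
                    / (pvariance(diffs) + pseudocount))


def upsample(profile, n_points=100):
    """Linear interpolation of a profile (length >= 2) onto n_points (>= 2)."""
    last = len(profile) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional position in the original
        i = min(int(pos), last - 1)       # left neighbour index
        frac = pos - i
        out.append(profile[i] * (1 - frac) + profile[i + 1] * frac)
    return out


def gaussian_smooth(profile, sigma=2.0):
    """Gaussian-weighted moving average; weights are renormalised at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in offsets]
    out = []
    for i in range(len(profile)):
        num = den = 0.0
        for d, w in zip(offsets, weights):
            j = i + d
            if 0 <= j < len(profile):
                num += w * profile[j]
                den += w
        out.append(num / den)
    return out


# A smooth two-cycle profile versus a profile that flips sign at every point.
smooth = [math.sin(2 * math.pi * t / 18) for t in range(37)]
noisy = [(-1.0) ** t for t in range(37)]
processed = gaussian_smooth(upsample(smooth, 100))
```

With these inputs the smooth profile receives a positive score and the alternating profile a negative one, which mirrors the stated criterion that the variance of point-to-point changes should be smaller than the variance of the data itself.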
Given an odd time window size parameter n (n ∈ 2N + 1), one can compute the value of the uncentered Pearson correlation r for a given kth pair of the orthologous profiles a and b at the time points i and j as follows: $r^{k}_{ij} = \frac{\sum_{t} a^{k}_{i+t}\, b^{k}_{j+t}}{\sqrt{\sum_{t} (a^{k}_{i+t})^{2} \, \sum_{t} (b^{k}_{j+t})^{2}}}$, where t runs from $-(n-1)/2$ to $(n-1)/2$. This procedure returns the agreement between segments of the two profiles, each n time points long, centered on time points i and j respectively. Similarity between the time point i from the first dataset and the time point j from the second dataset, for all M pairs of orthologous profiles, was computed as a standard Pearson distance (see eq. S7-S12 in Additional file 1 for more details). The Pearson similarity matrices have higher sensitivity and produce better alignments [see Additional file 1 - Figure S2] (and UC Berkeley web resource) than Euclidean matrices, as they collect more information in each point-to-point comparison. The described method has been implemented in the software package GT-Warp. The package includes the following programs and utilities: "AVF-filer" and "RZ-smooth" are programs for low-level data filtering and processing. These programs include common methods such as Fourier analysis, ANOVA, F-test, Gaussian smoothing, resampling, and normalization. The AVF-filer program also includes the SNR method described above and the "VF-stat" utility for simulating the SNR score distribution in random data. The program "Time-warp" incorporates both Euclidean and Pearson methods (see above), generates global alignment matrices using the Kruskal-Liberman algorithm [6], and has graphical outputs, such as those shown in Figure 1. The program "Gene-warp" incorporates the same methods and is intended for one-to-one alignment of orthologous profiles. Gene-warp produces alignment path data, which can be clustered using standard methods, such as Cluster 3.0 [37]. The GT-Warp package also includes the program "M-align" for aligning datasets based on matrices produced by Time-warp and a "Prf-browser" to browse and display orthologous profiles on the same plot.
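The windowed similarity and time-warping steps described above can be illustrated with a short sketch. It is not the GT-Warp code (which is written in Perl), and several details are assumptions labelled as such: the distance between time points i and j is taken here as one minus the mean windowed uncentered Pearson correlation over all profile pairs, windows are clamped at the profile edges, and the alignment is a generic dynamic-programming time warp with unit steps in the spirit of the cited algorithm rather than its exact formulation. All function names are hypothetical.

```python
import math


def uncentered_pearson(a, b):
    """Uncentered Pearson correlation of two equal-length segments."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def window(profile, center, n):
    """Segment of n (odd) points centered on `center`; indices are clamped
    at the profile edges (an assumption made for this sketch)."""
    half = n // 2
    top = len(profile) - 1
    return [profile[min(max(t, 0), top)]
            for t in range(center - half, center + half + 1)]


def distance_matrix(pairs, n=3):
    """pairs: list of (profile_a, profile_b) for orthologous genes.
    Assumed distance: 1 - mean windowed correlation over all pairs."""
    len_a, len_b = len(pairs[0][0]), len(pairs[0][1])
    dist = [[0.0] * len_b for _ in range(len_a)]
    for i in range(len_a):
        for j in range(len_b):
            r = [uncentered_pearson(window(a, i, n), window(b, j, n))
                 for a, b in pairs]
            dist[i][j] = 1.0 - sum(r) / len(r)
    return dist


def warp_path(dist):
    """Minimum-cost monotone path from (0, 0) to the last cell."""
    rows, cols = len(dist), len(dist[0])
    inf = float("inf")
    cost = [[inf] * cols for _ in range(rows)]
    cost[0][0] = dist[0][0]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best = min(cost[i - 1][j] if i else inf,
                       cost[i][j - 1] if j else inf,
                       cost[i - 1][j - 1] if i and j else inf)
            cost[i][j] = dist[i][j] + best
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i and j:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1]


# A hand-made 2 x 3 distance matrix in which the first time point of one
# series matches the first two time points of the other.
stretched_path = warp_path([[0, 0, 1], [1, 1, 0]])
```

A path that leaves the diagonal, as in the last example, corresponds to one series being locally stretched relative to the other, which is how differences in the relative duration of cell cycle phases show up in an alignment.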
The GT-Warp package has been written in Perl and compiled for Win32; source code is available upon request, and the Win32 distribution, help, and test files are available from the UC Berkeley online resource. Clustering alignment paths Alignment paths for individual profile pairs were generated using the Euclidean method (Gene-warp program). The paths were clustered using the K-means clustering method, producing 10 temporal clusters using the Cluster 3.0 program with default parameter settings [14,37]. S. cerevisiae expression profiles from each temporal cluster (or "time cluster") were clustered again, using the K-means clustering method, producing 10 sub-clusters within each of the 10 time clusters. Enrichment of gene ontology terms in the time clusters and subclusters was assessed using the GeneMapp 2.0 package [38] [see Additional file 1 - Table S1]. Construction of an annotated corpus to support biomedical information extraction Abstract Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified.
Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes. Background Due to the rapid advances in biomedical research, scientific literature is being published at an ever-increasing rate [1]. Without automated means, it is difficult for researchers to keep abreast of developments within biomedicine [2-6]. Text mining, which is receiving increasing interest within the biomedical field [7,8], enriches text via the addition of semantic metadata, and thus permits tasks such as analysing molecular pathways [9] and semantic searching. Semantic searching above the level of concepts depends on prior processing to recognise relations or events in texts, which is carried out by information extraction (IE) systems. Due to domain-specific features of texts and the types of events to be recognised, IE systems must be adapted to deal with specific domains. 
A well-established method of carrying out this adaptation is through training using annotated corpora (e.g., [10-12]). Our work has been concerned with the development of such a corpus for the biomedical field, the Gene Regulation Event Corpus (GREC), consisting of MEDLINE abstracts semantically annotated with event information. Our approach is based on the fact that many events are focussed on either verbs (e.g., transcribe, regulate) or nominalised verbs (e.g., transcription, regulation). Both types of word behave in similar ways, in that they specify arguments that can convey a range of different types of information related to the event. For each relevant event, our annotation aims to identify, as exhaustively as possible, all structurally-related arguments within the same sentence. Each argument is assigned a semantic role from a fixed set of 13 roles. Where appropriate, arguments are also assigned a biological concept type. The GREC may be downloaded from http://www.nactem.ac.uk/GREC/. A copy of the corpus is also available in Additional file 1. To our knowledge, the GREC provides the richest annotation to date within the biomedical field, in terms of the number of arguments types and their characterisation. As such, the corpus is specifically designed to contribute to the development of bio-specific semantic frame resources and semantic role labellers (SRLs) which, although active areas of research within the general language domain, have received less attention within the bio-IE domain. Related work Within the field of bio-IE, evaluations such as the LLL05 challenge [13] and BioCreative II [14] have focussed attention on the recognition of protein-protein interaction (PPI) events from the literature. There now exists a number of corpora (e.g., [13,15,16]) and systems (e.g., [17-19]) tailored to this task. 
However, many other types of events and information are relevant within biomedicine, such as gene regulation and expression events, location of protein in the cell, protein-DNA interaction, etc. [20]. Extraction of such events often requires the recognition of more complex information than just interacting proteins. Several corpora and systems concerned with the annotation of more complex events have recently been developed, e.g., [21-24]. These differ in a number of ways, including: • Range of events - whether a single type of event or multiple event types are annotated. • Event arguments - the number and types of arguments (i.e., participants) in each event may be fixed or flexible. More detailed information types, e.g., location, time or experimental setup may or may not be identified as event arguments. • Scope of events - whether event arguments must occur within a single sentence or whether they may occur across multiple sentences. • Semantic information assigned to arguments - this may correspond to named entity types and/or semantic roles. In the case of semantic roles being assigned, they may be tailored to a particular type of event, or they may apply to a large range of different events. Whilst events are often identified by verbs, nominalised verbs play a particularly important role within biomedical texts, and often outnumber other domain-specific verbal forms [25]. However, it is acknowledged that they are more difficult to process than verbs [26] and are currently only dealt with by a small number of systems, often in a limited way (e.g., [17,27,28]). Due to the central nature of verbs and nominalised verbs in the description of events, accurate event extraction requires information about the way they behave in text, in terms of: • Their syntactically-related arguments, e.g. causality, location, manner, etc. 
• Semantic information relating to each argument (e.g., semantic roles or restrictions on the types of phrase that can constitute each argument). The production of corpora annotated with such information allows real usage within text to be taken into account. Large-scale annotation of corpora within the general language domain at this level of detail has resulted in the production of resources containing syntactic and semantic frame information, which deal with both verbs and nominalised verbs [29-32]. Such annotated corpora also facilitate the training of components of IE systems, with a large amount of research having been devoted to semantic role labelling (SRL) [33]. Some studies (e.g., [34,35]) have shown that, to a certain extent, general language resources can also be useful in the training of SRLs for biomedical texts, due to the fact that many verbs appear in texts from both the general language and biomedical domains, and often behave in similar ways. However, the cited works concede that, whilst such SRLs may produce adequate results for certain predicates, training using biomedical corpora is also needed. This is because domains such as biomedicine employ sublanguages [36], in which the "informational content and structure form a specialized language that can be delineated in the form of a sublanguage grammar". NLP systems must take such grammars into account to allow accurate processing of text within specialist domains [37]. Sublanguage grammar features that are relevant to our work include the following: • The types of events found in biological sciences are often described using verbs/nominalised verbs that do not feature prominently in general language [26], e.g., methylate. • Verbs/nominalised verbs that occur in both the general and specialised language domains may have different syntactic and semantic properties in each domain, e.g., differing numbers of arguments [38], as well as different meanings. 
For example, translation generally means rendering one language into another, while in Molecular Biology it specifies the process of protein synthesis from an mRNA template. Whilst there have been some attempts to produce bio-specific extensions to the general language resources described above, e.g., [38,39], together with semantic role labellers [20,40], they currently have limited coverage. The UMLS SPECIALIST lexicon [41], which includes many biomedical terms, is larger scale, but includes only syntactic, and not semantic, information about verbs. Motivation Existing event corpora within the domain (e.g., [21] and [23]) are not specifically geared to support the acquisition of semantic frame information for verbs. The bio-NLP community has, until now, lacked a domain-specific linguistically-oriented corpus in which detailed semantic information for a wide range of both verbs and nominalised verbs has been annotated. This has limited the amount of research undertaken on the production of domain-specific semantic frame resources and SRLs. In response to this, we have designed a new event annotation scheme which is specifically tailored to this purpose. The scheme has subsequently been applied to the annotation of event instances relating to gene regulation and expression in MEDLINE abstracts. Our scheme differs from those of previous event corpora in the field in a number of important ways: • It captures the semantic annotation of as many structurally-related arguments as possible of a large number of verbs and nominalised verbs describing gene regulation and expression events. This is important since, according to [20], and as confirmed by us through consultation with biologists, types of information such as location, manner, timing and condition, which can appear in various syntactic positions, are all essential for describing biomedical relations. A sentence-based approach facilitates the linking between semantic information and syntactic structure.
• It bridges linguistic and biological knowledge: From the linguistic perspective, all arguments are characterised using semantic roles. We have defined a new, closed set of event-independent roles which are designed for application to arguments of a range of types of biomedical events. Closed sets of semantic roles are advantageous in that they facilitate generalization over different types of events [25,42]. Although their application to general language may be problematic [30], the use of a closed set is viable in a restricted domain, as domain-specific definitions can be provided for each semantic role type. From the biological perspective, appropriate arguments are additionally assigned a biological concept type from a hierarchically-structured set that is tailored to the gene regulation domain. The concepts are mapped to classes in the Gene Regulation Ontology (GRO) [43]. The combination of semantic role and biological concept labels provides a rich annotation, aimed at allowing users to have a large amount of flexibility over the type of query they specify and to have control over the specificity or generality of certain parts of the query, e.g.: In LOCATION:E. coli, AGENT:NifA activates which THEME:GENE. This query would search for instances of events in which a LOCATION, AGENT and THEME are specified. The values of the semantic roles may be specified either as specific words or phrases (e.g., E. coli or NifA) or more general named entity categories (e.g., GENE). The GREC consists of 240 MEDLINE abstracts, which have been annotated with a total of 3067 events. Whilst of modest size compared to some other domain-specific event-annotated corpora (e.g., [21]), this is balanced by the richness of the annotations. The corpus has already been used in the development of the BioLexicon [44]. This unique text mining resource for biology provides and links syntactic and semantic frame information for a large number of biomedical sublanguage verbs. 
In addition, the lexicon contains (1) derived forms of these verbs (including nominalisations), (2) general English words frequently used within the biology domain and (3) domain terms, gathered (and interlinked) both from existing databases and through the application of text mining techniques. Initial machine learning experiments using the GREC [45] suggest that it can be used to train IE components with reasonably good performance, with both named entity extraction and semantic role labelling having achieved F-scores of around 60%, based on 10-fold cross validation. A further direction of research which could help to improve the performance of IE systems trained on the GREC is introduced in [46], in which it is demonstrated that, due to the differing perspectives of different annotation schemes, it is not always the case that larger corpora contain the most useful information. The reported study found that, whilst small corpora may not be large enough to train IE systems in their own right, augmenting such corpora with training instances derived from other corpora can help to improve the performance of the trained system. This provides convincing evidence that combining smaller, richly annotated corpora, such as our own, with larger corpora which are slightly poorer in information content, could provide a future direction of research for training more accurate biomedical IE systems. This idea is especially attractive, given that the production of large, richly annotated corpora can be very time consuming. In the remainder of this paper, we firstly cover the key aspects of our annotation scheme, followed by a description of the recruitment and training of annotators. We follow this by providing detailed statistics, results and evaluation of the GREC, and finally present some conclusions and directions for further research. Methods This section is concerned with the preparatory work required prior to the annotation of the GREC. 
Beginning with a clarification of our notion of an event, we then provide a description of the key features of our annotation scheme. A brief overview of the annotation software used and of its customisation is followed by details regarding the annotators and their training. As performance during training was measured quantitatively through the calculation of inter-annotator agreement (IAA) scores following each cycle of training, we provide details and motivation for our chosen evaluation metric, the F-measure. Finally, we provide an analysis of the IAA results attained during training. Events in biomedical texts In this section, we clarify our notion of an event. Firstly, we provide some simple examples of event instances that relate to gene regulation and expression within biomedical texts: 1) In Escherichia Coli, glnAP2 may be activated by NifA. 2) Our results show that glnA encodes glutamine synthetase. For each event, two types of information may be specified in the text, both of which are important to its correct interpretation: • The participants (or arguments) of the event. In sentence 1), there are 3 arguments specified by the verb activated, i.e., In Escherichia Coli, glnAP2 and NifA. • Higher level information (called modality) about how the event should be interpreted. For example, the word may in sentence 1) indicates that there is some uncertainty about the truth of the event, whilst the phrase Our results show that in 2) indicates that there is experimental evidence to back up the event described by encodes. Several recent articles (e.g., [47-51]) have reported on attempts to annotate information such as certainty, evidence or negation within biomedical texts. Our current work concerns the first of these information types. Specifically, the annotation task consists of the following, in sequence: a) Identifying relevant instances of events that relate to gene regulation and expression.
b) Identifying all arguments of the event that are specified within the same sentence.
c) Finally, assigning semantic roles and biological concepts to these arguments.

Table 1 illustrates the type of information that would be annotated for sentences 1) and 2) above.

Table 1: Example annotation output

Annotation scheme

In this section, we outline some key aspects of the annotation scheme. Firstly, we describe the types of events on which the annotation is focussed, i.e., gene regulation and expression. This is followed by a more detailed description of the semantic roles and biological concepts assigned to event arguments. Finally, we provide an account of some of the steps taken to ensure the consistency of annotated text spans.

To accompany the annotation scheme, we have produced a detailed set of annotation guidelines, as such guidelines are needed to achieve high quality annotations [52-54]. The structure and content of these guidelines were iteratively refined in discussion with domain experts and with annotators (via group discussion sessions following annotation training phases and full annotation cycles). The guidelines are available to download from http://www.nactem.ac.uk/GREC/, and are also available in Additional file 1.

Gene regulation and expression events

The current annotation effort is concerned with events relating only to gene regulation or expression, i.e., events that describe any interaction which leads, either directly or indirectly, to the production of a protein. Annotation is restricted to sentences that contain some mechanical description of transcription, translation or post-transcriptional modifications and/or their controls. Annotators were helped by (but not restricted to) a list of verbs which we created that potentially denote gene regulation and expression events. These verbs are automatically highlighted by the annotation software, WordFreak (see Software section below), in each abstract to be annotated.
The basis of this list was a set of 229 hand-picked gene regulation verbs provided by the European Bioinformatics Institute (EMBL-EBI). We augmented this list through the automatic extraction of further verbs from an E. coli corpus. This corpus contains approximately 33,000 MEDLINE abstracts and was also provided by EMBL-EBI. The automatic extraction was carried out by identifying those verbs whose syntactic arguments corresponded either to terms identified by the TerMine tool (http://www.nactem.ac.uk/software/termine/) or to biological named entities identified by the GENIA tagger [55]. The complete list was subsequently reviewed for relevance by a biology expert, resulting in a list of 353 verbs.

Semantic roles

Starting with the generic semantic roles proposed for VerbNet [29] and PropBank [30], we examined a large number of relevant events within MEDLINE abstracts, in consultation with biologists. We concluded that arguments of gene regulation and expression events may be characterised using a subset of these general language roles, with the addition of the domain-specific CONDITION role. In some cases, we changed the names of the roles used in other resources in an attempt to make them more easily understandable to biologist annotators. From VerbNet, we have used the roles AGENT, THEME, INSTRUMENT, LOCATION, SOURCE and DESTINATION. Our RATE and TEMPORAL roles are based on the VerbNet EXTENT and TIME roles, respectively. MANNER and PURPOSE come from PropBank's set of general roles that are applicable to any verb. We also saw a need for a role similar to the VerbNet PREDICATE role, to deal with cases such as "The (cAMP)-cAMP receptor protein complex functions as an activator...", where "an activator" corresponds to the PREDICATE role of "functions".
For our own purposes, we created 2 separate roles, DESCRIPTIVE-AGENT and DESCRIPTIVE-THEME, and extended the characterisation of these roles to apply not only to predicatives, but also to any argument which describes characteristics or behaviour of either the AGENT or the THEME of the event. The full set of roles is shown in Table 2. In general, definitions of argument types normally specified as adjuncts, such as MANNER, INSTRUMENT, CONDITION and LOCATION, can be problematic to distinguish from each other. However, our use of more biologically-oriented definitions for these cases aims to reduce discrepancies. Semantic roles Biological concepts Our biological concept labels are organised into hierarchies based on the Gene Regulation Ontology (GRO) [43]. This ontology, which integrates and builds on parts of other established bio-ontologies, such as Gene Ontology [56] and Sequence Ontology [57], is also included within the list of ontologies of the OBO Foundry [58]. Our biological concept labels are arranged within 5 different hierarchies, corresponding to the following supercategories: Nucleic_Acids, Proteins, Living_Systems, Processes and Experimental. During annotation, biological concepts are identified within each semantic argument. In each case, the most specific concept category possible within the appropriate hierarchy is assigned, based on the context in which the concept occurs. The aim of this is to allow queries over extracted event instances to be performed at different levels of granularity, i.e., users could specify more general or less general concept types according to their requirements. Consider the following example: To map the regulatory domain of Escherichia coli T-protein... Here, it is possible to assign the specific concept category Domain (within the Proteins hierarchy) to the concept the regulatory domain, which is a functional part of a protein. 
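The idea of querying at different levels of granularity can be illustrated with a small sketch. The hierarchy fragment below is a toy: the parent links, the helper names (`ancestors`, `is_a`, `supercategory`, `find`) and the annotated arguments other than "the regulatory domain" are assumptions made for illustration, and are not a copy of the Gene Regulation Ontology or of the corpus.

```python
# Toy fragment of a concept hierarchy. The parent links are illustrative
# assumptions, not the actual structure of the Gene Regulation Ontology.
PARENT = {
    "Gene": "Nucleic_Acids",
    "Mutant_Gene": "Gene",
    "Transcription_Factor": "Proteins",
    "Activator": "Transcription_Factor",
    "Repressor": "Transcription_Factor",
    "Domain": "Proteins",
}


def ancestors(category):
    """Return the chain from a category up to its supercategory (inclusive)."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(category, query):
    """True if `query` is the category itself or one of its ancestors."""
    return query in ancestors(category)


def supercategory(category):
    """Top-level node reached by following parent links."""
    return ancestors(category)[-1]


# Hypothetical annotated arguments: (text span, most specific concept assigned).
annotated = [
    ("the regulatory domain", "Domain"),
    ("a hypothetical repressor", "Repressor"),
    ("a hypothetical mutant gene", "Mutant_Gene"),
]


def find(query):
    """Return the spans whose concept falls under `query` at any depth."""
    return [text for text, category in annotated if is_a(category, query)]


print(find("Proteins"))              # coarse-grained query
print(find("Transcription_Factor"))  # finer-grained query
```

Because each argument stores only its most specific label, a coarse query such as `find("Proteins")` still retrieves it by walking up the parent links.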
However, the following example presents a greater challenge: IHF may inhibit ompF transcription by altering how OmpR interacts with the ompF promoter. Here, IHF is clearly a repressor. However, the specific category of OmpR is ambiguous from the context of the sentence between: • An activator of the ompF promoter. • A repressor of the ompF promoter. Therefore, the more general category of Regulator is assigned to OmpR. Consistent annotation of text spans The task of annotating consistent text spans is often challenging [52], but is important to ensure a cleanly annotated corpus which is easy to understand, reuse and process. Consider the following sentence: The Klebsiella rcsA gene encoded a polypeptide of 23 kDa The AGENT of encoded may be viewed as any of the following spans: Klebsiella rcsA, Klebsiella rcsA gene, or The Klebsiella rcsA gene. Similarly, the THEME could be polypeptide, a polypeptide or a polypeptide of 23 kDa. In order to promote consistent choice of spans, we have created a number of guidelines which are mostly based on syntactic chunks. Prior to annotation, chunks are automatically identified by the GENIA tagger [55]. The example below illustrates the output of the tagger, in terms of the chunks identified. Note that, according to the output of the GENIA tagger, PP chunks contain only the preposition, and not the following NP. [NP The Klebsiella rcsA gene] [VP encoded ] [NP a polypeptide ] [PP of ] [NP 23 kDa] According to our guidelines, annotated text spans should normally consist of (sequences of) complete chunks, thus alleviating many issues relating to the exact words that should be included within an argument text span. This means that, for example, in the above sentence, the AGENT of encoded should be chosen as The Klebsiella rcsA gene, as this corresponds to a complete NP chunk. 
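The complete-chunk rule can be sketched as follows. This is a minimal illustration that parses the bracketed display shown above (not the GENIA tagger's native output format) and expands a candidate token range to the chunks it overlaps; the function names and the token-index convention are assumptions.

```python
import re

# Bracketed chunk display as shown in the text above.
BRACKETED = "[NP The Klebsiella rcsA gene] [VP encoded ] [NP a polypeptide ] [PP of ] [NP 23 kDa]"


def parse_chunks(bracketed):
    """Return (label, first_token, end_token_exclusive, text) for each chunk."""
    chunks, position = [], 0
    for label, body in re.findall(r"\[(\w+) ([^\]]*)\]", bracketed):
        tokens = body.split()
        chunks.append((label, position, position + len(tokens), " ".join(tokens)))
        position += len(tokens)
    return chunks


def snap_to_chunks(chunks, start, end):
    """Expand the token range [start, end) to cover every chunk it overlaps."""
    covered = [c for c in chunks if c[1] < end and start < c[2]]
    return " ".join(c[3] for c in covered)


chunks = parse_chunks(BRACKETED)
# Tokens 1-2 are "Klebsiella rcsA"; snapping yields the complete NP chunk.
print(snap_to_chunks(chunks, 1, 3))
```

An annotator who selects only "Klebsiella rcsA" would thus be steered to the full chunk "The Klebsiella rcsA gene".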
A further guideline stipulates that argument text spans must consist only of base NP chunks, and that additional descriptive information, usually introduced by prepositions, must be excluded from argument text spans. In the above sentence, application of this rule means that the THEME of "encoded" is only the chunk "a polypeptide", whilst "of 23 kDa" is excluded from the argument text span.

Software

The annotation of the GREC was performed using a Java-based annotation tool called WordFreak (http://wordfreak.sourceforge.net/ [59]). The tool is designed to support many kinds of annotation of text documents, and can be adapted to new tasks fairly straightforwardly by producing new Java classes that define the task. Much of the work to customise WordFreak for the current task was carried out by ILC-CNR (http://www.ilc.cnr.it) in Pisa. The customisation helps annotators to conform to the guidelines in a number of ways. For example, occurrences of biologically-relevant verbs are automatically highlighted. In addition, colour-coding is used to distinguish different types of chunks, whilst certain restrictions are imposed in the tool as regards the types of chunks that can constitute different types of semantic arguments, e.g., ADVP chunks can only be labelled with the MANNER role.

Annotators and training

Due to the requirement for biological knowledge and complete understanding of the abstracts, annotation was undertaken by 6 biology PhD students with native or near-native competency in English. It was also required that annotators had at least some experience in gene regulation. Linguistic expertise would be acquired through the training programme and through study of the annotation guidelines. As the annotation was carried out as part of the EC BOOTStrep project (http://www.bootstrep.org), it was subject to strict time constraints, with the amount of time to complete the annotation work being limited to three months.
This time constraint firstly meant that we were unable to recruit annotators who all had a similarly high level of knowledge of gene regulation and expression. In addition, due to the envisaged steep learning curve for annotators, it was decided to devote the majority of the time available to annotator training. The employment of 6 annotators, however, allowed a medium-sized final GREC to be annotated in a relatively short space of time. Initial training sessions introduced the annotation tool and the task, with a particular emphasis being placed on clear positive and negative examples of gene regulation and expression events. This was considered particularly important for those annotators with less experience in gene regulation and expression. The initial training sessions were followed up by 5 fortnightly cycles, during which abstracts were firstly annotated by the annotators and then examined by 2 of the authors (one with biological expertise and the other with linguistic expertise), who produced individual feedback reports for each annotator prior to the start of the next cycle of annotation. Additional regular group sessions allowed problems to be discussed in more detail. The calculation of IAA scores after each training cycle provided a quantitative measure of improvement during the training period. Prior to providing these scores, we first describe our method of calculating agreement. Calculating inter-annotator agreement We have defined a novel evaluation methodology which calculates IAA for a number of separate subtasks of the annotation process. These subtasks are as follows: • Event identification (how frequently annotators agree on which events to annotate). • Argument identification (for agreed events, how frequently the same arguments are chosen by each annotator). For this task, we calculate separate agreement rates corresponding to: a. 
Relaxed span matches, where argument text spans identified by a pair of annotators at least overlap with each other, but do not necessarily match exactly. b. Exact span matches, where argument text spans identified by a pair of annotators match exactly. This statistic helps us to evaluate the effectiveness of our rules for consistent span annotation. • Semantic role assignment (for agreed arguments, how often the same semantic roles are assigned by each annotator). • Biological concept identification (within agreed arguments, how often annotators identify the same biological concepts). • Biological concept category assignment (for agreed biological concepts, how often the assigned categories are agreed upon by each annotator). For this task, we calculate 3 different agreement rates, i.e., a. Exact category matches, where each annotator has assigned exactly the same concept label. b. Matches including parent, where we also consider as matches those cases where the category assigned by one annotator is the parent concept of the category assigned by the other annotator. c. Supercategory assignment, where we consider only whether each annotator has assigned a concept within the same top level superclass, i.e., Nucleic_Acids, Proteins, Living_Systems, Processes and Experimental. Whilst the Kappa statistic [60] has become a standard way of calculating IAA for classification tasks, it is problematic for most of the annotation subtasks outlined above, as it requires classifications to correspond to mutually exclusive and discrete categories. The only subtask which fits neatly into this category is the semantic role assignment subtask. We have thus chosen to follow [61] in choosing the F-Score to calculate IAA, as it can be applied straightforwardly to all of the above annotation subtasks. 
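A minimal sketch of pairwise F-score agreement over argument spans, under both exact and relaxed matching, is given below. The span offsets are hypothetical, and the handling of relaxed matches (a span counts as matched if it overlaps any span from the other annotator) is an assumption, since the precise matching procedure is not spelled out here.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold, other, relaxed=False):
    """Pairwise agreement between two annotators' lists of (start, end) spans."""
    if not gold or not other:
        return 0.0
    match = overlaps if relaxed else (lambda a, b: a == b)
    # Precision: share of the other annotator's spans that match a gold span.
    precision = sum(any(match(g, o) for g in gold) for o in other) / len(other)
    # Recall: share of the gold spans that match a span of the other annotator.
    recall = sum(any(match(g, o) for o in other) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical argument spans (character offsets) from two annotators.
annotator_a = [(0, 24), (33, 46), (60, 70)]
annotator_b = [(4, 24), (33, 46)]

print(f_score(annotator_a, annotator_b))                # exact span matches
print(f_score(annotator_a, annotator_b, relaxed=True))  # relaxed span matches
```

Swapping the two arguments exchanges precision and recall, so the resulting score is unchanged whichever annotator is treated as the gold standard.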
The F-Score is the harmonic mean of precision and recall scores, which are normally calculated to compare the performance of an information retrieval or extraction system to a gold standard. For the purposes of calculating IAA, precision and recall between two annotators can be calculated by treating one set of annotations as the gold standard. The F-score is the same whichever set of annotations is used as the gold standard [62]. Agreement during training Table 3 reports the changes in the IAA rates as the training period progressed. Four of the cycles (C1 - C4) concerned E. coli abstracts, whilst the final cycle (C5) switched to annotation of human abstracts. As the final corpus would consist of both E. coli and human abstracts, we wanted to verify to what extent annotation quality could be maintained if the species referred to in the abstracts is changed. Inter-annotator agreement during training The general trend was for the agreement rates to rise gradually between training cycles C1 and C4. In addition, the discrepancy between relaxed and exact span matches narrowed as the training progressed. For most tasks, the agreement rates peaked at the end of cycle C4, with most agreement levels falling in the range 70% - 90%, which we consider to be acceptable [47]. When the species referred to in the abstract was changed (from E. coli to human), this resulted in a drop in agreement rates for most tasks, particularly bio-concept assignment, suggesting that a period of adjustment is required when switching to a new species. Two tasks, however, i.e., semantic role assignment and argument identification, seem more domain-independent, in that the agreement rates stayed constant, or even continued to rise slightly, when the species referred to in the text was changed. The main exception to the general trend for improvement is in the assignment of biological concept categories, for which there was no discernible improvement during the training period. 
Differing levels of experience in gene regulation may have caused annotators to vary in their ability to accurately assign fine-grained biological concept categories. However, higher levels of agreement are achieved if we take the hierarchical structure of the concept categories into account, and look at cases where the category assigned by one annotator is the parent of the term assigned by the other. If we map all concept categories to their top level supercategories, then agreement rates of up to 90% (after cycle C4) are achieved. Results and discussion Following the training period, the final annotated GREC was produced. In this section, we provide details, statistics and analysis of this corpus. Following some initial general statistics regarding the corpus, we move on to examine the most commonly annotated verbs and nominalised verbs on which events are centred. Subsequently, we examine in more detail the arguments of events, including an analysis of the numbers of arguments that occur in different events, the distribution of different semantic argument types, and the most commonly occurring patterns of arguments. Biological concept assignment is then covered, including the distribution of the assigned concepts amongst the five different supercategories, together with an analysis of the most commonly assigned concepts. Finally, we consider quality control of the GREC, including both IAA scores and annotator discrepancies that were found through manual examination of the corpus. Corpus characteristics and statistics Candidate abstracts for annotation for the final GREC were selected from species-specific corpora of MEDLINE abstracts collected by EMBL-EBI, who chose abstracts relevant to the E. coli and human species using their own rule-based species-filtering methods. The candidate abstracts were further screened for relevance to gene regulation by one of the authors with biological expertise. General statistics regarding the GREC are shown in Table 4. 
The effort expended by the 6 annotators amounted to a total of 876 person hours (equivalent to 6.4 person months).

Table 4: General corpus statistics

The statistics in Table 4 reinforce the importance of considering events that are described by nominalised verbs as well as those that are described by verbs. In the E. coli corpus, events that are centred on nominalised verbs are almost as common as those centred on verbs, although the range of different words that are used to describe events is much greater for verbs than for nominalised verbs.

Verbs and nominalised verbs expressing events

Table 5 shows the top 10 most common words (verbs and nominalised verbs) which express events, both in the corpus as a whole, and separately for the E. coli and human parts of the corpus. In each case, events centred on these 10 words constitute 45 - 50% of the total events annotated, suggesting that a large proportion of relevant events are centred on a relatively small set of words. Indeed, in the corpus as a whole, only 55 words (either verbs or nominalised verbs) have been used to annotate 10 or more events.

Table 5: Most common words describing events

Most of the words in Table 5 correspond to important biological processes. For some of these processes, occurrences of both the verbal and nominalised forms are quite common, e.g., regulate/regulation, bind/binding, repress/repression, activate/activation. In other cases, there appears to be a stronger preference for either the verb or the nominalised verb. In the E. coli portion of the corpus, for example, twice as many events are centred on the nominalised verb "expression" as on any other word. "Transcription" is also rarely used in its verbal form, i.e., "transcribe" (16 times in the complete corpus), whilst "encode" is only ever used in its verbal form.

Event arguments

In this section, we provide some statistics regarding annotated event arguments. Firstly, Figure 1 provides an analysis of the numbers of arguments that were identified for different events.
Figure 1: Distribution of event argument counts. Each section of the chart shows the percentage of events in the GREC that have been annotated with the indicated number of arguments.

Whilst it is most common for 1 or 2 arguments to be specified, 15% of events specify 3 or more arguments. However, as Figure 1 shows, it is extremely rare in our corpus for 4 or more arguments to be specified. Table 6 provides statistics regarding the semantic roles that were assigned to arguments.

Table 6: Semantic role occurrences

In addition to the 13 roles already introduced, there is a further role named Underspecified, which was to be assigned by annotators to arguments that could not be characterised by one of the 13 defined roles. However, the fact that the Underspecified role was only assigned 11 times in the whole corpus suggests that our originally-defined role set is sufficient to characterise the vast majority of semantic arguments. The AGENT and THEME roles, which provide the most fundamental information about events, are by far the most commonly assigned. Whilst it may seem surprising that only about half of the events specify an AGENT, this can partly be explained by the relatively high occurrence of events that are centred on nominalised verbs (42% of all events) and passive constructions (14% of events). According to our corpus, only around 20% of events centred on nominalised verbs and 50% of events using passive verb constructions specify an AGENT. Several other roles feature fairly prominently in the events, particularly MANNER, LOCATION, DESTINATION and CONDITION, which is in line with observations made by [20].

Table 7 shows the most common patterns of semantic roles assigned to event arguments. The most common pattern is for only an AGENT and a THEME to be specified, which accounts for almost a third of all events. When events do specify a third argument, it is most common for the AGENT and THEME, plus one additional type of argument, to be present.
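A tally of role patterns of this kind could be computed along the following lines. The event list is made up for illustration and does not reproduce corpus counts; the function names are likewise assumptions.

```python
from collections import Counter

# Hypothetical events: each entry lists the semantic roles of one event's arguments.
events = [
    ["AGENT", "THEME"],
    ["THEME", "AGENT"],
    ["THEME"],
    ["AGENT", "THEME", "LOCATION"],
    ["THEME", "MANNER"],
    ["THEME"],
    ["AGENT", "THEME"],
]


def role_patterns(events):
    """Count role patterns, ignoring the order in which roles appear."""
    return Counter(tuple(sorted(roles)) for roles in events)


def share(pattern_counts, pattern):
    """Fraction of all events that show exactly the given pattern."""
    return pattern_counts[tuple(sorted(pattern))] / sum(pattern_counts.values())


counts = role_patterns(events)
print(counts.most_common(3))
print(share(counts, ["AGENT", "THEME"]))
```

Sorting the roles within each event makes "AGENT, THEME" and "THEME, AGENT" count as the same pattern.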
Table 7: Most common semantic role patterns

Biological concepts

In the corpus as a whole, 5026 biological concepts were identified. The distribution of the categories assigned to these concepts amongst the five supercategories is shown in Figure 2.

Figure 2: Distribution of biological concept supercategories. Each section of the chart shows the percentage of annotated biological concepts in the GREC that have been assigned a concept class belonging to the indicated supercategory.

The supercategories Nucleic_Acids and Proteins are so dominant because most gene regulation and expression events describe some kind of relationship between entities of these two types. The Processes supercategory is also very common, as concepts assigned to this correspond to "embedded" events that describe a mechanistic link between Nucleic_Acids and Proteins, e.g.: Expression of the ompF and ompC genes is affected in a reciprocal manner by the osmolarity of the growth medium. Annotators were instructed to assign the most specific concept possible in the hierarchy; the results show that 66.91% of assignments indeed constitute the most specific concepts. Table 8 compares the most commonly assigned concepts in the E. coli and human parts of the corpus.

Table 8: Comparison of concept assignments in E. coli and human abstracts

Gene constitutes the most commonly assigned concept in both parts of the corpus. It is a general, rather than a specific concept in the Nucleic_Acids hierarchy. However, the frequency of assignments of its specific subtypes, i.e., Mutant_Gene, ORF and Allele, is very low, with 19, 10 and 1 assignments, respectively. This suggests that more specific concept type assignment for genes can be problematic. The category Transcription_Factor also has far more assignments than its sub-categories, Repressor and Activator, in the human part of the corpus. However, Transcription_Factor is not nearly as frequent in the E. coli abstracts as in the human part of the corpus (see Table 8).
These differences represent important biological information: due to the relative complexity of eukaryotic systems, transcription factors play a more prominent role in gene regulation in eukaryotes than in prokaryotes such as E. coli.

Quality control

Previously, we showed that good rates of agreement were achieved by the end of the training period. To verify that annotation quality was maintained in the final GREC, approximately one quarter of the abstracts were annotated by all annotators. In this section, we firstly present some general agreement statistics relating to the whole corpus, followed by more detailed statistics regarding semantic role and biological concept assignment. Finally, we examine some types of annotator discrepancies that were found through manual examination of the corpus.

General agreement statistics

Average agreement rates for the final corpus were calculated in the same way and for the same annotation subtasks as during the training period. These are reported in Table 9, categorised according to abstract subject. In most cases, agreement rates are maintained at the same level as, or in some cases exceed, those attained by the end of the training period.

Table 9: General agreement statistics in the GREC

Particularly high levels of agreement (88% or above) are achieved for both the identification of semantic arguments and the assignment of semantic roles to these arguments. As these are the subtasks that we originally identified as being more linguistically-oriented than others, our results suggest that a detailed set of guidelines, together with an intensive training programme, allows these tasks to be carried out by biologists to a high degree of accuracy.

Semantic role assignment

Table 10 provides more detailed agreement rates for semantic role assignment. High levels of agreement (over 84%) are achieved amongst many of the most commonly occurring roles, including AGENT, THEME, MANNER, LOCATION, DESTINATION and SOURCE.
However, CONDITION and DESCRIPTIVE-THEME are also fairly common, but have lower rates of agreement. Discrepancies have been examined and are further discussed in the Annotator Discrepancies section below. Most of the other role types occur much less frequently in the corpus (varying between 1-5% of events), meaning that the agreement rates shown may be less reliable. Individual role agreement statistics Biological concept assignment Whilst Table 9 showed that coarse-grained biological category assignment achieved around 95% agreement, the assignment of finer-grained categories achieved the lowest agreement rates amongst all annotation subtasks. Table 11 shows the most commonly assigned categories in each portion of the corpus, together with their agreement rates. Individual biological concept category agreement statistics Table 11 illustrates that there are several differences in the most commonly assigned concepts according to the species referred to in the abstract (i.e., E. coli or human). There are also large differences in the rates of agreement for different categories, which are not correlated with their frequency of occurrence. High levels of agreement (over 75%) are achieved for a number of these categories, most notably Transcription, Cells, Regulation, Promoter, Gene and Enzyme. In general, the classes with the highest agreement seem to be those that do not have very specific interpretations, i.e., those concepts with broader interpretations which are understandable by biologists with different backgrounds. This means that the highest levels of agreement have been reached when the context dictates that a very specific concept cannot be assigned. Less agreement is achieved for categories that are more specific to the context of gene regulation and expression, such as Activator, Repressor, Transcription_Factor, etc. Annotator discrepancies Certain discrepancies between annotators exist in the final corpus, of which a number are highlighted in this section. 
Whilst the identification of these discrepancies will help to refine the guidelines for future phases of annotation, it was also found that certain errors were being made which were already covered in the guidelines. Thus, there may be a need to more carefully balance conciseness with comprehensiveness in the guidelines. Event identification The majority of discrepancies in event identification concern nominalised verbs. A particular example is the word mutation, which can be used either as a nominalised verb (i.e., the action of mutating), or as an entity (e.g., a mutated gene). However, the distinction can sometimes be problematic. Consider the following examples: 1) In addition, the pleiotropic phenotypes conferred by a particular envZ mutation (envZ473) required the presence of functional OmpR protein. 2) Therefore, OmpF reduction resulted in a mutation in the marA region. In sentence 1), a particular envZ mutation seems to describe a mutated entity rather than the action of mutation. In contrast, the mutation in sentence 2) describes the action of marA being mutated due to reduction of OmpF. Argument identification Argument identification discrepancies often occurred in more complex sentences, in which a "double layer" of annotation was sometimes required. In the following sentence, Alpha interferon should be seen as the AGENT of converting as well as the AGENT of stimulates: Alpha interferon stimulates transcription by converting the positive transcriptional regulator ISGF3 from a latent to an active form. LOCATION arguments can also be problematic in sentences containing multiple events. In the following sentence, for example, different annotators associated the LOCATION in Escherichia coli K-12 with either the event described by the verb control or the nominalised verb expression. EnvZ functions through OmpR to control porin gene expression in Escherichia coli K-12. 
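The "double layer" case above, in which one text span fills a role in two events, can be represented with a simple data structure. The sketch below is illustrative only: the class and function names are assumptions, and only the two AGENT assignments stated in the text are encoded (the remaining arguments are omitted rather than guessed).

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str  # argument text span
    role: str  # semantic role label, e.g. "AGENT"


@dataclass
class Event:
    trigger: str  # verb or nominalised verb on which the event is centred
    arguments: list = field(default_factory=list)


SENTENCE = ("Alpha interferon stimulates transcription by converting the positive "
            "transcriptional regulator ISGF3 from a latent to an active form.")

# The same span is annotated as AGENT in both events.
stimulates = Event("stimulates", [Argument("Alpha interferon", "AGENT")])
converting = Event("converting", [Argument("Alpha interferon", "AGENT")])


def events_with(span, role, events):
    """Return the triggers of all events in which `span` carries `role`."""
    return [e.trigger for e in events
            if any(a.text == span and a.role == role for a in e.arguments)]


print(events_with("Alpha interferon", "AGENT", [stimulates, converting]))
```

Keeping arguments inside each event, rather than attaching a single event to each span, lets one span participate in several events without conflict.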
Semantic role assignment Amongst the most important semantic roles, both in terms of frequency of occurrence and according to [20], CONDITION is the one with the lowest rates of agreement. We thus examined more closely the types of disagreements that occur. According to our study, the most common confusions are with the MANNER and TEMPORAL roles. Typical examples include the following: 1) In contrast, the anaerobic repression of ethanol dehydrogenase by nitrate does not require the narL product. 2) Nitrate repression, however, was significantly enhanced (sevenfold) when the cells were cultured in minimal medium. For the repression event in sentence 1), anaerobic was confused between MANNER and CONDITION. The confusion may occur because anaerobic can be used in the description of environmental conditions in a phrase such as under anaerobic conditions. Here, however, it is being used to describe the method of repression, and hence the MANNER role is most appropriate. In sentence 2), the phrase the cells were cultured in minimal medium was annotated either as a CONDITION or as a TEMPORAL argument of the enhanced event. Whilst this would normally be interpreted as a CONDITION, the confusion may have arisen due to the use of when at the beginning of the phrase. Regarding the DESCRIPTIVE-THEME role, the most common type of confusion is with THEME. According to the guidelines, one of the situations in which DESCRIPTIVE-THEME should be assigned is to objects of verbs that describe states rather than actions e.g., The fru operon contains the genes for IIFru. Here, there is no action and hence no AGENT. Thus, the fru operon is the THEME and the genes for IIFru is the DESCRIPTIVE-THEME. However, problems sometimes arose for certain verbs such as exhibit, where there may be some confusion as to whether a "state" or "active" interpretation should be taken, e.g., The wild-type and mutant ompR genes exhibit different phenotypes of osmoregulation... 
The interpretation taken by the annotator determines whether different phenotypes is assigned the role THEME or DESCRIPTIVE-THEME (and also whether The wild-type and mutant ompR genes is assigned AGENT or THEME). In general, DESCRIPTIVE-THEME and DESCRIPTIVE-AGENT have less strict definitions than other roles, in that the only restriction imposed is that they should be assigned to arguments that describe characteristics or behaviour of the AGENT or THEME. This, together with the fact that they are not particularly commonly occurring, could have made them more difficult to assign accurately. As future work, we will consider tightening the definitions and possibly splitting them into different roles. Although it is desirable to keep the set of roles used as small and as general as possible in order to ease the burden on the annotator, a slightly larger range of more tightly-defined roles may help to improve agreement rates. Biological concept assignment As observed in Table 11, there is much more discrepancy between certain biological concept categories than others, especially those that constitute context-specific concepts. An exception to this is Protein, which is a more general concept category within the Proteins supercategory. We therefore examined the most common concept categories with which Protein was confused. These are shown in Table 12. Most common concept categories confused with Protein With the exception of Gene (which belongs to the Nucleic_Acids superclass), all other categories confused with Protein are also categories within the Proteins supercategory. This suggests that some annotators were using the Protein category to encompass all things related to proteins, rather than assigning more specific category labels. This may be related to their differing levels of knowledge regarding gene regulation and expression. 
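A tally of the categories confused with a given category could be produced along the following lines. The label pairs are invented for illustration and do not reproduce the figures in Table 12; the function name is an assumption.

```python
from collections import Counter

# Hypothetical category labels assigned by two annotators to the same agreed concepts.
pairs = [
    ("Protein", "Protein"),
    ("Protein", "Transcription_Factor"),
    ("Enzyme", "Protein"),
    ("Protein", "Gene"),
    ("Transcription_Factor", "Protein"),
    ("Gene", "Gene"),
]


def confused_with(pairs, category):
    """Count the labels chosen by one annotator when only the other chose `category`."""
    tally = Counter()
    for first, second in pairs:
        if first == second:
            continue  # agreement, not a confusion
        if first == category:
            tally[second] += 1
        elif second == category:
            tally[first] += 1
    return tally


print(confused_with(pairs, "Protein").most_common())
```

The tally treats the two annotators symmetrically, so it does not matter which of them assigned the more general label.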
Conclusion We have designed an event annotation scheme for biomedical texts and produced an associated corpus, the GREC, consisting of 240 MEDLINE abstracts annotated with 3067 gene regulation event instances. The corpus is unique within the biomedical field in that it combines both linguistically-oriented features (i.e., event-independent semantic roles tuned to the domain) and biologically-oriented features (i.e., biological concepts linked to the Gene Regulation Ontology [43]). The corpus can act as a basis for creating domain-specific semantic frame resources, and has already been used in the production of semantic frames for inclusion within the BioLexicon [44], in which the semantic frames are linked with syntactic information. It is also hoped that the corpus will boost research into other areas of bio-IE, such as the production of domain-specific SRLs, which have previously suffered due to the lack of a suitably annotated corpus. Initial experiments have demonstrated the feasibility of training an SRL using the corpus and as such, we hope to exploit the corpus in future shared tasks with such an aim. There is also evidence to suggest that combining the GREC with other larger biomedical corpora may help to train more accurate IE systems. Evaluation of the corpus quality was carried out using a newly-devised methodology, taking into account multiple aspects of the annotation task. Average agreement rates for the various tasks fell within the range of 66% - 90% F-score. Through error analysis of the corpus, we identified the most problematic issues, which included difficulties in assigning particular semantic roles, particularly CONDITION and DESCRIPTIVE-THEME. A full examination of the problematic cases will allow us to further improve the guidelines and possibly impose further restrictions in the annotation software, to prevent common types of errors being made. 
As regards biological concepts, our results show that, although high levels of agreement can be achieved when considering a coarse-grained set of categories, the use of a fine-grained classification caused some difficulties. This is possibly due to the differing levels of expertise of annotators within the gene regulation and expression domain, which may have resulted in varying levels of confidence in assigning more specific concepts. A solution for further phases of annotation would be to analyze the domain knowledge of annotators in greater detail and, where appropriate, provide extra training in the assignment of more specific categories. This may be combined with a re-evaluation and possible simplification of the concept hierarchies. A further major direction of future work will be to apply our scheme to a greater range of biomedical texts that describe a wider range of event types. Whilst other event types may require the use of alternative biological concepts or ontologies, we would like to verify that our set of semantic roles is applicable to events in other areas of biomedicine. The texts we will consider will also include full texts, in which events may be expressed in different ways from abstracts, and may involve different (higher) numbers of arguments. Finally, we wish to ensure that others can use and evaluate the GREC as simply as possible. Our future plan includes facilitating this in two different ways: Firstly, in response to the current diversity of corpus annotation formats and the problems this causes in their comparative evaluation [63], a shared format has been created for resources for biomedical relation extraction [15], together with a standard for the evaluation of relation extraction methods using this data [64]. We plan to convert our own corpus to this format, which has already been carried out for several biomedical corpora, e.g., [13,16,21,23]. 
Secondly, we plan to develop a corpus reader which will allow the GREC to be made available within the U-Compare system [65](http://u-compare.org). This is an integrated text mining/natural language processing system based on the UIMA Framework [66], which provides access to a large collection of ready-to-use interoperable natural language processing components. A structure filter for the Eukaryotic Linear Motif Resource Abstract Background Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if a motif exhibits attributes associated with functional instances such as occurrence in the correct taxonomic range, cellular compartment, conservation in homologues and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality. Results Current methods assessing motif accessibility do not consider much of the information available, either predicting accessibility from primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method considering accessibility and secondary structural context derived from experimentally solved protein structures to rectify this situation. Putatively functional motif occurrences are mapped onto a representative domain, given that a high quality reference SCOP domain structure is available for the protein itself or a close relative. 
Candidate motifs can then be scored for solvent-accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high confidence classifications and 2.5-fold enrichment for random motifs assigned to low confidence classifications. The structure filter is implemented as a pipeline with both a graphical interface via the ELM resource http://elm.eu.org/ and through a Web Service protocol. Conclusion New occurrences of known linear motifs require experimental validation as the bioinformatics tools currently have limited reliability. The ELM structure filter will aid users assessing candidate motifs presenting in globular structural regions. Most importantly, it will help users to decide whether to expend their valuable time and resources on experimental testing of interesting motif candidates. Background In recent years it has become clear that proteins with highly modular architectures possess numerous short peptide motifs that are essential to their function [1-5]. Such peptides are termed Linear Motifs (LM) as, in contrast to the globular domains, their function is independent of tertiary structure and encoded solely by the amino acid sequence. They are found in a diverse range of proteins, such as membrane receptors, adaptors, scaffolds and transcription factors, and mediate numerous tasks, which can be as disparate as directing subcellular localization or acting as sites of cleavage. Well-known LMs include peptides binding SH3, Cyclin, PDZ and WW domains [6-10] and phosphorylated peptides interacting with SH2, PTB, BRCT and FHA phosphopeptide-binding domains [11-17]. The biological properties and range of functions mediated by LMs are reviewed in detail elsewhere [4,18-20]. 
In order to deconvolute the functional components of modular protein architectures, it is necessary to identify the set of LMs as well as the folded components. However, this is not straightforward because simple searches with short sequence patterns, known to act as functional modules, are uninformative - returning a flood of false positive matches. Several tools have been developed to rank motifs based on confidence of functionality by classifying putative motifs based on the hypothesis that functional motifs will have attributes similar to experimentally discovered motifs. Although classification tools cannot definitely confirm a motif as functional (only experimental analysis can achieve this) they can be used to attach a level of confidence to a motif. For example motifs which occur in an incorrect cellular compartment, or outside the known taxonomic range, are unlikely to be functional as are those which are not conserved in closely related proteins or buried in a globular domain inaccessible for interaction. Available motif discovery tools vary in their implementation of confidence-related metrics. ScanProsite [21], the web-based tool for detecting PROSITE [22] signature matches in protein sequences, recently integrated ProRules [23], a database containing additional information about PROSITE profiles, with the aim of increasing the discriminatory power of PROSITE profiles to facilitate function determination and provide biologically relevant information for the annotation of proteins. MnM [24,25], a motif database and a web-based tool for identifying candidate motif occurrences in proteins, addresses the issue of non-functional false positives by implementing evolutionary conservation, surface prediction and frequency scores to rank motif occurrences in a protein query. The Eukaryotic Linear Motif (ELM) resource filters implausible motif occurrences according to cell compartment and taxonomic range [2]. 
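To make concrete why a bare pattern search is uninformative, the following is a minimal sketch of such a search in Python. The function name and the toy sequence are hypothetical and chosen only for illustration; the two patterns (RGD and W..W) are the simple forms quoted in this article, whereas real ELM regular expressions are generally more elaborate. Every hit is returned with no regard to compartment, taxonomy, conservation or accessibility.

```python
import re

def find_motif_matches(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A zero-width lookahead lets the regex engine report overlapping hits.
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence)]

# Made-up toy sequence, for illustration only (not a real protein).
toy_sequence = "MKRGDSWAAWLLRGDW"
rgd_hits = find_motif_matches(toy_sequence, "RGD")
cmannos_hits = find_motif_matches(toy_sequence, "W..W")
```

Such a scan reports sequence matches only; the filters discussed in this article are what attach a level of confidence to each of them.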
It also indicates less likely matches that lie within globular domains annotated in the SMART [26] and Pfam [27] resources and contrasts these with intrinsically unstructured polypeptide (IUP) regions predicted by GlobPlot [28] that are more likely to be motif-rich [5]. DILIMOT and SLiMFinder - tools designed for discovery of candidate novel peptide patterns significantly enriched in protein interaction datasets - also use some of these techniques to improve confidence in returned motifs [29,30]. Sequence conservation has also been shown to be effective in up-weighting true motifs relative to false positive matches [31-33]. In the intracellular milieu, LMs are found to be particularly abundant in segments of IUP where they are readily accessible [34]. Accessibility is a basic requirement of LM function which is almost always mediated by direct interaction with globular domain ligands. Extracellular proteins tend to have much less natively disordered polypeptide and therefore the extracellular linear motifs such as N-glycosylation sites [35] and the integrin-binding RGD motif [36] usually occur within globular domains, most often residing in exposed loop regions. LMs are also regularly found in globular regions of intracellular proteins - for example phosphorylation sites are common in flexible loops [37]. However, close inspection of the literature also reveals many instances of candidate motifs falsely reported as functional on the basis of loss of function mutagenesis and out-of-context peptide-binding experiments, despite the motif being well structured and sometimes deeply buried in a globular domain [38-41]. This observation suggests that stringent examination of motif structural context should be an essential processing step for experimental analysis. It also advocates the importance of high quality tools to identify such cases, as the cost associated with failure is detrimental both in terms of effort and quality of the literature. 
Despite this, neither the ELM globular domain classification nor the MnM surface prediction score takes advantage of all the information available in the form of the many experimentally solved protein structures. The ELM globular domain classification is overly strict, classifying all motifs occurring in globular regions as low confidence. The MnM surface prediction score uses primary sequence based prediction both in those cases where a structure is available and in regions where a disorder predictor will render secondary structure prediction unnecessary. In the present manuscript, we address the issue of LM accessibility when the matches occur within globular domains for which a reference three-dimensional (3D) structure is available. Development and calibration of a structure filter is currently not straightforward as there are relatively few available structures for most motif classes (an obvious exception being N-glycosylation sites), placing limitations on the training and benchmarking possibilities. Nevertheless, we have been able to develop a protocol in which reference domain structures are selected and then the matched motifs evaluated using accessibility and secondary structure parameters. Benchmarking of the structure filter suggests that deeply buried LM candidates are unlikely to be functional, and that the likelihood of motif matches being valid functional sites improves with accessibility. In this way, the new filter can help researchers decide whether they wish to invest effort in experimental testing of candidate motifs. The structure filter pipeline is implemented in a publicly available Python program accessible via a web-service interface [42]. The structure filter is fully integrated into the ELM server [43], providing graphical representation of the results in the context of the other filters.
Results The ELM structure filter scoring scheme Structural analysis of true motif instances annotated in ELM supported what is expected from LM biology [3], i.e. that they tend to lie on the surface of protein domains and prefer unstructured and loop regions (See below "Analysis of the ELM 3D benchmarking dataset"). Figure 1 shows two examples of motifs lying on domain interfaces whereas Figure 2 reports cases of motif instances whose functional residues protrude outwards from the domain surface and hence are accessible to the solvent. This observation was further supported by the comparison between the accessibility and secondary structure distributions of true motifs vs random matches (determined as described in Methods) in our datasets (Figure 3), which highlights that true motifs are on average more accessible than random matches (p-value = 1.9e-55); moreover, loops are more represented (p-value = 1.13e-35) in true motifs than in random matches and both alpha-helices and strands are less represented in true motifs than in random matches (p-value = 3.69e-12 and 2.66e-16, respectively). These results convinced us to base the structure filter scoring scheme on accessibility and secondary structure assignments. Two examples of linear motifs packed into structured domains. a) PDB 2D07: Sumo-interacting motif (orange) of TDG domain (green) bound to SUMO-3 protein (cyan); b) PDB 2PTK: closed conformation of the proto-oncogene tyrosine-protein kinase Src. Blue: SH2 domain; red: SH3 domain; green: protein kinase; orange: pTyr-527; yellow: linkers; yellow spheres: SH3 binding peptide. All structure views were prepared with PyMOL http://www.pymol.org/. Examples of linear motifs with functional residues protruding outwards from the structural domain surface. 
a) A very exposed instance (in white) of LIG_RGD in a loop of SCOP domain d1mfn_2; b) An instance (in violet) of LIG_RGD in a region outside a domain (SCOP d1ssua_); c) An instance (in pink) of MOD_SUMO in an exposed loop of the d1kpsd_ SCOP domain; d) The MOD_CMANNOS C-Mannosylation site (in magenta) in the SCOP domain d1k2aa_; e) The two MOD_N-GLC_1 N-glycosylation sites (in yellow) in the SCOP domain d1qm3a_; f) The N-glycosylation site (in red) in the SCOP domain d1fl7b_; g), h) The N-glycosylation site (in green) in the SCOP domain d1o7ae2 and (in red) in the SCOP domain d1n26a1. Secondary structure frequency and accessibility distribution for true motif instances and for random matches. 3a) Boxplots representing the accessibility distributions of true motif instances (orange) and of random matches (yellow), calculated for all motif positions (all positions), non-wildcard positions only (non-wildcard) and wildcard positions only (wildcard). The solid box lower and upper bounds represent the 25th and 75th percentile, respectively. Circles represent outliers; 3b) frequencies of each secondary structure element type in true motif instances and in random matches, calculated for all motif positions, for non-wildcard positions only and for wildcard positions only. The aim of the scoring procedure is to assign a score to LM candidates in the user query sequence given that a reference structure is available. In order to do this, the structure filter scans the LM match 3D context position by position, evaluates the relative accessibility and the secondary structure of each single position i, and assigns an accessibility score (Qacc) and a secondary structure score (Qsse) to the motif match as the normalized sums of its single position scores. More specifically, the score of a motif match is calculated on the non-wildcard positions of the regular expression pattern for the motif as: $Q = \frac{1}{N} \sum_{i \in \Omega} q(i)$, where N is the number of non-wildcard positions of a match, i.e.
the number of non-wildcard residues in an LM occurrence, i is the ith position along the match, i ∈ Ω means that the sum is limited to the set of non-wildcard positions, Ω, and q(i) is the positional score of position i. Note that Qacc and Qsse were also calculated for all LM positions (i.e. not limiting the sum to the set of non-wildcard positions) and found to be marginally less discriminating than those only based on non-wildcard positions. In this regard, Figure 3 shows that the accessibility differences between wildcard and non-wildcard positions are statistically significant in the case of true motifs (t-test's confidence level = 0.99, p-value = 3.058e-05) (Figure 3a). It also shows that true motif non-wildcard positions have a more pronounced tendency to be in loops, and a less marked disposition to be in helices and strands, than either all positions or wildcard positions of true motifs, even if none of these differences is statistically significant (Figure 3b). For further details see additional file 1, additional file 2 (Figure S1) and additional file 3 (Figure S2). We adopted as accessibility positional score, qacc(i), of position i, the normalized solvent exposure value of the residue in i, which ranges between 0 (non exposed) and 1.5. Thus, the higher the residue exposure, the more the corresponding position is rewarded. The secondary structure positional score, qsse(i), was determined in a more complex manner. The analysis of LM instances on structural domains showed that they occur more frequently in loops and unstructured regions than expected by chance. In order to quantify this observation, we calculated, for each secondary structure element (SSE) type (loop, helix, strand, 3/10 helix - see Methods), the ratio between the SSE type frequency (ν) among true motif instances and among random matches. The corresponding values are reported in Table 1.
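The scoring scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the published pipeline: the function names are hypothetical, the match score is taken to be the mean of the positional scores over the non-wildcard positions (the "normalized sum" of the text), and the SSE counts in the usage lines are invented, since the Table 1 values are not reproduced here.

```python
def motif_score(position_scores, is_wildcard):
    """Mean positional score q(i) over the N non-wildcard positions of a match."""
    kept = [q for q, wild in zip(position_scores, is_wildcard) if not wild]
    if not kept:
        raise ValueError("a motif match needs at least one non-wildcard position")
    return sum(kept) / len(kept)

def sse_positional_scores(true_counts, random_counts):
    """Per SSE type: frequency among true instances / frequency among random matches."""
    true_total = sum(true_counts.values())
    random_total = sum(random_counts.values())
    return {
        sse: (true_counts[sse] / true_total) / (random_counts[sse] / random_total)
        for sse in true_counts
    }

# Hypothetical W..W match: the two middle positions are wildcards and are skipped.
q_acc = motif_score([0.9, 0.1, 0.2, 0.7], [False, True, True, False])

# Invented counts, for illustration only (NOT the Table 1 values).
ratios = sse_positional_scores(
    {"loop": 50, "helix": 25, "strand": 20, "3/10 helix": 5},
    {"loop": 25, "helix": 50, "strand": 20, "3/10 helix": 5},
)
```

With such a ratio table, the secondary structure score of a match is obtained by looking up the ratio for the SSE type of each non-wildcard position and passing those values to `motif_score`.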
Frequency of secondary structure elements in true and in random motifs Thus, the secondary structure score of a position i whose SSE assignment is loop (or 3/10 helix, helix, strand), is the ratio between the frequency of loops (or 3/10 helices, helices, strands) in the instance dataset and the frequency of loops (or 3/10 helices, helices, strands) in the random dataset. Assessing the predictive ability of the ELM structure filter In order to assess the predictive ability of the ELM structure filter scoring scheme, we made use of five strategies, each introducing useful parameters for the evaluation of the discrimination power of our procedure: 1) we plotted ROC curves and calculated AUCs; 2) we assigned a p-value to predictions; 3) we built LM-specific background distributions; 4) we identified sparse/neutral/enriched score intervals; 5) we carried out a 5-fold cross validation in order to determine sensitivity, specificity and accuracy. In order that the structure filter may be a useful guide to the ELM resource user, we propose that the values of the above-mentioned parameters are used as decision-making tools in evaluating the score of LM predictions. In particular, since having high accessibility and belonging to loop regions is not a prerogative of LMs alone and the random match dataset might in principle be "contaminated" by not yet annotated spurious true motifs, we suggest using as many indicators as possible in evaluating a prediction score and not relying on each single tool as a unique criterion for retaining/rejecting a prediction. In order to establish if one score is more discriminative than the others, we assigned an accessibility score (Qacc), a secondary structure score (Qsse) and a combined score (Qand = Qacc + Qsse) to the true positive instances of our dataset and to the random matches of the random dataset, plotted cumulative score distributions and ROC curves and calculated the area under the ROC curves (AUCs). 
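As an illustration of the AUC calculation mentioned above, the area under the ROC curve can be obtained directly from the two score lists as the fraction of (true motif, random match) pairs in which the true motif scores higher, with ties counted as one half. The sketch below assumes this rank-based formulation, which is equivalent to integrating the ROC curve; the function name is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(true_scores, random_scores):
    """AUC as the fraction of (true, random) pairs ranked correctly; ties count 0.5."""
    if not true_scores or not random_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for t in true_scores:
        for r in random_scores:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(true_scores) * len(random_scores))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two sets.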
In calculating the ROC curves, we assumed that random matches are all negative matches. Figure 4 shows that the cumulative distribution of true motifs is clearly separated from that of random matches for each score type. Moreover, the ROC curves (Figure 5) show that all three score types are able to discriminate between the true motif and random match sets and that both Qacc and Qand perform better than Qsse; the AUC values for the three scores are 0.73 (Qacc), 0.66 (Qsse) and 0.72 (Qand). Notice that, even though the AUC for Qand is slightly lower than that of Qacc, Qand performs similarly to or better than Qacc in the range corresponding to the 20% of the ROC x-axis values. Cumulative score distributions. The cumulative distributions of (a) Qacc, (b) Qsse, and (c) Qand = Qacc + Qsse scores calculated for true motif matches (true motifs), and for random matches (random matches) in non-wildcard positions. Red dashed lines indicate the percentile cut-off ensuring that the lower 40% random matches fall in the "sparse" bin. The consequent percentage of true motifs falling in the "sparse" bin is about 15% (accessibility) and 20% (secondary structure). This cut-off corresponds to Qacc~0.3 and Qsse~0.7. Black dotted lines indicate the percentile cut-off that guarantees that the enriched bin collects at least the top 30% true motifs. This cut-off corresponds to Qacc = 0.76, Qsse = 1.5, and Qand = 2. Qacc, Qsse and Qand = Qacc + Qsse ROC curves. We determined the distribution of random matches and used it to assign a p-value to the score of each ELM prediction. This p-value, which is implemented both in the Web Server and in the Web Service, is calculated using a Z-test and is a conservative estimate of the probability that a LM prediction with a given score is a true positive; more specifically it is the probability of obtaining a random match with a score at least as high as the one that was actually observed, and therefore we expect it to be very stringent.
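A minimal sketch of such a p-value is given below. The text states only that a Z-test against the random match distribution is used; the normal approximation, the use of the population standard deviation and the function name are assumptions made here for illustration, and the deployed implementation may differ in these details.

```python
import math
import statistics

def random_match_p_value(score, random_scores):
    """Upper-tail probability of a score at least this high under a normal
    approximation to the random match score distribution (assumed here)."""
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)  # population SD; an illustrative choice
    if sd == 0:
        raise ValueError("random scores have zero spread; z-score is undefined")
    z = (score - mean) / sd
    # Survival function of the standard normal, expressed through erfc.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A small p-value indicates that a score this high is rarely produced by random matches; it does not by itself establish that the motif is functional.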
Due to the paucity of true motif instance data, we cannot build a true motif score distribution for each ELM motif (and therefore we cannot build a LM-specific structure filter yet) and compare it to the corresponding random motif score distribution. However, we built, and displayed in the ELM web server output page, LM-specific random score distributions (as described in Methods) in order to use them as background score contexts, telling the users something about the average behavior (in terms of accessibility, secondary structure and combined scores), on a large dataset of structures, of each single LM. These background distributions are only intended as a supplementary guideline for the web users to evaluate whether or not the score assigned to a LM match is reasonably higher than the random match score average for that LM. The background score distributions for 103/112 motifs are shown in the additional file 4 (Figure S3). Correspondences between x-axis labels and ELM names are reported in the additional file 5. We chose two score thresholds for each score type aimed at defining three score intervals (or "bins"), one "sparse", lacking in true motifs and enriched in random matches, one identifying "neutral" matches, and one lacking in random matches and enriched in true motifs. We consider that such a three-interval scheme might effectively help the user in deciding whether to retain or reject a prediction. In fact it is based on the idea that a predicted match that is assigned a score in the "enriched" interval will be indicated by our procedure as a good true motif candidate (i.e. likely to be a valid functional site), motif matches scoring in the bottom interval ("sparse" interval) as unlikely to be valid functional sites and those ranking in the middle one as "neutral".
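The three-interval scheme can be sketched as a simple lookup against two cut-offs. The function name is hypothetical, and the handling of scores that fall exactly on a cut-off is an assumption, since the text does not specify it. The usage line takes the approximate Qacc cut-offs quoted in the Figure 4 caption (about 0.3 and 0.76).

```python
def classify_score(score, sparse_cutoff, enriched_cutoff):
    """Assign a score to the 'sparse', 'neutral' or 'enriched' interval."""
    if sparse_cutoff >= enriched_cutoff:
        raise ValueError("sparse cut-off must lie below the enriched cut-off")
    if score < sparse_cutoff:
        return "sparse"
    # Assumption: a score exactly on the upper cut-off counts as enriched.
    if score >= enriched_cutoff:
        return "enriched"
    return "neutral"

# Approximate Qacc cut-offs as quoted in the Figure 4 caption.
label = classify_score(0.5, sparse_cutoff=0.3, enriched_cutoff=0.76)
```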
The score thresholds were chosen on the basis of the cumulative distributions of Figure 4 by selecting two cut-offs (one in the percentile range 0-50% and one in the percentile range 50%-100%), roughly corresponding to the inflection points of the random match cumulative distributions, and guaranteeing that at least the top 30% true motifs are retained in the enriched bin and at least the lower 40% random matches fall in the sparse bin. The "neutral" bin is delimited by the "sparse" and "enriched" cut-offs and contains the medium quality matches. Table 2 reports Qacc, Qsse and Qand thresholds defining the three bins. From Figure 4 and Table 3, it can be seen that, in the case of the accessibility score (Figure 4a), the cut-off on the top 30% of true motifs implies that only 10% of random matches are retained in the enriched bin and that the cut-off on the lower 40% random matches implies that only 15% true motifs incorrectly fall in the "sparse" bin. In contrast, Qsse thresholds (Figure 4b) actually assign about the top 60% true motifs and 32% random matches to the enriched bin (see Table 3). This is due to the fact that the top 60% true motifs (and 32% random matches) uniformly get the highest score. Finally, in the case of Qand (Figure 4c), only 9% random matches are retained in the enriched bin and only 16% of true motifs fall in the sparse bin (Table 3). This gives to the users a measure of the percentage of false hits that they can expect in the enriched bin and of the percentage of true hits that they would miss if discarding all the predictions falling in the sparse interval. Score thresholds defining the "sparse", "neutral" and "enriched" bins Number and percentage of true and random motifs assigned to each bin by the different score types In order to establish more rigorously the predictive ability of the structure filter in the enriched and sparse intervals, we carried out a 5-fold cross validation experiment. 
Referring to score calibration and within the limits of the 5-fold cross validation experiment only, we defined two intervals instead of the three implemented in the ELM Web Server, by incorporating the neutral interval first into the enriched one and then into the sparse one. This made it possible to properly determine sensitivity and specificity values in two different situations: the first accounting for an enrichment of sensitivity and the second for an enrichment of specificity. We defined the positive dataset as the one made up of the ELM true instances and the negative dataset as the set of all the un-annotated random matches. We split both the positive and the negative datasets into five subsets by random sampling the datasets without replacement, thus obtaining five non-overlapping positive and five non-overlapping negative training sets. Five positive (negative) test sets were determined by depriving cyclically the whole positive (negative) dataset of each of the five positive (negative) training sets. We built the scoring schemes as described in the section "The ELM structure filter scoring scheme" and set up score acceptance/rejection thresholds on the training sets as explained above (subsection "Sparse/neutral/enriched score intervals"). Then, we validated them on the corresponding test sets by calculating sensitivity (Sn), specificity (Sp), and accuracy defined as: $\mathrm{Sn} = \frac{TP}{TP + FN}$, $\mathrm{Sp} = \frac{TN}{TN + FP}$ and $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. In evaluating Sn, Sp and Accuracy, we assumed that a match belonging to the negative set and scoring above the "accept" threshold is a FP and one scoring below is a TN; a true instance scoring above the "accept" threshold is a TP and one scoring below is a FN. Sensitivity (Sn), specificity (Sp) and accuracy averaged over the five sets are reported in Table 4. Since the structure filter is designed as a guide to experimentation, we consider that sensitivity should be privileged over specificity, so as not to miss too many true motifs.
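The evaluation step for a single test set can be sketched as follows, using the TP/FN/FP/TN assignments described above and the standard definitions of sensitivity, specificity and accuracy. The function names are hypothetical, and treating a score exactly equal to the "accept" threshold as accepted is an assumption made here.

```python
def confusion_counts(true_scores, random_scores, accept_threshold):
    """Count TP, FN, FP, TN for one test set at a given 'accept' threshold."""
    # Assumption: a score equal to the threshold is treated as accepted.
    tp = sum(1 for s in true_scores if s >= accept_threshold)
    fn = len(true_scores) - tp
    fp = sum(1 for s in random_scores if s >= accept_threshold)
    tn = len(random_scores) - fp
    return tp, fn, fp, tn

def sensitivity_specificity_accuracy(tp, fn, fp, tn):
    """Standard definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/all."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return sn, sp, acc
```

Averaging the three values over the five test sets gives figures of the kind reported for the cross validation; because accuracy pools both classes, it is dominated by whichever dataset is larger.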
Based on this viewpoint, it can be observed in Table 3 (last column) and Table 4 that the best performing scoring schemes - in terms of a trade-off between sensitivity, specificity, the percentage of true motifs erroneously discarded and the percentage of true motifs correctly retained - are Qacc and Qand. Sensitivity, specificity and accuracy obtained with the 5-fold cross validation experiment. Notice that the Accuracy values reported in Table 4 might be affected by the fact that the positive and negative datasets are unbalanced. The analysis of the ROC curves, of the cumulative distributions and of the filter performance in the three score bins suggests that accessibility plays a more relevant role than the secondary structure assignment in discriminating true from false motifs. This observation is biologically sound since, while a buried motif is unlikely to be a genuine functional site, an exposed motif lying e.g. on a helix can in any case possess an interaction ability. Finally, our results show that the combined score is slightly more effective than the accessibility score and markedly better than the secondary structure score. The combined score Qand is implemented in both the Web Server and Web Service. Usage of the ELM structure filter For practical purposes, the filter exploits available information on protein structures to answer the question "Is it worth testing this motif candidate experimentally?" rather than to categorically tell the users whether they have a real motif or not. In deciding if a prediction is a good experimental candidate, the user should give more weight to the accessibility score than to the secondary structure score, since a buried motif is unlikely to carry a function, whereas an exposed motif may function properly even if it is part of a beta strand or belongs to a helix (see examples in the benchmarking dataset, additional file 6 (Table S1)).
The main exception to well buried candidates being nonfunctional concerns allosteric rearrangements [44]. If the motif is in the core of a well-known domain like SH3 or a TIM barrel, a review of the accumulated structural knowledge will allow the user to conclude that the chance of valid function is negligible. If there is evidence of allostery, however, depending on which parts of the structure are flexible, this might support or invalidate the motif. If nothing is known, then it should be kept in mind that most parts of most globular domains do not undergo major rearrangements, hence candidates from the sparse bin should not be eyed with hope. The user should also consider overall context in assessing the structure filter results. Is the cell compartment correct: An exposed RGD motif with a significant p-value in an extracellular protein is a very good integrin-binding candidate: one in a nuclear protein is worthless. Is the motif conserved, at least within a phylogenetic lineage such as mammals, tetrapods or vertebrates: the motif should be conserved in such groups if it is functional in a regulatory system common to related organisms. Is the biological context sensible: Is the query protein in some way functionally associated with the ligand protein; Are they in the same regulatory pathway; Are they in the same protein complex? Structural analysis of LMs: Classification and examples of motifs in protein structures Before the structural context of LMs can be evaluated, it is necessary to define and select the structural unit. Structure files may contain large protein complexes, single proteins, single or multiple chains, single globular domains and many other types of molecule. LMs may be bound to their ligands or in an unliganded state. Figure 1a shows the Sumo-Interacting Motif (SIM) of TDG bound to SUMO-3 by beta augmentation but also well packed into the main TDG domain. Clearly we need to measure accessibility of the SIM in the absence of the SUMO protein. 
The open (active) and closed (inactive) conformations of the Src kinase are dependent on the phosphorylation states of several tyrosines. In particular, the closed conformation is specified by an interaction between the Src SH2 domain and the C-terminal pTyr-527 and an interaction between the Src SH3 domain and a peptide linking the SH2 and kinase domains. Figure 1b shows the closed conformation with these elements highlighted. Notably, the SH3 binding peptide is fully buried, even though it is not part of a globular domain. In the open conformation this peptide is much more accessible, as is the C-terminal peptide which is released from the SH2 domain (e.g. 1Y57, [45]). The dependency of LM accessibility on globular domain rearrangements implies that multi-domain structures are not a suitable structural unit for structure filtering. The appropriate units in the case of LMs are therefore the individual globular domains themselves. At least for domains that do not undergo allosteric rearrangement, a motif which is buried in the core of a structural domain unit is unlikely to be a true one. Therefore we chose the SCOP [46] protein domain definition as provided by the ASTRAL resource [47] as the structure dataset to be used to implement the structure filter. The inception of this work required the collection and analysis of the 3D occurrences of LM instances annotated in the ELM resource [1]. Here we present a discussion of our benchmark dataset. Many details and specific examples are reported in the supplementary information (additional file 1). As described in Methods, we obtained a set of 158 3D non-redundant instances from 36 different LM entries (reported in additional file 6, Table S1) from the ELM resource release of June 2007. Sixteen motifs match only one instance and twenty match two or more.
The majority (~60%) of LM instances are made up of residues whose relative accessibility to the solvent is at least 50% and are located entirely in loop, turn or unstructured regions. Figure 2 shows two typical examples (LIG_RGD and MOD_SUMO) of a motif in a very exposed loop of a domain (2a and 2c) and a motif in a flexible region which is not in a domain (LIG_RGD, Figure 2b). LIG_RGD is a short peptide ligand motif which interacts directly with extracellular domains of integrins whereas MOD_SUMO is a motif recognised for modification by SUMO-1. The SUMO proteins are Small Ubiquitin-related MOdifiers that are covalently conjugated onto lysine residues within target sequences. Eight out of 36 LM entries have at least one instance which is entirely or almost entirely in helical conformation while two entries have at least one instance almost entirely in a strand conformation. Notwithstanding the greater rigidity of helices as opposed to loops and unstructured regions, LMs found in helical conformation are not necessarily prevented from being exposed to the solvent and carrying out their functions. Two clear examples are shown in Figure 2d and 2e. Figure 2d shows an instance of the MOD_CMANNOS motif, which is part of a helix and is partly hidden by the C-terminus of the protein. C-Mannosylation is a type of protein glycosylation involving the attachment of a mannosyl residue to a tryptophan. In this particular case, the most buried residues are those corresponding to wildcard positions in the MOD_CMANNOS regular expression (W..W), whereas the conserved tryptophan needed for the mannose attachment protrudes outwards from the domain surface. Figure 2e reports two MOD_N-GLC_1 N-glycosylation sites on the same domain. N-linked glycosylation is a co-translational process involving the transfer of an oligosaccharide chain to an asparagine residue in the protein.
In this case, one site is part of a well exposed helix whereas the other one consists of a loop with small helix overlap and it is very exposed. Figure 2f, g and 2h show cases of LMs in partly buried beta strands. In figure 2f an instance of the MOD_N-GLC_1 motif is in a long edge beta strand, slightly disrupted, and quite exposed. The N-glycosylation sites of figures 2g and 2h are two examples of motifs lying on partially hidden beta strands but whose modified asparagine (involved in the functional activity) side chain is exposed to the solvent. In our benchmarking dataset, 29/158 instances belonging to 11 different LMs [marked in dark orange in additional file 6 (Table S1)] have a very low average accessibility. In 15/29 of these instances, however, residues belonging to non-wildcard positions in the LM regular expression (e.g. the two tryptophans in the C-Mannosylation site regular expression W..W) display equal or higher accessibility values as opposed to wildcard positions (marked in bold in the acc_nwc column of Table S1, additional file 6). This seems reasonable since LMs are involved in protein interactions and the non-wildcard positions specify LM function. Importantly, this trend is not seen in the case of LM false positive matches, an observation which helped us to improve the benchmark set as it brought to light some poorly annotated instances. See additional file 1 for details. In the benchmark dataset there are a few cases (10/158) of almost completely buried true motif instances, i.e. displaying an average relative accessibility < 0.2 on the non-wildcard positions. We analysed them one by one by manual inspection and concluded that they fall in one of two situations: either their functional residue(s), or at least their side chains, are favourably oriented outwards from the domain surface (see additional file 1, additional file 6, and Figure 2g and 2h), or an allosteric effect is either known or reasonable to hypothesize. 
Additional file 1 reports details and specific examples. Discussion We have set up a procedure to help in the discrimination of true from false positive LM matches, that is based on the information coming from two important features inherent to the 3D structure of proteins: accessibility to the solvent and secondary structure element. The fact that functional LMs tend to be in flexible and accessible regions of proteins is biologically sound and is furthermore supported by the structural analysis of experimentally validated instances of LMs carried out in this work. As a consequence, our approach will advise a user against considering a match as a true motif if it resides in an unfavourable structural context. Nevertheless, the function of proteins can be regulated by an assortment of different mechanisms, and allosteric modifications or unusual LM position and/or conformation are infrequent but possible. In this sense, we encourage the user to carefully evaluate the possibility that a hidden motif can become exposed upon protein interaction and to use the ELM structure filter cum grano salis, i.e. not as a deterministic predictor but rather by exploiting the supplied 3D information on LM predictions as a supplement to a prior knowledge of the LM biological context. The ELM resource now provides three ways to aid the user about structural context for the query sequence. The disorder predictor GlobPlot highlights potential motif-rich regions that are likely to be intrinsically unstructured. SMART and Pfam domains define regions of well-defined globular structure where LMs are expected to be rare. Where it can be applied, the new structure filter now provides a benchmarked estimate of LM likelihood. MnM has taken a different approach to structural context, a single score for each pattern match being provided by an accessibility prediction algorithm, SPS [24,48]. 
While MnM does not supply domain and tertiary structural information that is highly informative to the user, an accessibility predictor does have a unique value for a substantial fraction of protein sequence space that is predicted to be globular but is not known to be related to a solved domain structure. In future, we may also consider introducing a predictive accessibility filter into ELM for poorly characterised globular peptide segments. There are many algorithms in the literature, with the current best performing reported to be NetSurfP and Real_SPINE [49,50]. Besides the results on the structure filter discrimination power presented in this work, we want to point out that the process of developing the structure filter has already proven of value to the ELM resource. The structural analysis of annotated motifs reported in section 2.1 highlighted a number of questionable motifs that turned out to be incorrectly annotated with weak or conflicting support in the literature. In this regard, experimentalists should be aware that accurate annotation of LMs concurs with developing effective methodologies aimed at identifying new putative motifs and that inference of shortlists of candidate true motifs is especially useful to reduce the number of assays needed to experimentally validate a new LM. Thus, the experimental strategy adopted to detect functional motifs plays a fundamental role and incorporating some simple stratagems in experimental protocols might crucially help in reducing the number of false motifs in the literature. We consider a pair of much too rarely undertaken controls to be especially important when candidate motifs are mutated [4]: (1) Check if the motif mutation unfolds the protein by cloning in a tagged expression construct that allows fast and easy purification of the protein and examine folding status by e.g. 
circular dichroism (or NMR if available); (2) When transfecting with mutated proteins, examine the cells by microscopy for intracellular amyloid caused by massive overexpression of unfolded protein and, if it is present, then reason out why the assay is misleading (e.g. remember that amyloids are not subject to ubiquitin-mediated destruction processes so destruction box and degron motif mutation assays give misleading results). We expect that the predictive power of the structure filter can be improved as more data becomes available. For example, one might devise a procedure trained on the structural data of specific motifs and qualified to make predictions only for those motifs. We investigated this approach and concluded that it would currently be applicable only to the very few LMs that have enough instances in the database. For the great majority of LMs, appropriate training and tests cannot be carried out and predictions turned out to be unacceptably stringent: An effective procedure should be based on many more instances per LM and these are not available at the moment. We believe that in the future, as an increasing number of protein structures become available and the quantity of ELM annotation data grows, it will be possible to appropriately train and test motif-specific structure filters for a significant number of LMs. Conclusion In conclusion, LMs are subject to enormous over-prediction, so that the few true motifs are lost amongst the many false positives. Whenever a query can be modelled on a structure, the structure filter can help in discriminating true from false positive matches of LMs. Moreover, since the number of solved structures is rapidly increasing, a benchmark set of true positive structures is going to be available for an increasing number of motifs, thus allowing more reliable tests and consistent score threshold setups. 
As a consequence, the structure filter, which can be considered to all intents and purposes as a precursor in the use of structural information for short LM false positive discrimination, is going to become increasingly indispensable for the ELM resource's filtering framework in the structural genomics era. Methods Dataset of structural instances The ELM database ([1], release June 2007) contained 112 LMs, 93 of which have annotated instances. The set of 1898 annotated instances in 1037 sequences from the 93 different LMs obtained from the ELM database represented our initial dataset. In our vocabulary, "instance" means a true annotated LM occurrence, whereas "match" indicates any regular expression hit on a query sequence. The instances were modeled onto SCOP domains [46] by BLAST alignment [51] of the sequence containing the instance to the reference domain sequence extracted from the PDB entry [52]. In order to assign a "sequence instance" to a "structure instance", the aligned sequences must have at least 70% global identity (over the domain) and 100% local identity (i.e. along the instance positions). The resulting dataset of structural instances comprised 185 3D instances from 37 different LMs. Redundancy was removed at the structure level: if two or more instances mapped onto identical 3D sites, all but one were discarded, thus reducing the dataset to 158 3D instances from 36 different LMs. For each position of a 3D instance, the solvent accessibility and secondary structure values are collected from the DSSP [53] file of the target structure onto which the instance is mapped. For the solvent exposure of a residue, a relative (normalized) value is calculated as the ratio of the residue's DSSP accessibility value to the accessible surface area of that residue type as defined by Miller and co-workers [54], which is calculated for the residue in a Gly-Xaa-Gly tripeptide in extended conformation.
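The normalisation just described can be illustrated with a minimal sketch. It is not the code used in the ELM pipeline: the function names are hypothetical, and the reference surface areas are the values commonly tabulated for the Miller scale, which should be checked against reference [54] before any reuse. The second function averages the relative accessibility over selected positions, for example the non-wildcard positions of a regular expression such as W..W.

```python
# Reference accessible surface areas (square angstroms) for residue X in an
# extended Gly-X-Gly tripeptide. ASSUMPTION: values as commonly tabulated for
# the Miller scale; verify against the original publication before reuse.
MILLER_MAX_ASA = {
    "A": 113.0, "R": 241.0, "N": 158.0, "D": 151.0, "C": 140.0,
    "Q": 189.0, "E": 183.0, "G": 85.0, "H": 194.0, "I": 182.0,
    "L": 180.0, "K": 211.0, "M": 204.0, "F": 218.0, "P": 143.0,
    "S": 122.0, "T": 146.0, "W": 259.0, "Y": 229.0, "V": 160.0,
}


def relative_accessibility(residue, dssp_acc):
    """Ratio of the DSSP accessibility to the reference area of the residue type."""
    return dssp_acc / MILLER_MAX_ASA[residue]


def mean_relative_accessibility(residues, dssp_accs, positions=None):
    """Average relative accessibility over the given 0-based positions.

    If positions is None, all positions of the match are used. Passing only
    the non-wildcard positions gives the restricted average (hypothetical
    helper, named here for illustration only).
    """
    if positions is None:
        positions = range(len(residues))
    values = [relative_accessibility(residues[i], dssp_accs[i]) for i in positions]
    return sum(values) / len(values)


# Example: a W..W match where only the two tryptophans are considered.
example = mean_relative_accessibility("WAAW", [259.0, 0.0, 0.0, 129.5], positions=[0, 3])
```

Restricting the average to non-wildcard positions mirrors the observation in the Results that these positions tend to be at least as exposed as the wildcard ones in true instances.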
The DSSP secondary structure types are: H = alpha helix, B = residue in isolated beta-bridge, E = extended strand (participates in beta ladder), G = 3-helix (3/10 helix), I = 5 helix (pi helix), T = hydrogen bonded turn, and S = bend. Unstructured regions are marked as U. In our study we grouped the SSE types in four categories: 1) helices (H, I); 2) 3/10 helices (G); 3) strand (E); 4) loops (B, T, S, U). Pi helices are usually attached to larger alpha helices; therefore we grouped them with helices. 3/10 helices are often poorly conserved as part of a larger loop but sometimes they are continuously linked to a larger helix and so we decided to treat them separately. B, T, S and U are grouped together because they usually belong to 3D flexible loop-like regions. The non-redundant instance dataset is reported in Table S1 (additional file 6). Random structural matches Since the aim of our study was to set up a scoring scheme and to establish accessibility and secondary structure score thresholds for discriminating true motifs among random (mostly FP) matches, we performed a pattern search using all the LM regular expressions available in the ELM database (112) in the 1037 sequences known for having at least one true annotated instance. True motif instances were filtered out from the resulting list of matches. Applying the same sequence-to-structure mapping procedure used for true motif instances, the sequences of random matches were modeled onto SCOP domains, resulting in 22,058 3D non-identical matches from 105 motifs. LM-specific background score distributions Background score distributions have been obtained by scanning the 13,582 sequences of a non redundant PDB dataset (<50% sequence identity, downloaded from PDB clusters [55]) with the regular expressions of the 112 ELM motifs (total number of available motifs), mapping the 323,5412 matches onto SCOP domains and assigning them an accessibility score and a secondary structure score. 
Score distributions (for each feature) were then plotted for 103 LMs: score distributions for the 9 LMs with less than 10 matches are not reported. LM-specific score distributions are shown in additional file 4 (Figure S3). The ELM structure filter pipeline The user query sequence submitted through either the Web Interface or the Web Service is first scanned for LM matches and then aligned to the database of ASTRAL sequences [47] derived from SCOP domains [46]. The hit with the highest sequence identity and coverage to the query sequence is selected as a reference structure. If more than one hit has the same sequence identity and coverage to the query sequence, the structure with the best experimental resolution is taken as reference and, for the same resolution, one hit is chosen randomly. This approach may result, for example, in the organism of the reference structure being different from the source organism of the user query sequence. However, since proteins with identical sequences fold into identical structures, the procedure for the selection of the reference structure does not introduce any bias in the calculation of solvent accessibility and secondary structure values. For the structure filter to be applicable, two conditions must hold: 1) the query sequence or a region of it can be aligned to one or more (non-overlapping) structural domains; 2) at least one LM match falls in an aligned region, i.e. can be mapped onto a 3D domain. The structural positions of the match are then analysed one by one and solvent accessibility and secondary structure values are collected and scored as described in Methods. The structure filter pipeline for a LM match is schematised in Figure 6 and a snapshot of the ELM server output page, displaying results of the structural filtering procedure, is reported and described in Figure 7. 
Figure 7 shows that a solved structure of the C-terminal domain of the RanGAP1 protein is available as SCOP entry d1kpsd_, which is used for structure filtering. The remainder of the sequence is filtered by the cruder domain filter. Mouseover of the known sumoylation site reveals that it scores in the enriched bin and receives a significant p-value: if we did not already know it was a true motif, it would be an attractive candidate for experimental testing. By contrast, mouseover of a match to the NES motif reveals that it has poor accessibility and is assigned to the sparse bin. The NES motif is predominantly hydrophobic and this example, like many others falling in globular domains, is not a plausible functional site; experimental follow-up would be a waste of valuable effort.

Figure 6. The Structure Filter (SF) pipeline. SA: Solvent Accessibility; SSE: Secondary Structure Element.

Figure 7. The ELM server output page. An example of the graphical output of the ELM structure filter for the RAGP1_MOUSE Swissprot [63] entry. The key shows the elements of the graphic. Secondary structure elements (in this case helical) are shown as yellow boxes connected by black lines (unstructured loops that tend to be surface accessible). Mouseover of the site rectangles turns on a window reporting structural information; further details of the structure filter results are available by clicking on the site rectangle.

The ELM structure filter Web interface and Web service As an initial step for feedback in the development process of the structure filter pipeline methods, the ELM structure filter functionality was implemented directly into the ELM server. This involved integration work on both the display representation in the graphical output and the links to the more specific details of the results.
As a second step, in order to facilitate a clean encapsulation of the structure filter pipeline code functionality and to enable future remote tool integration, a SOAP Web Service to access the functionality programmatically has been implemented and is available at http://structurefilter.embl.de/webservice/structureFilter.wsdl. At this link the user can find a detailed description of the web service operations and an example client implementation. The functionality provided by the web service encompasses the current ELM server interface functionality with some additional options. For the ELM Server interface functionality, all LMs in the ELM database are matched against the query sequence and this is also the default functionality of the Web Service. The extra options implemented in the Web Service are to search the query sequence by one or more user-specified regular expressions, rather than the default ELM database regular expressions, and/or by one or more user-specified ELM identifiers from the ELM database. Where possible, to a limited extent, if the user-specified regular expression corresponds to an existing ELM this information is made known to the user. The WSDL (Web Service Description Language) [56] file is WS-I compatible. The WS-Interoperability Basic Profile [57] proposes a set of rules to achieve interoperability of web services between different platforms. The WSDL file implements an XML document/literal style [58]. The back-end code is implemented in Java and runs on Axis2 [59] inside a Tomcat servlet container [60]. Statistical Details Score distributions turned out to be normal after visual inspection and quantitative Shapiro-Wilk test [61] at the 0.01 significance level. The average and standard deviation values from random match score distributions are used for the dynamical calculation of the Z-score and the corresponding one-sided p-value. 
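The Z-score and one-sided p-value calculation can be sketched as follows, assuming a normally distributed background as stated above. The function names are hypothetical, and taking the upper tail (higher scores being more favourable) is an assumption made for illustration; the original calculations were performed in R.

```python
import math


def z_score(score, background_mean, background_sd):
    """Standardise a match score against the random-match background."""
    if background_sd <= 0:
        raise ValueError("background standard deviation must be positive")
    return (score - background_mean) / background_sd


def one_sided_p_value(z):
    """Upper-tail probability P(Z >= z) under the standard normal distribution.

    ASSUMPTION: the upper tail is the relevant one, i.e. larger scores
    indicate a more favourable structural context.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Example with made-up background parameters (not values from the paper).
z = z_score(0.8, background_mean=0.5, background_sd=0.15)
p = one_sided_p_value(z)
```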
The significance of the differences observed for accessibility and secondary structure frequencies in true motifs vs random matches was assessed through t-tests (for accessibility values) and chi-square tests (for secondary structure assignments). All the statistical calculations were performed with the R package [62]. List of abbreviations LM: Linear Motif; 3D: Three-dimensional; TM: True Motif; FP: False Positive; regexp: regular expression; SSE: Secondary Structure Element; Sn: Sensitivity; Sp: Specificity. multiplierz: an extensible API based desktop environment for proteomics data analysis Abstract Background Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks. Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge. Results We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI) designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports generation of information rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a "zero infrastructure" philosophy, meaning that it can be deployed by end users with little or no system administration support. 
Finally, access to multiplierz functionality is provided via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines. Conclusion Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research. Background Mass spectrometry-based proteomics, particularly liquid chromatography coupled to electrospray ionization, has become the predominant technique for identification and quantification of proteins in biological systems [1]. Growing demand for improved annotation of primary proteomics data with biological information from various public databases has catalyzed interest in the development of software tools to support integration of these data types. Unfortunately, a number of factors, including lack of experimental standardization, rapid introduction of novel mass spectrometry technology, and the evolution of proprietary file formats associated with proteomics platforms represent a significant hurdle to the development of efficient and comprehensive software frameworks. To accommodate the emergent nature of proteomics-related technologies and the burgeoning number of databases that contain various biological annotations, data analytic systems must emphasize (i) intuitive and interactive interfaces, (ii) user-accessible coding frameworks to facilitate rapid prototyping of algorithms, and (iii) customizable sets of tools that can be readily integrated to provide pipelines that support a variety of proteomic workflows. 
Task specific Windows desktop applications such as MSQuant [2] and InsilicosViewer [3] can access a subset of native mass spectrometry data files directly and provide flexibility through adjustable parameters, but are not readily extended across the full spectrum of data analytic activities required in modern proteomics research. To address the full spectrum of analyses, open source projects such as The OpenMS Proteomics Pipeline (TOPP) [4] and ProteoWizard [5] offer a set of modular tools for generation of pipelines. The C++ coding environment of these tools is designed for performance and throughput, although researchers who lack programming experience often struggle to implement novel algorithms or other ad hoc tasks. Therefore, software libraries such as InSilicoSpectro [6] and mspire [7] have been developed based on high-level languages such as Perl and Ruby respectively. These libraries allow scripting of common data analysis tasks but cannot access raw binary data directly, and must rely instead on surrogate text files. Historically the proprietary nature of binary files associated with proteomics technologies represented a significant obstacle to efforts aimed at development of integrated, desktop environments. One solution proposed specifically for mass spectrometry is extraction of native data to a common file format, typically a dialect of XML [8,9]. We [10] and others [11] have challenged the technical merits of this approach. Given that mass spectrometry manufacturers implicitly carry the burden of maintaining up-to-date libraries for access to their native data, we recently proposed that a common API [10] is a more rational solution for shared access to proprietary mass spectrometry files. Here we define and implement a minimal API (mzAPI) that provides direct, programmatic interaction with binary raw files and we demonstrate that performance for practical tasks is significantly faster as compared to equivalent operations for access to mzXML files. 
We implement mzAPI in Python to maximize accessibility; similarly, mzAPI is exposed to users through multiplierz, a Python-based desktop environment that combines an intuitive interface with a powerful and flexible high-level scripting platform. Together, mzAPI and multiplierz support a wide range of data analytic tasks and facilitate rapid prototyping of novel algorithms. In addition, the multiplierz environment is designed with a "zero-infrastructure" philosophy, meaning that it can be deployed by end users who lack system administration experience or support. We demonstrate the capabilities of multiplierz through a variety of proteomics case studies such as (i) label-free quantitative comparison and interactive validation of datasets from multi-acquisition experiments, (ii) automatic quality control of mass spectrometer performance, (iii) improved peptide sequence assignment via deisotoping of MS/MS spectra, and (iv) assessment of phosphopeptide enrichment efficiency through programmatic fragment ion extraction. Implementation mzAPI: A Common API for Direct Access to Proprietary File Formats As described above, direct access to native mass spectrometry data files is a key factor in the assembly of a powerful and flexible framework for proteomics data analysis. Towards this end, we define a minimal mzAPI as consisting of the following key procedures:

1. scan(time) → [(mz, intensity)]
2. scan_list(start_time, stop_time) → [(time, precursor)]
3. time_range() → (start_time, stop_time)
4. scan_time_from_scan_name(scan_name) → time
5. ric(start_time, stop_time, start_mz, stop_mz) → [(time, intensity)]

The first two procedures in mzAPI return: 1) individual scans in the form of a list of (mz, intensity) pairs, and 2) a catalog of all scan descriptions in the form of a list of (time, precursor) pairs in the experiment.
In addition, the API provides: 3) 'time_range' that returns the earliest and latest acquisition times in the experiment, and 4) 'scan_time_from_scan_name' for translation of manufacturer-specific scan nomenclature to the mzAPI naming convention. We opted to rely on acquisition time as a common naming convention. In the case of LC-MS this is equivalent to chromatographic retention time. Finally, a fifth procedure generates a reconstructed ion chromatogram (RIC) for a given time and mass-to-charge range, returned as a set of (time, intensity) pairs. While in principle RICs can be generated using the first two calls, we believe that ubiquitous use of the RIC operation in proteomics data analysis justifies exposure of RIC extraction as a primitive in the API. Given that RIC extraction is provided by all manufacturer libraries, this procedure represents an excellent example of efficient re-use of native data system indexing and software. We propose that a proprietary file format is considered mzAPI compliant when the manufacturer provides a freely available, and preferably redistributable, implementation of the aforementioned 5 core procedures, or an extended version that may evolve from a community-driven standardization effort. For example, ThermoFisher Scientific provides a data access library for .RAW files through the MSFileReader program, freely available for download at: http://sjsupport.thermofinnigan.com/public/detail.asp?id=586 or http://blais.dfci.harvard.edu/research/mass-informatics/mzAPI/vendor-libraries/. Naturally, additional procedure calls, such as charge state or signal-to-noise values for each isotope cluster in MS or MS/MS scans, can be incorporated into the mzAPI framework by essentially subclassing the core mzFile class. 
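To make the five core procedures concrete, the following is a toy, in-memory sketch of an object exposing the same calls. It is not the multiplierz implementation and does not touch any vendor library: the class name, the constructor arguments, the exact-time lookup in scan, and the choice to sum intensities within the m/z window for the RIC are all assumptions made for illustration.

```python
class InMemoryMzFile:
    """Toy stand-in for an mzAPI-style file object (hypothetical).

    scans maps acquisition time -> (precursor m/z, [(mz, intensity), ...]);
    scan_names maps a manufacturer-style scan name -> acquisition time.
    """

    def __init__(self, scans, scan_names=None):
        self._scans = dict(scans)
        self._names = dict(scan_names or {})

    def scan(self, time):
        """Return the scan acquired at exactly this time as (mz, intensity) pairs."""
        return list(self._scans[time][1])

    def scan_list(self, start_time, stop_time):
        """Return (time, precursor) pairs for scans within the time window."""
        return [(t, self._scans[t][0]) for t in sorted(self._scans)
                if start_time <= t <= stop_time]

    def time_range(self):
        """Return the earliest and latest acquisition times."""
        times = sorted(self._scans)
        return (times[0], times[-1])

    def scan_time_from_scan_name(self, scan_name):
        """Translate a manufacturer-specific scan name to an acquisition time."""
        return self._names[scan_name]

    def ric(self, start_time, stop_time, start_mz, stop_mz):
        """Reconstructed ion chromatogram as (time, intensity) pairs.

        ASSUMPTION: intensity is the sum of all peaks inside the m/z window.
        """
        points = []
        for t in sorted(self._scans):
            if start_time <= t <= stop_time:
                total = sum(i for mz, i in self._scans[t][1]
                            if start_mz <= mz <= stop_mz)
                points.append((t, total))
        return points


# Usage with two made-up scans.
demo = InMemoryMzFile(
    {1.0: (0.0, [(500.0, 10.0), (600.0, 5.0)]), 2.0: (500.0, [(500.2, 7.0)])},
    scan_names={"scan=1": 1.0},
)
chromatogram = demo.ric(0.0, 3.0, 499.5, 500.5)
```

As the text notes, a RIC could in principle be assembled from scan_list and scan alone; the sketch keeps ric as a separate method to reflect its status as a primitive in the API.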
As a basic test of file access speed, we compared the time required for random access of scans in an LC-MS acquisition from a ThermoFisher .RAW file using mzAPI (through its Python implementation) versus libraries provided by the manufacturer and included in the native Xcalibur file browser (note that mzXML data was not considered for this comparison since it does not provide access by acquisition time as a primitive in the RAP or RAMP API). Table 1 demonstrates that, as expected, random file access via mzAPI is slower than that obtained when working directly in the manufacturer's native environment. The performance of the common API could be further improved by implementation in C++ or C#, but we explicitly chose Python to maintain maximum flexibility through user defined scripts (see below). Interestingly, one consequence of the common API strategy is that it provides a direct measure of manufacturer data system efficiency, as evidenced by the additional time required for random access to scans in .WIFF versus .RAW files. Regardless of native file type, the use of a common API eliminates the need for storage and tracking of surrogate files; based on previous reports, this can be particularly problematic for full profile data files, which can grow significantly in size upon conversion to XML [9]. Access Efficiency for Open and Proprietary Mass Spectrometry Data Files. Given the multidimensional nature of mass-spectrometry data, extraction based on specific slices through the data space, rather than random file access, is a more relevant performance metric for mass spectrometry files. Generation of RICs is perhaps the best example of a data slice procedure supported by all manufacturer data systems. Consequently we next sought to test the performance of mzAPI for creation of RICs directly from a .RAW file. 
As a point of comparison we generated the corresponding mzXML file (using TPP version 4.0) [12] and extracted RICs using both a graphical user interface (GUI) based browser tool (InsilicosViewer version 1.5.1) [3] as well as the Perl-based InSilicoSpectro environment (version 1.3.19) [6] and the R-based XCMS (version 1.12.1), [13] scriptable interface platforms. Although the latter two are designed to access a number of third-party file formats, none of the GUI- or command-line based tools supports access to specified subsets (in chromatographic time) of the underlying data. As a result we generated RICs by extraction of a specific mass-to-charge range over the full data file, or in the case of InSilicoSpectro, which had no support for RIC generation, we simply timed the opening of mzXML files. While the mzXML schema includes a scan index that provides for random access to scans at speeds competitive with, or exceeding, proprietary data system (in this case ThermoFisher Xcalibur) [9], Table 1 shows that generation of specific data slices, or in this case RICs, is 5- to 10-fold faster when leveraging the underlying manufacturer's API compared to GUI or command line based access to mzXML (scripts used for all timings included in Additional File 1). This result supports the notion that pragmatic data access patterns are well supported by existing, albeit proprietary, manufacturer libraries, and more importantly, that these libraries can be efficiently utilized through a common and redistributable API. multiplierz: An Open-Source and Interactive Environment for Proteomics Data Analysis We extend the functionality of mzAPI by integration into multiplierz, an open-source Python-based environment that provides a flexible framework for comprehensive analysis of proteomics data. Figure 1 illustrates our proposed implementation; all associated code and scripts are available for download at: http://blais.dfci.harvard.edu/multiplierz. 
In the following sections, we provide a detailed description of core multiplierz capabilities. A Common API and Desktop Environment for Mass Spectrometry Data Analysis. The multiplierz environment provides a central point for user interaction with proprietary data files (via mzAPI), protein/peptide identification algorithms, publicly available annotation databases, and commercial reporting and spreadsheet tools. Our proposal calls for manufacturers to provide a minimal set of libraries for access to their native data files. Ad hoc data analysis tasks are supported through multiplierz scripting capability, including programmatic access for integration into data analytic pipelines. Regardless of final experimental goals, peptide identification is often the first or default operation performed subsequent to LC-MS data acquisition. We designed multiplierz to serve as a user-friendly, desktop tool for interaction with proteomics database search engines; consistent with our zero-infrastructure philosophy, X!Tandem [14] is fully integrated into the multiplierz installation package. Similarly we include support for automated retrieval of Mascot search results. In this case, the URL for a particular search is easily and unambiguously accessed using the Mascot job ID (after completion of the search, the Mascot ID is both on the search submission page and in the Mascot Daemon). The multiplierz module for downloading Mascot search results also allows input for Mascot-specific export options such as "Require Bold Red" and "Maximum Number of Protein hits." Multiple search results are specified using either a comma- or dash-separated list of Mascot Job IDs (e.g., 6556, 5878, 5120-5125). 
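As an illustration only, the comma- and dash-separated job ID syntax described above could be expanded as in the following minimal Python sketch. The function name parse_job_ids is hypothetical and is not taken from the multiplierz code base; the sketch assumes that IDs are non-negative integers and that a dash always denotes an inclusive range.

```python
def parse_job_ids(spec):
    """Expand a string such as '6556, 5878, 5120-5125' into a list of integer IDs."""
    ids = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        if "-" in token:
            # Dash-separated entry: treat as an inclusive range (assumption).
            start, end = (int(part) for part in token.split("-", 1))
            if end < start:
                raise ValueError(f"descending range: {token!r}")
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(token))
    return ids


print(parse_job_ids("6556, 5878, 5120-5125"))
# [6556, 5878, 5120, 5121, 5122, 5123, 5124, 5125]
```

Keeping the IDs in the order given, rather than sorting them, preserves the order in which the user listed the searches.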
Users can optionally include Mascot MS/MS fragment annotation images (that are displayed in multiple Mascot report web pages) and embed them within a single multiplierz report; thus multiplierz provides users with comprehensive Mascot information, including images, in a convenient and portable report (described below). Importantly, none of the above tasks require server level administrative privileges. For example, query of MS/MS peak annotations typically requires logon credentials within the web browser. multiplierz interacts with the browser to "screen scrape" MS/MS images and store them within the default report format. Users with full access to the Mascot server may parse results directly from the .DAT file using .mz scripts (multiplierz reports and .mz scripts are described below). Similar support is also provided for Protein Pilot [15] and OMSSA [16]. For maximum flexibility in conversion of parsed data from other search engines we include modules for generation of multiplierz-compatible spreadsheets. Calculation of a false discovery rate (FDR) for peptide sequence identifications is one mechanism to assess the overall quality of search results [17,18]. multiplierz supports calculation of an FDR upon retrieval of peptide identification data from both forward and reverse database searches. The FDR for a given score threshold is calculated as the ratio of reverse database search identifications to that from the forward plus reverse searches, each with a score greater than or equal to the chosen threshold. The FDR thus represents the percentage of identified peptides in the forward search that would also be detected in the reverse database search. multiplierz identifies score thresholds for commonly used FDRs (1%, 2%, and 5%) and calculates the FDR for each forward peptide score via an .mz Script (see below; scripts for generating a reverse database and calculating the FDR are included in Additional File 2).
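The FDR definition given above can be sketched in a few lines of Python. This is a minimal illustration and not the script from Additional File 2: the function names and the example scores are hypothetical, and the choice to report the lowest observed score that meets the target FDR is an assumption (the FDR need not decrease monotonically with the score threshold).

```python
def fdr_at_threshold(forward_scores, reverse_scores, threshold):
    """FDR = reverse hits / (forward hits + reverse hits), all with score >= threshold."""
    n_forward = sum(score >= threshold for score in forward_scores)
    n_reverse = sum(score >= threshold for score in reverse_scores)
    total = n_forward + n_reverse
    return n_reverse / total if total else 0.0


def threshold_for_fdr(forward_scores, reverse_scores, target):
    """Lowest observed score whose FDR does not exceed `target`, or None if none does."""
    for candidate in sorted(set(forward_scores) | set(reverse_scores)):
        if fdr_at_threshold(forward_scores, reverse_scores, candidate) <= target:
            return candidate
    return None


# Made-up peptide scores, for illustration only.
forward = [60, 55, 48, 41, 35, 30, 22, 18]
reverse = [33, 25, 20, 15]

print(round(fdr_at_threshold(forward, reverse, 30), 3))  # 0.143 (1 reverse / 7 total)
print(threshold_for_fdr(forward, reverse, 0.05))          # 35
```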
Correlation of identified peptide sequences with specific features in the source mass spectrometry data, such as chromatographic peak width or maximum precursor intensity, is often complicated by the requirement for users to move between disparate programs and interfaces. The multiplierz desktop environment provides users with a centralized point of interaction with both search results and the underlying mass spectrometry files. For example, high-confidence peptide identifications may be used for direct generation of RICs across user-defined time and mass-to-charge ranges. Various metrics such as full peak width at half maximum (FWHM), peak area, and apex precursor intensity for peptide elution profiles are included in the output report. As described below these data are combined, annotated, and made available in portable, user-friendly reports. Consistent with our underlying motivation to combine open-source and commercial software where appropriate, we opted to export multiplierz results into Microsoft Excel. We take advantage of Excel's ability to store images in the worksheet (as comments), thereby creating an information-rich, yet portable, report that may be readily formatted to meet specific scientific journal data submission requirements [19]. Moreover, we note that the tendency towards analysis of increasingly complex mixtures along with continued efforts in relative protein quantification have placed increased emphasis on data reproducibility in proteomics experiments. Hence it has become common practice to derive a "proteomics result" from comparison, or other manipulation, of multiple mass spectrometry acquisitions. In support of this experimental paradigm, multiplierz includes "multi-detect" and "multi-filter" tools that provide users tremendous flexibility in filtering and collating (e.g., by common or unique proteins, peptides, post-translational modifications, charge state, etc.) each data file obtained from a multiple acquisition study. 
This feature provides database-like functionality without the need to install and maintain dedicated database servers. Importantly, all multiplierz functions provide spreadsheet-based output with optional embedded images (see Figure 2 and discussion below). Integration of Commercial Tools for multiplierz Reports. A spreadsheet-based report from multiplierz analysis of 11 LC-MS analyses, designed to interrogate performance of LC column geometry and flow rate (see also Figure 5 and Additional File 5). For clarity the spreadsheet shows peptide entries and characterization data from the two extremes in column size and flow rate. Informative images are embedded within spreadsheet cell comments and are accessed by mouse-over, thus facilitating rapid visual inspection. Optional embedded images include: 1.) MS/MS spectra that are annotated with b- and y-type fragment ion labels, peptide sequence, search engine score (in this case Mascot peptide score), and precursor charge state. A color scheme highlights modified amino acids (in this case oxidized methionine) and those residues inferred from b- and y-type fragment ion assignments (horizontal lines, red and blue denote singly- and doubly-charged ions, respectively). 2.) RIC images in which MS scans are annotated with green squares, while yellow circles and blue triangles denote MS/MS scans for the precursor of interest, with the latter indicating the specific MS2 event described in the selected row of the spreadsheet. 3.) precursor region of the MS spectrum (not shown). The highly embedded, spreadsheet-based multiplierz reports provide a very flexible and user-friendly mechanism to query various metrics of the underlying native mass spectrometry data and quickly collate search results based on user-defined filter criteria. However, it is often the case that researchers must go beyond these general characterizations and focus on a small subset of their proteomics data in support of targeted biological questions. 
To enable this mode of data analysis, multiplierz includes a Peak Viewer tool (Figure 3) that provides dynamic and interactive plots for precursor RICs, and corresponding MS and MS/MS scans. Additionally, users can edit and export publication quality images through a built-in Scrapbook tool (Figure 4). Features of the Peak Viewer include (i) visualization of theoretical fragments superimposed on MS/MS spectra, (ii) automatic zoom-in display for iTRAQ and theoretical ions for rapid manual validation, and (iii) comparison of scans and RICs via mirror and overlay functions. For added convenience the Peak Viewer opens multiplierz spreadsheets and users can generate plots by a simple double-click on specific rows or peptide entries. Dynamic Visualization of Proprietary Mass Spectrometry Data Files. The Peak Viewer tool in multiplierz provides interactive plots for precursor RICs, and corresponding MS and MS/MS scans, in centroid or profile modes. Green squares in the RIC denote MS scans and red triangles indicate MS/MS events. In addition, users may adjust the time or m/z range displayed in each data window. Verification of peptide sequence is facilitated by overlay of theoretical fragment ions on the MS/MS spectra. Users may dynamically evaluate multiple peptide assignment options by changing the proposed sequence or post-translational modification state in the left-most pane. Generation of Publication Quality Images. A Scrapbook tool allows manipulation of Peak Viewer plot properties such as axes labels, titles, and size. Multi-plot comparison via the mirror and overlay functions provides further modes for in-depth, manual data interrogation. Users may export publication quality images from all Scrapbook plots. Researchers are increasingly focused on integration of disparate data types in order to better understand biological phenomena at the so-called network or systems level.
As a first step in support of these and similar activities, multiplierz automatically downloads GenBank data over the internet based on an identified protein list, parses information such as gene ontology and domain classification, along with the corresponding Entrez Gene, HPRD, HGNC, and OMIM entries, and then creates hyperlinks directly in the spreadsheet reports. This and other tools including an in silico protein digestion tool and a peptide fragment calculator are described in Additional File 3. While multiplierz includes many built-in features and tools, we also recognize the difficulty of building a "one size fits all" application given the diversity of ideas and efforts pursued within individual research laboratories. Hence multiplierz includes a command line console as well as scripting capability (through ".mz" scripts) which together support ad hoc data analysis tasks. The scripting capability is particularly useful for niche experiments or proteomics workflows not otherwise supported by other open-source or proprietary data systems. All multiplierz tools are available through both the desktop GUI as well as scriptable procedures. In addition, a pre-launch initialization ("rc.mz") script enables full customization of the application and its interfaces without recompiling the underlying code. Finally, we note that programmatic access to mzAPI allows incorporation of multiplierz into automated data-analytic pipelines. For example, users can submit jobs through a laboratory information management system (LIMS). Upon completion of LC-MS acquisition(s) and database search(es), multiplierz executes .mz scripts to access both the search results and underlying .RAW or .WIFF file(s), in order to create a spreadsheet-based report. Users can be notified by email and access their results via the multiplierz desktop environment. 
Importantly, multiplierz spreadsheet reports, whether generated in low- or high-throughput mode, are portable and readily formatted in accordance with journal-specific requirements for proteomics data. Collectively the features described above facilitate a wide range of data analysis tasks for mass spectrometry-based proteomics activities from technology development and evaluation to prioritization of protein identifications for subsequent biochemical validation. Importantly, multiplierz provides these capabilities to individual users at the desktop level. Results In the following sections, we demonstrate the functionality of multiplierz through relevant examples based on data and results from work in our laboratory. Significantly we note that these examples encompass data generated on mass spectrometers manufactured by ThermoFisher Scientific and AB-SCIEX. Optimization of LC Assemblies and Methods We recently described a novel protocol for fabrication of miniaturized LC-electrospray assemblies that provided significantly improved LC-MS performance [20]. Not surprisingly, elucidation of relevant analytical figures of merit required in-depth and large scale data analysis. Figure 5 shows the multiplierz-dependent workflow required to evaluate the relative performance improvement for analysis of tryptic peptides derived from whole cell lysate as a function of column size and flow rate (also see Additional File 5). From approximately 90,000 MS/MS scans encompassing almost 23,000 peptide assignments (combination of sequence, charge state, and modification) multiplierz identified 198 unique peptide sequences and modifications in common across 11 LC-MS acquisitions. In addition, multiplierz used Mascot-derived peptide identifications to generate RICs, calculate full chromatographic peak width at half-maximum (FWHM), and determine precursor apex intensity. The entire analysis was performed via the multiplierz GUI. 
Finally, the embedded RIC images facilitated rapid validation and comparison of chromatographic features. multiplierz-based Workflow for Analysis of LC Column Geometry and Flow Rate. Relative performance improvement for analysis of peptides derived from whole cell lysate as a function of column size and flow rate (a). Original data contained ~90,000 MS/MS scans (b) encompassing almost 23,000 peptide assignments (combination of sequence, charge state, and modification), across 11 LC-MS acquisitions. In addition, multiplierz used Mascot-derived peptide identifications (c) to generate RICs, calculate full chromatographic peak width at half-maximum (FWHM), and determine precursor apex intensity (d). Common peptide sequences and associated analytical metrics are extracted (e) into a final, spreadsheet-based report (f). Figure 2 (see above) shows an example of a multiplierz standard format report. To simplify the display we generated a comparison report (using multiplierz) for the two extremes in the 11 LC-MS acquisitions described above. The insets show examples of optional embedded images. We note that, unlike many web-based reports that often require frequent page updates, multiplierz images display immediately upon mouse-over, and hence facilitate rapid data validation and interrogation exercises. Optimization of Phosphopeptide Enrichment Methods In the aforementioned study, we leveraged the improved performance of our miniaturized LC-electrospray assemblies to elucidate signaling events in embryonic stem cells [20]. Our specific choice to focus on tyrosine phosphorylation as a direct probe of the molecular events required for self-renewal and differentiation in these cells required optimization of enrichment protocols for peptides carrying this rare post-translational modification. A typical strategy would be to simply adjust experimental conditions to yield a maximum number of phosphotyrosine sites subsequent to LC-MS/MS and database search.
However, given the acutely low levels of tyrosine phosphorylation in embryonic stem cells, we chose instead to gauge enrichment efficiency based on the relative fraction of MS/MS scans that contained a phosphotyrosine immonium ion (m/z = 216.04) [21,22], irrespective of any putative peptide sequence assignment. This strategy allowed us to readily decouple low overall peptide yield from poor enrichment of phosphotyrosine-containing peptides in experiments that generally provided modest numbers of peptide identifications (compared to typical large-scale proteomics studies). Figure 6a shows the .mz script used to probe MS/MS scans for the presence of a diagnostic fragment ion at m/z = 216. Note that Python's clear and concise syntax is readily accessible, as compared to that encountered with manufacturer libraries and data systems. Consistent with our reporting strategy, this script outputs a tab-delimited file that we readily filter in Excel to generate a histogram view of our phosphotyrosine enrichment efficiency (Figure 6b). User-defined Customization Through .mz Scripts. A short .mz script (a) opens each MS/MS spectrum within a given LC-MS acquisition and returns those scans that contain the phosphotyrosine immonium ion (m/z = 216). The tab-delimited multiplierz output is readily opened in Excel and (b) a histogram view facilitates rapid evaluation of enrichment efficiency. In this example, the majority of peptides in the heart of the LC gradient (~25 - 65 min.) contain phosphotyrosine residues as evidenced by the presence of an m/z = 216 immonium ion. In a separate report we described the novel application of niobium(V) oxide (Nb2O5) for global enrichment of phosphopeptides from complex, biologically derived mixtures [23]. The "multi-detect" and "multi-filter" tools in multiplierz were used to compare phosphopeptides enriched via Nb2O5 and TiO2 (the current standard), and detected across multiple LC-MS/MS analyses.
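Returning to the immonium ion screen described above, the logic of such a script can be sketched in plain Python. This is not the .mz script of Figure 6a and it deliberately avoids any mzAPI calls: scans are assumed to be already loaded as (retention time, peak list) pairs, the function names are hypothetical, and the 0.02 m/z matching window and the example peaks are assumptions made for illustration.

```python
IMMONIUM_MZ = 216.04  # phosphotyrosine immonium ion, as stated in the text
TOLERANCE = 0.02      # assumed matching window in m/z units; not from the paper


def has_immonium(peaks, target=IMMONIUM_MZ, tol=TOLERANCE):
    """True if any (m/z, intensity) peak with nonzero intensity lies within tol of target."""
    return any(abs(mz - target) <= tol and intensity > 0 for mz, intensity in peaks)


def immonium_report(scans):
    """Build a tab-delimited report: one row per MS/MS scan with a 1/0 flag."""
    rows = ["time_min\thas_pY_immonium"]
    for time_min, peaks in scans:
        rows.append(f"{time_min:.2f}\t{int(has_immonium(peaks))}")
    return "\n".join(rows)


# Made-up MS/MS scans: (retention time in minutes, [(m/z, intensity), ...]).
scans = [
    (25.10, [(175.12, 500.0), (216.043, 1200.0)]),  # contains the diagnostic ion
    (25.40, [(216.10, 900.0), (300.20, 50.0)]),     # nearby peak, outside the window
]

print(immonium_report(scans))
```

The tab-delimited output mirrors the reporting strategy described in the text, so the flagged scans can be binned by retention time in a spreadsheet.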
Furthermore, to assess potential bias introduced by the stochastic nature of MS/MS, we compared the precursor peak intensities of unique versus commonly detected phosphopeptides that resulted from each method, and confirmed that Nb2O5 and TiO2 exhibited an empirically useful degree of divergence with respect to phosphopeptide enrichment (Figure 7, reprinted from Ficarro et al. [23] by permission from the American Chemical Society). Quantitative Comparison of Phosphopeptide Enrichment Methods. Histogram distributions of peak heights for unique and overlapping phosphopeptides detected in conjunction with (a) (TiO2)-, and (b) (Nb2O5)-based enrichment. The intensity distributions for phosphopeptides assigned uniquely to either metal oxide did not differ significantly from the intensity distributions for commonly detected phosphopeptides, indicating that the unique precursors were not confined to low signal-to-noise regions. Reprinted from [23] by permission from the American Chemical Society. Improved Peptide Sequence Assignment via De-isotoped MS/MS Spectra In another recent study, we optimized performance of orbitrap HCD MS/MS through systematic exploration of various instrument and post-acquisition parameters [24]. In the context of this work, we observed that high charge state (z > 2+) precursors were frequently not assigned to a peptide sequence despite an otherwise high quality fragment ion spectrum. We speculated that the presence of multiple isotope peaks per fragment ion in the high resolution Orbitrap MS/MS scans may degrade the sensitivity of the search algorithm, resulting in fewer high-confidence sequence assignments. Therefore, we generated an .mz Script (see Additional File 2) that de-isotoped [25] each fragment ion cluster and output a charge state reduced peak list for submission to Mascot. 
A variety of parameters can be used to adjust the stringency of spectrum filtering such as maximum charge state, minimum fragment ion mass-to-charge ratio, as well as an option to remove any precursor signal that may remain in the MS/MS spectrum. Overall we realized an approximate 30% gain in the number of high-confidence (Mascot score > 30) peptide sequence assignments for high charge state precursors (Table 2). Improved Peptide Sequence Assignment via De-isotoped MS/MS Spectra. Label-Free Quantitative Proteomics Relative protein quantitation can be achieved via a label-free approach whereby tryptic digests of protein samples are analyzed without incorporation of stable isotope labels; the resulting peak intensities (or areas) for the constituent peptides are combined and compared across samples as well as within replicates [26]. Typically 3-5 replicates of each sample are required to account for non-systematic errors associated with shifts in chromatographic elution time, temperature, electrospray stability, etc. For very large studies, performed across extended periods of time and multiple labs, complex software is typically required to combine, align and analyze the resulting native mass spectrometry files. In contrast, we demonstrate a strategy similar to the one described by Bondarenko et al. [27], which is deployed entirely within multiplierz: extraction of MS/MS peak lists, X!Tandem based peptide identification, and the generation of common sequences (detected in at least k out of the n RAW files being analyzed) are implemented directly from the multiplierz menu-system. Finally, feature extraction, quantitation, and report generation are performed via an additional .mz script (see Additional File 2). Figure 8 shows an Excel-based report for label-free analysis of two standard protein mixtures (5 proteins each, containing ratios of 1:11, 1:5, 1:1, 2:1 and 5:1, respectively, and analyzed in duplicate). Label-Free Relative Protein Quantification.
A multiplierz Excel report provides data analytic figures of merit, including: (a) the ratio of each protein across two conditions with an embedded box plot that illustrates the distribution of feature-level ratios, where each feature is defined as a (peptide, modification, charge state) combination; (b) p-value for the significance of the ratio; (c) the number (N) of features underlying the protein quantification; (d) the ratio and embedded RIC plot (showing all RICs used to quantify the peptide - colored by sample source) from the peptide most representative of the final protein ratio; (e) "expected" field is added manually by the user based on the experimental design. Users may also generate associated graphs using native plotting capabilities in Excel. Automated Quality Control of Mass Spectrometer Instrument Performance High throughput or other core-type operations designed to run in an unattended manner benefit from automated quality control assessment of platform performance. For example, periodic confirmation of measured peptide mass accuracy is required to ensure the integrity of instrument calibration routines. Towards this end, we created a short .mz script (see Additional File 2), which extracts measured mass-to-charge values for a list of standard peptides from a native data file, and automatically calculates mass errors. The output is a calibration report (Figure 9) that shows a reconstructed ion chromatogram and experimental mass accuracy for each standard peptide. The measured mass errors may be used as input for mass tolerance parameters in subsequent database search algorithms (e.g., Mascot, SEQUEST, X!Tandem, etc.) for peptide sequence identification. Instrument Calibration Quality Control Report. An .mz script is used to automatically generate a spreadsheet report that indicates mass errors (in ppm) for a set of standard peptides. 
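The mass error reported for each standard peptide is the relative deviation of the measured from the theoretical mass-to-charge value, expressed in parts per million. The following minimal Python sketch shows that arithmetic only; it is not the script from Additional File 2, and the function name, peptide labels, and m/z values are hypothetical placeholders.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Made-up standards: label -> (theoretical m/z, measured m/z).
standards = {
    "standard_1": (500.0000, 500.0025),
    "standard_2": (750.0000, 749.9985),
}

for label, (theoretical, measured) in standards.items():
    print(f"{label}\t{ppm_error(measured, theoretical):+.1f} ppm")
```

A signed error is kept here so that a systematic offset, which a recalibration could correct, can be distinguished from random scatter.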
Images of the precursor isotope distribution and reconstructed ion chromatogram are embedded within the report for rapid confirmation of mass spectrometry and chromatographic performance. In a second application, we developed a routine for recalibration of MS/MS spectra. It is widely recognized that increased mass accuracy provides for higher stringency searches and yields improved results [28]. First, a given set of MS/MS spectra is searched with mass tolerance values based on the most recent mass calibration parameters. Under these conditions we typically observe a monotonic increase in mass error as a function of fragment ion mass (Figure 10a). A high-confidence peptide is selected from the search output, and the corresponding annotated MS/MS spectrum is used to compute the slope and intercept of the linear mass error function. This equation is then used to recalibrate precursor and product ion masses via an .mz script (see Additional File 2). Finally, we re-search the newly calibrated dataset with a narrower tolerance, reducing the average mass error (Figure 10b). Recalibration of Data Acquired on a Quadrupole Time-of-Flight (QTOF) Mass Spectrometer. Fragment ion mass errors for the peptide ADISSDQIAAIGITNQR (based on Mascot assignment), derived from the protein glycerol kinase (a) before and (b) after recalibration. Conclusion We recognize that some aspects of our proposal diverge from current efforts to establish community standards in proteomics. For example, the use of mzAPI within multiplierz to provide direct access to binary mass spectrometry files does not rely on XML-based surrogate files. We note, however, that the two strategies are not mutually exclusive; that is, support for mzXML [9], or the recently described mzML [29] can be readily incorporated into mzAPI. Similarly, output from multiplierz can be readily formatted in pepXML [12].
In addition, recent discussions focused on data sharing in proteomics suggest that standards may evolve beyond XML-based formats [30,31]. Equally important, the emergence of translation layers such as cygwin [32] and Wine [33] continues to blur inter-platform boundaries, such that software solutions amenable to the widest audience may eclipse those based largely on platform independence. In fact, our use of Microsoft Excel as the default report output for multiplierz is one such example. Similar image-enhanced spreadsheets may be generated in open formats such as OpenOffice.org XML [34] (see Additional File 4), but our experience to date indicates that the majority of biomedical researchers still opt for commercial spreadsheet solutions, either out of familiarity or because of existing institutional support. The multiplierz framework is accessible to a wide range of researchers, and simultaneously provides support for novel algorithm development as well as deployment of automated data pipelines. As a central point of integration for information from publicly available databases and native data from proprietary instrument platforms, multiplierz offers a compelling addition to the ongoing discourse aimed at identifying an effective means to enable broad access and data exchange in the proteomics community. In particular, incorporation of mzAPI into the multiplierz desktop architecture may offer a better impedance match between the rate of proprietary mass spectrometry innovation and researchers' demands for increased autonomy in their data analysis tasks. Availability and Requirements • Project name: multiplierz • Project home page: http://blais.dfci.harvard.edu/multiplierz • Operating system(s): Microsoft Windows • Programming language: Python • License: open source under LGPL (PS)2-v2: template-based protein structure prediction server Abstract Background Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods.
Identifying templates in the twilight zone of 15~25% sequence similarity between targets and templates remains difficult for template-based protein structure prediction. This study presents the (PS)2-v2 server, based on our original server with numerous enhancements and modifications, to improve reliability and applicability. Results To detect homologous proteins with remote similarity, the (PS)2-v2 server utilizes the S2A2 matrix, which is a 60 × 60 substitution matrix using the secondary structure propensities of 20 amino acids, and the position-specific scoring matrix (PSSM) generated by PSI-BLAST. In addition, our server uses multiple templates and multiple models to build and assess models. Our method was evaluated on the Lindahl benchmark for fold recognition and ProSup benchmark for sequence alignment. Evaluation results indicated that our method outperforms sequence-profile approaches, and had comparable performance to that of structure-based methods on these benchmarks. Finally, we tested our method using the 154 TBM targets of the CASP8 (Critical Assessment of Techniques for Protein Structure Prediction) dataset. Experimental results show that (PS)2-v2 is ranked 6th among 72 servers and is faster than the five top-ranked servers, which utilize ab initio methods. Conclusion Experimental results demonstrate that (PS)2-v2 with the S2A2 matrix is useful for template selections and target-template alignments by blending the amino acid and structural propensities. The multiple-template and multiple-model strategies are able to significantly improve the accuracies for target-template alignments in the twilight zone. We believe that this server is useful in structure prediction and modeling, especially in detecting homologous templates with sequence similarity in the twilight zone. Background For template-based modeling (TBM) and fold recognition methods, a prediction model can be built based on the coordinates of the appropriate template(s) [1].
These approaches generally involve four steps: 1) a representative protein structure database is searched to identify a template that is structurally similar to the protein target; 2) an alignment between the target and the template is generated that should align equivalent residues together as in the case of a structural alignment; 3) a prediction structure of the target is built based on the alignment and the selected template structure, and 4) the quality of the model is evaluated. The first two steps significantly affect the quality of the final model prediction in TBM methods. The secondary structure of a protein is often more conserved than the amino acid sequence, and the prediction accuracy of the secondary structure has reached ~80% on average. Recently, a number of methods, integrating secondary structures (i.e., α-helix, ß-strand and coil) with primary amino acid sequences, have successfully detected homologs with remote similarity for automated comparative modeling [2-6] and fold recognition [7-12]. These methods often used two separate substitution matrices [9,10,13] to score secondary structures and primary amino acids, respectively, for aligning a residue pair. The separate matrices are unable to reflect the real score because an amino acid type often prefers a specific secondary structure. Here, we have developed a substitution matrix, called S2A2, which considers the properties of the secondary structures and amino acid types. The S2A2 is a 60 × 60 matrix that considers all possible pair combinations of 20 amino acid types and three secondary structure elements. This matrix was evaluated on the Lindahl benchmark [14] for fold recognition and the ProSup benchmark [15] for alignment accuracies. According to these evaluation results, the S2A2 matrix has higher accuracy than the position-specific scoring matrix (PSSM) generated by PSI-BLAST and prof_sim for fold recognition and sequence alignments.
By integrating the S2A2 matrix and PSSM, each having a unique scoring mechanism, the (PS)2-v2 server blends the sequence profile and secondary structure information so that they work cooperatively. Numerous enhancements and modifications were applied to the original (PS)2 server (namely (PS)2-original) [16] and to (PS)2-CASP8 [17], which participated in the CASP8 experiment, thereby improving the reliability and applicability of the method. There are four main differences in methodology between the present server ((PS)2-v2) and our previous works (Table 1). First, (PS)2-v2 integrates the S2A2 matrix and PSSM for the template selection and the target-template alignment to replace a consensus strategy applied in the (PS)2-original server. Second, we modified the SSEARCH [18] search method to replace the PSI-BLAST search method and Smith-Waterman algorithm applied in the (PS)2-original server and (PS)2-CASP8, respectively. Third, (PS)2-v2 utilized a new multiple template method for modeling different domains of the target sequence. Finally, (PS)2-v2 added a multiple model strategy and utilized ProQ [19] to assess and select the final model. We have assessed the prediction accuracy of the (PS)2-v2 server based on the 154 TBM targets of the CASP8 dataset. The experimental results show that the S2A2 matrix, multiple-template and multiple-model strategies are able to significantly improve the accuracies for protein structure prediction and modeling when the sequence similarity between the template and the target is in the twilight zone. The essential differences of (PS)2-original, (PS)2-CASP8 and (PS)2-v2 Methods Figures 1 and 2 show the framework of the (PS)2-v2 server for protein structure prediction. (PS)2-v2 uses the S2A2 matrix and the PSSM for the template selection and the target-template alignment. (PS)2-v2 first used the query sequence to generate a PSSM by running three iterations of PSI-BLAST against the non-redundant sequence database UniRef90 [20] with an E-value cutoff of 0.001.
The PSSM was then used as the input for the PSIPRED [21] tool to predict the secondary structure of this query. We then modified the SSEARCH [18] search method, using the S2A2 matrix and the PSSM as the scoring matrices, to identify the template(s) from the protein structure library, and to generate the target-template alignment(s). The library consists of 20,982 non-redundant structures (April, 2008) selected from the Protein Data Bank (PDB) [22]. The secondary structures of each structure in the library were assigned using DSSP [23]. Based on various target-template alignments of the five top-ranking templates, (PS)2-v2 generates 30 protein structures using MODELLER [24]. Finally, the program ProQ was used to evaluate these models and to select the final model for the target. The S2A2 matrix, the alignment method, the modeling process and the final model selection are described in the following subsections. The components of the (PS)2-v2 server were built using C, Perl and PHP (Additional file 1). The framework of the (PS)2-v2 server for protein structure prediction. Overview of the (PS)2-v2 server. The protein sequence of telomere replication protein Est3 (UniProt Q03096) in Saccharomyces cerevisiae was used as the query. (A) Input format of the (PS)2-v2 server. (B) Search results of a query protein, comprising target name, sequence, predicted secondary structure, the graph of the aligned regions and the hits list of the templates of the query. (C) The selected template, target-template alignment and predicted structure of Est3. (D) The visualization of the predicted structure for Est3. (E) The model quality assessment. S2A2 matrix A substitution matrix is the key component of protein sequence alignment methods. We developed the S2A2 substitution matrix (Figure 3 and Figure S1 in Additional file 2) applying a general mathematical structure [25].
To calculate the S2A2, 674 structural pairs (1,348 proteins) [26], which are structurally similar and have low sequence identity, were selected from SCOP 1.65 [27] based on two criteria: 1) the root-mean-square deviation (rmsd) of a protein pair was less than 3.5 Å, with more than 70% of aligned residues included in the rmsd calculation, and 2) the sequence identity of a pair was less than 40%. The selected protein pairs had an average sequence identity of 26%, an average rmsd of 2.3 Å and an average of 90% aligned residues (207,492 aligned residues out of 230,915 residues). The program DSSP was used to assign the secondary structure for each residue of these 674 structural pairs. The eight types of the secondary structure used in DSSP were reduced to three commonly accepted types (H (helix), E (strand) and C (coil)) according to the following scheme: (H, G, I) → H; (E, B) → E; (T, S, blank) → C. The 20 amino acid types and 3 secondary structure types were converted into 60 residue-structure (RS) types. The S2A2 substitution matrix. The scores are high if the residue-structure (RS) letters with similar residue types and the same secondary structure are aligned (red blocks). When two identical RS letters (e.g. diagonal entries) are aligned, the substitution scores are very high. In contrast, the scores are low when helix letters are aligned with strand letters (blue blocks). The S2A2 matrix (60 × 60) reveals substitution preferences between homologs with low sequence identity, and was developed in a similar way to BLOSUM62 [25] based on these 674 structural pairs. The entry $S_{ij}$ of the S2A2 matrix, which is the substitution score for aligning an RS letter pair i, j (1 ≤ i, j ≤ 60), is defined as $S_{ij} = \lambda \log_2(q_{ij}/e_{ij})$, where $\lambda$ is a scale factor, and $q_{ij}$ and $e_{ij}$ are the observed and expected probabilities, respectively, of the occurrence of each i, j pair.
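The residue-structure encoding and the log-odds score above can be illustrated with a short Python sketch. It is not the server's code (the server components are written in C, Perl and PHP). It assumes BLOSUM-style estimates for the observed and expected probabilities, and the function names, the toy input and the default scale factor `lam=2.0` are hypothetical choices made only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# DSSP 8-state to 3-state reduction quoted in the text:
# (H, G, I) -> H; (E, B) -> E; (T, S, blank) -> C
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H",
             "E": "E", "B": "E",
             "T": "C", "S": "C", " ": "C"}


def rs_letters(sequence, dssp):
    """Pair each residue with its reduced secondary-structure state."""
    return [(aa, DSSP_TO_3[ss]) for aa, ss in zip(sequence, dssp)]


def log_odds_matrix(aligned_pairs, lam=2.0):
    """Return S_ij = lam * log2(q_ij / e_ij) for every observed RS-letter pair.

    aligned_pairs: iterable of (rs_letter, rs_letter) taken from structural
    alignments. lam is a hypothetical scale factor (assumption).
    """
    f = Counter()
    for a, b in aligned_pairs:
        f[tuple(sorted((a, b)))] += 1          # unordered pair counts f_ij
    total = sum(f.values())
    q = {pair: n / total for pair, n in f.items()}   # observed probabilities

    p = Counter()                                    # background probabilities
    for (a, b), q_ab in q.items():
        if a == b:
            p[a] += q_ab
        else:
            p[a] += q_ab / 2
            p[b] += q_ab / 2

    scores = {}
    for (a, b), q_ab in q.items():
        e_ab = p[a] * p[b] if a == b else 2 * p[a] * p[b]   # expected
        scores[(a, b)] = lam * math.log2(q_ab / e_ab)
    return scores
```

With 20 amino acids and 3 states the alphabet has 60 RS letters, so a full matrix built this way is 60 × 60. Pairs that never occur in the input get no entry here; a production matrix would need a rule for such cells, which the text does not describe.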
The observed probability is given by $q_{ij} = f_{ij} / \sum_{k=1}^{60} \sum_{l=1}^{k} f_{kl}$, where $f_{ij}$ is the total number of aligned i, j pairs in these 207,492 RS letters. The expected probability is $e_{ij} = p_i p_j$ if i = j; otherwise, $e_{ij} = 2 p_i p_j$ (if i ≠ j), where $p_i$ is the background probability of occurrence of the letter i, and equals $p_i = q_{ii} + \sum_{j \neq i} q_{ij}/2$. The substitution score is greater than zero ($S_{ij} > 0$) if the observed probability is greater than the expected probability. By contrast, $S_{ij} < 0$ if $q_{ij} < e_{ij}$. [...] $P(S > x) = 1 - \exp(-Kmn \exp(-\lambda x))$, where m and n are the lengths of the query and library sequences. This expression shows that the average score for an unrelated library sequence increases with the logarithm of the length of the library sequence. SSEARCH uses simple linear regression against the log of the library sequence length to calculate a normalized "z-score" with mean 50, regardless of library sequence length, and variance 10. These z-scores can then be used with the extreme value distribution and the Poisson distribution to calculate the number of library sequences expected to obtain a score greater than or equal to the score obtained in the search (i.e. the E-value). The five top-ranking templates with the lowest E-values were used as the templates if their E-values were < 0.1. For each structure in the five top-ranking templates, the (PS)2-v2 server generated six alternative target-template alignments by using different S2A2-matrix (wS2A2) weights, including 0, 0.2, 0.4, 0.64, 0.8 and 1.0. This yielded 30 target-template alignments for a target protein. Model building and evaluation Protein structure models were built using the homology modeling tool MODELLER [24] according to the selected template(s) and target-template alignment(s). The ability to discriminate a correct protein model from incorrect models is critical when a server uses multiple-model methods. Here, we used the program ProQ [19] to assess the quality of protein models based on the LGscore [30], and a model was considered correct if the LGscore was greater than 1.5 [19].
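The template filter and the weight grid described above (at most five templates with E-value < 0.1, six wS2A2 weights each) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the `(template_id, evalue)` input format are hypothetical, and how the weight enters the alignment score is not modeled here.

```python
from itertools import product

# The six S2A2-matrix weights listed in the text
S2A2_WEIGHTS = (0.0, 0.2, 0.4, 0.64, 0.8, 1.0)


def select_templates(hits, max_templates=5, evalue_cutoff=0.1):
    """Keep the lowest-E-value hits that pass the cutoff.

    hits: iterable of (template_id, evalue) tuples (hypothetical format).
    """
    kept = sorted((h for h in hits if h[1] < evalue_cutoff),
                  key=lambda h: h[1])
    return [template_id for template_id, _ in kept[:max_templates]]


def alignment_jobs(hits):
    """Enumerate every (template, wS2A2) combination to be aligned."""
    return list(product(select_templates(hits), S2A2_WEIGHTS))
```

When five templates pass the cutoff, `alignment_jobs` returns 5 × 6 = 30 combinations, matching the 30 target-template alignments mentioned in the text; with fewer qualifying templates the list is correspondingly shorter.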
The (PS)2-v2 server first selected the protein model generated by the first-ranked template with wS2A2 = 0.64 as the seed model. The LGscore of the seed model was then compared with those of the other models based on the five top-ranked templates with different wS2A2 weights. A model was chosen as the final one if it had the highest LGscore and its LGscore (> 0.7) was significantly better than that of the seed model. Otherwise, the server selected the seed model as the final model. Multiple-template method (PS)2-v2 considered a target as a multiple-domain protein if any region with >40 residues has non-aligned residues to the template(s) when using the above "model building and evaluation" steps. For a multiple-domain protein, (PS)2-v2 automatically decided domain boundaries based on the borders of the large gaps between the target and the template(s), and repeatedly executed the above steps to model the structures of the non-aligned residues (Figure 1). Finally, these multiple models were used as structure templates to generate the full-length final model for the query protein. Utility Input format The (PS)2-v2 server is an easy-to-use web server (Figure 2). Users input the query protein sequence in FASTA format. The server provides three modes (Automatic, Manual and 'Use this template') for choosing template(s) (Figure 2A). The default mode is 'Automatic'. In this mode, (PS)2-v2 automatically selects the modeling template(s). For the 'Manual' mode, our server enables users to assign specific template(s) from a list of candidates (Figure 2B). The 'Use this template' mode allows users to assign a specific protein structure as the template. Finally, (PS)2-v2 sends the predicted results to the users by email. Output format The (PS)2-v2 server typically yields a predicted structure within 7 minutes if the query sequence length is ~200.
The server shows a list of templates, selected template(s), target-template alignment(s), predicted structure(s) and structure evaluations (Figures 2B and 2C). The predicted structures are visualized in PNG format generated by the MolScript [31] and Raster3D [32] packages. If the user clicks a PNG picture, then the corresponding protein 3D structure is also displayed in AstexViewer [33] (Figure 2D). A user can download the predicted structure coordinates in the PDB format. The server also provides the target-template alignments and the structure quality factors (Figure 2E). Modeling of ever shorter telomeres 3 The ever shorter telomeres 3 (Est3, UniProt Q03096), which is essential for telomere replication in vivo, is a small regulatory subunit of telomerase from Saccharomyces cerevisiae. According to structure prediction combined with in vivo characterization, it has been reported that Est3 consists of a predicted OB-fold (oligosaccharide/oligonucleotide binding) that is structurally similar to the OB-fold of the human Tpp1 protein [34]. Because of the limited degree of conservation between these two protein families, the relationship between these two proteins could not be recognized by simple sequence profile methods. Additionally, the original (PS)2 server could not recognize it. For the target Est3, the (PS)2-v2 server selected the OB-fold domain of the Tpp1 protein (PDB code 2i46) from Homo sapiens as the template [35], with an E-value of 0.014. This template shared only 17.6% sequence identity with the query sequence. Figure 2C shows the target-template alignment. The server successfully recognized Tpp1 as the template since the secondary structure identity between the template and Est3 was 66.7%. Our method could align three conserved residues (i.e. Trp21/Trp98, Asp86/Asp148 and Leu155/Leu204, in Est3 versus Tpp1; green blocks in Figure 2C), which are primarily involved in protein folding and/or stability of the OB-fold.
Seven amino acid positions (yellow blocks in Figure 2C), which are structurally similar between the two protein families, were also aligned. These 10 aligned residues, depicted in cyan, are clustered in the interior of the core of the OB-fold (Figure 2D). Results and Discussion In template-based protein structure prediction, the template selection and the target-template alignment are the two critical steps, since they significantly affect the quality of the final model. The template selections and the sequence alignments of the proposed method with the S2A2 matrix were evaluated by the Lindahl benchmark [14] and the ProSup benchmark [15], respectively. In general, it is neither straightforward nor completely fair to compare the results of different fold-recognition and alignment methods given that each employs different sequence databases for sequence profiles, structure databases for structure profiles and properties, release dates, and scoring functions. Therefore, the comparisons between our methods and other published methods serve as an approximate guide. Here, we evaluated the S2A2 matrix, PSI-BLAST and prof_sim using the same sequence database, UniRef90 [20], with the same parameters to generate a PSSM for fold recognition (Lindahl benchmark) and sequence alignment (ProSup benchmark). Furthermore, (PS)2-v2 was assessed and compared with 71 other automatic servers on 154 TBM targets in CASP8. Please note that (PS)2-v2 did not participate in the CASP8 experiment. Evaluation of S2A2 matrix The S2A2 matrix (60 × 60) offers insights about substitution preferences of RS letters between homologous protein sequences (Figure 3 and Figure S1 in Additional file 2). The highest substitution score in this matrix is for the alignment of an RS letter 'Wß' with an RS letter 'Wß', where Wß is the residue Trp with the ß-strand structure (Figure S1 in Additional file 2). This substitution score is 6.2.
In addition, the substitution scores are also high when two identical structural letters (e.g., diagonal entries) are aligned. For example, the alignment scores are 5.6 and 6.1 when 'Wα' and 'Cα' are aligned with 'Wα' and 'Cα', respectively, where Wα is the residue Trp with the α-helix structure and Cα represents the residue Cys with the α-helix structure. Most of the substitution scores are positive if two RS letters in the same secondary structure are aligned. On the other hand, the lowest substitution score in the S2A2 matrix is -7.8. All of the substitution scores are low when the helix RS letters are aligned with the strand RS letters. The above relationships are in good agreement with biological functions of the relevant structures, showing that the S2A2 matrix embodies conventional knowledge about secondary structure conservation in proteins. We compared the S2A2 matrix with BLOSUM62. The highest substitution scores are 6.2 (S2A2) and 11 (BLOSUM62). In contrast, the lowest score for S2A2 (-7.8) is much lower than that for BLOSUM62 (-4). The main reasons for this large difference are that α-helices and ß-strands constitute very different protein secondary structures, and the RS letters pertaining to these two types of structure are more conserved than amino acid sequences. These results suggest that the RS letters with the S2A2 matrix may be able to find remote homologous sequences more accurately than simple amino acid sequence analyses. Template selection For the template selection, our method with the S2A2 matrix was compared to other methods on the Lindahl benchmark [14], which consists of 976 proteins, for fold recognition. This set included 555, 434 and 321 assignments for the family, superfamily and fold levels, respectively. The S2A2 matrix outperforms PSI-BLAST and is comparable to other methods on this set (Table 2).
Our method (S2A2+PSSM), incorporating PSSM into S2A2, is the best for detecting similarity on the superfamily and fold levels for the top five ranks among the 10 compared methods. At the superfamily level, the S2A2+PSSM, PSI-BLAST and prof_sim [36] identified 75.6%, 49.1% and 61.3% of assignments, respectively. At the fold level, the S2A2+PSSM (54.5%) outperformed PSI-BLAST (14.6%) and prof_sim (39.6%) in identifying homologous pairs. Comparing S2A2 matrix with other methods for fold recognition on the Lindahl benchmark Target-template alignment For the alignment between the target and the template, our algorithm was evaluated based on the ProSup benchmark [15], which consists of 127 protein pairs with significant structural similarity but with sequence identity of no more than 30%. The total numbers of correctly aligned residue pairs (Tc) of the S2A2, S2A2+PSSM, prof_sim and SSALN [10] were 8732, 9470, 8009 and 9256 pairs, respectively (Table 3). The percentage σ0 (average percentage of correctly aligned residues, divided by the length of the structural alignment per protein pair) of the S2A2, S2A2+PSSM, PSI-BLAST, prof_sim and SSALN were 53.4%, 58.7%, 36.4%, 43.6% and 58.3%, respectively. The S2A2 matrix is significantly better than sequence-based approaches, including FASTA, PSI-BLAST and prof_sim. The S2A2+PSSM achieved the highest alignment accuracy, slightly better than SPARKS [9] and SSALN, and much better than the other compared methods. Comparing S2A2 matrix with other methods for sequence alignment accuracies on the ProSup benchmark CASP8 structure prediction Our previous server ((PS)2-CASP8) and 70 other servers participated in the CASP8 competition, involving 121 targets for tertiary structure prediction. These 121 targets are officially classified into 154 TBM domains (Table S1 in Additional file 3).
The accuracies of these 71 servers were evaluated based on the GDT_TS [37] scores directly summarized from the CASP8 website http://predictioncenter.org/casp8/. The (PS)2-v2, (PS)2-original and (PS)2-CASP8 servers were evaluated on these 154 TBM targets (Figure 4, Table 4 and Table S2 in Additional file 4). The sums of GDT_TS scores were 10331.4 ((PS)2-v2), 9954.4 ((PS)2-CASP8) and 9447.5 ((PS)2-original), respectively. (PS)2-v2 yielded 99 and 34 higher GDT_TS scores than (PS)2-original and (PS)2-CASP8, respectively, among 154 targets. When the sequence identity between the target and template was more than 30%, these three servers achieved similar GDT_TS scores. However, if the sequence identity was less than 20%, the (PS)2-v2 server was significantly better than the (PS)2-original server (p-value is 4.0E-7) and (PS)2-CASP8 (p-value is 6.6E-4) using the paired Student's t-test (Table 4). For each target in CASP8, Table S2 (in Additional file 4) shows the GDT_TS score improvement with contributing components (i.e. multiple templates, multiple models, and template search method) between (PS)2-v2 and our previous servers. Comparison of the (PS)2-v2 server with (A) (PS)2-original and (B) (PS)2-CASP8 servers on the 154 TBM targets in CASP8. (PS)2-v2 yields 99 and 34 higher GDT_TS scores than (PS)2-original and (PS)2-CASP8, respectively, among these 154 targets. These three servers have similar GDT_TS scores when the sequence identity (SI) between the target and template is more than 30% (blue +). (PS)2-v2 outperforms our previous servers when SI is less than 20% (green ×). Comparison of the (PS)2-v2 server with (PS)2-original and (PS)2-CASP8 servers on the 154 TBM targets in CASP8 based on GDT_TS scores These 154 TBM targets were also used to evaluate the automatic servers participating in CASP8.
For the template selection, the accuracy of identifying the best template of the target protein was used to evaluate the performance of these servers (Figure S2 in Additional file 5). The accuracies of the (PS)2-v2 server were 54.1% and 75.0% for identifying the Top 1 templates and Top 10 templates, respectively. In addition, (PS)2-v2 ranked 6th among these 72 servers based on GDT_TS scores (Table 5). This server is often able to yield reliable predicted structures (i.e. GDT_TS score = 60%) if the E-value is less than 10^-2 (Figure S3 in Additional file 6). Comparing (PS)2-v2 with 71 automatic servers on 154 targets in CASP8 The five top-ranked servers (Zhang-Server, RAPTOR, pro-sp3-TASSER, Phyre_de_novo and BAKER-ROBETTA) are better than (PS)2-v2 on 40 hard targets (i.e., LGA_S score < 70%) (Table S3 in Additional file 7). These servers were much slower than (PS)2-v2 because they often used ab initio methods to build the unaligned loop regions and to generate the models, such as the Poing folding system for the Phyre_de_novo server, chunk-TASSER [38] for the pro-sp3-TASSER server, and the Rosetta fragment-assembly methodology [39] for the BAKER-ROBETTA server. In the near future, our (PS)2-v2 server will incorporate ab initio methods to model long loops and hard targets. Multiple templates for multiple domains We used the target T0504 as an example to describe how (PS)2-v2 selects multiple templates to model protein structures (Figure 5). The (PS)2-v2 server first selected the 53BP1 tandem tudor domains (PDB code 2g3r) as the best template. The template 2g3rA aligned with only part of the target (138 residues, residues 10-147), and the model yielded GDT_TS scores of 74.2 and 32.2 for the targets T0504-D1 and T0504-D2. Since the number of unaligned residues is 61 (residues 148-208), the (PS)2-v2 server used the unaligned residues to search for a new template for modeling this segment.
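The multiple-domain test used here (a run of more than 40 residues left unaligned to the template) can be illustrated with a small Python sketch. The function name and the input format are hypothetical, and the target length of 208 used in the usage note is an assumption inferred from the residue range quoted above. The server's actual boundary detection may differ.

```python
def unaligned_segments(target_length, aligned_ranges, min_residues=40):
    """Return 1-based inclusive (start, end) runs of unaligned residues
    that are longer than min_residues.

    aligned_ranges: list of 1-based inclusive (start, end) target ranges
    covered by the template alignment (hypothetical input format).
    """
    covered = [False] * (target_length + 1)      # index 0 is unused
    for start, end in aligned_ranges:
        for pos in range(start, end + 1):
            covered[pos] = True

    segments = []
    run_start = None
    # Scan one position past the end so a trailing run is closed as well.
    for pos in range(1, target_length + 2):
        free = pos <= target_length and not covered[pos]
        if free and run_start is None:
            run_start = pos
        elif not free and run_start is not None:
            if pos - run_start > min_residues:
                segments.append((run_start, pos - 1))
            run_start = None
    return segments
```

For the T0504 case, calling `unaligned_segments(208, [(10, 147)])` reports the single segment spanning residues 148-208 (61 residues); the nine unaligned N-terminal residues fall below the 40-residue threshold and are ignored.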
After searching the template library, (PS)2-v2 selected the PHD finger protein 20-like 1 (PDB code 2eqm) as the template for modeling these unmodeled residues (T0504-D3). The GDT_TS score of this model is 80.7 for the target T0504-D3. The total GDT_TS score improvement is 136.42 when (PS)2-v2 uses a multiple-template strategy. Conversely, the GDT_TS scores of the (PS)2-original server, using PDB code 2g3r as the template, are 17.3 (T0504-D1), 48.9 (T0504-D2) and 56.1 (T0504-D3), respectively. For the (PS)2-CASP8 server, the GDT_TS scores using PDB code 2ns2 as the template are 44.4 (T0504-D1), 25.6 (T0504-D2) and 41.0 (T0504-D3), respectively. Comparison of the (PS)2-v2 server with (PS)2-original and (PS)2-CASP8 servers on the target T0504 in CASP8. The (PS)2-CASP8 server uses human spindlin1 (PDB code 2ns2) as the template; conversely, (PS)2-v2 uses a multiple-template strategy and selects both 53BP1 tandem tudor domains (PDB code 2g3r) and PHD finger protein 20-like 1 (PDB code 2eqm) as templates. (PS)2-v2 significantly outperforms (PS)2-CASP8 on the T0504-D1 and T0504-D3 domains. Multiple models and model selection Figure 6 shows the improvement in GDT_TS scores of (PS)2-v2 by applying a multiple-model strategy and using the program ProQ for the final model selection. Among these 154 CASP8 targets, (PS)2-v2 improved GDT_TS scores for 23 targets; conversely, only 4 targets are slightly worse when (PS)2-v2 used a multiple-model strategy. For the other 127 targets, (PS)2-v2 obtained the same GDT_TS scores, and the total GDT_TS improvement is 145.3. According to the paired Student's t-test (p-value is 0.0045, shown in Table S4 in Additional file 8), (PS)2-v2 applying the multiple-model strategy significantly improved the GDT_TS scores when the sequence identity between the target and the template is less than 20%. (PS)2-v2 results for using single-model and multiple-model strategies on 154 targets in CASP8 based on GDT_TS scores.
(PS)2-v2 improves and decreases the GDT_TS scores for 23 and 4 targets, respectively, when the multiple-model method is used. For the other 127 targets, (PS)2-v2 obtains the same GDT_TS scores. The symbols "+", "▫" and "×" represent the performance when the sequence identity (SI) is ≥ 30%, between 30% and 20%, and less than 20%, respectively. The target T0471 selected from CASP8 was taken as an example to describe the structure modeling of the (PS)2-v2 server using the multiple-model strategy (Figure 7). When the multiple-model strategy was not considered, (PS)2-v2 selected the 2-dehydro-3-deoxyphosphooctonate aldolase (PDB code 2nwr) as the best template with an E-value of 0.055. The GDT_TS score of this model is 32.67. When we considered the five top-ranking structures (PDB codes 2nwr, 1pea, 1nv8, 1ufr and 1v2d) as the modeling templates, (PS)2-v2 generated 6 alternative target-template alignments for each template, and obtained 30 alignments for this target. The software MODELLER was then applied to generate 30 structures for these 30 target-template alignments. Figure 7 shows, for each template, the best model with the highest LGscore as assessed by the program ProQ. The model generated by the template 1nv8A was selected as the final model, because it had the best LGscore (2.838) among these 30 models. The GDT_TS score of this final model is 61.65. The (PS)2-v2 server using multiple models is often able to effectively improve accuracies when the E-value between the target and the template is more than 0.01. The average GDT_TS improvements are 8.53 and 2.23, respectively, when the E-value ≥ 0.01 and E-value ≤ 1e-6. (PS)2-v2 models the target T0471 in CASP8 using multiple models. This server models T0471 by selecting the five top-ranking structures (PDB codes 2nwrA, 1peaA, 1nv8A, 1ufrA and 1v2dA) as templates using the S2A2 matrix and PSSM scoring matrices.
For each template, (PS)2-v2 generates 5 structures and (D) the final model (1nv8) is identified by the program ProQ based on LGscore. T0409 in CASP8 The target T0409 selected from CASP8 was taken to describe the structure modeling of the (PS)2-v2 server (Figure 8). The target is the BIG_1156.2 domain of putative penicillin-binding protein MrcA from Nitrosomonas europaea ATCC 19718. This server yielded the best GDT_TS score (77.8) among all participating servers for this target. An example of the prediction results of the target T0409 from the (PS)2-v2 server. The alignment and predicted structure of the BIG_1156.2 domain of putative penicillin-binding protein MrcA from Nitrosomonas europaea ATCC 19718 using the (PS)2-v2 server. (A) The alignment between the query and the selected template, translation initiation factor 5A protein (PDB code 1bkbA), from Pyrobaculum aerophilum. (B) The superposition of the native structure of T0409 (broad, PDB code 3d0f) and the predicted structure (thin). The green blocks are the regions where the predicted structure matches the native structure. The yellow and purple blocks indicate the shift errors between the predicted structure and the native structure, where the Cα distances between them are <5 Å and >5 Å, respectively. For the target T0409, the (PS)2-v2 server selected the C-terminal domain of translation initiation factor 5A protein (PDB code 1bkb) from Pyrobaculum aerophilum as the template [40]. The C-terminal domain is found to be homologous to the cold-shock protein CspA of E. coli, which has a well characterized RNA-binding fold. The best template reported on the CASP8 website is the yeast exosome core, Rrp44 (PDB code 2vnvD) [41], which contains four domains (CSD1, CSD2, RNB and S1). The S1 domain has the most similar structure to the target T0409-D1. The S1 domain also has a common OB fold characteristic of RNA-binding proteins, with five anti-parallel ß strands.
Figure 8A shows the target-template alignment; the template shares 17.0% sequence identity with the query sequence. Our server could align the five anti-parallel ß strands together. Figure 8B shows the superposition of the predicted structure (thin) and the X-ray structure (broad) of the target T0409. Conclusion This study presents an automatic server for protein structure prediction, obtained by applying numerous enhancements and modifications to the original technique, thereby improving its reliability and applicability. By integrating the S2A2 and PSSM matrices, the (PS)2-v2 server blends the amino acid and structural propensities so that they work cooperatively for the template selection and target-template alignments. In addition, (PS)2-v2 uses multiple templates and multiple models for building and assessing models. Experimental results demonstrate that the (PS)2-v2 server is efficient and effective for template selections and target-template alignments in template-based modeling. We believe that this server is useful in protein structure prediction and modeling, especially in detecting homologous templates with sequence similarity in the twilight zone. Availability and requirements Project home page: http://ps2v2.life.nctu.edu.tw Operating system(s): Platform independent Programming language: C, Perl and PHP Other requirements: JavaScript-enabled web browser Any restrictions to use by non-academics: None Competing interests The authors declare that they have no competing interests.