Monitoring treatment and early detection of fatal breast cancer (BC) remains a major unmet need. Aberrant circulating DNA methylation (DNAme) patterns are likely to provide a highly specific cancer signal. We hypothesized that cell-free DNAme markers could indicate disseminated breast cancer, even in the presence of substantial quantities of background DNA.
We used reduced representation bisulfite sequencing (RRBS) of 31 tissues and established serum assays based on ultra-high coverage bisulfite sequencing in two independent prospective serum sets (n = 110). The clinical use of one specific region, EFC#93, was validated in 419 patients (in both pre- and post-adjuvant chemotherapy samples) from SUCCESS (Simultaneous Study of Gemcitabine-Docetaxel Combination adjuvant treatment, as well as Extended Bisphosphonate and Surveillance-Trial) and 925 women (pre-diagnosis) from the UKCTOCS (UK Collaborative Trial of Ovarian Cancer Screening) population cohort, with overall survival and occurrence of incident breast cancer (which will or will not lead to death), respectively, as primary endpoints.
A total of 18 BC specific DNAme patterns were discovered in tissue, of which the top six were further tested in serum. The best candidate, EFC#93, was validated for clinical use. EFC#93 was an independent poor prognostic marker in pre-chemotherapy samples (hazard ratio [HR] for death = 7.689) and superior to circulating tumor cells (CTCs) (HR for death = 5.681). More than 70% of patients with both CTCs and EFC#93 serum DNAme positivity in their pre-chemotherapy samples relapsed within five years. EFC#93-positive disseminated disease in post-chemotherapy samples seems to respond to anti-hormonal treatment. The presence of EFC#93 serum DNAme identified 42.9% and 25% of women who were diagnosed with a fatal BC within 3–6 and 6–12 months of sample donation, respectively, with a specificity of 88%. The sensitivity with respect to detecting fatal BC was ~ 4-fold higher compared to non-fatal BC.
Detection of EFC#93 serum DNAme patterns offers a new tool for early diagnosis and management of disseminated breast cancers. Clinical trials are required to assess whether EFC#93-positive women without radiologically detectable breast cancer will benefit from anti-hormonal treatment before the breast lesions become clinically apparent.
Breast cancer (BC) is by far the most frequently occurring cancer in women. Every year 522,000 women die from BC.
Mammography is used as a screening tool for early diagnosis but has its limitations due to over-diagnosis and a modest impact on mortality. Recent evidence demonstrates that dissemination might occur during the very early stages of tumor evolution and before clinical manifestation of the cancer in the breast. The analyses of circulating markers in order to identify women with disseminated disease before diagnosis have not been successful.
Numerous studies have demonstrated that patients with disseminated tumor cells in the bone marrow [5, 6, 7] or circulating tumor cells (CTCs) [8, 9, 10, 11, 12] have an inferior prognosis. The immunocytochemical detection of CTCs is reliant upon the isolation of intact cells.
Adjuvant systemic treatment has reduced BC mortality over the last two to three decades. The current strategy guiding administration of adjuvant systemic treatment is reliant upon primary tumor characteristics. However, systemic relapse and subsequent death are caused by disseminated disease whose biological properties may be very different to those comprising the primary tumor.
Recently, markers based on DNA shed from tumor cells have shown great promise in monitoring treatment response and predicting prognosis [15, 16, 17, 18, 19]. However, efforts to characterize the cancer genome have shown that only a few genes are frequently mutated in cancer and the site of mutation per gene differs across tumors. A further limitation is that current technology only allows for the detection of a mutant allele fraction of 0.1% [15, 21].
Over the last decade, DNA methylation (DNAme) has been shown to be a hallmark of cancer and occurs very early in BC development. DNAme is centered around specific regions (CpG islands) and is chemically and biologically stable. This enables the development of early detection tools and personalized treatment, based upon the analysis of cell-free DNA contained within serum or plasma [24, 25, 26, 27, 28, 29]. However, two major challenges have to be overcome: (1) the very low abundance of cancer-DNA in the blood; and (2) the high level of “background DNA” shed from white blood cells (WBC) in banked samples.
To date, virtually all research work has been carried out in relatively small studies and focused on the analyses of cell-free DNAme in metastatic/relapsed breast cancers using markers from previously published studies. In our study we: (1) used an epigenome-wide approach to identify new markers which indicate disseminated breast cancer; (2) analyzed the top marker in 419 primary non-metastatic patients before (i.e. immediately after resection of the primary breast cancer) and after adjuvant chemotherapy; and, most importantly (3) analyzed the marker in 925 healthy women who either remained healthy or developed fatal or non-fatal BC within the first three years after serum sample donation.
Patients and sample collection
We used a total of 31 tissues and 1869 serum samples (Fig. 1). In Phase 1, we analyzed breast cancer tissue and WBCs in order to identify breast cancer-specific DNAme markers. In Phase 2, we established serum DNAme assays using serum sets 1 and 2, collected from women attending hospitals in London, Munich, and Prague who were invited to participate and gave consent. Blood samples (20–40 mL) were obtained (in VACUETTE® Z Serum Sep Clot Activator tubes) and centrifuged at 3000 rpm for 10 min, and the serum was collected and stored at −80 °C. Finally, Phase 3 was initiated to validate the performance of the top marker using serum samples from two large clinical studies: (1) from 419 patients recruited within the SUCCESS trial (ClinicalTrials.gov registration ID NCT02181101), where blood samples were taken before and after chemotherapy and sent (within 96 h) to the laboratory for CTC assessment and serum storage (Additional file 1: Figure S1); and (2) from UKCTOCS (ClinicalTrials.gov registration ID NCT00058032), where serum samples were used from: (i) 229 women who were diagnosed with BC within the first three years after serum sample donation and subsequently died during follow-up; (ii) 231 matched women who developed BC within three years after sample donation and were alive at the end of follow-up; and (iii) 465 women who did not develop BC within five years after sample donation (Additional file 1: Figure S2). Blood samples from all UKCTOCS volunteers were spun down for serum separation after having been transported at room temperature from trial centers to the central laboratory. The median time between sample collection and centrifugation was 22.1 h. Only 1 mL of serum per UKCTOCS volunteer was available. All patients provided written informed consent.
Isolation and bisulfite modification of DNA
DNA was isolated from tissue and serum samples at GATC Biotech (Konstanz, Germany). Tissue DNA was quantified using NanoDrop™ and Qubit™, and the size was assessed by agarose gel electrophoresis. Serum DNA was quantified using the Agilent Fragment Analyzer and the High Sensitivity Large Fragment Analysis Kit (AATI, USA). DNA was bisulfite converted at GATC Biotech.
DNAme analysis in tissue
Genome-wide methylation analysis was performed by reduced representation bisulfite sequencing (RRBS) at GATC Biotech. DNA was digested with MspI followed by size selection of the library, providing enhanced coverage for the CpG-rich regions [33, 34]. The digested DNA was adapter-ligated, bisulfite-modified, and polymerase chain reaction (PCR)-amplified. The libraries were sequenced on Illumina’s HiSeq 2500. Analysis of the first samples sequenced with a 100-bp paired-end mode showed that the library insert size was small. Therefore, the remaining samples were sequenced with a 50-bp paired-end mode. Using Genedata Expressionist® for Genomic Profiling v9.1, we established a bioinformatics pipeline for the detection of cancer specific differentially methylated regions (DMRs). The most promising DMRs were taken forward for the development and validation of serum-based clinical assays.
Targeted ultra-high coverage bisulfite sequencing of serum DNA
Targeted bisulfite sequencing libraries were prepared at GATC Biotech. Bisulfite modification was performed with 1 mL serum equivalent. A two-step PCR approach was used to test up to three different markers per modified DNA sample. The first PCR amplifies the target region and adds linker sequences which are used in the second PCR to add barcodes for multiplexing and sequences needed for sequencing. Ultra-high coverage sequencing was performed on Illumina’s MiSeq or HiSeq 2500 with a 75-bp or 125-bp paired-end mode, respectively.
Genedata Expressionist® for Genomic Profiling was used to map reads to human genome version hg19, identify regions with tumor-specific methylation patterns, quantify the occurrence of those patterns, and calculate relative pattern frequencies per sample. Pattern frequencies were calculated as number of reads containing the pattern divided by total reads covering the pattern region. Methylation patterns are represented in terms of a binary string, where the methylation state of each CpG site is denoted by “1” if methylated or “0” if unmethylated. The algorithm that we developed scans the whole genome and identifies regions that contain at least ten aligned paired-end reads. These read bundles are split into smaller regions of interest (ROIs) which contain at least 4 CpGs in a stretch of < 150 bp. For each region and tissue/sample, the absolute frequency (number of supporting reads) for all observed methylation patterns was determined (Fig. 2a). This led to the discovery of tens of millions of patterns per tissue/sample. The patterns were filtered in a multi-step procedure to identify the methylation patterns specifically occurring in tumor samples. To increase the sensitivity and specificity of our pattern discovery procedure, we pooled reads from different tumor or WBC samples and scored patterns based on over-representation within tumor tissue. The results were summarized in a specificity score, Sp, which reflects the cancer specificity of the patterns. After applying a cut-off of Sp ≥ 10, 1.3 million patterns for BC remained and were further filtered according to the various criteria detailed in Fig. 2b (further details are provided in Additional file 2).
The 95% confidence intervals (CI) for sensitivity and specificity have been calculated according to the efficient-score method. The endpoints were defined according to the STEEP criteria, with relapse-free survival and overall survival as the primary endpoints.
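The pattern-frequency calculation described in this section (reads carrying a given pattern divided by all reads covering the region, with each read encoded as a binary string) can be illustrated with a minimal Python sketch. This is not the Genedata pipeline used in the study: the function names, the input format (one binary string per read), and the toy read counts are assumptions made for illustration, and treating a frequency equal to the threshold as positive is likewise an assumption.

```python
from collections import Counter


def pattern_frequencies(reads):
    """Relative frequency of each methylation pattern in one region.

    `reads` is a list of binary strings, one per read covering the region:
    "1" marks a methylated CpG, "0" an unmethylated CpG (assumed format).
    """
    if not reads:
        return {}
    counts = Counter(reads)          # absolute frequency per pattern
    total = len(reads)               # all reads covering the region
    return {pattern: n / total for pattern, n in counts.items()}


def is_positive(reads, pattern, threshold):
    """True if the relative frequency of `pattern` reaches `threshold`.

    Using >= rather than > is an assumption; the text does not say which.
    """
    return pattern_frequencies(reads).get(pattern, 0.0) >= threshold


# Toy region with five linked CpGs (hypothetical counts):
# 1 fully methylated read among 1,000 reads.
reads = ["11111"] * 1 + ["00000"] * 900 + ["01000"] * 99
freqs = pattern_frequencies(reads)
call = is_positive(reads, "11111", threshold=0.0008)
```

Because the frequency is a ratio of read counts, the smallest non-zero frequency that can be observed is one divided by the number of reads covering the region, so a low threshold is only meaningful when coverage is correspondingly deep.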
The product-limit method according to Kaplan–Meier was used to estimate survival. The survival estimates in different groups were compared using the log-rank test. The Cox proportional hazards regression model was used for the analyses taking into account all variables simultaneously. Further details on samples and methods can be found in Additional file 2.
Results
The samples, techniques, and purpose of the three phases used in this study (marker discovery, assay development, and assay validation) are summarized in Fig. 1. We first identified DMRs based on their methylation patterns and frequencies in relevant genomic regions, within a BC tissue panel. Methylation patterns with high specificity for breast cancer tissue were identified using the procedure described in Fig. 2b. The 18 selected BC-specific patterns identified by RRBS were further validated using bisulfite sequencing. Thirty-one bisulfite sequencing primer pairs (1–3 per region) were designed and technically validated to determine PCR efficiency and sensitivity. A dilution series obtained by mixing fully unmethylated DNA (i.e. whole genome amplified DNA) with fully methylated DNA (i.e. whole genome amplified DNA treated with CpG methyltransferase) was used to select six reactions which showed good coverage after sequencing (> 10^4 reads) and sensitivity in highly diluted (< 1:10^4) samples (Additional file 3: Table S1). The best six reactions were taken into Phase 2, for further testing and assay development, in prospectively collected serum sets. We used ultra-deep bisulfite sequencing to develop assays for these candidate regions in 32 serum samples from Serum Set 1 (Figs. 1 and 2c).
Five of the six reactions showed good sensitivity and specificity (particularly when discriminating between metastatic and primary BC), based on the abundance of tumor-specific patterns (see Additional file 1: Figure S3 for a complete overview of pattern counts from region EFC#93) and were selected for further validation in Serum Set 2 (n = 78). DNA methylation marker EFC#93, which was identified in RRBS as a region of ten linked CpGs methylated in BC, was optimized to a pattern of five linked CpGs and showed the best sensitivity and specificity independently in Sets 1 and 2 (Additional file 1: Figure S4). A significantly higher pattern frequency for the optimized marker EFC#93 was observed in the metastatic BC groups compared to the healthy/benign lesions or primary BC groups, in both Sets 1 and 2. This translates to an area under the curve (AUC) of a receiver operating characteristics (ROC) curve of 0.850 (95% CI = 0.745–0.955, P = 0.000004) and 0.845 (95% CI = 0.739–0.952, P = 0.000004) to discriminate healthy/benign lesions or primary BC from metastatic BC in Set 1 and Set 2, respectively. When Set 1 and 2 data were combined, the pattern frequency threshold was set to 0.0008 (i.e. 8 in 10,000 reads demonstrated methylation at all CpGs in the EFC#93 region), which led to a sensitivity of 60.9% and a specificity of 92.0% with respect to identifying metastatic BC (Additional file 1: Figure S4).
EFC#93 was then validated for use as a prognostic and predictive BC marker in clinical trial samples (Fig. 1). As expected, due to delayed sample processing within these trials, serum samples from both SUCCESS and UKCTOCS contained high levels of contaminating WBC DNA, leading to dilution of the cancer signal (Additional file 1: Figure S5). In order to adjust for this, we made an a priori decision to reduce the threshold for EFC#93 pattern frequency by a factor of 10 to 0.00008 (i.e. 8 in 100,000 reads demonstrated methylation at all five linked CpGs within the EFC#93 region).
Table 1 shows SUCCESS patient characteristics, correlated with EFC#93 positivity/negativity, before and after chemotherapy. Using our predetermined threshold, EFC#93 positivity was significantly associated with CTC presence, both before and after chemotherapy (Chi-square test, P < 0.01, Table 1), although EFC#93 pattern frequencies were not significantly different in samples from patients with either no, 1–4, or > 4 CTCs detected, respectively (Additional file 1: Figure S6). Patients who underwent breast-conserving therapy were more likely to be EFC#93-negative compared to patients who underwent a mastectomy; this is in all probability explained by the fact that patients who presented with larger tumors tended to be EFC#93-positive and would not have been eligible for breast-conserving surgery. This is consistent with the finding that EFC#93 positivity after chemotherapy is significantly (P = 0.014) less frequently observed in early stage (T1) compared to late stage (T2–4) cancers. None of the other clinical–pathological features correlated with cell-free DNA methylation of EFC#93 (Table 1).
EFC#93 serum positivity before chemotherapy was a very strong marker of poor prognosis, for both relapse-free and overall survival (Table 2 and Fig. 3a and b). This was independent of the prognostic capability of CTCs (Additional file 1: Figures S7 and S8). Hazard ratios (HRs) (95% CI) for overall survival in the multivariable model were 5.973 (2.634–13.542) and 3.623 (1.681–7.812) for EFC#93 and CTCs, respectively (Table 2). Patients who were CTC-positive and EFC#93-positive had an extremely poor outcome, with > 70% of these patients relapsing within five years (Fig. 3c and d). Neither serum marker EFC#93 nor CTCs alone were predictive of the outcome in samples collected after chemotherapy (Additional file 1: Figures S9 and S10).
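The survival comparisons above rest on the product-limit (Kaplan–Meier) estimator named in the methods. The following is a minimal plain-Python sketch of that estimator, written only to show how censored follow-up enters the estimate. The function name and the toy follow-up data are hypothetical and are not taken from the SUCCESS data set, and the study's own analysis software is not stated here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed at that time,
             0 if the patient was censored
    Returns a list of (time, survival_probability), one entry per
    distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and event indicators.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the risk set up to their last follow-up time but never lower the curve themselves, which is why the estimate changes only at observed event times.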
To assess whether EFC#93 serum DNAme can diagnose women with poor prognostic BC earlier, we analyzed serum samples from 925 women from our UKCTOCS cohort. The amount of DNA as well as the fragment length was dramatically higher than expected and correlated with the average UK temperature (Additional file 1: Figures S11 and S12); there was also a good correlation between DNA amount and fragment length (Additional file 1: Figure S13) indicating a substantial leak of blood cell DNA into the serum during the blood transport. Within this nested case/control setting, the women with BC (cases) had provided serum samples up to three years before diagnosis. Again, we a priori hypothesized that the high background levels of DNA from lysed blood cells would impact on assay sensitivity, particularly in a pre-clinical setting where only traces of cancer DNA were expected in the circulation. We therefore split all samples into two groups: (1) low serum DNA amount; and (2) high serum DNA amount. In the “low DNA” group, we observed a significantly higher EFC#93 serum DNAme pattern frequency in the women who developed BC within one year after sample donation and subsequently died (Fig. 4a; cut-off threshold of 0.00008). Due to the high levels of background DNA, no significant findings were observed in the “high DNA” sample groups (Fig. 4b). In the “low DNA” group, EFC#93 DNAme was able to identify 43% of women 3–6 months and 25% of women 6–12 months before the diagnosis of a BC which eventually led to death, with a specificity of 88% (Fig. 4c). The sensitivity of serum EFC#93 methylation in detecting fatal BCs up to one year in advance of diagnosis was ~ 4-fold higher compared to non-fatal BCs (33.9% compared to 9.3%). In fact, the sensitivity for non-fatal BCs was within the false-positive range of the healthy samples, indicating that non-fatal BCs are not detected with this marker…..
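Sensitivity and specificity figures such as those above are binomial proportions, and the methods state that their 95% CIs were calculated with the efficient-score method. A minimal Python sketch of the Wilson score interval, which is the usual form of that method, is given below. It omits the continuity correction (whether the authors applied one is not stated), and the function names and the counts are hypothetical rather than taken from the study.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson (efficient-score) interval for a binomial proportion.

    No continuity correction is applied. z = 1.959964 gives a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity(true_pos, false_neg):
    """Fraction of cases that test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of controls that test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
sens = sensitivity(true_pos=50, false_neg=50)
sens_ci = wilson_interval(50, 100)
spec = specificity(true_neg=88, false_pos=12)
spec_ci = wilson_interval(88, 100)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves sensibly for small groups, which matters when a sensitivity is estimated from only a handful of cases in a narrow time window.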