Expanding the genetic basis of copy number variation in familial breast cancer

Introduction Familial breast cancer (fBC) is generally associated with an early age of diagnosis and a higher frequency of disease among family members. Over the past two decades a number of genes have been identified that are unequivocally associated with breast cancer (BC) risk but there remains a significant proportion of families that cannot be accounted for by these genes. Copy number variants (CNVs) are a form of genetic variation yet to be fully explored for their contribution to fBC. CNVs exert their effects either through whole or partial gene deletions or duplications or by interrupting epigenetic patterning, thereby contributing to disease development. CNV analysis can also be used to identify new genes and loci which may be associated with disease risk. Methods The Affymetrix Cytogenetic Whole Genome 2.7 M (Cyto2.7 M) arrays were used to detect regions of genomic re-arrangement in a cohort of 129 fBC BRCA1/BRCA2 mutation negative patients with a young age of diagnosis (<50 years) compared to 40 unaffected healthy controls (>55 years of age). Results CNV analysis revealed the presence of 275 unique rearrangements that were not present in the control population, suggestive of their involvement in BC risk. Several CNVs were found in genes that have been previously reported as BC susceptibility genes. These included CNVs in RPA3, NBN (NBS1), MRE11A and CYP19A1 in five unrelated fBC patients, suggesting that these genes are involved in BC initiation and/or progression. Of special interest was the identification of WWOX and FHIT rearrangements in three unrelated fBC patients. Conclusions This study has identified a number of CNVs that potentially contribute to BC initiation and/or progression. The identification of CNVs that are associated with known tumour suppressor genes is of special interest and warrants further, larger studies to understand their precise role in fBC.


Introduction
Global cancer statistics identify BC as the most frequently diagnosed cancer (23%) and leading cause of cancer related death (14%) in females [1]. Nearly 27% of these BCs occur in a familial setting, typically associated with an earlier age of disease diagnosis and a higher frequency among family members; this is termed fBC [2,3]. It is estimated that 5-10% of these families harbor germline mutations or complex genomic changes that render inactive one of four high penetrance genes (BRCA1, BRCA2, TP53 or PTEN) or moderate penetrance genes (CHEK2, ATM, BRIP1 and PALB2) [2,4,5]. Associations have also been identified for other genes in fBC including ATM, CASP8, CTLA4, NBN, CYP19A1, TERT, and XRCC3 [6]. The most recent BC meta-analysis has identified 41 loci and suggests that over 1000 loci may be involved in disease susceptibility [7]. The identification of BRCA1 and BRCA2 as susceptibility genes for BC and the more recent addition of PALB2, BRIP1 and RAD51C [5] have focused attention on genes associated with double strand break repair (DSBR). There are at least 39 genes implicated in DSBR, all of which could potentially be associated with BC risk. This is analogous to DNA mismatch repair (MMR), where there are at least 21 genes associated with this process, of which four are now routinely assessed and more recently a fifth gene (POLD1) has been added to the list [8,9]. Despite the plethora of information regarding genetic loci associated with BC risk, for many fBC cases no genetic predisposition has been identified. Outside the context of gene mutations, other mechanisms may be associated with disease development, including gene silencing as a result of epigenetic re-programming of BC susceptibility genes (analogous to loss of EPCAM and the re-arrangement of the epigenetic profile on chromosome 2, rendering MSH2 inactive [10,11]), or mutations in genes not yet associated with a predisposition to disease.
One type of genetic alteration that could account for susceptibility is genetic re-arrangements detected as CNVs. CNVs represent a class of structural variation involving regions of duplication or deletion of genomic material that can encompass large stretches of genomic sequence ranging from megabases (Mbs) to a few kilobases (Kb) in size. As a consequence, CNVs can contribute to disease when they incorporate functional gene sequence (coding and promoter regions of genes) or exert more cryptic effects that could affect epigenetic regulation (methylation, micro-RNA targets) and non-coding intronic gene sequences [12][13][14][15][16][17][18][19][20][21][22][23]. Two reports have recently examined CNVs in association with BRCA1/BRCA2 mutation negative fBC patients. The first of these reported a greater abundance of rare CNVs in fBC patients and suggested that rare CNVs are likely to contain genetic factors associated with BC predisposition, while the second report associated several CNV markers with fBC risk and suggested their use in disease risk assessment [24,25].
The detection of CNVs has historically relied upon the use of DNA arrays, typically comprised of oligonucleotide markers distributed across the whole genome. The resolution of DNA arrays has increased to allow for the detection of genomic rearrangements as small as a few Kb in size. In this study we used the Affymetrix Cyto2.7 M array which provided the highest genomic coverage of any commercially available microarray at the time of assay to assess CNV variation in an fBC cohort. The Cyto2.7 M array contains a combination of 400,000 single nucleotide polymorphisms (SNPs) and >2.1 million copy number probes (average spacing 1395 base pairs (bp)) which together can be used to accurately detect genomic rearrangements.
We conducted a patient-control analysis examining 129 fBC patients and 40 control subjects derived from the same population to identify CNVs which could be associated with the genetic basis of their disease. To date this study represents one of the largest CNV studies of BRCA1/BRCA2 mutation negative fBC patients.

Samples
The study was approved by the University of Newcastle's Human Research Ethics Committee and the Hunter New England Human Research Ethics Committee. Genomic DNAs were obtained from fBC patients who had given informed consent for their DNA to be used for studies into their disease and control DNA samples from the Hunter Community Study (HCS) [26]. DNA was extracted from whole blood by the salt precipitation method [27].
A cohort of 129 patients clinically diagnosed with early-onset fBC was used in this study. All patients had been diagnosed with BC and were the first individual (proband) of their family to seek genetic testing for mutations in BRCA1/BRCA2. Mutation screening was performed using Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA) analysis. No mutations were identified in any of the patients (BRCA1/BRCA2 mutation negative). The average patient age was calculated to be <40.7 years. Genomic DNA from 40 controls [26] was also utilized in this study. These were healthy (cancer free) individuals aged >55 years at the time of sample collection.

Genomic array preparation and data processing
The genomic DNA from 129 fBC patients and 40 controls was processed on the Affymetrix Cyto2.7 M array according to the manufacturer's protocols. CEL files were analysed in the Affymetrix Chromosome Analysis Suite (ChAS) (Version CytoB-N1.2.0.232; r4280) using NetAffx Build 30.2 (Hg18) annotation. Quality control (QC) parameters were optimized and validated using a training set of 20 randomly selected samples. All samples were subject to a series of quality cut-off measures: snpQC >1.1 (SNP probe QC based on distances between the distribution of alleles (AA, AB and BB) where larger differences are associated with an increased ability to differentiate genotype; default), mapdQC <0.27 (Median Absolute Pair-wise Difference; CN probe QC based on a reference model; default) and wavinessSd <0.1 (measure of standard deviation in data waviness; the GC content across the genome correlates with average probe intensities i.e. high GC probes are brighter than low GC probes on average, creating waves in the data). CNV regions were assessed according to call confidence, probe count, size and by visual inspection for distinction from normal CN state. Data were also visually inspected to identify regions with low density of markers (Additional file 1: Table S1), which were excluded across all samples. Most thresholds were more stringent than default settings alone, with the aim of minimizing the inclusion of false-positive CNVs in the analysis. CNV regions were filtered across all samples using the following parameters: >90% confidence, autosomes only and a minimum number of 24 probes. Using these parameters the limit of detection was 9.65 Kb across all samples used in the current study. This does not exclude the possibility of CNVs smaller than this contributing to disease in a proportion of fBC patients.
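The sample-level and segment-level cut-offs described above can be expressed as two simple predicates. The following is a minimal sketch, not the ChAS implementation: the `SampleQC` and `Segment` records and their field names are hypothetical, and confidence is assumed to be stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass

# Autosomes only: chromosomes 1-22 (X and Y are excluded by the filter).
AUTOSOMES = {str(n) for n in range(1, 23)}


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record; field names are illustrative."""
    snp_qc: float
    mapd_qc: float
    waviness_sd: float


@dataclass
class Segment:
    """Hypothetical CNV segment record; field names are illustrative."""
    chrom: str
    start: int
    end: int
    confidence: float  # assumed to be a fraction in [0, 1]
    probes: int


def sample_passes_qc(qc: SampleQC) -> bool:
    # Thresholds quoted in the text: snpQC > 1.1, mapdQC < 0.27, wavinessSd < 0.1.
    return qc.snp_qc > 1.1 and qc.mapd_qc < 0.27 and qc.waviness_sd < 0.1


def segment_passes_filter(seg: Segment) -> bool:
    # Thresholds quoted in the text: >90% confidence, autosomes only, >= 24 probes.
    return seg.chrom in AUTOSOMES and seg.confidence > 0.90 and seg.probes >= 24
```

Visual inspection and the exclusion of low-density marker regions, which the text also describes, are manual steps and are not modelled here.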

CNV and statistical analysis
CNVs in fBC patients and controls were subject to a series of comprehensive analyses which included: (1) interrogation for CNVs residing in or ±100 Kb of 61 genes (associated with DSBR, MMR and BC susceptibility) and 41 SNPs recently reported to be associated with BC risk [6,7,28,29] (see Additional file 1: Tables S2 and S3); (2) comparison of CNVs between fBC patients and controls according to CN occurrence and distribution across the genome; (3) identification of rare CNVs using the Database of Genomic Variants (DGV); and (4) the identification of genes associated with malignancy (non-specific) using the Network of Cancer Genes (Version 3.0) and the Cancer Gene Census (CGC; 15 March 2012) databases [30,31]. Associations (e.g. numbers and sizes of CNVs) were statistically compared using a two-tailed unpaired t-test in GraphPad Prism (Version 6) [32].
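Step (1), the interrogation for CNVs residing in or within ±100 Kb of a candidate gene, amounts to an interval-overlap test against a padded gene region. The sketch below illustrates the idea under stated assumptions: regions are `(chromosome, start, end)` tuples, and the gene names and coordinates in the usage note are invented for illustration and are not taken from the study.

```python
FLANK_BP = 100_000  # the ±100 Kb window described in the text


def near_gene(cnv, gene, flank=FLANK_BP):
    """Return True if the CNV overlaps the gene or its flanking window.

    Both arguments are (chrom, start, end) tuples with start <= end.
    """
    c_chrom, c_start, c_end = cnv
    g_chrom, g_start, g_end = gene
    if c_chrom != g_chrom:
        return False
    # Standard closed-interval overlap test against the padded gene region.
    return c_start <= g_end + flank and c_end >= g_start - flank


def genes_near_cnv(cnv, genes):
    """List the names of all genes whose padded region the CNV touches."""
    return sorted(name for name, region in genes.items() if near_gene(cnv, region))
```

For example, with hypothetical genes `{"GENE_A": ("8", 1_000_000, 1_050_000), "GENE_B": ("8", 2_000_000, 2_100_000)}`, a CNV at `("8", 1_102_600, 1_130_000)` starts 52.6 Kb beyond the end of `GENE_A` and is therefore reported for `GENE_A` only. A CNV flagged in this way may lie entirely outside the coding sequence, so a positive hit indicates proximity rather than disruption.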

Validation of CNV results
CNV results were validated using pre-designed TaqMan Copy Number (CN) Assays (Applied Biosystems). Up to two CN assays were selected within the CNV region indicated by the Cyto2.7 M array, and CN assays proximal but external to the region were also selected as controls (assay information summarized in Additional file 1: Table S4). A total of 11 samples were run in triplicate, comprising the sample(s) of interest, a calibrator (control) sample with known CN for the region of interest and a no-template-control (NTC). Real-time PCR was conducted according to the manufacturer's protocols using 10 ng of DNA sample in a final reaction volume of 20 μL. The assay was run on the real-time PCR machine (Applied Biosystems 7500; SDS software Version v1.4) according to the manufacturer's protocols. The results were exported to CopyCaller v2.0 software (Applied Biosystems) for analysis.
Three CNVs were validated using this secondary independent assay (Additional file 1: Table S5). The CNVs included a CN gain and a CN loss in the WWOX gene as well as a CN loss in the FHIT gene. Given the high concordance between the CNV calling within the experimental parameters set for this study and the independent copy number assays we considered that it was not necessary to confirm all CNVs using a second independent assay.
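Copy number from real-time PCR assays of this kind is commonly estimated relative to a calibrator sample by the comparative Ct (ΔΔCt) method. The sketch below illustrates that calculation only; it is not the CopyCaller implementation. It assumes that each well also reports a Ct for a two-copy reference assay (the reference assay is not named in the text), and the function and parameter names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)


def estimate_copy_number(sample_target_ct, sample_ref_ct,
                         calib_target_ct, calib_ref_ct, calib_cn=2):
    """Estimate copy number by the comparative Ct method.

    Each *_ct argument is a list of replicate Ct values (e.g. triplicates).
    calib_cn is the known copy number of the calibrator sample.
    """
    # Delta Ct: target assay normalised to the reference assay.
    delta_sample = mean(sample_target_ct) - mean(sample_ref_ct)
    delta_calib = mean(calib_target_ct) - mean(calib_ref_ct)
    # Delta-delta Ct: sample relative to the calibrator.
    ddct = delta_sample - delta_calib
    # One extra cycle corresponds to half the starting template.
    return calib_cn * 2 ** (-ddct)
```

Under these assumptions, a sample whose target assay crosses threshold one cycle later than a two-copy calibrator (with identical reference Ct values) yields an estimate of one copy, consistent with a heterozygous CN loss.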

Array resolution and CNV detection
Analysis of Cyto2.7 M array data revealed a total of 414 CNVs in the 169 individuals assessed in this study (Table 1). CNVs detected ranged in size from 9.65 Kb to 1335.06 Kb. There was no difference in the average number of CNVs identified in the patients versus the controls (p = 0.75). The average genomic burden of CNVs also did not differ between patients (226.93 Kb) and controls (295.52 Kb), p = 0.30; nor did the average CNV size differ between patients (76.22 Kb) and controls (106.57 Kb), p = 0.07.

Occurrence and distribution of CNVs in fBC patients
Overall 310 CNVs were identified in fBC patients, of which 35 also occurred in controls (Additional file 1: Table S6). Since these regions were represented in the control population they were removed from further analysis. Of the 275 CNVs unique to the patients (Additional file 1: Table S7), 94 had previously been described in the DGV and 39 spanned genomic regions that were common to multiple patients (Table 2). Of these, 11 CNVs (located on chromosomes 2, 3, 4, 6, 11, 14, 15, 17 and 18) were common to two patients; three were common to three patients (located on chromosomes 4, 5 and 19); and two were common to four patients (located on chromosomes 3 and 18). Among these, three genomic regions (located on chromosomes 6, 11 and 19) were considered novel (not reported in the DGV) and likely to represent regions of potential association with BC risk.
Of the CNVs unique to patients, 160 (58.18%) encompassed genes. A CNV located in SUPT3H was also excluded from analysis as this gene was affected by a re-arrangement in a control sample and was considered unlikely to be associated with disease risk. Therefore a total of 159 genes associated with a CNV were identified as being unique to the fBC patients and represent genes potentially associated with disease. A total of 24 genes associated with 44 CNVs (gains, losses or both) were identified in multiple individuals (as shown in Table 3): 19 genes, including LAMB3, NBN, IL8 and WWOX, were affected by a CNV in two individuals; PIK3R5 and POU2F3 were affected by a CNV in three individuals; ARHGEF12 and TMEM136 were affected by a CNV in four individuals; and NAMPT was affected by a CNV in five individuals.
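The removal of patient CNVs that were also represented in controls can be sketched as an interval comparison. The text does not state the matching criterion used, so the example below assumes, purely for illustration, that any positional overlap on the same chromosome with any control CNV is sufficient to exclude a patient CNV; the tuple layout and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def patient_unique_cnvs(patient_cnvs, control_cnvs):
    """Keep patient CNVs that overlap no control CNV.

    Assumption: any overlap counts as 'also occurring in controls'.
    A stricter rule (e.g. reciprocal overlap, or matching gain/loss type)
    would retain more patient CNVs.
    """
    return [p for p in patient_cnvs
            if not any(overlaps(p, c) for c in control_cnvs)]
```

Applied to the counts reported above, a filter of this kind would take the 310 patient CNVs as input and return the 275 that are absent from the control population.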

Rare CNVs in fBC patients
There were 95 rare CNVs identified in 42 of the fBC patients. Of these 70 were associated with 78 genes and were found in 27 patients. Out of the 78 genes SUPT3H was excluded from further analysis as it was identified in a healthy control subject. Ten genes that were disrupted due to the presence of a CNV had previously been associated with cancer [30,31] including ARHGAP26, ARHGEF12, CARD11, CPD, FAM135B, TSHR, MLLT11, PTK2B, RHOH and FHIT (Table 4). The remaining CNVs affecting 67 genes were unique and have not previously been associated with malignancy (listed in Additional file 1: Table S8). These genes potentially represent new candidates that require further investigation.

Genomic changes involving BC susceptibility genes or the recently identified BC susceptibility loci
There are at least 61 genes including those involved in DNA DSBR and MMR that could potentially contribute to fBC [6,7,28,29]. CNV data for the 129 fBC patients and 40 controls were screened for genomic re-arrangements within or ±100 Kb either side of these 61 genes. CNVs involving RPA3, NBN, MRE11A and CYP19A1 were identified in five unrelated fBC patients. With respect to the NBN gene, a CNV loss was also identified in a control, residing in a region located 52.6 Kb downstream of the gene, but it did not appear to be associated with disruption of the coding sequence.
No CNVs were identified that were located in the same 41 genomic regions that have recently been reported as BC susceptibility loci [7].
The identification of a CNV that involved WWOX in two unrelated patients (see Table 6, Figures 1 and 2) was of interest as this gene is located in a fragile site (FRA16D) associated with cancer development and has been shown to interact with TP53 and ACK1 [33] and has recently been reported to be involved in breast carcinogenesis [34,35]. Together, this suggests that loss of function of WWOX could potentially be involved in BC susceptibility. One patient harboured a CNV gain that was predicted to disrupt the coding sequence of the gene via the insertion of additional genomic material whereas the other patient had a CNV loss that is expected to result in loss of function. Both of these changes were confirmed using an independent CN assay (see Additional file 1: Table S5). A number of recent reports have also correlated BC development with changes in the FHIT gene which similarly to WWOX is located in a fragile site (FRA3B) and has again been linked to tumour development [36][37][38][39][40][41][42][43]. CNV analysis revealed a CN loss that encompassed FHIT (Table 6 and Figure 3) which was confirmed using an independent assay (Additional file 1: Table S5).

Discussion
The association between CNVs and fBC is yet to be fully defined. In this study we provide evidence that CNVs are a potential explanation for a small but significant number of fBC patients who do not harbour germline mutations in known susceptibility genes.
Genomic resolution provided by microarray technology has increased significantly, allowing for the discovery of ever smaller CNVs. The resolution of the array used in this study was limited to the identification of CNVs greater than 9.65 Kb in size, and hence we cannot rule out the potential involvement of smaller CNVs in the aetiology of fBC. There have been a number of technical issues associated with the identification of CNVs that have compounded the difficulties in assessing the role of genomic rearrangements in disease. Different array platforms, software algorithms, batch effects and population stratification influence the accuracy of CNV calls and of comparisons of CNV data [44][45][46]. To help in reducing the influence of these effects a set of 40 older population controls was used as the basis to differentiate between CNVs associated with breast cancer and uninformative controls. All samples (both cases and controls) were processed on one platform and analysed using the same analysis software and experimental parameters. Comparison between the number and size of CNVs between patients and controls did not reveal any significant differences between cohorts. It is important to note that the limited number of controls utilized in the current study represents a potential bias; however, it is reassuring to note that despite this potential limitation, our observations are consistent with two previous reports on fBC (68 patients and 100 controls) and BRCA1-associated ovarian cancer (84 patients and 47 controls) [24,47]. We also identified 67 genes associated with novel CNVs that have yet to be linked with BC risk. It is interesting to note that many of these have been implicated in biological processes involving metabolism and biological regulation [48]. This provides the basis for further investigation into expanding the number of genes involved in BC development.
Our study has identified CNVs in close proximity to a number of genes previously associated with BC risk in a fBC cohort: ARHGEF12 has been proposed to be a candidate tumour suppressor gene in BC whereby its under expression (typically as a result of genomic loss) has been observed in BC cell lines and where re-induction of the gene resulted in reduced cell proliferation and colony formation [49]; Laminin 5 (LN5) genes (including LAMB3) have been shown to exhibit reduced expression as a result of epigenetic inactivation in 65% of BC cell lines [50]; NBN has been recently reported to be associated with BC risk [6]; and NAMPT has been shown to modify the effects of PARP inhibitors used in the treatment of triple-negative BCs suggesting the potential for a combination of NAMPT and PARP inhibitors in the treatment of this disease [51].
Of all the genes affected by a CNV identified in more than one patient, the most frequently reported for BC development has been aberrations in WWOX. This tumour suppressor gene has been shown to be critical for normal breast development [34] with mutations in exons 4 to 9 frequently observed in BC tumours [35]. High expression of WWOX has been shown to be beneficial in association with tamoxifen treatment [52]. We further evaluated two unrelated fBC patients, one harbouring a CNV gain and the other a CNV loss. In both cases, the genomic rearrangements are predicted to reduce WWOX expression and thereby contribute to disease risk. Our results suggest that inherited deficiencies in WWOX are associated with disease but we could not demonstrate that these alterations were transmitted across generations due to ethical considerations. Notwithstanding, the frequency at which we have observed variants occurring in this gene (>1.55%) suggests that they may account for a significant proportion of BRCA1/BRCA2 mutation negative fBC patients. Functional studies are required to determine the precise effect of these variants in the alteration of WWOX expression and BC development.
The identification of CNVs in close proximity to BC susceptibility genes and loci that contribute to disease development either directly or via more cryptic means expands our understanding of their contribution to disease risk in fBC. Our study identified CNVs residing in four genes, RPA3, NBN, MRE11A and CYP19A1, which supports their involvement in BC [6,28,29,[53][54][55][56]. Given the predicted disruption of RPA3, NBN, MRE11A and CYP19A1 it is likely that these variants are associated with disease.
Within our fBC cases we identified several genes within or in close proximity to rare CNVs which have previously been associated with BC: the putative oncogene MLLT11 (aka AF1Q) has been reported to be over expressed in a BC cell line affecting invasive and metastatic potential [57,58]; while PTK2B has been shown to be the most frequently lost kinase in sporadic BC tumours and is suggested to contribute to the disease phenotype [59]. Of the rare CNVs associated with malignancy, the gene most frequently associated with BC development is the tumour suppressor FHIT. FHIT has been reported multiple times to be genetically and epigenetically modified in breast tumours [36][37][38][39][40][41]; its expression has been reported to be protective against HER2-driven breast tumour development [42]; whereas reduced expression is associated with poor prognosis [43]. A germline intronic deletion in FHIT has also been identified in a pancreatic cancer study [60]. Given that we have found a constitutional CNV in FHIT we suggest that variants in this gene could also account for a fraction of fBC patients. As we were unable to obtain other family members it remains to be seen if these genomic re-arrangements confer significant disease risk in a family setting rather than being associated with disease progression.
A recent report using 68 patients and 100 controls suggested that rare CNVs may contribute to disease in a small proportion of fBC patients [24]. In contrast to our findings, that study reported a significantly lower percentage of rare CNVs in fBC patients (4%) compared to the level observed in the current study (30.65%) [24]. The discrepancies in these findings are most likely to be related to differences in sample populations, the type of array used (variation in array coverage and density), as well as the algorithm used by the analysis software [44][45][46]. These findings reinforce the need to obtain larger cohorts of patients and controls to better understand the contribution of CNVs to breast cancer development.

Conclusions
This study has revealed that there are a number of CNVs which may contribute to the development of fBC. Several previously reported BC susceptibility genes, including RPA3, NBN, MRE11A and CYP19A1, were found to be influenced by the presence of a CNV. This investigation also revealed that three unrelated fBC patients harboured CNVs in WWOX and FHIT. We propose that variants in these genes may account for disease in a significant proportion of fBC patients. Overall the results of this study provide grounds for further investigation into the presence of CNVs in larger series of fBC patients who do not harbour changes in known breast cancer susceptibility genes.