Germline deletions in the EPCAM gene as a cause of Lynch syndrome – literature review

Lynch syndrome (clinically referred to as HNPCC, Hereditary Non-Polyposis Colorectal Cancer) is a frequent, autosomal dominantly inherited cancer predisposition syndrome caused by various germline alterations that affect DNA mismatch repair genes, mainly MLH1 and MSH2. Patients inheriting this predisposition are susceptible to colorectal, endometrial and other extracolonic tumors. It has recently been shown that germline deletions of the last few exons of the EPCAM gene are involved in the etiology of Lynch syndrome. Such constitutional mutations lead to epigenetic silencing of the neighbouring gene, MSH2, and thereby cause Lynch syndrome. Thus, deletions of the last few exons of EPCAM constitute a distinct class of mutations associated with HNPCC. Worldwide, several investigators have reported families with EPCAM 3' end deletions. The risk of colorectal cancer in carriers of EPCAM deletions is comparable to that in MSH2 mutation carriers, and is associated with high expression levels of EPCAM in colorectal cancer stem cells. A lower risk of endometrial cancer has also been reported. Until now, the standard diagnostic tests for Lynch syndrome have comprised immunohistochemistry for mismatch repair proteins and tests for microsatellite instability. The identification of EPCAM deletions or larger EPCAM-MSH2 deletions should be included in routine mutation screening, as this has implications for cancer predisposition.


Introduction
Lynch syndrome (LS; previously HNPCC, Hereditary Non-Polyposis Colorectal Cancer) is one of the most common cancer susceptibility syndromes and accounts for approximately 1-4% of all colon cancer cases [1]. It is characterized by an early onset of colorectal cancer (CRC) and an increased risk of several extracolonic malignancies, in particular endometrial cancer [2]. In the largest published series, 3.1% of colorectal cancer cases were attributable to LS [3]. HNPCC is caused by inactivating germline mutations in the mismatch repair (MMR) system genes (mainly MSH2, MLH1 and MSH6, but also PMS2) [4]. According to data from the NCBI database, MLH1 and MSH2 mutations account for about 90% of all mutations connected with Lynch syndrome; MSH6 accounts for 7-10%, and PMS2 is found in less than 5% of these alterations. According to recent studies, a further 1-3% of LS cases (within the Dutch and German populations) are attributable to the EPCAM gene (Figure 1) [5].
Lynch syndrome-associated tumors are usually characterized by DNA mismatch repair deficiency, and result from a second somatic event which inactivates the remaining functional mismatch repair gene allele [6,7]. As a consequence of lack of mismatch repair, tumorigenesis is promoted by secondary mutations that accumulate at short repetitive sequences, a phenotype termed "high level microsatellite instability" (high-MSI).
In other words, an MMR gene defect in one allele confers susceptibility to further mutations, which may affect the second allele and cause a lack of mismatch repair function in the cell. This results in an accumulation of mutations in coding and non-coding microsatellites in such tumors: so-called "microsatellite instability" (MSI), which is a characteristic feature of more than 95% of LS-associated CRCs [8], in addition to the loss of expression of the mutated mismatch repair gene [9]. Carriers of mutations in MLH1, MSH2 or MSH6 have a 30-80% cumulative risk of developing colorectal cancer, and women have an additional 27-71% cumulative risk of endometrial cancer before the age of 70 years [2]. The main clinical features are an early age of onset and the occurrence of multiple tumors.
Appropriate diagnosis of LS may be carried out in two major ways. One is to take an adequate family history in all patients visiting a physician. The revised Bethesda guidelines are probably the most commonly used criteria for selecting patients with CRC for further molecular tests [10,11] (Table 1). The other is systematic testing of all patients with CRC for loss of MMR function, by means of high-level microsatellite instability in tumor tissue or immunohistochemistry (IHC). An additional advantage of immunohistochemistry is that it allows prediction of which mismatch repair gene is likely to be affected by a germline mutation [10]. In our International Hereditary Cancer Center, patients are classified as having Lynch syndrome according to characteristic clinical features or criteria and pedigrees typical for Lynch syndrome, as presented by Kladny and Lubinski [12]. An example of a pedigree of a family with definitive HNPCC and EPCAM carriers is shown in Figure 2.
In some individuals with Lynch syndrome the search for MMR gene mutations fails. This group is of particular interest to researchers who are trying to find the genetic factors causing the disease. In some LS patients it has been shown that methylation of MMR genes causes the disease [13][14][15][16]. Some evidence for this came from studies in which the MLH1 gene was the target of methylation in germline tissues in HNPCC patients who were not carriers of a germline MLH1 mutation [17]. Moreover, heritable germline epimutations in MSH2 have also been reported in some MMR germline mutation-negative LS families [18].
A new mechanism of inactivating the MSH2 gene was therefore predicted. In multiple patients in whom LS was suspected, but with no germline mutation found in the MMR genes, a heterozygous germline deletion was identified encompassing the polyadenylation site located in the last two exons (exons 8 and 9) of the EPCAM gene (OMIM#185535, formerly known as TACSTD1) [19]. Such deletions disrupt the 3' end of the EPCAM gene, leading to transcriptional read-through of the mutated EPCAM allele and epigenetic silencing of its neighbouring gene, MSH2. MSH2 is located 17 kb downstream of EPCAM on chromosome 2, and its silencing causes Lynch syndrome [20]. This epigenetic inactivation is restricted to cells expressing EPCAM, and therefore patients who carry EPCAM deletions show mosaic patterns of MSH2 inactivation that, compared with carriers of a mutation in MSH2, may lead to differences in tumor occurrence or spectrum [20]. Interestingly, the high expression of EPCAM in colorectal cancer stem cells explains why carriers of an EPCAM 3' end deletion have a substantially increased risk of colon cancer. Kempers and colleagues (2011) established the different cancer risks associated with EPCAM deletions, depending on whether a deletion affects only the EPCAM gene or both EPCAM and its neighbouring gene MSH2 (EPCAM-MSH2). These risks were then compared with those for Lynch syndrome carriers of a mutation in MMR genes. This was the first study that described the cumulative cancer risks and cancer profile of EPCAM deletion carriers [21]. They reported a low risk of endometrial cancer in patients with deletions of EPCAM compared with that in patients with a mutation in an MMR gene. Of the 194 individuals with an EPCAM mutation included in their studies, 16
Figure 1 The frequency of mismatch repair gene mutations in Lynch syndrome.
from EPCAM deletion carriers, suggesting EPCAM immunohistochemistry as a potential analysis tool for the identification of Lynch syndrome patients with EPCAM germline deletions [22]. However, this approach has a limitation: EPCAM protein expression was retained in some cancers from EPCAM mutation carriers. Investigators had not determined the relationship between EPCAM protein expression status in cancers and the localization of an EPCAM germline deletion [22]. Huth et al. (2012) therefore hypothesized that a second somatic hit (leading to MSH2 inactivation) determined EPCAM expression in tumor cells. They analyzed four carcinomas and two adenomas from EPCAM deletion carriers for EPCAM protein expression and allelic deletion status of the EPCAM gene [7].

The EPCAM gene
The EPCAM gene (Epithelial Cell Adhesion Molecule; alternative name TACSTD1, Tumor-Associated Calcium Signal Transducer 1; OMIM#185535) encodes a carcinoma-associated antigen which is a glycosylated member of a family that includes at least two type I membrane proteins [23]. It is located on the short (p) arm of chromosome 2 at position 21 (2p21), more precisely from base pair 47,596,286 to base pair 47,614,166 on chromosome 2.
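As a quick check on the coordinates quoted above, the size of the EPCAM locus can be worked out directly. The sketch below assumes that the two positions form a closed, 1-based interval; the text does not state the genome build or the coordinate convention, so the result is indicative only, and the function name is hypothetical.

```python
# Coordinates as quoted in the text (genome build not stated there).
EPCAM_START = 47_596_286
EPCAM_END = 47_614_166


def span_bp(start: int, end: int) -> int:
    """Length in base pairs of a closed, 1-based interval [start, end] (assumed convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


epcam_span = span_bp(EPCAM_START, EPCAM_END)
print(f"EPCAM locus span: {epcam_span} bp (~{epcam_span / 1000:.1f} kb)")
```

Under these assumptions the locus spans 17,881 bp, that is, roughly 18 kb.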
In healthy tissues EPCAM is located in the basolateral membrane, but in cancer tissues this protein is homogeneously distributed on the cell surface. EPCAM is implicated not only in mediating epithelial-specific intercellular adhesion, but also in intracellular signaling, migration, proliferation and differentiation. The extracellular part of EPCAM contains an epidermal growth factor-like domain and a presumed thyroglobulin domain. Activation of EPCAM signaling is mediated by intra-membrane proteolysis, through which the extracellular domain is shed and the intracellular domain (EpICD) is released into the cytoplasm (Figure 3). There it becomes part of a large nuclear complex containing the transcriptional regulators β-catenin and Lef, both components of the Wnt signaling pathway [2].
It has therefore been speculated that EPCAM on normal epithelia is sequestered and thus much less accessible to antibodies than EPCAM in cancer tissue, where it is homogeneously distributed on the cancer cell surface [23].
The antigen EPCAM is presently being used as a target for immunotherapy treatment of human carcinomas.
Deletions involving the transcription termination signal of EPCAM are causative in 1% to 2.8% of families with Lynch syndrome. Other EPCAM alterations that do not affect the transcription termination signal cause autosomal recessive congenital tufting enteropathy [24].

Epigenetic silencing of MSH2
EPCAM deletions cause MSH2 gene silencing by a mechanism known as promoter hypermethylation. Additional methyl groups attached to the MSH2 promoter reduce the expression of the MSH2 gene, which means that less protein is produced in epithelial cells. The MSH2 protein plays an essential role in repairing mistakes in DNA, so loss of this protein prevents proper DNA repair and mistakes accumulate as the cells continue to divide. These mistakes can lead to uncontrolled cell growth and an increased risk of cancer. Lack of EPCAM immunostaining in MSH2-negative CRCs is indicative of EPCAM gene alterations, and Musulen and Blanco (2013) have therefore recommended performing EPCAM immunohistochemistry before MLPA (Multiplex Ligation-dependent Probe Amplification) analysis [4].
[...] lead to transcriptional read-through into the MSH2 locus [19]. Moreover, Ligtenberg's group (2012) noticed that this transcriptional read-through induced monoallelic hypermethylation of the MSH2 promoter present on the same allele as the 3′ end EPCAM deletion. Sequence analysis defined the deletion breakpoints, with a deletion of 4909 bp denoted 859-1462_*1999del from EPCAM cDNA. Haplotype analysis suggested that this mutation originated from a common founder. All six high-MSI tumors from these families showed methylation of the MSH2 promoter by methylation-specific PCR and subsequent bisulfite sequencing (Table 2). These observations convincingly indicated that deletion of the 3′ end of EPCAM can lead to inactivation of the MSH2 promoter and should therefore be considered a novel cause of Lynch syndrome [2]. Ligtenberg et al. (2009) also analyzed the family reported by Chan et al. (2006) with heritable MSH2 promoter methylation and identified a heterozygous deletion of 22.8 kb (EPCAM cDNA, 555+894_+14194del) that segregated with the disease [19].
The deletion extended from intron 5 of the EPCAM gene to approximately 2.4 kb upstream of MSH2, encompassing the 3' end of EPCAM and leaving the MSH2 promoter intact. The same authors identified the same mutation in another Chinese family, where there was no evidence for a founder mutation. Analyses such as RT-PCR and methylation-specific PCR of tissue samples from affected individuals showed that methylation of MSH2 was limited to EPCAM-expressing cells ( Table 2). Huth et al. (2012) reported that lack of EPCAM expression occurs in many, but not all, tumors from Lynch syndrome patients with EPCAM germline deletions [7]. The differences in EPCAM expression were not related to the localization of EPCAM germline deletions. Therefore they hypothesized that a type of second somatic hit, leading to MSH2 inactivation during tumor development, determines EPCAM expression in the tumor cells.
In four out of six tumors, investigators detected lack of EPCAM expression accompanied by biallelic deletions affecting the EPCAM gene. In contrast, monoallelic retention of the EPCAM gene was observed in the remaining two tumors with retained EPCAM protein expression.
These results indicate that EPCAM expression in tumors from EPCAM deletion carriers depends on the localization of a second somatic hit that inactivates MSH2. These data demonstrate that the lack of EPCAM protein expression observed in tumors results from a combination of a germline mutation and a second somatic hit. They also show that heterozygous EPCAM germline deletions are not necessarily associated with loss of EPCAM expression in tumor tissue. The detection of a somatic mutational event causing MSH2 inactivation in one of the EPCAM-positive tumors explains why some tumors from EPCAM deletion carriers show loss of MSH2 but retained EPCAM expression [7]. The incidence of EPCAM deletions appeared to vary between populations and was found to represent at least 1-3% of the explained Lynch syndrome families. Detailed analysis of the EPCAM deletions revealed their range of variability, as well as Alu-repeat-mediated recombination as a likely mechanism for these rearrangements [5]. Indeed, all EPCAM deletion breakpoints characterized by various authors were located within repetitive Alu elements (Table 2).
Alu elements are a family of Short Interspersed Nuclear Elements (SINEs) found only in primates, and comprise about 10.5% of the human genome [26]. They are mobile retrotransposable elements that also contain a recombinogenic motif, which makes them prone to recombination. However, their mobility and susceptibility to recombination are presumably tempered by host-defensive methylation of these CpG-rich elements [26]. Deletions formed through unequal Alu-mediated homologous recombination involve a cross-over at regions of shared sequence identity between two parental Alu elements located in cis in the same orientation, with loss of the loop of intervening genomic sequence during the exchange. These deletions are identifiable by signatory tracts of perfect Alu-derived sequence identity, overlapping or adjacent to the deletion breakpoints, such that the deletion junction cannot be mapped to a precise nucleotide. Perez-Cabornero and colleagues (2011) mapped each of the deletion breakpoints to within short stretches of fused Alu sequences that shared close homology with their respective parental Alu elements [27]. This was the first such finding and prompted a re-examination of the role of Alu elements in the causation of Lynch syndrome.
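The reason why such a deletion junction cannot be assigned to a single nucleotide can be illustrated with a toy model. The sketch below is not taken from any of the cited studies: the sequences are hypothetical and far shorter than a real Alu element (roughly 300 bp), and the two repeat copies are assumed to be perfectly identical, which real parental Alu elements generally are not.

```python
def alu_mediated_deletion(seq: str, repeat: str) -> tuple[set[str], int]:
    """Recombine the first two direct copies of `repeat` found in `seq`.

    Returns the set of distinct deletion products obtained by crossing over at
    every possible offset within the repeat, and the number of bases lost.
    """
    first = seq.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second direct copy of the repeat is required")
    # A cross-over at offset k joins the first k bases of the upstream copy to
    # the remaining bases of the downstream copy; the intervening loop is lost.
    products = {seq[: first + k] + seq[second + k:] for k in range(len(repeat) + 1)}
    return products, second - first


# Toy sequences (hypothetical; a stand-in for two Alu copies flanking a loop).
LEFT, REPEAT, LOOP, RIGHT = "GATTACA", "GGCCGGGCGCGG", "ACGTTGCAACGT", "CCATGG"
locus = LEFT + REPEAT + LOOP + REPEAT + RIGHT

products, lost = alu_mediated_deletion(locus, REPEAT)
print(len(products), lost)
```

Every cross-over position within the shared repeat yields the same product (one fused repeat copy, with the loop and one repeat length removed), so in this toy case all 13 candidate breakpoints are indistinguishable at the sequence level.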

Prevalence of EPCAM deletions
EPCAM deletions have been found in populations of various geographic origins worldwide [18,28] (Table 2). Their prevalence was found to vary between these populations, partly because of the presence of various founder mutations [5], and to account for up to 10% of the MSH2-inactivating mutations. All Lynch syndrome-associated tumors from EPCAM deletion carriers that were available for testing showed hypermethylation of the MSH2 promoter [13]. Detailed analyses of the breakpoints of these deletions indicated that they predominantly originate from Alu repeat-mediated recombination events [2]. The wide variety of deletions may reflect different recombination events arising from the high number of Alu repeats spread across this locus [9]. This was documented by Grandval et al. [29], who identified EPCAM deletions as de novo mutations in three out of seven of their patients, which probably reflects the relatively high frequency of Alu repeat-mediated recombination at this locus [2].
Our unpublished data from studies of 55 patients with LS indicate that deletions of exons 8 and 9 of the EPCAM gene account for 7% of LS cases without an MMR mutation.

Tumor spectrum of EPCAM deletions
Constitutional rearrangements affecting the open reading frame of genes typically lead to constitutive inactivation of these genes, irrespective of the cell type. In contrast, in EPCAM 3′ end deletion carriers MSH2 inactivation is cell type-specific, since the epigenetic silencing of MSH2 is restricted to cells in which the EPCAM locus is active and transcriptional read-through occurs. The outcome of this is that carriers of EPCAM deletions show mosaic patterns of MSH2 inactivation. This phenomenon involving the inactivation of one of the EPCAM alleles might lead to a tumor spectrum that is different from that of germline mutations directly affecting MSH2 [2].
Some investigators have compared the cancer risk for carriers of an intragenic MSH2 mutation, a combined EPCAM-MSH2 deletion, and a deletion of the 3′ end of EPCAM. The colorectal cancer risk of EPCAM mutation carriers, as reflected by the mean age at diagnosis and the cumulative risk by age 70 years, was similar to that of EPCAM-MSH2 or MSH2 mutation carriers. In contrast, the cumulative risk of endometrial cancer by the age of 70 years was significantly lower for 3′ end EPCAM deletion carriers than for combined EPCAM-MSH2 deletion carriers and MSH2 mutation carriers (Table 3). Importantly, the comparison of the tumor risk between the EPCAM and EPCAM-MSH2 deletion carriers indicates that the difference in endometrial cancer risk relates to the mosaic inactivation of MSH2 and not to a constitutive loss of EPCAM [20]. A relatively low incidence of endometrial tumors was also observed in several other studies [30].
As mentioned above, the average age at onset, the risk of colorectal cancer and the tumor phenotype in EPCAM deletion carriers are comparable to those carrying a typical mismatch repair gene mutation in MLH1 or MSH2, whereas the cumulative risk of endometrial cancer is much lower [5,20]. In EPCAM deletion carriers there is a relatively low risk of endometrial cancer, which is the second most prevalent Lynch syndrome-associated malignancy in carriers of a mismatch repair mutation. EPCAM deletion carriers will probably be more easily recognized than carriers of an MSH6 mutation, whose colorectal cancer risk is lower with a higher age of onset [2].
In the cohort of Grandval et al. (2012), EPCAM deletion carriers developed only tumors of the digestive tract. Their risk of developing colorectal cancer was particularly high: only two of the 29 deletion carriers aged over 30 were unaffected [29].

EPCAM founder deletions
A number of founder mutations have been identified in MMR genes, but only one affecting the EPCAM gene [19,31]. This 4.9 kb EPCAM founder deletion, thus far observed by Ligtenberg et al. in seven Dutch families, was found to be present in nine out of ten additional families from The Netherlands, but in none of the families from other geographic origins, thus confirming its founder nature (Table 2) [5]. In 2013, Mur and Pineda detected two EPCAM deletions: c.858+2568_*4596del (found in three families) and c.858+2488_*7469del (in two families; all five were unrelated Spanish LS families). Furthermore, they described the EPCAM c.858+2568_*4596del mutation as the first reported EPCAM founder mutation in Spain (Table 2) [32].

Diagnostics of EPCAM 3'end deletion carriers
Patients are selected for mutational screening for 3' end deletions of EPCAM on the basis of fulfilment of the Amsterdam or Bethesda criteria, together with an MSI phenotype or loss of MMR protein expression in tumors.
Immunohistochemical analysis of MMR protein expression is a hallmark of Lynch syndrome diagnostics, but it cannot distinguish between EPCAM deletion carriers and MSH2 mutation carriers [33]. The dependence of EPCAM expression on both germline and somatic alterations explains why EPCAM immunohistochemistry can yield inconspicuous results in a subset of tumors from EPCAM deletion carriers, namely if a second somatic MSH2 inactivating hit does not affect the EPCAM gene [7]. Moreover, Huth and his team reported a lack of EPCAM protein expression in a colorectal adenoma, suggesting that EPCAM immunohistochemistry may detect EPCAM deletions already at a precancerous stage [7].
Germline rearrangements in the EPCAM gene and MSH2 promoter methylation are detected by multiplex ligation-dependent probe amplification (MLPA) analysis with probes for the specific region [9,19,32]. Following MLPA analysis, Huth et al. (2012) found biallelic deletions affecting the EPCAM gene in four out of six analyzed tumors. In the remaining two tumors, no biallelic EPCAM deletions were observed, and the allelic profile obtained for the EPCAM gene region was identical in DNA isolated from tumor tissue and from matched blood samples. All tumors showing biallelic deletions in the EPCAM gene region were negative for EPCAM protein expression, while EPCAM protein expression was retained in the tumors that retained one normal EPCAM allele. These authors also showed that the MLPA technique is applicable to the detection of heterozygous and homozygous deletions in DNA isolated from paraffin-embedded tumor tissue (for example, when no blood is available) [7].
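The way in which MLPA distinguishes heterozygous from homozygous deletions can be sketched as a dosage-ratio calculation: each probe signal is normalized to control probes within the same sample and then compared with a reference sample. The example below is a minimal illustration only. The probe names, peak values and cut-offs are hypothetical assumptions and do not come from the cited studies or from any specific MLPA kit; in tumor DNA, admixture of normal cells would also shift the ratios away from these idealized values.

```python
from statistics import median


def dosage_ratios(sample: dict[str, float], reference: dict[str, float],
                  control_probes: list[str]) -> dict[str, float]:
    """Normalize target probe peaks to control probes, then to a reference sample."""
    sample_norm = median(sample[p] for p in control_probes)
    reference_norm = median(reference[p] for p in control_probes)
    return {
        probe: (sample[probe] / sample_norm) / (reference[probe] / reference_norm)
        for probe in sample
        if probe not in control_probes
    }


def classify(ratio: float) -> str:
    """Map a dosage ratio to a copy-number call (illustrative cut-offs, assumed)."""
    if ratio < 0.25:
        return "homozygous deletion"
    if ratio < 0.7:
        return "heterozygous deletion"
    if ratio <= 1.3:
        return "normal"
    return "gain"


# Hypothetical peak heights for a blood sample and a normal reference.
controls = ["ctrl_1", "ctrl_2", "ctrl_3"]
reference = {"ctrl_1": 1000, "ctrl_2": 1000, "ctrl_3": 1000,
             "EPCAM_ex8": 1000, "EPCAM_ex9": 1000, "MSH2_ex1": 1000}
patient = {"ctrl_1": 980, "ctrl_2": 1000, "ctrl_3": 1020,
           "EPCAM_ex8": 510, "EPCAM_ex9": 490, "MSH2_ex1": 1010}

calls = {probe: classify(r)
         for probe, r in dosage_ratios(patient, reference, controls).items()}
print(calls)
```

With these made-up values, the two EPCAM probes give ratios close to 0.5 and are called as a heterozygous deletion, while the MSH2 probe stays close to 1.0, which is the pattern expected for a 3' end EPCAM deletion that leaves MSH2 itself intact.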
As an additional method for detecting specific deletions in the EPCAM gene, Ligtenberg and Kuiper (2009) used long-range PCR and real-time quantitative RT-PCR. For long-range PCR across the deletion in the Dutch families, a TAKARA PCR kit was used with primers on either side of the deletion. They then characterized the deletion by direct sequencing, using a forward primer in combination with an internal reverse primer. For the Chinese families they performed multiple long-range PCRs, which yielded an aberrant amplicon suggestive of a large deletion [19]. To detect fusion transcripts, a direct RT-PCR or a nested RT-PCR was performed [19,21].
Recently, Pritchard and his team from the USA (2012) reported a comprehensive and cost-effective test called "ColoSeq" that detects all classes of mutations in Lynch and polyposis syndrome genes, using solution-based targeted capture and next-generation sequencing [34]. Using this technique, they correctly identified 28/28 (100%) pathogenic mutations in MLH1, MSH2, MSH6, PMS2, EPCAM, APC and MUTYH, including single nucleotide variants (SNVs), small insertions and deletions, and large copy number variants. The authors focused on defining the sensitivity of heterozygous variant detection because pathogenic mutations in Lynch and polyposis syndromes are almost always heterozygous, except in the MUTYH gene. There was 100% reproducibility of mutation detection between independent runs [34]. The ColoSeq assay demonstrated at least exon-level resolution for all large deletions and duplications, which was comparable to, or even better than, that of traditional approaches to the analysis of this kind of mutation, such as MLPA; exact breakpoints could not be determined because they commonly lie in Alu or other repetitive DNA elements [34].

Conclusions
Previous results from around the world strongly suggest that implementation of EPCAM deletion mapping in routine diagnostics for suspected Lynch syndrome families should be considered. Some studies suggest that EPCAM deletions are the cause of Lynch syndrome in up to 30% of patients with MSH2-negative tumors (according to IHC results), or in approximately 20% of LS patients without a mutation in MMR genes [18,22]. This underlines the importance of EPCAM deletions in Lynch syndrome, as they are a more frequent cause of LS than mutations in PMS2 or MSH6 [33].
The frequent occurrence of somatic deletions affecting the EPCAM gene as a second hit in tumors from EPCAM deletion carriers suggests that the localization of somatic events inactivating mismatch repair genes in Lynch syndrome is not random, but related to the underlying germline mutation [7].
In conclusion, EPCAM 3'end deletions are a recurrent cause of Lynch syndrome, and detection should be implemented in routine Lynch syndrome diagnostics.