Prevalence and spectrum of MLH1, MSH2, and MSH6 pathogenic germline variants in Pakistani colorectal cancer patients

Background: Pathogenic germline variants in the MLH1, MSH2 and MSH6 genes account for the majority of Lynch syndrome (LS) cases. In this first report from Pakistan, we investigated the prevalence of pathogenic MLH1/MSH2/MSH6 variants in colorectal cancer (CRC) patients.
Methods: Consecutive cases (n = 212) were recruited at the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) between November 2007 and March 2011. Patients with a family history of ≥ 3 or 2 HNPCC-associated cancers were classified as HNPCC (n = 9) or suspected-HNPCC (n = 20), respectively (group 1; n = 29). Cases with no family history were designated as non-HNPCC (group 2; n = 183). The MLH1/MSH2/MSH6 genes were comprehensively screened in group 1. Pathogenic/likely pathogenic variants identified in group 1 were subsequently evaluated in group 2.
Results: Eight distinct pathogenic/likely pathogenic MLH1/MSH2 variants were found. In group 1 they were detected in 10/29 patients (34.5%), from HNPCC (5/9; 55.6%) and suspected-HNPCC (5/20; 25%) families, and in group 2 in 2/183 patients (1.1%). Overall, three recurrent variants (MSH2 c.943-1G > C, MLH1 c.1358dup and c.2041G > A) accounted for 58.3% (7/12) of all families harboring pathogenic/likely pathogenic MLH1/MSH2 variants. Pathogenic MSH6 variants were not detected.
Conclusion: Pathogenic/likely pathogenic MLH1/MSH2 variants account for a substantial proportion of CRC patients with HNPCC/suspected-HNPCC in Pakistan. Our findings suggest that HNPCC/suspected-HNPCC families should be tested for these recurrent variants prior to comprehensive gene screening in this population.


Background
Colorectal cancer (CRC) is the fifth most common malignancy in Pakistan and endometrial cancer (EC) is the third most common gynecologic malignancy in Pakistani women [1]. The age-standardized (world) annual rates of CRC and EC in Pakistan are 4.0 and 3.6 per 100,000, respectively. Affected individuals generally present at a young age. The majority of CRC and EC are not linked with inherited cancer syndromes; however, up to 30% of CRC are hereditary and these may be divided into polyposis and non-polyposis syndromes. The term hereditary non-polyposis colorectal cancer (HNPCC) refers to patients and families who fulfill the Amsterdam criteria and differentiates familial aggregation of CRC from the polyposis phenotype. Up to 50% of HNPCC families have Lynch syndrome (LS), with a DNA mismatch repair (MMR) defect, while the rest comprise those with Lynch-like syndrome or familial colorectal cancer type X (FCCTX), which has no DNA MMR defect [2]. LS refers to families with a pathogenic germline variant in one of the DNA MMR genes (MLH1, MSH2, MSH6, and PMS2) or with 3′ end deletions of the EPCAM gene [3]. The most common pathogenic MMR gene variants in LS (up to 90%) are reported in MLH1 and MSH2 [4,5], less commonly in MSH6 (up to 10%) and uncommonly in PMS2 [6]. Deletions in the EPCAM gene are rarely reported in LS (1-3%) [7]. Individuals with LS have a lifetime risk of CRC, EC, and ovarian cancer ranging from 50 to 80%, 31.5-62%, and 6.7-13.5%, respectively. These individuals also face increased lifetime risks of developing cancer of the small bowel, stomach, upper urologic tract, biliary tract, pancreas and brain [8][9][10][11][12]. Identification of individuals harboring pathogenic MMR gene variants is clinically important and has a significant impact on surveillance and management [13].
Various clinical criteria, such as the Amsterdam II criteria [14,15] or the Bethesda guidelines, exist for identifying patients at high risk of HNPCC. These criteria are based on a strong family history of at least three HNPCC-associated cancers, age at diagnosis and tumor histology. However, these stringent criteria have been reported to under-diagnose LS [16,17]. Less stringent criteria of suspected-HNPCC, based on a family history of only two HNPCC-linked cancers, have also been found useful in identifying pathogenic variants in MMR genes [18][19][20].
The prevalence and spectrum of pathogenic MMR gene variants show considerable variation by ethnicity and by geographic origin worldwide [21][22][23]. However, little is known about the contribution of MMR gene variants to CRC in Pakistan. In the current study, we comprehensively investigated the contribution of pathogenic germline variants in the MLH1, MSH2 and MSH6 genes in 212 Pakistani cases with HNPCC/suspected-HNPCC or non-HNPCC.

Study subjects
Consecutive cases were identified at the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) in Lahore, Pakistan, from November 2007 to March 2011. These study cases were stratified into two groups: the HNPCC/suspected-HNPCC group (n = 29) and the non-HNPCC group (n = 183). Stringent criteria were applied for inclusion in the HNPCC subgroup. These included: (i) at least three relatives affected by histologically verified CRC or cancer of the endometrium, small bowel or urinary tract, at least one of whom was a first degree relative of the other two, (ii) at least two of the above individuals were first degree relatives from two different generations, (iii) at least one of the above persons had cancer diagnosed under the age of 50 years, (iv) familial adenomatous polyposis (FAP) had been excluded [14,15]. Somewhat less stringent criteria used for the suspected-HNPCC subgroup included: (i) diagnosis of at least one CRC, EC, small bowel or urinary tract malignancy amongst first degree relatives of a CRC patient (or in him/herself), (ii) at least one of the above cancers diagnosed under age 50, (iii) FAP had been excluded [18]. The remaining 183 enrolled CRC cases did not fulfill the diagnostic criteria of HNPCC/suspected-HNPCC and were assigned to the non-HNPCC group. Clinical and histopathological data of all index patients were collected from medical records and pathology reports. A detailed description of the 212 index cases is shown in Table 1.
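The inclusion criteria above amount to a small decision rule, which can be sketched in code. The following is a minimal illustration only, not the procedure used in the study: the `Diagnosis` record, the `classify_family` function and the `one_first_degree_of_other_two` flag are hypothetical names, the caller is assumed to pass only diagnoses in the index patient and in relatives linked to the index patient by first degree relationships, and the pedigree relationship in criterion (i) is reduced to a single boolean supplied by the caller.

```python
from dataclasses import dataclass
from typing import List

# Cancer sites counted by the criteria in the text.
HNPCC_SITES = {"colorectal", "endometrial", "small bowel", "urinary tract"}


@dataclass(frozen=True)
class Diagnosis:
    """One histologically verified cancer diagnosis in a family (hypothetical record)."""
    person_id: str
    site: str
    age_at_diagnosis: int
    generation: int  # pedigree generation number, e.g. 1, 2, 3


def classify_family(diagnoses: List[Diagnosis],
                    one_first_degree_of_other_two: bool,
                    fap_excluded: bool) -> str:
    """Assign a family to HNPCC, suspected-HNPCC or non-HNPCC (simplified sketch)."""
    relevant = [d for d in diagnoses if d.site in HNPCC_SITES]
    # Both subgroups require FAP to be excluded and one diagnosis under age 50;
    # families failing either condition fall outside both subgroups here.
    if not fap_excluded or not any(d.age_at_diagnosis < 50 for d in relevant):
        return "non-HNPCC"
    persons = {d.person_id for d in relevant}
    generations = {d.generation for d in relevant}
    # Stringent criteria: at least three affected relatives, two generations,
    # and one relative who is a first degree relative of the other two.
    if len(persons) >= 3 and one_first_degree_of_other_two and len(generations) >= 2:
        return "HNPCC"
    # Less stringent criteria: two HNPCC-associated cancers in total
    # (in a first degree relative or in the index patient him/herself).
    if len(relevant) >= 2:
        return "suspected-HNPCC"
    return "non-HNPCC"
```

For example, an index patient with CRC at 44 whose parent had CRC at 60 and whose sibling had EC at 55 would be classified as HNPCC under this sketch, provided the relationship flag is set and FAP has been excluded.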
The control population included 100 healthy individuals of Pakistani origin with no family history of CRC. These were care-givers or family members of hospital-registered patients, or individuals visiting the hospital for medical reasons other than cancer. All study participants provided written informed consent. The study was approved by the Institutional Review Board (IRB) of the SKMCH&RC (IRB approval number SKMCH-CRC-001).

Molecular analysis
Genomic DNA was extracted as previously described [24]. The entire coding region and exon-intron junctions of the MLH1, MSH2 and MSH6 genes (GenBank accession numbers NM_000249.3, NM_000251.2 and NM_000179.2, respectively) were screened in the 29 index patients of the HNPCC/suspected-HNPCC group using denaturing high-performance liquid chromatography (DHPLC) analysis. The DHPLC analysis was carried out with the WAVE system (Transgenomic, Omaha, NE, US). PCR primer pairs and DHPLC running conditions for the MLH1/MSH2 genes were according to Kurzawski and colleagues [4] and those for the MSH6 gene were according to Kolodner et al. with some modifications [25]; they are available upon request. When available, a positive control for each exon with a known variant was included in the DHPLC analyses.
Each sample showing variants detected by DHPLC analyses was sequenced using BigDye Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, US), as described elsewhere [26]. Bidirectional genomic DNA sequencing was performed on an independent sample to verify the presence of a sequence variant.
Pathogenic/likely pathogenic variants identified in the HNPCC/suspected-HNPCC group were subsequently screened in the non-HNPCC group by DHPLC. Novel pathogenic variants and in silico predicted likely pathogenic variants were further analyzed in 100 healthy individuals.

Classification of MMR gene variants
The MMR gene variants were stratified according to the following five-tier classification, as described elsewhere: class 5 (pathogenic), class 4 (likely pathogenic), class 3 (uncertain significance), class 2 (likely benign) and class 1 (benign) [27]. The variants were designated as novel or previously reported by searching the following six databases: Exome Aggregation Consortium (ExAC), http://exac.broadinstitute.org/; Exome Sequence Project
[Table footnotes: P values marked in bold are statistically significant. CRC, colorectal cancer; pN0, no regional lymph node metastasis; pN1, metastasis in ≤ 3 regional lymph nodes; pN2, metastasis in ≥ 4 regional lymph nodes; pT2, tumor invades through muscularis propria; pT3, tumor invades through muscularis propria into pericolorectal tissues; pT4, tumor directly invades other organs or structures. a One index patient with breast-endometrial cancer and the other with ovarian cancer were not included. b Fisher]

Characteristics of the study participants
In total, 212 unrelated Pakistani index patients were included in the current study. Of these, 86.3% were diagnosed with CRC with no family history (non-HNPCC group = 183) and 13.7% reported a family history of cancer within the spectrum of HNPCC (HNPCC/suspected-HNPCC group = 29; 9 fulfilled the HNPCC criteria and 20 met the suspected-HNPCC criteria). Characteristics of the index CRC cases are shown in
A recurrent frame shift variant in exon 12, c.1358dup (p.T455Dfs*24), was identified in two unrelated patients of Punjabi ethnicity. One patient presented with carcinoma of the sigmoid colon at 44 years of age (III:3, Fig. 1a). The other patient was diagnosed with carcinoma of the transverse colon at age 61 (III:18, Fig. 1b). Both reported a family history of HNPCC.
Another frame shift variant in exon 1, c.67delG (p.E23Kfs*13), was detected in a 48-year-old patient (II:1, Fig. 1c) of Pathan ethnicity, who presented with carcinoma of the cecum and reported a family history of HNPCC.
A nonsense variant in exon 15, c.1672G > T (p.E558*), was identified in a 32-year-old patient (IV:2, Fig. 1d) of Kashmiri background, diagnosed with carcinoma of the transverse colon who also reported a family history of HNPCC.
One missense variant in exon 18, c.2041G > A (p.A681T), was identified in a 41-year-old patient (II:1, Fig. 1e) of Punjabi ethnicity with carcinoma of the transverse colon who reported a family history of suspected-HNPCC. This variant has been previously classified as a pathogenic variant [4,30].
A recurrent likely pathogenic splice site variant, c.943-1G > C, was found in three unrelated patients of Pathan ethnicity: one with rectosigmoid carcinoma at 32 years of age (III:1, Fig. 1f) and a family history of HNPCC. The remaining two patients harboring this variant presented with carcinoma of the ascending colon (III:2, Fig. 1g) and sigmoid colon (II:1, Fig. 1h) at age 43 and 60, respectively and both reported a family history of suspected-HNPCC.
A pathogenic nonsense variant in exon 12, c.1861C > T (p.R621*), was identified in a 45-year-old patient (II:1, Fig. 1i) of Punjabi ethnicity, who was diagnosed with carcinoma of the rectum and also reported a family history of suspected-HNPCC.
Another pathogenic nonsense variant in exon 16, c.2656G > T (p.E886*), was identified in a 67-year-old patient of Pathan ethnicity, who was diagnosed with endometrial and breast cancer at age 48 and 67, respectively. This patient had a family history of suspected-HNPCC and has been reported recently [28].

Pathogenic germline variants: non-HNPCC group
Screening of the index patients in the non-HNPCC group for the presence of the pathogenic/likely pathogenic MLH1/MSH2 variants identified in the HNPCC/suspected-HNPCC group revealed two additional carriers of pathogenic MLH1/MSH2 variants. The MLH1 missense variant, c.2041G > A (p.A681T), was detected in a 41-year-old patient (II:1, Fig. 1j) of Urdu-speaking background, who was diagnosed with carcinoma of the rectum. His sister (II:2, Fig. 1j) was diagnosed with a brain tumor (Table 4). The MSH2 in-frame deletion (c.1786_1788delAAT) was identified in a 39-year-old CRC patient (II:1, Fig. 1k) of Punjabi ethnicity with a family history of breast cancer.
[Table footnote: Previously reported in Pakistani population [28]]

Other MMR gene variants: novel or previously reported
In addition to the pathogenic/likely pathogenic variants, 35 distinct MMR variants, including nine novel and 26 previously reported variants, were detected. Among these were eight missense variants, six silent variants, and 21 intronic variants (Table 2). The novel variants were analyzed for their potential functional effect by in silico analyses (Table 5). A novel MLH1 splice-site variant, c.116 + 3A > T, is predicted to be likely pathogenic by four of the five splice-site prediction algorithms integrated into the Alamut software, suggesting that it may be disease-causing. This variant was identified in a 30-year-old patient of Punjabi origin, diagnosed with carcinoma of the sigmoid colon with no family history (Table 4). This variant was not found in 100 healthy controls, further supporting its pathogenicity.
A novel MSH2 missense variant, c.2120G > A (p.C707Y), is also predicted to be likely pathogenic by five of the seven in silico prediction tools (Table 5). This variant was identified in three unrelated patients with CRC diagnosed at or below age 54: one patient of Pathan ethnicity who reported a family history of HNPCC and two Punjabi patients of the non-HNPCC group (Table 4). However, this variant was also found in two out of 100 healthy controls, including one with a family history of carcinoma of the pharynx and Ewing's sarcoma. Characteristics of families harboring pathogenic/likely pathogenic MLH1/MSH2 variants are shown in Table 4. The remaining seven novel MMR gene variants were also analyzed for their potential functional effect by in silico analyses and classified as benign.
Among the 26 previously reported MMR gene variants, 25 were benign or likely benign (Table 2). One MLH1 missense variant, c.1919C > T (p.P640L), is predicted to be likely pathogenic as suggested by all seven in silico prediction tools used (Table 5). We identified this variant in eight unrelated CRC patients of Pathan ethnicity: six from the HNPCC/suspected-HNPCC group and two from the non-HNPCC group.

Discussion
In this first comprehensive study from Pakistan, we investigated the contribution of MLH1, MSH2, and MSH6 pathogenic germline variants in 212 patients belonging to the HNPCC/suspected-HNPCC group or the non-HNPCC group. Initially, index patients from the HNPCC/suspected-HNPCC group (HNPCC = 9 and suspected-HNPCC = 20; group 1) were screened for the entire coding sequence of these genes. The pathogenic/likely pathogenic variants identified in this group were then analyzed in the non-HNPCC group (n = 183; group 2). Eight different pathogenic/likely pathogenic variants in MLH1/MSH2 were identified, with an overall frequency of 34.5% (10/29) in group 1 and 1.1% (2/183) in group 2. No pathogenic variants were detected in the MSH6 gene. Within group 1, five carriers of pathogenic/likely pathogenic MLH1/MSH2 variants were detected in each of the HNPCC and suspected-HNPCC subgroups, with frequencies of 55.6% (5/9) and 25% (5/20), respectively. The detection rate under the stringent HNPCC criteria was thus about twice that under the less stringent suspected-HNPCC criteria. Our findings are in agreement with an international collaborative study reporting pathogenic variant detection rates of 50% (109/217) and 26% (32/123) for HNPCC and suspected-HNPCC criteria, respectively [20]. In our study, half of the group 1 carriers (5/10) did not meet the criteria of HNPCC, suggesting the need to use the criteria of suspected-HNPCC in the Pakistani population.
[Table footnote: This variant is reported as VUS in LOVD database and considered in the current study as likely pathogenic by seven of the seven protein function prediction algorithms combined with functional assay [29]]
[Table 5, in silico analysis, footnotes: The variant is considered as likely pathogenic by four of the five splice-site prediction algorithms. c > 20% change in score (i.e., a wild-type splice-site score decreases and/or a cryptic splice-site score increases) is considered as significant]
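The detection rates discussed for groups 1 and 2 can be recomputed from the reported counts, and interval estimates help show how much uncertainty the small subgroup sizes carry. The sketch below is illustrative and was not part of the study's analysis: the Wilson score interval is an assumed choice of method, and `wilson_interval` and `reported_counts` are hypothetical names.

```python
from fractions import Fraction
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion k/n (z = 1.96 for ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Carrier counts (carriers, patients screened) as reported in the text.
reported_counts = {
    "HNPCC": (5, 9),
    "suspected-HNPCC": (5, 20),
    "group 1": (10, 29),
    "group 2": (2, 183),
}

for name, (k, n) in reported_counts.items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {100 * k / n:.1f}% "
          f"(approx. 95% interval {100 * low:.1f}-{100 * high:.1f}%)")

# Ratio of the HNPCC to the suspected-HNPCC detection rate, kept exact.
rate_ratio = Fraction(5, 9) / Fraction(5, 20)
print(f"rate ratio = {rate_ratio} (about {float(rate_ratio):.1f})")
```

With only nine HNPCC and 20 suspected-HNPCC families, the intervals are wide, which is consistent with the caution expressed later about the small number of carriers.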
Of the eight distinct pathogenic/likely pathogenic MLH1/MSH2 variants identified in both groups, the MSH2 variant c.2656G > T may be specific to the Pakistani population, as it has not been reported in other populations. The other seven variants have been reported in Asia, Europe, and North America [3,[30][31][32][33][34][35][36][37]. These findings suggest that the spectrum of MLH1/MSH2 variants in Pakistan largely overlaps with that of other populations.
In the current study, three distinct recurrent pathogenic/likely pathogenic variants in MLH1 (n = 2) and MSH2 (n = 1) were identified. The likely pathogenic MSH2 variant, c.943-1G > C, was identified in three unrelated HNPCC/suspected-HNPCC families of Pathan ethnicity. It has also been frequently reported in HNPCC families from Germany [33]. The pathogenic MLH1 variant, c.1358dup, was found in two unrelated HNPCC families of Punjabi origin. This variant was recently found in HNPCC families from Australia [36]. The pathogenic MLH1 variant, c.2041G > A, was detected in two unrelated families: a suspected-HNPCC family of Punjabi background and a non-HNPCC family of Urdu-speaking background. This variant was first reported in Poland as a potential founder variant [4,31], has been reported as a recurrent variant in Scotland [30] and has also been described once each in Germany [33] and Colombia [3]. These recurrent variants accounted for 58.3% (7/12) of all MLH1/MSH2 carriers from Pakistan. This supports a step-wise and cost-effective strategy of screening for these recurrent variants prior to exhaustive analysis of the MMR genes in our population. However, haplotype analysis of these recurrent variants is required to determine whether they are true Pakistani founder variants.
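The 58.3% (7/12) share of recurrent variants can be checked by tallying the carrier families described in the Results. The following is a minimal sketch: the list is transcribed from the text, the gene assigned to variants whose subsection heading is not shown is inferred from the order of presentation and the reported family counts, and `carriers` and `recurrent` are hypothetical names.

```python
from collections import Counter

# One entry per carrier family: (gene, variant), transcribed from the Results.
carriers = [
    ("MLH1", "c.1358dup"), ("MLH1", "c.1358dup"),
    ("MLH1", "c.67delG"),
    ("MLH1", "c.1672G>T"),
    ("MLH1", "c.2041G>A"), ("MLH1", "c.2041G>A"),
    ("MSH2", "c.943-1G>C"), ("MSH2", "c.943-1G>C"), ("MSH2", "c.943-1G>C"),
    ("MSH2", "c.1861C>T"),
    ("MSH2", "c.2656G>T"),
    ("MSH2", "c.1786_1788delAAT"),
]

counts = Counter(carriers)

# A variant is treated as recurrent here if it occurs in two or more families.
recurrent = {variant: n for variant, n in counts.items() if n >= 2}
recurrent_share = sum(recurrent.values()) / len(carriers)

print(f"{len(carriers)} families, {len(counts)} distinct variants")
for (gene, variant), n in sorted(recurrent.items()):
    print(f"recurrent: {gene} {variant} in {n} families")
print(f"recurrent share = {100 * recurrent_share:.1f}%")
```

Under a step-wise strategy of the kind proposed, a targeted assay for these three variants would be run first and full gene screening reserved for families that test negative.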
In addition to the eight pathogenic/likely pathogenic variants found in twelve families, 35 MMR gene variants were detected: nine novel and 26 previously reported sequence variants. Of the novel sequence variants, two were predicted in silico to be likely pathogenic. The novel MLH1 splice-site variant, c.116 + 3A > T, is predicted to be likely pathogenic by four of the five splice-site prediction algorithms. This variant was identified in a CRC patient of the non-HNPCC group and was not detected in 100 healthy controls. Further evidence of the impact of the c.116 + 3A > T variant on mRNA splicing could not be provided because no RNA sample was available from this patient. The novel MSH2 missense variant, p.C707Y, is predicted to be likely pathogenic on the basis of its effect on protein function as predicted by five of the seven in silico prediction tools. This variant was identified in three unrelated patients: one belonged to the HNPCC group and the other two were from the non-HNPCC group. It is located in the highly conserved ATPase domain (amino acid residues 620 to 855) and may disrupt the interaction of MSH2 with other proteins in the repair pathway, resulting in an MMR defect [38]. This variant was also detected in two out of 100 healthy controls with a family history of carcinoma of the pharynx or Ewing's sarcoma. Functional analyses of both novel variants predicted in silico to be likely pathogenic (MLH1 c.116 + 3A > T and MSH2 p.C707Y) are warranted to further establish the association of these variants with the disease. One previously reported MLH1 missense variant, p.P640L, is predicted to be likely pathogenic by all seven in silico prediction tools used. This variant was identified in eight unrelated CRC patients of Pathan origin: six belonged to the HNPCC/suspected-HNPCC group while the other two were from the non-HNPCC group. This variant is located in a highly conserved C-terminal interaction domain (amino acid residues 492 to 756) and may ablate the interaction of MLH1 with PMS2, resulting in an MMR defect. Previously, Hardt and colleagues performed two functional assays and characterized p.P640L as a pathogenic variant [29]. Overall, these findings suggest that MLH1 p.P640L is a pathogenic variant.
Several criteria have been reported for identifying candidates for pathogenic MMR gene variant testing. The most stringent and commonly applied criteria, Amsterdam II [14,15], are based on a family history of at least three relatives with histologically verified CRC or cancers linked with HNPCC. In our study, five out of nine patients belonging to families fulfilling these criteria were found to harbor a pathogenic MLH1/MSH2 variant (5/9; 55.6%). The revised Bethesda guidelines recognize high-risk patients by the assessment of microsatellite instability and/or immunohistochemical testing of their tumors. However, this approach was not utilized because of the limited availability of normal/tumor tissue from study subjects. Moreover, the Amsterdam II criteria and Bethesda guidelines have been shown to miss up to 72% and 27% of cases with HNPCC, respectively [17]. The more recently suggested, less stringent criteria of suspected-HNPCC are based on a family history of only two HNPCC-associated cancers [18][19][20]. In our study, five out of 20 patients belonging to families fulfilling these criteria were found to harbor a pathogenic MLH1/MSH2 variant (5/20; 25%). Of the twelve identified carriers of a pathogenic/likely pathogenic variant, five met the HNPCC criteria, five met the suspected-HNPCC criteria and only two were found in the non-HNPCC group. Our data support the notion that the suspected-HNPCC criteria may be useful for identifying Pakistani families likely to carry pathogenic MMR gene variants. The suspected-HNPCC criteria have also been utilized in other studies from Turkey, Poland, Italy and Latvia [31,32,37,46].
In the current study, the frequency of pathogenic MMR gene variants observed in the HNPCC/suspected-HNPCC group may be an underestimate, as the sensitivity of DHPLC can be below 100% and screening for large genomic rearrangements or EPCAM gene 3′ end deletions was not performed. Furthermore, PMS2 was not screened, so pathogenic PMS2 variants may have been missed. However, pathogenic PMS2 variants have only rarely been reported and account for less than 5% of all identified pathogenic MMR gene variants [7]. Finally, the contribution of additional undiscovered gene(s) in early onset CRC patients with a family history of LS-associated cancer who tested negative for any pathogenic MMR gene variants cannot be excluded. Thus, further studies in these patients are warranted.
Ethnic variations in frequencies of pathogenic MLH1/MSH2 variant carriers have been reported in selected HNPCC families from Europe and the US [21][22][23]. Similar ethnic variations in carrier frequencies of pathogenic/likely pathogenic MLH1/MSH2 variants have been noted in our study. Half of the families carrying MLH1 variants (3/6; 50%) were of Punjabi ethnicity, and the majority of the families harboring pathogenic/likely pathogenic MSH2 variants (4/5; 80%) had a Pathan background. These findings suggest that families with a Punjabi or Pathan background should first be screened for the MLH1 or MSH2 gene, respectively. However, no firm conclusion can be drawn because of the small number of pathogenic MLH1/MSH2 variant carriers. Furthermore, this study is not population-based and therefore might have some ascertainment bias.
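The ethnicity breakdown by gene can be reproduced from the family descriptions in the Results. The sketch below is illustrative: the records are transcribed from the text, the gene assignment follows the order of presentation there, and `families` and `ethnicity_share` are hypothetical names. In this tally, the 3/6 figure for MLH1 counts families from both groups, whereas the 4/5 figure for MSH2 is obtained when only group 1 families are counted; including the non-HNPCC carrier of the MSH2 in-frame deletion gives 4/6.

```python
# (gene, ethnicity, study group) per carrier family, transcribed from the Results.
families = [
    ("MLH1", "Punjabi", 1),        # c.1358dup
    ("MLH1", "Punjabi", 1),        # c.1358dup
    ("MLH1", "Pathan", 1),         # c.67delG
    ("MLH1", "Kashmiri", 1),       # c.1672G>T
    ("MLH1", "Punjabi", 1),        # c.2041G>A
    ("MLH1", "Urdu-speaking", 2),  # c.2041G>A
    ("MSH2", "Pathan", 1),         # c.943-1G>C
    ("MSH2", "Pathan", 1),         # c.943-1G>C
    ("MSH2", "Pathan", 1),         # c.943-1G>C
    ("MSH2", "Punjabi", 1),        # c.1861C>T
    ("MSH2", "Pathan", 1),         # c.2656G>T
    ("MSH2", "Punjabi", 2),        # c.1786_1788delAAT
]


def ethnicity_share(gene, ethnicity, groups=(1, 2)):
    """Return (families of this ethnicity, all families) for a gene within the given groups."""
    subset = [f for f in families if f[0] == gene and f[2] in groups]
    hits = sum(1 for f in subset if f[1] == ethnicity)
    return hits, len(subset)


print("MLH1 Punjabi, both groups:", ethnicity_share("MLH1", "Punjabi"))
print("MSH2 Pathan, group 1 only:", ethnicity_share("MSH2", "Pathan", groups=(1,)))
print("MSH2 Pathan, both groups:", ethnicity_share("MSH2", "Pathan"))
```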
Previous studies in Caucasians have reported a predominantly proximal tumor location in CRC patients harboring pathogenic MMR gene variants [47]. Similarly, in our study, CRC patients with pathogenic/likely pathogenic MLH1/MSH2 variants more commonly presented with a proximal tumor location than non-carriers. Similar observations have been noted in other Asian studies from Singapore [40] and Japan [48]. However, no such association was reported in studies from Korea [39] and China [49]. The differences in phenotypic manifestation may be due to ethnic variations or the involvement of other genetic and/or non-genetic risk factors.

Conclusion
In summary, this is the first comprehensive study conducted in Pakistani CRC patients to assess the prevalence and spectrum of MLH1, MSH2, and MSH6 pathogenic germline variants. Pathogenic/likely pathogenic MLH1/MSH2 variants account for a substantial proportion (10/29; 34.5%) of CRC patients with HNPCC/suspected-HNPCC in Pakistan, whereas no pathogenic MSH6 variants were seen. Three recurrent MLH1/MSH2 variants accounted for 58.3% (7/12) of all families carrying pathogenic/likely pathogenic variants. We recommend that HNPCC families, even those fulfilling the less stringent criteria of suspected-HNPCC, should first be tested for the recurrent pathogenic/likely pathogenic MLH1/MSH2 variants prior to whole gene screening in Pakistani patients.