The complementary roles of genome-wide approaches in identifying genes linked to an inherited risk of colorectal cancer
Hereditary Cancer in Clinical Practice volume 21, Article number: 1 (2023)
The current understanding of the inherited risk of colorectal cancer (CRC) began with an observational clinical era in the late 19th century, which was followed by a genetic era starting in the late 20th century. Genome-wide linkage analysis allowed the mapping of several high-risk genes, which marked the beginning of the genetic era. The current high-throughput genomic phase includes the genome-wide association study (GWAS) and genome-wide sequencing approaches, which have revolutionized the understanding of the inherited risk of CRC. On the one hand, GWAS has allowed the identification of multiple low-risk loci associated with CRC. On the other hand, genome-wide sequencing has led to the discovery of a second batch of high-to-moderate-risk genes that are associated with atypical familial CRC and polyposis syndromes. In contrast to other common cancers, which are usually dominated by a polygenic background, CRC risk is believed to be equally explained by monogenic and polygenic architectures, which jointly account for a quarter of familial clustering. Although genome-wide approaches have allowed the identification of a continuum of high-to-moderate-to-low-risk variants, much of the predisposition to and familial clustering of CRC has not yet been explained. Other genetic, epigenetic and environmental factors might play important roles as well. In this review, we aim to provide insights into the complementary roles played by different genomic approaches in shaping the current understanding of the genetic architecture of inherited CRC.
Familial clustering of colorectal cancer (CRC) and gynecological tumors in families affected by what later came to be identified as Lynch syndrome (LS) was among the first clinical clues that led to the proposal of the principle of “inherited cancer” in the late 19th century. At the time, the idea was regarded as eccentric; it was first addressed in 1866 by the neuroanatomist Paul Broca, who presented the clustering of 15 cases of breast cancer in his wife’s family. In 1895, the pathologist Aldred Warthin “officially” proposed the concept of cancer predisposition by establishing a 4-generation pedigree for “family G”, in which CRC and gynecological tumors were transmitted in an autosomal dominant (AD) fashion. An update about the family was published in the mid-1930s by Warthin’s colleagues Hauser and Weller, yet no specific diagnosis was made. In the meantime, familial adenomatous polyposis (FAP) was starting to come to light. In 1925, the senior surgeon Percy Lockhart-Mummery described 3 families with polyposis and CRCs at an early age. In 1939, Cuthbert E. Dukes, together with Lockhart-Mummery, published 7 other families showing the same AD inheritance, and they attributed the condition to an inherited abnormality. In 1962, during his internal medicine residency, Henry Lynch came across a CRC patient with a family history similar to that of “family G” without evidence of polyposis, and in 1966 he published the pedigree of the family together with that of another family identified by Marjorie Shaw. Since then, international collaborations have been initiated by Lynch and colleagues with the aim of collecting and evaluating cancer-prone families, and it was not until 1984 that the terms hereditary non-polyposis colorectal cancer (HNPCC) and LS were coined.
During the late 1980s, the molecular basis of familial cancers began to be unraveled. In 1986, RB1 became the first identified cancer predisposition gene (CPG) [5, 6]. APC was identified in 1991, and MSH2 was identified as the first LS locus in 1993, when microsatellite instability (MSI) was also first described. Linkage analysis is the classic method that has contributed to the discovery of more than 100 high-to-moderate-risk CPGs [1, 8], including the major CRC risk genes [9,10,11]. During this era, it became known that 2–8% of CRCs could be attributed to inherited genetic defects. However, despite the great contribution of linkage analysis to CPG mapping, the number of identified genes soon reached a plateau, and the focus shifted toward a polygenic model of inheritance. In 2005, the first GWAS for a non-cancerous disease was published, and since then more than 50,000 associations have been reported, which have revolutionized the understanding of the genetic architecture of several inherited diseases, including cancer [12,13,14]. As for CRC, GWASs have identified ~100 validated loci accounting for ~12% of the familial relative risk, and have additionally offered a possible explanation for the inherited personalized risk of CRC when the family history is negative. However, as rare high-to-moderate-penetrance variants and common variants with ultra-small effect sizes are not well interrogated by GWASs, many questions regarding CRC inheritance were left unanswered [16, 17]. In the last 10 years, genome-wide sequencing approaches have led to the identification of a second batch of rare CRC genes that are mainly of moderate risk. These genes have helped define new familial CRC and polyposis syndromes.
Together, genome-wide approaches have shaped the current understanding of the inherited risk of CRC, and it is now believed that, contrary to most common cancers, which are dominated by polygenic architecture, CRC is equally governed by monogenic and polygenic inheritance. This review aims to highlight the complementary roles played by the different genomic approaches that have led to the current knowledge about the genetic architecture of inherited CRC (Fig. 1).
Linkage analysis: the classical high-penetrance CRC genes
Monogenic inheritance refers to a type of Mendelian inheritance whereby a trait is determined by the expression of a single high-risk gene. This term is related to the phrase “rare allele model of inheritance”, which attributes inherited diseases to the inheritance of any of various rare deleterious alleles. Each of these rare variants occurs at a frequency of < 1% and follows a monogenic inheritance architecture in the sense of elevating the risk two-fold or more above the background [1, 17]. Classically, multiple arguments have supported this model of inheritance, of which evolutionary theory provides the strongest ground by advocating that deleterious variants should be selected against, and hence cannot be common. Further supporting this model of inheritance are the multiple familial disorders that can be browsed in the Online Mendelian Inheritance in Man (OMIM) database.
Genome-wide linkage analysis, which was developed in the second half of the 20th century, is the classic statistical method for mapping Mendelian traits to their chromosomal locations by examining genome-wide markers, such as microsatellite markers (and, later on, single nucleotide polymorphisms (SNPs)), in pedigrees segregating the trait. After the chromosomal region linked to the trait has been identified, targeted sequencing is conducted to identify the particular responsible gene (Fig. 1A). An accurate extended pedigree with a near-typical Mendelian inheritance pattern is an important prerequisite for linkage analysis to yield a statistically significant result.
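The strength of linkage evidence in such pedigrees is conventionally summarized by a LOD (logarithm of the odds) score, a standard statistic that this review does not itself describe. The following Python sketch makes the simplifying assumption of phase-known, fully informative meioses and compares the likelihood of the observed recombinants at a recombination fraction theta with the likelihood under no linkage (theta = 0.5). The function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (illustrative assumption).

    recombinants: number of meioses in which marker and trait recombined
    meioses:      total number of informative meioses in the pedigree
    theta:        candidate recombination fraction, between 0 and 0.5
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")

    non_recombinants = meioses - recombinants
    # Likelihood of the data if marker and trait locus are linked at theta.
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    # Likelihood of the same data if the loci segregate independently.
    likelihood_unlinked = 0.5 ** meioses

    if likelihood_linked == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical pedigree: 10 informative meioses, no recombinants observed.
lod_complete_linkage = two_point_lod(recombinants=0, meioses=10, theta=0.0)
print(f"LOD at theta = 0: {lod_complete_linkage:.2f}")
```

By convention, a LOD score of about 3 or more is taken as significant evidence of linkage. In this toy case, ten informative meioses without recombination are only just enough to reach that level, which illustrates why extended pedigrees are a prerequisite.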
As for CRC, linkage analysis has allowed the identification of the 10 classic high-penetrance genes that contribute to the well-known hereditary CRC and polyposis syndromes. These genes include the 4 DNA mismatch repair (MMR) genes, MLH1, MSH2, MSH6 and PMS2, which are inherited in an AD manner and cause LS when mutated. There are also the APC gene, with AD mutations, and the MUTYH gene, which follows autosomal recessive (AR) inheritance, both of which contribute to FAP, together with SMAD4, BMPR1A, STK11 and PTEN, which contribute to the less frequent CRC-predisposing syndromes characterized by the presence of hamartomatous polyps [10, 11]. These discoveries of hereditary CRC and polyposis syndromes, together with the development of multiple-CPG panels, have had significant implications for clinical oncology practice, facilitating cancer predisposition services. In addition, patients with some CRC predisposition syndromes can particularly benefit from specific therapeutic options, such as immunotherapy for LS patients.
On the other hand, two major problems have come to light with the discovery of these high-penetrance CPGs. The first is the psychosocial burden of the available genetic information on carrier families, who need to adapt to living with the uncertainty of several family members manifesting the various tumors predicted by genetic testing. An MSH2 carrier, for example, has an 80.4% probability of developing any cancer by the age of 75 years, including a 43% risk of CRC and, for women, a 65.7% risk of a gynecological cancer, in addition to a 50% probability that each of their offspring carries the variant with its risks of various malignancies. The second concern is that many familial cases are not explained by these genes. This gap between the number of familial CRC cases and the number explained by high-risk genes was well illustrated in 2021 by Hemminki et al., who studied familial CRCs in the Swedish population. They showed that among the 49,000 CRCs that were diagnosed in one generation, there were 7,650 patients (15.7%) who had a family history of CRC in parents or siblings; among these, only 417 (5.5%) had two or more family members (parents or siblings) affected by CRC, which could be suggestive of underlying high-risk CRC genes.
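To make the size of this gap concrete, the proportions can be recomputed from the counts quoted above. The short Python sketch below uses only those rounded counts, so the first percentage comes out at about 15.6%, close to the 15.7% reported. The variable and function names are hypothetical, and the final figure (multiply affected families as a share of all CRCs) is a derived illustration rather than a number taken from the cited study.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Rounded counts as quoted in the text for the Swedish population study.
total_crc_cases = 49_000
cases_with_family_history = 7_650
cases_with_two_or_more_relatives = 417

familial_share = percentage(cases_with_family_history, total_crc_cases)
multi_relative_share_of_familial = percentage(
    cases_with_two_or_more_relatives, cases_with_family_history
)
# Derived illustration: same numerator, but relative to all CRC cases.
multi_relative_share_of_all = percentage(
    cases_with_two_or_more_relatives, total_crc_cases
)

print(f"Familial CRC among all CRC: {familial_share:.1f}%")
print(f"Two or more relatives among familial CRC: {multi_relative_share_of_familial:.1f}%")
print(f"Two or more relatives among all CRC: {multi_relative_share_of_all:.2f}%")
```

On these counts, families with a pattern suggestive of a high-risk gene make up well under 1% of all CRC cases, which leaves most familial clustering unaccounted for by the classic genes.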
Genome-wide association studies: polygenic architecture of CRC
At the beginning of the 21st century, with the gap in the understanding of the genetic architecture of diseases left by the monogenic model of inheritance, the focus shifted to the complex polygenic inheritance of common low-risk variants, for which GWAS has become the most popular approach to finding associations at the population level. In comparison to genome-wide linkage analysis, which examines genome-wide polymorphic markers within a single family in which only a few recombination events have occurred, the GWAS approach examines SNPs throughout the genomes of thousands of genetically diverse individuals, allowing a higher-resolution map (Fig. 1B). However, this advantage comes with an increased probability of false-positive results if this is not properly accounted for. The term “polygenic inheritance” refers to a non-Mendelian model of inheritance, whereby a trait is governed by the co-inheritance of 2 or more independent genes that exert additive effects. With the development of GWAS, this model has gone through two major refinements. The first was the “common disease-common variant hypothesis”, which attributes traits to the co-inheritance of a small number of moderate-effect loci. Then came the better-developed “infinitesimal model of inheritance”, which attributes traits to the co-inheritance of hundreds or thousands of common variants of small effect size. As for malignancies, it is believed that most common cancers, such as lung and prostate cancer, are dominated by polygenic inheritance, and that most rare malignancies are dominated by monogenic inheritance. However, as the “common disease-common variant model” has not been able to explain much of the “missing heritability” of diseases, it is not surprising that some cancers do not follow this paradigm. Examples are ovarian cancer, which is dominated by monogenic inheritance despite being a common disease, and CRC, which seems to be equally dictated by monogenic and polygenic architectures.
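One common source of false positives in GWAS is the sheer number of statistical tests performed; the review does not specify which source it has in mind, so treating it as multiple testing is an assumption here. The following Python sketch shows the widely used Bonferroni correction, under the further assumption of roughly one million independent common-variant tests, which is the usual rationale for the conventional genome-wide significance threshold of 5 × 10^-8. The function names are hypothetical.

```python
def bonferroni_threshold(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate
    at or below family_wise_alpha across n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


def expected_false_positives(per_test_alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every null hypothesis is true
    and each test is judged at per_test_alpha without correction."""
    return per_test_alpha * n_tests


# Assumed number of effectively independent SNP tests (illustrative only).
n_independent_tests = 1_000_000

uncorrected = expected_false_positives(0.05, n_independent_tests)
corrected_threshold = bonferroni_threshold(0.05, n_independent_tests)

print(f"Expected false positives at p < 0.05: {uncorrected:,.0f}")
print(f"Bonferroni-corrected threshold: {corrected_threshold:.1e}")
```

The contrast between tens of thousands of expected chance hits at a nominal threshold and a corrected threshold in the 10^-8 range is one reason GWAS requires very large case-control cohorts.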
GWASs have identified ~100 independent common low-risk loci for CRC [11, 15]. These discoveries have multiple applications. Firstly, they have provided an explanation for about 12% of the familial relative risk of CRC that does not manifest typical Mendelian inheritance [11, 15]. Secondly, GWAS has allowed the development of polygenic risk scores (PRSs), which sum the risk conferred by the various risk loci carried by an individual and thereby predict that individual’s personalized risk of cancer. These PRSs could theoretically be used to develop personalized cancer screening protocols. A British group has suggested adopting a personalized screening protocol for CRC based on a PRS in the top 1% of risk, which confers a 7.7-fold increased risk of developing CRC. Such an approach could result in 26% fewer subjects being eligible for screening than under the currently used age-based protocol, which could have positive economic implications. However, it would also decrease the number of cases detected by screening by about 6%. In addition, while an elevated PRS is associated with a theoretical few-fold increased risk of a disease, this is a relative risk in comparison to individuals across the PRS continuum within a specific population [23, 25]. As a consequence, its utility for individual-level risk prediction is limited, as is its transferability across populations. Thus, PRSs are still considered research tools and have not been put into clinical practice [23, 25].
Thirdly, GWASs have allowed the discovery of novel oncogenic pathways through further analysis and functional characterization of the genes at candidate loci. It is estimated that 20% of the loci identified by GWASs include a pathogenic gene that is involved in monogenic disease inheritance. In fact, GWAS might be more important for understanding the somatic pathways of CRC than germline processes. A meta-analysis of GWASs conducted in 2019 identified, in addition to the established pathways related to TGF-β/SMAD, BMP, Wnt-β-catenin, Hedgehog signaling, cell cycle, and telomere maintenance, further signals implicating Krüppel-like factors, Hippo-YAP signaling, long noncoding RNAs, and somatic drivers, and supported a role for immune function. However, this expanding knowledge about the polygenic architecture of CRC is challenging the current understanding of CRC biological pathways, as well as the ability to develop precision therapeutics [15, 28]. Finally, as GWAS is a population-based approach designed to detect associations with common low-risk loci and not rare moderate-to-high-risk genes, many familial CRC cases have remained unexplained despite the discoveries revealed by GWAS.
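The polygenic risk scores discussed in this section are most commonly computed as a weighted sum of risk-allele counts, with per-locus weights taken from GWAS effect sizes (log odds ratios). The Python sketch below illustrates that general construction only; it is not the scoring method of any study cited here, and the SNP identifiers, weights, and reference scores are invented for illustration.

```python
def polygenic_risk_score(dosages: dict, log_odds_weights: dict) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per locus).

    dosages:          risk-allele count per SNP for one individual
    log_odds_weights: per-allele log odds ratio per SNP, e.g. from a GWAS
    """
    missing = set(log_odds_weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weight * dosages[snp] for snp, weight in log_odds_weights.items())


def percentile_in_reference(score: float, reference_scores: list) -> float:
    """Percentage of reference-population scores at or below this score."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Hypothetical weights and genotype; not real CRC loci or effect sizes.
example_weights = {"snp_a": 0.10, "snp_b": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 1}
example_reference = [0.0, 0.1, 0.25, 0.4]

example_score = polygenic_risk_score(example_dosages, example_weights)
example_percentile = percentile_in_reference(example_score, example_reference)
print(f"PRS = {example_score:.2f}, percentile = {example_percentile:.0f}")
```

The second function makes the limitation noted in the text explicit: a score only acquires meaning relative to the distribution in a particular reference population, so the same genotype can rank differently in another population.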
Genome-wide sequencing approach: moderate-penetrance CRC genes and the new familial CRC and polyposis syndromes
A decade after the development of GWAS, whole-exome sequencing (WES) and whole-genome sequencing (WGS) approaches were developed and led to the identification of various putative cancer predisposition variants that could have been identified neither by linkage analysis, due to their non-Mendelian inheritance, nor by GWASs, due to their rarity in general populations (Fig. 1C). As for CRC, genome-wide sequencing approaches have allowed the identification of a second batch of high-to-moderate-risk genes, together with a new group of familial CRC and polyposis syndromes. These syndromes include polymerase proofreading-associated polyposis (PPAP), which results from AD mutations of the POLE or POLD1 genes; NTHL1-associated polyposis, whose causative gene, similar to MUTYH, is involved in the base-excision repair (BER) pathway and also follows AR inheritance; and mismatch repair gene biallelic inactivation-related adenomatous polyposis due to mutations of MSH3 or MLH3. Additional syndromes include GREM1-associated hereditary mixed polyposis syndrome (HMPS1), RNF43-associated serrated polyposis, and RPS20 mutations, which are a rare cause of hereditary nonpolyposis CRC. With these discoveries coming to light, it can now be said that monogenic and polygenic architectures contribute equally to the inheritance of CRC, and together they account for about 25% of familial clustering.
However, as with the development of GWAS, the growing application of genome-wide sequencing and the increasing amount of available genetic data have led to growing concerns regarding the utility of this knowledge. Variants of unknown significance (VUS) in known CPGs are one of these problems. Lucci-Cordisco et al. have reviewed current data regarding VUS in 24 CPGs included by the American College of Medical Genetics/Association for Molecular Pathology in the list of genes that should be considered for the return of incidental findings. In this review, which was based on ClinVar Miner (accessed on September 14th, 2021), the number of identified VUS for the APC gene, for example, was about 5100, compared with 1300 pathogenic variants. Another problem is the increasing number of CRC cases observed in families harboring germline variants in CPGs that are not generally associated with CRC, which raises the question of whether screening for CRC should be initiated in such families. BRCA1 and BRCA2, which are associated with hereditary breast and ovarian cancer, are the best studied in this context. Nevertheless, the debate over whether pathogenic BRCA variants increase the risk of CRC is still ongoing [41, 42]. Other genes raising similar debates were summarized in a review by Valle et al.
Not infrequently, genome-wide sequencing of familial or early-onset CRC cases, as well as within CRC families, has led to the identification of novel candidate genes that need validation. In family-based studies, data are often analyzed using filtering approaches whereby a few affected and unaffected family members are prioritized for sequencing, to exclude variants that are not shared by affected individuals. Such an approach takes into consideration the type of the variant and the data available in diverse genomic in silico databases. In addition to the well-established InSiGHT database, which provides evidence-based classification of variants in high-penetrance CRC genes, novel approaches are emerging for the evaluation of variants in novel candidate genes. An example is the familial cancer variant prioritization pipeline (FCVPP), which has been developed for the detection of deleterious germline variants with potential clinical importance in cancer predisposition. In addition, there are increasing proposals for next-generation linkage analysis methodologies to be coupled with family-based WGS studies instead of the filtering approach [16, 29]. Such efforts further illustrate the complementary roles of the various genome-wide approaches.
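The family-based filtering approach described above can be sketched as a few set operations. The following Python example is a deliberately simplified illustration and not the FCVPP or any published pipeline: it keeps variants shared by all sequenced affected relatives, optionally drops those also seen in unaffected relatives, and retains only rare variants, using the < 1% frequency cut-off mentioned earlier for rare alleles. All variant identifiers, frequencies, and function names are hypothetical.

```python
def filter_candidate_variants(
    affected: list,
    unaffected: list,
    population_af: dict,
    max_af: float = 0.01,
    exclude_unaffected: bool = True,
) -> list:
    """Prioritize variants segregating with disease in one family.

    affected:      one set of variant IDs per sequenced affected relative
    unaffected:    one set of variant IDs per sequenced unaffected relative
    population_af: population allele frequency per variant ID;
                   variants absent from the lookup are treated as novel (0.0)
    """
    if not affected:
        raise ValueError("at least one affected relative is required")

    # Keep only variants carried by every affected relative.
    shared = set.intersection(*affected)

    if exclude_unaffected:
        # Assumes high penetrance. With moderate-risk genes, unaffected
        # carriers are expected, so this step can discard true candidates.
        shared -= set().union(*unaffected)

    # Keep only rare variants, consistent with a rare-allele model.
    return sorted(v for v in shared if population_af.get(v, 0.0) < max_af)


# Hypothetical family: two affected relatives and one unaffected relative.
affected_calls = [{"var1", "var2", "var3"}, {"var1", "var2", "var4"}]
unaffected_calls = [{"var2"}]
allele_frequencies = {"var1": 0.0001, "var2": 0.0002, "var3": 0.2}

strict_candidates = filter_candidate_variants(
    affected_calls, unaffected_calls, allele_frequencies
)
lenient_candidates = filter_candidate_variants(
    affected_calls, unaffected_calls, allele_frequencies, exclude_unaffected=False
)
print(strict_candidates, lenient_candidates)
```

The `exclude_unaffected` switch is included because the moderate-risk genes discussed in this section often show incomplete penetrance, which is one motivation for the proposals to replace hard filtering with linkage-based methods.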
To conclude, genome-wide linkage analysis, GWAS, and genome-wide sequencing approaches have allowed the current comprehensive, but not exhaustive, understanding of the genetic architecture of CRC. Nevertheless, much of the heritability of the disease has remained unexplained, and might be attributed to a broader model, including other biological and environmental modifiers.
Availability of data and materials
All reviewed papers were cited in the references.
Abbreviations
GWAS: Genome-wide association study
FAP: Familial adenomatous polyposis
HNPCC: Hereditary non-polyposis colorectal cancer
OMIM: Online Mendelian Inheritance in Man
SNPs: Single nucleotide polymorphisms
PRSs: Polygenic risk scores
HMPS: Hereditary mixed polyposis syndrome
VUS: Variants of unknown significance
FCVPP: Familial cancer variant prioritization pipeline
References
1. Rahman N. Realizing the promise of cancer predisposition genes. Nature. 2014;505(7483):302–8.
2. Lynch HT, Snyder CL, Shaw TG, Heinen CD, Hitchins MP. Milestones of Lynch syndrome: 1895–2015. Nat Rev Cancer. 2015;15(3):181–94.
3. Dukes CE. Familial intestinal polyposis. Ann Eugen. 1952;17(Part 1):1–29.
4. Plazzer JP, Sijmons RH, Woods MO, Peltomaki P, Thompson B, Den Dunnen JT, et al. The InSiGHT database: utilizing 100 years of insights into Lynch syndrome. Fam Cancer. 2013;12(2):175–80.
5. Hino O, Kobayashi T. Mourning Dr. Alfred G. Knudson: the two-hit hypothesis, tumor suppressor genes, and the tuberous sclerosis complex. Cancer Sci. 2017;108(1):5–11.
6. Berry JL, Polski A, Cavenee WK, Dryja TP, Murphree AL, Gallie BL. The RB1 Story: Characterization and Cloning of the First Tumor Suppressor Gene. Genes (Basel). 2019;10(11):879.
7. Friedrich A, Kullmann F. Familial adenomatous polyposis syndrome (FAP): pathogenesis and molecular mechanisms. Med Klin (Munich). 2003;98(12):776–82.
8. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003;33(Suppl):228–37.
9. Turnbull C, Sud A, Houlston RS. Cancer genetics, precision prevention and a call to action. Nat Genet. 2018;50(9):1212–8.
10. Valle L. Genetic predisposition to colorectal cancer: where we stand and future perspectives. World J Gastroenterol. 2014;20(29):9828–49.
11. Valle L, de Voer RM, Goldberg Y, Sjursen W, Forsti A, Ruiz-Ponte C, et al. Update on genetic predisposition to colorectal cancer and polyposis. Mol Aspects Med. 2019;69:10–26.
12. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45(D1):D896–901.
13. Tam V, Patel N, Turcotte M, Bosse Y, Pare G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019;20(8):467–84.
14. Loos RJF. 15 years of genome-wide association studies and no signs of slowing down. Nat Commun. 2020;11(1):5900.
15. Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet. 2019;51(1):76–87.
16. Ott J, Wang J, Leal SM. Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet. 2015;16(5):275–84.
17. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13(2):135–45.
18. OMIM. Available from: https://www.omim.org/.
19. Hemminki K, Sundquist K, Sundquist J, Forsti A, Hemminki A, Li X. Familial Risks and Proportions Describing Population Landscape of Familial Cancer. Cancers (Basel). 2021;13(17):4385.
20. Luchini C, Bibeau F, Ligtenberg MJL, Singh N, Nottegar A, Bosse T, et al. ESMO recommendations on microsatellite instability testing for immunotherapy in cancer, and its relationship with PD-1/PD-L1 expression and tumour mutational burden: a systematic review-based approach. Ann Oncol. 2019;30(8):1232–43.
21. Moller P, Seppala TT, Bernstein I, Holinski-Feder E, Sala P, Gareth Evans D, et al. Cancer risk and survival in path_MMR carriers by gene and gender up to 75 years of age: a report from the Prospective Lynch Syndrome Database. Gut. 2018;67(7):1306–16.
22. NIH. Polygenic Trait: NIH; Available from: https://www.genome.gov/genetics-glossary/Polygenic-Trait.
23. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12(1):44.
24. Frampton MJ, Law P, Litchfield K, Morris EJ, Kerr D, Turnbull C, et al. Implications of polygenic risk for personalised colorectal cancer screening. Ann Oncol. 2016;27(3):429–34.
25. Kullo IJ, Lewis CM, Inouye M, Martin AR, Ripatti S, Chatterjee N. Polygenic scores in biomedical research. Nat Rev Genet. 2022;23(9):524–32.
26. Varghese JS, Easton DF. Genome-wide association studies in common cancers–what have we learnt? Curr Opin Genet Dev. 2010;20(3):201–9.
27. Hirschhorn JN. Genomewide association studies–illuminating biologic pathways. N Engl J Med. 2009;360(17):1699–701.
28. Low SK, Takahashi A, Mushiroda T, Kubo M. Genome-wide association study: a useful tool to identify common genetic variants associated with drug toxicity and efficacy in cancer pharmacogenomics. Clin Cancer Res. 2014;20(10):2541–52.
29. Vieland VJ, Devoto M. Next-generation linkage analysis. Hum Hered. 2011;72(4):227.
30. Palles C, Cazier JB, Howarth KM, Domingo E, Jones AM, Broderick P, et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet. 2013;45(2):136–44.
31. Weren RD, Ligtenberg MJ, Geurts van Kessel A, De Voer RM, Hoogerbrugge N, Kuiper RP. NTHL1 and MUTYH polyposis syndromes: two sides of the same coin? J Pathol. 2018;244(2):135–42.
32. Adam R, Spier I, Zhao B, Kloth M, Marquez J, Hinrichsen I, et al. Exome Sequencing Identifies Biallelic MSH3 Germline Mutations as a Recessive Subtype of Colorectal Adenomatous Polyposis. Am J Hum Genet. 2016;99(2):337–51.
33. Olkinuora A, Nieminen TT, Martensson E, Rohlin A, Ristimaki A, Koskenvuo L, et al. Biallelic germline nonsense variant of MLH3 underlies polyposis predisposition. Genet Med. 2019;21(8):1868–73.
34. Jaeger E, Leedham S, Lewis A, Segditsas S, Becker M, Cuadrado PR, et al. Hereditary mixed polyposis syndrome is caused by a 40-kb upstream duplication that leads to increased and ectopic expression of the BMP antagonist GREM1. Nat Genet. 2012;44(6):699–703.
35. Carballal S, Rodriguez-Alcalde D, Moreira L, Hernandez L, Rodriguez L, Rodriguez-Moranta F, et al. Colorectal cancer risk factors in patients with serrated polyposis syndrome: a large multicentre study. Gut. 2016;65(11):1829–37.
36. Valle L. Recent Discoveries in the Genetics of Familial Colorectal Cancer and Polyposis. Clin Gastroenterol Hepatol. 2017;15(6):809–19.
37. Levkova M, Stoyanova M, Benkova-Petrova M, Georgieva M, Angelova L. Variants of uncertain significance in the era of next-generation sequencing. J Am Assoc Nurse Pract. 2022;34(8):1018–21.
38. Lucci-Cordisco E, Amenta S, Panfili A, Del Valle J, Capella G, Pineda M, et al. Variants of uncertain significance (VUS) in cancer predisposing genes: What are we learning from multigene panels? Eur J Med Genet. 2022;65(1):104400.
39. Miller DT, Lee K, Chung WK, Gordon AS, Herman GE, Klein TE, et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021;23(8):1381–90.
40. NCBI. ClinVar Miner. Available from: https://clinvarminer.genetics.utah.edu/.
41. Esteban-Jurado C, Franch-Exposito S, Munoz J, Ocana T, Carballal S, Lopez-Ceron M, et al. The Fanconi anemia DNA damage repair pathway in the spotlight for germline predisposition to colorectal cancer. Eur J Hum Genet. 2016;24(10):1501–5.
42. Garre P, Martin L, Sanz J, Romero A, Tosar A, Bando I, et al. BRCA2 gene: a candidate for clinical testing in familial colorectal cancer type X. Clin Genet. 2015;87(6):582–7.
43. Kumar A, Bandapalli OR, Paramasivam N, Giangiobbe S, Diquigiovanni C, Bonora E, et al. Familial Cancer Variant Prioritization Pipeline version 2 (FCVPPv2) applied to a papillary thyroid cancer family. Sci Rep. 2018;8(1):11635.
It is a pleasure to acknowledge Christian Sutter, a senior staff scientist at the Heidelberg Institute of Human Genetics, who suggested a few useful references that helped in drafting the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Ahmad, O., Försti, A. The complementary roles of genome-wide approaches in identifying genes linked to an inherited risk of colorectal cancer. Hered Cancer Clin Pract 21, 1 (2023). https://doi.org/10.1186/s13053-023-00245-5
- High-risk gene
- Low-risk gene
- Inherited Cancer
- Cancer predisposition