Unravelling genetic variants of a Swedish family with high risk of prostate cancer

Prostate cancer is the most prevalent cancer in men worldwide. It is a polygenic disease with a substantial heritable component. Identification of novel candidate biomarkers is crucial for clinical cancer prevention and the development of therapeutic strategies. Here, we describe the analysis of rare and common genetic variants that can predispose to the development of prostate cancer. Whole-genome sequencing was performed on germline DNA of five Swedish siblings who were diagnosed with prostate cancer. High-risk variants were identified by requiring a minor allele frequency < 0.01, CADD > 10 and, if tested in PRACTICAL, OR > 1.5, while low-risk variants were identified by requiring a minor allele frequency > 0.01, CADD > 10 and, if tested in PRACTICAL, OR > 1.1. We identified 38 candidate high-risk gene variants and 332 candidate low-risk gene variants shared by the brothers with prostate cancer, of which 2 and 14 variants, respectively, were in coding regions. This study expands the knowledge of potential risk factor candidates involved in hereditary and familial prostate cancer. Our findings can be beneficial when applying targeted screening in families with a high risk of developing the disease.


Background
Prostate cancer (PrCa) is the second most frequent cancer in men worldwide and the most commonly diagnosed cancer among men in the United States and most of the Western world [1]. PrCa is a heterogeneous disease that can be associated with several risk factors, including age, race, family history of PrCa, diet and environmental factors [2]. Epidemiological and twin studies have supported the pivotal role of genetic predisposition in the development of PrCa. A positive family history of PrCa has been associated with a more than two-fold increased risk of the disease [3]. Furthermore, having a brother with PrCa confers a greater risk of being affected by the disease than being the son of a PrCa patient [4]. In addition, Scandinavian twin studies have demonstrated that 42% of PrCa risk was due to heritable factors and that the probability of having PrCa was 21% for a man with an affected monozygotic twin and 6% for a man with an affected dizygotic twin [5], with a difference of 3.7 years between the diagnoses of the first and second monozygotic twin [6].
For these reasons, germline mutations are becoming the focus of a growing body of research looking for inherited variations that are suitable for prognosis and treatment. Although the androgen receptor (AR) has a pivotal role in PrCa, mutations in the AR gene account for a small fraction of all cases [7], underlining that different genetic variations may contribute to the formation of prostate tumors. Evidence has shown that the genetic contribution to inherited PrCa consists of a mixture of rare gene variants with high to moderate penetrance and common variants with low penetrance. Linkage studies have identified 8q24 as a significant PrCa risk region [8], and the missense G84E variant in the HOXB13 gene [9] is highly associated with an increased risk of PrCa. It has been shown that variants in DNA repair genes, such as the BRCA1/2, ATM, CHEK2 and NBN genes, are associated with an increased risk of developing PrCa, in particular in men with advanced/metastatic PrCa [10]. Retrospective and association studies have shown that carriers of pathogenic BRCA2 variants have a two- to six-fold higher relative risk of PrCa, while carriers of pathogenic BRCA1 variants have a moderate risk [11,12]. Moreover, knowledge of a patient's germline status has become increasingly important in predicting the response to targeted treatments, especially in advanced or metastatic conditions [13]. The polygenic nature of PrCa susceptibility has drawn attention to genome-wide association studies (GWAS) and to the formation of large collaborative international consortia that include thousands of cases and controls to detect PrCa risk variants. Since the first GWAS in 2006 [14], around 170 variants have been identified as risk factors for PrCa susceptibility. However, they can only explain approximately 38% of the familial relative risk of PrCa [15], and additional evidence is required to include them in routine genetic testing.
Here, we present the results from five brothers with PrCa who were previously shown not to carry any of the high-risk variants so far identified in the literature. Through whole-genome sequencing of their germline DNA, we identified novel high- and low-risk genetic variants that could contribute to the PrCa risk observed in their family.

Study population
The individuals in this study were five brothers from a Swedish family that had undergone genetic counselling at the Department of Clinical Genetics, Karolinska University Hospital Solna, Sweden. The five brothers were diagnosed with PrCa between the ages of 64 and 80 years. They also had a brother diagnosed with a neuroendocrine pancreatic tumor (NET) at the age of 79 years, and a seventh brother who was healthy at the age of 61 years.
Additional family history of PrCa includes their father and a paternal uncle, who were both diagnosed with PrCa. No other cancer was known in the family on the mother's or father's side. All seven brothers gave written informed consent to participate in the study, and DNA was isolated from peripheral blood.

Whole-genome sequencing of prostate cancer family samples
DNA was quantified using a Qubit Fluorometer (Life Technologies, USA) and converted to sequencing libraries using a PCR-free paired-end protocol (Illumina TruSeq DNA PCR-free for > 1000 ng input). Sequencing was done on the Illumina NovaSeq 6000 platform, aiming at 30x median coverage, and was performed at Clinical Genomics Stockholm, SciLifeLab.

Bioinformatics workflow
Sequencing reads were aligned to the reference genome GRCh37 using BWA [16], and Picard (http://picard.sourceforge.net) was used to mark PCR-duplicated reads. Variants were called using the GATK best practice procedure as implemented at the Broad Institute (www.broadinstitute.org/gatk). Variant annotation was done using the Ensembl Variant Effect Predictor (VEP) [17], including RefSeq gene annotation and dbSNP153. The max minor allele frequency (MMAF) was obtained from the Genome Aggregation Database (gnomAD) [18], or from the SweGen database [19] when missing from gnomAD. To predict pathogenic effects of the variants, the in silico prediction database CADD (Combined Annotation Dependent Depletion) [20] was used, while to predict the associated risk for PrCa, the odds ratios (OR) and P values from the PRACTICAL project [21] were used. Variants on the autosomes and sex chromosomes were analysed. Variants located within segmental duplication regions obtained from the UCSC genome browser [22] were excluded.
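The two annotation rules described above (the MMAF fallback from gnomAD to SweGen, and the exclusion of variants in segmental duplications) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function names, the half-open interval convention and the example coordinates are all assumptions made for illustration, and the index covers a single chromosome.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def max_minor_allele_frequency(gnomad_af: Optional[float],
                               swegen_af: Optional[float]) -> Optional[float]:
    """Return the gnomAD frequency, falling back to SweGen when gnomAD has none."""
    if gnomad_af is not None:
        return gnomad_af
    return swegen_af


def build_index(intervals: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Merge overlapping half-open (start, end) intervals on one chromosome.

    Returns parallel sorted lists of merged starts and ends for binary search.
    """
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def in_segmental_duplication(pos: int, index: Tuple[List[int], List[int]]) -> bool:
    """True if pos lies inside any merged interval (start inclusive, end exclusive)."""
    starts, ends = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


# Hypothetical segmental duplication intervals on one chromosome.
segdup_index = build_index([(100, 200), (150, 300), (500, 600)])
# Hypothetical variants: (position, gnomAD AF, SweGen AF).
variants = [(250, 0.02, None), (400, None, 0.004)]
kept = [(pos, max_minor_allele_frequency(g, s))
        for pos, g, s in variants
        if not in_segmental_duplication(pos, segdup_index)]
```

Merging the intervals first keeps each lookup to a single binary search, which matters when tens of thousands of variants are checked against a genome-wide track.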

High- and low-risk PrCa predisposing variants
Variants detected in all siblings with PrCa were divided into two groups: rare high-risk variants and common low-risk variants. Rare high-risk variants were defined as variants with MMAF (gnomAD/SweGen) < 0.01, CADD > 10 and, if tested in PRACTICAL [21], OR > 1.5. Common low-risk variants were defined as variants with MMAF (gnomAD) > 0.01, CADD > 10 and, if tested in PRACTICAL [21], OR > 1.1 and P-value < 0.05. In both categories, variants shared by all brothers with PrCa that were not imputed in the OncoArray, the custom high-density genotyping array from the PRACTICAL project [21], were also considered as risk factors. For all variants, a CADD score greater than or equal to 10 indicates that the variant is predicted to be among the 10% most deleterious substitutions possible in the human genome, while a score greater than or equal to 20 indicates the 1% most deleterious.
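The selection criteria above can be written as a small decision rule. The following Python sketch only illustrates the thresholds stated in the text. The function names are hypothetical, variants absent from PRACTICAL are represented by `None`, and the treatment of a variant with MMAF exactly equal to 0.01 (left unclassified here) is an assumption, since the text does not specify it. The second function restates the PHRED-like scaling of CADD scores, under which a score of 10 corresponds to the top 10% and a score of 20 to the top 1% of substitutions.

```python
from typing import Optional


def classify_variant(mmaf: float,
                     cadd: float,
                     practical_or: Optional[float] = None,
                     practical_p: Optional[float] = None) -> Optional[str]:
    """Classify a shared variant as 'high-risk', 'low-risk' or None.

    practical_or / practical_p are None when the variant was not tested in PRACTICAL.
    """
    if cadd <= 10:
        return None  # both categories require CADD > 10
    if mmaf < 0.01:
        # Rare variant: keep if untested, or if tested with OR > 1.5.
        if practical_or is None or practical_or > 1.5:
            return "high-risk"
        return None
    if mmaf > 0.01:
        # Common variant: keep if untested, or if tested with OR > 1.1 and P < 0.05.
        if practical_or is None:
            return "low-risk"
        if practical_or > 1.1 and practical_p is not None and practical_p < 0.05:
            return "low-risk"
        return None
    return None  # MMAF == 0.01 is not covered by either definition (assumption)


def cadd_top_fraction(phred_score: float) -> float:
    """Fraction of most deleterious substitutions implied by a scaled CADD score."""
    return 10 ** (-phred_score / 10)


# Hypothetical variants, for illustration only.
print(classify_variant(mmaf=0.001, cadd=25.0))
print(classify_variant(mmaf=0.20, cadd=12.0, practical_or=1.15, practical_p=0.01))
```

Keeping the PRACTICAL values optional mirrors the "if tested" wording: an untested variant is retained on frequency and CADD alone.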

Enrichment analysis
The annotated variants were imported into R and Bioconductor. The disgenet2r package [24] was used for the gene-neoplastic disease association analysis and the neoplastic process enrichment analysis, while the clusterProfiler package [25] was used for the gene ontology enrichment analysis.
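Enrichment analyses of this kind are commonly based on an over-representation test, in which the overlap between a gene list and an annotation term is compared with the hypergeometric expectation. The Python sketch below illustrates that general idea only. It is not the implementation of disgenet2r or clusterProfiler, and apart from the 225 genes reported in this study for the low-risk set, all numbers in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int,
                           selected: int, overlap: int) -> float:
    """One-sided over-representation P value, P(X >= overlap).

    universe:  number of background genes
    annotated: background genes annotated to the term
    selected:  genes in the list of interest
    overlap:   genes in the list that are annotated to the term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the hypergeometric probabilities of observing k >= overlap annotated genes.
    tail = sum(comb(annotated, k) * comb(universe - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical example: 225 candidate genes, 12 of which fall in a term that
# annotates 300 of 20,000 background genes.
p_value = hypergeom_enrichment_p(universe=20000, annotated=300,
                                 selected=225, overlap=12)
```

In practice, the P values of all tested terms would additionally be corrected for multiple testing.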

Identification of 38 novel candidate high-risk variants in inherited prostate cancer
To identify PrCa predisposing genes, we analysed the whole-genome sequences of five brothers who were diagnosed with PrCa. To test whether they had inherited a genetic susceptibility to cancer, we first checked whether they had pathogenic variants in genes included in the comprehensive hereditary cancer panel (https://blueprintgenetics.com/tests/panels/hereditary-cancer/comprehensive-hereditary-cancer-panel/#panel_content-headin). We found 4 variants carried by all brothers, but none of them had any known pathogenic consequences according to ClinVar (Table 1). Next, we searched for genetic variants that were shared by all siblings and found 69,548 shared variants, of which 68,846 were in non-coding regions and 702 were in coding transcripts. Since the genetic contribution to inherited PrCa includes a variety of rare and common variants, we started our analysis of the shared variants by looking for high-risk candidates. First, we selected variants with MMAF < 0.01 and CADD > 10 as the most likely high-risk variants. We excluded variants with low risk (OR < 1.5) in the PRACTICAL study [21]. We identified 38 variants in 35 genes suggested to be associated with high PrCa risk. Most were intronic variants; however, two variants, rs150518260 and rs139884486, were located within the coding regions of the SGCB and IRF4 genes, respectively, and they were also the most deleterious substitutions (CADD > 20) found among our high-risk variants (Table 2). Furthermore, we found that eight of the suggested genes (NFIB, FBXW7, PPARA, ETV4, NCOA2, CYP7B1, OPCML, IRF4) have been demonstrated to be involved in neoplastic processes (Supplementary Fig. 1A), and four of them, the ETV4, NCOA2, CYP7B1 and PPARA genes, have been shown to be associated with malignant neoplasms of the prostate (Supplementary Fig. 1B and Supplementary Table 1).
None of the siblings carried the rare high-risk missense variant (G84E) in the HOXB13 gene, but they all carried one upstream variant in the non-coding region of the HOXB3 gene (Table 2).
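The step of finding variants shared by all affected siblings amounts to intersecting the per-sample call sets. A minimal Python sketch is given below, assuming each variant is represented by a hypothetical (chromosome, position, reference allele, alternative allele) key and ignoring zygosity and genotype quality. The sample names and coordinates are invented for illustration.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def shared_variants(calls_by_sample: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return the variants present in every sample's call set."""
    call_sets = list(calls_by_sample.values())
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Hypothetical call sets for three samples.
calls = {
    "brother_1": {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")},
    "brother_2": {("1", 1000, "A", "G"), ("2", 2000, "C", "T")},
    "brother_3": {("1", 1000, "A", "G"), ("3", 3000, "G", "A")},
}
common = shared_variants(calls)
```

Including the alleles in the key ensures that two siblings carrying different alternative alleles at the same position are not counted as sharing a variant.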

Identification of novel candidate low-risk variants in inherited prostate cancer
Since the five siblings affected by PrCa shared a high number of variants, we decided to also study low-risk PrCa candidates. All variants shared by the five affected brothers with MMAF > 0.01, CADD > 10 and, for those tested in the PRACTICAL study, OR > 1.1 were selected for further analysis. In total, 332 variants in 225 genes (Supplementary Table 2) were suggested as potential low-risk candidates in familial PrCa. We found that most of the variants, 227, were in the non-coding transcript region, while 88 variants were in the intragenic region and 17 in coding regions (Table 3). Interestingly, we found that the selected genes have been described in association with several neoplastic diseases, with PrCa among the top enriched neoplastic processes (Supplementary Fig. 2A). Gene ontology analysis showed that these genes were enriched for pathways involved in early development, but also for pathways related to cell division, DNA replication and apoptotic signaling (Supplementary Fig. 2B). Fourteen variants with a potential pathogenic effect (non-synonymous, frameshift, inframe deletion) were seen in 12 genes (Table 4), four of which (KRT18, TET2, IL32, SMPD1) have already been shown to be associated with PrCa (Supplementary Fig. 2C). Moreover, nine of these fourteen variants, located in the TET2, NOP16, PRIM2, CACNA1B, PSMD13, SMD13, KRT18, MEF18 and KRT10 genes, belong to the 1% most deleterious substitutions in the human genome, with a CADD score > 20 (Table 4). Among the low-risk variants, all siblings carried two upstream variants in the non-coding regions of the HOXB3 gene and one downstream variant in the intragenic region of the HOXB6 gene (Supplementary Table 2). In addition, we found that two variants in the KRT18 gene (chr12:53343318 C > T and chr12:53343325 T > A) were located on the same allele.

Discussion
To identify variants potentially contributing to hereditary PrCa, we sequenced the whole genome of siblings in a family with a seemingly high risk of PrCa. As expected, only about 1% of the shared variants were located within protein-coding regions of the genome. Although the stability, structure and biochemical function of proteins are important factors implicated in the development of PrCa, several other mechanisms that regulate protein levels through transcription and translation may have a pivotal role in driving tumorigenesis. We initially screened for rare variants that could potentially represent high-risk factors in the development of PrCa. As a second approach, we searched for variants considered to have a low-risk impact on the development of the disease. Two candidate high-risk missense variants were found in the SGCB and IRF4 genes. The SGCB gene has not previously been linked to any neoplastic disease; mutations in this gene have instead been associated with the development of limb-girdle muscular dystrophy type 2E [26]. In contrast, IRF4, an important regulator of the immune response, has been shown to be associated with many lymphoid malignancies, with evidence pointing to a pivotal role in multiple myeloma [27]. Since inflammation is a risk factor for prostate carcinogenesis, it is possible that variations in the IRF4 gene contribute to an increased risk of PrCa. Moreover, we found that four of our high-risk variants were situated in genes previously demonstrated to be involved in the initiation and progression of PrCa. It has been shown that ETV4 may be involved in the initial events of PrCa development when it is fused with the TMPRSS2 locus [28]. The expression of NCOA2 has been demonstrated to promote PrCa metastasis, and its inhibition has therapeutic potential [29].
The expression of CYP7B1 has been shown to be upregulated in prostate cancer [30], and it has also been demonstrated that a single polymorphism in the promoter of this gene affects its expression [31]. Furthermore, we observed a variant in the PPARA gene, which is overexpressed in advanced PrCa [32]. Variants in this gene have been associated with pesticide exposure and increased risk of PrCa [33], underlining the importance of gene-environment interaction as a contributing factor to increased risk of PrCa. In our analysis, we did not find pathogenic variants in any of the genes recommended for genetic testing by the National Comprehensive Cancer Network guidelines for PrCa [23]; however, in our high-risk group we found that the brothers carried one novel variant in the upstream non-coding region of the HOXB3 gene. Further, in our low-risk category we found that they carried two novel variants in the HOXB3 gene and one novel variant in the HOXB6 gene, but none of them carried the well-known high-risk missense variant (G84E) in the HOXB13 gene [9]. Although none of the siblings carried the rare HOXB13 mutation (G84E), our results suggest that the HOXB3 and HOXB6 genes may also be PrCa susceptibility genes and that germline mutations in these genes may play a role in predisposing to the disease. The variants that we found in the HOXB3 and HOXB6 genes were located in upstream and downstream intragenic regions, and misregulation of the expression of these genes may lead to changes in downstream gene expression and signaling pathways that play fundamental roles in the development of PrCa.
We identified several candidate low-risk variants, as well as two missense variants, rs34402524 and rs28927679, in the TET2 and PSMD13 genes, respectively, previously shown to be associated with low PrCa risk (OR of 1.1) [21]. It is well known that, aside from genetic factors, epigenetic factors can also contribute to the initiation and progression of PrCa [34]. TET2 is an enzyme involved in DNA demethylation, and it has been shown to be an important player during tumorigenesis [35]. TET2 has been shown to be able to bind to the androgen receptor and modulate its signalling pathway. Downregulation of TET2 drives PrCa proliferation and is strongly associated with reduced patient survival, suggesting that the expression level of this protein can be used as a biomarker for PrCa progression [36]. DNA methylation studies have demonstrated that the DNA methylation pattern is often altered in cancer compared to normal tissue [37]. This underlines that a better understanding of the role of enzymes that modify DNA methylation, including the TET family, could help to classify and predict PrCa clinical outcomes more accurately than clinical parameters alone, and that these enzymes could be promising therapeutic targets. Interestingly, we found three variants in the KRT18 gene, previously linked to PrCa. KRT18 is a well-known epithelial marker in diagnostic histopathology [38], and its downregulation is associated with prostate cancer aggressiveness [39].
This study has several limitations that need to be considered. The cohort of patients is limited to a single high-risk family containing five brothers with PrCa. The study does not include other affected or unaffected individuals in the family. Strict criteria to identify high- and low-risk variants using population frequency exclude more common variants that could contribute to increased risk of PrCa. Furthermore, in silico prediction programs may be inaccurate, resulting in the exclusion of variants of high impact. Our variants were not validated with a secondary method, nor did we identify the variants in additional high-risk families. Moreover, our study does not include an analysis of epigenetic variation that could contribute to the increased risk of PrCa in this particular family.

Conclusions
PrCa is a complex disease whose risk can be influenced by several genes and pathways. The identification of risk genes is crucial for genetic counselling of PrCa families and for applying the proper therapeutic strategies. The effect of a single genetic variant on the relative risk is probably low. Our study provides additional support that the accumulation of variants with low or moderate effect in the germline of an individual contributes to the risk of PrCa. Further studies are needed to estimate the contribution of these variants and genes to PrCa.