disgenet2r: An R package to explore the molecular underpinnings of human diseases

Introduction

The disgenet2r package contains a set of functions to retrieve, visualize and expand DISGENET data (Piñero et al. 2021, 2019). DISGENET is a comprehensive discovery platform that integrates more than 30 million associations between genes, variants, and human diseases. The information in DISGENET has been extracted from expert-curated resources and from the literature using state-of-the-art text mining technologies (Table 1).

To use DISGENET and the disgenet2r package, you need to acquire a license. Please contact us for license conditions and pricing.

Table 1: Sources of DISGENET data
Source_Name        Type_of_data  Description
CLINGEN            GDAs          The Clinical Genome Resource
ORPHANET           GDAs          The portal for rare diseases and orphan drugs
PSYGENET           GDAs          Psychiatric disorders Gene association NETwork
HPO                GDAs          Human Phenotype Ontology
MGD_HUMAN          GDAs          Mouse Genome Database, human data
MGD_MOUSE          GDAs          Mouse Genome Database, mouse data
RGD_HUMAN          GDAs          Rat Genome Database, human data
RGD_RAT            GDAs          Rat Genome Database, rat data
UNIPROT            GDAs/VDAs     The Universal Protein Resource
CLINVAR            GDAs/VDAs     ClinVar Database
GWASCAT            GDAs/VDAs     The NHGRI-EBI GWAS Catalog
PHEWASCAT          GDAs/VDAs     The PHEWAS Catalog
TEXTMINING_HUMAN   GDAs/VDAs     Data from text mining MEDLINE abstracts, human
TEXTMINING_MODELS  GDAs          Data from text mining MEDLINE abstracts, models
CLINICALTRIALS     GDAs          Data from ClinicalTrials.gov
CURATED            GDAs/VDAs     Human curated sources: ClinGen, UniProt, Orphanet, PsyGeNET, ClinVar, MGD Human
INFERRED           GDAs          Inferred data from the HPO and the GWAS Catalog
MODELS             GDAs          Data from animal models: MGD_MOUSE and TEXTMINING_MODELS
ALL                GDAs/VDAs     All data sources

You can test DISGENET and the disgenet2r package by registering for a free trial account.

disgenet2r package usage limits

Trial account

Please note that the trial account enables you to test all the functions of the disgenet2r package, but queries to the DISGENET database are subject to the following restrictions:

  • Only the top-30 results ordered by descending DISGENET score are returned (pagination is not supported).

  • Multiple-entity queries support at most 10 entities (genes, diseases, variants).

  • Access to DISGENET with a trial account expires 7 days after the day of activation.

Other plans

There are limits in place for the disgenet2r package to ensure smooth performance for all users. These limits apply to academic, advanced, and premium users, mirroring the limits of the DISGENET REST API.

Here’s a breakdown of the limitations:

  • A maximum of 100 pages of results are returned.

  • Multiple-entity queries support at most 100 entities (genes, diseases, variants).

Important Note: The package will display a warning message if you exceed these limits.

Recommendations for Efficient Use:

To improve performance and avoid exceeding limits, consider querying with smaller batches of entities. You can also use DISGENET metrics and annotations to refine your search and reduce the number of returned results.
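As an illustration of the batching recommendation, the sketch below splits a long vector of entities into groups that stay within the 100-entity limit. The helper split_into_batches and the identifiers in my_entities are hypothetical and are not part of disgenet2r; they only show one way the splitting could be done in base R.

```r
# Hypothetical helper (not part of disgenet2r): split a vector of entities
# into batches that respect the multiple-entity limit.
split_into_batches <- function(entities, batch_size = 100) {
  stopifnot(batch_size >= 1)
  entities <- unique(entities)              # avoid sending the same entity twice
  if (length(entities) == 0) return(list())
  # Items 1..batch_size get group 1, the next batch_size items get group 2, ...
  groups <- ceiling(seq_along(entities) / batch_size)
  unname(split(entities, groups))
}

# Toy input: 250 made-up identifiers, used only to show the splitting
my_entities <- paste0("GENE", 1:250)
batches <- split_into_batches(my_entities, batch_size = 100)
lengths(batches)   # sizes of the three batches: 100, 100 and 50
```

Each batch could then be passed in turn to a query function such as gene2disease, and the resulting data frames combined afterwards.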

Installation and first run

The package disgenet2r is available through GitLab. The package requires an R version > 3.5.

Install disgenet2r by typing in R:

library(devtools)
install_gitlab("medbio/disgenet2r")

To load the package:

library(disgenet2r)

Once you have completed the registration process, go to your user profile and retrieve your API key.

After retrieving the API key from your user profile, run the lines below so the key is available for all the disgenet2r functions.

api_key <- "enter your API key here"
Sys.setenv(DISGENET_API_KEY= api_key)
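Before running any query, it can be useful to confirm that the environment variable is actually set. The sketch below uses only base R; the helper has_api_key is hypothetical and not part of disgenet2r. It only checks that the variable is non-empty, not that the key is valid.

```r
# Hypothetical helper (not part of disgenet2r): TRUE when the environment
# variable holding the API key is set to a non-empty value.
has_api_key <- function(var = "DISGENET_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (has_api_key()) {
  message("API key found in DISGENET_API_KEY.")
} else {
  message("DISGENET_API_KEY is not set; set it with Sys.setenv() before querying.")
}
```

To avoid typing the key in every session, the variable can also be defined in the user's .Renviron file, which R reads at startup.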

In the following document, we illustrate how to use the disgenet2r package through a series of examples.

Quick Start

The functions in the disgenet2r package receive as parameters one entity (gene, disease, variant, or chemical), a list of entities (up to 100), or combinations of them. In addition, they have the following parameters:

  • score
    A vector with two elements: the initial and the final value of the score range. Default 0-1.

  • database
    Name of the database that will be queried. Default CURATED. It can take the values: ‘CLINGEN’, ‘CLINVAR’, ‘ORPHANET’, ‘PSYGENET’, ‘UNIPROT’, ‘CURATED’, ‘HPO’, ‘GWASCAT’, ‘PHEWASCAT’, ‘INFERRED’, ‘MGD_HUMAN’, ‘MGD_MOUSE’, ‘RGD_HUMAN’, ‘RGD_RAT’, ‘TEXTMINING_MODELS’, ‘MODELS’, ‘TEXTMINING_HUMAN’, ‘CLINICALTRIALS’, and ‘ALL’.

  • n_pags
    A number between 1 and 100 indicating the number of pages to retrieve from the results of the query. Default 100. If a number of pages larger than 100 is indicated, the function will stop.

  • verbose
    By default FALSE. Change it to TRUE to enable real-time logging from the function.

  • order_by
    By default score. Depending on the type of query, it can accept the following values: score, dsi, dpi, pli, pmYear, ei, yearInitial, yearFinal, numCTsupportingAssociation.
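The parameter rules listed above can be checked locally before a query is sent. The sketch below is a hypothetical helper, check_query_args, which is not part of disgenet2r. It assumes that the score lies between 0 and 1, as the default suggests, and it takes the database names from the list above.

```r
# Database names as documented in the parameter list above
valid_databases <- c("CLINGEN", "CLINVAR", "ORPHANET", "PSYGENET", "UNIPROT",
                     "CURATED", "HPO", "GWASCAT", "PHEWASCAT", "INFERRED",
                     "MGD_HUMAN", "MGD_MOUSE", "RGD_HUMAN", "RGD_RAT",
                     "TEXTMINING_MODELS", "MODELS", "TEXTMINING_HUMAN",
                     "CLINICALTRIALS", "ALL")

# Hypothetical helper (not part of disgenet2r): returns a character vector
# describing the problems found; an empty vector means the arguments look fine.
check_query_args <- function(score = c(0, 1), database = "CURATED", n_pags = 100) {
  problems <- character(0)
  # Assumption: scores lie in [0, 1] and the range is given as c(min, max)
  if (!is.numeric(score) || length(score) != 2 || anyNA(score) ||
      score[1] > score[2] || score[1] < 0 || score[2] > 1) {
    problems <- c(problems, "score must be c(min, max) with 0 <= min <= max <= 1")
  }
  if (!(length(database) == 1 && database %in% valid_databases)) {
    problems <- c(problems, "database is not one of the documented values")
  }
  if (!is.numeric(n_pags) || length(n_pags) != 1 || is.na(n_pags) ||
      n_pags < 1 || n_pags > 100) {
    problems <- c(problems, "n_pags must be a single number between 1 and 100")
  }
  problems
}

check_query_args(score = c(0.3, 1), database = "ALL")          # no problems
check_query_args(score = c(1, 0.3), database = "TEXT MINING")  # two problems
```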

Below is an example of a query for the BRCA1 gene in all the data (database = "ALL"). Notice that this query has 341 pages of results, but only the first 10,000 results are retrieved (100 pages, 100 results per page).

results <- gene2evidence( gene = "BRCA1", vocabulary = "HGNC", database = "ALL")
## Notice that your query has a maximum of 341 pages.
## By using the default n_pags (100), your query of 341 pages has been reduced to 100 pages.
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        BRCA1 
##  . Results:  10000
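The numbers in the messages above follow from simple arithmetic, sketched here in base R. The variable names are illustrative only; the values are the ones reported by the query.

```r
results_per_page <- 100   # results per page, as stated in the text
total_pages      <- 341   # pages reported for the BRCA1 query
n_pags           <- 100   # default page limit

pages_retrieved   <- min(total_pages, n_pags)
results_retrieved <- pages_retrieved * results_per_page
pages_skipped     <- total_pages - pages_retrieved

c(pages_retrieved   = pages_retrieved,
  results_retrieved = results_retrieved,
  pages_skipped     = pages_skipped)
```

With these values, 100 pages (10,000 results) are retrieved and 241 pages are left out, so narrowing the query, for example with a score range or a more specific database, is the way to see the remaining associations.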

Retrieving Gene-Disease Associations from DISGENET

Searching by gene

The gene2disease function retrieves the GDAs in DISGENET for a given gene, or for a list of genes. The gene(s) can be identified by either the NCBI gene identifier or the official Gene Symbol, and the type of identifier used must be specified using the parameter vocabulary. By default, vocabulary = "HGNC". To switch to Entrez NCBI Gene identifiers, set vocabulary to "ENTREZ".

The function also requires the user to specify the source database using the argument database. By default, all the functions in the disgenet2r package use as source database CURATED, which includes GDAs from PsyGeNET, ClinGen, ClinVar, MGD Human data, UniProt, and Orphanet.

The results can be filtered by the DISGENET score. The argument score defines the score range used in the search. It is entered as a vector whose first element is the lower bound and whose second element is the upper bound of the range. Both bounds are always included. By default, score=c(0,1).

In the example, the query for the Leptin Receptor (Gene Symbol LEPR, and Entrez NCBI Identifier 3953) is performed in the curated data in DISGENET.

results <- gene2disease( gene = 3953, vocabulary = "ENTREZ",
                       database = "CURATED")

The function gene2disease produces an object DataGeNET.DGN that contains the results of the query.

class(results)
## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"

Type the name of the object to display its attributes: the input parameters, such as whether a single entity or a list was searched (single or list), the type of query (gene-disease), the selected database (CURATED), the score range used in the search (0-1), and the gene NCBI identifier (3953).

results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        3953 
##  . Results:  67

To obtain the data frame with the results of the query, access the qresult slot:

tab <- results@qresult
head( tab, 3 )
##   gene_symbol geneid       ensemblid   geneNcbiType geneDSI geneDPI    genepLI
## 1        LEPR   3953 ENSG00000116678 protein-coding   0.412   0.957 8.8607e-05
## 2        LEPR   3953 ENSG00000116678 protein-coding   0.412   0.957 8.8607e-05
## 3        LEPR   3953 ENSG00000116678 protein-coding   0.412   0.957 8.8607e-05
##       uniprotids protein_classid protein_class_name
## 1 Q4G138, P48357    DTO_05007599          Signaling
## 2 Q4G138, P48357    DTO_05007599          Signaling
## 3 Q4G138, P48357    DTO_05007599          Signaling
##                    disease_name diseaseType diseaseUMLSCUI
## 1                       Obesity     disease       C0028754
## 2 Adult-Onset Diabetes Mellitus     disease       C0011860
## 3             Diabetes Mellitus     disease       C0011849
##                                                                            diseaseClasses_MSH
## 1 Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)
## 2                   Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)
## 3                   Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)
##       diseaseClasses_UMLS_ST
## 1 Disease or Syndrome (T047)
## 2 Disease or Syndrome (T047)
## 3 Disease or Syndrome (T047)
##                                        diseaseClasses_DO
## 1                        disease of metabolism (0014667)
## 2 genetic disease (630), disease of metabolism (0014667)
## 3 genetic disease (630), disease of metabolism (0014667)
##                                                                           diseaseClasses_HPO
## 1                                                                 Growth abnormality (01507)
## 2 Abnormality of metabolism/homeostasis (01939), Abnormality of the endocrine system (00818)
## 3 Abnormality of metabolism/homeostasis (01939), Abnormality of the endocrine system (00818)
##   numCTsupportingAssociation numPMIDs
## 1                          7       14
## 2                          0        4
## 3                          1        1
##                                                                                                                                                                                                                                                                                                                                                                              chemicalsIncludedInEvidence
## 1 C0041984, C0039601, C0014942, C0245514, C0028128, C0019392, C0076275, C0017986, C0039286, C0045811, C0002006, C1145760, C1135174, uridine, testosterone, estrone, troglitazone, nitric oxide, hesperidin, orlistat, glycyrrhetinic acid, tamoxifen, 2-amino-1-methyl-6-phenylimidazo(4,5-b)pyridine, aldosterone, treprostinil, H-Indol-2-one, 3-((3,5-dimethyl-1H-pyrrol-2-yl)methylene)-1,3-dihydro-
## 2                                                                                                                                                                                                                                                                                                                        C0041984, C1504945, C0038432, C0021936, uridine, INO-1001, streptozocin, inulin
## 3                                                                                                                                                                                                                                                         C0039601, C0025598, C0038432, C0028193, C0001041, C1307704, testosterone, metformin, streptozocin, nitroprusside, acetylcholine, RUBOXISTAURIN
##                                                                                                                                                                                                                    numberPmidsWithChemsIncludedInEvidenceBySource
## 1 ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 11, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 0, 1, 2, 0, 0
## 2  ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0
## 3  ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0
##   score yearInitial yearFinal evidence_level evidence_index diseaseid
## 1   1.0        1986      2023             NA      0.8634538  C0028754
## 2   0.9        2010      2016             NA      0.9112903  C0011860
## 3   0.9        2003      2003             NA      0.8260870  C0011849

The same query can be performed using the Gene Symbol (LEPR) and the data source (TEXTMINING_HUMAN). Notice how the number of diseases associated to the Leptin Receptor has increased.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  420

The same query can be performed using the ENSEMBL gene identifier of the LEPR gene (ENSG00000116678) by setting the vocabulary to ENSEMBL.

results <- gene2disease( gene = "ENSG00000116678",
                        vocabulary = "ENSEMBL",
                       database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        ENSG00000116678 
##  . Results:  420

Additionally, a minimum threshold for the score can be defined. In the example, a cutoff of score=c(0.3,1) is used. Notice how the number of diseases associated to the Leptin Receptor drops when the score is restricted.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "ALL",
                       score =c(0.3,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.3-1 
##  . Term:        LEPR 
##  . Results:  93
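Because both bounds of the score range are inclusive, the same filter can be reproduced locally on a data frame that has already been retrieved. The sketch below uses a toy data frame whose rows are invented for illustration; it only mimics the score column of results@qresult.

```r
# Toy stand-in for results@qresult (values are illustrative only)
toy <- data.frame(
  disease_name = c("A", "B", "C", "D"),
  score        = c(1.0, 0.3, 0.29, 0.85),
  stringsAsFactors = FALSE
)
score_range <- c(0.3, 1)

# Both bounds are inclusive, so scores of exactly 0.3 or 1 are kept
kept <- toy[toy$score >= score_range[1] & toy$score <= score_range[2], ]
kept$disease_name
```

Here the rows A, B and D are kept, and C (score 0.29) is dropped.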

Table 2 shows the top 10 diseases associated to the LEPR gene.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )
knitr::kable(tab[1:10,], caption = "Top diseases associated to LEPR" ) 
Table 2: Top diseases associated to LEPR
gene_symbol disease_name score yearInitial yearFinal
LEPR Obesity 1.00 1966 2024
LEPR Adult-Onset Diabetes Mellitus 0.90 1966 2024
LEPR Diabetes Mellitus 0.90 1981 2023
LEPR High blood pressure 0.85 1998 2022
LEPR Hyperinsulinism 0.85 1986 2022
LEPR Polyphagia 0.85 1986 2023
LEPR Morbid Obesities 0.85 1995 2024
LEPR Hyperglycemia 0.80 1986 2024
LEPR NAFLD - Nonalcoholic Fatty Liver Disease 0.80 2006 2024
LEPR Diabetes, Gestational 0.80 1999 2024
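Selecting rows with tab[1:10, ] relies on the results already being ordered by score, and it pads the output with NA rows when fewer than 10 rows are available. A slightly more defensive variant is sketched below on a toy data frame; the helper top_n_gdas is hypothetical and the rows are illustrative only.

```r
# Toy stand-in for the selected columns of results@qresult
toy <- data.frame(
  gene_symbol  = "LEPR",
  disease_name = c("Obesity", "Diabetes Mellitus", "Obesity", "Hyperglycemia"),
  score        = c(1.00, 0.90, 1.00, 0.80),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of disgenet2r)
top_n_gdas <- function(df, n = 10) {
  df <- unique(df)                                 # drop duplicated rows
  df <- df[order(df$score, decreasing = TRUE), ]   # highest score first
  head(df, n)                                      # at most n rows, no NA padding
}

top_n_gdas(toy, n = 2)
```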

Visualizing the diseases associated to a single gene

The disgenet2r package offers two options to visualize the results of querying a single gene in DISGENET: a network showing the diseases associated to the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of the diseases associated to the gene (Gene-Disease Class Network). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Gene-Disease Network on a DataGeNET.DGN object (Figure 1). In the Gene-Disease Network the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter adjusts the size of the nodes, while the eprop parameter adjusts the width of the edges while keeping the proportionality to the score.

plot( results,
      type = "Network",
      prop = 20, eprop = 5, verbose = TRUE)

Figure 1: The Gene-Disease Network for the Leptin Receptor gene

Use interactive = TRUE to display an interactive plot (Figure 2).

plot( results,
      type = "Network",
       interactive = TRUE)

Figure 2: The interactive Gene-Disease Network for the Leptin Receptor gene

The results can also be visualized in a network in which diseases are grouped by MeSH Disease Class if the class argument is set to DiseaseClass (Gene-Disease Class Network, Figure 3). In the Gene-Disease Class Network, the node size is proportional to the fraction of diseases in the disease class, with respect to the total number of diseases with disease classes associated to the gene. In the example, the Leptin Receptor is associated mainly to Nutritional and Metabolic Diseases. Two diseases in the example have no MeSH disease class annotation (reported as a warning).
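The fraction that determines the node size can be approximated by hand from the diseaseClasses_MSH column. The sketch below is not the package's own implementation. It assumes that a disease annotated with several classes counts once in each of them. The first three rows reuse the class annotations shown earlier for LEPR, and the fourth row is invented to represent a disease without annotation.

```r
toy <- data.frame(
  disease_name = c("Obesity", "Adult-Onset Diabetes Mellitus",
                   "Diabetes Mellitus", "Unclassified example"),
  diseaseClasses_MSH = c(
    "Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)",
    "Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)",
    "Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)",
    NA),
  stringsAsFactors = FALSE
)

# Diseases without a MeSH class are excluded from the denominator
annotated <- toy[!is.na(toy$diseaseClasses_MSH), ]

# Class names may contain commas, so split only on a comma that follows ")"
classes <- strsplit(annotated$diseaseClasses_MSH, "(?<=\\)),\\s*", perl = TRUE)

class_counts   <- table(unlist(classes))
class_fraction <- class_counts / nrow(annotated)
round(class_fraction, 2)
```

In this toy case all three annotated diseases fall in Nutritional and Metabolic Diseases, so that class would get the largest node.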

plot( results,
      class = "DiseaseClass",
      interactive = TRUE, verbose = TRUE)

Figure 3: The Disease Class Network for the Leptin Receptor Gene

Exploring the attributes of a gene

The gene2attribute function retrieves the information for a specific gene, or list of genes.

results <- gene2attribute( gene  = "3953", vocabulary = "ENTREZ"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene 
##  . Database:     ALL 
##  . Score:         
##  . Term:        3953

The result shows the Disease Specificity Index (DSI) and the Disease Pleiotropy Index (DPI) for the gene (Table 3).

tab <-results@qresult
knitr::kable(tab, caption = "Gene attributes for LEPR") 
Table 3: Gene attributes for LEPR
description geneid gene_symbol ensembl_ids uniprotids proteinClasses ncbi_type geneDSI geneDPI genepLI
leptin receptor 3953 LEPR ENSG00000116678 Q4G138 DTO_05007599, DTO , Signaling protein-coding 0.412 0.957 8.86e-05
leptin receptor 3953 LEPR ENSG00000116678 P48357 DTO_05007599, DTO , Signaling protein-coding 0.412 0.957 8.86e-05

Exploring the evidences associated to a gene

You can extract the evidences associated to a particular gene using the function gene2evidence. Additionally, you can explore the evidences for a specific gene-disease pair by specifying the disease identifier using the argument disease.

results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
                        disease ="UMLS_C3554225", database = "ALL")
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  18

The results are shown in Table 4.

library(dplyr)  # provides %>%, filter(), select(), arrange() and desc()

tab <- results@qresult
tab <- tab %>%
  dplyr::filter(reference_type == "PMID") %>%
  dplyr::select(reference, associationType, pmYear, sentence) %>%
  dplyr::arrange(desc(pmYear))

tab <- tab %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(  pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) ) ) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY" ) 
Table 4: Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY
pmid associationType Year Sentence
37140700 GeneticVariation 2023 In conclusion, we reported ten new patients with leptin and leptin receptor deficiencies and identified six novel LEPR variants expanding the mutational spectrum of this rare disorder.
33922961 GeneticVariation 2021 Recently, we discovered a spontaneous compound heterozygous mutation within the leptin receptor, resulting in a considerably more obese phenotype than described for the homozygous leptin receptor deficient mice.
29158088 AlteredExpression 2018 In this study, we demonstrate that leptin receptor activation directly affects iron metabolism by the finding that serum levels of hepcidin, the master regulator of iron in the whole body, were significantly lower in leptin-deficient (ob/ob) and leptin receptor-deficient (db/db) mice.
25751111 GeneticVariation 2015 Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect.
24611737 CausalMutation 2014 Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population.
22810975 GeneticVariation 2012 Variants in the LEPR gene are nominally associated with higher BMI and lower 24-h energy expenditure in Pima Indians.
18703626 CausalMutation 2008 Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor.
17229951 CausalMutation 2007 Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
16284652 CausalMutation 2005 Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes.
12646666 GeneticVariation 2003 Binge eating as a major phenotype of melanocortin 4 receptor gene mutations.
12031989 AlteredExpression 2002 These data demonstrate that leptin is not needed for ObR gene expression, and they suggest that leptin plays a role in receptor downregulation because sObR levels are negatively correlated with leptin levels and BMI in control subjects, whereas sObR levels are not depressed in obese leptin-deficient or leptin receptor-deficient individuals.
9860295 GeneticVariation 1998 Transmission disequilibrium and sequence variants at the leptin receptor gene in extremely obese German children and adolescents.
9537324 CausalMutation 1998 A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9537324 GeneticVariation 1998 A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9144432 GeneticVariation 1997 Amino acid variants in the human leptin receptor: lack of association to juvenile onset obesity.

To visualize the results when there are many evidences, we suggest plotting them with the argument type set to Points (Figure 4). It is important to set the parameter limit to 10,000 in order to include all the evidences in the plot.

results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
                        database = "ALL", score=c(0.7,1) )
plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 4: The Evidences plot for the Leptin Receptor gene

Searching multiple genes

The gene2disease function can also receive as input a list of genes, either as Entrez NCBI Gene Identifiers or Gene Symbols. In the example, we show how to create a vector with the Gene Symbols of several genes belonging to the family of voltage-gated potassium channels (Table 5) and then, we apply the function gene2disease.

Table 5: Example of voltage-gated potassium channel family members
Name Description
KCNE1 potassium channel, voltage gated subfamily E regulatory beta subunit 1
KCNE2 potassium channel, voltage gated subfamily E regulatory beta subunit 2
KCNH1 potassium channel, voltage gated eag related subfamily H, member 1
KCNH2 potassium channel, voltage gated eag related subfamily H, member 2
KCNG1 potassium voltage-gated channel modifier subfamily G member 1

Create the vector with the list of genes belonging to the voltage-gated potassium channel family:

myListOfGenes <- c( "KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

The gene2disease function also requires the user to specify the source database using the argument database, and optionally, the DISGENET score can also be applied to filter the results.

results <- gene2disease(
  gene     = myListOfGenes,
  database = "ALL",
  score    = c(0.5, 1),
  verbose  = TRUE
)
## Your query has 1 page.
## Warning in gene2disease(gene = myListOfGenes, database = "ALL", score = c(0.5, : 
##  One or more of the genes in the list is not in DISGENET ( 'ALL' ):
##    - KCNG1
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:       KCNE1 ... KCNH2 
##  . Results:  42
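To find out programmatically which genes of the input list returned no rows, the input vector can be compared with the gene symbols present in the results. The sketch below uses a toy vector in place of results@qresult$gene_symbol; its values are illustrative and do not reproduce the actual query output.

```r
myListOfGenes <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

# Toy stand-in for results@qresult$gene_symbol (illustrative values only)
genes_in_results <- c("KCNE1", "KCNH2", "KCNE2", "KCNH2")

# Genes of the input list with no rows in the results
missing_genes <- setdiff(myListOfGenes, unique(genes_in_results))
missing_genes
```

A gene can be missing from the results either because it is not in the selected database or because none of its associations pass the filters, such as the score range, so it is worth checking both possibilities.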

Table 6 shows the top 10 GDAs for the list of genes belonging to the voltage-gated potassium channel family.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )

knitr::kable(tab[1:10,], caption = "Top GDAs for the list of genes belonging to the voltage-gated potassium channel family") 
Table 6: Top GDAs for the list of genes belonging to the voltage-gated potassium channel family
gene_symbol disease_name score yearInitial yearFinal
KCNE1 Jervell Lange Nielsen Syndrome 1.00 1993 2021
KCNH2 Arrhythmia 1.00 1975 2024
KCNH2 Long QT Syndrome 1.00 1970 2024
KCNE2 Long QT Syndrome 1.00 1999 2021
KCNH2 LONG QT SYNDROME 2 0.95 1986 2024
KCNH2 Cardiac Death, Sudden 0.90 2000 2024
KCNH2 SQT1 0.90 1999 2022
KCNE1 LONG QT SYNDROME 5 0.90 1991 2021
KCNE1 Long QT Syndrome 0.90 1975 2024
KCNE2 Atrial Fibrillation 0.85 2004 2022

Visualizing the diseases associated to multiple genes

By default, plotting a DataGeNET.DGN object resulting from a query with a list of genes produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5).

plot( results,
      type = "Network",
      prop = 10, verbose = TRUE)

Figure 5: The Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Set the argument interactive = TRUE to see an interactive network (Figure 6).

plot( results,
      type = "Network",
      prop = 10,  interactive=TRUE)

Figure 6: The interactive Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Setting the argument type to Heatmap produces a Gene-Disease Heatmap (Figure 7), where the scale of colors is proportional to the score of the GDA. The argument limit can be used to limit the number of rows to the top scoring GDAs. The argument nchars can be used to limit the length of the name of the disease. By default, the plot shows the 50 highest scoring GDAs.

plot( results,
      type  ="Heatmap",
      limit = 100, nchars = 60, interactive = TRUE, verbose = TRUE)

Figure 7: The Gene-Disease Heatmap for a list of genes belonging to the voltage-gated potassium channel family

These results can also be visualized as a Gene-Disease Class Heatmap by setting the argument type to Heatmap and class to DiseaseClass (Figure 8). In this case, diseases are grouped by their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, genes are associated mainly to Cardiovascular Diseases, and to Congenital, Hereditary, and Neonatal Diseases and Abnormalities.

plot( results, type = "Heatmap",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 8: The Gene-Disease Class Heatmap for a list of genes belonging to the voltage-gated potassium channel family

Alternatively, set the argument type to Network and class to DiseaseClass to generate a Gene-Disease Class Network (Figure 9).

plot( results, type = "Network",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 9: The Gene-Disease Class Network for a list of genes belonging to the voltage-gated potassium channel family

Exploring the evidences associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = myListOfGenes, 
                       database = "TEXTMINING_HUMAN", verbose  = TRUE)
## Your query has 24 pages.

To visualize the results, set the argument type = "Points" (Figure 10).

plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 10: The Evidences plot for a list of genes belonging to the voltage-gated potassium channel family

Exploring the clinical trials associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = c("IL3", "IL4", "IL5", "IL6", "IL0"), 
                       database = "CLINICALTRIALS", verbose  = TRUE )
## Your query has 106 pages.
## Notice that your query has a maximum of 106 pages.
## By using the default n_pags (100), your query of 106 pages has been reduced to 100 pages.
## Warning in gene2evidence(gene = c("IL3", "IL4", "IL5", "IL6", "IL0"), database = "CLINICALTRIALS", : 
##  One or more of the genes in the list is not in DISGENET ('CLINICALTRIALS'): IL0

To visualize the results, set the argument type = "Points" (Figure 11).

plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 11: The Evidences plot for a list of interleukins in clinical trials

Searching by gene and chemical

You can restrict GDAs to those annotated with a given chemical by passing a chemical identifier to the chemical argument of the gene2disease function. Table 7 shows the diseases associated to LEPR in evidence that mentions metformin.

results <- gene2disease( gene = "LEPR", vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN", 
                       chemical = "C0025598" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  4
tab <- results@qresult
tab <-tab%>% dplyr::select(chemical_name, gene_symbol, disease_name,  score)
knitr::kable(tab, caption = "GDAs for LEPR and metformin") 
Table 7: GDAs for LEPR and metformin
chemical_name gene_symbol disease_name score
metformin LEPR Ovary Syndrome, Polycystic 0.40
metformin LEPR Hepatic steatosis 0.35
metformin LEPR Schizophrenias 0.20
metformin LEPR Pulmonary arterial hypertension 0.10

Retrieving the chemicals associated to a gene

For GDAs that have a chemical annotation, we can perform a query with a gene, or list of genes, to retrieve the chemicals annotated to these associations.

results <- gene2chemical( gene  = "PDGFRA", 
                        vocabulary = "HGNC",
                          database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.8-1 
##  . Term:        PDGFRA 
##  . Results:  14
tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type == "PMID") %>%   dplyr::select(disease_name, chemical_name, chemical_effect,sentence, 
                           reference, pmYear)
tab <- tab %>% dplyr::rename(  Disease = disease_name, 
                             Chemical = chemical_name, `Chemical effect` =  chemical_effect,
                             Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid )  )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Selection of chemicals associated to PDGFRA" ) 
Table 8: Selection of chemicals associated to PDGFRA
Disease Chemical Chemical effect Sentence pmid Year
GIST Avapritinib therapeutic Avapritinib is the only potent and selective inhibitor approved for the treatment of D842V-mutant gastrointestinal stromal tumors (GIST), the most common primary mutation of the platelet-derived growth factor receptor α (PDGFRA). 38167404 2024
GIST Avapritinib therapeutic|therapeutic The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively. 38756640 2024
GIST imatinib therapeutic|therapeutic The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively. 38756640 2024
GIST 1-N’-[2,5-difluoro-4-[2-(1-methylpyrazol-4-yl)pyridin-4-yl]oxyphenyl]-1-N’-phenylcyclopropane-1,1-dicarboxamide therapeutic Ripretinib, a broad-spectrum inhibitor of the KIT and PDGFRA receptor tyrosine kinases, is designated as a fourth-line treatment for gastrointestinal stromal tumor (GIST). 38973363 2024
GIST sorafenib therapeutic Low Dose Sorafenib in Gastric Gastrointestinal Stromal Tumour with PDGFRA p.1843-D846 Deletion in an 88-Year-Old Male. 38576303 2024
GIST Avapritinib therapeutic Avapritinib is the only drug for adult patients with PDGFRA exon 18 mutated unresectable or metastatic gastrointestinal stromal tumor (GIST). 38803186 2024
GIST imatinib therapeutic PDGFRA mutations can explain response and sensitivity to imatinib in some GISTs lacking KIT mutations. 37890277 2023
GIST Avapritinib therapeutic|therapeutic Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation. 36155864 2023
GIST imatinib therapeutic|therapeutic Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation. 36155864 2023
GIST Avapritinib therapeutic To create an in vivo model of PDGFRA D842V-mutant gastrointestinal stromal tumor (GIST) and identify the mechanism of tumor persistence following avapritinib therapy. 36971786 2023

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=10000)

Figure 12: The Gene-Chemical Network for PDGFRA

Searching by disease

The disease2gene function retrieves the genes associated with a disease or a list of diseases. As input, it takes the disease or list of diseases of interest and the database (by default, CURATED). Each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO, and ID is the identifier in that vocabulary. A threshold value for the score can be set, as in the gene2disease function.

In the example below, we use the disease2gene function to retrieve the genes associated with the UMLS CUI C0036341 (Schizophrenia), restricting the query to the CURATED database and to a score range from 0.8 to 1.

results <- disease2gene( disease  = "UMLS_C0036341",
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0036341 
##  . Results:  130

Table 9 shows the top 10 genes associated with UMLS CUI C0036341.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )
knitr::kable(tab[1:10,], caption = "Top 10 genes associated to Schizophrenia") 
Table 9: Top 10 genes associated to Schizophrenia
gene_symbol disease_name score yearInitial yearFinal
COMT Schizophrenias 1 2005 2010
GABBR1 Schizophrenias 1 2007 2013
DISC1 Schizophrenias 1 2010 2011
DRD3 Schizophrenias 1 1999 1999
ZNF804A Schizophrenias 1 2008 2018
HTR2A Schizophrenias 1 2004 2008
CHRNA7 Schizophrenias 1 2011 2014
TNF Schizophrenias 1 2006 2006
GRIN2D Schizophrenias 1 2010 2010
RTN4R Schizophrenias 1 2004 2017

Visualizing the genes associated to a single disease

There are two options to visualize the results of a single-disease search: a Gene-Disease Network showing the genes related to the disease of interest (Figure 13), and a Disease-Protein Class Network with the genes grouped by Drug Target Ontology Protein Class (Figure 14).

Figure 13 shows the default Gene-Disease Network for Schizophrenia. As with the gene2disease function, the blue node is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.

plot ( results,
       prop = 10, interactive=TRUE)

Figure 13: The Gene-Disease Network for genes associated to Schizophrenia

Alternatively, in the Disease-Protein Class Network, genes are grouped by Drug Target Ontology Protein Class (Figure 14). This is a better choice when a large number of genes is associated with the disease. The plot is obtained by setting the class argument to “ProteinClass”. The resulting network shows the disease in blue and the Protein Classes of the associated genes in green. The node size is proportional to the number of genes in the Protein Class. In the example, the largest proportion of the genes associated with Schizophrenia are G-protein coupled receptors. Notice again that not all genes have Protein Class annotations.

plot( results,
      class="ProteinClass",
      interactive=TRUE)

Figure 14: The Protein Class-Disease Network for genes associated to Schizophrenia

The same results are obtained when querying DISGENET with the MeSH identifier for Schizophrenia (D012559).

results <- disease2gene( disease  = "MESH_D012559",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        MESH_D012559 
##  . Results:  130

The same results are obtained when querying DISGENET with the OMIM identifier for Schizophrenia (181500).

results <- disease2gene( disease  = "OMIM_181500",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        OMIM_181500 
##  . Results:  130

The same results are obtained when querying DISGENET with the ICD9-CM identifier for Schizophrenia (295).

results <- disease2gene( disease  = "ICD9CM_295",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        ICD9CM_295 
##  . Results:  130

The same results are obtained when querying DISGENET with the NCI identifier for Schizophrenia (C3362).

results <- disease2gene( disease  = "NCI_C3362", 
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        NCI_C3362 
##  . Results:  130

The same results are obtained when querying DISGENET with the HPO identifier for Schizophrenia (HP:0100753).

results <- disease2gene( disease  = "HPO_HP:0100753", 
                          database = "CURATED",
                         score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        HPO_HP:0100753 
##  . Results:  130
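The identifiers used in the queries above all follow the IDENT_ID pattern, so they can be assembled programmatically. The following is a minimal sketch in base R; make_disease_id is a hypothetical helper written for this illustration and is not part of disgenet2r. The list of accepted prefixes is copied from the description of disease2gene above.

```r
# Hypothetical helper (NOT part of disgenet2r): builds "IDENT_ID" strings.
# The vocabulary prefixes are the ones listed in the text for disease2gene.
valid_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                        "EFO", "NCI", "HPO", "MONDO", "ORDO")

make_disease_id <- function(vocabulary, id) {
  vocabulary <- toupper(vocabulary)
  # Stop early if a prefix is not in the list above
  unknown <- setdiff(vocabulary, valid_vocabularies)
  if (length(unknown) > 0) {
    stop("Unknown vocabulary prefix: ", paste(unknown, collapse = ", "))
  }
  paste(vocabulary, id, sep = "_")
}

# The Schizophrenia identifiers used in the examples above
schizophrenia_ids <- make_disease_id(
  c("UMLS", "MESH", "OMIM", "ICD9CM", "NCI", "HPO"),
  c("C0036341", "D012559", "181500", "295", "C3362", "HP:0100753")
)
print(schizophrenia_ids)
```

Each element of schizophrenia_ids has the same form as the disease argument passed to disease2gene in the queries above.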

Searching by disease and chemical

You can filter the results to find associations that are mentioned in the context of a chemical, like the example below.

results <- disease2gene( disease  = "UMLS_C0006142", chemical = "C0039286",
                          database = "ALL" , n_pags = 1 )
## Notice that your query has a maximum of 9 pages.
## By indicating n_pags = 1, your query of 9 pages has been reduced to 1 pages.
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        UMLS_C0006142 
##  . Results:  100
tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "chemical_name", "chemicalid")] )%>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to breast cancer") 
Table 10: Top GDAs associated to breast cancer
gene_symbol disease_name score chemical_name chemicalid
BRCA1 Cancer, Breast 1 tamoxifen C0039286
BRCA2 Cancer, Breast 1 tamoxifen C0039286
CDH1 Cancer, Breast 1 tamoxifen C0039286
ESR1 Cancer, Breast 1 tamoxifen C0039286
FGFR2 Cancer, Breast 1 tamoxifen C0039286
PIK3CA Cancer, Breast 1 tamoxifen C0039286
PTEN Cancer, Breast 1 tamoxifen C0039286
RAD51 Cancer, Breast 1 tamoxifen C0039286
TP53 Cancer, Breast 1 tamoxifen C0039286
CHEK2 Cancer, Breast 1 tamoxifen C0039286

Retrieving the chemicals associated to a disease

For GDAs that have a chemical annotation, we can query with a disease, or a list of diseases, to retrieve the chemicals annotated to these associations.

results <- disease2chemical( disease = "UMLS_C0010674", 
                           database = "TEXTMINING_MODELS" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_MODELS 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0010674 
##  . Results:  19
tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(gene_symbol, chemical_name,chemical_effect ,sentence, reference, pmYear) 
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                          `Chemical Effect`=chemical_effect ,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top chemicals associated to Cystic Fibrosis" ) 
Table 11: Top chemicals associated to Cystic Fibrosis
Gene Chemical Chemical Effect Sentence pmid Year
CFTR linaclotide other|other These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion. 38869953 2024
CFTR phenobarbital other|other These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion. 38869953 2024
CFTR dinoprostone other Additionally, the A140D polymorphism of GSTO1-1 was associated with lower levels of the antiinflammatory mediators PGE2 and 15(S)-HETE, and with lower values of the FEV1/FVC ratio in CF subjects with the homozygous CFTR ΔF508 mutation. 33583732 2021
CFTR lumacaftor therapeutic|therapeutic For CF patients and CF mice, we developed a HCO3- drinking test to assess the role of the cystic fibrosis transmembrane conductance regulator (CFTR) in urinary HCO3-excretion and applied it in the patients before and after treatment with the novel CFTR modulator drug, lumacaftor-ivacaftor. β-Intercalated cells express basolateral secretin receptors and apical CFTR and pendrin. 32703846 2020
CFTR ivacaftor therapeutic|therapeutic For CF patients and CF mice, we developed a HCO3- drinking test to assess the role of the cystic fibrosis transmembrane conductance regulator (CFTR) in urinary HCO3-excretion and applied it in the patients before and after treatment with the novel CFTR modulator drug, lumacaftor-ivacaftor. β-Intercalated cells express basolateral secretin receptors and apical CFTR and pendrin. 32703846 2020
CFTR lumacaftor therapeutic Activity of lumacaftor is not conserved in zebrafish Cftr bearing the major cystic fibrosis-causing mutation. 32123813 2019
CFTR lumacaftor therapeutic|therapeutic|therapeutic The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. 31300729 2019
CFTR Tezacaftor therapeutic|therapeutic|therapeutic The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. 31300729 2019
CFTR ivacaftor therapeutic|therapeutic|therapeutic The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. 31300729 2019
TNF digitoxin therapeutic The cardiac glycoside digitoxin, which has been shown to inhibit TNFα/NFκB signaling in CF lung epithelial cells, may serve as such a therapy. 31864360 2019

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 15: The Disease-Chemical Network associated to Cystic Fibrosis
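In Table 11, the Chemical Effect column sometimes repeats the same label separated by "|" (for example, therapeutic|therapeutic). Whether the repetition carries meaning is not stated here. The sketch below simply assumes that repeated labels can be merged for display. collapse_labels is a hypothetical helper, not a disgenet2r function, and the input vector is a toy copy of values seen in the table.

```r
# Toy values copied from the Chemical Effect column of Table 11
chemical_effect <- c("other|other", "other",
                     "therapeutic|therapeutic|therapeutic")

# Hypothetical helper (NOT part of disgenet2r): keeps each label once,
# preserving the order of first appearance.
collapse_labels <- function(x) {
  parts <- strsplit(x, "|", fixed = TRUE)  # fixed = TRUE: "|" is not a regex here
  vapply(parts, function(p) paste(unique(p), collapse = "|"), character(1))
}

print(collapse_labels(chemical_effect))
```

Applied to a results table, the same function could be used on the chemical_effect column before rendering it with knitr::kable.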

Exploring the attributes of a disease

The disease2attribute function retrieves the information available for a specific disease.

results <- disease2attribute( disease  = "UMLS_C0036341"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0036341 
##  . Results:  12

The results (Table 12) show the mappings to different disease vocabularies, and the disease type.

tab <- unique(results@qresult )
knitr::kable(tab[1:10,], caption = "Disease attributes for Schizophrenia") 
Table 12: Disease attributes for Schizophrenia
vocabulary code disease_name type diseaseClasses_UMLS_ST diseaseClasses_HPO diseaseClasses_DO diseaseClasses_MSH
MSH D012559 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
ICD10 F20 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
ICD10 F20.9 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
OMIM 181500 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
ICD9CM 295.90 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
HPO HP:0100753 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
NCI C3362 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
ICD9CM 295.9 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
ICD9CM 295 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)
DO 5419 Schizophrenias disease Mental or Behavioral Dysfunction (T048) Abnormality of the nervous system (00707) disease of mental health (150) Mental Disorders (F03)

Retrieving the UMLS CUIs via other vocabularies

It is possible to obtain the CUIs that map to an identifier from another vocabulary (for example, ICD9CM, MSH, or OMIM) using the get_umls_from_vocabulary function.

results <- get_umls_from_vocabulary(
            disease  = "MSH_D012559",  vocabulary = "MSH" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        MSH_D012559 
##  . Results:  2

The results are shown in Table 13.

tab <-results@qresult
knitr::kable(tab, caption = "Retrieving the UMLS CUI from MeSH", row.names=F) 
Table 13: Retrieving the UMLS CUI from MeSH
VOCABULARIES code disease_name
MSH D012559 Schizophrenias
UMLS C0036341 Schizophrenias

Finding the CUI associated to the name of a disease of interest

It is also possible to obtain the CUIs that correspond to a disease name using the get_umls_from_vocabulary function. To do so, set the parameter vocabulary = "NAME". Use the parameter limit to change the number of CUIs that are retrieved.

results <- get_umls_from_vocabulary(
  disease  = "long QT",  vocabulary = "NAME" ,  limit =10)
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        long QT 
##  . Results:  10

The results are shown in Table 14.

tab <-results@qresult
knitr::kable(tab, caption = "List of CUIs that map to long QT", row.names = F) 
Table 14: List of CUIs that map to long QT
VOCABULARIES code disease_name
UMLS C1141890 Inherited long QT syndrome
UMLS C0023976 Long QT Syndrome
UMLS C2678485 LQT9
UMLS C1832916 TIMOTHY SYNDROME
UMLS C2732979 Aquired long QT syndrome (disorder)
UMLS C1867904 LONG QT SYNDROME 5
UMLS C1859062 LONG QT SYNDROME 3
UMLS C0151878 Prolonged QT interval on EKG
UMLS C1833154 LQT4
UMLS C5687394 Long QT syndrome type 6
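The CUIs returned by a name search can be turned into identifiers in the UMLS_ID format that disease2gene expects. The sketch below uses a toy data frame that mirrors the columns of Table 14 (three rows copied from it) instead of a live query, so it runs without an API key. In a real session the data frame would presumably come from results@qresult.

```r
# Toy data frame mirroring the columns of Table 14 (values copied from it)
cuis <- data.frame(
  VOCABULARIES = c("UMLS", "UMLS", "UMLS"),
  code         = c("C1141890", "C0023976", "C2678485"),
  disease_name = c("Inherited long QT syndrome", "Long QT Syndrome", "LQT9"),
  stringsAsFactors = FALSE
)

# Keep the UMLS rows and prepend the "UMLS_" prefix
umls_rows   <- cuis[cuis$VOCABULARIES == "UMLS", ]
disease_ids <- paste0("UMLS_", unique(umls_rows$code))
print(disease_ids)
```

A vector such as disease_ids has the same form as the disease argument used in the disease2gene examples.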

Exploring the evidences associated to a disease

To explore the evidences supporting the associations for Schizophrenia use the function disease2evidence.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           type = "GDA",
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0036341 
##  . Results:  369

A selection of evidences is shown in Table 15.

tab <- results@qresult
tab <-tab[tab$reference_type == "PMID" & tab$pmYear > 2013 & tab$source =="PSYGENET", ] 
tab <- tab[ order(-tab$pmYear), c("gene_symbol","source", "associationType", "sentence", "reference", "pmYear")][1:5,]
tab <- tab %>% dplyr::rename(Gene = gene_symbol,  Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association for Schizophrenia" )  
Table 15: Evidences supporting the association for Schizophrenia
Gene source associationType Sentence pmid Year
GRIN2A PSYGENET Biomarker GRIN2A (GT)21 may play a significant role in the etiology of schizophrenia among the Chinese Han population of Shaanxi. 25958346 2015
NOTCH4 PSYGENET Biomarker Our data indicate that NOTCH4 polymorphism can influence clinical symptoms in Slovenian patients with schizophrenia. 25529856 2015
PPARA PSYGENET Biomarker We report significant increases in PPAR?, SREBP1, IL-6 and TNF?, and decreases in PPAR? and C/EPB? and mRNA levels from patients with schizophrenia, with additional BMI interactions, characterizing dysregulation of genes relating to metabolic-inflammation in schizophrenia. 25433960 2015
MAGI2 PSYGENET Biomarker One of the rare CNVs found in SZ cohorts is the duplication of Synaptic Scaffolding Molecule (S-SCAM, also called MAGI-2), which encodes a postsynaptic scaffolding protein controlling synaptic AMPA receptor levels, and thus the strength of excitatory synaptic transmission. 25653350 2015
NCAM1 PSYGENET Biomarker A growing body of evidence links aberrant levels of NCAM and polySia as well as variation in the ST8SIA2 gene to neuropsychiatric disorders, including schizophrenia. 24057454 2015

Additionally, you can explore the evidences for a specific gene-disease pair by specifying the gene identifier using the argument gene.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           gene = c("DRD2", "DRD3"),
                           type = "GDA",
                          database = "ALL",
                          score    = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0036341 
##  . Results:  565

The most recent papers are shown in Table 16.

tab <- results@qresult
tab <- tab %>%
    dplyr::filter(reference_type == "PMID") %>%
    dplyr::select(gene_symbol, associationType, reference, sentence, pmYear) %>%
    dplyr::arrange(dplyr::desc(pmYear)) %>%
    head(10)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between C0036341 & DRD2,DRD3" )  
Table 16: Evidences supporting the association between C0036341 & DRD2,DRD3
Gene associationType pmid Sentence Year
DRD2 CausalOrContributing 37422511 We focus on schizophrenia and the dopamine D2 receptor (DRD2), hot flashes and the neurokinin B receptor (TACR3), cigarette smoking and receptors bound by nicotine (CHRNA5, CHRNA3, CHRNB4), and alcohol use and enzymes that help to break down alcohol (ADH1B, ADH1C, ADH7). 2024
DRD3 PostTranslationalModification 38648100 Schizophrenia subjects exhibited thousands of neuronal and non-neuronal epigenetic differences at regions that included several susceptibility genetic loci, such as NRG1, DISC1, and DRD3. 2024
DRD2 GeneticVariation 38421437 Our significant polymorphism findings, mainly those in DRD2 (rs1800497, rs1799978, and rs2734841), HTR2C (rs3813929), and HTR2A (rs6311), were largely consistent with earlier findings (predictors of RIS effectiveness in adult schizophrenia patients), confirming their validity for identifying ASD children with a greater likelihood of core symptom improvement compared to noncarriers/wild types. 2024
DRD2 GeneticVariation 38598465 Adult patients with schizophrenia will be randomized (2: 1) to receive PGx-assisted treatment (drug and regimen selection depending on the results of single-nucleotide polymorphisms in genes DRD2, HTR1A, HTR2C, ABCB1, CYP2D6, CYP3A5, and CYP1A2) or the standard of care. 2024
DRD2 CausalOrContributing 39098130 Clinically, DRD2 inhibitors demonstrate efficacy in managing positive symptoms of schizophrenia, manic episodes in bipolar disorder, and dopaminergic imbalance in Parkinson’s disease. 2024
DRD2 GeneticVariation 38810489 Six loci including neurexin-1(NRXN1) (rs1045881), dopamine D1 receptor (DRD1) (rs686, rs4532), chitinase-3-like protein 1 (CHI3L1) (rs4950928), velocardiofacial syndrome (ARVCF) (rs165815), dopamine D2 receptor (DRD2) (rs1076560) were identified higher expression with significant difference in individuals converted into schizophrenia after two years. 2024
DRD2 CausalOrContributing 39036710 TAAR1 agonists may be less efficacious than dopamine D 2 receptor antagonists already licensed for schizophrenia. 2024
DRD2 CausalOrContributing 38114631 The Drd2 gene, encoding the dopamine D2 receptor (D2R), was recently indicated as a potential target in the etiology of lowered sociability (i.e., social withdrawal), a symptom of several neuropsychiatric disorders such as Schizophrenia and Major Depression. 2024
DRD2 CausalOrContributing 39127265 According to the well-documented dysregulation of endocannabinoid and dopaminergic system genes in schizophrenia, we investigated DNA methylation cannabinoid type 1 receptor (CNR1) and dopamine D2 receptor (DRD2) genes in saliva samples from psychotic subjects using pyrosequencing. 2024
DRD2 GeneticVariation 34524581 The study explored whether schizophrenia risk alleles of the DRD2 rs2514218 and ZNF804A rs1344706 polymorphisms also influenced the risk and severity of childhood-onset schizophrenia (COS) and differentiated it from autism spectrum disorders (ASD). 2023

Searching multiple diseases

The disease2gene function also accepts as input a list of diseases, the database (by default, CURATED), and, optionally, a value range for the score. Each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO. In the example, we have selected a list of four diseases. Table 17 shows the UMLS CUIs and the corresponding disease names.

Table 17: Disease list selected for illustrating the disease2gene multiple search
UMLS_CUI Disease_Name
C0036341 Schizophrenia
C0002395 Alzheimer’s Disease
C0030567 Parkinson Disease
C0005586 Bipolar Disorder

Creating the vector with the list of diseases.

diseasesOfInterest <- paste0("UMLS_",c("C0036341", "C0002395", "C0030567","C0005586"))

In the example, we will search in CURATED data, using a score range of 0.8-1.

results <- disease2gene(
  disease = diseasesOfInterest,
  database = "CURATED",
  score =c(0.8,1),
  verbose  = TRUE )
## Your query has 3 pages.

Table 18 shows the top 10 genes associated with the list of diseases.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score),(yearInitial))
knitr::kable(tab[1:10,], caption = "Top Genes associated to a list of diseases") 
Table 18: Top Genes associated to a list of diseases
gene_symbol disease_name score yearInitial yearFinal
GBA1 Parkinson Disease 1 1987 2021
SNCA Parkinson Disease 1 1989 2021
APP Alzheimer Disease 1 1989 2023
PSEN1 Alzheimer Disease 1 1993 2020
PSEN2 Alzheimer Disease 1 1993 2020
GRN Alzheimer Disease 1 1993 2020
LRRK2 Parkinson Disease 1 1993 2021
MAPT Alzheimer Disease 1 1993 2020
APOE Alzheimer Disease 1 1993 2020
PRKN Parkinson Disease 1 1998 2022

Visualizing the genes associated to multiple diseases

The default plot of the results of querying DISGENET with a list of diseases produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 16).

plot( results,
      type = "Network",
      prop = 10, interactive=T)

Figure 16: The Gene-Disease Network associated to a list of diseases

To visualize the results as a Gene-Disease Heatmap (Figure 17), set the argument type to “Heatmap”. In this plot, the color scale is proportional to the score of the GDA. The argument limit restricts the number of rows to the top-scoring GDAs when the results are large. By default, the plot shows the 50 highest-scoring GDAs.

plot( results,
      type="Heatmap",
      limit =20,
      cutoff=0.2, interactive=TRUE)
## [1] "Dataframe of 290 rows has been reduced to 20 rows."

Figure 17: The Gene-Disease Heatmap for genes associated to a list of diseases
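To make the idea behind the heatmap concrete, the sketch below builds a gene-by-disease score matrix in base R and keeps only the top-scoring rows. It is an illustration under stated assumptions, not the code that plot runs internally. The toy rows are copied from Table 18, limit here is an ordinary variable rather than the plot argument, and filling missing gene-disease pairs with 0 is a display choice made only for this example.

```r
# Toy rows copied from Table 18 (gene, disease, score)
tab <- data.frame(
  gene_symbol  = c("GBA1", "SNCA", "APP", "PSEN1"),
  disease_name = c("Parkinson Disease", "Parkinson Disease",
                   "Alzheimer Disease", "Alzheimer Disease"),
  score        = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Keep the `limit` highest-scoring associations (ties keep their input order)
limit <- 3
top   <- head(tab[order(-tab$score), ], limit)

# Rows are genes, columns are diseases, cells hold the score
score_matrix <- tapply(top$score,
                       list(top$gene_symbol, top$disease_name),
                       max)
score_matrix[is.na(score_matrix)] <- 0  # assumption: no association shown as 0
print(score_matrix)
```

A matrix of this shape is what a heatmap colors cell by cell. With real results, the scores would vary between 0 and 1 instead of all being 1.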

A third visualization option is a Protein Class-Disease Heatmap (Figure 18), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass”. In this case, the color of the heatmap is proportional to the percentage of genes for each disease in each protein class. This heatmap displays the protein classes associated to each disease.

plot( results,
      class="ProteinClass", type = "Heatmap", interactive=TRUE)

Figure 18: The Protein Class-Disease Heatmap for genes associated to a list of diseases
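The percentage that drives the heatmap color can be sketched with a small cross-tabulation. Everything in the example below is made up for illustration: the disease names, gene names, and class assignments are placeholders, and protein_class is an assumed column name, not necessarily the one used by disgenet2r.

```r
# Placeholder data (NOT real DISGENET results); column names are assumptions
toy <- data.frame(
  disease_name  = c("Disease 1", "Disease 1", "Disease 1", "Disease 1",
                    "Disease 2", "Disease 2"),
  gene_symbol   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D",
                    "GENE_A", "GENE_E"),
  protein_class = c("Kinase", "Kinase", "Enzyme", "Receptor",
                    "Kinase", "Enzyme"),
  stringsAsFactors = FALSE
)

# Count genes per disease and protein class, then convert each row
# (disease) to percentages so that every row sums to 100
counts      <- table(toy$disease_name, toy$protein_class)
percentages <- round(100 * prop.table(counts, margin = 1), 1)
print(percentages)
```

Normalizing per disease (margin = 1) makes diseases with different numbers of associated genes comparable in the same heatmap.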

A Protein Class-Disease Network visualization is also possible (Figure 19).

plot( results,
      class="ProteinClass", type = "Network", interactive=TRUE)

Figure 19: The Protein Class-Disease Network for genes associated to a list of diseases

To explore the evidences supporting the associations, use the function disease2evidence.

results <- disease2evidence( disease  = diseasesOfInterest,
                           type = "GDA",
                           score=c(0.5,1),
                          database = "CURATED" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:       UMLS_C0036341 ... UMLS_C0005586 
##  . Results:  3404

To visualize the results, set the type argument to “Points” (Figure 20).

plot( results,  
      type = "Points", limit=10000 )
Figure 20: The Evidences plot for a list of diseases

Searching by disease and chemical

The disease2gene function can also be used to retrieve genes mentioned in the context of a specific disease and chemical (Table 19).

results <- disease2gene( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN",
                          chemical = "C0023570")
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        UMLS_C0030567 
##  . Results:  105
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Parkinson and levodopa") 
Table 19: Top GDAs associated to Parkinson and levodopa
gene_symbol disease_name chemical_name score
BDNF Parkinson Disease levodopa 1.00
GBA1 Parkinson Disease levodopa 1.00
GDNF Parkinson Disease levodopa 1.00
MAOB Parkinson Disease levodopa 1.00
PRKN Parkinson Disease levodopa 1.00
SNCA Parkinson Disease levodopa 1.00
PARK7 Parkinson Disease levodopa 1.00
PINK1 Parkinson Disease levodopa 1.00
LRRK2 Parkinson Disease levodopa 1.00
DDC Parkinson Disease levodopa 0.95

To visualize the results, use the plot function (Figure 21).

plot( results, interactive= T )

Figure 21: The Gene Disease Chemical Network for a disease and a drug

Retrieving the chemicals associated to a disease

To retrieve the chemicals mentioned in the GDAs involving a specific disease, we can use the disease2chemical function.

results <- disease2chemical( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN" , score = c(0.5,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0030567 
##  . Results:  174
tab <- results@qresult
tab <-tab%>% dplyr::filter(reference_type == "PMID")  %>% dplyr::select(gene_symbol, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                    `Chemical Effect` = chemical_effect,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top Chemicals associated to Parkinson" ) 
Table 20: Top Chemicals associated to Parkinson
Gene Chemical Chemical Effect Sentence pmid Year
MAPT luteinizing hormone other|other|other Main motor and non-motor scores, blood levels of estradiol, testosterone, follicle-stimulating hormone, and luteinizing hormone, CSF levels of total α-synuclein, amyloid-β-42, amyloid-β-40, total tau, and phosphorylated-181-tau were examined in 45 women with postmenopausal-onset PD and 40 age-matched controls. 38492015 2024
MAPT testosterone other|other|other Main motor and non-motor scores, blood levels of estradiol, testosterone, follicle-stimulating hormone, and luteinizing hormone, CSF levels of total α-synuclein, amyloid-β-42, amyloid-β-40, total tau, and phosphorylated-181-tau were examined in 45 women with postmenopausal-onset PD and 40 age-matched controls. 38492015 2024
MAPT estradiol other|other|other Main motor and non-motor scores, blood levels of estradiol, testosterone, follicle-stimulating hormone, and luteinizing hormone, CSF levels of total α-synuclein, amyloid-β-42, amyloid-β-40, total tau, and phosphorylated-181-tau were examined in 45 women with postmenopausal-onset PD and 40 age-matched controls. 38492015 2024
SNCA dopamine therapeutic In the context of Parkinson’s disease (PD), recent advancements have been made in the development of Midbrain organoids (MBOs) models that consider key pathophysiological mechanisms such as alpha-synuclein (α-Syn), Lewy bodies, dopamine loss, and microglia activation. 38580194 2024
PINK1 kinetin other Nucleotide analogs such as kinetin triphosphate (KTP) were reported to enhance PINK1 activity and may represent a therapeutic strategy for the treatment of Parkinson’s disease. 38241364 2024
SNCA dopamine therapeutic Aggregation of α-synuclein causes disruptions in cellular processes in Parkinson’s disease (PD), leading to loss of dopamine-producing neurons and motor symptoms. 38435303 2024
SNCA dopamine therapeutic Dopamine loss and alpha-synuclein accumulation, two hallmarks of Parkinson’s disease (PD) pathology, contribute to synaptic dysfunction and reduced synaptic density in PD. 37814917 2024
LRRK2 levodopa therapeutic This case illustrates that levodopa-responsive clinical PD caused by G2019S LRRK2 mutations can occur without Lewy bodies. 38757351 2024
SNCA dopamine therapeutic Although evidence indicates that the abnormal accumulation of α-synuclein (α-syn) in dopamine neurons of the substantia nigra is the main pathological feature of Parkinson’s disease (PD), no compounds that have both α-syn antiaggregation and α-syn degradation functions have been successful in treating the disease in the clinic. 38696266 2024
SNCA Ganglioside GM1 other Research on GM1 ganglioside and its neuroprotective role in Parkinson’s disease (PD), particularly in mitigating the aggregation of α-Synuclein (aSyn), is well established across various model organisms. 38542297 2024

To visualize the results, use the plot function.

plot( results )

Figure 22: The Evidences plot for a list of diseases

Retrieving Variant-Disease Associations from DISGENET

Searching by variant

The variant2disease function receives a variant, or a list of variants as input, identified by the dbSNP identifier. It produces an object DataGeNET.DGN, with Type = "variant-disease".

results <- variant2disease( variant= "rs113488022",
                         database = "CURATED", score = c(0.7,1)) 
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0.7-1 
##  . Term:        rs113488022 
##  . Results:  16

The results are shown in Table 21.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Top diseases associated to variant rs113488022") 
Table 21: Top diseases associated to variant rs113488022
variantid disease_name score yearInitial yearFinal
rs113488022 CRC 0.9 1993 2024
rs113488022 Melanoma 0.9 2002 2021
rs113488022 Malignant melanoma of skin 0.9 2016 2021
rs113488022 CARCINOMA OF COLON 0.9 2002 2020
rs113488022 Carcinoma, Non Small Cell Lung 0.9 2002 2019
rs113488022 CARCINOMA OF LUNG 0.9 2002 2019
rs113488022 Lung adenocarcinoma 0.9 2013 2018
rs113488022 Papillary Thyroid Carcinoma 0.9 2002 2018
rs113488022 Glioblastoma 0.9 2016 2016
rs113488022 Brain Neoplasms 0.9 2011 2016
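
The ordering applied before printing Table 21 (highest score first, then most recent final year) can also be done in base R, which may be convenient when dplyr is not attached. The following is a minimal sketch: `tab_demo` is an illustrative stand-in for the table extracted from `results@qresult`, built from three rows copied from Table 21.

```r
# Stand-in for unique(results@qresult[, ...]); rows copied from Table 21.
tab_demo <- data.frame(
  variantid    = "rs113488022",
  disease_name = c("Glioblastoma", "CRC", "Melanoma"),
  score        = c(0.9, 0.9, 0.9),
  yearInitial  = c(2016, 1993, 2002),
  yearFinal    = c(2016, 2024, 2021),
  stringsAsFactors = FALSE
)

# Sort by descending score, breaking ties by descending yearFinal.
tab_sorted <- tab_demo[order(-tab_demo$score, -tab_demo$yearFinal), ]
rownames(tab_sorted) <- NULL
print(tab_sorted)
```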

Visualizing the diseases associated to a single variant

The disgenet2r package offers several options to visualize the results of querying DISGENET for a single variant: a Variant-Disease Network (Figure 23) showing the diseases associated to the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a network showing the MeSH Disease Classes of the diseases associated to the variant (Variant-Disease Class Network, Figure 24). These graphics can be obtained by changing the class and showGenes arguments in the plot function.

By default, the plot function produces a Variant-Disease Network on a DataGeNET.DGN object (Figure 23). In the Variant-Disease Network, the blue nodes are diseases, the yellow nodes are variants, and the width of the edges is proportional to the score of the association.

plot( results, 
      type = "Network", interactive=T,
      prop  = 10)

Figure 23: The Variant-Disease Network for the variant rs113488022

plot(results, class="DiseaseClass" , interactive=T)

Figure 24: The Variant-Disease Class Network for the variant rs113488022

Exploring the evidences associated to a variant

You can extract the evidences associated to a particular variant using the function variant2evidence. Additionally, you can explore the evidences for a specific variant-disease pair by specifying the argument disease.

results <- variant2evidence( variant = "rs10795668",
                disease ="UMLS_C0009402",
                       database = "ALL",
                       score =c(0,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs10795668 
##  . Results:  23

The results are shown in Table 22.

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(associationType, reference, pmYear, sentence) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid=reference) %>% dplyr::arrange(desc(Year))
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting the association between C0009402 & rs10795668")  
Table 22: Evidences supporting the association between C0009402 & rs10795668
associationType pmid Year Sentence
GeneticVariation 26101521 2015 We focused on 6 SNPs (rs380284, rs4464148, rs4779584, rs4939827, rs6983267, and rs10795668), already described as risk markers, and tested their possible independent and combined contribution to CRC predisposition.
GeneticVariation 23712746 2013 In conclusion, CRC susceptibility variants rs9929218 and rs10795668 may exert some influence in modulating patient’s survival and they deserve to be further tested in additional CRC cohorts in order to confirm their potential as prognosis or predictive biomarkers.
GeneticVariation 23717594 2013 Results from our case-control study and the followed meta-analysis confirmed the significant association of rs10795668 with CRC risk.
GeneticVariation 24066093 2013 We genotyped four variants previously associated with CRC: rs10795668, rs16892766, rs3802842 and rs4939827.
GeneticVariation 22363440 2012 We observed an association between the low colorectal cancer risk allele (A) for rs10795668 at 10p14 and increased expression of ATP5C1 (q = 0.024) and between the colorectal cancer high risk allele (C) for rs4444235 at 14q22.2 and increased expression of DLGAP5 (q = 0.041), both in tumor samples.
GeneticVariation 22235025 2012 Risk allele carriers for rs3802842 [Odds ratio (OR) = 1.5, 95% confidence interval (CI) 1.1-2.05, P = 0.0096, dominant model) and rs4779584 (OR = 1.39, 95% CI 1.02-1.9, P = 0.0396, dominant model) were more frequent in the CRC<50 group, whereas homozygotes for rs10795668 risk allele were also more frequent in the early-onset CRC (P = 0.02, codominant model).
GeneticVariation 23359760 2012 However, no associations with CRC risk were detected for six other loci (rs9929218, rs10411210, rs12701937, rs7014346, rs6983267, and rs10795668), and one SNP, rs16892766, was not polymorphic in any of the study participants.
GeneticVariation 22045029 2012 Recent genome-wide association studies have identified single-nucleotide polymorphisms at 16 genetic loci associated with colorectal cancer risk: rs6691170 (1q41), rs10936599 (3q26.2), rs16892766 (8q23.3), rs6983267 (8q24.21), rs10795668 (10p14), rs3802842 (11q23.1), rs11169552 (12q13.13), rs4444235, rs1957636 (14q22.2), rs4779584 (15q13.3), rs9929218 (16q22.1), rs4939827 (18q21.1), rs10411210 (19q13.11), rs961253 and rs4813802 (20p12.3) and rs4925386 (20q13.33).
GeneticVariation 21402474 2011 Our data suggested that rs10795668, a CRC susceptibility variant identified by GWA studies, might be used as a biomarker to identify CRC patients with high risk of recurrence after chemotherapy.
GeneticVariation 21071539 2011 We studied the generalizability of the associations with 11 risk variants for CRC on 8q23 (rs16892766), 8q24 (rs6983267), 9p24 (rs719725), 10p14 (rs10795668), 11q23 (rs3802842), 14q22 (rs4444235), 15q13 (rs4779584), 16q22 (rs9929218), 18q21 (rs4939827), 19q13 (rs10411210), and 20p12 (rs961253) in a multiethnic sample of 2,472 CRC cases, 839 adenoma cases and 4,466 controls comprised of European American, African American, Native Hawaiian, Japanese American, and Latino men and women.
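
The PubMed links in the evidence tables are built by prefixing each PMID with the PubMed base URL. As a minimal sketch, the same links can be produced as plain Markdown strings when kableExtra is not available. The helper name `pubmed_url` is hypothetical and not part of disgenet2r; the PMIDs are copied from Table 22.

```r
# Hypothetical helper: turn PMIDs into PubMed URLs
# (same paste0() pattern as in the code above).
pubmed_url <- function(pmid) {
  paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)
}

# PMIDs copied from Table 22.
pmids <- c("26101521", "23712746", "23717594")
links <- pubmed_url(pmids)

# Markdown-style links as plain character strings.
md_links <- sprintf("[%s](%s)", pmids, links)
print(md_links)
```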

The results can be visualized using the plot function with the type argument set to "Points". This shows the number of publications per year associated to the variant. Set the parameter limit to 10,000 so that all the results are included in the plot.

results <- variant2evidence( variant = "rs1800629",
                       database = "ALL",
                       score =c(0,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs1800629 
##  . Results:  1877
plot( results,  
      type = "Points", limit=10000 )

Figure 25: The Evidence plot for the variant rs1800629
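
The quantity summarized by the Points plot, the number of publications per year, can also be tabulated directly. The sketch below is illustrative only: it uses the years from the Year column of Table 22 as a stand-in for the `pmYear` column of an evidence result, and it is not the package's plotting code.

```r
# Publication years copied from the Year column of Table 22 (illustrative only).
years <- c(2015, 2013, 2013, 2013, 2012, 2012, 2012, 2012, 2011, 2011)

# Number of supporting publications per year.
pubs_per_year <- table(years)
print(pubs_per_year)

# Same information as a data frame, convenient for custom plotting.
pubs_df <- as.data.frame(pubs_per_year, stringsAsFactors = FALSE)
names(pubs_df) <- c("Year", "Publications")
```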

Exploring the information associated to a variant

The variant2attribute function receives a variant, or a list of variants, as input, identified by the dbSNP identifier. It produces a DataGeNET.DGN object with attributes of the variant(s), such as the allelic frequency according to GNOMAD data, the most severe consequence type from the Variant Effect Predictor, the DPI, and the DSI.

results <- variant2attribute( variant= "rs113488022")

results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant 
##  . Database:     ALL 
##  . Score:         
##  . Term:        rs113488022

The results are shown in Table 23.

tab <- unique(results@qresult )
tab <- tab %>% dplyr::select(-threeletterID,-source, -var_gene_symbol)
knitr::kable(tab, caption = "Attributes for variant rs113488022") 
Table 23: Attributes for variant rs113488022
variantid ref alt polyphen_score sift_score chromosome coord mostSevereConsequences geneid geneEnsemblID gene_symbol dbsnpclass variantDSI variantDPI exome
rs113488022 A C 0.958 0 7 140753336 missense_variant 673 ENSG00000157764 BRAF snv 0.33 0.045
rs113488022 A G 0.958 0 7 140753336 missense_variant 673 ENSG00000157764 BRAF snv 0.33 0.045
rs113488022 A T 0.958 0 7 140753336 missense_variant 673 ENSG00000157764 BRAF snv 0.33 0.045 1.4e-06

Searching multiple variants

The variant2disease function retrieves the information in DISGENET for a list of variants identified by the dbSNP identifier. The source database is specified with the argument database. By default, variant2disease uses CURATED as the source database.

results <- variant2disease(
         variant  = c("rs121913013", "rs1060500621",
              "rs199472709", "rs72552293",
              "rs74315445", "rs199472795"),
         database = "ALL")
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        variant-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:       rs121913013 ... rs199472795 
##  . Results:  21

Table 24 shows the top 10 diseases associated to the list of variants.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] )%>% dplyr::arrange(desc(score), desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to the list of variants")  
Table 24: Top diseases associated to the list of variants
variantid disease_name score yearInitial yearFinal
rs74315445 LONG QT SYNDROME 5 0.8 1993 2023
rs199472709 Romano Ward Syndrome 0.7 1993 2022
rs199472795 Romano Ward Syndrome 0.7 1993 2022
rs72552293 BRUGADA SYNDROME 2 0.7 1993 2007
rs74315445 JLNS2 0.7 1993 1998
rs199472709 Beckwith Wiedemann Syndrome 0.6 1993 2020
rs199472795 Beckwith Wiedemann Syndrome 0.6 1993 2020
rs1060500621 Long QT Syndrome 0.6 1999 2016
rs74315445 [D]Sudden death, cause unknown (context-dependent category) 0.6 1997 2015
rs74315445 Brugada Syndrome 0.6 1993 2015

Visualizing the diseases associated to multiple variants

The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Network (Figure 26), as a Variant-Gene-Disease Network (Figure 27), as a Variant-Disease Heatmap (Figure 28), as a Variant-Disease Class Network (Figure 29), and as a Variant-Disease Class Heatmap (Figure 30).

plot( results,
      type = "Network", interactive=T)

Figure 26: The Variant-Disease Network for a list of variants

To obtain the Variant-Gene-Disease Network (Figure 27), set the showGenes argument to TRUE.

plot( results,
      type = "Network", 
      showGenes= T,
      interactive=T)

Figure 27: The Variant-Gene-Disease Network for a list of variants

The results of querying DISGENET variant information with a list of variants can be visualized as a Variant-Disease Heatmap by changing the type argument to Heatmap (Figure 28).

plot( results,
      type = "Heatmap",
      prop = 10, interactive = TRUE, nchar=50)

Figure 28: The Variant-Disease Heatmap for a list of variants
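
To see what a Variant-Disease Heatmap summarizes, the long results table can be pivoted into a variant-by-disease matrix of scores. The following base R sketch is an illustration under stated assumptions, not the package's internal implementation: `vda` is a stand-in data frame built from rows copied from Table 24.

```r
# Rows copied from Table 24 (illustrative subset).
vda <- data.frame(
  variantid    = c("rs74315445", "rs199472709", "rs199472795",
                   "rs199472709", "rs199472795"),
  disease_name = c("LONG QT SYNDROME 5", "Romano Ward Syndrome",
                   "Romano Ward Syndrome", "Beckwith Wiedemann Syndrome",
                   "Beckwith Wiedemann Syndrome"),
  score        = c(0.8, 0.7, 0.7, 0.6, 0.6),
  stringsAsFactors = FALSE
)

# Variant-by-disease matrix of scores; NA marks pairs with no association.
score_matrix <- with(vda, tapply(score, list(variantid, disease_name), max))
print(score_matrix)
```

Each cell holds the score of one variant-disease pair, which is the value a heatmap would encode as color.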

The results of querying DISGENET variant information with a list of variants can also be visualized as a Variant-Disease Class Network by changing the class argument to DiseaseClass (Figure 29).

plot( results,
      class = "DiseaseClass", interactive=T)

Figure 29: The Variant-Disease Class Network for a list of variants

The results of querying DISGENET variant information with a list of variants can also be visualized as a Variant-Disease Class Heatmap by setting the class argument to DiseaseClass and the type argument to Heatmap (Figure 30).

plot( results,  type = "Heatmap",
      class = "DiseaseClass", interactive=T)

Figure 30: The Variant-Disease Class Heatmap for a list of variants

Searching by disease

The disease2variant function retrieves the variants associated to a disease, or a list of diseases. The function takes as input the disease, or list of diseases, of interest (each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO) and the database (by default, CURATED). A threshold value for the score can be set, as in the gene2disease function.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  152

Table 25 shows the variants associated to Timothy syndrome according to the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] )

knitr::kable(tab[1:10,], caption = " Variants associated to Timothy syndrome according to ClinVar") 
Table 25: Variants associated to Timothy syndrome according to ClinVar
variantid disease_name score yearInitial yearFinal
rs786205745 TIMOTHY SYNDROME 0.8 1993 2004
rs79891110 TIMOTHY SYNDROME 0.8 1993 2018
rs786205753 TIMOTHY SYNDROME 0.8 1993 2019
rs797044881 TIMOTHY SYNDROME 0.7 1993 2015
rs374528680 TIMOTHY SYNDROME 0.7 1993 2015
rs80315385 TIMOTHY SYNDROME 0.7 1993 2015
rs549476254 TIMOTHY SYNDROME 0.7 1993 2019
rs786205748 TIMOTHY SYNDROME 0.7 1993 2020
rs587782933 TIMOTHY SYNDROME 0.7 1993 1993
rs1178438128 TIMOTHY SYNDROME 0.6 1993 1993
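
The IDENT_ID format required for the disease argument can be checked locally before sending a query. The sketch below is a format check only, assuming the vocabulary prefixes listed in the text; the helper `has_valid_prefix` is hypothetical, not a disgenet2r function, and it does not verify that a code exists in DISGENET.

```r
# Vocabulary prefixes listed in the text above.
vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                  "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper: TRUE when an identifier starts with a known
# vocabulary prefix followed by "_" and a non-empty code.
has_valid_prefix <- function(ids) {
  pattern <- paste0("^(", paste(vocabularies, collapse = "|"), ")_.+$")
  grepl(pattern, ids)
}

# The second identifier lacks its vocabulary prefix.
ids <- c("UMLS_C1832916", "C1832916", "MESH_D008133")
print(has_valid_prefix(ids))
```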

The results can be further restricted to variants predicted to be deleterious by SIFT and PolyPhen, by passing ranges of these scores to the function through the sift and polyphen arguments, as in the example below. Remember that genetic variants with SIFT scores smaller than 0.05 are predicted to be deleterious, while PolyPhen values greater than 0.908 are classified as Probably Damaging.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR", sift = c(0,0.05), polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  84

Table 26 shows the deleterious variants associated to Timothy syndrome reported in the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "polyphen_score", "sift_score", "yearInitial", "yearFinal")] )

knitr::kable(tab[1:10,], caption = "Deleterious variants associated to Timothy syndrome according to ClinVar") 
Table 26: Deleterious variants associated to Timothy syndrome according to ClinVar
variantid disease_name score polyphen_score sift_score yearInitial yearFinal
rs786205745 TIMOTHY SYNDROME 0.8 1.000 0.01 1993 2004
rs79891110 TIMOTHY SYNDROME 0.8 1.000 0.00 1993 2018
rs786205753 TIMOTHY SYNDROME 0.8 0.999 0.00 1993 2019
rs797044881 TIMOTHY SYNDROME 0.7 1.000 0.00 1993 2015
rs80315385 TIMOTHY SYNDROME 0.7 1.000 0.00 1993 2015
rs549476254 TIMOTHY SYNDROME 0.7 0.999 0.00 1993 2019
rs786205748 TIMOTHY SYNDROME 0.7 1.000 0.00 1993 2020
rs587782933 TIMOTHY SYNDROME 0.7 1.000 0.00 1993 1993
rs1243482248 TIMOTHY SYNDROME 0.6 0.994 0.00 1993 1993
rs1467561684 TIMOTHY SYNDROME 0.6 1.000 0.00 1993 1993
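
The thresholds quoted above (SIFT below 0.05, PolyPhen above 0.908) can also be applied to a table that has already been retrieved. The following is a minimal sketch in base R: the first two rows are copied from Table 26, and the third row is made up purely to show a variant that fails both thresholds.

```r
# Rows copied from Table 26, plus one made-up benign row for contrast.
variants <- data.frame(
  variantid      = c("rs786205745", "rs786205753", "example_benign"),
  polyphen_score = c(1.000, 0.999, 0.120),
  sift_score     = c(0.01, 0.00, 0.40),
  stringsAsFactors = FALSE
)

# Thresholds quoted in the text: SIFT < 0.05 and PolyPhen > 0.908.
variants$deleterious <- variants$sift_score < 0.05 &
  variants$polyphen_score > 0.908
print(variants)
```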

Visualizing the variants associated to a single disease

The results of querying DISGENET variant information with a single disease can be visualized as a Variant-Disease Network (Figure 31).

plot( results,     
      type = "Network", interactive=T)

Figure 31: The Variant-Disease Network for a single disease

The Variant-Disease Network can be displayed as a Variant-Disease-Gene Network, by setting the showGenes parameter to TRUE (Figure 32).

plot( results, 
      type = "Network",
      showGenes = T)

Figure 32: The Variant-Gene-Disease Network for a single disease

Explore the evidences associated to a single disease

To explore the evidences supporting the VDAs for Timothy syndrome, run the disease2evidence function. You can use the argument variant to inspect the evidences for a particular variant and Timothy syndrome.

results <- disease2evidence( disease  = "UMLS_C1832916",
                           type = "VDA",
                          database = "ALL",
                          score    = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:        UMLS_C1832916 
##  . Results:  236
results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>%
    dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting associations") 
Table 27: Evidences supporting associations
pmid associationType Year Sentence
39079396 GeneticVariation 2024 In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
39079396 GeneticVariation 2024 In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
38968219 GeneticVariation 2024 Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
38826393 GeneticVariation 2024 Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
38826393 GeneticVariation 2024 Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
38968219 GeneticVariation 2024 Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
37271119 GeneticVariation 2023 Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS.
36162529 GeneticVariation 2022 The CaV1.2 G406R mutation decreases synaptic inhibition and alters L-type Ca2+ channel-dependent LTP at hippocampal synapses in a mouse model of Timothy Syndrome.
36347939 GeneticVariation 2022 A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.
36347939 GeneticVariation 2022 A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.
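
Some rows in Table 27 are identical. One plausible explanation, stated here as an assumption, is that the same sentence supports more than one variant and the variant column was not selected. If only distinct publication sentences are of interest, repeated rows can be dropped as in this sketch, where `evidence` is an illustrative stand-in with shortened placeholder sentences.

```r
# Stand-in for the evidence table; PMIDs and years copied from Table 27,
# sentences replaced by short placeholders for brevity.
evidence <- data.frame(
  pmid     = c("39079396", "39079396", "38968219", "38826393", "38826393"),
  Year     = c(2024, 2024, 2024, 2024, 2024),
  Sentence = c("iPSC line ...", "iPSC line ...", "LQT8 ...",
               "Gly406Arg ...", "Gly406Arg ..."),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each (pmid, Sentence) pair.
evidence_unique <- evidence[!duplicated(evidence[, c("pmid", "Sentence")]), ]
print(nrow(evidence_unique))
```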

If you want to inspect the evidences for Timothy syndrome restricted to the variants in a particular gene, use the argument gene.

results <- disease2evidence( disease  = "UMLS_C1832916",
                   gene = "775", vocabulary = "ENTREZ",
                   type = "VDA",  database = "TEXTMINING_HUMAN",
                   score    = c( 0.7,1 ) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.7-1 
##  . Term:        UMLS_C1832916 
##  . Results:  24
results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID")%>%
    dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear))%>% head(10)

results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Selection of evidences supporting associations between C1832916 & CACNA1C") 
Table 28: Selection of evidences supporting associations between C1832916 & CACNA1C
pmid associationType Year Sentence
39079396 GeneticVariation 2024 In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
39079396 GeneticVariation 2024 In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
38968219 GeneticVariation 2024 Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
38826393 GeneticVariation 2024 Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
38826393 GeneticVariation 2024 Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
38968219 GeneticVariation 2024 Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
37271119 GeneticVariation 2023 Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS.
36162529 GeneticVariation 2022 The CaV1.2 G406R mutation decreases synaptic inhibition and alters L-type Ca2+ channel-dependent LTP at hippocampal synapses in a mouse model of Timothy Syndrome.
36347939 GeneticVariation 2022 A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.
36347939 GeneticVariation 2022 A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.

Searching multiple diseases

results <- disease2variant(
              disease = paste0("UMLS_",c("C3150943",  "C1859062", "C1832916", "C4015695")),
              database = "CURATED", 
              score = c(0.7, 1) )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-variant 
##  . Database:     CURATED 
##  . Score:        0.7-1 
##  . Term:       UMLS_C3150943 ... UMLS_C4015695 
##  . Results:  154

Table 29 shows the variants associated to a list of Long QT syndromes in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] )

knitr::kable(tab[1:10,], caption = "Variants associated to a list of Long QT syndromes") 
Table 29: Variants associated to a list of Long QT syndromes
variantid disease_name score yearInitial yearFinal
rs137854600 LONG QT SYNDROME 3 0.9 1993 2022
rs786205745 TIMOTHY SYNDROME 0.8 1993 2004
rs79891110 TIMOTHY SYNDROME 0.8 1993 2018
rs786205753 TIMOTHY SYNDROME 0.8 1993 2019
rs199473317 LONG QT SYNDROME 3 0.8
rs199472916 LONG QT SYNDROME 2 0.8
rs199473428 LONG QT SYNDROME 2 0.8 1993 2022
rs199472961 LONG QT SYNDROME 2 0.8 1993 2022
rs9333649 LONG QT SYNDROME 2 0.8 1993 2022
rs137854601 LONG QT SYNDROME 3 0.8 1993 2022

Visualizing the variants associated to multiple diseases

The results of querying DISGENET variant information with a list of diseases can be visualized as a Variant-Disease Network (Figure 33), or as a Variant-Disease Heatmap (Figure 34), by changing the type argument from “Network” to “Heatmap”.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 33: The Variant-Disease Network for a list of diseases

The results can be visualized as a Heatmap (Figure 34).

plot( results,
      type = "Heatmap",
      interactive=T)

Figure 34: The Variant-Disease Heatmap for a list of diseases

Searching by gene

results <- gene2vda(
              gene = "APP",
              database = "CURATED" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        APP 
##  . Results:  17

Table 30 shows the top variants associated to the APP gene in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "gene_symbols", "disease_name","score", "yearInitial", "yearFinal")] )

knitr::kable(tab[1:10,], caption = "Top variants associated to APP") 
Table 30: Top variants associated to APP
variantid gene_symbols disease_name score yearInitial yearFinal
rs63750264 APP Alzheimer Disease 0.9 1991 2020
rs63750066 APP Alzheimer Disease 0.8 1992 2020
rs63750734 APP Alzheimer Disease 0.8 1993 2020
rs63750579 APP Alzheimer Disease 0.8 1990 2020
rs193922916 APP Alzheimer Disease 0.8 1993 2020
rs63750579 APP CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED 0.7 1990 2019
rs63750066 APP AD1 0.7 1993 2020
rs63750643 APP AD1 0.7 1993 2020
rs63750973 APP AD1 0.7 1993 2020
rs63749964 APP AD1 0.7 1991 2020

Visualizing the variant-disease associations retrieved for a gene

The results of querying DISGENET variant information with a gene can be visualized as a Variant-Disease Network (Figure 35) or, if the input is a list of genes, as a Variant-Disease Heatmap, by changing the type argument from Network to Heatmap. The genes can be shown by setting the showGenes argument to TRUE.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 35: The Variant-Disease Network for a gene

Searching by variant and chemical

results <- variant2disease( variant   = "rs121434568",
                          database = "TEXTMINING_HUMAN",
                          chemical = "C2987648")
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        rs121434568 
##  . Results:  13

Table 31 shows the VDAs associated to rs121434568 and afatinib.

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score)

knitr::kable(tab[1:10,], caption = "VDAs associated to rs121434568 and afatinib") 
Table 31: VDAs associated to rs121434568 and afatinib
variantid disease_name chemical_name score
rs121434568 Lung adenocarcinoma afatinib 0.9
rs121434568 CARCINOMA OF LUNG afatinib 0.9
rs121434568 Carcinoma, Non Small Cell Lung afatinib 0.9
rs121434568 Advanced Lung Adenocarcinoma afatinib 0.3
rs121434568 Lung Neoplasm afatinib 0.3
rs121434568 Cancer, Lung afatinib 0.3
rs121434568 Metastatic Neoplasm to the Brain afatinib 0.3
rs121434568 Metastatic Lung Adenocarcinoma afatinib 0.2
rs121434568 Adenocarcinoma of lung, stage IV afatinib 0.2
rs121434568 Metastatic Neoplasm to the Leptomeninges afatinib 0.2

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 36: VDAs associated to rs121434568 and afatinib

Retrieving the chemicals associated to a variant

The variant2chemical function retrieves the chemicals associated to a variant.

results <- variant2chemical( variant =  "rs1801133",
                          database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.8-1 
##  . Term:        rs1801133 
##  . Results:  5
tab <- results@qresult
tab <-tab%>% dplyr::select( disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                        `Chemical Effect`=chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Chemicals associated to rs1801133" ) 
Table 32: Chemicals associated to rs1801133
Disease Chemical Chemical Effect Sentence pmid Year
Multiple Sclerosis vitamin B12 other|other|other The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. 38648773 2024
Multiple Sclerosis pyridoxine other|other|other The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. 38648773 2024
Multiple Sclerosis vitamin B6 other|other|other The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. 38648773 2024
Schizophrenias risperidone therapeutic C677T Polymorphism in the MTHFR Gene Is Associated With Risperidone-Induced Weight Gain in Schizophrenia. 32714219 2020
Schizophrenias dopamine other A second polymorphism, methylenetetrahydrofolate reductase (MTHFR) 677C –> T (rs1801133), has been associated with overall schizophrenia risk and executive function impairment in patients, and may influence dopamine signaling through mechanisms upstream of COMT effects. 18988738 2008

To visualize the results use the plot function.

plot(results, 
     type="Network",   
     interactive=T, limit=50)

Figure 37: Chemicals associated to rs1801133

Retrieving associations involving Chemicals from DISGENET

Retrieving genes, variants, and diseases associated to chemicals

The chemical2gene function retrieves the genes associated with a specific chemical, or a list of chemicals. The n_pags argument limits the number of result pages that are retrieved.

results <- chemical2gene( chemical  = "C0023570" , database = "ALL" , n_pags = 5)
## Notice that your query has a maximum of 17 pages.
## By indicating n_pags = 5, your query of 17 pages has been reduced to 5 pages.
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gene 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        C0023570 
##  . Results:  93
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol,gene_type , chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Genes associated to levodopa") 
Table 33: Genes associated to levodopa
gene_symbol gene_type chemical_name pmids_chemical
COMT protein-coding levodopa 45
DDC protein-coding levodopa 31
GCH1 protein-coding levodopa 20
SLC6A3 protein-coding levodopa 20
GH1 protein-coding levodopa 18
MAOB protein-coding levodopa 18
DRD2 protein-coding levodopa 17
PRKN protein-coding levodopa 15
TH protein-coding levodopa 13
SNCA protein-coding levodopa 12

The results can be visualized as a Chemical-Gene Network (Figure 38).

plot( results,
      type = "Network", interactive=T)

Figure 38: The Chemical-Gene Network for a single chemical

The chemical2disease function retrieves the diseases associated with a specific chemical, or a list of chemicals. The information can be extracted from GDAs or from VDAs; use the type parameter to select the source.

results <- chemical2disease( chemical  = "C0023570" , type = "GDA" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        C0023570 
##  . Results:  45
tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Diseases associated to levodopa, type GDA", align= "lllc") 
Table 34: Diseases associated to levodopa, type GDA
diseaseid disease_name chemical_name pmids_chemical
C0013386 Drug-Induced Dyskinesia levodopa 12
C0268467 GTP cyclohydrolase I deficiency (disorder) levodopa 7
C1851920 DRD levodopa 6
C0013421 Dystonia levodopa 3
C0013384 Dyskinesia levodopa 2
C0026650 Movement Disorders levodopa 2
C0030567 Parkinson Disease levodopa 2
C0393593 Dystonia levodopa 2
C1291564 AROMATIC L-AMINO ACID DECARBOXYLASE DEFICIENCY levodopa 2
C0007194 Hypertrophic cardiomyopathy levodopa 1
plot( results,
      type = "Network",
      interactive=T)

Figure 39: The Chemical-Disease Network for a chemical

A DiseaseClass plot is also available.

plot( results,
      type = "Network",
      class = "DiseaseClass",
      interactive=T)

Figure 40: The Chemical-Disease Class Network for a chemical

For VDAs, set the type parameter to “VDA”:

results <- chemical2disease( chemical  = "C0165032" , type = "VDA", database =  "ALL" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        C0165032 
##  . Results:  5
tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical)  %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Diseases associated to imiquimod, type VDA",  align= "lllc") 
Table 35: Diseases associated to imiquimod, type VDA
diseaseid disease_name chemical_name pmids_chemical
C4721806 Basal cell carcinoma imiquimod 2
C0025202 Melanoma imiquimod 1
C0151779 Malignant melanoma of skin imiquimod 1
C0524910 Chronic viral hepatitis C imiquimod 1
C0596263 carcinogenesis imiquimod 1
plot( results,
      type = "Network", interactive=T)

Figure 41: The Chemical-Disease Network for a chemical

The chemical2variant function retrieves the variants associated with a specific chemical, or a list of chemicals.

results <- chemical2variant( chemical  = "C0006949", database = "ALL"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        C0006949 
##  . Results:  53
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, most_severe_consequence, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc") 
Table 36: VDAs for carbamazepine
variantid gene_symbols most_severe_consequence chemical_name pmids_chemical
rs3812718 SCN1A splice_donor_5th_base_variant carbamazepine 9
rs1061235 HLA-A 3_prime_UTR_variant carbamazepine 6
rs776746 CYP3A5 , ZSCAN25 splice_acceptor_variant carbamazepine 6
rs1045642 ABCB1 missense_variant carbamazepine 5
rs1801133 MTHFR missense_variant carbamazepine 4
rs2298771 LOC102724058, SCN1A missense_variant carbamazepine 4
rs2032582 ABCB1 missense_variant carbamazepine 3
rs1051740 EPHX1 missense_variant carbamazepine 2
rs1057910 CYP2C9 missense_variant carbamazepine 2
rs1389503611 EPHX1 missense_variant carbamazepine 2

The chemical2variant function also accepts the sift and polyphen parameters, each given as a score range, to restrict the results to variants predicted to be deleterious.

results <- chemical2variant( chemical  = "C0006949", database = "ALL", sift = c(0,0.05), polyphen = c(0.9,1)  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        C0006949 
##  . Results:  14
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, sift_score, polyphen_score, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc") 
Table 37: VDAs for carbamazepine
variantid gene_symbols sift_score polyphen_score chemical_name pmids_chemical
rs1045642 ABCB1 0.02 0.998 carbamazepine 5
rs1051740 EPHX1 0.00 0.987 carbamazepine 2
rs1389503611 EPHX1 0.01 0.995 carbamazepine 2
rs762468188 TMEM63A, EPHX1 0.00 1.000 carbamazepine 2
rs118192218 KCNQ2 , LOC105372721 0.01 0.999 carbamazepine 1
rs121912438 SOD1 0.00 0.967 carbamazepine 1
rs140908982 GRIA3 0.00 0.996 carbamazepine 1
rs1553491169 SCN9A , SCN1A-AS1 0.00 0.956 carbamazepine 1
rs1555085798 KCNA1 0.00 1.000 carbamazepine 1
rs201682634 ABCC8 0.00 1.000 carbamazepine 1
plot( results,
      type = "Network", interactive=T)

Figure 42: The Chemical-Variant Network for carbamazepine
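
The sift and polyphen ranges used in the query above can be read as interval filters on the two score columns. The following is a minimal base R sketch of that idea on a toy table. It does not call the DISGENET API, the in_range helper is hypothetical, and treating both ranges as closed intervals is an assumption rather than documented package behaviour.

```r
# Toy table: the first three rows reuse scores shown in Table 37; the last two
# rows (rsTOY1, rsTOY2) are invented so that the filter has something to drop.
variants <- data.frame(
  variantid      = c("rs1045642", "rs1051740", "rs121912438", "rsTOY1", "rsTOY2"),
  sift_score     = c(0.02, 0.00, 0.00, 0.20, 0.01),
  polyphen_score = c(0.998, 0.987, 0.967, 0.950, 0.300),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when x lies inside the closed interval c(min, max).
in_range <- function(x, range) {
  !is.na(x) & x >= range[1] & x <= range[2]
}

sift     <- c(0, 0.05)  # low SIFT scores are predicted deleterious
polyphen <- c(0.9, 1)   # high PolyPhen scores are predicted damaging

# Keep only the variants that satisfy both ranges.
keep <- in_range(variants$sift_score, sift) &
  in_range(variants$polyphen_score, polyphen)
filtered <- variants[keep, ]
print(filtered)
```

Requiring both conditions at once explains why the filtered query returns fewer variants than the unfiltered one.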

Retrieving GDAs and VDAs associated to chemicals

Exploring the GDAs of a chemical

The chemical2gda function retrieves the GDAs that involve a specific chemical, or a list of chemicals.

results <- chemical2gda( chemical  = "C0074393", database = "ALL"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        C0074393 
##  . Results:  151
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score, pmids_chemical)
knitr::kable(tab[1:10,], caption = "GDAs for sertraline") 
Table 38: GDAs for sertraline
gene_symbol disease_name chemical_name score pmids_chemical
IL6 Depression sertraline 1.00 30
BDNF Chorea, Huntington sertraline 1.00 9
BCHE Alzheimer Disease sertraline 1.00 158
BDNF Depression sertraline 1.00 98
SLC6A4 Depression sertraline 1.00 68
HTT Chorea, Huntington sertraline 1.00 44
SLC6A4 Depressive neurosis sertraline 1.00 73
NR3C1 Depressive neurosis sertraline 1.00 18
ZBTB20 Primrose syndrome sertraline 1.00 2
SLC6A4 Anxiety Disorder sertraline 0.95 28

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 43: Network of GDAs for sertraline

Exploring the VDAs of a chemical

The chemical2vda function retrieves the VDAs that involve a specific chemical, or a list of chemicals.

results <- chemical2vda( chemical  = "C3264621"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        C3264621 
##  . Results:  370

The chemical2vda function also accepts the sift and polyphen parameters, each given as a score range, to restrict the results to variants predicted to be deleterious.

results <- chemical2vda( chemical  = "C3264621", sift = c(0,0.05) , polyphen = c(0.9,1)  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        C3264621 
##  . Results:  146
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score,pmids_chemical)
knitr::kable(tab[1:10,], caption = "VDAs for ivacaftor") 
Table 39: VDAs for ivacaftor
variantid disease_name chemical_name score pmids_chemical
rs80034486 Cystic Fibrosis ivacaftor 1.0 1
rs75527207 Cystic Fibrosis ivacaftor 1.0 3
rs78655421 Cystic Fibrosis ivacaftor 1.0 3
rs77834169 Cystic Fibrosis ivacaftor 0.9 2
rs193922525 Cystic Fibrosis ivacaftor 0.9 2
rs77010898 Cystic Fibrosis ivacaftor 0.9 1
rs121909047 Cystic Fibrosis ivacaftor 0.9 1
rs121908752 Cystic Fibrosis ivacaftor 0.9 1
rs121908755 Cystic Fibrosis ivacaftor 0.9 1
rs74503330 Cystic Fibrosis ivacaftor 0.9 2

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 44: Network of VDAs

Exploring the GDA evidences of a chemical

The chemical2evidence function retrieves the evidence supporting the GDAs or VDAs that involve a specific chemical, or a list of chemicals. Use the type parameter to choose between GDAs and VDAs.

results <- chemical2evidence( chemical  = "C0023570", type = "GDA"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        C0023570 
##  . Results:  113
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Disease = disease_name, Chemical = chemical_name,  `Chemical Effect` =chemical_effect,    Year=pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
tab[1:10, ] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for levodopa" ) 
Table 40: Evidences for levodopa
Gene Disease Chemical Sentence Chemical Effect pmid Year
PNPLA6 SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE levodopa PNPLA6-Related Disorder with Levodopa-Responsive Parkinsonism. other 36825042 2023
PNPLA6 SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE levodopa PNPLA6-Related Disorder with Levodopa-Responsive Parkinsonism. other 36825042 2023
CLN6 CEROID LIPOFUSCINOSIS, NEURONAL, 6B (KUFS TYPE) levodopa Pearls & Oy-sters: Levodopa-Responsive Adult NCL (Type B Kufs Disease) Due to CLN6 Mutation. other 33875558 2021
GCH1 DRD levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
GCH1 DRD levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
GCH1 GTP cyclohydrolase I deficiency (disorder) levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
GCH1 GTP cyclohydrolase I deficiency (disorder) levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
LOC130055692 DRD levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
LOC130055692 GTP cyclohydrolase I deficiency (disorder) levodopa Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. other 31213404 2019
SPG7 SPASTIC PARAPLEGIA 7, AUTOSOMAL RECESSIVE levodopa SPG7 with parkinsonism responsive to levodopa and dopaminergic deficit. other 29246844 2018

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 45: Chemicals associated to Parkinson
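
Some rows of Table 40 are repeated, presumably because the columns that distinguished them were dropped when selecting a subset of columns. The following base R sketch, using a toy table with values copied from Table 40, shows one possible way to remove such repeats before sorting. It also uses head() rather than tab[1:10, ], which avoids rows of NA when fewer than ten rows remain. This is an optional tidy-up step, not part of the package.

```r
# Toy evidence table: the first two rows are an exact repeat, as in Table 40.
evidence <- data.frame(
  Gene     = c("PNPLA6", "PNPLA6", "CLN6", "SPG7"),
  Chemical = "levodopa",
  pmid     = c("36825042", "36825042", "33875558", "29246844"),
  Year     = c(2023, 2023, 2021, 2018),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, then sort by publication year, newest first.
deduplicated <- unique(evidence)
deduplicated <- deduplicated[order(-deduplicated$Year), ]

# head() returns at most 10 rows and never pads the result with NA rows.
top_rows <- head(deduplicated, 10)
print(top_rows, row.names = FALSE)
```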

Exploring the VDA evidences of a chemical

results <- chemical2evidence( chemical  = "C0042291", type = "VDA" , database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        C0042291 
##  . Results:  222
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                            `Chemical Effect` =chemical_effect,  Year=pmYear, Sentence = sentence, pmid = reference )
tab <- tab[ order(-tab$Year),]
tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for valproic acid" ) 
Table 41: Evidences for valproic acid
variantid Disease Chemical Sentence Chemical Effect pmid Year
rs776746 Epilepsies valproic acid Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. therapeutic|therapeutic 38381330 2024
rs2298771 Epilepsies valproic acid Our study suggests the findings of this investigation indicate that the polymorphisms SCN1A rs2298771 and SCN2A rs17183814 could potentially act as predictive biomarkers for the responsiveness to VPA among Chinese epilepsy patients. therapeutic 38837984 2024
rs3812718 Epilepsies valproic acid Five single nucleotide polymorphisms (SNPs), including SCN1A (rs10188577, rs2298771, rs3812718) and SCN2A (rs2304016, rs17183814), were genotyped in 233 epilepsy patients undergoing VPA therapy. therapeutic 38837984 2024
rs56411402 Epilepsies valproic acid Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. therapeutic|therapeutic 38381330 2024
rs17183814 Epilepsies valproic acid Our study suggests the findings of this investigation indicate that the polymorphisms SCN1A rs2298771 and SCN2A rs17183814 could potentially act as predictive biomarkers for the responsiveness to VPA among Chinese epilepsy patients. therapeutic 38837984 2024
rs1401813450 Epilepsies valproic acid Patients with epilepsy carrying the UGT1A6 A541G mutant genotype may have VPA-induced tremors, and early detection of this genotype will help guide the clinical individualizsation of VPA treatment. therapeutic 38908142 2024
rs28365083 Epilepsies valproic acid Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. therapeutic|therapeutic 38381330 2024
rs2304016 Epilepsies valproic acid Five single nucleotide polymorphisms (SNPs), including SCN1A (rs10188577, rs2298771, rs3812718) and SCN2A (rs2304016, rs17183814), were genotyped in 233 epilepsy patients undergoing VPA therapy. therapeutic 38837984 2024
rs1458644938 Epilepsies valproic acid Patients with epilepsy carrying the UGT1A6 A541G mutant genotype may have VPA-induced tremors, and early detection of this genotype will help guide the clinical individualizsation of VPA treatment. therapeutic 38908142 2024
rs2070959 Epilepsies valproic acid Patients with epilepsy carrying the UGT1A6 A541G mutant genotype may have VPA-induced tremors, and early detection of this genotype will help guide the clinical individualizsation of VPA treatment. therapeutic 38908142 2024

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 46: Evidence network

Exploring the attributes of a chemical

The chemical2attribute function retrieves the attributes of a specific chemical, or a list of chemicals: the identifier and name, and the number of publications, GDAs, and VDAs in which the chemical appears.

results <- chemical2attribute( chemical  = "C0023570"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical 
##  . Database:     ALL 
##  . Score:         
##  . Term:        C0023570 
##  . Results:  1
tab <-results@qresult
knitr::kable(tab, caption = "Attributes for levodopa") 
Table 42: Attributes for levodopa
chemicalid chemical_name numPmids numGDAs numVDAs
C0023570 levodopa 690 1043 174

Retrieving Disease-Disease Associations from DISGENET

The disgenet2r package can also retrieve the diseases that share genes or variants with a particular disease, or a list of diseases (disease-disease associations, or DDAs).

Searching DDAs by genes for a single disease

To obtain disease-disease associations, use the disease2disease function. It takes as input a disease, in the same format as in disease2gene; the database in which to perform the search (by default, CURATED); and the relationship argument, which indicates the type of relationship between the diseases in each pair. If relationship is set to “has_shared_genes”, the results can be filtered with min_genes, the minimum number of genes shared with the disease(s) of interest (by default, min_genes = 0), and jg, the minimum value of the Jaccard Index for genes. If relationship is set to “has_shared_variants”, the analogous arguments min_vars and jv are available.

The output is a DataGeNET.DGN object that contains the top diseases that share genes with the disease that has been searched.

The DataGeNET.DGN object produced by the disease2disease function also contains the Jaccard Index (also known as the Jaccard similarity coefficient) for each disease pair. The Jaccard Index is a similarity metric, computed as the size of the intersection divided by the size of the union of two sets, in this case the sets of genes associated with each disease:

\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]

We calculate a p value to estimate the significance of the Jaccard Index for each disease pair, using a Fisher exact test. The pvalue_jaccard_genes and pvalue_jaccard_variants columns report the negative logarithm of this p value, so larger values indicate more significant overlaps. They are available for disease-disease associations by shared genes and by shared variants, respectively.
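
As an illustration of the Jaccard Index and the Fisher exact test described above, here is a minimal base R sketch on two small, hypothetical gene sets. The gene sets, the universe size of 20, the one-sided alternative, and the use of a base-10 logarithm are all assumptions made for the example; the exact background and settings used by DISGENET are not stated here.

```r
# Hypothetical gene sets for two diseases (symbols chosen only for illustration).
genes_a <- c("CFTR", "TGFB1", "SCNN1A", "MBL2")
genes_b <- c("TGFB1", "MBL2", "SERPINA1")

# Jaccard Index: size of the intersection over size of the union.
shared  <- intersect(genes_a, genes_b)
jaccard <- length(shared) / length(union(genes_a, genes_b))

# Assumed size of the background (universe) of genes.
universe_size <- 20

# Counts for the 2 x 2 contingency table.
n_shared  <- length(shared)
n_a_only  <- length(setdiff(genes_a, genes_b))
n_b_only  <- length(setdiff(genes_b, genes_a))
n_neither <- universe_size - n_shared - n_a_only - n_b_only

# Rows: membership in A; columns: membership in B (matrix is filled by column).
contingency <- matrix(
  c(n_shared, n_b_only, n_a_only, n_neither),
  nrow = 2,
  dimnames = list(in_A = c("yes", "no"), in_B = c("yes", "no"))
)

# One-sided Fisher exact test: is the overlap larger than expected by chance?
p_value     <- fisher.test(contingency, alternative = "greater")$p.value
neg_log10_p <- -log10(p_value)  # base 10 is an assumption

cat("Jaccard:", jaccard, " -log10(p):", neg_log10_p, "\n")
```

Here the two sets share 2 of 5 distinct genes, so the Jaccard Index is 0.4.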

results <- disease2disease(
  disease  = "UMLS_C0010674", relationship = "has_shared_genes",
  database = "CURATED" ,   min_genes =2 )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:        UMLS_C0010674 
##  . Results:  11

Table 43 shows the diseases that share at least two genes with Cystic Fibrosis (UMLS_C0010674) in the DISGENET CURATED data.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share genes with Cystic Fibrosis") 
Table 43: Diseases that share genes with Cystic Fibrosis
disease1_Name disease2_Name jaccard_genes shared_genes pvalue_jaccard_genes
Cystic Fibrosis High blood pressure 0.04971 17 14.4
Cystic Fibrosis COPD 0.11972 17 22.6
Cystic Fibrosis Adult-Onset Diabetes Mellitus 0.04278 16 12.4
Cystic Fibrosis SYSTEMIC LUPUS ERYTHEMATOSIS 0.08589 14 16.3
Cystic Fibrosis Alzheimer Disease 0.05534 14 12.8
Cystic Fibrosis BESC1 0.13793 8 19.2
Cystic Fibrosis Cardiomyopathy 0.02941 8 5.4
Cystic Fibrosis Hereditary pancreatitis 0.12308 8 15.4
Cystic Fibrosis CBAVD 0.11864 7 15.8
Cystic Fibrosis Obstructive azoospermia 0.05085 3 6.5

Visualizing the diseases associated to a single disease

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share genes with it (Figure 47). The node size is proportional to the number of genes associated to each disease.

plot( results, 
      type = "Network",
      interactive=T )

Figure 47: The Disease-Disease Network by shared genes for Cystic Fibrosis

Searching DDAs via genes for multiple diseases

The function disease2disease can also use as an input a list of diseases in any of the previously described vocabularies. It will retrieve the top diseases that share genes with each of the diseases in the input list.

Table 44 shows the disease list selected to illustrate the disease2disease function.

Table 44: Examples of Congenital metabolic diseases
UMLS_CUI Disease_Name
C0162671 MELAS Syndrome
C0023264 Leigh Disease
C0917796 Optic Atrophy, Hereditary, Leber
diseasesOfInterest <-  paste0("UMLS_", c("C0162671", "C0023264", "C0917796"))
results <- disease2disease(
              disease = diseasesOfInterest, relationship = "has_shared_genes",
              database = "CURATED",
              min_genes  = 20, 
              order_by = "JACCARD_GENES" )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0162671 ... UMLS_C0917796 
##  . Results:  35

Table 45 shows the diseases that share at least 20 genes with the diseases of interest.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share at least 20 genes with the diseases of interest") 
Table 45: Diseases that share at least 20 genes with the diseases of interest
disease1_Name disease2_Name jaccard_genes shared_genes pvalue_jaccard_genes
Leber’s optic atrophy MC5DM1 0.60465 26 69
Leber’s optic atrophy NEUROPATHY, ATAXIA, AND RETINITIS PIGMENTOSA 0.60465 26 69
Leber’s optic atrophy Camptodactyly of proximal interphalangeal joint 0.59091 26 68
Leber’s optic atrophy Wide spaced nipples (finding) 0.55319 26 65
Leber’s optic atrophy Scrotal hypoplasia 0.55319 26 65
Leber’s optic atrophy postaxial polydactyly hands (physical finding) 0.54167 26 64
Leber’s optic atrophy cleft palate with cleft lip bilateral 0.53061 26 63
Leber’s optic atrophy MELAS Syndrome 0.52830 28 66
MELAS Syndrome Leber’s optic atrophy 0.52830 28 66
Leber’s optic atrophy rod cone dystrophy 0.44444 28 62

To obtain the network, set the type argument of the plot function to “Network” (Figure 48). In this network, the nodes are the diseases, and the node size is proportional to the number of genes associated with each disease. The edge width is proportional to the number of genes shared by the two diseases it connects.

plot( results,
      type = "Network",
      interactive=TRUE)

Figure 48: The Disease-Disease Network by shared genes for a list of diseases

Searching DDAs via shared variants for a single disease

To obtain disease-disease associations via shared genetic variants, use the disease2disease function with the relationship argument set to “has_shared_variants”, the database in which to perform the search (by default, CURATED), and the min_vars argument, the minimum number of variants shared with the disease(s) of interest. By default, min_vars = 0. The output is a DataGeNET.DGN object that contains the top diseases that share variants with the disease that has been searched.
In the example, we specify a minimum value for the Jaccard Index computed from the shared variants (jv = 0.01).

results <- disease2disease(
  disease  = c("UMLS_C0011860", "UMLS_C0028754"),relationship = "has_shared_variants",
  database = "CURATED", jv = 0.01 )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-variant 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0028754 
##  . Results:  34

Table 46 shows the top diseases that share variants with Obesity and NIDDM.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_variants","shared_variants", "pvalue_jaccard_variants")] )
tab <- tab[ order(-tab$shared_variants),]

knitr::kable(tab[1:10,], caption = "Top diseases that share variants with Obesity and NIDDM", row.names = F) 
Table 46: Top diseases that share variants with Obesity and NIDDM
disease1_Name disease2_Name jaccard_variants shared_variants pvalue_jaccard_variants
Adult-Onset Diabetes Mellitus WOLFRAM SYNDROME 1 0.04687 170 314
Adult-Onset Diabetes Mellitus DFNA38 0.04509 163 305
Adult-Onset Diabetes Mellitus HYPERINSULINEMIC HYPOGLYCEMIA, FAMILIAL, 1 0.04254 160 254
Adult-Onset Diabetes Mellitus WOLFRAM-LIKE SYNDROME, AUTOSOMAL DOMINANT 0.04522 160 330
Adult-Onset Diabetes Mellitus CTRCT41 0.04509 159 330
Adult-Onset Diabetes Mellitus Decreased HDL 0.01764 150 63
Adult-Onset Diabetes Mellitus Maturity onset diabetes mellitus in young 0.02742 122 120
Adult-Onset Diabetes Mellitus DIABETES MELLITUS, TRANSIENT NEONATAL, 2 0.02612 93 181
Adult-Onset Diabetes Mellitus NAFLD - Nonalcoholic Fatty Liver Disease 0.02405 88 139
Adult-Onset Diabetes Mellitus HYPOGLYCEMIA, LEUCINE-INDUCED 0.02497 88 200

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share variants with it (Figure 49). The node size is proportional to the number of variants associated to each disease.

plot( results, 
      type = "Network",
       interactive=T )

Figure 49: The Disease-Disease Network by shared variants

Searching DDAs via semantic relationships

To obtain disease-disease associations via semantic relationships, use the disease2disease function with the argument relationship equal to one of the following types of semantic relations: has_manifestation, has_associated_morphology, manifestation_of, associated_morphology_of, is_finding_of_disease, due_to, has_definitional_manifestation, has_associated_finding, definitional_manifestation_of, disease_has_finding, cause_of, associated_finding_of.

The output is a DataGeNET.DGN object that contains the diseases that have the type of relationship defined in the query with the query disease.

results <- disease2disease(
  disease  = c("UMLS_C0011860", "UMLS_C0028754"),relationship = "has_manifestation", min_sokal = 0.7, order_by = "SOKAL",
  database = "CURATED"  )
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-rela 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0028754 
##  . Results:  20

Table 47 shows the diseases associated with Obesity and Diabetes Mellitus non Insulin dependent (NIDDM) by the relation type “has_manifestation”.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","ddaRelation","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab , caption = "Diseases associated with Obesity and NIDDM") 
Table 47: Diseases associated with Obesity and NIDDM
disease1_Name disease2_Name ddaRelation shared_genes pvalue_jaccard_genes
Obesity OBESITY, HYPERPHAGIA, AND DEVELOPMENTAL DELAY has_manifestation 1 1.65
Obesity Pseudo Pseudohypoparathyroidism has_manifestation 1 1.65
Obesity BBS1 has_manifestation 2 1.43
Obesity CHOPS SYNDROME has_manifestation 1 1.65
Obesity PHP1C has_manifestation 1 1.65
Obesity Bardet-Biedl syndrome 2 has_manifestation 1 1.19
Obesity CORTRD2 has_manifestation 1 1.36
Obesity BARDET-BIEDL SYNDROME 6 has_manifestation 1 1.19
Obesity Bardet-Biedl syndrome 4 has_manifestation 1 1.65
Obesity HYPOGONADOTROPIC HYPOGONADISM 27 WITHOUT ANOSMIA has_manifestation 1 1.36
Obesity DiGeorge’s syndrome has_manifestation 1 0.46
Adult-Onset Diabetes Mellitus MODY, TYPE 13 has_manifestation 1 1.61
Obesity PSEUDOHYPOPARATHYROIDISM, TYPE IA has_manifestation 1 1.36
Obesity WAGR Syndrome has_manifestation 1 0.90
Adult-Onset Diabetes Mellitus KERATODERMA-ICHTHYOSIS-DEAFNESS SYNDROME, AUTOSOMAL RECESSIVE has_manifestation 2 2.75
Obesity 9q- Syndrome has_manifestation 1 0.84
Obesity BARDET-BIEDL SYNDROME 18 has_manifestation 1 1.65
Obesity SBIDDS has_manifestation 1 1.65
Obesity PWLS has_manifestation 1 1.36
Adult-Onset Diabetes Mellitus IDDHH has_manifestation 1 1.31

Searching diseases similar to a disease of interest

The get_similar_diseases function retrieves the diseases most similar to a disease of interest according to the Sokal-Sneath semantic similarity measure (Sánchez and Batet 2011). The similarity between concepts is computed on the taxonomic relations provided by the Unified Medical Language System Metathesaurus; only relationships of type is-a (which describe the taxonomy in any ontology) are taken into account. The function takes as input a disease and, as an optional argument, min_sokal, a minimum value for the Sokal-Sneath similarity. By default, min_sokal = 0.1.

results <- get_similar_diseases(
  disease  = "UMLS_C0011860",
    min_sokal = 0.6)
results
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-sokal 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0011860 
##  . Results:  142

Table 48 shows the top diseases most similar to NIDDM (UMLS_C0011860) according to the Sokal-Sneath similarity.

tab <- unique(results@qresult[  ,c("disease1_Name",  "disease2_Name","sokal")] )
knitr::kable(tab[1:10,], caption = "Diseases semantically similar to NIDDM") 
Table 48: Diseases semantically similar to NIDDM
disease1_Name disease2_Name sokal
Adult-Onset Diabetes Mellitus Maturity onset diabetes mellitus in young 0.946
Adult-Onset Diabetes Mellitus Diabetes mellitus without complication 0.945
Adult-Onset Diabetes Mellitus Diabetes Mellitus, Lipoatrophic 0.945
Adult-Onset Diabetes Mellitus Koberling Dunnigan Syndrome 0.944
Adult-Onset Diabetes Mellitus Diabetes mellitus type 2 in obese 0.943
Adult-Onset Diabetes Mellitus MODY9 0.943
Adult-Onset Diabetes Mellitus MODY6 0.943
Adult-Onset Diabetes Mellitus MODY1 0.943
Adult-Onset Diabetes Mellitus MODY4 0.943
Adult-Onset Diabetes Mellitus MODY3 0.943
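
Because the results are stored in a data frame, standard R subsetting can be used to narrow them further after the query. The following is a minimal sketch that does not call DISGENET: it uses a hand-built data frame with three rows copied from Table 48 as a stand-in for the query result, and the variable names similar and closest are illustrative only.

```r
# Stand-in for unique(results@qresult[, c("disease2_Name", "sokal")]);
# the three rows are copied from Table 48.
similar <- data.frame(
  disease2_Name = c("Maturity onset diabetes mellitus in young",
                    "Koberling Dunnigan Syndrome",
                    "MODY9"),
  sokal = c(0.946, 0.944, 0.943),
  stringsAsFactors = FALSE
)

# Keep diseases at or above a stricter cutoff, most similar first.
closest <- similar[similar$sokal >= 0.944, ]
closest <- closest[order(closest$sokal, decreasing = TRUE), ]
closest$disease2_Name
```

Filtering locally in this way avoids re-running the query when only the cutoff changes.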

Disease enrichment

The disease_enrichment function performs a disease enrichment (or over-representation) analysis. It determines whether a user-defined set of genes is statistically significantly associated with a disease gene set in DISGENET.

The function takes as input a list of entities, either genes or variants. These are compared against the gene-disease or variant-disease associations in the selected database to determine which diseases are associated with the list. Genes can be identified with HGNC, ENSEMBL or Entrez identifiers; variants are identified with dbSNP identifiers (vocabulary = "DBSNP"), as in the variant example below.

The database parameter allows users to choose which data source to use: CURATED for curated gene-disease associations (the default option), CLINICALTRIALS for associations extracted from ClinicalTrials.gov, or ALL to include all available databases. The number of genes in the selected data source is used as the background, or universe, of the over-representation test.

The common_entities parameter sets the minimum number of entities that must be shared with a disease for it to be considered in the analysis; the default is 1. The max_pvalue parameter sets a threshold for the p-value from the Fisher test (default is 0.05).
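
To make the over-representation test concrete, here is a minimal base R sketch of a Fisher test on a 2x2 contingency table. It is not the package's internal code: the one-sided alternative is an assumption, and the disease gene set size (300) and the overlap (22) are hypothetical numbers chosen for illustration. The list size (58) and universe size (13460) match the gene example that follows.

```r
universe     <- 13460  # genes in the selected data source (background)
list_size    <- 58     # genes in the user-defined list
disease_size <- 300    # hypothetical: genes annotated to the disease
overlap      <- 22     # hypothetical: list genes annotated to the disease

# Rows: in list / not in list. Columns: in disease set / not in disease set.
contingency <- matrix(
  c(overlap,                disease_size - overlap,
    list_size - overlap,    universe - list_size - (disease_size - overlap)),
  nrow = 2,
  dimnames = list(c("in_list", "not_in_list"),
                  c("in_disease", "not_in_disease"))
)

# One-sided test: is the overlap larger than expected by chance?
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

# The same tail probability from the hypergeometric distribution.
p_hyper <- phyper(overlap - 1, disease_size, universe - disease_size,
                  list_size, lower.tail = FALSE)

c(fisher = p_fisher, hypergeometric = p_hyper)
```

A disease would pass the max_pvalue filter described above when its p-value falls below the chosen threshold.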

For genes

The example below performs a disease enrichment with a list of genes associated with autism, extracted from the Developmental Brain Disorder Gene Database (Gonzalez-Mantilla et al. 2016).

genes <- c("ADNP", "ANKRD11", "ANKRD17",  "ASXL1",  "BCKDK",  "BRSK2",  "CDK13",  "CDK8",  "CHD2",  "CHD7",  "CHD8",  "CLCN2",  "CREBBP",  "CSDE1",  "CTCF",  "CTNNB1",  "DDX3X",  "FOXP1",  "GFER",  "H4C3",  "HNRNPUL2",  "IQSEC2",  "ITSN1",  "JARID2",  "LRP2",  "MARK2",  "MBOAT7",  "MYT1L",  "NAA15",  "NALCN",  "NAV3",  "NEXMIF" ,  "NSD1",  "PHF21A",  "POGZ",  "PRR12",  "QRICH1",  "SCAF1",  "SCN1A",  "SCN2A",  "SETD5",  "SHANK3",  "SIN3A",  "SOX11",  "SOX6",  "TANC2",  "TBCD",  "TCF20" ,  "TCF4",  "TCF7L2",  "TRAF7",  "TRIP12",  "WAC",  "WDR26",  "ZEB2",  "ZMYM2",  "ZNF292",  "ZSWIM6" )
results <- disease_enrichment(
  entities   = genes,
  vocabulary = "HGNC",
  database   = "CURATED")
## Your query has 1 page.
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       ADNP ... ZSWIM6

Table 49 shows the top diseases associated with the list of genes.

tab <- unique(results@qresult[, c("diseaseName", "geneRatio", "bgRatio", "pvalue")])
knitr::kable(tab[1:10, ], caption = "Diseases significantly associated with the list of genes")
Table 49: Diseases significantly associated with the list of genes
diseaseName geneRatio bgRatio pvalue
Mental retardation, nonspecific 44/58 44/13460 0
Neurodevelopmental Disorder 36/58 36/13460 0
Neurodevelopmental delay 24/58 24/13460 0
Childhood autism 22/58 22/13460 0
Non-specific syndromic intellectual disability 17/58 17/13460 0
Seizure 22/58 22/13460 0
AUTISM SPECTRUM DISORDER 22/58 22/13460 0
Child Development Disorder 14/58 14/13460 0
Global developmental delay 19/58 19/13460 0
Rare genetic intellectual disability 8/58 8/13460 0

For variants

The example below performs a disease enrichment with a list of variants extracted from the publication “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes” (Azaiez et al. 2018).

results <- disease_enrichment(
   entities  =  c("rs80338902","rs397516871","rs368341987","rs375050157","rs111033280","rs140884994","rs201076440","rs111033439","rs1296612982","rs41281314","rs397516875","rs143282422","rs142381713","rs35818432","rs111033225","rs200104362","rs201004645","rs34988750","rs373169422","rs397517356","rs188376296","rs199897298","rs200263980","rs200416912","rs184866544","rs397517344","rs41281310","rs727503066","rs727504710","rs143240767","rs145771342","rs376898963","rs397516878","rs181255269","rs188498736","rs111033192","rs117966637","rs914189193","rs181611778","rs111033194","rs111033248","rs111033262","rs111033333","rs111033529","rs146824138","rs483353055","rs528089082","rs747131589","rs111033536","rs45629132","rs371142158","rs727504654","rs192524347","rs527236122","rs111033186","rs111033287","rs139889944","rs200454015","rs397517328","rs111033275","rs150822759","rs200038092","rs201709513","rs370155266","rs45500891","rs111033196","rs111033360","rs397517322","rs111033524","rs727505166","rs79444516","rs35730265","rs45549044","rs111033361","rs370696868","rs727504309","rs533231493"),
    vocabulary = "DBSNP", database = "CURATED")
## Your query has 1 page.
results
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       rs80338902 ... rs533231493

Table 50 shows the top diseases associated with the list of variants.

tab <- unique(results@qresult[, c("diseaseName", "variantRatio", "bgRatio", "pvalue")])
knitr::kable(tab[1:10, ], caption = "Diseases significantly associated with the list of variants")
Table 50: Diseases significantly associated with the list of variants
diseaseName variantRatio bgRatio pvalue
USH2A 28/77 28/687727 0
USH1A, FORMERLY 26/77 26/687727 0
RETINITIS PIGMENTOSA 39 21/77 21/687727 0
DFNB1A 15/77 15/687727 0
USHER SYNDROME, TYPE ID 12/77 12/687727 0
DFNB2 12/77 12/687727 0
DFNA3A 8/77 8/687727 0
DFNB12 10/77 10/687727 0
Usher syndrome 9/77 9/687727 0
Senter syndrome 6/77 6/687727 0
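
The p-values in Tables 49 and 50 are displayed as 0, which is most likely a rounding effect of the table rendering rather than a true zero. One way to keep very small p-values readable is to convert them to scientific notation with base R's formatC before building the table. The sketch below uses a hand-built data frame as a stand-in for results@qresult; the disease names are placeholders and the p-values are made up for illustration.

```r
# Hypothetical stand-in for the enrichment result table.
tab <- data.frame(
  diseaseName = c("Disease A", "Disease B"),  # placeholder names
  pvalue      = c(3.2e-41, 0.0125),           # hypothetical values
  stringsAsFactors = FALSE
)

# Convert p-values to scientific notation with three significant digits.
tab$pvalue <- formatC(tab$pvalue, format = "e", digits = 2)
tab$pvalue
```

The resulting character column can then be passed to knitr::kable in the same way as in the examples above.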

Versions

Get DISGENET data version

get_disgenet_version()
## [1] "{ status : OK , payload :{ lastUpdate : 26 Sep 2024 , version : DISGENET v24.3 }, httpStatus :200}"

disgenet2r version

## Version: 1.2.1

License

disgenet2r is distributed under the GPL-2 license.

References

Azaiez, Hela, Kevin T. Booth, Sean S. Ephraim, Bradley Crone, Elizabeth A. Black-Ziegelbein, Robert J. Marini, A. Eliot Shearer, et al. 2018. “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes.” The American Journal of Human Genetics 103 (4): 484–97. https://doi.org/10.1016/j.ajhg.2018.08.006.
Gonzalez-Mantilla, Andrea J., Andres Moreno-De-Luca, David H. Ledbetter, and Christa Lese Martin. 2016. “A Cross-Disorder Method to Identify Novel Candidate Genes for Developmental Brain Disorders.” JAMA Psychiatry 73 (3): 275–83. https://doi.org/10.1001/jamapsychiatry.2015.2692.
Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I. Furlong. 2019. “The DisGeNET knowledge platform for disease genomics: 2019 update.” Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkz1021.
Piñero, Janet, Josep Saüch, Ferran Sanz, and Laura I. Furlong. 2021. “The DisGeNET Cytoscape App: Exploring and Visualizing Disease Genomics Data.” Computational and Structural Biotechnology Journal 19: 2960–67. https://doi.org/10.1016/j.csbj.2021.05.015.
Sánchez, David, and Montserrat Batet. 2011. “Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective.” Journal of Biomedical Informatics 44 (5): 749–59. https://doi.org/10.1016/j.jbi.2011.03.013.