disgenet2r: An R package to explore the molecular underpinnings of human diseases

1 Introduction

The disgenet2r package contains a set of functions to retrieve, visualize and expand DISGENET data (Piñero et al. 2021, 2019, 2026). DISGENET is a disease genomics intelligence platform that transforms fragmented biomedical evidence into structured, evidence-ranked, and provenance-aware knowledge. It brings together information on genes, variants, diseases, phenotypes, drugs, chemicals, and therapeutic evidence into a unified semantic framework. This helps researchers and R&D teams move from scattered biomedical data to more reliable, traceable, and defensible decisions. The information in DISGENET has been extracted from specialized resources and from the literature using state-of-the-art text mining technologies (Table 1.1). For a detailed description of the natural language processing tool that powers DISGENET, read our whitepaper “Unlocking Biomedical Knowledge at Scale: Transforming Scientific Literature into Structured Intelligence” (MedBioInformatics Solutions 2026).

To use DISGENET and the disgenet2r package, you need to acquire a license. Please contact us at info@disgenet.com for license conditions and pricing.

Table 1.1: Sources of DISGENET data
Source	Type	Description
CLINGEN	GDAs/VDAs	The Clinical Genome Resource
CLINPGX	GDAs/VDAs	The Clinical Pharmacogenomics Resource
CLINVAR	GDAs/VDAs	The ClinVar database
GENCC	GDAs	The Gene Curation Coalition
MGD_HUMAN	GDAs	Mouse Genome Database, human data
ORPHANET	GDAs	The portal for rare diseases and orphan drugs (Orphanet)
PSYGENET	GDAs	Psychiatric disorders Gene Association NETwork (PsyGeNET)
RGD_HUMAN	GDAs	Rat Genome Database, human data
UNIPROT	GDAs/VDAs	The Universal Protein Resource (UniProt)
CURATED	GDAs/VDAs	Human curated sources: ClinGen, ClinVar, ClinPGX, GenCC, UniProt, Orphanet, PsyGeNET, MGD, and RGD
FINNGEN	GDAs/VDAs	FinnGen data
GWASCAT	GDAs/VDAs	The NHGRI-EBI GWAS Catalog
PHEWASCAT	GDAs/VDAs	The PHEWAS Catalog
UK BIOBANK	GDAs/VDAs	UK Biobank GWAS data
CHEMBL	GDAs	The ChEMBL database
HPO	GDAs	Human Phenotype Ontology
INFERRED	GDAs	Inferred data from the HPO, CHEMBL, and the GWAS and PHEWAS Catalogs, and from UK and FinnGen biobanks
MGD_MOUSE	GDAs	Mouse Genome Database, mouse data
RGD_RAT	GDAs	Rat Genome Database, rat data
TEXTMINING_MODELS	GDAs	Data from text mining of Medline abstracts (animal models)
MODELS	GDAs	Data from animal models: MGD mouse, RGD rat, and text-mining models
CLINICALTRIALS	GDAs	Data from ClinicalTrials.gov
TEXTMINING_HUMAN	GDAs/VDAs	Data from text mining of Medline abstracts (human)
ALL	GDAs/VDAs	All data sources

You can test DISGENET and the disgenet2r package by registering for a free trial account here.

In the following document, we illustrate how to use the disgenet2r package through a series of examples.

2 Getting Started

2.1 Installation

The package disgenet2r is available through GitLab. The package requires an R version > 3.5.

Install disgenet2r by typing in R:

library(devtools)
install_gitlab("medbio/disgenet2r")

To load the package:

library(disgenet2r)

2.2 Authentication

Once you have completed the registration process, go to your user profile and retrieve your API key.

After retrieving the API key from your user profile, run the lines below so the key is available for all the disgenet2r functions.

api_key <- "enter your API key here"

Sys.setenv(DISGENET_API_KEY= api_key)
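
Before running any query, it can help to confirm that the key is visible to the current R session. The sketch below uses only base R; `has_disgenet_key()` is a hypothetical helper written for this illustration and is not part of disgenet2r.

```r
# Hypothetical helper (not part of disgenet2r): TRUE when the
# DISGENET_API_KEY environment variable holds a non-empty string.
has_disgenet_key <- function() {
  nzchar(Sys.getenv("DISGENET_API_KEY"))
}

if (!has_disgenet_key()) {
  message("DISGENET_API_KEY is not set; call Sys.setenv() as shown above.")
}
```

Environment variables set with Sys.setenv() last only for the current session, so the check is worth repeating after restarting R.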

2.3 Quick Start

The functions in the disgenet2r package receive as parameters one entity (gene, disease, variant, or chemical), a list of entities (up to 100), or combinations of them. In addition, they have the following common parameters:

score: A vector with two elements: 1) initial value of score 2) final value of score. Default 0-1. Note that the score refers to the normalized score.
database: Name of the database that will be queried. Default CURATED. It can take the values: ‘CLINGEN’, ‘CLINPGX’, ‘CLINVAR’, ‘GENCC’, ‘ORPHANET’, ‘PSYGENET’, ‘UNIPROT’, ‘CURATED’, ‘CHEMBL’, ‘HPO’, ‘GWASCAT’, ‘PHEWASCAT’, ‘UKBIOBANK’, ‘FINNGEN’, ‘INFERRED’, ‘MGD_HUMAN’, ‘MGD_MOUSE’, ‘RGD_HUMAN’, ‘RGD_RAT’, ‘TEXTMINING_MODELS’, ‘MODELS’, ‘TEXTMINING_HUMAN’, ‘CLINICALTRIALS’, and ‘ALL’.
n_pags: A number between 1 and 100 indicating the number of pages to retrieve from the results of the query. Default 100.
verbose: By default FALSE. Change it to TRUE to enable real-time logging from the function.
order_by: By default score. Depending on the type of query, it can accept the following values: score, dsi, dpi, pli, pmYear, ei, yearInitial, yearFinal, numCTsupportingAssociation.

Below is an example of a query for the BRCA1 gene in ALL the data. Notice that this query matches more than 100 pages of results (237 in the output below), so only the first 10,000 results are retrieved (100 pages, 100 results per page).

results <- gene2evidence( gene = "BRCA1", vocabulary = "HGNC", database = "ALL")

## Notice that your query has a maximum of 237 pages.
## By using the default n_pags (100), your query of 237 pages has been reduced to 100 pages.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        BRCA1 
##  . Results:  10000
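
The truncation reported above follows from simple arithmetic. The sketch below assumes, as stated in the text, 100 results per page and a cap of 100 pages; `max_rows()` is a hypothetical helper, not a package function.

```r
# Assumption taken from the text above: 100 results per page.
page_size <- 100

# Hypothetical helper: upper bound on the rows one query can return.
max_rows <- function(total_pages, n_pags = 100) {
  min(total_pages, n_pags) * page_size
}

max_rows(237)   # 10000, matching the Results field above
max_rows(28)    # at most 2800 (the last page may be only partly filled)
```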

3 Usage Limits

3.1 Trial account

Please note that the trial account enables you to test all the functions of the disgenet2r package, but the queries to DISGENET database have the following restrictions:

Only the top-30 results ordered by descending DISGENET score are returned (pagination is not supported).
Multiple-entity queries support at most 10 entities (genes, diseases, variants).
The access to DISGENET with a TRIAL account will expire after 7 days from the day of activation.

3.2 Academic account

Academics can access our expert-curated dataset.

3.3 Other plans

There are limits in place for the disgenet2r package to ensure smooth performance for all users. These limits apply to academic, advanced, and premium users, mirroring the limits of the DISGENET REST API.

Here’s a breakdown of the limitations:

A maximum of 100 pages of results are returned.
Multiple-entity queries support at most 100 entities (genes, diseases, variants).

Important Note: The package will display a warning message if you exceed these limits.

3.4 Recommendations for Efficient Use

To improve performance and avoid exceeding limits, consider querying with smaller batches of entities. You can also use DISGENET metrics and annotations to refine your search and reduce the number of returned results.
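
One way to query in smaller batches is to split the input vector before calling a query function. This is a minimal base R sketch; `split_into_batches()` and the placeholder gene names are hypothetical, and the default batch size of 100 is the multiple-entity limit listed above.

```r
# Hypothetical helper: split a vector of identifiers into batches of at
# most `batch_size` elements.
split_into_batches <- function(ids, batch_size = 100) {
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

genes   <- paste0("GENE", 1:250)   # placeholder symbols, for illustration only
batches <- split_into_batches(genes)
lengths(batches)                   # 100 100 50

# Each batch could then be passed to a query function in turn, e.g.
# lapply(batches, function(b) gene2disease(gene = b, database = "CURATED"))
```

The query itself is left as a comment because it needs a valid API key.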

4 Entity Normalization

The entity_normalization function maps free-text biomedical terms to standardized identifiers. It takes an entity_type as a parameter, specifying the target namespace (e.g., disease, gene, chemical), and a term_list containing one or more free-text expressions separated by “,” for matching. Users can control match quality through minimum_similarity_threshold, which sets the cosine similarity cutoff between 0.0 and 1.0 (default 0.8), and can define how many candidates to return using results, which accepts values from 0 to 25 (default 5).

4.1 Genes

results <- entity_normalization(entity_type = "gene", term_list = "p53", 
                            minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab , caption = "Gene Normalization Example")

Table 4.1: Gene Normalization Example
term	entityType	normalizedId	normalizedName	similarity	matchedText
p53	gene	7157	TP53	1.00000	p53
p53	gene	10042	HMGXB4	0.94705	P53N
p53	gene	8925	HERC1	0.91898	p532
p53	gene	7158	TP53BP1	0.90215	p53B

4.2 Diseases

results <- entity_normalization(entity_type = "disease", term_list = c("ALS", "MS"), 
                            minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab , caption = "Disease Normalization Example")

Table 4.2: Disease Normalization Example
term	entityType	normalizedId	normalizedName	similarity	matchedText
ALS	disease	C0002736	Amyotrophic Lateral Sclerosis	1.00000	ALS
ALS	disease	C0268425	Alstrom Syndrome	0.91667	ALSS
MS	disease	C0026769	Multiple Sclerosis	1.00000	MS
MS	disease	C0026269	Mitral Valve Stenosis	1.00000	MS
MS	disease	C1868685	MULTIPLE SCLEROSIS, SUSCEPTIBILITY TO	1.00000	MS
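
Table 4.2 shows that a short term such as MS can match several concepts with a similarity of 1. A small check like the one below can flag such terms for manual review. The data frame reproduces a few columns of Table 4.2 as a stand-in for results@qresult.

```r
# Rows copied from Table 4.2 (a stand-in for results@qresult).
norm <- data.frame(
  term         = c("ALS", "ALS", "MS", "MS", "MS"),
  normalizedId = c("C0002736", "C0268425", "C0026769", "C0026269", "C1868685"),
  similarity   = c(1, 0.91667, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Count exact matches (similarity == 1) per input term.
exact   <- norm[norm$similarity == 1, ]
n_exact <- table(exact$term)

# Terms with more than one exact match need manual disambiguation.
ambiguous <- names(n_exact)[n_exact > 1]
ambiguous   # "MS"
```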

4.3 Chemicals

results <- entity_normalization(entity_type = "chemical", 
                                term_list = c("aspirin", "paracetamol"),  
                                minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab , caption = "Chemical Normalization Example")

Table 4.3: Chemical Normalization Example
term	entityType	normalizedId	normalizedName	similarity	matchedText
aspirin	chemical	CHEMBL25	Acetylsalicylic acid	1	aspirin
paracetamol	chemical	CHEMBL112	Acetaminophen	1	paracetamol

5 Gene-Disease Associations (GDAs)

5.1 Searching by gene

The gene2disease function retrieves the GDAs in DISGENET for a given gene, or for a list of genes. The gene(s) can be identified by either the NCBI gene identifier, or the official Gene Symbol, and the type of identifier used must be specified using the parameter vocabulary. By default, vocabulary = "HGNC". To switch to Entrez NCBI Gene identifiers, set vocabulary to ENTREZ.

The function also requires the user to specify the source database using the argument database. By default, all the functions in the disgenet2r package use as source database CURATED, which includes GDAs from ClinGen, ClinVar, ClinPGX, MGD (Human data), RGD (Human data), GenCC, PsyGeNET, UniProt, and Orphanet.

The results can be filtered using the DISGENET score. The argument score defines the score range used in the search. It is entered as a vector whose first element is the lower bound and whose second element is the upper bound. Both bounds are included in the range. By default, score=c(0,1).
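
The inclusive range can be illustrated with a few lines of base R. Both helpers below are hypothetical and only mimic the behaviour described above; they do not call DISGENET.

```r
# Hypothetical helper: validate a score range before sending a query.
check_score_range <- function(score) {
  stopifnot(is.numeric(score), length(score) == 2,
            score[1] >= 0, score[2] <= 1, score[1] <= score[2])
  invisible(score)
}

# Hypothetical helper: inclusive filter (both bounds are kept).
in_score_range <- function(x, score = c(0, 1)) {
  check_score_range(score)
  x >= score[1] & x <= score[2]
}

in_score_range(c(0.87, 0.65, 0.30, 0.13), score = c(0.3, 1))
# TRUE TRUE TRUE FALSE
```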

5.1.1 Single gene

In the example, the query for the Leptin Receptor (Gene Symbol LEPR, and Entrez NCBI Identifier 3953) is performed in the curated data in DISGENET.

results <- gene2disease( gene = 3953, vocabulary = "ENTREZ",
                       database = "CURATED")

The function gene2disease produces an object DataGeNET.DGN that contains the results of the query.

class(results)

## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"

Type the name of the object to display its attributes: the input parameters, such as whether a single entity or a list was searched (single or list), the type of entity (gene-disease), the selected database (CURATED), the score range used in the search (0-1), and the gene NCBI identifier (3953).

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        3953 
##  . Results:  76

To obtain the data frame with the results of the query, access the qresult slot of the object:

tab <- results@qresult
head( tab, 3 )

##   gene_symbol geneid       ensemblid   geneNcbiType geneDSI geneDPI    genepLI
## 1        LEPR   3953 ENSG00000116678 protein-coding   0.432   0.875 8.8607e-05
## 2        LEPR   3953 ENSG00000116678 protein-coding   0.432   0.875 8.8607e-05
## 3        LEPR   3953 ENSG00000116678 protein-coding   0.432   0.875 8.8607e-05
##   uniprotids protein_classid protein_class_name
## 1     P48357    DTO_05007599          Signaling
## 2     P48357    DTO_05007599          Signaling
## 3     P48357    DTO_05007599          Signaling
##                               disease_name diseaseType diseaseUMLSCUI
## 1                                  Obesity   [disease]       C0028754
## 2 Diabetes Mellitus, Non-Insulin-Dependent   [disease]       C0011860
## 3                              Hyperphagia [phenotype]       C0020505
##                                                                            diseaseClasses_MSH
## 1 Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)
## 2                   Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)
## 3                                           Pathological Conditions, Signs and Symptoms (C23)
##       diseaseClasses_UMLS_ST
## 1 Disease or Syndrome (T047)
## 2 Disease or Syndrome (T047)
## 3             Finding (T033)
##                                        diseaseClasses_DO
## 1                        disease of metabolism (0014667)
## 2 genetic disease (630), disease of metabolism (0014667)
## 3                                                       
##                                                                           diseaseClasses_HPO
## 1                                                                 Growth abnormality (01507)
## 2 Abnormality of the endocrine system (00818), Abnormality of metabolism/homeostasis (01939)
## 3                                                  Abnormality of the nervous system (00707)
##   disease_prevalence_class disease_prevalence_geo_area disease_prevalence_type
## 1                                                                             
## 2                                                                             
## 3                                                                             
##   disease_inheritance numDBSNPsupportingAssociation numCTsupportingAssociation
## 1                                                 3                         19
## 2                                                 2                          4
## 3                                                 0                          1
##   numPMIDs chemsIncludedInEvidenceBySource numChemsIncludedInEvidences
## 1       17                              NA                          NA
## 2        6                              NA                          NA
## 3        3                              NA                          NA
##   numPMIDSWithChemsIncludedInEvidences numNCTSWithChemsIncludedInEvidences
## 1                                   NA                                  NA
## 2                                   NA                                  NA
## 3                                   NA                                  NA
##   score yearInitial yearFinal
## 1  1.35        1986      2025
## 2  1.00        2010      2025
## 3  0.95        1986      2007
##                                                                                                                                                                                    scoreBreakdown
## 1 1.35, CLINPGX, CLINVAR, RGD, MGD_HUMAN, 4, 0.6, CLINICALTRIALS, 1, 0.1, GWASCAT, HPO, 2, 0.15, MGD_MOUSE, TEXTMINING_MODELS, RGD_RAT, 3, 0.1, NA, TEXTMINING_HUMAN, TEXTMINING_MODELS, 384, 0.4
## 2                           1, RGD, MGD_HUMAN, 2, 0.45, CLINICALTRIALS, 1, 0.1, GWASCAT, 1, 0.05, MGD_MOUSE, TEXTMINING_MODELS, RGD_RAT, 3, 0.1, NA, TEXTMINING_HUMAN, TEXTMINING_MODELS, 48, 0.3
## 3                                                   0.95, RGD, 1, 0.4, CLINICALTRIALS, 1, 0.1, HPO, 1, 0.05, TEXTMINING_MODELS, RGD_RAT, 2, 0.1, NA, TEXTMINING_HUMAN, TEXTMINING_MODELS, 21, 0.3
##   normalized_score evidence_index evidence_level diseaseid
## 1        0.8709677      0.8822222           <NA>  C0028754
## 2        0.6451613      0.9578947           <NA>  C0011860
## 3        0.6129032      1.0000000           <NA>  C0020505

A similar query can be performed using the Gene Symbol (LEPR) and a different data source (TEXTMINING_HUMAN). Notice how the number of diseases associated to the Leptin Receptor has increased.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  435

The same query can be performed using the ENSEMBL gene identifier of the LEPR gene (ENSG00000116678) by setting the vocabulary to ENSEMBL.

results <- gene2disease( gene = "ENSG00000116678",
                        vocabulary = "ENSEMBL",
                       database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        ENSG00000116678 
##  . Results:  435

Additionally, a minimum threshold for the score can be defined. In the example, a cutoff of score=c(0.3,1) is used. Notice how the number of diseases associated to the Leptin Receptor drops when the score is restricted.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "ALL",
                       score =c(0.3,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.3-1 
##  . Term:        LEPR 
##  . Results:  45

Table 5.1 shows the top 10 diseases associated to the LEPR gene.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score","normalized_score", "yearInitial", "yearFinal")] )
tab$normalized_score  <- round(tab$normalized_score, digits = 2)
knitr::kable(tab[1:10,], caption = "Top diseases associated to LEPR" )

Table 5.1: Top diseases associated to LEPR
gene_symbol	disease_name	score	normalized_score	yearInitial	yearFinal
LEPR	Obesity	1.35	0.87	1966	2026
LEPR	Diabetes Mellitus, Non-Insulin-Dependent	1.00	0.65	1966	2026
LEPR	Hyperphagia	0.95	0.61	1986	2026
LEPR	Diabetes Mellitus	0.90	0.58	1985	2025
LEPR	Hypertensive disease	0.85	0.55	1999	2025
LEPR	Morbid obesity	0.85	0.55	1997	2022
LEPR	Hyperinsulinism	0.85	0.55	1986	2023
LEPR	Metabolic Syndrome X	0.85	0.55	2000	2024
LEPR	Liver carcinoma	0.80	0.52	1996	2024
LEPR	Hyperglycemia	0.80	0.52	1986	2025

5.1.1.1 Visualizing the diseases associated to a single gene

The disgenet2r package offers two options to visualize the results of querying a single gene in DISGENET: a network showing the diseases associated to the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of the diseases associated to the gene (Gene-Disease Class Network). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Gene-Disease Network on a DataGeNET.DGN object (Figure 5.1). In the Gene-Disease Network the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter adjusts the size of the nodes, while the eprop parameter adjusts the width of the edges while keeping the proportionality to the score.

plot( results,
      type = "Network",
      prop = 20, eprop = 5, verbose = TRUE)

Figure 5.1: The Gene-Disease Network for the Leptin Receptor gene

Use interactive = TRUE to display an interactive plot (Figure 5.2).

plot( results,
      type = "Network",
       interactive = TRUE)

Figure 5.2: The interactive Gene-Disease Network for the Leptin Receptor gene

The results can also be visualized in a network in which diseases are grouped by MeSH Disease Class if the class argument is set to DiseaseClass (Gene-Disease Class Network, Figure 5.3). In the Gene-Disease Class Network, the node size is proportional to the fraction of diseases in the disease class, with respect to the total number of diseases with disease classes associated to the gene. In the example, the Leptin Receptor is associated mainly to Nutritional and Metabolic Diseases. Diseases that have no MeSH disease class annotation are listed in a warning.

plot( results,
      class = "DiseaseClass",
      interactive = TRUE, verbose = TRUE)

Figure 5.3: The Disease Class Network for the Leptin Receptor Gene
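
The node-size rule can be made concrete with a small calculation. The sketch below is one plausible reading of that rule, computed in base R on the three rows printed earlier for LEPR; it is not the package's plotting code.

```r
# Rows copied from the gene2disease output above (two columns only).
gda <- data.frame(
  disease_name = c("Obesity", "Diabetes Mellitus, Non-Insulin-Dependent",
                   "Hyperphagia"),
  diseaseClasses_MSH = c(
    "Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)",
    "Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)",
    "Pathological Conditions, Signs and Symptoms (C23)"),
  stringsAsFactors = FALSE
)

# Class names contain commas themselves, so extract the trailing codes
# such as "C18" instead of splitting on ",".
codes <- regmatches(gda$diseaseClasses_MSH,
                    gregexpr("C[0-9]+(?=\\))", gda$diseaseClasses_MSH, perl = TRUE))

# Fraction of annotated diseases that fall in each class.
annotated      <- lengths(codes) > 0
class_fraction <- table(unlist(lapply(codes[annotated], unique))) / sum(annotated)
round(class_fraction, 2)
```

A disease can belong to several classes, so the fractions do not need to sum to 1.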

5.1.1.2 Exploring the evidences associated to a gene

You can extract the evidences associated to a particular gene using the function gene2evidence. The evidence types in DISGENET are scientific publications (PMIDs), and clinical trials (NCTIDs).

Additionally, you can explore the evidences for a specific gene-disease pair by specifying the disease identifier using the argument disease.

results <- gene2evidence( gene = "LEPR", 
                          vocabulary = "HGNC",
                          disease ="UMLS_C3554225",
                          database = "ALL")

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  25

The results are shown in Table 5.2.

library(dplyr)  # provides %>%, filter(), select(), arrange()
tab <- results@qresult
tab <-  tab %>%
  filter(reference_type == "PMID") %>%
  select(reference, associationType, pmYear, sentence) %>% arrange(desc(pmYear))

tab <- tab %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(  pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) ) ) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY" )

Table 5.2: Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY
pmid	associationType	Year	Sentence
29545012	CausalMutation	2018	Potential role of gender specific effect of leptin receptor deficiency in an extended consanguineous family with severe early-onset obesity.
25751111	CausalMutation	2015	Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect.
25751111	GeneticVariation	2015	Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect.
24319006	CausalMutation	2014	Novel LEPR mutations in obese Pakistani children identified by PCR-based enrichment and next generation sequencing.
24611737	CausalMutation	2014	Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population.
23616257	CausalMutation	2014	Whole-exome sequencing identifies novel LEPR mutations in individuals with severe early onset obesity.
24611737	GeneticVariation	2014	Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population.
22810975	GeneticVariation	2012	Variants in the LEPR gene are nominally associated with higher BMI and lower 24-h energy expenditure in Pima Indians.
18703626	GeneticVariation	2008	Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor.
18703626	CausalMutation	2008	Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor.
17229951	GeneticVariation	2007	Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
17229951	GeneticVariation	2007	Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
17229951	CausalMutation	2007	Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
17229951	CausalMutation	2007	Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
16284652	CausalMutation	2005	Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes.
12646666	GeneticVariation	2003	Binge eating as a major phenotype of melanocortin 4 receptor gene mutations.
9537324	CausalMutation	1998	A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9537324	GeneticVariation	1998	A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9860295	GeneticVariation	1998	Transmission disequilibrium and sequence variants at the leptin receptor gene in extremely obese German children and adolescents.
9537324	GeneticVariation	1998	A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9144432	GeneticVariation	1997	Amino acid variants in the human leptin receptor: lack of association to juvenile onset obesity.
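
Some rows of Table 5.2 are identical in the displayed columns (they may differ in columns that were not selected). If one row per publication and association type is enough, the duplicates can be collapsed with base R. The data frame below copies a few rows of Table 5.2 for illustration.

```r
# A few rows copied from Table 5.2 (a stand-in for the filtered table).
ev <- data.frame(
  pmid = c("17229951", "17229951", "17229951", "17229951",
           "9537324", "9537324", "9537324"),
  associationType = c("GeneticVariation", "GeneticVariation",
                      "CausalMutation", "CausalMutation",
                      "CausalMutation", "GeneticVariation", "GeneticVariation"),
  Year = c(2007, 2007, 2007, 2007, 1998, 1998, 1998),
  stringsAsFactors = FALSE
)

ev_unique <- unique(ev)   # drop rows that are exact duplicates
nrow(ev_unique)           # 4

# Number of distinct publications per association type.
tapply(ev_unique$pmid, ev_unique$associationType,
       function(p) length(unique(p)))
```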

To visualize the results when there are many evidences, we suggest plotting them with the type argument set to Points (Figure 5.4). Set the limit parameter to 10,000 so that all the evidences are included in the plot.

results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
                        database = "ALL", score=c(0.7,1) )
plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 5.4: The Evidences plot for the Leptin Receptor gene

5.1.2 Multiple genes

The gene2disease function can also receive as input a list of genes, either as Entrez NCBI Gene Identifiers or Gene Symbols. In the example, we show how to create a vector with the Gene Symbols of several genes belonging to the family of voltage-gated potassium channels (Table 5.3) and then, we apply the function gene2disease.

Table 5.3: Example of voltage-gated potassium channel family members
Name	Description
KCNE1	potassium channel, voltage gated subfamily E regulatory beta subunit 1
KCNE2	potassium channel, voltage gated subfamily E regulatory beta subunit 2
KCNH1	potassium channel, voltage gated eag related subfamily H, member 1
KCNH2	potassium channel, voltage gated eag related subfamily H, member 2
KCNG1	potassium voltage-gated channel modifier subfamily G member 1

Create the vector with the list of genes belonging to the voltage-gated potassium channel family:

myListOfGenes <- c( "KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

The gene2disease function also requires the user to specify the source database using the argument database, and optionally, the DISGENET score can also be applied to filter the results.

results <- gene2disease(
  gene     = myListOfGenes,
  database = "ALL",
  score    = c(0.5, 1),
  verbose  = TRUE
)

## Your query has 1 page.

## Warning in gene2disease(gene = myListOfGenes, database = "ALL", score = c(0.5, : 
##  One or more of the genes in the list is not in DISGENET ( 'ALL' ):
##    - KCNG1

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:       KCNE1 ... KCNH2 
##  . Results:  17
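
To see which input genes contributed no rows, the input vector can be compared with the gene symbols present in the results. The sketch below uses an illustrative stand-in for unique(results@qresult$gene_symbol), chosen to agree with the warning printed above. With a score filter in place, a gene can also be absent simply because none of its associations falls within the range.

```r
myListOfGenes <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

# Stand-in for unique(results@qresult$gene_symbol); illustrative values.
genes_in_results <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2")

missing_genes <- setdiff(myListOfGenes, genes_in_results)
missing_genes   # "KCNG1"
```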

Table 5.4 shows the top 10 GDAs for the list of genes belonging to the voltage-gated potassium channel family.

tab <- results@qresult[, c("gene_symbol", "disease_name", "score",
                           "normalized_score", "yearInitial", "yearFinal")] %>%
  unique() %>%
  mutate(normalized_score = round(normalized_score, 2)) %>%
  arrange(desc(score), yearInitial)

knitr::kable(tab[1:10,], caption = "Top GDAs for the list of genes belonging to the voltage-gated potassium channel family")

Table 5.4: Top GDAs for the list of genes belonging to the voltage-gated potassium channel family
gene_symbol	disease_name	score	normalized_score	yearInitial	yearFinal
KCNH2	Long QT Syndrome	1.30	0.84	1970	2026
KCNH2	Cardiac Arrhythmia	1.25	0.81	1975	2026
KCNE2	Long QT Syndrome	1.10	0.71	1999	2024
KCNH2	Atrial Fibrillation	1.05	0.68	2001	2025
KCNH2	Long Qt Syndrome 2	1.00	0.65	1990	2025
KCNE1	Jervell-Lange Nielsen Syndrome	1.00	0.65	1993	2025
KCNH2	Short QT Syndrome 1	1.00	0.65	1995	2025
KCNE1	LONG QT SYNDROME 5	0.90	0.58	1991	2022
KCNH2	Prolonged QT interval	0.90	0.58	1995	2026
KCNE1	Long QT Syndrome	0.90	0.58	1997	2025

5.1.2.1 Visualizing the diseases associated to multiple genes

By default, plotting a DataGeNET.DGN object that results from a query with a list of genes produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5.5).

plot( results,
      type = "Network",
      prop = 10, verbose = TRUE)

Figure 5.5: The Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Set the argument interactive = TRUE to see an interactive network (Figure 5.6).

plot( results,
      type = "Network",
      prop = 10,  interactive=TRUE)

Figure 5.6: The interactive Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Setting the argument type to Heatmap produces a Gene-Disease Heatmap (Figure 5.7), where the scale of colors is proportional to the score of the GDA. The argument limit can be used to limit the number of rows to the top scoring GDAs. The argument nchars can be used to limit the length of the name of the disease. By default, the plot shows the 50 highest scoring GDAs.

plot( results,
      type   = "Heatmap",
      limit  = 100,
      nchars = 50,
      interactive = TRUE,
      verbose = TRUE)

Figure 5.7: The Gene-Disease Heatmap for a list of genes belonging to the voltage-gated potassium channel family
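
Conceptually, the heatmap displays a disease by gene matrix of scores. The base R sketch below builds such a matrix from a few rows of Table 5.4. It only illustrates the data layout and the effect of limit, and it is not how the package draws the figure.

```r
# Rows copied from Table 5.4.
gda <- data.frame(
  gene_symbol  = c("KCNH2", "KCNH2", "KCNE2", "KCNH2", "KCNE1"),
  disease_name = c("Long QT Syndrome", "Cardiac Arrhythmia", "Long QT Syndrome",
                   "Atrial Fibrillation", "Long QT Syndrome"),
  score        = c(1.30, 1.25, 1.10, 1.05, 0.90),
  stringsAsFactors = FALSE
)

# Keep the top `limit` GDAs by score, then pivot to a disease x gene matrix.
limit <- 50
top   <- head(gda[order(-gda$score), ], limit)
score_matrix <- tapply(top$score, list(top$disease_name, top$gene_symbol), max)
score_matrix   # NA marks gene-disease pairs without an association
```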

These results can also be visualized as a Gene-Disease Class Heatmap by setting the argument type to Heatmap and class to DiseaseClass (Figure 5.8). In this case, diseases are grouped by their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, genes are associated mainly to Cardiovascular Diseases, and to Congenital, Hereditary, and Neonatal Diseases and Abnormalities.

plot( results, type = "Heatmap",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 5.8: The Gene-Disease Class Heatmap for a list of genes belonging to the voltage-gated potassium channel family

Alternatively, set the argument type to Network and class to DiseaseClass to generate a Gene-Disease Class Network (Figure 5.9).

plot( results, type = "Network",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 5.9: The Gene-Disease Class Network for a list of genes belonging to the voltage-gated potassium channel family

5.1.2.2 Exploring the evidences associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = myListOfGenes, 
                       database = "TEXTMINING_HUMAN", verbose  = TRUE)

## Your query has 28 pages.

To visualize the results, set the argument type to Points (Figure 5.10).

plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 5.10: The Evidences plot for a list of genes belonging to the voltage-gated potassium channel family

5.1.2.3 Exploring the Clinical trials associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = c("MMP1", "MMP2", "MMP3", "MMP9", "MMP10"), 
                       database = "CLINICALTRIALS", verbose  = TRUE )

## Your query has 13 pages.

To visualize the results, set the argument type to Points and the argument reference_type to NCTID (Figure 5.11).

plot(results, type = "Points", reference_type = "NCTID", interactive = TRUE, limit = 10000)

Figure 5.11: The Evidences plot for a list of MMPs in clinical trials

5.1.3 Filtering by chemical

You can restrict GDAs to those annotated with a given chemical by passing a chemical identifier to the chemical filter of the gene2disease function. Table 5.5 shows the diseases associated to LEPR that are annotated with metformin.

results <- gene2disease( gene = "LEPR", vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN", 
                       chemical = "CHEMBL_CHEMBL1431" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  4

tab <- results@qresult
tab <- tab %>%
  dplyr::select(chemical_name, gene_symbol, disease_name, score, normalized_score) %>%
  mutate(normalized_score = round(normalized_score, 2)) %>%
  arrange(desc(score))
knitr::kable(tab, caption = "GDAs for LEPR and metformin")

Table 5.5: GDAs for LEPR and metformin
chemical_name	gene_symbol	disease_name	score	normalized_score
Metformin	LEPR	Hyperinsulinism	0.85	0.55
Metformin	LEPR	Increased insulin level	0.35	0.23
Metformin	LEPR	Steatohepatitis	0.35	0.23
Metformin	LEPR	Fatty degeneration	0.20	0.13

5.1.3.1 Retrieving the chemicals associated to a gene

For GDAs that have a chemical annotation, we can query with a gene, or a list of genes, to retrieve the chemicals annotated to these associations.

results <- gene2chemical( gene  = "PDGFRA", 
                        vocabulary = "HGNC",
                        database = "TEXTMINING_HUMAN" , 
                        score = c(0.8,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.8-1 
##  . Term:        PDGFRA 
##  . Results:  15

tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type == "PMID") %>%   dplyr::select(disease_name, chemical_name, chemical_effect,sentence,  reference, pmYear)
tab <- tab %>% dplyr::rename(  Disease = disease_name, 
                             Chemical = chemical_name, `Chemical effect` =  chemical_effect,
                             Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid )  )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Selection of chemicals associated to PDGFRA" )

Table 5.6: Selection of chemicals associated to PDGFRA
Disease	Chemical	Chemical effect	Sentence	pmid	Year
Gastrointestinal Stromal Tumors	Imatinib	therapeutic	Imatinib is the first-line treatment for advanced gastrointestinal stromal tumors (GISTs) harboring KIT or PDGFRA mutations.	41559406	2026
Gastrointestinal Stromal Tumors	Imatinib	therapeutic	First-line imatinib therapy can be employed to treat GISTs harboring mutations in the tyrosine-protein kinase KIT (KIT) and platelet-derived growth factor receptor α (PDGFRα) genes to reduce the tumor size to resectable levels and minimize surgical risks.	40276085	2025
Gastrointestinal Stromal Tumors	Imatinib	therapeutic	Patients with unresectable or metastatic GISTs harboring the D842V mutation in the PDGFRA gene have a poor prognosis due to intrinsic resistance to imatinib and all other approved tyrosine kinase inhibitors.	40349140	2025
Gastrointestinal Stromal Tumors	Imatinib	therapeutic	In NF1-associated GIST, KIT and PDGFRA mutations are frequently absent and imatinib is ineffective.	39811049	2025
Gastrointestinal Stromal Tumors	Ripretinib	therapeutic	Ripretinib, a broad-spectrum inhibitor of the KIT and PDGFRA receptor tyrosine kinases, is designated as a fourth-line treatment for gastrointestinal stromal tumor (GIST).	38973363	2024
Gastrointestinal Stromal Tumors	Avapritinib	therapeutic	Avapritinib is the only drug for adult patients with PDGFRA exon 18 mutated unresectable or metastatic gastrointestinal stromal tumor (GIST).	38803186	2024
Gastrointestinal Stromal Tumors	Imatinib	therapeutic	In NF1-associated GIST, KIT and PDGFRA mutations are frequently absent and imatinib is ineffective.	37122520	2023
Gastrointestinal Stromal Tumors	IMATINIB MESYLATE	therapeutic	Most gastrointestinal stromal tumors (GISTs) express constitutively activated mutant isoforms of KIT or kinase platelet-derived growth factor receptor alpha (PDGFRA) that are potential therapeutic targets for imatinib mesylate.	37890277	2023
Gastrointestinal Stromal Tumors	Avapritinib	therapeutic\|therapeutic	Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation.	36155864	2023
Gastrointestinal Stromal Tumors	Imatinib	therapeutic\|therapeutic	Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation.	36155864	2023

To visualize the results use the plot function.

plot(results, type = "Network", interactive = TRUE, limit = 10000)

Figure 5.12: The Gene-Chemical Network for PDGFRA

5.2 Searching by disease

The disease2gene function retrieves the genes associated to a disease, or a list of diseases. As input, it takes the disease or list of diseases of interest and the database (by default, CURATED). Each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO, and ID is the identifier in that vocabulary. A threshold value for the score can be set, as in the gene2disease function.
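The identifier format described above can be sketched with a small base R helper. The function name make_disease_id and the vector disease_vocabularies are hypothetical and not part of disgenet2r; the helper only pastes a vocabulary prefix and an identifier together and rejects prefixes that are not in the list given in this section.

```r
# Vocabulary prefixes listed in this section for disease identifiers.
disease_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                          "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper (not a disgenet2r function): builds "IDENT_ID" strings.
make_disease_id <- function(vocabulary, id) {
  vocabulary <- toupper(vocabulary)
  unknown <- setdiff(vocabulary, disease_vocabularies)
  if (length(unknown) > 0) {
    stop("Unknown vocabulary: ", paste(unknown, collapse = ", "))
  }
  # paste() is vectorised, so one prefix can be combined with many ids.
  paste(vocabulary, id, sep = "_")
}

ids <- make_disease_id("UMLS", c("C0036341", "C0002395"))
mesh_id <- make_disease_id("mesh", "D012559")
print(ids)
print(mesh_id)
```

The resulting character vector has the same shape as the values passed to the disease argument in the examples of this section.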

5.2.1 Single disease

In the example, we will use the disease2gene function to retrieve the genes associated to the UMLS CUI C0036341. The function also receives as input the database (in the example, CURATED) and a score range (in the example, from 0.5 to 1).

results <- disease2gene( disease  = "UMLS_C0036341", 
                          database = "CURATED",
                          score    = c( 0.5,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0036341 
##  . Results:  152

Table 5.7 shows the top 10 genes associated to the UMLS CUI C0036341.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score","normalized_score", "yearInitial", "yearFinal")] )  %>%
    mutate(normalized_score = round(normalized_score, 2)) %>%
  arrange(desc(score), yearInitial)
knitr::kable(tab[1:10,], caption = "Top 10 genes associated to Schizophrenia")

Table 5.7: Top 10 genes associated to Schizophrenia
gene_symbol	disease_name	score	normalized_score	yearInitial	yearFinal
DRD3	Schizophrenia	1.35	0.87	1992	2003
HTR2A	Schizophrenia	1.20	0.77	2004	2008
DRD2	Schizophrenia	1.15	0.74	2000	2011
COMT	Schizophrenia	1.15	0.74	2005	2010
MTHFR	Schizophrenia	1.15	0.74	2006	2009
AKT1	Schizophrenia	1.05	0.68	2004	2011
TNF	Schizophrenia	1.05	0.68	2006	2024
DISC1	Schizophrenia	1.05	0.68	2010	2011
NRXN1	Schizophrenia	1.05	0.68	2011	2018
CNTNAP2	Schizophrenia	1.05	0.68	2011	2011
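The ranking used for Table 5.7 (deduplicate, sort by descending score, break ties by the earliest year) can be reproduced without a DISGENET license on a toy data frame. The sketch below uses base R instead of dplyr and reuses five rows of Table 5.7 in shuffled order, plus one deliberately duplicated row; the object names gdas and ranked are illustrative only.

```r
# Toy input: values copied from Table 5.7, shuffled, with DRD2 duplicated.
gdas <- data.frame(
  gene_symbol = c("MTHFR", "DRD3", "COMT", "HTR2A", "DRD2", "DRD2"),
  score       = c(1.15, 1.35, 1.15, 1.20, 1.15, 1.15),
  yearInitial = c(2006, 1992, 2005, 2004, 2000, 2000),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, as unique() does in the vignette code.
gdas <- unique(gdas)

# Sort by descending score; ties are broken by the earliest yearInitial.
ranked <- gdas[order(-gdas$score, gdas$yearInitial), ]
rownames(ranked) <- NULL
print(ranked)
```

Genes with the same score (DRD2, COMT and MTHFR at 1.15) end up ordered by the year of their first publication, which matches the order shown in the table.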

5.2.1.1 Visualizing the genes associated to a single disease

There are two options to visualize the results from searching a single disease: a Gene-Disease Network showing the genes related to the disease of interest (Figure 5.13), and a Disease-Protein Class Network with the genes grouped by the Drug Target Ontology Protein Class (Figure 5.14).

Figure 5.13 shows the default Gene-Disease Network for Schizophrenia. As in the case of the gene2disease function, the blue node is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.

plot ( results,
       prop = 10, interactive=TRUE)

Figure 5.13: The Gene-Disease Network for genes associated to Schizophrenia

Alternatively, in the Disease-Protein Class Network, genes are grouped by the Drug Target Ontology Protein Class (Figure 5.14). This is a better choice when there is a large number of genes associated to the disease. This plot is obtained by setting the class argument to ProteinClass. The resulting network shows the disease in blue and the Protein Classes of the genes associated to the disease in green. The node size is proportional to the number of genes in the Protein Class. In the example, the largest proportion of the genes associated to Schizophrenia are G-protein coupled receptors. Notice again that not all genes have annotations to Protein Classes.

plot( results,
      class="ProteinClass",
      interactive=TRUE)

Figure 5.14: The Protein Class-Disease Network for genes associated to Schizophrenia

The same results are obtained when querying DISGENET with the MeSH identifier for Schizophrenia (D012559).

results <- disease2gene( disease  = "MESH_D012559",  
                          database = "CURATED",
                          score    = c( 0.5,1  ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        MESH_D012559 
##  . Results:  152

The same results are obtained when querying DISGENET with the OMIM identifier for Schizophrenia (181500).

results <- disease2gene( disease  = "OMIM_181500",  
                          database = "CURATED",
                          score    = c(  0.5,1  ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        OMIM_181500 
##  . Results:  152

The same results are obtained when querying DISGENET with the ICD9-CM identifier for Schizophrenia (295).

results <- disease2gene( disease  = "ICD9CM_295",  
                          database = "CURATED",
                          score    = c( 0.5,1  ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        ICD9CM_295 
##  . Results:  152

The same results are obtained when querying DISGENET with the NCI identifier for Schizophrenia (C3362).

results <- disease2gene( disease  = "NCI_C3362", 
                          database = "CURATED",
                          score    = c(  0.5,1  ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        NCI_C3362 
##  . Results:  152

The same results are obtained when querying DISGENET with the HPO identifier for Schizophrenia (HP:0100753).

results <- disease2gene( disease  = "HPO_HP:0100753", 
                          database = "CURATED",
                         score    = c(  0.5,1  ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:        HPO_HP:0100753 
##  . Results:  152

5.2.1.2 Exploring the evidences associated to a disease

To explore the evidences supporting the associations for Schizophrenia use the function disease2evidence.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           type = "GDA",
                          database = "CURATED",
                          score    = c( 0.6,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.6-1 
##  . Term:        UMLS_C0036341 
##  . Results:  107

A selection of evidences is shown in Table 5.8.

tab <- results@qresult
tab <-tab[tab$reference_type == "PMID" & tab$pmYear > 2013 & tab$source =="PSYGENET", ] 
tab <- tab[ order(-tab$pmYear), c("gene_symbol","source", "associationType", "sentence", "reference", "pmYear")][1:5,]
tab <- tab %>% dplyr::rename(Gene = gene_symbol,  Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association for Schizophrenia" )

Table 5.8: Evidences supporting the association for Schizophrenia
Gene	source	associationType	Sentence	pmid	Year
ERBB4	PSYGENET	Biomarker	These findings suggest that some regions of NRG1 and ErbB4 are functionally involved in biological processes that underlie some of the phenotypic manifestations of schizophrenia.	25142529	2014
ERBB4	PSYGENET	Biomarker	ERBB4 has previously been associated with schizophrenia; further, it is located within an established schizophrenia linkage locus and within a linkage locus for a smoker phenotype identified in this sample.	23752247	2014
CHRNA7	PSYGENET	Biomarker	The α7 nicotinic acetylcholine receptor gene (CHRNA7) is linked to schizophrenia.	25056953	2014
ERBB4	PSYGENET	Biomarker	Moreover, we demonstrate that Gomafu binds directly to the splicing factors QKI and SRSF1 (serine/arginine-rich splicing factor 1) and dysregulation of Gomafu leads to alternative splicing patterns that resemble those observed in SZ for the archetypal SZ-associated genes DISC1 and ERBB4.	23628989	2014
GRM5	PSYGENET	Biomarker	We posit that brain region- and cell type-specific alterations exist in mGluR5 in schizophrenia and depression, with evidence pointing towards altered regulation of this receptor in psychiatric pathology.	24472577	2014
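When filtering evidence tables with base R, two details are easy to miss: bracket indexing keeps an all-NA row whenever the condition evaluates to NA, and selecting rows 1:5 pads the result with NA rows if fewer than five rows remain. The sketch below illustrates both on a toy table. The first three rows reuse gene, source and year values from Table 5.8; GENE_X and GENE_Y are hypothetical placeholder rows added only to exercise the filter.

```r
# Toy evidence table (GENE_X and GENE_Y are made-up placeholder rows).
evidence <- data.frame(
  gene_symbol    = c("ERBB4", "CHRNA7", "GRM5", "GENE_X", "GENE_Y"),
  source         = c("PSYGENET", "PSYGENET", "PSYGENET", "CLINVAR", "PSYGENET"),
  reference_type = c("PMID", "PMID", "PMID", "PMID", "PMID"),
  pmYear         = c(2014, 2014, 2014, 2015, NA),
  stringsAsFactors = FALSE
)

# Bracket indexing: the row with a missing year yields an all-NA row.
bracket <- evidence[evidence$reference_type == "PMID" &
                      evidence$pmYear > 2013 &
                      evidence$source == "PSYGENET", ]

# subset() treats an NA condition as FALSE and drops that row.
kept <- subset(evidence,
               reference_type == "PMID" & pmYear > 2013 & source == "PSYGENET")
kept <- kept[order(-kept$pmYear), ]

nrow(bracket)        # three matches plus one all-NA row
nrow(kept[1:5, ])    # padded to five rows with NA
nrow(head(kept, 5))  # only the rows that exist
```

Using subset() (or dplyr::filter) together with head() keeps the displayed table free of empty rows when a query returns fewer records than expected.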

Additionally, you can explore the evidences for a specific gene-disease pair by specifying the gene identifier using the argument gene.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           gene = c("DRD2", "DRD3"),
                           type = "GDA",
                          database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        UMLS_C0036341 
##  . Results:  497

The most recent papers are shown in Table 5.9.

tab <- results@qresult
tab <-  tab %>%
    filter(reference_type == "PMID") %>%
    select(gene_symbol, associationType, reference, sentence, pmYear) %>% arrange(desc(pmYear)) %>% head(10)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between C0036341 & DRD2,DRD3" )

Table 5.9: Evidences supporting the association between C0036341 & DRD2,DRD3
Gene	associationType	pmid	Sentence	Year
DRD3	Therapeutic Target	41588897	Novel antipsychotic drugs with partial agonism at D2 and D3 receptors improve positive and negative schizophrenia symptoms, as well as cognitive symptoms, more effectively than second- generation antipsychotic drugs.	2026
DRD2	Epigenomic Alterations	41720812	In PSD95, mean methylation levels were higher in the CSF than in the blood of patients with SZ, whereas no difference was detected in the blood between SZ and Co. For MAPT and DRD2, no significant differences in mean methylation rates were observed between groups.	2026
DRD2	Pharmacogenomics	41862122	In our analysis of the SZ population, mutations in the DRD2 gene were most frequently associated with clozapine and risperidone treatment response, whereas HTR2A mutations were more commonly linked to olanzapine response.	2026
DRD2	Susceptibility Mutation	41505003	The role of possible DRD2 genotype-related striatal changes in prefrontal cortex dysfunction in schizophrenia was suggested.	2026
DRD3	Susceptibility Mutation	39993143	For DRD3 polymorphisms, the rs7631540 TC genotype was associated with schizophrenia in the female subgroup.	2025
DRD2	Protective Mutation	40881611	Additionally, we propose that the DRD2 Taq1 A2 allele could offer protection against SUD in certain individuals with schizophrenia, whereas the Taq1 A1 allele may heighten susceptibility to SUD due to impaired dopaminergic reward processing.	2025
DRD2	Epigenomic Alterations	40665271	These results suggest that hypermethylation and low expression of the DRD2 gene may be related to SCZ risk.	2025
DRD2	Therapeutic Target	40056428	Most antipsychotics approved for schizophrenia interact with D2 DA receptors as an important part of their mechanism of action.	2025
DRD2	Protective Mutation	39993143	In addition, the DRD2 rs1800497 genotype GA showed a reduced risk of schizophrenia in the male subgroup and the late-onset subgroup (>27 years of age).	2025
DRD3	Pharmacogenomics	39187246	DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia.	2024

5.2.2 Multiple diseases

The disease2gene function also accepts as input a list of diseases, the database (by default, CURATED), and optionally, a value range for the score. Each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO, and ID is the identifier in that vocabulary. In the example, we have selected a list of four diseases. Table 5.10 shows the UMLS CUIs and the corresponding disease names.

Table 5.10: Disease list selected for illustrating the disease2gene multiple search
UMLS_CUI	Disease_Name
C0036341	Schizophrenia
C0002395	Alzheimer’s Disease
C0030567	Parkinson Disease
C0005586	Bipolar Disorder

Creating the vector with the list of diseases.

diseasesOfInterest <- paste0("UMLS_",c("C0036341", "C0002395", "C0030567","C0005586"))

In the example, we will search in CURATED data, using a score range of 0.6-1.

results <- disease2gene(
  disease = diseasesOfInterest,
  database = "CURATED",
  score =c(0.6,1),
  verbose  = TRUE )

## Your query has 1 page.

Table 5.11 shows the top 10 genes associated to the list of diseases.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score",
                           "normalized_score", "yearInitial", "yearFinal")] %>%
    mutate(normalized_score = round(normalized_score, 2)) %>%
    arrange(desc(score), yearInitial)) 
knitr::kable(tab[1:10,], caption = "Top Genes associated to a list of diseases")

Table 5.11: Top Genes associated to a list of diseases
gene_symbol	disease_name	score	normalized_score	yearInitial	yearFinal
SNCA	Parkinson Disease	1.55	1.00	1989	2021
GBA1	Parkinson Disease	1.45	0.94	1987	2021
APP	Alzheimer’s Disease	1.35	0.87	1990	2023
DRD3	Schizophrenia	1.35	0.87	1992	2003
APOE	Alzheimer’s Disease	1.35	0.87	1993	2020
LRRK2	Parkinson Disease	1.35	0.87	1993	2025
PRKN	Parkinson Disease	1.35	0.87	1993	2022
MAPT	Alzheimer’s Disease	1.25	0.81	1993	2020
GRN	Alzheimer’s Disease	1.20	0.77	1993	2020
PARK7	Parkinson Disease	1.20	0.77	2003	2019

5.2.2.1 Visualizing the genes associated to multiple diseases

The default plot of the results of querying DISGENET with a list of diseases produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5.15).

plot( results,
      type = "Network",
      prop = 10, interactive = TRUE)

Figure 5.15: The Gene-Disease Network associated to a list of diseases

To visualize the results as a Gene-Disease Heatmap (Figure 5.16), set the argument type to “Heatmap”. In this plot, the color scale is proportional to the score of the GDA. When the results are large, the argument limit can be used to restrict the number of rows to the top scoring GDAs. By default, the plot shows the 50 highest scoring GDAs.

plot( results,
      type="Heatmap",
      limit =65,
      cutoff=0.95, interactive=TRUE)

## [1] "Dataframe of 97 rows has been reduced to 65 rows."
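The effect of the limit argument can be pictured as a top-N selection on the score column. The sketch below is an assumption about the behaviour, written in base R on randomly generated placeholder data; the helper top_n_by_score is hypothetical and is not the package's internal implementation. Only the default of 50 rows and the wording of the message are taken from this section.

```r
# Hypothetical helper (not disgenet2r code): keep the `limit` best-scoring rows.
top_n_by_score <- function(df, limit = 50) {
  ranked <- df[order(-df$score), , drop = FALSE]
  if (nrow(ranked) > limit) {
    message("Dataframe of ", nrow(df), " rows has been reduced to ",
            limit, " rows.")
  }
  head(ranked, limit)
}

# Placeholder data: 97 made-up GDAs with random scores.
set.seed(1)
gdas <- data.frame(
  gene_symbol = paste0("GENE", 1:97),
  score       = round(runif(97, min = 0.6, max = 1.55), 2),
  stringsAsFactors = FALSE
)

top <- top_n_by_score(gdas, limit = 65)
nrow(top)
```

If the input has fewer rows than the limit, head() simply returns all of them and no message is printed.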

Figure 5.16: The Gene-Disease Heatmap for genes associated to a list of diseases

A third visualization option is a Protein Class-Disease Heatmap (Figure 5.17), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass” and the type argument to “Heatmap”. In this case, the color of each cell is proportional to the percentage of the genes of each disease that fall in each protein class.

plot( results,
      class="ProteinClass", type = "Heatmap", interactive=TRUE)

Figure 5.17: The Protein Class-Disease Heatmap for genes associated to a list of diseases

A Protein Class-Disease Network visualization is also possible (Figure 5.18).

plot( results,
      class="ProteinClass", type = "Network", interactive=TRUE)

Figure 5.18: The Protein Class-Disease Network for genes associated to a list of diseases

To explore the evidences supporting the associations, use the function disease2evidence.

results <- disease2evidence( disease  = diseasesOfInterest,
                           type = "GDA",
                           score=c(0.5,1),
                          database = "CURATED" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:       UMLS_C0036341 ... UMLS_C0005586 
##  . Results:  1584

To visualize the results, set the argument type = "Points" (Figure 5.19).

plot( results,  
      type = "Points", limit=10000 )

Figure 5.19: The Evidences plot for a list of diseases

5.2.3 Filtering by chemical

You can filter the results to find associations that are mentioned in the context of a chemical, like the example below.

results <- disease2gene( disease  = "UMLS_C0678222", chemical = "CHEMBL_CHEMBL83",
                          database = "ALL" , n_pags = 1 )

## Notice that your query has a maximum of 8 pages.
## By indicating n_pags = 1, your query of 8 pages has been reduced to 1 pages.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        UMLS_C0678222 
##  . Results:  100

tab <- unique(results@qresult[, c("gene_symbol", "disease_name", "score", "normalized_score", "chemical_name", "chemicalid")]) %>%
    mutate(normalized_score = round(normalized_score, 2)) %>%
    dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Breast Carcinoma")

Table 5.12: Top GDAs associated to Breast Carcinoma
gene_symbol	disease_name	score	normalized_score	chemical_name	chemicalid
BRCA2	Breast Carcinoma	1.0	0.65	Tamoxifen	CHEMBL83
ESR1	Breast Carcinoma	1.0	0.65	Tamoxifen	CHEMBL83
TP53	Breast Carcinoma	1.0	0.65	Tamoxifen	CHEMBL83
CHEK2	Breast Carcinoma	1.0	0.65	Tamoxifen	CHEMBL83
ATM	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83
BRCA1	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83
CAV1	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83
CDH1	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83
EGFR	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83
PIK3CA	Breast Carcinoma	0.9	0.58	Tamoxifen	CHEMBL83

5.2.3.1 Retrieving the chemicals associated to a disease

For GDAs that have a chemical annotation, we can query with a disease, or a list of diseases, to retrieve the chemicals annotated to these associations.

results <- disease2chemical( disease = "UMLS_C0010674", 
                           database = "TEXTMINING_MODELS" ,
                           score = c(0.8,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_MODELS 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0010674 
##  . Results:  38

tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(gene_symbol, chemical_name,chemical_effect ,sentence, reference, pmYear) 
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                          `Chemical Effect`=chemical_effect ,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top chemicals associated to Cystic Fibrosis" )

Table 5.13: Top chemicals associated to Cystic Fibrosis
Gene	Chemical	Chemical Effect	Sentence	pmid	Year
CFTR	BELNACASAN	other	Breeding this reporter line with CFTRG551D CF ferret resulted in a novel CF model, CFTRint1-eGFP(lsl)/G551D, with disease onset manageable via the administration of CFTR modulator VX770.	39791230	2025
CFTR	Tezacaftor	therapeutic\|therapeutic\|therapeutic	Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease.	39882833	2025
CFTR	Elexacaftor	therapeutic\|therapeutic\|therapeutic	Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease.	39882833	2025
CFTR	Ivacaftor	therapeutic\|therapeutic\|therapeutic	Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease.	39882833	2025
CFTR	Linaclotide	other	These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion.	38869953	2024
CFTR	Tezacaftor	therapeutic	The CFTR modulator Trikafta has markedly improved lung disease for Cystic Fibrosis (CF) patients carrying the common delta F508 (F508del-CFTR) CFTR mutation.	38925289	2024
CFTR	2,6-DIAMINOPURINE	other	The ability of DAP to correct various endogenous UGA nonsense mutations in the CFTR gene and to restore its function in mice, in organoids derived from murine or patient cells, and in cells from patients with cystic fibrosis reveals the potential of such readthrough-stimulating molecules in developing a therapeutic approach.	36641622	2023
CFTR	BICARBONATE	other	CFTR, the cystic fibrosis (CF) gene-encoded epithelial anion channel, has a prominent role in driving chloride, bicarbonate and fluid secretion in the ductal cells of the exocrine pancreas.	35011616	2021
CFTR	Ivacaftor	therapeutic	Ivacaftor is a CFTR potentiator that improves Cl- transport in CF patients with at least 1 copy of the G551D mutation.	30152192	2019
CFTR	Lumacaftor	therapeutic	Activity of lumacaftor is not conserved in zebrafish Cftr bearing the major cystic fibrosis-causing mutation.	32123813	2019
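Some values in the Chemical Effect column repeat the same label separated by a pipe, for example therapeutic|therapeutic|therapeutic (the pipe appears escaped in the tables). A minimal base R sketch for collapsing such repeats is shown below; the helper name collapse_effects is hypothetical and not part of disgenet2r.

```r
# Hypothetical helper: remove repeated labels from pipe-separated strings.
collapse_effects <- function(x) {
  parts <- strsplit(x, "|", fixed = TRUE)  # fixed = TRUE: "|" is a literal
  vapply(parts,
         function(p) paste(unique(p), collapse = "|"),
         character(1))
}

effects <- c("therapeutic|therapeutic|therapeutic",
             "other",
             "other|therapeutic|other")
collapsed <- collapse_effects(effects)
print(collapsed)
```

Distinct labels are preserved in their original order, so a mixed value such as other|therapeutic|other becomes other|therapeutic.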

To visualize the results use the plot function.

plot(results, type = "Network", interactive = TRUE, limit = 50)

Figure 5.20: The Disease-Chemical Network associated to Cystic Fibrosis

5.2.3.2 Searching by disease and chemical

The disease2gene function can also be used to retrieve genes mentioned in the context of a specific disease and chemical (Table 5.14).

results <- disease2gene( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN",
                          chemical = "CHEMBL_CHEMBL1009")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        UMLS_C0030567 
##  . Results:  72

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score, normalized_score) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score)) 
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Parkinson and levodopa")

Table 5.14: Top GDAs associated to Parkinson and levodopa
gene_symbol	disease_name	chemical_name	score	normalized_score
SNCA	Parkinson Disease	Levodopa	1.55	1.00
GBA1	Parkinson Disease	Levodopa	1.45	0.94
PRKN	Parkinson Disease	Levodopa	1.35	0.87
LRRK2	Parkinson Disease	Levodopa	1.35	0.87
PINK1	Parkinson Disease	Levodopa	1.20	0.77
MAOB	Parkinson Disease	Levodopa	1.10	0.71
DRD2	Parkinson Disease	Levodopa	1.05	0.68
TH	Parkinson Disease	Levodopa	1.05	0.68
BDNF	Parkinson Disease	Levodopa	1.00	0.65
DDC	Parkinson Disease	Levodopa	1.00	0.65

To visualize the results, use the plot function (Figure 5.21).

plot( results, interactive = TRUE )

Figure 5.21: The Gene Disease Chemical Network for a disease and a drug

5.2.3.2.1 Retrieving the chemicals associated to a disease

To retrieve the chemicals mentioned in the GDAs involving a specific disease, we can use the disease2chemical function.

results <- disease2chemical( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN" , score = c(0.5,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0030567 
##  . Results:  270

tab <- results@qresult
tab <-tab%>% dplyr::filter(reference_type == "PMID")  %>% dplyr::select(gene_symbol, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                    `Chemical Effect` = chemical_effect,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top Chemicals associated to Parkinson" )

Table 5.15: Top Chemicals associated to Parkinson
Gene	Chemical	Chemical Effect	Sentence	pmid	Year
SNCA	L-Acetylleucine	other	These findings highlight the therapeutic potential of NALL for PD by its protective effects on α-synuclein pathology and synaptic function in vulnerable dopaminergic neurons.	41766663	2026
SNCA	Riboflavin	other	Possibly through mitochondrial modulation, riboflavin appeared to reduce α-synuclein aggregation in Parkinson’s disease, increase the number of tyrosine-hydroxylase-positive neurons in Alzheimer’s disease models, enhance neuronal survival in Brown-Vialetto-Van Laere and Huntington’s disease models, and normalize neuronal excitability in ataxia and migraine.	41720188	2026
SNCA	Copper	other	Mounting evidence implicates that copper ions are playing a critical role in PD pathogenesis, particularly in regulating dopaminergic neuron survival and the aggregation dynamics of α-Syn.	41772865	2026
SNCA	CALCIUM	other	Various molecular mechanisms are involved in the pathogenesis of PD, including α-syn aggregation, lysosomal and chaperone-mediated autophagy, mitochondrial dysfunction, and abnormal regulation of calcium homeostasis.	41968682	2026
SNCA	Betulinic Acid	other	Betulinic acid exacerbates biomolecular condensation of α-synuclein: possible role in Parkinson’s disease.	41801138	2026
SNCA	Dopamine	therapeutic	Parkinson’s disease (PD) is characterized by α-synuclein accumulation and dopaminergic neuron degeneration, with dopamine (DA) oxidation emerging as a key pathological driver.	41671379	2026
SNCA	Copper	other	Copper, an essential trace element, plays a role in α-synuclein aggregation and PD pathogenesis.	42076898	2026
SNCA	Dopamine	therapeutic	Parkinson’s disease (PD) is characterized by alpha-synuclein (α-syn) aggregation, dopaminergic (DA) neuron loss, and neuroinflammation.	41629683	2026
SNCA	Acteoside	other\|other	Collectively, these findings demonstrate that the phenylethanoid glycosides VER and ECH can directly interfere with α-syn amyloidogenesis, providing experimental support for the development of α-syn-targeted therapeutic strategies for Parkinson’s disease.	41830769	2026
SNCA	ECHINACOSIDE	other\|other	Collectively, these findings demonstrate that the phenylethanoid glycosides VER and ECH can directly interfere with α-syn amyloidogenesis, providing experimental support for the development of α-syn-targeted therapeutic strategies for Parkinson’s disease.	41830769	2026

To visualize the results, use the plot function (Figure 5.22).

plot( results )

Figure 5.22: The Network plot for chemicals associated to Parkinson Disease

5.3 Exploring a GDA timeline

To display the evolution of publications first create a timeline object containing all evidences for a GDA using the timeline function.

results <- timeline( disease  = "UMLS_C0002395", 
                     gene = "APOE",
                          database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      GDA 
##  . Type:        timeline 
##  . Database:     ALL 
##  . Score:        - 
##  . Term:        UMLS_C0002395 - APOE 
##  . Results:  35

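To visualize the results, use the plot function with the argument type = "Points" (Figure 5.23). Setting reference_type = "NCTID" displays clinical trials instead of publications (Figure 5.24).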

plot( results, type =  "Points" )

Figure 5.23: The timeline plot of PMIDs for APOE and Alzheimer’s Disease

plot( results, type = "Points", reference_type = "NCTID", interactive = TRUE )

Figure 5.24: The timeline plot of CTs for APOE and Alzheimer’s Disease
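
A timeline is essentially a per-year count of the evidences. The sketch below shows that idea in base R. It assumes the evidence table has reference_type and pmYear columns, as the qresult tables used elsewhere in this vignette do; the data frame itself is made up for illustration and is not DISGENET output.

```r
# Hypothetical evidence table (not DISGENET output); the column names
# mirror those used in the qresult tables elsewhere in this vignette.
evidence <- data.frame(
  reference      = c("111", "222", "333", "NCT0001", "444"),
  reference_type = c("PMID", "PMID", "PMID", "NCTID", "PMID"),
  pmYear         = c(2019, 2019, 2021, 2021, 2022),
  stringsAsFactors = FALSE
)

# Keep publications only, then count the evidences per year.
pubs     <- evidence[evidence$reference_type == "PMID", ]
per_year <- table(pubs$pmYear)
print(per_year)
```

Filtering on reference_type == "NCTID" instead would give the corresponding counts for clinical trials.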

5.4 Compute the Cumulative Score for a GDA

The historical_score function retrieves the yearly and cumulative scores of a GDA over a range of years, set with the minYear and maxYear arguments.

results <- historical_score( disease  = "UMLS_C0030567",
                             gene     = "SNCA",
                             database = "ALL",
                             minYear  = 2000,
                             maxYear  = 2024 )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      GDA 
##  . Type:        historical-score 
##  . Database:     ALL 
##  . Score:        - 
##  . Term:        UMLS_C0030567 - SNCA 
##  . Results:  26

tab <- unique(results@qresult[  ,c("pmYear", "cumulative_score", "yearly_score")] ) 
knitr::kable(tab, caption = paste0("Cumulative score for ", results@term))

Table 5.16: Cumulative score for UMLS_C0030567 - SNCA
pmYear	cumulative_score	yearly_score
1999	0.05	0.05
2000	0.85	0.85
2001	1.00	0.85
2002	1.00	0.85
2003	1.00	0.85
2004	1.00	0.95
2005	1.00	0.80
2006	1.00	0.90
2007	1.00	0.90
2008	1.10	1.10
2009	1.15	1.10
2010	1.15	1.10
2011	1.15	1.05
2012	1.15	1.00
2013	1.15	1.00
2014	1.25	1.10
2015	1.25	1.10
2016	1.25	1.05
2017	1.25	0.60
2018	1.25	0.95
2019	1.25	1.00
2020	1.25	0.60
2021	1.35	1.15
2022	1.35	0.60
2023	1.45	0.70
2024	1.45	0.65
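
To see at a glance when the cumulative score changed, the rows where it increases can be picked out with diff. This is a minimal base R sketch on the cumulative_score values copied from Table 5.16; it does not call the DISGENET API and makes no claim about how the score itself is computed.

```r
# pmYear and cumulative_score copied from Table 5.16.
tab <- data.frame(
  pmYear           = 1999:2024,
  cumulative_score = c(0.05, 0.85, rep(1.00, 7), 1.10, rep(1.15, 5),
                       rep(1.25, 7), rep(1.35, 2), rep(1.45, 2))
)

# A row is kept when its score is higher than in the previous year;
# the first year is always kept.
changed    <- c(TRUE, diff(tab$cumulative_score) > 0)
step_years <- tab$pmYear[changed]
print(tab[changed, ])
```

For this GDA the cumulative score changes in 1999, 2000, 2001, 2008, 2009, 2014, 2021 and 2023.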

To visualize the results, use the plot function with the argument type = "Points".

plot( results, type =  "Points" )

Figure 5.25: The historical score plot of PMIDs for SNCA and Parkinson’s Disease

6 Variant-Disease Associations (VDAs)

6.1 Searching by variant

6.1.1 Single variant

The variant2disease function takes as input a variant, or a list of variants, identified by dbSNP identifier. It produces a DataGeNET.DGN object with Type = "variant-disease".

results <- variant2disease( variant= "rs113488022",
                            database = "CURATED",
                            score = c(0.2,1)) 
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0.2-1 
##  . Term:        rs113488022 
##  . Results:  13

The results are shown in Table 6.1.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "normalized_score", "yearInitial", "yearFinal")] ) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Top diseases associated to variant rs113488022")

Table 6.1: Top diseases associated to variant rs113488022
variantid	disease_name	score	normalized_score	yearInitial	yearFinal
rs113488022	Colorectal Carcinoma	0.8	0.67	1993	2024
rs113488022	Non-Small Cell Lung Carcinoma	0.8	0.67	2002	2019
rs113488022	melanoma	0.8	0.67	2002	2018
rs113488022	Papillary thyroid carcinoma	0.8	0.67	2002	2018
rs113488022	Colon Carcinoma	0.7	0.58	2002	2020
rs113488022	Multiple Myeloma	0.7	0.58
rs113488022	RASopathy	0.6	0.50	2011	2018
rs113488022	Nephroblastoma	0.6	0.50
rs113488022	Nongerminomatous Germ Cell Tumor	0.4	0.33	2002	2018
rs113488022	ASTROCYTOMA, LOW-GRADE, SOMATIC	0.4	0.33	2002	2018

6.1.1.1 Visualizing the diseases associated to a single variant

The disgenet2r package offers several options to visualize the results of querying DISGENET for a single variant: a Variant-Disease Network (Figure 6.1) showing the diseases associated to the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a network showing the MeSH Disease Classes of the diseases associated to the variant (Variant-Disease Class Network, Figure 6.2). These graphics are selected through the arguments of the plot function, such as type, class, and showGenes.

By default, the plot function produces a Variant-Disease Network on a DataGeNET.DGN object (Figure 6.1). In the Variant-Disease Network, the blue nodes are diseases, the yellow nodes are variants, and the width of the edges is proportional to the score of the association.

plot( results, 
      type = "Network", interactive=T,
      prop  = 10)

Figure 6.1: The Variant-Disease Network for the variant rs113488022

plot(results, class="DiseaseClass" , interactive=T)

Figure 6.2: The Variant-Disease Class Network for the variant rs113488022

6.1.1.2 Exploring the evidences associated to a variant

You can extract the evidences associated to a particular variant using the function variant2evidence. Additionally, you can explore the evidences for a specific variant-disease pair by specifying the argument disease.

results <- variant2evidence( variant  = "rs10795668",
                             disease  = "UMLS_C0009402",
                             database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs10795668 
##  . Results:  15

The results are shown in Table 6.2.

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>% select(associationType, reference, pmYear, sentence) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid=reference) %>% dplyr::arrange(desc(Year))
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting the association between C0009402 & rs10795668")

Table 6.2: Evidences supporting the association between C0009402 & rs10795668
associationType	pmid	Year	Sentence
GeneticVariation	36653562	2023	FinnGen provides genetic insights from a well-phenotyped isolated population.
Susceptibility Mutation	34676053	2021	Increasing risk of CRC was noted for rs10795668 in log-additive model (OR = 1.45, 95% CI: 1.05-1.99, p = 0.023); for rs1035209 in log-additive model (OR = 1.79, 95% CI: 1.18-2.72, p = 0.003); for rs11190164 in log-additive model (OR = 1.67, 95% CI: 1.17-2.38, p = 0.004).
Susceptibility Mutation	30194776	2019	In conclusion, some variants associated with CRC risk (rs10505477, rs6983267, rs10795668 and rs11255841) are also involved in the susceptibility to CRA and specific subtypes.
Susceptibility Mutation	23717594	2013	Results from our case-control study and the followed meta-analysis confirmed the significant association of rs10795668 with CRC risk.
Susceptibility Mutation	23712746	2013	In conclusion, CRC susceptibility variants rs9929218 and rs10795668 may exert some influence in modulating patient’s survival and they deserve to be further tested in additional CRC cohorts in order to confirm their potential as prognosis or predictive biomarkers.
Susceptibility Mutation	22045029	2012	Recent genome-wide association studies have identified single-nucleotide polymorphisms at 16 genetic loci associated with colorectal cancer risk: rs6691170 (1q41), rs10936599 (3q26.2), rs16892766 (8q23.3), rs6983267 (8q24.21), rs10795668 (10p14), rs3802842 (11q23.1), rs11169552 (12q13.13), rs4444235, rs1957636 (14q22.2), rs4779584 (15q13.3), rs9929218 (16q22.1), rs4939827 (18q21.1), rs10411210 (19q13.11), rs961253 and rs4813802 (20p12.3) and rs4925386 (20q13.33).
Susceptibility Mutation	22235025	2012	In conclusion, variants at 10p14 (rs10795668), 11q23.1 (rs3802842) and 15q13.3 (rs4779584) may have a predominant role in predisposition to early-onset CRC.
Susceptibility Mutation	23359760	2012	However, no associations with CRC risk were detected for six other loci (rs9929218, rs10411210, rs12701937, rs7014346, rs6983267, and rs10795668), and one SNP, rs16892766, was not polymorphic in any of the study participants.
Susceptibility Mutation	21351697	2010	Five SNPs (rs6983267, rs4939827, rs3802842, rs4444235, rs10795668) showed an association with colon and rectal cancer.
Susceptibility Mutation	18372905	2008	In addition to the previously reported 8q24, 15q13 and 18q21 CRC risk loci, we identified two previously unreported associations: rs10795668, located at 10p14 (P = 2.5 x 10(-13) overall; P = 6.9 x 10(-12) replication), and rs16892766, at 8q23.3 (P = 3.3 x 10(-18) overall; P = 9.6 x 10(-17) replication), which tags a plausible causative gene, EIF3H.
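
Note that the pipeline above applies head(10) before arrange, so it sorts the first ten PMID rows as returned by the query rather than selecting the ten most recent ones. Whether this matters depends on the order in which rows are returned, which is not specified here. The base R sketch below illustrates the difference on made-up data (the reference values are placeholders).

```r
# Made-up evidence rows in arbitrary order (placeholders, not DISGENET output).
ev <- data.frame(
  reference = c("a", "b", "c", "d", "e"),
  pmYear    = c(2008, 2023, 2012, 2021, 2010),
  stringsAsFactors = FALSE
)

# Truncate first, then sort: only the first three rows are ever considered.
head_then_sort <- head(ev, 3)
head_then_sort <- head_then_sort[order(-head_then_sort$pmYear), ]

# Sort first, then truncate: the three most recent rows overall.
sort_then_head <- head(ev[order(-ev$pmYear), ], 3)

print(head_then_sort$pmYear)
print(sort_then_head$pmYear)
```

With dplyr, the second variant corresponds to calling arrange(desc(pmYear)) before head(10).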

The results can be visualized using the plot function with the argument type = "Points". This shows the number of publications per year associated to this variant. Set the limit parameter to 10000 so that all the results are included in the plot.

results <- variant2evidence( variant = "rs1800629",
                       database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs1800629 
##  . Results:  1773

plot( results,  
      type = "Points", limit=10000 )

Figure 6.3: The Evidence plot for the variant rs1800629

6.1.2 Multiple variants

The variant2disease function retrieves the information in DISGENET for a list of variants identified by their dbSNP identifiers. The source database is set with the database argument; by default, variant2disease uses CURATED.

results <- variant2disease(
         variant  = c("rs121913013", "rs1060500621",
              "rs199472709", "rs72552293",
              "rs74315445", "rs199472795"),
         database = "ALL")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        variant-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:       rs121913013 ... rs199472795 
##  . Results:  21

Table 6.3 shows the top 10 diseases associated to the list of variants.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score","normalized_score", "yearInitial", "yearFinal")] )%>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to the list of variants")

Table 6.3: Top diseases associated to the list of variants
variantid	disease_name	score	normalized_score	yearInitial	yearFinal
rs199472709	Romano-Ward Syndrome	0.6	0.50	1993	2022
rs74315445	LONG QT SYNDROME 5	0.6	0.50	1993	2022
rs199472795	Romano-Ward Syndrome	0.6	0.50	1993	2022
rs74315445	Jervell And Lange-Nielsen Syndrome 2	0.6	0.50	1993	2011
rs72552293	Brugada Syndrome 2	0.6	0.50	1993	2007
rs74315445	Jervell-Lange Nielsen Syndrome	0.5	0.42	1993	2015
rs74315445	Long QT Syndrome	0.5	0.42	1997	2014
rs199472795	Long QT Syndrome	0.4	0.33	2000	2021
rs199472709	Beckwith-Wiedemann Syndrome	0.4	0.33	1993	2020
rs121913013	Cardiomyopathy, Dilated, 1BB	0.4	0.33	2007	2020

6.1.2.1 Visualizing the diseases associated to multiple variants

The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Network (Figure 6.4), as a Variant-Gene-Disease Network (Figure 6.5), as a Variant-Disease Heatmap (Figure 6.6), as a Variant-Disease Class Network (Figure 6.7), and as a Variant-Disease Class Heatmap (Figure 6.8).

plot( results,
      type = "Network", interactive=T)

Figure 6.4: The Variant-Disease Network for a list of variants

To obtain the Variant-Gene-Disease Network (Figure 6.5), set the showGenes argument to TRUE.

plot( results,
      type = "Network", 
      showGenes= T,
      interactive=T)

Figure 6.5: The Variant-Gene-Disease Network for a list of variants

The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Heatmap by changing the type argument to Heatmap (Figure 6.6).

plot( results,
      type = "Heatmap",
      prop = 10, interactive = TRUE, nchar=45)

Figure 6.6: The Variant-Disease Heatmap for a list of variants
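
Conceptually, a heatmap of this kind displays a variant-by-disease matrix of scores. The sketch below builds such a matrix in base R from the rows of Table 6.3. It is only an illustration of the data layout, under the assumption that one score per variant-disease pair is shown; it is not necessarily how the plot function prepares its input.

```r
# Rows copied from Table 6.3 (top 10 associations only).
tab <- data.frame(
  variantid = c("rs199472709", "rs74315445", "rs199472795", "rs74315445",
                "rs72552293", "rs74315445", "rs74315445", "rs199472795",
                "rs199472709", "rs121913013"),
  disease_name = c("Romano-Ward Syndrome", "LONG QT SYNDROME 5",
                   "Romano-Ward Syndrome", "Jervell And Lange-Nielsen Syndrome 2",
                   "Brugada Syndrome 2", "Jervell-Lange Nielsen Syndrome",
                   "Long QT Syndrome", "Long QT Syndrome",
                   "Beckwith-Wiedemann Syndrome", "Cardiomyopathy, Dilated, 1BB"),
  score = c(0.6, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.4, 0.4, 0.4),
  stringsAsFactors = FALSE
)

# Variant-by-disease matrix of scores; NA marks pairs absent from the table.
m <- tapply(tab$score, list(tab$variantid, tab$disease_name), max)

# Number of diseases per variant in this excerpt.
n_diseases <- rowSums(!is.na(m))
print(dim(m))
print(n_diseases)
```

Using tapply rather than xtabs keeps missing pairs as NA instead of 0, so an absent association is not confused with a score of zero.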

The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Class Network by changing the class argument to DiseaseClass (Figure 6.7).

plot( results,
      class = "DiseaseClass", interactive=T)

Figure 6.7: The Variant-Disease Class Network for a list of variants

The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Class Heatmap by setting the type argument to Heatmap together with class = "DiseaseClass" (Figure 6.8).

plot( results,  type = "Heatmap",
      class = "DiseaseClass", interactive=T)

Figure 6.8: The Variant-Disease Class Heatmap for a list of variants

6.2 Searching by disease

6.2.1 Single disease

The disease2variant function retrieves the variants associated to a disease, or a list of diseases. It takes as input the disease, or list of diseases, of interest (each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO) and the database (by default, CURATED). A threshold value for the score can be set, as in the gene2disease function.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  172
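
Because disease identifiers must follow the IDENT_ID format described at the start of this section, it can help to check them before sending a query. The helper below is a hypothetical convenience function written for this example; it is not part of disgenet2r, and it only checks the prefix, not whether the identifier exists in DISGENET.

```r
# Vocabulary prefixes listed in the description of disease2variant.
vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                  "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper (not part of disgenet2r): TRUE when an identifier
# looks like IDENT_ID with a known vocabulary prefix and a non-empty ID.
is_disease_id <- function(x) {
  pattern <- paste0("^(", paste(vocabularies, collapse = "|"), ")_.+$")
  grepl(pattern, x)
}

ids <- c(paste0("UMLS_", c("C1832916", "C0002395")), "C1832916", "FOO_123")
print(is_disease_id(ids))
```

The bare CUI and the unknown prefix are rejected; prefixing with paste0("UMLS_", ...) produces the expected form.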

Table 6.4 shows the variants associated to Timothy syndrome according to the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score","normalized_score", "yearInitial", "yearFinal")] ) %>%  mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = " Variants associated to Timothy syndrome according to ClinVar")

Table 6.4: Variants associated to Timothy syndrome according to ClinVar
variantid	disease_name	score	normalized_score	yearInitial	yearFinal
rs79891110	Timothy syndrome	0.7	0.58	1993	2016
rs786205748	Timothy syndrome	0.6	0.50	1993	2020
rs549476254	Timothy syndrome	0.6	0.50	1993	2019
rs786205753	Timothy syndrome	0.6	0.50	1993	2019
rs80315385	Timothy syndrome	0.6	0.50	1993	2015
rs797044881	Timothy syndrome	0.5	0.42	1993	2021
rs786205745	Timothy syndrome	0.5	0.42	1993	2018
rs374528680	Timothy syndrome	0.5	0.42	1993	2015
rs199473391	Timothy syndrome	0.4	0.33	1993	2023
rs764212214	Timothy syndrome	0.4	0.33	1993	2022

The results can be further restricted to variants predicted to be deleterious by passing ranges of SIFT and PolyPhen scores to the function through the sift and polyphen arguments, as in the example below. Remember that genetic variants with SIFT scores smaller than 0.05 are predicted to be deleterious, while values of PolyPhen greater than 0.908 are classified as Probably Damaging.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR", sift = c(0,0.05), polyphen = c(0.9,1) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  94

Table 6.5 shows the deleterious variants associated to Timothy syndrome reported in the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "normalized_score", "polyphen_score", "sift_score", "yearInitial", "yearFinal")] ) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Deleterious variants associated to Timothy syndrome according to ClinVar")

Table 6.5: Deleterious variants associated to Timothy syndrome according to ClinVar
variantid	disease_name	score	normalized_score	polyphen_score	sift_score	yearInitial	yearFinal
rs79891110	Timothy syndrome	0.7	0.58	1.000	0.00	1993	2016
rs786205748	Timothy syndrome	0.6	0.50	1.000	0.00	1993	2020
rs549476254	Timothy syndrome	0.6	0.50	0.999	0.00	1993	2019
rs786205753	Timothy syndrome	0.6	0.50	0.999	0.00	1993	2019
rs80315385	Timothy syndrome	0.6	0.50	1.000	0.00	1993	2015
rs797044881	Timothy syndrome	0.5	0.42	1.000	0.00	1993	2021
rs786205745	Timothy syndrome	0.5	0.42	1.000	0.01	1993	2018
rs199473391	Timothy syndrome	0.4	0.33	1.000	0.00	1993	2023
rs755846732	Timothy syndrome	0.4	0.33	1.000	0.00	1993	2021
rs761966966	Timothy syndrome	0.4	0.33	1.000	0.00	1993	2019
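
The same thresholds can also be applied to a table that has already been retrieved. The sketch below is a client-side illustration in base R using the sift_score and polyphen_score columns; the first three rows mirror Table 6.5 and the last row is a made-up variant added to show a row being dropped.

```r
# First three rows mirror Table 6.5; the last row is hypothetical.
variants <- data.frame(
  variantid      = c("rs79891110", "rs549476254", "rs786205745", "rs_hypothetical"),
  sift_score     = c(0.00, 0.00, 0.01, 0.40),
  polyphen_score = c(1.000, 0.999, 1.000, 0.200),
  stringsAsFactors = FALSE
)

deleterious <- variants$sift_score < 0.05        # SIFT: predicted deleterious
damaging    <- variants$polyphen_score > 0.908   # PolyPhen: Probably Damaging

kept <- variants[deleterious & damaging, ]
print(kept)
```

Note that the query in this section passes polyphen = c(0.9, 1), a slightly wider range than the 0.908 cutoff, so a local filter like this one could remove a few additional rows.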

6.2.1.1 Visualizing the variants associated to a single disease

The results of querying DISGENET variant information with a single disease can be visualized as a Variant-Disease Network (Figure 6.9).

plot( results,     
      type = "Network", interactive=T)

Figure 6.9: The Variant-Disease Network for a single disease

The Variant-Disease Network can be displayed as a Variant-Gene-Disease Network by setting the showGenes parameter to TRUE (Figure 6.10).

plot( results, 
      type = "Network",
      showGenes = T)

Figure 6.10: The Variant-Gene-Disease Network for a single disease

6.2.1.2 Explore the evidences associated to a single disease

To explore the evidences supporting the VDAs for Timothy syndrome, run the disease2evidence function. You can use the argument variant to inspect the evidences for a particular variant and Timothy syndrome.

results <- disease2evidence( disease  = "UMLS_C1832916",
                             type     = "VDA",
                             database = "ALL",
                             score    = c( 0.5, 1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:        UMLS_C1832916 
##  . Results:  52

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>%
    select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting associations")

Table 6.6: Evidences supporting associations
pmid	associationType	Year	Sentence
40568156	Causal Mutation	2025	Most TS cases are caused by a de novo single amino acid substitution G406R in the CACNA1C gene that encodes the pore-forming subunit of the voltage-gated L-type calcium channel CaV1.2.
39420001	Causal Mutation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
38826393	Causal Mutation	2024	Timothy syndrome patients were first identified as having a cardiac presentation of Long QT and syndactyly of the fingers and/or toes, and an identical variant in CACNA1C , Gly406Arg.
38968219	Causal Mutation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
37271119	Causal Mutation	2023	Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS.
36523353	Susceptibility Mutation	2022	TS showed a high degree of genetic homogeneity, as the p.Gly406Arg mutation either in exon 8 or exon 8A alone was responsible for 70% of the cases.
36347939	Causal Mutation	2022	A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.
36162529	Causal Mutation	2022	Individuals with Timothy Syndrome (TS), a genetic disorder caused by CaV1.2 L-type Ca2+ channel (LTCC) gain-of function mutations, such as G406R, exhibit social deficits, repetitive behaviors, and cognitive impairments characteristic of ASD that are phenocopied in TS2-neo mice expressing G406R.
33797204	Causal Mutation	2021	In 2015, a variant in CACNA1C (p.R518C) was reported to cause cardiac-only Timothy syndrome, a genetic disorder with a mixed phenotype of congenital heart disease, hypertrophic cardiomyopathy (HCM), and LQTS that lacked extra-cardiac features.
32437834	Causal Mutation	2020	Timothy syndrome (TS) is a neurodevelopmental disorder caused by mutations in the pore-forming subunit α11.2 of the L-type voltage-gated Ca2+-channel Cav1.2, at positions G406R or G402S.
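
The tables in this section turn each PMID into a PubMed link with kableExtra::cell_spec. As a dependency-free illustration of the same idea, the sketch below builds markdown links in base R from three PMIDs taken from Table 6.6; the markdown link syntax is an assumption about the output format and is not what cell_spec itself emits.

```r
# PMIDs taken from Table 6.6.
pmids <- c("40568156", "39420001", "38826393")

# One markdown link per PMID, pointing at the PubMed record.
links <- sprintf("[%s](https://pubmed.ncbi.nlm.nih.gov/%s)", pmids, pmids)
print(links)
```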

If you want to inspect the evidences for Timothy syndrome and all the variants in a particular gene, use the argument gene.

results <- disease2evidence( disease    = "UMLS_C1832916",
                             gene       = "775",
                             vocabulary = "ENTREZ",
                             type       = "VDA",
                             database   = "TEXTMINING_HUMAN",
                             score      = c( 0.5, 1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.5-1 
##  . Term:        UMLS_C1832916 
##  . Results:  23

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID")%>%
    select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear))%>% head(10)

results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Selection of evidences supporting associations between C1832916 & CACNA1C")

Table 6.7: Selection of evidences supporting associations between C1832916 & CACNA1C
pmid	associationType	Year	Sentence
40568156	Causal Mutation	2025	Most TS cases are caused by a de novo single amino acid substitution G406R in the CACNA1C gene that encodes the pore-forming subunit of the voltage-gated L-type calcium channel CaV1.2.
39420001	Causal Mutation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
38826393	Causal Mutation	2024	Timothy syndrome patients were first identified as having a cardiac presentation of Long QT and syndactyly of the fingers and/or toes, and an identical variant in CACNA1C , Gly406Arg.
38968219	Causal Mutation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
37271119	Causal Mutation	2023	Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS.
36523353	Susceptibility Mutation	2022	TS showed a high degree of genetic homogeneity, as the p.Gly406Arg mutation either in exon 8 or exon 8A alone was responsible for 70% of the cases.
36347939	Causal Mutation	2022	A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS.
36162529	Causal Mutation	2022	Individuals with Timothy Syndrome (TS), a genetic disorder caused by CaV1.2 L-type Ca2+ channel (LTCC) gain-of function mutations, such as G406R, exhibit social deficits, repetitive behaviors, and cognitive impairments characteristic of ASD that are phenocopied in TS2-neo mice expressing G406R.
33797204	Causal Mutation	2021	In 2015, a variant in CACNA1C (p.R518C) was reported to cause cardiac-only Timothy syndrome, a genetic disorder with a mixed phenotype of congenital heart disease, hypertrophic cardiomyopathy (HCM), and LQTS that lacked extra-cardiac features.
32437834	Causal Mutation	2020	Timothy syndrome (TS) is a neurodevelopmental disorder caused by mutations in the pore-forming subunit α11.2 of the L-type voltage-gated Ca2+-channel Cav1.2, at positions G406R or G402S.

6.2.2 Multiple diseases

The disease2variant function also accepts a list of diseases as input.

results <- disease2variant(
              disease = paste0("UMLS_",c("C3150943",  "C1859062", "C1832916", "C4015695")),
              database = "CURATED", 
              score = c(0.5, 1) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-variant 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:       UMLS_C3150943 ... UMLS_C4015695 
##  . Results:  160

Table 6.8 shows the variants associated to a list of Long QT syndromes in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score","normalized_score", "yearInitial", "yearFinal")] ) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
tab[is.na(tab)] <- ""
knitr::kable(tab[1:10,], caption = "Variants associated to a list of Long QT syndromes")

Table 6.8: Variants associated to a list of Long QT syndromes
variantid	disease_name	score	normalized_score	yearInitial	yearFinal
rs137854601	LONG QT SYNDROME 3	0.7	0.58	1993	2022
rs121912507	Long Qt Syndrome 2	0.7	0.58	1993	2022
rs137854600	LONG QT SYNDROME 3	0.7	0.58	1993	2022
rs79891110	Timothy syndrome	0.7	0.58	1993	2016
rs199472916	Long Qt Syndrome 2	0.7	0.58
rs76420733	Long Qt Syndrome 2	0.6	0.50	1990	2022
rs199473099	LONG QT SYNDROME 3	0.6	0.50	1991	2015
rs199473435	Long Qt Syndrome 2	0.6	0.50	1993	2023
rs121912508	Long Qt Syndrome 2	0.6	0.50	1993	2023
rs199472899	Long Qt Syndrome 2	0.6	0.50	1993	2023
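
The line tab[is.na(tab)] <- "" in the code above blanks out missing years for display, but it has a side effect worth knowing: every column that contained an NA is converted to character. A minimal base R sketch, using two rows modeled on Table 6.8:

```r
# Two rows modeled on Table 6.8; the second has no year information.
tab <- data.frame(
  variantid   = c("rs79891110", "rs199472916"),
  score       = c(0.7, 0.7),
  yearInitial = c(1993, NA),
  yearFinal   = c(2016, NA),
  stringsAsFactors = FALSE
)

# Replace missing values with empty strings for display.
tab[is.na(tab)] <- ""

# Columns that held an NA are now character; score is still numeric.
print(sapply(tab, class))
```

This is why the replacement is best done after sorting, as in the code above. If the year columns need to stay numeric, setting options(knitr.kable.NA = "") is an alternative that only changes how kable prints NA.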

6.2.2.1 Visualizing the variants associated to multiple diseases

The results of querying DISGENET variant information with a list of diseases can be visualized as a Variant-Disease Network (Figure 6.11), or as a Variant-Disease Heatmap (Figure 6.12), by changing the type argument from “Network” to “Heatmap”.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 6.11: The Variant-Disease Network for a list of diseases

The results can be visualized as a Heatmap (Figure 6.12).

plot( results,
      type = "Heatmap", 
      limit = 100, 
      interactive=T)

Figure 6.12: The Variant-Disease Heatmap for a list of diseases

6.3 Searching by gene

The gene2vda function retrieves the variant-disease associations for a gene.

results <- gene2vda(
              gene = "APP",
              database = "CURATED" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        APP 
##  . Results:  15

Table 6.9 shows the top variants associated to the APP gene in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "gene_symbols", "disease_name","score", "normalized_score", "yearInitial", "yearFinal")] ) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Top variants associated to APP")

Table 6.9: Top variants associated to APP
variantid	gene_symbols	disease_name	score	normalized_score	yearInitial	yearFinal
rs63750264	APP	Alzheimer’s Disease	0.7	0.58	1991	2020
rs63750579	APP	Alzheimer’s Disease	0.6	0.50	1990	2020
rs63750579	APP	CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED	0.6	0.50	1990	2019
rs63749964	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1991	2020
rs63750264	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1991	2020
rs63750066	APP	Alzheimer’s Disease	0.6	0.50	1992	2020
rs63750671	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1992	2020
rs63751039	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1992	2020
rs63750066	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1993	2020
rs63750973	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	0.50	1993	2020

6.3.1 Visualizing the variant-disease associations retrieved for a gene

The results of querying DISGENET variant information with a gene can be visualized as a Variant-Disease Network (Figure 6.13) or, if the input is a list of genes, as a Variant-Disease Heatmap, by changing the type argument from Network to Heatmap. The genes can be shown by setting the showGenes argument to TRUE.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 6.13: The Variant-Disease Network for a gene

6.3.2 Filtering by chemical

6.3.2.1 Searching by variant and chemical

results <- variant2disease( variant   = "rs121434568",
                          database = "TEXTMINING_HUMAN",
                          chemical = "CHEMBL_CHEMBL1173655")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        rs121434568 
##  . Results:  6

Table 6.10 shows the VDAs associated to rs121434568 and afatinib.

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score , normalized_score) %>% mutate(normalized_score = round(normalized_score, 2)) %>% dplyr::arrange(desc(score))

knitr::kable(tab, caption = "VDAs associated to rs121434568 and afatinib")

Table 6.10: VDAs associated to rs121434568 and afatinib
variantid	disease_name	chemical_name	score	normalized_score
rs121434568	Carcinoma of lung	Afatinib	0.7	0.58
rs121434568	Adenocarcinoma of lung (disorder)	Afatinib	0.7	0.58
rs121434568	Non-Small Cell Lung Carcinoma	Afatinib	0.4	0.33
rs121434568	Malignant neoplasm of lung	Afatinib	0.3	0.25
rs121434568	Dyspnea	Afatinib	0.1	0.08
rs121434568	Coughing	Afatinib	0.1	0.08

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 6.14: VDAs associated to rs121434568 and afatinib

6.3.2.2 Retrieving the chemicals associated to a variant

The variant2chemical function retrieves the chemicals associated to a variant.

results <- variant2chemical( variant =  "rs1801133",
                          database = "TEXTMINING_HUMAN" , score = c(0.3,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.3-1 
##  . Term:        rs1801133 
##  . Results:  19

tab <- results@qresult
tab <-tab%>% dplyr::select( disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab[1:10, ] %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                        `Chemical Effect`=chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Chemicals associated to rs1801133" )

Table 6.11: Chemicals associated to rs1801133
Disease	Chemical	Chemical Effect	Sentence	pmid	Year
Multiple Sclerosis	VITAMIN B12	therapeutic|other|other	The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS).	40929924	2025
Multiple Sclerosis	HOMOCYSTEINE	therapeutic|other|other	The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS).	40929924	2025
Multiple Sclerosis	Cyanocobalamin	therapeutic|other|other	The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS).	40929924	2025
Folic Acid Deficiency	HOMOCYSTEINE	other	Genetic analysis revealed a significant association between homozygous TT genotype of the MTHFR C677T polymorphism, elevated Hcy levels (20.4 ± 7.07; p=0.001) and vitamin B9 deficiency (4.9±3.9; p=0.001).	39545031	2024
Schizophrenia	Risperidone	therapeutic	C677T Polymorphism in the MTHFR Gene Is Associated With Risperidone-Induced Weight Gain in Schizophrenia.	32714219	2020
Leukopenia	Pemetrexed	toxicity	Therefore, the MTHFR C677T polymorphism could be a predictive factor for leukopenia, neutropenia, nausea, and fatigue toxicities in non-sq NSCLC patients treated with single-agent PEM.	29186089	2017
Folic Acid Deficiency	HOMOCYSTEINE	other	The MTHFR C677T polymorphism, folate deficiency, and B12 deficiency were significantly associated with elevated serum tHcy levels.	28094233	2017
Leukopenia	Methotrexate	toxicity	Patients with MTHFR 677TT and 677CT + 1298AC were associated with lower frequency of 6-MP and MTX dose reduction due to leukopenia (p < 0.05).	23865834	2014
Schizophrenia	HOMOCYSTEINE	other	Folate, homocysteine, interleukin-6, and tumor necrosis factor alfa levels, but not the methylenetetrahydrofolate reductase C677T polymorphism, are risk factors for schizophrenia.	19939410	2010
Coronary Artery Disease	HOMOCYSTEINE	other	The 5,10-methylenetetrahydrofolate reductase gene (MTHFR) 677C–>T polymorphism modifies the risk of coronary artery disease and colon cancer and is related to plasma concentrations of total homocysteine (tHcy).	15447919	2004
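
The Chemical Effect column can hold several labels in one cell, as in the Multiple Sclerosis rows of Table 6.11. The following base R sketch shows one way to reduce such cells to their distinct labels and to count how many rows mention each label. The "|" separator is an assumption based on how the values are displayed, and the variable names are illustrative only.

```r
# Chemical Effect values as displayed in Table 6.11; the "|" separator is an
# assumption about the underlying string format.
effects <- c("therapeutic|other|other", "other", "therapeutic", "toxicity")

# Split each value on the literal "|" and keep the distinct labels per row.
labels_per_row <- lapply(strsplit(effects, "|", fixed = TRUE), unique)

# Collapse back to one string per row, e.g. for display in a table.
effects_unique <- vapply(labels_per_row, paste, character(1), collapse = "|")

# Count how many rows mention each label at least once.
label_counts <- table(unlist(labels_per_row))
```

Splitting with fixed = TRUE avoids treating "|" as the regular-expression alternation operator.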

To visualize the results use the plot function.

plot(results, 
     type="Network",   
     interactive=T, limit=50)

Figure 6.15: Chemicals associated to rs1801133

7 Associations involving Chemicals

7.1 Retrieving genes, variants, and diseases associated to chemicals

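The chemical2gene function retrieves the genes associated with a specific chemical or list of chemicals. The n_pags argument caps the number of result pages retrieved, as the notice printed after the call indicates.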

results <- chemical2gene( chemical  = "CHEMBL_CHEMBL1009" , database = "ALL" , n_pags = 5)

## Notice that your query has a maximum of 9 pages.
## By indicating n_pags = 5, your query of 9 pages has been reduced to 5 pages.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gene 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL1009 
##  . Results:  120

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol,gene_type , chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Genes associated to levodopa")

Table 7.1: Genes associated to levodopa
gene_symbol	gene_type	chemical_name	pmids_chemical
COMT	protein-coding	Levodopa	21
DRD1	protein-coding	Levodopa	16
DRD3	protein-coding	Levodopa	16
SNCA	protein-coding	Levodopa	15
DRD2	protein-coding	Levodopa	14
PRKN	protein-coding	Levodopa	14
TH	protein-coding	Levodopa	14
GCH1	protein-coding	Levodopa	13
GH1	protein-coding	Levodopa	12
SLC6A3	protein-coding	Levodopa	10

The results can be visualized as a Chemical-Gene Network (Figure 7.1).

plot( results,
      type = "Network", interactive=T)

Figure 7.1: The Chemical-Gene Network for a single chemical

The chemical2disease function retrieves the diseases associated with a specific chemical or list of chemicals. The information can be extracted from GDAs or VDAs; use the type parameter to choose the source.

results <- chemical2disease( chemical  = "CHEMBL_CHEMBL1009" , type = "GDA", database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL1009 
##  . Results:  173

tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Diseases associated to levodopa, type GDA", align= "lllc")

Table 7.2: Diseases associated to levodopa, type GDA
diseaseid	disease_name	chemical_name	pmids_chemical
C0030567	Parkinson Disease	Levodopa	194
C0013384	Dyskinetic syndrome	Levodopa	149
C0242422	Parkinsonian Disorders	Levodopa	54
C0393593	Dystonia Disorders	Levodopa	20
C0013421	Dystonia	Levodopa	19
C1851920	Dopa-Responsive Dystonia	Levodopa	11
C0392702	Abnormal involuntary movements	Levodopa	8
C5979810	Motor dysfunction	Levodopa	8
C0033975	Psychotic Disorders	Levodopa	7
C0349204	Nonorganic psychosis	Levodopa	7

plot( results,
      type = "Network",
      interactive=T)

Figure 7.2: The Chemical-Disease Network for a chemical

A DiseaseClass plot is also available.

plot( results,
      type = "Network",
      class = "DiseaseClass",
      interactive=T)

Figure 7.3: The Chemical-Disease Class Network for a chemical

To retrieve the diseases from VDAs instead, set type = "VDA".

results <- chemical2disease( chemical  = "CHEMBL_CHEMBL1282" , type = "VDA", database =  "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL1282 
##  . Results:  2

tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical)  %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Diseases associated to imiquimod, type VDA",  align= "lllc")

Table 7.3: Diseases associated to imiquimod, type VDA
diseaseid	disease_name	chemical_name	pmids_chemical
C0025202	melanoma	Imiquimod	1
C4721806	Skin Basal Cell Carcinoma	Imiquimod	1

plot( results,
      type = "Network", interactive=T)

Figure 7.4: The Chemical-Disease Network for a chemical

The chemical2variant function retrieves the variants associated with a specific chemical or list of chemicals.

results <- chemical2variant( chemical  = "CHEMBL_CHEMBL108", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  40

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, most_severe_consequence, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")

Table 7.4: VDAs for carbamazepine
variantid	gene_symbols	most_severe_consequence	chemical_name	pmids_chemical
rs1045642	ABCB1	missense_variant	Carbamazepine	8
rs3812718	SCN1A	splice_donor_5th_base_variant	Carbamazepine	6
rs2298771	SCN1A , LOC102724058	missense_variant	Carbamazepine	5
rs1801133	MTHFR	missense_variant	Carbamazepine	4
rs776746	CYP3A5 , ZSCAN25	splice_acceptor_variant	Carbamazepine	4
rs2032582	ABCB1	missense_variant	Carbamazepine	3
rs2234922	EPHX1	missense_variant	Carbamazepine	2
rs2273697	ABCC2	missense_variant	Carbamazepine	2
rs28365083	CYP3A5 , ZSCAN25	missense_variant	Carbamazepine	2
rs28383479	CYP3A5 , ZSCAN25	missense_variant	Carbamazepine	2

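The chemical2variant function also accepts the sift and polyphen arguments, each given as a score range c(min, max), to restrict the results to variants predicted to be deleterious. The example below keeps variants with a SIFT score between 0 and 0.05 and a PolyPhen score between 0.7 and 1.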

results <- chemical2variant( chemical  = "CHEMBL_CHEMBL108", database = "ALL", sift = c(0,0.05), polyphen = c(0.7,1)  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  8

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, sift_score, polyphen_score, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Deleterious VDAs for carbamazepine", align= "lllllc")

Table 7.5: Deleterious VDAs for carbamazepine
variantid	gene_symbols	sift_score	polyphen_score	chemical_name	pmids_chemical
rs1045642	ABCB1	0.02	0.998	Carbamazepine	8
rs1043620	HSPA1L, HSPA1A	0.00	0.997	Carbamazepine	1
rs1051740	EPHX1	0.00	0.987	Carbamazepine	1
rs121912438	SOD1	0.00	0.967	Carbamazepine	1
rs140288103	SCN10A	0.00	0.888	Carbamazepine	1
rs211037	GABRG2	0.02	0.977	Carbamazepine	1
rs71428908	SCN9A	0.00	0.995	Carbamazepine	1
rs796052508	GABRG2	0.03	0.997	Carbamazepine	1

plot( results,
      type = "Network", interactive=T)

Figure 7.5: The Chemical-Variant Network for carbamazepine
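
Once results have been retrieved, the same kind of range filter can be tightened locally without a new query. The sketch below is a minimal base R illustration using the scores from Table 7.5; the helper in_range is hypothetical (not part of disgenet2r), and treating both ends of the range as inclusive is an assumption.

```r
# Scores copied from Table 7.5.
vars <- data.frame(
  variantid = c("rs1045642", "rs1043620", "rs1051740", "rs121912438",
                "rs140288103", "rs211037", "rs71428908", "rs796052508"),
  sift_score = c(0.02, 0.00, 0.00, 0.00, 0.00, 0.02, 0.00, 0.03),
  polyphen_score = c(0.998, 0.997, 0.987, 0.967, 0.888, 0.977, 0.995, 0.997),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where x lies inside the closed range c(min, max).
in_range <- function(x, range) {
  !is.na(x) & x >= range[1] & x <= range[2]
}

# Same SIFT range as the query, with a stricter PolyPhen lower bound.
sift_range <- c(0, 0.05)
polyphen_range <- c(0.9, 1)

keep <- in_range(vars$sift_score, sift_range) &
  in_range(vars$polyphen_score, polyphen_range)
deleterious <- vars[keep, ]
```

With these thresholds, rs140288103 (PolyPhen score 0.888) is dropped and the other seven variants are kept.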

7.2 Retrieving GDAs and VDAs associated to chemicals

7.2.1 Exploring the GDAs of a chemical

The chemical2gda function retrieves the GDAs for a specific chemical or list of chemicals.

results <- chemical2gda( chemical  = "CHEMBL_CHEMBL809", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL809 
##  . Results:  227

tab <- results@qresult

tab <- tab %>%
  dplyr::select(gene_symbol, disease_name, chemical_name, score, normalized_score, pmids_chemical) %>%
  dplyr::mutate(normalized_score = round(normalized_score, 2))
knitr::kable(tab[1:10,], caption = "GDAs for sertraline ")

Table 7.6: GDAs for sertraline
gene_symbol	disease_name	chemical_name	score	normalized_score	pmids_chemical
SLC6A4	Mental Depression	Sertraline	1.25	0.81	1
IL6	Mental Depression	Sertraline	1.20	0.77	6
BDNF	Mental Depression	Sertraline	1.10	0.71	6
CRP	Acute Coronary Syndrome	Sertraline	1.05	0.68	2
CRP	Inflammation	Sertraline	1.05	0.68	4
CCL2	Inflammation	Sertraline	1.00	0.65	1
IL1B	Inflammation	Sertraline	1.00	0.65	1
USP7	Hao Fountain syndrome (disorder)	Sertraline	1.00	0.65	1
IL10	Inflammation	Sertraline	1.00	0.65	1
ICAM1	Inflammation	Sertraline	1.00	0.65	1

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 7.6: The GDA Network for sertraline

7.2.2 Exploring the VDAs of a chemical

The chemical2vda function retrieves the VDAs for a specific chemical or list of chemicals.

results <- chemical2vda( chemical  = "CHEMBL_CHEMBL2010601", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL2010601 
##  . Results:  20

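The chemical2vda function also accepts the sift and polyphen arguments, each given as a score range c(min, max), to restrict the results to variants predicted to be deleterious. The example below keeps variants with a SIFT score between 0 and 0.05 and a PolyPhen score between 0.9 and 1.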

results <- chemical2vda( chemical  = "CHEMBL_CHEMBL2010601", 
                         database = "ALL", 
                         sift = c(0,0.05) , polyphen = c(0.9,1)  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL2010601 
##  . Results:  16

tab <- results@qresult
tab <- tab %>%
  dplyr::select(variantid, disease_name, chemical_name, score, normalized_score, pmids_chemical) %>%
  dplyr::mutate(normalized_score = round(normalized_score, 2))
knitr::kable(tab[1:10,], caption = "VDAs associated to ivacaftor")

Table 7.7: VDAs associated to ivacaftor
variantid	disease_name	chemical_name	score	normalized_score	pmids_chemical
rs75527207	Cystic Fibrosis	Ivacaftor	0.9	0.75	26
rs78655421	Cystic Fibrosis	Ivacaftor	0.9	0.75	2
rs74503330	Cystic Fibrosis	Ivacaftor	0.8	0.67	1
rs139304906	Cystic Fibrosis	Ivacaftor	0.8	0.67	1
rs368505753	Cystic Fibrosis	Ivacaftor	0.8	0.67	1
rs397508442	Cystic Fibrosis	Ivacaftor	0.5	0.42	1
rs75527207	Lung diseases	Ivacaftor	0.2	0.17	2
rs75527207	Weight Gain	Ivacaftor	0.2	0.17	3
rs75527207	Rhinosinusitis	Ivacaftor	0.1	0.08	1
rs75527207	Inflammation	Ivacaftor	0.1	0.08	1

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 7.7: Network of VDAs

7.2.3 Exploring the GDA evidences of a chemical

The chemical2evidence function retrieves the evidences supporting the GDAs or VDAs of a specific chemical or list of chemicals. Use the type argument to choose between them.

results <- chemical2evidence( chemical  = "CHEMBL_CHEMBL1069", type = "GDA" , database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        CHEMBL1069 
##  . Results:  633

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Disease = disease_name, Chemical = chemical_name,  `Chemical Effect` =chemical_effect,    Year=pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
tab[1:10, ] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for Valsartan" )

Table 7.8: Evidences for Valsartan
Gene	Disease	Chemical	Sentence	Chemical Effect	pmid	Year
NPPB	Heart failure	Valsartan	In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group.	therapeutic|therapeutic|therapeutic	41335448	2026
NPPB	Congestive heart failure	Valsartan	In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group.	therapeutic|therapeutic|other	41335448	2026
NPPB	Chagas Disease	Valsartan	In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group.	other|other|other	41335448	2026
STING1	Diabetes Mellitus, Non-Insulin-Dependent	Valsartan	A variety of inhibitors, including small-molecule compounds (fenofibrate and nicotinamide riboside), proteins (proprotein convertase subtilisin/kexin type 9 monoclonal antibody, Metrnl, Brahma-related gene 1, and irsin, interferon-stimulated gene 15), natural products (rosavin and spermidine), probiotics (ZBiotics and garlic-derived exosomes-like nanoparticles), compound drugs (sacubitril/valsartan), and nanoparticles (Mito-G and Jumonji domain-containing protein 3 inhibitory nanoparticles), can inhibit STING signal transduction, alleviate glucose dysregulation, improve lipid metabolism in T2DM, and reduce organ damage.	other|therapeutic|other|other|other|therapeutic	41161546	2026
CAMK1D	Hypertensive (finding)	Valsartan	CAMK1D and PI3 in low-density neutrophils are associated with the anti-hypertensive effects of valsartan.	other	41628662	2026
CAMK1D	Hypertensive disease	Valsartan	Theses findings highlight CAMK1D and PI3 as LDN-related genes influencing valsartan response in hypertension, offering a foundation for future functional studies.	therapeutic	41628662	2026
PI3	Hypertensive disease	Valsartan	Theses findings highlight CAMK1D and PI3 as LDN-related genes influencing valsartan response in hypertension, offering a foundation for future functional studies.	therapeutic	41628662	2026
CRP	Atherosclerosis	Valsartan	High-sensitivity C-reactive protein (hs-CRP) will be colllected and evaluated at each timepoint	other|other|other|other	NCT06930885	2025
NPPB	Heart Failure, Systolic	Valsartan	Sacubitril/valsartan treatment in HFrEF leads to reduced sST2 and NT-proBNP concentrations with distinct decreasing curves, which are linked to reverse CR through LV-related parameters.	other|other	39889435	2025
MME	Inflammation	Valsartan	Neprilysin inhibition by Sacubitril/Valsartan improved adverse cardiac remodelling in experimental DbCM through direct regulation of inflammation, highlighting immunomodulation as a novel mechanism underlying established its cardioprotective actions.	other|toxicity	40369551	2025
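
One reference in Table 7.8 (NCT06930885) looks like a clinical trial registry identifier rather than a PubMed identifier, so a link built by prefixing the PubMed URL would probably not resolve for that row. The sketch below shows one possible way to pick the link target from the shape of the identifier. The helper reference_url is hypothetical, and the ClinicalTrials.gov URL pattern is an assumption that should be checked before use.

```r
# Hypothetical helper: choose a link target from the shape of the identifier.
# Identifiers that start with "NCT" followed by digits are assumed to be
# ClinicalTrials.gov records; everything else is treated as a PubMed ID.
reference_url <- function(reference) {
  reference <- as.character(reference)
  is_trial <- grepl("^NCT[0-9]+$", reference)
  ifelse(
    is_trial,
    paste0("https://clinicaltrials.gov/study/", reference),
    paste0("https://pubmed.ncbi.nlm.nih.gov/", reference)
  )
}

# Two references taken from Table 7.8.
urls <- reference_url(c("41335448", "NCT06930885"))
```

The resulting vector could then be passed to the link argument of kableExtra::cell_spec in place of the paste0 call used above.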

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 7.8: The GDA evidence Network for valsartan

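7.2.4 Exploring the VDA evidences of a chemical

To retrieve the evidences for VDAs, call chemical2evidence with type = "VDA". The example restricts the search to the TEXTMINING_HUMAN database.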

results <- chemical2evidence( chemical  = "CHEMBL_CHEMBL502", type = "VDA" , database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        CHEMBL502 
##  . Results:  5

tab <- results@qresult
tab <-tab %>% dplyr::select(variantid, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                            `Chemical Effect` =chemical_effect,  Year=pmYear, Sentence = sentence, pmid = reference )
tab <- tab[ order(-tab$Year),]
tab  %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for Donepezil" )

Table 7.9: Evidences for Donepezil
variantid	Disease	Chemical	Sentence	Chemical Effect	pmid	Year
rs1080985	Alzheimer’s Disease	Donepezil	The CYP2D6 SNP rs1080985 might be a useful pharmacogenetic marker of the long-term therapeutic response to donepezil in patients with AD.	therapeutic	34120801	2022
rs1080985	Alzheimer’s Disease	Donepezil	Recent data have indicated that the rs1080985 single nucleotide polymorphism (SNP) of the cytochrome P450 (CYP) 2D6 and the common apolipoprotein E (APOE) gene may affect the response to donepezil in patients with Alzheimer’s disease (AD).	therapeutic	25538729	2014
rs1080985	Alzheimer’s Disease	Donepezil	Recent data indicate that the rs1080985 single nucleotide polymorphism of the cytochrome P450 (CYP) 2D6 gene may affect the response to treatment with donepezil in patients with Alzheimer’s disease.	therapeutic	23950644	2013
rs1080985	Alzheimer’s Disease	Donepezil	In a sample of 415 AD cases, we found evidence of association between rs1080985 and response to donepezil after 6 months of therapy (OR [95% CI]: 1.74 [1.01-3.00], p = 0.04).	therapeutic	22465999	2012
rs1080985	Alzheimer’s Disease	Donepezil	The single nucleotide polymorphism rs1080985 in the CYP2D6 gene may influence the clinical efficacy of donepezil in patients with mild to moderate Alzheimer disease (AD).	therapeutic	19738170	2009

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 7.9: Evidence network

8 Disease-Disease Associations

The disgenet2r package can also retrieve the diseases that share genes or variants with a particular disease or list of diseases (disease-disease associations, or DDAs).

8.1 Searching DDAs by shared genes

8.1.1 Single disease

To obtain disease-disease associations, use the disease2disease function. It takes as input a disease, in the same format as in disease2gene; the database in which to perform the search (by default, CURATED); and the argument relationship, which indicates the type of relationship between the diseases of a pair. If relationship is set to “has_shared_genes”, the results can be filtered with min_genes, the minimum number of genes shared with the disease(s) of interest, and jg, the Jaccard Index for genes. By default min_genes = 0. If relationship is set to “has_shared_variants”, analogous arguments are available to filter the results.

The output is a DataGeNET.DGN object that contains the top diseases that share genes with the disease that has been searched.

The DataGeNET.DGN object produced by the disease2disease function also contains the Jaccard Index, also known as the Jaccard similarity coefficient, for each disease pair. The Jaccard Index is a similarity metric computed as the size of the intersection divided by the size of the union of two sets, in this case the sets of genes associated with each disease:

\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]

We calculate a p value to estimate the significance of the Jaccard coefficient for a list of disease pairs. The p value is estimated using a Fisher exact test. The pvalue column displays the minus logarithm of the p value for the Jaccard Index, and is available for disease-disease associations by shared genes and by shared variants.
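
As a rough illustration of these two quantities, the base R sketch below computes the Jaccard Index for two toy gene sets and a one-sided Fisher exact test on the corresponding 2x2 contingency table. The gene labels, the size of the gene universe, the one-sided alternative and the use of base 10 for the logarithm are all assumptions made for the example; the exact procedure used by DISGENET may differ.

```r
# Toy gene sets; the gene labels and the universe size are made up.
genes_a <- c("G1", "G2", "G3", "G4", "G5", "G6")
genes_b <- c("G4", "G5", "G6", "G7")
universe_size <- 100  # assumed number of genes that could be annotated

# Jaccard Index: |A intersect B| / |A union B|.
shared <- length(intersect(genes_a, genes_b))
jaccard <- shared / length(union(genes_a, genes_b))

# 2x2 contingency table.
# Rows: in B / not in B. Columns: in A / not in A.
only_a <- length(setdiff(genes_a, genes_b))
only_b <- length(setdiff(genes_b, genes_a))
neither <- universe_size - shared - only_a - only_b
contingency <- matrix(
  c(shared, only_a, only_b, neither),
  nrow = 2,
  dimnames = list(B = c("in_B", "not_B"), A = c("in_A", "not_A"))
)

# One-sided test: is the overlap larger than expected by chance?
p_value <- fisher.test(contingency, alternative = "greater")$p.value
minus_log_p <- -log10(p_value)  # base 10 is an assumption
```

Here the two sets share 3 of 7 distinct genes, so the Jaccard Index is 3/7, and a larger minus_log_p indicates a more significant overlap.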

results <- disease2disease(
  disease_1 = "UMLS_C0010674", relationship = "has_shared_genes",
  database = "CURATED" ,   min_genes =2 )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:        UMLS_C0010674 
##  . Results:  11

Table 8.1 shows the diseases that share at least two genes with Cystic Fibrosis (UMLS_C0010674) in the CURATED data of DISGENET.

Table 8.1: Diseases that share genes with Cystic Fibrosis
disease1_Name	disease2_Name	jaccard_genes	shared_genes	pvalue_jaccard_genes
Cystic Fibrosis	Congenital bilateral aplasia of vas deferens	0.31034	9	22.7
Cystic Fibrosis	BRONCHIECTASIS WITH OR WITHOUT ELEVATED SWEAT CHLORIDE 1	0.31034	9	22.7
Cystic Fibrosis	CFTR-related disorder	0.32143	9	23.7
Cystic Fibrosis	Hereditary pancreatitis	0.24324	9	19.0
Cystic Fibrosis	Obstructive azoospermia	0.13793	4	9.6
Cystic Fibrosis	Infertility	0.09091	4	6.7
Cystic Fibrosis	MELANOMA-PANCREATIC CANCER SYNDROME	0.10000	3	6.7
Cystic Fibrosis	VAS DEFERENS, CONGENITAL BILATERAL ABSENCE OF	0.10714	3	7.7
Cystic Fibrosis	Neoplastic Syndromes, Hereditary	0.01299	3	1.7
Cystic Fibrosis	Pancreatitis	0.10345	3	7.1
Cystic Fibrosis	Cardiomyopathies	0.01124	2	1.2

8.1.1.1 Visualizing the diseases associated to a single disease

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share genes with it (Figure 8.1). The node size is proportional to the number of genes associated to each disease.

plot( results, 
      type = "Network",
      interactive=T )

Figure 8.1: The Disease-Disease Network by shared genes for Cystic Fibrosis

8.1.2 Multiple diseases

The function disease2disease can also use as an input a list of diseases in any of the previously described vocabularies. It will retrieve the top diseases that share genes with each of the diseases in the input list.

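Table 8.2 lists three of the diseases selected to illustrate the disease2disease function with a list input. The query that follows also includes the identifiers UMLS_C0751651 and UMLS_C4551714.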

Table 8.2: Examples of Congenital diseases
UMLS_CUI	Disease_Name
C0162671	MELAS Syndrome
C0023264	Leigh Disease
C0917796	Optic Atrophy, Hereditary, Leber

diseasesOfInterest <-  paste0("UMLS_", c("C0162671", "C0023264", "C0917796", "C0751651", "C4551714"))
results <- disease2disease(
              disease_1 =  diseasesOfInterest, relationship = "has_shared_genes",
              database = "CURATED",
              min_genes  = 20, 
              order_by = "JACCARD_GENES" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0162671 ... UMLS_C4551714 
##  . Results:  51

Table 8.3 shows the diseases that share at least 20 genes with the diseases of interest.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share at least 20 genes with the diseases of interest")

Table 8.3: Diseases that share at least 20 genes with the diseases of interest
disease1_Name	disease2_Name	jaccard_genes	shared_genes	pvalue_jaccard_genes
Optic Atrophy, Hereditary, Leber	Maternally Inherited Leigh Syndrome	0.77778	28	75
MELAS Syndrome	Optic Atrophy, Hereditary, Leber	0.76190	32	82
Optic Atrophy, Hereditary, Leber	MELAS Syndrome	0.76190	32	82
Optic Atrophy, Hereditary, Leber	MITOCHONDRIAL COMPLEX V (ATP SYNTHASE) DEFICIENCY, MITOCHONDRIAL TYPE 1	0.72222	26	69
Optic Atrophy, Hereditary, Leber	Neuropathy, Ataxia, and Retinitis Pigmentosa	0.72222	26	69
Optic Atrophy, Hereditary, Leber	Flexion contracture of proximal interphalangeal joint of finger	0.70270	26	68
MELAS Syndrome	Maternally Inherited Leigh Syndrome	0.69231	27	70
Optic Atrophy, Hereditary, Leber	Wide spaced nipples (finding)	0.68421	26	67
Optic Atrophy, Hereditary, Leber	Cleft palate and bilateral cleft lip	0.65000	26	65
Optic Atrophy, Hereditary, Leber	Hypoplasia of scrotum	0.65000	26	65

To obtain the network, set the type argument of the plot function to “Network” (Figure 8.2). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them. The edge width is proportional to the number of genes shared by the diseases it connects.

plot( results,
      type = "Network",
      interactive=TRUE)

Figure 8.2: The Disease-Disease Network by shared genes for a list of diseases

You can also restrict the search to the associations among the diseases of interest themselves by passing the list to both the disease_1 and disease_2 arguments, as in the following call.

results <- disease2disease(
              disease_1 =  diseasesOfInterest,
              disease_2 =  diseasesOfInterest,  relationship = "has_shared_genes",
              database = "CURATED",
              min_genes  = 20, 
              order_by = "JACCARD_GENES" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0162671 ... UMLS_C4551714 
##  . Results:  10

Table 8.4 shows the pairs among the diseases of interest that share at least 20 genes.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share at least 20 genes with the diseases of interest")

Table 8.4: Diseases that share at least 20 genes with the diseases of interest
disease1_Name	disease2_Name	jaccard_genes	shared_genes	pvalue_jaccard_genes
MELAS Syndrome	Optic Atrophy, Hereditary, Leber	0.76190	32	82
Optic Atrophy, Hereditary, Leber	MELAS Syndrome	0.76190	32	82
Optic Atrophy, Hereditary, Leber	Rod-Cone Dystrophy	0.44828	26	56
Rod-Cone Dystrophy	Optic Atrophy, Hereditary, Leber	0.44828	26	56
MELAS Syndrome	Mitochondrial Diseases	0.31933	38	76
Mitochondrial Diseases	MELAS Syndrome	0.31933	38	76
Optic Atrophy, Hereditary, Leber	Mitochondrial Diseases	0.27049	33	62
Mitochondrial Diseases	Optic Atrophy, Hereditary, Leber	0.27049	33	62
Mitochondrial Diseases	Rod-Cone Dystrophy	0.19286	27	40
Rod-Cone Dystrophy	Mitochondrial Diseases	0.19286	27	40

plot( results,
      type = "Network",
      interactive=TRUE)

Figure 8.3: The Disease-Disease Network by shared genes among a list of diseases
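
When the same list is used on both sides of the query, each pair appears twice in Table 8.4, once in each direction. The base R sketch below shows one way to keep a single row per unordered pair, using the first four rows of Table 8.4 as input. The key construction is only an illustration, not something provided by the package.

```r
# Rows copied from Table 8.4 (first four rows).
pairs <- data.frame(
  disease1_Name = c("MELAS Syndrome", "Optic Atrophy, Hereditary, Leber",
                    "Optic Atrophy, Hereditary, Leber", "Rod-Cone Dystrophy"),
  disease2_Name = c("Optic Atrophy, Hereditary, Leber", "MELAS Syndrome",
                    "Rod-Cone Dystrophy", "Optic Atrophy, Hereditary, Leber"),
  jaccard_genes = c(0.76190, 0.76190, 0.44828, 0.44828),
  shared_genes = c(32, 32, 26, 26),
  stringsAsFactors = FALSE
)

# Order-independent key: the two names sorted alphabetically and joined.
pair_key <- paste(
  pmin(pairs$disease1_Name, pairs$disease2_Name),
  pmax(pairs$disease1_Name, pairs$disease2_Name),
  sep = " || "
)

# Keep the first row seen for each unordered pair.
unique_pairs <- pairs[!duplicated(pair_key), ]
```

This leaves one row for the MELAS Syndrome and Leber optic atrophy pair and one for the Leber optic atrophy and Rod-Cone Dystrophy pair.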

8.2 Searching DDAs by shared variants

8.2.1 Single disease

To obtain disease-disease associations via shared genetic variants, use the disease2disease function with the argument relationship set to “has_shared_variants”, the database in which to perform the search (by default, CURATED), and the argument min_vars, the minimum number of variants shared with the disease(s) of interest. By default min_vars = 0. The output is a DataGeNET.DGN object that contains the top diseases that share variants with the queried disease.
In the example, we specify a minimum value for the Jaccard Index computed from the shared variants (jv = 0.1).

results <- disease2disease(
  disease_1 =  c("UMLS_C0011860", "UMLS_C0028754", "UMLS_C0524620"),relationship = "has_shared_variants",
  database = "CURATED", jv = 0.1 )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-variant 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0524620 
##  . Results:  15

Table 8.5 shows the top diseases that share variants with Obesity and NIDDM.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_variants","shared_variants", "pvalue_jaccard_variants")] )
tab <- tab[ order(-tab$shared_variants),]

knitr::kable(tab[1:10,], caption = "Top diseases that share variants with Obesity and NIDDM", row.names = F)

Table 8.5: Top diseases that share variants with Obesity and NIDDM
disease1_Name	disease2_Name	jaccard_variants	shared_variants	pvalue_jaccard_variants
Diabetes Mellitus, Non-Insulin-Dependent	Wolfram Syndrome 1	0.22857	304	330.0
Diabetes Mellitus, Non-Insulin-Dependent	Wolfram-Like Syndrome, Autosomal Dominant	0.24213	300	330.0
Diabetes Mellitus, Non-Insulin-Dependent	DEAFNESS, AUTOSOMAL DOMINANT 6	0.22936	300	330.0
Diabetes Mellitus, Non-Insulin-Dependent	CATARACT 41	0.24267	298	330.0
Diabetes Mellitus, Non-Insulin-Dependent	Hyperinsulinemic hypoglycemia, familial, 1	0.16967	254	330.0
Diabetes Mellitus, Non-Insulin-Dependent	Diabetes Mellitus, Transient Neonatal, 2	0.16587	209	330.0
Diabetes Mellitus, Non-Insulin-Dependent	Hypoglycemia, leucine-induced	0.16843	207	330.0
Diabetes Mellitus, Non-Insulin-Dependent	DIABETES MELLITUS, PERMANENT NEONATAL, 3	0.16465	204	330.0
Obesity	BODY MASS INDEX QUANTITATIVE TRAIT LOCUS 20	0.11268	24	71.2
Obesity	Proopiomelanocortin Deficiency	0.10160	19	61.1

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share variants with it (Figure 8.4). The node size is proportional to the number of variants associated to each disease.

plot( results, 
      type = "Network",
       interactive=F, prop = 0.1 )

Figure 8.4: The Disease-Disease Network by shared variants

8.3 Searching DDAs via semantic relationships

To obtain disease-disease associations via semantic relationships, use the disease2disease function with the argument relationship equal to one of the following types of semantic relations: has_manifestation, has_associated_morphology, manifestation_of, associated_morphology_of, is_finding_of_disease, due_to, has_definitional_manifestation, has_associated_finding, definitional_manifestation_of, disease_has_finding, cause_of, associated_finding_of.

The output is a DataGeNET.DGN object that contains the diseases that have the type of relationship defined in the query with the query disease.

results <- disease2disease(
  disease_1 = c("UMLS_C0011860", "UMLS_C0028754"),relationship = "has_manifestation", min_sokal = 0.7, order_by = "SOKAL",
  database = "CURATED"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-rela 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0028754 
##  . Results:  25

Table 8.6 shows the diseases associated with Obesity and Non-Insulin-Dependent Diabetes Mellitus (NIDDM) by the relation type “has_manifestation”.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","ddaRelation","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab , caption = "Diseases associated with Obesity and NIDDM")

Table 8.6: Diseases associated with Obesity and NIDDM
disease1_Name	disease2_Name	ddaRelation	shared_genes	pvalue_jaccard_genes
Obesity	Obesity, Hyperphagia, and Developmental Delay	has_manifestation	1	1.9
Obesity	Obesity, Hyperphagia, and Developmental Delay	has_manifestation	1	1.6
Obesity	Pseudohypoparathyroidism, Type Ia	has_manifestation	1	1.9
Obesity	BARDET-BIEDL SYNDROME 18	has_manifestation	1	2.2
Diabetes Mellitus, Non-Insulin-Dependent	MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13	has_manifestation	1	1.6
Diabetes Mellitus, Non-Insulin-Dependent	MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13	has_manifestation	1	2.4
Obesity	Pseudohypoparathyroidism Type 1C	has_manifestation	1	1.9
Obesity	Bardet-Biedl syndrome 2	has_manifestation	1	1.7
Obesity	Pseudohypoparathyroidism, Type Ia	has_manifestation	1	1.6
Obesity	LUSCAN-LUMISH SYNDROME	has_manifestation	1	1.9
Obesity	HYPOGONADOTROPIC HYPOGONADISM 27 WITHOUT ANOSMIA	has_manifestation	1	1.6
Obesity	Pseudopseudohypoparathyroidism	has_manifestation	1	1.7
Obesity	Pseudohypoparathyroidism Type 1C	has_manifestation	1	1.6
Obesity	Bardet-Biedl syndrome 4	has_manifestation	1	1.7
Obesity	CORTISONE REDUCTASE DEFICIENCY 2	has_manifestation	1	1.6
Diabetes Mellitus, Non-Insulin-Dependent	MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13	has_manifestation	1	1.5
Obesity	Pseudohypoparathyroidism, Type Ia	has_manifestation	1	1.7
Obesity	SHORT STATURE, BRACHYDACTYLY, IMPAIRED INTELLECTUAL DEVELOPMENT, AND SEIZURES	has_manifestation	1	2.2
Obesity	Pseudopseudohypoparathyroidism	has_manifestation	1	1.6
Obesity	Pseudopseudohypoparathyroidism	has_manifestation	1	1.9
Obesity	BARDET-BIEDL SYNDROME 6	has_manifestation	1	1.7
Obesity	Pseudohypoparathyroidism Type 1C	has_manifestation	1	1.7
Obesity	Bardet-Biedl syndrome 1	has_manifestation	1	1.0
Obesity	CHOPS SYNDROME	has_manifestation	1	1.6
Diabetes Mellitus, Non-Insulin-Dependent	KERATODERMA-ICHTHYOSIS-DEAFNESS SYNDROME, AUTOSOMAL RECESSIVE	has_manifestation	2	4.4

8.4 Searching semantically similar diseases

The get_similar_diseases function retrieves the diseases most similar to a query disease according to the Sokal-Sneath semantic similarity distance (Sánchez and Batet 2011). The similarity between concepts is computed on the taxonomic relations provided by the Unified Medical Language System (UMLS) Metathesaurus. Only relationships of type is-a (which describe the taxonomy in any ontology) are taken into account. The function takes a disease as input and, as an optional argument, min_sokal, a minimum value for the Sokal distance. By default, min_sokal = 0.1.
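To make the idea of a taxonomy-based similarity concrete, the following sketch computes a Sokal-Sneath style coefficient on a toy is-a hierarchy. The concept names, the parent table, and the helper functions ancestors and sokal_sneath are hypothetical. The set-based form shared / (shared + 2 * unshared) is one common variant of the Sokal-Sneath coefficient, and it is an assumption that it matches the exact computation behind get_similar_diseases.

```r
# Toy is-a taxonomy (hypothetical concepts): each name maps to its direct parent.
parent <- c(
  t2d                 = "diabetes",
  mody                = "diabetes",
  diabetes            = "endocrine_disease",
  obesity             = "nutritional_disease",
  endocrine_disease   = "disease",
  nutritional_disease = "disease"
)

# Return a concept together with all of its is-a ancestors up to the root.
ancestors <- function(concept) {
  out <- concept
  while (!is.na(parent[concept])) {
    concept <- unname(parent[concept])
    out <- c(out, concept)
  }
  out
}

# Assumed set-based Sokal-Sneath form: shared / (shared + 2 * unshared).
sokal_sneath <- function(x, y) {
  ax <- ancestors(x)
  ay <- ancestors(y)
  shared   <- length(intersect(ax, ay))
  unshared <- length(setdiff(ax, ay)) + length(setdiff(ay, ax))
  shared / (shared + 2 * unshared)
}

sokal_sneath("t2d", "mody")     # siblings under "diabetes": 3/7
sokal_sneath("t2d", "obesity")  # only the root is shared: 1/11
```

Concepts that share most of their ancestors score close to 1, which is why the closest taxonomic neighbours of a disease appear first when results are filtered with a high min_sokal.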

results <- get_similar_diseases(
  disease  = "UMLS_C0011860",
    min_sokal = 0.6)
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-sokal 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0011860 
##  . Results:  127

Table 8.7 shows the top 10 diseases similar to NIDDM, with their Sokal distance.

tab <- unique(results@qresult[  ,c("disease1_Name",  "disease2_Name","sokal")] )
knitr::kable(tab[1:10,], caption = "Diseases semantically similar to NIDDM")

Table 8.7: Diseases semantically similar to NIDDM
disease1_Name	disease2_Name	sokal
Diabetes Mellitus, Non-Insulin-Dependent	Maturity onset diabetes mellitus in young	0.946
Diabetes Mellitus, Non-Insulin-Dependent	Lipoatrophic Diabetes Mellitus	0.945
Diabetes Mellitus, Non-Insulin-Dependent	Familial partial lipodystrophy	0.944
Diabetes Mellitus, Non-Insulin-Dependent	Type 2 diabetes mellitus in obese	0.943
Diabetes Mellitus, Non-Insulin-Dependent	MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 9 (disorder)	0.943
Diabetes Mellitus, Non-Insulin-Dependent	Type 2 diabetes mellitus with diabetic nephropathy	0.943
Diabetes Mellitus, Non-Insulin-Dependent	MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 3 (disorder)	0.943
Diabetes Mellitus, Non-Insulin-Dependent	Maturity-Onset Diabetes of the Young, Type 4	0.943
Diabetes Mellitus, Non-Insulin-Dependent	Maturity-Onset Diabetes of the Young, Type 1	0.943
Diabetes Mellitus, Non-Insulin-Dependent	Diabetes mellitus autosomal dominant type II (disorder)	0.943

9 Disease enrichment

The disease_enrichment function performs a disease enrichment (or over-representation) analysis. It determines whether a user-defined set of genes is statistically significantly associated with a disease gene set in DISGENET.

The function takes as input a list of entities, either genes or variants. They are compared against the gene-disease or variant-disease associations in the selected database to determine the diseases associated with the given list. Genes can be identified with HGNC, ENSEMBL, or Entrez identifiers.

The database parameter allows users to choose which data source to use: CURATED for curated gene-disease associations (the default option), CLINICALTRIALS for associations extracted from ClinicalTrials.gov, or ALL to include all available databases. The number of genes in the selected data source is used as the background (universe) of the over-representation test.

The common_entities parameter sets the minimum number of entities that must be shared with a disease for it to be considered in the analysis; the default is 1. The max_pvalue parameter sets a threshold for the p-value from the Fisher test (default is 0.05).
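The over-representation test for a single disease can be reproduced by hand with base R. The sketch below uses made-up counts (universe_size, list_size, disease_genes, and overlap are all hypothetical) and a one-sided Fisher's exact test. It is an assumption that disease_enrichment configures its test in exactly this way. The last lines mimic the roles of common_entities and max_pvalue.

```r
# Hypothetical counts for one disease (not taken from DISGENET).
universe_size <- 14000  # genes in the selected database (background)
list_size     <- 58     # genes in the user list
disease_genes <- 300    # genes associated with the disease
overlap       <- 8      # user genes that are also disease genes

# 2 x 2 contingency table: list membership (rows) by disease membership (columns).
contingency <- matrix(
  c(overlap,                 list_size - overlap,
    disease_genes - overlap, universe_size - list_size - disease_genes + overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_list", "not_in_list"),
                  c("in_disease", "not_in_disease"))
)

# One-sided test: is the overlap larger than expected by chance?
ft <- fisher.test(contingency, alternative = "greater")
ft$p.value
ft$estimate  # conditional maximum-likelihood odds ratio

# Thresholds playing the role of common_entities and max_pvalue.
common_entities <- 5
max_pvalue      <- 0.05
keep_disease <- overlap >= common_entities && ft$p.value <= max_pvalue
keep_disease
```

With these counts about one shared gene would be expected by chance, so an overlap of eight passes both thresholds.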

9.1 For genes

Below is an example of a disease enrichment analysis with a list of genes associated with autism, extracted from the Developmental Brain Disorder Gene Database (Gonzalez-Mantilla et al. 2016).

genes <- c("ADNP", "ANKRD11", "ANKRD17",  "ASXL1",  "BCKDK",  "BRSK2",  "CDK13",  "CDK8",  "CHD2",  "CHD7",  "CHD8",  "CLCN2",  "CREBBP",  "CSDE1",  "CTCF",  "CTNNB1",  "DDX3X",  "FOXP1",  "GFER",  "H4C3",  "HNRNPUL2",  "IQSEC2",  "ITSN1",  "JARID2",  "LRP2",  "MARK2",  "MBOAT7",  "MYT1L",  "NAA15",  "NALCN",  "NAV3",  "NEXMIF" ,  "NSD1",  "PHF21A",  "POGZ",  "PRR12",  "QRICH1",  "SCAF1",  "SCN1A",  "SCN2A",  "SETD5",  "SHANK3",  "SIN3A",  "SOX11",  "SOX6",  "TANC2",  "TBCD",  "TCF20" ,  "TCF4",  "TCF7L2",  "TRAF7",  "TRIP12",  "WAC",  "WDR26",  "ZEB2",  "ZMYM2",  "ZNF292",  "ZSWIM6" )
results <- disease_enrichment(
  entities        = genes,
  common_entities = 5,
  vocabulary      = "HGNC",
  database        = "CURATED")

## Your query has 1 page.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       ADNP ... ZSWIM6

Table 9.1 shows the top diseases associated with the list of genes.

tab <- unique(results@qresult[  ,c("diseaseName",  "geneRatio", "bgRatio", "oddsRatio","pvalue")] )
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of genes")

Table 9.1: Diseases significantly associated with the list of genes
diseaseName	geneRatio	bgRatio	oddsRatio	pvalue
Mild intellectual disability	6/58	6/14418	114.58771	0.0e+00
Intellectual Disability	47/58	47/14418	68.60675	0.0e+00
Rare genetic intellectual disability	8/58	8/14418	66.05629	0.0e+00
Neurodevelopmental abnormality	14/58	14/14418	53.77158	0.0e+00
Neurodevelopmental delay	24/58	24/14418	45.96441	0.0e+00
Developmental Disabilities	14/58	14/14418	33.70635	0.0e+00
Delayed speech and language development	9/58	9/14418	32.42526	0.0e+00
Neurodevelopmental Disorders	37/58	37/14418	30.89198	0.0e+00
Rare genetic syndromic intellectual disability	8/58	8/14418	30.23669	1.0e-07
Autosomal dominant non-syndromic intellectual disability	5/58	5/14418	29.13214	6.9e-05
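The geneRatio and bgRatio columns appear as text of the form "shared/total". A small helper, here called ratio_to_numeric (a hypothetical name, not part of disgenet2r), converts such values to numbers for sorting or plotting. The two rows below are copied from Table 9.1, and the data frame is built by hand so that the sketch runs without a DISGENET query.

```r
# Convert "a/b" strings into numeric fractions.
ratio_to_numeric <- function(ratio) {
  parts <- strsplit(ratio, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Hand-built stand-in for a slice of the enrichment table.
enrich_tab <- data.frame(
  diseaseName = c("Mild intellectual disability", "Intellectual Disability"),
  geneRatio   = c("6/58", "47/58"),
  stringsAsFactors = FALSE
)

# Fraction of the user list annotated to each disease, largest first.
enrich_tab$geneFraction <- ratio_to_numeric(enrich_tab$geneRatio)
enrich_tab[order(-enrich_tab$geneFraction), ]
```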

To visualize the results of the enrichment, use the plot function. Use the argument cutoff to set the p-value threshold, and the argument limit to reduce the number of records shown (Figure 9.1). By default, limit = 50. The node size is proportional to the number of entities shared between the user list and the disease.

plot( results, type = "Enrichment", count =4,  cutoff= 0.05)

Figure 9.1: The Enrichment plot for a list of genes

9.2 For variants

Below is an example of a disease enrichment analysis with a list of variants extracted from the publication “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes” (Azaiez et al. 2018).

results <- disease_enrichment(
   entities  =  c("rs80338902","rs397516871","rs368341987","rs375050157","rs111033280","rs140884994","rs201076440","rs111033439","rs1296612982","rs41281314","rs397516875","rs143282422","rs142381713","rs35818432","rs111033225","rs200104362","rs201004645","rs34988750","rs373169422","rs397517356","rs188376296","rs199897298","rs200263980","rs200416912","rs184866544","rs397517344","rs41281310","rs727503066","rs727504710","rs143240767","rs145771342","rs376898963","rs397516878","rs181255269","rs188498736","rs111033192","rs117966637","rs914189193","rs181611778","rs111033194","rs111033248","rs111033262","rs111033333","rs111033529","rs146824138","rs483353055","rs528089082","rs747131589","rs111033536","rs45629132","rs371142158","rs727504654","rs192524347","rs527236122","rs111033186","rs111033287","rs139889944","rs200454015","rs397517328","rs111033275","rs150822759","rs200038092","rs201709513","rs370155266","rs45500891","rs111033196","rs111033360","rs397517322","rs111033524","rs727505166","rs79444516","rs35730265","rs45549044","rs111033361","rs370696868","rs727504309","rs533231493"),
    vocabulary = "DBSNP", database = "CURATED")

## Your query has 1 page.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       rs80338902 ... rs533231493

Table 9.2 shows the top diseases associated with the list of variants.

tab <- unique(results@qresult[  ,c("diseaseName",   "variantRatio", "bgRatio", "oddsRatio", "pvalue")] )
tab <- tab %>% arrange(pvalue)
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of variants")

Table 9.2: Diseases significantly associated with the list of variants
diseaseName	variantRatio	bgRatio	oddsRatio
Usher Syndrome, Type I	26/77	26/1647127	598.6453
USHER SYNDROME, TYPE IIA	23/77	23/1647127	416.2284
Deafness, Autosomal Recessive 1A	16/77	16/1647127	1670.3999
RETINITIS PIGMENTOSA 39	20/77	20/1647127	414.4859
DEAFNESS, AUTOSOMAL RECESSIVE 2	13/77	13/1647127	594.6684
Usher syndrome, type 1A	12/77	12/1647127	653.9667
RETINITIS PIGMENTOSA-DEAFNESS SYNDROME	12/77	12/1647127	649.9078
Usher Syndrome, Type III	12/77	12/1647127	576.0194
Usher Syndrome, Type II	12/77	12/1647127	526.7542
Deafness, Autosomal Dominant 3A	9/77	9/1647127	1927.4827

Figure 9.2 shows the results of the enrichment.

plot( results, type = "Enrichment", count =4,  cutoff= 0.05, nchars = 60)

Figure 9.2: The Enrichment plot for a list of variants
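Because the size of the input list is the denominator of variantRatio, it can be worth cleaning a long variant list before submitting it. The following sketch trims whitespace, normalizes case, removes duplicates, and keeps only strings that look like dbSNP identifiers. The input vector mixes a few identifiers from the example above with deliberately malformed entries, and the pattern "rs" followed by digits is an assumption about what the DBSNP vocabulary accepts.

```r
# Example input with a leading space, an upper-case prefix, a duplicate,
# and a genomic coordinate that is not a dbSNP identifier.
variants <- c("rs80338902", "rs397516871", " rs368341987",
              "RS375050157", "rs80338902", "chr7:140753336")

# Normalize, then drop exact duplicates.
cleaned <- unique(tolower(trimws(variants)))

# Keep only identifiers of the form "rs" followed by digits.
is_rsid  <- grepl("^rs[0-9]+$", cleaned)
valid    <- cleaned[is_rsid]
rejected <- cleaned[!is_rsid]

valid
rejected
```

The valid vector can then be passed as the entities argument of disease_enrichment with vocabulary = "DBSNP".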

10 Entity Attributes & Metadata

10.1 Gene attributes

The gene2attribute function retrieves information for a specific gene or a list of genes.

results <- gene2attribute( gene  = "3953", vocabulary = "ENTREZ"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene 
##  . Database:     ALL 
##  . Score:         
##  . Term:        3953

The result includes the Disease Specificity Index (DSI) and the Disease Pleiotropy Index (DPI) for the gene (Table 10.1).

tab <-results@qresult
knitr::kable(tab, caption = "Gene attributes for LEPR")

Table 10.1: Gene attributes for LEPR
description	geneid	gene_symbol	ensembl_ids	uniprotids	proteinClasses	ncbi_type	numDiseasesAssociatedToGene	numVariantsAssociatedToGene	numChemicals	numPublications	numCTs	firstRef	lastRef	geneDSI	geneDPI	genepLI
leptin receptor	3953	LEPR	ENSG00000116678	P48357	DTO_05007599, DTO , Signaling	protein-coding	626	157	51	1233	34	1966	2026	0.432	0.875	8.86e-05

10.2 Disease attributes & vocabulary mapping

The disease2attribute function retrieves information for a specific disease.

results <- disease2attribute( disease  = "UMLS_C0036341"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0036341 
##  . Results:  12

The results (Table 10.2) show the mappings to different disease vocabularies, and the disease type.

tab <- results@qresult  %>% arrange(desc(vocabulary)) %>% unique()
knitr::kable(tab, caption = "Disease attributes for Schizophrenia")

Table 10.2: Disease attributes for Schizophrenia
vocabulary	code	disease_name	type	diseaseClasses_UMLS_ST	diseaseClasses_HPO	diseaseClasses_DO	diseaseClasses_MSH
UMLS	C0036341	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
OMIM	181500	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
NCI	C3362	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
MSH	D012559	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
MONDO	0005090	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295.90	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295.9	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD10	F20	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD10	F20.9	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
HPO	HP:0100753	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
DO	5419	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)

10.2.1 Retrieving the UMLS CUIs via other vocabularies

It is possible to obtain the CUIs that map to an identifier of interest (for example, from ICD9CM, MSH, or OMIM) using the get_umls_from_vocabulary function.

results <- get_umls_from_vocabulary(
            disease  = "MSH_D012559",  vocabulary = "MSH" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        MSH_D012559 
##  . Results:  2

The results are shown in Table 10.3.

tab <-results@qresult
knitr::kable(tab, caption = "Retrieving the UMLS CUI from MeSH", row.names=F)

Table 10.3: Retrieving the UMLS CUI from MeSH
VOCABULARIES	code	disease_name
MSH	D012559	Schizophrenia
UMLS	C0036341	Schizophrenia
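The identifiers used in the queries of this document combine a vocabulary prefix and a code separated by an underscore (for example, UMLS_C0036341 or MSH_D012559). The sketch below builds such identifiers from a hand-made table with vocabulary and code columns, and splits them back into their parts. The data frame is a stand-in for a query result, and it is an assumption that every vocabulary follows the same prefix_code convention.

```r
# Hand-built mappings using codes shown in Table 10.2.
mappings <- data.frame(
  vocabulary = c("UMLS", "MSH", "OMIM"),
  code       = c("C0036341", "D012559", "181500"),
  stringsAsFactors = FALSE
)

# Build prefixed identifiers such as "MSH_D012559".
mappings$identifier <- paste(mappings$vocabulary, mappings$code, sep = "_")

# Split an identifier at its first underscore.
id_vocabulary <- sub("_.*$", "", mappings$identifier)
id_code       <- sub("^[^_]+_", "", mappings$identifier)

mappings
```

Splitting at the first underscore only keeps codes that themselves contain an underscore intact.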

10.3 Variant attributes

The variant2attribute function receives as input a variant or a list of variants, identified by dbSNP identifiers. It produces a DataGeNET.DGN object with attributes of the variant(s), such as the allelic frequency according to gnomAD data, the most severe consequence type from the Variant Effect Predictor, and the DPI and DSI.

results <- variant2attribute( variant= "rs113488022")

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant 
##  . Database:     ALL 
##  . Score:         
##  . Term:        rs113488022

The results are shown in Table 10.4.

tab <- unique(results@qresult )
tab <- tab %>% dplyr::select(-threeletterID, -oneletterID)
knitr::kable(tab, caption = "Attributes for variant rs113488022")

Table 10.4: Attributes for variant rs113488022
variantid	ref	alt	polyphen_score	chromosome	position	mostSevereConsequences	var_gene_symbol	geneid	geneEnsemblID	gene_symbol	numDiseasesAssociatedToVariant	numChemicals	numPublications	firstRef	lastRef	hgvsc	hgvsp	variantDSI	variantDPI	dbsnpclass	source	exome
rs113488022	A	C	0.958	7	140753336	missense_variant	BRAF	673	ENSG00000157764	BRAF	766	184	3949	1993	2026	ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A	ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu	0.353	0.045	snv
rs113488022	A	G	0.958	7	140753336	missense_variant	BRAF	673	ENSG00000157764	BRAF	766	184	3949	1993	2026	ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A	ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu	0.353	0.045	snv
rs113488022	A	T	0.958	7	140753336	missense_variant	BRAF	673	ENSG00000157764	BRAF	766	184	3949	1993	2026	ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A	ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu	0.353	0.045	snv	GNOMAD	1.4e-06

10.4 Chemical attributes

The chemical2attribute function retrieves information for a specific chemical or a list of chemicals.

results <- chemical2attribute( chemical  = "CHEMBL_CHEMBL25"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical 
##  . Database:     ALL 
##  . Score:         
##  . Term:         
##  . Results:  5

tab <- results@qresult %>% select(chemID, chemVocabulariesCrossreferences, chemPrefName)
knitr::kable(tab, caption = "Attributes for Acetylsalicylic acid")

Table 10.5: Attributes for Acetylsalicylic acid
chemID	chemVocabulariesCrossreferences	chemPrefName
CHEMBL25	CHEMBL_CHEMBL25	Acetylsalicylic acid
CHEMBL25	CHEBI_15365	Acetylsalicylic acid
CHEMBL25	DRUGBANK_DB00945	Acetylsalicylic acid
CHEMBL25	MESH_D001241	Acetylsalicylic acid
CHEMBL25	PUBCHEM_2244	Acetylsalicylic acid

11 Versions

11.1 Get DISGENET data version

get_disgenet_version()

## [1] "{ status : OK , payload :{ apiVersion : 1.9.5 , dataVersion : DISGENET v26.2 , lastUpdate : June 08 2026 , version : DISGENET v26.2 }, httpStatus :200}"
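For reproducibility it is useful to record the data version alongside analysis results. The sketch below pulls individual fields out of a version string like the one printed above. The helper extract_field is hypothetical, and the pattern tolerates optional double quotes around keys and values, since the exact formatting of the returned string is an assumption.

```r
# Version string as printed above.
version_string <- "{ status : OK , payload :{ apiVersion : 1.9.5 , dataVersion : DISGENET v26.2 , lastUpdate : June 08 2026 , version : DISGENET v26.2 }, httpStatus :200}"

# Extract the value that follows 'field :' up to the next comma or brace.
extract_field <- function(x, field) {
  pattern <- paste0(field, "\"?\\s*:\\s*\"?([^,}\"]+)")
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) < 2) return(NA_character_)
  trimws(m[2])
}

extract_field(version_string, "apiVersion")   # "1.9.5"
extract_field(version_string, "dataVersion")  # "DISGENET v26.2"
extract_field(version_string, "lastUpdate")   # "June 08 2026"
```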

11.2 disgenet2r version

## Version: 1.2.9

12 COPYRIGHT

13 License

disgenet2r is distributed under the GPL-2 license.

References

Azaiez, Hela, Kevin T. Booth, Sean S. Ephraim, Bradley Crone, Elizabeth A. Black-Ziegelbein, Robert J. Marini, A. Eliot Shearer, et al. 2018. “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes.” The American Journal of Human Genetics 103 (4): 484–97. https://doi.org/10.1016/j.ajhg.2018.08.006.

Gonzalez-Mantilla, Andrea J., Andres Moreno-De-Luca, David H. Ledbetter, and Christa Lese Martin. 2016. “A Cross-Disorder Method to Identify Novel Candidate Genes for Developmental Brain Disorders.” JAMA Psychiatry 73 (3): 275–83. https://doi.org/10.1001/jamapsychiatry.2015.2692.

MedBioInformatics Solutions. 2026. “Unlocking Biomedical Knowledge at Scale: Transforming Scientific Literature into Structured Intelligence.” White Paper. Rambla de Cataluña 14, 7, 1, Barcelona, Spain: MedBioInformatics Solutions. https://disgenet.com/publications/whitepapers/1627.

Piñero, Janet, Javier Corvi, Natalia Rykova, Anna Guillem, Amelia Martínez, Jaione Telleria Zufiaur, Ivo Rivetta, et al. 2026. “DISGENET: Accelerating Data-Driven Discovery in Disease Genomics and Therapeutic Development.” bioRxiv. https://doi.org/10.64898/2026.01.05.697749.

Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I Furlong. 2019. “The DisGeNET knowledge platform for disease genomics: 2019 update.” Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkz1021.

Piñero, Janet, Josep Saüch, Ferran Sanz, and Laura I. Furlong. 2021. “The DisGeNET Cytoscape App: Exploring and Visualizing Disease Genomics Data.” Computational and Structural Biotechnology Journal 19: 2960–67. https://doi.org/10.1016/j.csbj.2021.05.015.

Sánchez, David, and Montserrat Batet. 2011. “Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective.” Journal of Biomedical Informatics 44 (5): 749–59. https://doi.org/10.1016/j.jbi.2011.03.013.