disgenet2r: An R package to explore the molecular underpinnings of human diseases

Introduction

The disgenet2r package contains a set of functions to retrieve, visualize and expand DISGENET data (Piñero et al. 2021, 2019). DISGENET is a comprehensive discovery platform that integrates more than 30 million associations between genes, variants, and human diseases. The information in DISGENET has been extracted from expert-curated resources and from the literature using state-of-the-art text mining technologies (Table 1).

To use DISGENET and the disgenet2r package, you need to acquire a license. Please contact us at info@disgenet.com for license conditions and pricing.

Table 1: Sources of DISGENET data
Source_Name	Type_of_data	Description
CLINGEN	GDAs	The Clinical Genome Resource
ORPHANET	GDAs	The portal for rare diseases and orphan drugs
PSYGENET	GDAs	Psychiatric disorders Gene association NETwork
HPO	GDAs	Human Phenotype Ontology
MGD_HUMAN	GDAs	Mouse Genome Database, human data
MGD_MOUSE	GDAs	Mouse Genome Database, mouse data
RGD_HUMAN	GDAs	Rat Genome Database, human data
RGD_RAT	GDAs	Rat Genome Database, rat data
UNIPROT	GDAs/VDAs	The Universal Protein Resource
CLINVAR	GDAs/VDAs	ClinVar Database
GWASCAT	GDAs/VDAs	The NHGRI-EBI GWAS Catalog
PHEWASCAT	GDAs/VDAs	The PHEWAS Catalog
UK BIOBANK	GDAs/VDAs	UK Biobank GWAS data
FINNGEN	GDAs/VDAs	FinnGen data
TEXT MINING HUMAN	GDAs/VDAs	Data from text mining medline abstracts, human
TEXT MINING MODELS	GDAs	Data from text mining medline abstracts, animal models
CLINICAL TRIALS	GDAs	Data from ClinicalTrials.gov
CURATED	GDAs/VDAs	Human curated sources: ClinGen, UniProt, Orphanet, PsyGeNET, ClinVar, MGD Human, RGD Human
INFERRED	GDAs	Inferred data from the HPO and the GWAS and PHEWAS Catalogs, and from UK and FinnGen biobanks
MODELS	GDAs	Data from animal models: MGD MOUSE, RGD RAT, and TEXT MINING MODELS
ALL	GDAs/VDAs	All data sources

You can test DISGENET and the disgenet2r package by registering for a free trial account.

disgenet2r package usage limits

Trial account

Please note that the trial account enables you to test all the functions of the disgenet2r package, but queries to the DISGENET database have the following restrictions:

Only the top-30 results ordered by descending DISGENET score are returned (pagination is not supported).
Multiple-entity queries support at most 10 entities (genes, diseases, variants).
The access to DISGENET with a TRIAL account will expire after 7 days from the day of activation.

Other plans

There are limits in place for the disgenet2r package to ensure smooth performance for all users. These limits apply to academic, advanced, and premium users, mirroring the limits of the DISGENET REST API.

Here’s a breakdown of the limitations:

A maximum of 100 pages of results are returned.
Multiple-entity queries support at most 100 entities (genes, diseases, variants).

Important Note: The package will display a warning message if you exceed these limits.

Recommendations for Efficient Use:

To improve performance and avoid exceeding limits, consider querying with smaller batches of entities. You can also use DISGENET metrics and annotations to refine your search and reduce the number of returned results.
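
The batching recommendation can be sketched in base R. The helper name split_into_batches and the placeholder gene symbols are assumptions made for illustration; they are not part of disgenet2r.

```r
# Hypothetical helper (not part of disgenet2r): split a vector of entities
# into batches no larger than the documented limit of 100 entities per query.
split_into_batches <- function(entities, batch_size = 100) {
  stopifnot(batch_size >= 1)
  if (length(entities) == 0) return(list())
  # Assign each element a batch index: 1, 1, ..., 2, 2, ...
  unname(split(entities, ceiling(seq_along(entities) / batch_size)))
}

genes   <- paste0("GENE", 1:250)   # placeholder symbols, for illustration only
batches <- split_into_batches(genes)
lengths(batches)                   # size of each batch
```

Each element of batches could then be passed as the gene argument of a query function, and the resulting data frames combined afterwards.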

Installation and first run

The package disgenet2r is available through GitLab. The package requires an R version > 3.5.

Install disgenet2r by typing in R:

library(devtools)
install_gitlab("medbio/disgenet2r")

To load the package:

library(disgenet2r)

Once you have completed the registration process, go to your user profile and retrieve your API key.

After retrieving the API key from your user profile, run the lines below so the key is available for all the disgenet2r functions.

api_key <- "enter your API key here"

Sys.setenv(DISGENET_API_KEY= api_key)
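
Before running queries, it can be useful to confirm that the environment variable is actually set in the current R session. The sketch below uses only base R; the helper name has_disgenet_key is hypothetical and not part of disgenet2r.

```r
# Hypothetical helper (not part of disgenet2r): TRUE if the API key
# environment variable is set to a non-empty string in this session.
has_disgenet_key <- function() {
  nzchar(Sys.getenv("DISGENET_API_KEY", unset = ""))
}

if (!has_disgenet_key()) {
  message("DISGENET_API_KEY is not set; run Sys.setenv(DISGENET_API_KEY = api_key) first.")
}
```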

In the following document, we illustrate how to use the disgenet2r package through a series of examples.

Quick Start

The functions in the disgenet2r package take as input a single entity (gene, disease, variant, or chemical), a list of entities (up to 100), or a combination of them. In addition, they accept the following parameters:

score
A vector with two elements: 1) the initial value of the score, 2) the final value of the score. Default 0-1.
database
Name of the database that will be queried. Default CURATED. It can take the values: ‘CLINGEN’, ‘CLINVAR’, ‘ORPHANET’, ‘PSYGENET’, ‘UNIPROT’, ‘CURATED’, ‘HPO’, ‘GWASCAT’, ‘PHEWASCAT’, ‘UKBIOBANK’, ‘FINNGEN’, ‘INFERRED’, ‘MGD_HUMAN’, ‘MGD_MOUSE’, ‘RGD_HUMAN’, ‘RGD_RAT’, ‘TEXTMINING_MODELS’, ‘MODELS’, ‘TEXTMINING_HUMAN’, ‘CLINICALTRIALS’, and ‘ALL’.
n_pags
A number between 1 and 100 indicating the number of pages to retrieve from the results of the query. Default 100. If a number of pages larger than 100 is indicated, the function will stop.
verbose
By default FALSE. Change it to TRUE to enable real-time logging from the function.
order_by
By default score. Depending on the type of query, it can accept the following values: score, dsi, dpi, pli, pmYear, ei, yearInitial, yearFinal, numCTsupportingAssociation.

Below is an example of a query for the BRCA1 gene in ALL the data. Notice that this query has over 300 pages of results, so only the first 10,000 results are retrieved (100 pages, 100 results per page).

results <- gene2evidence( gene = "BRCA1", vocabulary = "HGNC", database = "ALL")

## Notice that your query has a maximum of 336 pages.
## By using the default n_pags (100), your query of 336 pages has been reduced to 100 pages.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        BRCA1 
##  . Results:  10000
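
The number of results above follows from the page cap. As a small arithmetic sketch, using the page count reported by the query and the page size stated in the text (the variable names are illustrative only):

```r
total_pages      <- 336   # pages reported for the BRCA1 query
n_pags           <- 100   # default page cap
results_per_page <- 100   # page size stated in the text

# Only the first n_pags pages are fetched when the query has more pages
pages_retrieved <- min(total_pages, n_pags)
max_results     <- pages_retrieved * results_per_page
max_results
```

In general this product is an upper bound, since the last page of a query may be only partially filled.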

Retrieving Gene-Disease Associations from DISGENET

Searching by gene

The gene2disease function retrieves the GDAs in DISGENET for a given gene, or for a list of genes. The gene(s) can be identified by either the NCBI gene identifier or the official Gene Symbol, and the type of identifier used must be specified using the parameter vocabulary. By default, vocabulary = "HGNC". To switch to Entrez NCBI Gene identifiers, set vocabulary to ENTREZ.

The source database is selected with the argument database. By default, all the functions in the disgenet2r package use CURATED as the source database, which includes GDAs from ClinGen, UniProt, Orphanet, PsyGeNET, ClinVar, MGD Human, and RGD Human (Table 1).

The results can be filtered using the DISGENET score. The argument score defines the score range for the search. It is entered as a vector whose first element is the lower bound of the score and whose second element is the upper bound. Both bounds are included in the range. By default, score=c(0,1).
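
To make the shape of the score argument concrete, the sketch below validates a candidate range and applies an inclusive filter to a few toy scores. The function is_valid_score_range is a hypothetical helper, not part of disgenet2r, and the package may perform its own checks differently.

```r
# Hypothetical validator (not part of disgenet2r): a score range is a numeric
# vector of length 2 with 0 <= lower <= upper <= 1.
is_valid_score_range <- function(score) {
  is.numeric(score) && length(score) == 2 && !anyNA(score) &&
    score[1] >= 0 && score[2] <= 1 && score[1] <= score[2]
}

is_valid_score_range(c(0.3, 1))   # a well-formed range
is_valid_score_range(c(1, 0.3))   # bounds in the wrong order

# Both bounds are included when filtering (toy scores, for illustration only)
scores <- c(0.1, 0.3, 0.85, 1)
kept   <- scores[scores >= 0.3 & scores <= 1]
kept
```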

In the example, the query for the Leptin Receptor (Gene Symbol LEPR, and Entrez NCBI Identifier 3953) is performed in the curated data in DISGENET.

results <- gene2disease( gene = 3953, vocabulary = "ENTREZ",
                       database = "CURATED")

The function gene2disease produces an object DataGeNET.DGN that contains the results of the query.

class(results)

## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"

Type the name of the object to display its attributes: the input parameters such as whether a single entity, or a list were searched (single or list), the type of entity (gene-disease), the selected database (CURATED), the score range used in the search (0-1), and the gene NCBI identifier (3953).

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        3953 
##  . Results:  68

To obtain the data frame with the results of the query:

tab <- results@qresult
head( tab, 3 )

##   gene_symbol geneid       ensemblid   geneNcbiType geneDSI geneDPI    genepLI
## 1        LEPR   3953 ENSG00000116678 protein-coding    0.42   0.875 8.8607e-05
## 2        LEPR   3953 ENSG00000116678 protein-coding    0.42   0.875 8.8607e-05
## 3        LEPR   3953 ENSG00000116678 protein-coding    0.42   0.875 8.8607e-05
##       uniprotids protein_classid protein_class_name
## 1 Q4G138, P48357    DTO_05007599          Signaling
## 2 P48357, Q4G138    DTO_05007599          Signaling
## 3 P48357, Q4G138    DTO_05007599          Signaling
##                               disease_name diseaseType diseaseUMLSCUI
## 1                                  Obesity     disease       C0028754
## 2 Diabetes Mellitus, Non-Insulin-Dependent     disease       C0011860
## 3                        Diabetes Mellitus     disease       C0011849
##                                                                            diseaseClasses_MSH
## 1 Pathological Conditions, Signs and Symptoms (C23), Nutritional and Metabolic Diseases (C18)
## 2                   Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)
## 3                   Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)
##       diseaseClasses_UMLS_ST
## 1 Disease or Syndrome (T047)
## 2 Disease or Syndrome (T047)
## 3 Disease or Syndrome (T047)
##                                        diseaseClasses_DO
## 1                        disease of metabolism (0014667)
## 2 genetic disease (630), disease of metabolism (0014667)
## 3 genetic disease (630), disease of metabolism (0014667)
##                                                                           diseaseClasses_HPO
## 1                                                                 Growth abnormality (01507)
## 2 Abnormality of the endocrine system (00818), Abnormality of metabolism/homeostasis (01939)
## 3 Abnormality of the endocrine system (00818), Abnormality of metabolism/homeostasis (01939)
##   numCTsupportingAssociation numPMIDs chemsIncludedInEvidenceBySource
## 1                         16       14                            NULL
## 2                          2        5                            NULL
## 3                          3        1                            NULL
##   numChemsIncludedInEvidences numPMIDSWithChemsIncludedInEvidences
## 1                          NA                                   NA
## 2                          NA                                   NA
## 3                          NA                                   NA
##   numberChemsFiltered numberPmidsWithChemsFiltered score yearInitial yearFinal
## 1                  NA                           NA   1.0        1986      2023
## 2                  NA                           NA   1.0        2010      2024
## 3                  NA                           NA   0.9        2003      2003
##   evidence_level evidence_index diseaseid
## 1             NA      0.8702595  C0028754
## 2             NA      0.9126984  C0011860
## 3             NA      0.8260870  C0011849

The same query can be performed using the Gene Symbol (LEPR) and the data source TEXTMINING_HUMAN. Notice how the number of diseases associated with the Leptin Receptor increases.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  401

The same query can be performed using the ENSEMBL gene identifier of the LEPR gene (ENSG00000116678) by setting the vocabulary to ENSEMBL.

results <- gene2disease( gene = "ENSG00000116678",
                        vocabulary = "ENSEMBL",
                       database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        ENSG00000116678 
##  . Results:  401
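
The three identifier types used above have recognizably different formats. As a convenience sketch, a small function can guess the vocabulary value from the identifier itself. The helper guess_gene_vocabulary and its format rules are assumptions for illustration; disgenet2r expects the vocabulary to be given explicitly.

```r
# Hypothetical helper (not part of disgenet2r): guess the vocabulary value
# from the format of a single gene identifier.
guess_gene_vocabulary <- function(gene) {
  gene <- as.character(gene)
  if (grepl("^[0-9]+$", gene)) {
    "ENTREZ"                      # all digits, e.g. 3953
  } else if (grepl("^ENSG[0-9]+$", gene)) {
    "ENSEMBL"                     # e.g. ENSG00000116678
  } else {
    "HGNC"                        # anything else is treated as a Gene Symbol
  }
}

vapply(c("3953", "ENSG00000116678", "LEPR"), guess_gene_vocabulary, character(1))
```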

Additionally, a minimum threshold for the score can be defined. In the example, a range of score=c(0.3,1) is used. Notice how the number of diseases associated with the Leptin Receptor drops when the score is restricted.

results <- gene2disease( gene = "LEPR",
                        vocabulary = "HGNC",
                       database = "ALL",
                       score =c(0.3,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.3-1 
##  . Term:        LEPR 
##  . Results:  92

Table 2 shows the top 10 diseases associated with the LEPR gene.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )
knitr::kable(tab[1:10,], caption = "Top diseases associated to LEPR" )

Table 2: Top diseases associated to LEPR
gene_symbol	disease_name	score	yearInitial	yearFinal
LEPR	Obesity	1.00	1966	2024
LEPR	Diabetes Mellitus, Non-Insulin-Dependent	1.00	1966	2024
LEPR	Diabetes Mellitus	0.90	1981	2023
LEPR	Hyperphagia	0.85	1986	2023
LEPR	Hyperinsulinism	0.85	1986	2022
LEPR	Hypertensive disease	0.85	1998	2022
LEPR	Morbid obesity	0.85	1995	2024
LEPR	Insulin Resistance	0.80	1999	2024
LEPR	Non-alcoholic Fatty Liver Disease	0.80	2006	2024
LEPR	Hyperglycemia	0.80	1986	2024

Visualizing the diseases associated to a single gene

The disgenet2r package offers two options to visualize the results of querying a single gene in DISGENET: a network showing the diseases associated to the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of the diseases associated to the gene (Gene-Disease Class Network). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Gene-Disease Network on a DataGeNET.DGN object (Figure 1). In the Gene-Disease Network the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter adjusts the size of the nodes, while the eprop parameter adjusts the width of the edges while keeping the proportionality to the score.

plot( results,
      type = "Network",
      prop = 20, eprop = 5, verbose = TRUE)

Figure 1: The Gene-Disease Network for the Leptin Receptor gene

Use interactive = TRUE to display an interactive plot (Figure 2).

plot( results,
      type = "Network",
       interactive = TRUE)

Figure 2: The interactive Gene-Disease Network for the Leptin Receptor gene

The results can also be visualized in a network in which diseases are grouped by MeSH Disease Class if the class argument is set to DiseaseClass (Gene-Disease Class Network, Figure 3). In the Gene-Disease Class Network, the node size is proportional to the fraction of diseases in the disease class, with respect to the total number of diseases with disease classes associated with the gene. In the example, the Leptin Receptor is associated mainly with Nutritional and Metabolic Diseases. There are 2 diseases in the example that do not have annotations to a MeSH disease class (shown as a warning).

plot( results,
      class = "DiseaseClass",
      interactive = TRUE, verbose = TRUE)

Figure 3: The Disease Class Network for the Leptin Receptor Gene
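
The fraction that drives the node size can be approximated by hand. The sketch below uses the MeSH class annotations of the three diseases shown earlier in the head of the curated LEPR results; it is a toy calculation under that assumption, not the package's internal code.

```r
# MeSH class annotations copied from the first three rows of the curated results
mesh <- c(
  "Pathological Conditions, Signs and Symptoms (C23), Nutritional and Metabolic Diseases (C18)",
  "Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)",
  "Endocrine System Diseases (C19), Nutritional and Metabolic Diseases (C18)"
)

# Extract the class codes in parentheses, e.g. "(C18)" becomes "C18"
matches <- regmatches(mesh, gregexpr("\\(C[0-9]+\\)", mesh))
codes   <- lapply(matches, function(x) gsub("[()]", "", x))

# Denominator: diseases that have at least one MeSH class annotation
n_annotated <- sum(lengths(codes) > 0)

# Fraction of annotated diseases that fall in each class
counts         <- table(unlist(codes, use.names = FALSE))
class_fraction <- setNames(as.vector(counts) / n_annotated, names(counts))
round(class_fraction, 2)
```

Because a disease can belong to several classes, the fractions do not need to sum to 1.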

Exploring the attributes of a gene

The gene2attribute function retrieves the information for a specific gene, or list of genes.

results <- gene2attribute( gene  = "3953", vocabulary = "ENTREZ"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene 
##  . Database:     ALL 
##  . Score:         
##  . Term:        3953

The result shows the Disease Specificity Index (DSI) and the Disease Pleiotropy Index (DPI) for the gene (Table 3).

tab <-results@qresult
knitr::kable(tab, caption = "Gene attributes for LEPR")

Table 3: Gene attributes for LEPR
description	geneid	gene_symbol	ensembl_ids	uniprotids	proteinClasses	ncbi_type	geneDSI	geneDPI	genepLI
leptin receptor	3953	LEPR	ENSG00000116678	Q4G138	DTO_05007599, DTO , Signaling	protein-coding	0.42	0.875	8.86e-05
leptin receptor	3953	LEPR	ENSG00000116678	P48357	DTO_05007599, DTO , Signaling	protein-coding	0.42	0.875	8.86e-05
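
Table 3 contains one row per UniProt identifier, so a gene with several identifiers appears more than once. If one row per gene is preferred, the identifiers can be collapsed. This is a base R sketch on a toy data frame that copies a few columns from Table 3; it is not a disgenet2r function.

```r
# Toy copy of selected columns from Table 3
attrs <- data.frame(
  geneid      = c(3953, 3953),
  gene_symbol = c("LEPR", "LEPR"),
  uniprotids  = c("Q4G138", "P48357")
)

# Collapse the UniProt identifiers into a single string per gene
collapsed <- aggregate(uniprotids ~ geneid + gene_symbol, data = attrs,
                       FUN = paste, collapse = ", ")
collapsed
```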

Exploring the evidences associated to a gene

You can extract the evidences associated to a particular gene using the function gene2evidence. Additionally, you can explore the evidences for a specific gene-disease pair by specifying the disease identifier using the argument disease.

results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
                                        disease ="UMLS_C3554225", database = "ALL")

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  18

The results are shown in Table 4.

library(dplyr)  # provides %>%, filter(), select(), arrange(), and desc()

tab <- results@qresult
tab <-  tab %>%
  filter(reference_type == "PMID") %>%
  select(reference, associationType, pmYear, sentence) %>% arrange(desc(pmYear))

tab <- tab %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(  pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) ) ) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY" )

Table 4: Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY
pmid	associationType	Year	Sentence
37140700	GeneticVariation	2023	In conclusion, we reported ten new patients with leptin and leptin receptor deficiencies and identified six novel LEPR variants expanding the mutational spectrum of this rare disorder.
33922961	GeneticVariation	2021	Recently, we discovered a spontaneous compound heterozygous mutation within the leptin receptor, resulting in a considerably more obese phenotype than described for the homozygous leptin receptor deficient mice.
29158088	AlteredExpression	2018	In this study, we demonstrate that leptin receptor activation directly affects iron metabolism by the finding that serum levels of hepcidin, the master regulator of iron in the whole body, were significantly lower in leptin-deficient (ob/ob) and leptin receptor-deficient (db/db) mice.
25751111	GeneticVariation	2015	Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect.
24611737	CausalMutation	2014	Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population.
22810975	GeneticVariation	2012	Variants in the LEPR gene are nominally associated with higher BMI and lower 24-h energy expenditure in Pima Indians.
18703626	CausalMutation	2008	Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor.
17229951	CausalMutation	2007	Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
16284652	CausalMutation	2005	Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes.
12646666	GeneticVariation	2003	Binge eating as a major phenotype of melanocortin 4 receptor gene mutations.
12031989	AlteredExpression	2002	These data demonstrate that leptin is not needed for ObR gene expression, and they suggest that leptin plays a role in receptor downregulation because sObR levels are negatively correlated with leptin levels and BMI in control subjects, whereas sObR levels are not depressed in obese leptin-deficient or leptin receptor-deficient individuals.
9860295	GeneticVariation	1998	Transmission disequilibrium and sequence variants at the leptin receptor gene in extremely obese German children and adolescents.
9537324	CausalMutation	1998	A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9537324	GeneticVariation	1998	A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9144432	GeneticVariation	1997	Amino acid variants in the human leptin receptor: lack of association to juvenile onset obesity.
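
Table 4 has one row per combination of publication and association type, so the same PMID can appear more than once (for example, 9537324). The sketch below re-enters the pmid and associationType columns of Table 4 as a toy data frame and summarizes them with base R.

```r
# pmid and associationType columns copied from Table 4
evidence <- data.frame(
  pmid = c("37140700", "33922961", "29158088", "25751111", "24611737",
           "22810975", "18703626", "17229951", "16284652", "12646666",
           "12031989", "9860295", "9537324", "9537324", "9144432"),
  associationType = c("GeneticVariation", "GeneticVariation", "AlteredExpression",
                      "GeneticVariation", "CausalMutation", "GeneticVariation",
                      "CausalMutation", "CausalMutation", "CausalMutation",
                      "GeneticVariation", "AlteredExpression", "GeneticVariation",
                      "CausalMutation", "GeneticVariation", "GeneticVariation")
)

# Number of evidence rows per association type
type_counts <- table(evidence$associationType)
type_counts

# Number of distinct publications behind those rows
n_publications <- length(unique(evidence$pmid))
n_publications
```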

To visualize the results when there are many evidences, we suggest plotting them with the argument type set to Points (Figure 4). It is important to set the parameter limit to 10,000 in order to include all the evidences in the plot.

results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
                        database = "ALL", score=c(0.7,1) )
plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 4: The Evidences plot for the Leptin Receptor gene

Searching multiple genes

The gene2disease function can also receive as input a list of genes, either as Entrez NCBI Gene Identifiers or Gene Symbols. In the example, we show how to create a vector with the Gene Symbols of several genes belonging to the family of voltage-gated potassium channels (Table 5) and then, we apply the function gene2disease.

Table 5: Example of voltage-gated potassium channel family members
Name	Description
KCNE1	potassium channel, voltage gated subfamily E regulatory beta subunit 1
KCNE2	potassium channel, voltage gated subfamily E regulatory beta subunit 2
KCNH1	potassium channel, voltage gated eag related subfamily H, member 1
KCNH2	potassium channel, voltage gated eag related subfamily H, member 2
KCNG1	potassium voltage-gated channel modifier subfamily G member 1

Create the vector with the list of genes belonging to the voltage-gated potassium channel family:

myListOfGenes <- c( "KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

The gene2disease function also requires the user to specify the source database using the argument database, and optionally, the DISGENET score can also be applied to filter the results.

results <- gene2disease(
  gene     = myListOfGenes,
  database = "ALL",
  score    = c(0.5, 1),
  verbose  = TRUE
)

## Your query has 1 page.

## Warning in gene2disease(gene = myListOfGenes, database = "ALL", score = c(0.5, : 
##  One or more of the genes in the list is not in DISGENET ( 'ALL' ):
##    - KCNG1

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:       KCNE1 ... KCNH2 
##  . Results:  43
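
The warning above names the gene that returned no results. The same check can be sketched manually by comparing the input list with the gene symbols present in the results. The vector returned_symbols is a toy stand-in for results@qresult$gene_symbol, with values chosen only to be consistent with the warning.

```r
myListOfGenes <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

# Toy stand-in for results@qresult$gene_symbol (illustrative values only)
returned_symbols <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNH2")

# Genes that were queried but are absent from the results
missing_genes <- setdiff(myListOfGenes, unique(returned_symbols))
missing_genes
```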

Table 6 shows the top 10 GDAs for the list of genes belonging to the voltage-gated potassium channel family.

tab <- results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")]  %>% unique()  %>%
  arrange(desc(score), yearInitial)

knitr::kable(tab[1:10,], caption = "Top GDAs for the list of genes belonging to the voltage-gated potassium channel family")

Table 6: Top GDAs for the list of genes belonging to the voltage-gated potassium channel family
gene_symbol	disease_name	score	yearInitial	yearFinal
KCNH2	Long QT Syndrome	1.00	1970	2024
KCNH2	Cardiac Arrhythmia	1.00	1975	2024
KCNE1	Jervell-Lange Nielsen Syndrome	1.00	1993	2024
KCNE2	Long QT Syndrome	1.00	1999	2024
KCNH2	Long Qt Syndrome 2	0.95	1986	2024
KCNE2	Cardiac Arrhythmia	0.90	1999	2024
KCNH2	Sudden Cardiac Death	0.90	2000	2024
KCNE1	Long QT Syndrome	0.90	1975	2024
KCNE1	LONG QT SYNDROME 5	0.90	1991	2024
KCNH2	Short QT Syndrome 1	0.90	1999	2022

Visualizing the diseases associated to multiple genes

By default, plotting a DataGeNET.DGN object resulting from a query with a list of genes produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5).

plot( results,
      type = "Network",
      prop = 10, verbose = TRUE)

Figure 5: The Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Set the argument interactive = TRUE to see an interactive network (Figure 6).

plot( results,
      type = "Network",
      prop = 10,  interactive=TRUE)

Figure 6: The interactive Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Setting the argument type to Heatmap produces a Gene-Disease Heatmap (Figure 7), where the scale of colors is proportional to the score of the GDA. The argument limit can be used to limit the number of rows to the top scoring GDAs. The argument nchars can be used to limit the length of the name of the disease. By default, the plot shows the 50 highest scoring GDAs.

plot( results,
      type  ="Heatmap",
      limit = 100, nchars = 60, interactive = TRUE, verbose = TRUE)

Figure 7: The Gene-Disease Heatmap for a list of genes belonging to the voltage-gated potassium channel family
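
The data behind such a heatmap is a disease-by-gene matrix of scores. The sketch below builds one in base R from six rows copied from Table 6; it illustrates the data layout only and is not how the plot function is implemented.

```r
# Six GDAs copied from Table 6
gda <- data.frame(
  gene_symbol  = c("KCNH2", "KCNH2", "KCNE1", "KCNE2", "KCNE2", "KCNE1"),
  disease_name = c("Long QT Syndrome", "Cardiac Arrhythmia",
                   "Jervell-Lange Nielsen Syndrome", "Long QT Syndrome",
                   "Cardiac Arrhythmia", "Long QT Syndrome"),
  score        = c(1.00, 1.00, 1.00, 1.00, 0.90, 0.90)
)

# Rows are diseases, columns are genes; pairs without a GDA are NA
score_matrix <- with(gda, tapply(score, list(disease_name, gene_symbol), max))
score_matrix
```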

These results can also be visualized as a Gene-Disease Class Heatmap by setting the argument type to Heatmap and class to DiseaseClass (Figure 8). In this case, diseases are grouped by their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, genes are associated mainly with Cardiovascular Diseases, and with Congenital, Hereditary, and Neonatal Diseases and Abnormalities.

plot( results, type="Heatmap",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 8: The Gene-Disease Class Heatmap for a list of genes belonging to the voltage-gated potassium channel family

Alternatively, set the argument type to Network and the argument class to DiseaseClass to generate a Gene-Disease Class Network (Figure 9).

plot( results, type="Network",
      class = "DiseaseClass", nchars = 60, interactive = TRUE)

Figure 9: The Gene-Disease Class Network for a list of genes belonging to the voltage-gated potassium channel family

Exploring the evidences associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = myListOfGenes, 
                       database = "TEXTMINING_HUMAN", verbose  = TRUE)

## Your query has 23 pages.

To visualize the results, set the argument type to Points (Figure 10).

plot(results, type = "Points", interactive = TRUE, limit = 10000)

Figure 10: The Evidences plot for a list of genes belonging to the voltage-gated potassium channel family

Exploring the Clinical trials associated to a list of genes

First, create the object gene-evidence using the gene2evidence function.

results <- gene2evidence(gene     = c("IL3", "IL4", "IL5", "IL6", "IL0"), 
                       database = "CLINICALTRIALS", verbose  = TRUE )

## Your query has 160 pages.
## Notice that your query has a maximum of 160 pages.
## By using the default n_pags (100), your query of 160 pages has been reduced to 100 pages.

## Warning in gene2evidence(gene = c("IL3", "IL4", "IL5", "IL6", "IL0"), database = "CLINICALTRIALS", : 
##  One or more of the genes in the list is not in DISGENET ('CLINICALTRIALS'): IL0

To visualize the results, set the argument type to Points.

# plot(results, type="Points",   interactive=T, limit=10000)

Searching by gene and chemical

You can restrict GDAs to those annotated with a given chemical by passing a chemical identifier to the chemical argument of the gene2disease function. Table 7 shows the GDAs for LEPR that are annotated with metformin.

results <- gene2disease( gene = "LEPR", vocabulary = "HGNC",
                       database = "TEXTMINING_HUMAN", 
                       chemical = "CHEMBL_CHEMBL1431" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        LEPR 
##  . Results:  4

tab <- results@qresult
tab <-tab%>% dplyr::select(chemical_name, gene_symbol, disease_name,  score)
knitr::kable(tab, caption = "GDAs for LEPR and metformin")

Table 7: GDAs for LEPR and metformin
chemical_name	gene_symbol	disease_name	score
Metformin	LEPR	Polycystic Ovary Syndrome	0.45
Metformin	LEPR	Steatohepatitis	0.35
Metformin	LEPR	Schizophrenia	0.20
Metformin	LEPR	Pulmonary arterial hypertension	0.10

Retrieving the chemicals associated to a gene

For GDAs that have a chemical annotation, we can perform a query with a gene, or list of genes, to retrieve the chemicals annotated to these associations.

results <- gene2chemical( gene  = "PDGFRA", 
                        vocabulary = "HGNC",
                          database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.8-1 
##  . Term:        PDGFRA 
##  . Results:  28

tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type == "PMID") %>%   dplyr::select(disease_name, chemical_name, chemical_effect,sentence, 
                           reference, pmYear)
tab <- tab %>% dplyr::rename(  Disease = disease_name, 
                             Chemical = chemical_name, `Chemical effect` =  chemical_effect,
                             Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid )  )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Selection of chemicals associated to PDGFRA" )

Table 8: Selection of chemicals associated to PDGFRA
Disease	Chemical	Sentence	pmid	Year
Gastrointestinal Stromal Tumors	Myricetin	In addition to mutations in KIT and PDGFRA, many other genetic alterations have been described in gastrointestinal stromal tumors (GISTs), including amplifications of C-MYC and EGFR, which are often associated with increased protein expression.	39636317	2025
Gastrointestinal Stromal Tumors	ADENOSINE DIPHOSPHATE RIBOSE	Representative candidate drugs for genome-matched therapies in KIT/PDGFRA-mutated and wild-type GISTs were as follows: pembrolizumab for tumor mutation burden-high in one and two patients, respectively; poly-adenosine diphosphate ribose polymerase inhibitors for alterations related to homologous recombination deficiency in 12 and one patient, respectively; NTRK inhibitor for ETV6-NTRK3 fusion in one with KIT/PDGFRA wild-type GIST; and human epidermal growth factor receptor 2-antibody-drug conjugate in one with KIT/PDGFRA-mutated GIST.	39447098	2024
Gastrointestinal Stromal Tumors	Pembrolizumab	Representative candidate drugs for genome-matched therapies in KIT/PDGFRA-mutated and wild-type GISTs were as follows: pembrolizumab for tumor mutation burden-high in one and two patients, respectively; poly-adenosine diphosphate ribose polymerase inhibitors for alterations related to homologous recombination deficiency in 12 and one patient, respectively; NTRK inhibitor for ETV6-NTRK3 fusion in one with KIT/PDGFRA wild-type GIST; and human epidermal growth factor receptor 2-antibody-drug conjugate in one with KIT/PDGFRA-mutated GIST.	39447098	2024
Gastrointestinal Stromal Tumors	Ripretinib	Ripretinib, a broad-spectrum inhibitor of the KIT and PDGFRA receptor tyrosine kinases, is designated as a fourth-line treatment for gastrointestinal stromal tumor (GIST).	38973363	2024
Gastrointestinal Stromal Tumors	Avapritinib	The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively.	38756640	2024
Gastrointestinal Stromal Tumors	Imatinib	The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively.	38756640	2024
Gastrointestinal Stromal Tumors	Sorafenib	Low Dose Sorafenib in Gastric Gastrointestinal Stromal Tumour with PDGFRA p.1843-D846 Deletion in an 88-Year-Old Male.	38576303	2024
Gastrointestinal Stromal Tumors	Avapritinib	Avapritinib is the only drug for adult patients with PDGFRA exon 18 mutated unresectable or metastatic gastrointestinal stromal tumor (GIST).	38803186	2024
Gastrointestinal Stromal Tumors	Imatinib	We report two cases of rare GISTs in the same family: A male patient with the V561D mutation in exon 12 of the PDGFRA gene, who has been taking the targeted drug imatinib since undergoing surgery, and a female patient diagnosed with wild-type GIST, who has been taking imatinib for 3 years since undergoing surgery.	39350996	2024
Gastrointestinal Stromal Tumors	Avapritinib	Avapritinib is the only potent and selective inhibitor approved for the treatment of D842V-mutant gastrointestinal stromal tumors (GIST), the most common primary mutation of the platelet-derived growth factor receptor α (PDGFRA).	38167404	2024

To visualize the results use the plot function.

plot(results, type = "Network", interactive = TRUE, limit = 10000)

Figure 11: The Gene-Chemical Network for PDGFRA

Searching by disease

The disease2gene function retrieves the genes associated with a disease or a list of diseases. It takes as input the disease or diseases of interest and the database (by default, CURATED). Each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO, and ID is the identifier in that vocabulary. A score threshold can also be set, as in the gene2disease function.
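
The identifier format can be illustrated with a few lines of base R. This is only a sketch: make_disease_id is a hypothetical helper, not a disgenet2r function, and it assumes that the vocabulary prefixes listed above are the only accepted ones.

```r
# Hypothetical helper, not part of disgenet2r: builds IDENT_ID strings.
valid_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                        "EFO", "NCI", "HPO", "MONDO", "ORDO")

make_disease_id <- function(vocabulary, id) {
  vocabulary <- toupper(vocabulary)
  # Reject prefixes that are not in the list of supported vocabularies.
  unknown <- setdiff(vocabulary, valid_vocabularies)
  if (length(unknown) > 0) {
    stop("Unsupported vocabulary: ", paste(unknown, collapse = ", "))
  }
  paste0(vocabulary, "_", id)
}

ids <- make_disease_id(c("UMLS", "MESH"), c("C0036341", "D012559"))
print(ids)
```

The resulting character vector could then be passed to the disease argument of disease2gene.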

In the example, we use the disease2gene function to retrieve the genes associated with the UMLS CUI C0036341 (Schizophrenia), searching the CURATED database with a score range from 0.8 to 1.

results <- disease2gene( disease  = "UMLS_C0036341",
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0036341 
##  . Results:  137

Table 9 shows the top 10 genes associated with UMLS CUI C0036341.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>%
  arrange(desc(score), yearInitial)
knitr::kable(tab[1:10,], caption = "Top 10 genes associated to Schizophrenia")

Table 9: Top 10 genes associated to Schizophrenia
gene_symbol	disease_name	score	yearInitial	yearFinal
DRD3	Schizophrenia	1	1999	1999
DRD2	Schizophrenia	1	2000	2011
HTR2A	Schizophrenia	1	2004	2008
RTN4R	Schizophrenia	1	2004	2017
COMT	Schizophrenia	1	2005	2010
MTHFR	Schizophrenia	1	2006	2009
TNF	Schizophrenia	1	2006	2006
GRIN2B	Schizophrenia	1	2008	2008
ZNF804A	Schizophrenia	1	2008	2018
CHRFAM7A	Schizophrenia	1	2009	2009
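
Because many genes share the maximum score, the order in Table 9 is decided by the tie-breaker (the earliest publication year). The sketch below reproduces that ordering in base R on a toy data frame. The rows are copied from Table 9, shuffled, with one duplicate added for illustration.

```r
# Toy rows copied from Table 9, deliberately shuffled and with one duplicate.
toy <- data.frame(
  gene_symbol = c("COMT", "DRD2", "DRD3", "DRD2", "HTR2A"),
  score       = c(1, 1, 1, 1, 1),
  yearInitial = c(2005, 2000, 1999, 2000, 2004),
  stringsAsFactors = FALSE
)

toy <- unique(toy)                                # drop the duplicated DRD2 row
toy <- toy[order(-toy$score, toy$yearInitial), ]  # score descending, then oldest first
rownames(toy) <- NULL
print(toy$gene_symbol)
```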

Visualizing the genes associated to a single disease

There are two options to visualize the results of searching for a single disease: a Gene-Disease Network showing the genes related to the disease of interest (Figure 12), and a Disease-Protein Class Network with the genes grouped by the Drug Target Ontology Protein Class (Figure 13).

Figure 12 shows the default Gene-Disease Network for Schizophrenia. As in the case of the gene2disease function, the blue node is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.

plot ( results,
       prop = 10, interactive=TRUE)

Figure 12: The Gene-Disease Network for genes associated to Schizophrenia

Alternatively, in the Disease-Protein Class Network, genes are grouped by the Drug Target Ontology Protein Class (Figure 13). This is a better choice when a large number of genes is associated with the disease. This plot is obtained by setting the class argument to "ProteinClass". The resulting network shows the disease in blue and the Protein Classes of the associated genes in green. The node size is proportional to the number of genes in the Protein Class. In the example, the largest proportion of the genes associated with Schizophrenia are G-protein coupled receptors. Notice again that not all genes are annotated with a Protein Class.

plot( results,
      class="ProteinClass",
      interactive=TRUE)

Figure 13: The Protein Class-Disease Network for genes associated to Schizophrenia

The same results are obtained when querying DISGENET with the MeSH identifier for Schizophrenia (D012559).

results <- disease2gene( disease  = "MESH_D012559",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        MESH_D012559 
##  . Results:  137

The same results are obtained when querying DISGENET with the OMIM identifier for Schizophrenia (181500).

results <- disease2gene( disease  = "OMIM_181500",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        OMIM_181500 
##  . Results:  137

The same results are obtained when querying DISGENET with the ICD9-CM identifier for Schizophrenia (295).

results <- disease2gene( disease  = "ICD9CM_295",  
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        ICD9CM_295 
##  . Results:  137

The same results are obtained when querying DISGENET with the NCI identifier for Schizophrenia (C3362).

results <- disease2gene( disease  = "NCI_C3362", 
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        NCI_C3362 
##  . Results:  137

The same results are obtained when querying DISGENET with the HPO identifier for Schizophrenia (HP:0100753).

results <- disease2gene( disease  = "HPO_HP:0100753", 
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        HPO_HP:0100753 
##  . Results:  137
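
The equivalence of these queries can also be checked programmatically. The sketch below compares gene sets across result tables. same_gene_sets is a hypothetical helper, and the tables are small stand-ins, because running the real queries requires a valid API key.

```r
# Identifiers used in the queries above; all are expected to resolve to Schizophrenia.
schizophrenia_ids <- c("UMLS_C0036341", "MESH_D012559", "OMIM_181500",
                       "ICD9CM_295", "NCI_C3362", "HPO_HP:0100753")

# Hypothetical helper: TRUE when every data frame in the list has the same gene set.
same_gene_sets <- function(result_tables) {
  reference <- sort(unique(result_tables[[1]]$gene_symbol))
  all(vapply(result_tables,
             function(tab) identical(sort(unique(tab$gene_symbol)), reference),
             logical(1)))
}

# Stand-in tables. With a valid API key they could be built with, for example:
# lapply(schizophrenia_ids, function(id)
#   disease2gene(disease = id, database = "CURATED", score = c(0.8, 1))@qresult)
mock <- list(
  data.frame(gene_symbol = c("DRD2", "DRD3", "COMT"), stringsAsFactors = FALSE),
  data.frame(gene_symbol = c("COMT", "DRD3", "DRD2", "DRD2"), stringsAsFactors = FALSE)
)
print(same_gene_sets(mock))
```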

Searching by disease and chemical

You can filter the results to find associations that are mentioned in the context of a chemical, like the example below.

results <- disease2gene( disease  = "UMLS_C0006142", chemical = "CHEMBL_CHEMBL83",
                          database = "ALL" , n_pags = 1 )

## Notice that your query has a maximum of 9 pages.
## By indicating n_pags = 1, your query of 9 pages has been reduced to 1 pages.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        UMLS_C0006142 
##  . Results:  107

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "chemical_name", "chemicalid")] )%>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to breast cancer")

Table 10: Top GDAs associated to breast cancer
gene_symbol	disease_name	score	chemical_name	chemicalid
BARD1	Malignant neoplasm of breast	1	Tamoxifen	C-286314
BRCA1	Malignant neoplasm of breast	1	Tamoxifen	C-286314
BRCA2	Malignant neoplasm of breast	1	Tamoxifen	C-286314
CDH1	Malignant neoplasm of breast	1	Tamoxifen	C-286314
ESR1	Malignant neoplasm of breast	1	Tamoxifen	C-286314
ESR1	Malignant neoplasm of breast	1	Pamidronic acid	C-578377
ESR1	Malignant neoplasm of breast	1	BENZOQUINONE	C-88223
ESR1	Malignant neoplasm of breast	1	Pterostilbene	C-96644
FGFR2	Malignant neoplasm of breast	1	Tamoxifen	C-286314
PIK3CA	Malignant neoplasm of breast	1	Tamoxifen	C-286314

Retrieving the chemicals associated to a disease

For GDAs that have a chemical annotation, we can query with a disease, or a list of diseases, to retrieve the chemicals annotated to these associations.

results <- disease2chemical( disease = "UMLS_C0010674", 
                           database = "TEXTMINING_MODELS" , score = c(0.8,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_MODELS 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0010674 
##  . Results:  49

tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(gene_symbol, chemical_name,chemical_effect ,sentence, reference, pmYear) 
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                          `Chemical Effect`=chemical_effect ,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top chemicals associated to Cystic Fibrosis" )

Table 11: Top chemicals associated to Cystic Fibrosis
Gene	Chemical	Sentence	pmid	Year
CFTR	Phenobarbital	These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion.	38869953	2024
CFTR	BICARBONATE	These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion.	38869953	2024
CFTR	Linaclotide	These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion.	38869953	2024
CFTR	Iggsorb	Cystic fibrosis (CF) is a genetic disease caused by variants in the gene encoding for the CF transmembrane conductance regulator (CFTR) protein, a chloride and bicarbonate channel.	39322262	2024
CFTR	BICARBONATE	Cystic fibrosis (CF) is a genetic disease caused by variants in the gene encoding for the CF transmembrane conductance regulator (CFTR) protein, a chloride and bicarbonate channel.	39322262	2024
CFTR	Chloride ion	Cystic fibrosis (CF) is a genetic disease caused by variants in the gene encoding for the CF transmembrane conductance regulator (CFTR) protein, a chloride and bicarbonate channel.	39322262	2024
CFTR	Chloride ion	Cystic fibrosis (CF) is a genetic disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) that controls chloride current.	38347907	2024
CFTR	Ivacaftor	While numerous animal models of CF exist, few have a CFTR mutation that is amenable to the triple combination therapy elexacaftor-tezacaftor-ivacaftor (ETI).	38545546	2024
CFTR	Elexacaftor	While numerous animal models of CF exist, few have a CFTR mutation that is amenable to the triple combination therapy elexacaftor-tezacaftor-ivacaftor (ETI).	38545546	2024
CFTR	Tezacaftor	While numerous animal models of CF exist, few have a CFTR mutation that is amenable to the triple combination therapy elexacaftor-tezacaftor-ivacaftor (ETI).	38545546	2024
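
Table 11 has one row per chemical and sentence, so the same chemical can appear several times. One way to summarize such a table is to count the distinct publications that mention each chemical. The sketch below does this in base R on a toy data frame whose rows are a subset of the chemical and pmid columns of Table 11. The column names follow the renamed table and are otherwise an assumption.

```r
# Toy subset of Table 11 (Chemical and pmid columns only).
toy <- data.frame(
  Chemical = c("BICARBONATE", "Linaclotide", "BICARBONATE",
               "Chloride ion", "Chloride ion", "Ivacaftor"),
  pmid     = c("38869953", "38869953", "39322262",
               "39322262", "38347907", "38545546"),
  stringsAsFactors = FALSE
)

# Number of distinct publications per chemical, as a named integer vector.
n_pmids <- vapply(split(toy$pmid, toy$Chemical),
                  function(x) length(unique(x)),
                  integer(1))
n_pmids <- sort(n_pmids, decreasing = TRUE)
print(n_pmids)
```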

To visualize the results use the plot function.

plot(results, type = "Network", interactive = TRUE, limit = 50)

Figure 14: The Disease-Chemical Network associated to Cystic Fibrosis

Exploring the attributes of a disease

The disease2attribute function retrieves the information available for a specific disease.

results <- disease2attribute( disease  = "UMLS_C0036341"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0036341 
##  . Results:  12

The results (Table 12) show the mappings to different disease vocabularies, and the disease type.

tab <- unique(results@qresult )
knitr::kable(tab[1:10,], caption = "Disease attributes for Schizophrenia")

Table 12: Disease attributes for Schizophrenia
vocabulary	code	disease_name	type	diseaseClasses_UMLS_ST	diseaseClasses_HPO	diseaseClasses_DO	diseaseClasses_MSH
MSH	D012559	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD10	F20	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD10	F20.9	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
OMIM	181500	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295.90	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
HPO	HP:0100753	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
NCI	C3362	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295.9	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
ICD9CM	295	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
DO	5419	Schizophrenia	disease	Mental or Behavioral Dysfunction (T048)	Abnormality of the nervous system (00707)	disease of mental health (150)	Mental Disorders (F03)
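
The vocabulary and code columns of Table 12 can be turned into identifiers for further queries. The sketch below assumes that only the MeSH prefix differs between the table ("MSH") and the disease2gene examples ("MESH"), as seen in this document. Other vocabularies may need their own mapping.

```r
# Two columns copied from Table 12, restricted to identifiers queried earlier.
attrs <- data.frame(
  vocabulary = c("MSH", "OMIM", "ICD9CM", "HPO", "NCI"),
  code       = c("D012559", "181500", "295", "HP:0100753", "C3362"),
  stringsAsFactors = FALSE
)

# Assumption: "MSH" in the table corresponds to the "MESH" prefix used by
# disease2gene; every other vocabulary name is reused unchanged.
prefix <- ifelse(attrs$vocabulary == "MSH", "MESH", attrs$vocabulary)
attrs$query_id <- paste0(prefix, "_", attrs$code)
print(attrs$query_id)
```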

Retrieving the UMLS CUIs via other vocabularies

It is possible to obtain the CUIs that map to an identifier of interest (for example, from ICD9CM, MSH, or OMIM) using the get_umls_from_vocabulary function.

results <- get_umls_from_vocabulary(
            disease  = "MSH_D012559",  vocabulary = "MSH" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        MSH_D012559 
##  . Results:  2

The results are shown in Table 13.

tab <-results@qresult
knitr::kable(tab, caption = "Retrieving the UMLS CUI from MeSH", row.names=F)

Table 13: Retrieving the UMLS CUI from MeSH
VOCABULARIES	code	disease_name
MSH	D012559	Schizophrenia
UMLS	C0036341	Schizophrenia

Finding the CUI associated to the name of a disease of interest

It is possible to obtain the CUIs that correspond to one or more diseases of interest using the get_umls_from_vocabulary function. To do so, set the parameter vocabulary = "NAME". Use the parameter limit to change the number of CUIs that are retrieved.

results <- get_umls_from_vocabulary(
  disease  = "long QT",  vocabulary = "NAME" ,  limit =10)
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease 
##  . Database:     ALL 
##  . Score:         
##  . Term:        long QT 
##  . Results:  10

The results are shown in Table 14.

tab <-results@qresult
knitr::kable(tab, caption = "List of CUIs that map to long QT", row.names = F)

Table 14: List of CUIs that map to long QT
VOCABULARIES	code	disease_name
UMLS	C1141890	Familial long QT syndrome (disorder)
UMLS	C2678485	Long Qt Syndrome 9
UMLS	C1832916	Timothy syndrome
UMLS	C1867904	LONG QT SYNDROME 5
UMLS	C1859062	LONG QT SYNDROME 3
UMLS	C2732979	Acquired long QT syndrome (disorder)
UMLS	C0023976	Long QT Syndrome
UMLS	C0152154	Prolonged labor
UMLS	C1833154	Long Qt Syndrome 4
UMLS	C5687394	Long QT syndrome type 6
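
A name search can return entries that are not the concept of interest (for example, "Prolonged labor" in Table 14). A simple post-filter on the disease name is sketched below with a toy subset of Table 14. Note that a plain string match also drops related entries whose name does not contain the search term, such as Timothy syndrome, so the results should still be reviewed manually.

```r
# Toy subset of Table 14.
hits <- data.frame(
  code         = c("C1141890", "C0023976", "C0152154", "C1832916"),
  disease_name = c("Familial long QT syndrome (disorder)", "Long QT Syndrome",
                   "Prolonged labor", "Timothy syndrome"),
  stringsAsFactors = FALSE
)

# Keep only names that contain the search term, ignoring case.
keep <- grepl("long qt", hits$disease_name, ignore.case = TRUE)
print(hits[keep, ])
```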

Exploring the evidences associated to a disease

To explore the evidences supporting the associations for Schizophrenia use the function disease2evidence.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           type = "GDA",
                          database = "CURATED",
                          score    = c( 0.8,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.8-1 
##  . Term:        UMLS_C0036341 
##  . Results:  388

A selection of evidences is shown in Table 15.

tab <- results@qresult
tab <-tab[tab$reference_type == "PMID" & tab$pmYear > 2013 & tab$source =="PSYGENET", ] 
tab <- tab[ order(-tab$pmYear), c("gene_symbol","source", "associationType", "sentence", "reference", "pmYear")][1:5,]
tab <- tab %>% dplyr::rename(Gene = gene_symbol,  Year=pmYear, Sentence = sentence, pmid = reference)

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid)    )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association for Schizophrenia" )

Table 15: Evidences supporting the association for Schizophrenia
Gene	source	associationType	Sentence	pmid	Year
GRIN2A	PSYGENET	Biomarker	GRIN2A (GT)21 may play a significant role in the etiology of schizophrenia among the Chinese Han population of Shaanxi.	25958346	2015
NOTCH4	PSYGENET	Biomarker	Our data indicate that NOTCH4 polymorphism can influence clinical symptoms in Slovenian patients with schizophrenia.	25529856	2015
PPARA	PSYGENET	Biomarker	We report significant increases in PPAR?, SREBP1, IL-6 and TNF?, and decreases in PPAR? and C/EPB? and mRNA levels from patients with schizophrenia, with additional BMI interactions, characterizing dysregulation of genes relating to metabolic-inflammation in schizophrenia.	25433960	2015
MAPK3	PSYGENET	Biomarker	Both single-gene and gene-set enrichment analyses in genome-wide association data from the largest schizophrenia sample to date of 13,689 cases and 18,226 controls show significant association of HIST1H1E and MAPK3, and enrichment of our PSD proteome.	25048004	2015
MAGI2	PSYGENET	Biomarker	One of the rare CNVs found in SZ cohorts is the duplication of Synaptic Scaffolding Molecule (S-SCAM, also called MAGI-2), which encodes a postsynaptic scaffolding protein controlling synaptic AMPA receptor levels, and thus the strength of excitatory synaptic transmission.	25653350	2015
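
When filtering evidences with a logical index, as in the code above, rows in which the condition evaluates to NA (for example, a missing publication year) are returned as rows filled with NA. The sketch below shows the effect on a toy data frame and one way to avoid it with which(). The toy values are illustrative and are not DISGENET output.

```r
# Toy evidence table; the last row has a missing publication year.
toy <- data.frame(
  gene_symbol    = c("GRIN2A", "NOTCH4", "DRD2", "COMT"),
  reference_type = c("PMID", "PMID", "PMID", "PMID"),
  pmYear         = c(2015, 2015, 2010, NA),
  source         = c("PSYGENET", "PSYGENET", "PSYGENET", "PSYGENET"),
  stringsAsFactors = FALSE
)

cond <- toy$reference_type == "PMID" & toy$pmYear > 2013 & toy$source == "PSYGENET"

with_na    <- toy[cond, ]         # the NA condition yields an all-NA row
without_na <- toy[which(cond), ]  # which() keeps only rows where cond is TRUE

print(nrow(with_na))
print(nrow(without_na))
```

dplyr::filter drops rows where the condition is NA, so the dplyr pipelines used elsewhere in this document are not affected.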

Additionally, you can explore the evidences for a specific gene-disease pair by specifying the gene identifier using the argument gene.

results <- disease2evidence( disease  = "UMLS_C0036341",
                           gene = c("DRD2", "DRD3"),
                           type = "GDA",
                          database = "ALL",
                          score    = c( 0.5,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0036341 
##  . Results:  571

The most recent papers are shown in Table 16.

tab <- results@qresult
tab <-  tab %>%
    filter(reference_type == "PMID") %>%
    select(gene_symbol, associationType, reference, sentence, pmYear) %>% arrange(desc(pmYear)) %>% head(10)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences supporting the association between C0036341 & DRD2,DRD3" )

Table 16: Evidences supporting the association between C0036341 & DRD2,DRD3
Gene	associationType	pmid	Sentence	Year
DRD2	CausalorOrContributing	37422511	We focus on schizophrenia and the dopamine D2 receptor (DRD2), hot flashes and the neurokinin B receptor (TACR3), cigarette smoking and receptors bound by nicotine (CHRNA5, CHRNA3, CHRNB4), and alcohol use and enzymes that help to break down alcohol (ADH1B, ADH1C, ADH7).	2024
DRD3	GeneticVariation	39187246	DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia.	2024
DRD2	GeneticVariation	38810489	Six loci including neurexin-1(NRXN1) (rs1045881), dopamine D1 receptor (DRD1) (rs686, rs4532), chitinase-3-like protein 1 (CHI3L1) (rs4950928), velocardiofacial syndrome (ARVCF) (rs165815), dopamine D2 receptor (DRD2) (rs1076560) were identified higher expression with significant difference in individuals converted into schizophrenia after two years.	2024
DRD2	GeneticVariation	39187246	DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia.	2024
DRD2	GeneticVariation	38598465	Adult patients with schizophrenia will be randomized (2: 1) to receive PGx-assisted treatment (drug and regimen selection depending on the results of single-nucleotide polymorphisms in genes DRD2, HTR1A, HTR2C, ABCB1, CYP2D6, CYP3A5, and CYP1A2) or the standard of care.	2024
DRD2	GeneticVariation	38421437	Our significant polymorphism findings, mainly those in DRD2 (rs1800497, rs1799978, and rs2734841), HTR2C (rs3813929), and HTR2A (rs6311), were largely consistent with earlier findings (predictors of RIS effectiveness in adult schizophrenia patients), confirming their validity for identifying ASD children with a greater likelihood of core symptom improvement compared to noncarriers/wild types.	2024
DRD2	CausalorOrContributing	39127265	According to the well-documented dysregulation of endocannabinoid and dopaminergic system genes in schizophrenia, we investigated DNA methylation cannabinoid type 1 receptor (CNR1) and dopamine D2 receptor (DRD2) genes in saliva samples from psychotic subjects using pyrosequencing.	2024
DRD2	CausalorOrContributing	39036710	TAAR1 agonists may be less efficacious than dopamine D 2 receptor antagonists already licensed for schizophrenia.	2024
DRD3	PostTranslationalModification	38648100	Schizophrenia subjects exhibited thousands of neuronal and non-neuronal epigenetic differences at regions that included several susceptibility genetic loci, such as NRG1, DISC1, and DRD3.	2024
DRD2	CausalorOrContributing	38114631	The Drd2 gene, encoding the dopamine D2 receptor (D2R), was recently indicated as a potential target in the etiology of lowered sociability (i.e., social withdrawal), a symptom of several neuropsychiatric disorders such as Schizophrenia and Major Depression.	2024

Searching multiple diseases

The disease2gene function also accepts as input a list of diseases (each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO), the database (by default, CURATED), and, optionally, a value range for the score. In the example, we have selected a list of four diseases. Table 17 shows the UMLS CUIs and the corresponding disease names.

Table 17: Disease list selected for illustrating the disease2gene multiple search
UMLS_CUI	Disease_Name
C0036341	Schizophrenia
C0002395	Alzheimer’s Disease
C0030567	Parkinson Disease
C0005586	Bipolar Disorder

Creating the vector with the list of diseases.

diseasesOfInterest <- paste0("UMLS_",c("C0036341", "C0002395", "C0030567","C0005586"))

In the example, we will search in CURATED data, using a score range of 0.8-1.

results <- disease2gene(
  disease = diseasesOfInterest,
  database = "CURATED",
  score =c(0.8,1),
  verbose  = TRUE )

## Your query has 4 pages.

Table 18 shows the top 10 genes associated with the list of diseases.

tab <- unique(results@qresult[  ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>%  dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top Genes associated to a list of diseases")

Table 18: Top Genes associated to a list of diseases
gene_symbol	disease_name	score	yearInitial	yearFinal
GBA1	Parkinson Disease	1	1987	2021
APP	Alzheimer’s Disease	1	1989	2023
SNCA	Parkinson Disease	1	1989	2021
PSEN1	Alzheimer’s Disease	1	1993	2022
LRRK2	Parkinson Disease	1	1993	2021
GRN	Alzheimer’s Disease	1	1993	2020
APOE	Alzheimer’s Disease	1	1993	2020
MAPT	Alzheimer’s Disease	1	1993	2020
PSEN2	Alzheimer’s Disease	1	1993	2020
PRKN	Parkinson Disease	1	1998	2022

Visualizing the genes associated to multiple diseases

The default plot of the results of querying DISGENET with a list of diseases produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 15).

plot( results,
      type = "Network",
      prop = 10, interactive = TRUE)

Figure 15: The Gene-Disease Network associated to a list of diseases

To visualize the results as a Gene-Disease Heatmap (Figure 16), set the argument type to "Heatmap". In this plot, the color scale is proportional to the score of the GDA. The argument limit can be used to restrict the number of rows to the top-scoring GDAs when the result set is large. By default, the plot shows the 50 highest-scoring GDAs.

plot( results,
      type="Heatmap",
      limit =20,
      cutoff=0.2, interactive=TRUE)

## [1] "Dataframe of 365 rows has been reduced to 20 rows."

Figure 16: The Gene-Disease Heatmap for genes associated to a list of diseases

A third visualization option is a Protein Class-Disease Heatmap (Figure 17), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass”. In this case, the color of the heatmap is proportional to the percentage of genes for each disease in each protein class. This heatmap displays the protein classes associated to each disease.

plot( results,
      class="ProteinClass", type = "Heatmap", interactive=TRUE)

Figure 17: The Protein Class-Disease Heatmap for genes associated to a list of diseases
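
The quantity shown in this heatmap, the percentage of genes of each disease that fall in each protein class, can be sketched in base R. The toy table and the column name protein_class are assumptions for illustration. How disgenet2r computes the values internally is not shown in this document.

```r
# Toy gene-disease table with an illustrative protein class per gene.
toy <- data.frame(
  disease_name  = c("Schizophrenia", "Schizophrenia", "Schizophrenia",
                    "Schizophrenia", "Parkinson Disease", "Parkinson Disease"),
  gene_symbol   = c("DRD2", "DRD3", "HTR2A", "COMT", "LRRK2", "PRKN"),
  protein_class = c("GPCR", "GPCR", "GPCR", "Enzyme", "Kinase", "Enzyme"),
  stringsAsFactors = FALSE
)

# Count genes per disease and class, then convert each row to percentages.
counts <- table(toy$disease_name, toy$protein_class)
pct    <- round(100 * prop.table(counts, margin = 1), 1)
print(pct)
```

Each row of pct sums to 100, so diseases with different numbers of associated genes remain comparable.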

A Protein Class-Disease Network visualization is also possible (Figure 18).

plot( results,
      class="ProteinClass", type = "Network", interactive=TRUE)

Figure 18: The Protein Class-Disease Network for genes associated to a list of diseases

To explore the evidences supporting the associations, use the function disease2evidence.

results <- disease2evidence( disease  = diseasesOfInterest,
                           type = "GDA",
                           score=c(0.5,1),
                          database = "CURATED" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-evidence 
##  . Database:     CURATED 
##  . Score:        0.5-1 
##  . Term:       UMLS_C0036341 ... UMLS_C0005586 
##  . Results:  3478

To visualize the results, set the type argument to "Points" (Figure 19).

plot( results,  
      type = "Points", limit=10000 )

Figure 19: The Evidences plot for a list of diseases

Searching by disease and chemical

The disease2gene function can also be used to retrieve genes mentioned in the context of a specific disease and chemical (Table 19).

results <- disease2gene( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN",
                          chemical = "CHEMBL_CHEMBL1009")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        UMLS_C0030567 
##  . Results:  107

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score)) 
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Parkinson and levodopa")

Table 19: Top GDAs associated to Parkinson and levodopa
gene_symbol	disease_name	chemical_name	score
BDNF	Parkinson Disease	Levodopa	1
GBA1	Parkinson Disease	Levodopa	1
GDNF	Parkinson Disease	Levodopa	1
MAOB	Parkinson Disease	Levodopa	1
PRKN	Parkinson Disease	Levodopa	1
SNCA	Parkinson Disease	Levodopa	1
TH	Parkinson Disease	Levodopa	1
PARK7	Parkinson Disease	Levodopa	1
PINK1	Parkinson Disease	Levodopa	1
LRRK2	Parkinson Disease	Levodopa	1

To visualize the results, use the plot function (Figure 20).

plot( results, interactive = TRUE )

Figure 20: The Gene Disease Chemical Network for a disease and a drug

Retrieving the chemicals associated to a disease

To retrieve the chemicals mentioned in the GDAs involving a specific disease, we can use the disease2chemical function.

results <- disease2chemical( disease  = "UMLS_C0030567",
                          database = "TEXTMINING_HUMAN" , score = c(0.5,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.5-1 
##  . Term:        UMLS_C0030567 
##  . Results:  509

tab <- results@qresult
tab <-tab%>% dplyr::filter(reference_type == "PMID")  %>% dplyr::select(gene_symbol, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
                    `Chemical Effect` = chemical_effect,   Year=pmYear, Sentence = sentence, pmid = reference)   %>% dplyr::arrange(desc(Year))

tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Top Chemicals associated to Parkinson" )

Table 20: Top Chemicals associated to Parkinson
Gene	Chemical	Sentence	pmid	Year
TH	Rotenone	The neuroprotective effect of our nanoformulation is attributed to the upregulation of tyrosine hydroxylase (TH), the PD therapeutic target, with behavioral improvement in animals against rotenone-induced PD deficits.	38109795	2024
GBA1	Levodopa	Levodopa-carbidopa intestinal gel for advanced Parkinson’s disease: Impact of LRRK2 and GBA1 mutations.	39208588	2024
GBA1	Carbidopa	Levodopa-carbidopa intestinal gel for advanced Parkinson’s disease: Impact of LRRK2 and GBA1 mutations.	39208588	2024
MAPT	Caffeine	Based on the genetic association and interaction studies, only MAPT, SLC2A13, LRRK2, ApoE, NOS2A, GRIN2A, CYP1A2, and ADORA2A have been shown by at least one study to have a positive caffeine-gene interaction influencing the risk of PD.	38914264	2024
MAPT	THIOUREA	Evaluation of Alpha-Synuclein and Tau Antiaggregation Activity of Urea and Thiourea-Based Small Molecules for Neurodegenerative Disease Therapeutics. Alzheimer’s disease (AD) and Parkinson’s disease (PD) are multifactorial, chronic diseases involving neurodegeneration.	39436010	2024
VPS35	ESTROGEN	In conclusion, alternative autophagy might be important for maintaining neuronal homeostasis and may be associated with the neuroprotective effect of estrogen in PD with VPS35 D620N.	38409392	2024
SNCA	Gold	The conformational landscapes of αS indicate that uncharged Aun(SCH2OH?) chaperones the native intrinsically disordered conformations of αS, while negatively and positively charged AuNCs greatly increase the likelihood of forming intramolecular β-sheet domains, which are necessary for αS fibrillation and are a hallmark of PD.	39437152	2024
SNCA	RETINAL	Is there any correlation between alpha-synuclein levels in tears and retinal layer thickness in Parkinson’s disease? To determine the total alpha-synuclein (αSyn) reflex tears and its association with retinal layers thickness in Parkinson’s disease (PD).	37151018	2024
SNCA	Cocoa	Targeting protein aggregation using a cocoa-bean shell extract to reduce α-synuclein toxicity in models of Parkinson’s disease.	39525389	2024
SNCA	(E)-4-oxonon-2-enal	4-Oxo-2-Nonenal- and Agitation-Induced Aggregates of α-Synuclein and Phosphorylated α-Synuclein with Distinct Biophysical Properties and Biomedical Applications. α-Synuclein (α-syn) can form oligomers, protofibrils, and fibrils, which are associated with the pathogenesis of Parkinson’s disease and other synucleinopathies.	38727274	2024

To visualize the results, use the plot function.

plot( results )

Figure 21: The Evidences plot for a list of diseases

Retrieving Variant-Disease Associations from DISGENET

Searching by variant

The variant2disease function receives a variant, or a list of variants as input, identified by the dbSNP identifier. It produces an object DataGeNET.DGN, with Type = "variant-disease".

results <- variant2disease( variant= "rs113488022",
                         database = "CURATED", score = c(0.2,1)) 
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0.2-1 
##  . Term:        rs113488022 
##  . Results:  13

The results are shown in Table 21.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Top diseases associated to variant rs113488022")

Table 21: Top diseases associated to variant rs113488022
variantid	disease_name	score	yearInitial	yearFinal
rs113488022	Colorectal Carcinoma	0.7	1993	2024
rs113488022	melanoma	0.7	2002	2021
rs113488022	Colon Carcinoma	0.7	2002	2020
rs113488022	Non-Small Cell Lung Carcinoma	0.7	2002	2019
rs113488022	Papillary thyroid carcinoma	0.7	2002	2018
rs113488022	Nephroblastoma	0.6
rs113488022	Multiple Myeloma	0.6
rs113488022	ASTROCYTOMA, LOW-GRADE, SOMATIC	0.4	2002	2018
rs113488022	Nongerminomatous Germ Cell Tumor	0.4	2002	2018
rs113488022	Vascular anomaly	0.4	2004	2021
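
Note that indexing with tab[1:10, ] pads the table with NA rows when a query returns fewer than ten associations, whereas head(tab, 10) returns only the rows that exist. A minimal sketch with a mock table (the values are invented for illustration and are not DISGENET output):

```r
# Mock result table with only three rows (illustrative values, not DISGENET output)
tab <- data.frame(
  variantid    = c("rs0000001", "rs0000002", "rs0000003"),
  disease_name = c("Disease A", "Disease B", "Disease C"),
  score        = c(0.7, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# Indexing past the last row pads the result with NA rows
padded <- tab[1:10, ]

# head() returns at most 10 rows and never pads
top <- head(tab, 10)

nrow(padded)  # 10
nrow(top)     # 3
```

Either form gives the same table when the query returns ten or more rows, as in the example above.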

Visualizing the diseases associated to a single variant

The disgenet2r package offers several options to visualize the results of querying DISGENET for a single variant: a Variant-Disease Network (Figure 22) showing the diseases associated to the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a network showing the MeSH Disease Classes of the diseases associated to the variant (Variant-Disease Class Network, Figure 23). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Variant-Disease Network on a DataGeNET.DGN object (Figure 22). In the Variant-Disease Network, the blue nodes are diseases, the yellow nodes are variants, and the width of the edges is proportional to the score of the association.

plot( results, 
      type = "Network", interactive=T,
      prop  = 10)

Figure 22: The Variant-Disease Network for the variant rs113488022

plot(results, class="DiseaseClass" , interactive=T)

Figure 23: The Variant-Disease Class Network for the variant rs113488022

Exploring the evidences associated to a variant

You can extract the evidences associated to a particular variant using the function variant2evidence. Additionally, you can explore the evidences for a specific variant-disease pair by specifying the argument disease.

results <- variant2evidence( variant = "rs10795668",
                disease ="UMLS_C0009402",
                       database = "ALL",
                       score =c(0,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs10795668 
##  . Results:  24

The results are shown in Table 22.

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>% select(associationType, reference, pmYear, sentence) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid=reference) %>% dplyr::arrange(desc(Year))
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting the association between C0009402 & rs10795668")

Table 22: Evidences supporting the association between C0009402 & rs10795668
associationType	pmid	Year	Sentence
GeneticVariation	34676053	2021	Increasing risk of CRC was noted for rs10795668 in log-additive model (OR = 1.45, 95% CI: 1.05-1.99, p = 0.023); for rs1035209 in log-additive model (OR = 1.79, 95% CI: 1.18-2.72, p = 0.003); for rs11190164 in log-additive model (OR = 1.67, 95% CI: 1.17-2.38, p = 0.004).
GeneticVariation	30194776	2019	In conclusion, some variants associated with CRC risk (rs10505477, rs6983267, rs10795668 and rs11255841) are also involved in the susceptibility to CRA and specific subtypes.
GeneticVariation	24801760	2015	The CRC SNPs accounted for 4.3% of the variation in multiple adenoma risk, with three SNPs (rs6983267, rs10795668, rs3802842) explaining 3.0% of the variation.
GeneticVariation	24066093	2013	We genotyped four variants previously associated with CRC: rs10795668, rs16892766, rs3802842 and rs4939827.
GeneticVariation	22363440	2012	We observed an association between the low colorectal cancer risk allele (A) for rs10795668 at 10p14 and increased expression of ATP5C1 (q = 0.024) and between the colorectal cancer high risk allele (C) for rs4444235 at 14q22.2 and increased expression of DLGAP5 (q = 0.041), both in tumor samples.
GeneticVariation	21071539	2011	We studied the generalizability of the associations with 11 risk variants for CRC on 8q23 (rs16892766), 8q24 (rs6983267), 9p24 (rs719725), 10p14 (rs10795668), 11q23 (rs3802842), 14q22 (rs4444235), 15q13 (rs4779584), 16q22 (rs9929218), 18q21 (rs4939827), 19q13 (rs10411210), and 20p12 (rs961253) in a multiethnic sample of 2,472 CRC cases, 839 adenoma cases and 4,466 controls comprised of European American, African American, Native Hawaiian, Japanese American, and Latino men and women.
GeneticVariation	21402474	2011	Our data suggested that rs10795668, a CRC susceptibility variant identified by GWA studies, might be used as a biomarker to identify CRC patients with high risk of recurrence after chemotherapy.
GeneticVariation	20530476	2010	These results suggest that rs6983267, rs4939827, rs10795668, rs3802842, and rs961253 SNPs are associated with the risk of CRC in the Chinese population individually and jointly.
GeneticVariation	19843678	2009	We studied the role of the 8q24.21 (rs6983267), 18q21.1 (rs12953717), 15q13.3 (rs4779584), 11q23.1 (rs3802842), 8q23.3 (rs16892766), and 10p14 (rs10795668) risk variants in a series of 995 Dutch CRC cases and 1340 controls.
GeneticVariation	18372905	2008	In addition to the previously reported 8q24, 15q13 and 18q21 CRC risk loci, we identified two previously unreported associations: rs10795668, located at 10p14 (P = 2.5 x 10(-13) overall; P = 6.9 x 10(-12) replication), and rs16892766, at 8q23.3 (P = 3.3 x 10(-18) overall; P = 9.6 x 10(-17) replication), which tags a plausible causative gene, EIF3H.
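
In the code above, head(10) is applied before arrange, so the table holds the first ten PMID rows returned by the query, sorted by year, rather than the ten most recent publications. Whether that is intended is not stated. The sketch below, on a mock evidence table with invented PMIDs, shows how the two orders differ:

```r
# Mock evidence table (invented references, for illustration only)
evidence <- data.frame(
  reference = c("111", "222", "333", "444", "555"),
  pmYear    = c(2008, 2021, 2012, 2019, 2010),
  stringsAsFactors = FALSE
)

# Truncate first, then sort: only the first three rows are ever considered
first_then_sorted <- head(evidence, 3)
first_then_sorted <- first_then_sorted[order(-first_then_sorted$pmYear), ]

# Sort first, then truncate: the three most recent publications overall
sorted_then_first <- head(evidence[order(-evidence$pmYear), ], 3)

first_then_sorted$pmYear  # 2021 2012 2008
sorted_then_first$pmYear  # 2021 2019 2012
```

Sorting before truncating is the order to use when the most recent evidences are wanted.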

The results can be visualized using the plot function with the argument type = "Points". The plot shows the number of publications per year associated to this variant. Set the limit parameter to 10,000 so that all the results are included in the plot.

results <- variant2evidence( variant = "rs1800629",
                       database = "ALL",
                       score =c(0,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs1800629 
##  . Results:  1945

plot( results,  
      type = "Points", limit=10000 )

Figure 24: The Evidence plot for the variant rs1800629
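
A similar per-year summary can be computed by hand from the qresult slot. The sketch below uses a mock table with the reference_type, reference, and pmYear columns seen in the code above; the rows are invented, and it is an assumption that the plot counts distinct publications in the same way.

```r
# Mock evidence table (invented rows, not DISGENET output)
evidence <- data.frame(
  reference      = c("101", "102", "103", "104", "105", "rs-note"),
  reference_type = c("PMID", "PMID", "PMID", "PMID", "PMID", "OTHER"),
  pmYear         = c(2019, 2020, 2020, 2021, 2021, NA),
  stringsAsFactors = FALSE
)

# Keep only evidences backed by a PubMed identifier
pmid_rows <- evidence[evidence$reference_type == "PMID", ]

# Count distinct publications per year
per_year <- aggregate(reference ~ pmYear, data = pmid_rows,
                      FUN = function(x) length(unique(x)))
names(per_year) <- c("Year", "Publications")
per_year
```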

Exploring the information associated to a variant

The variant2attribute function receives a variant, or a list of variants, as input, identified by the dbSNP identifier. It produces a DataGeNET.DGN object with attributes of the variant(s), such as the allelic frequency according to gnomAD data, the most severe consequence type from the Variant Effect Predictor, and the DPI and DSI.

results <- variant2attribute( variant= "rs113488022")

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant 
##  . Database:     ALL 
##  . Score:         
##  . Term:        rs113488022

The results are shown in Table 23.

tab <- unique(results@qresult )
tab <- tab %>% dplyr::select(-threeletterID,-source, -var_gene_symbol)
knitr::kable(tab, caption = "Attributes for variant rs113488022")

Table 23: Attributes for variant rs113488022
variantid	ref	alt	polyphen_score	chromosome	coord	mostSevereConsequences	geneid	geneEnsemblID	gene_symbol	variantDSI	variantDPI	dbsnpclass	exome
rs113488022	A	C	0.958	7	140753336	missense_variant	673	ENSG00000157764	BRAF	0.338	0.045	snv
rs113488022	A	G	0.958	7	140753336	missense_variant	673	ENSG00000157764	BRAF	0.338	0.045	snv
rs113488022	A	T	0.958	7	140753336	missense_variant	673	ENSG00000157764	BRAF	0.338	0.045	snv	1.4e-06

Searching multiple variants

The variant2disease function retrieves the information in DISGENET for a list of variants identified by the dbSNP identifier. The source database is specified with the database argument; by default, variant2disease uses CURATED.

results <- variant2disease(
         variant  = c("rs121913013", "rs1060500621",
              "rs199472709", "rs72552293",
              "rs74315445", "rs199472795"),
         database = "ALL")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        variant-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:       rs121913013 ... rs199472795 
##  . Results:  21

Table 24 shows the top 10 diseases associated to the list of variants.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] )%>% dplyr::arrange(desc(score), desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to the list of variants")

Table 24: Top diseases associated to the list of variants
variantid	disease_name	score	yearInitial	yearFinal
rs74315445	LONG QT SYNDROME 5	0.7	1993	2024
rs74315445	Jervell And Lange-Nielsen Syndrome 2	0.6	1993	2024
rs199472709	Romano-Ward Syndrome	0.6	1993	2022
rs199472795	Romano-Ward Syndrome	0.6	1993	2022
rs72552293	Brugada Syndrome 2	0.6	1993	2007
rs74315445	Jervell-Lange Nielsen Syndrome	0.4	1993	2024
rs74315445	Long QT Syndrome	0.4	1997	2024
rs74315445	Sudden death, cause unknown	0.4	1997	2024
rs74315445	Familial long QT syndrome (disorder)	0.4	1997	2024
rs74315445	Jervell And Lange-Nielsen Syndrome 1	0.4	1993	2024

Visualizing the diseases associated to multiple variants

The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Network (Figure 25), as a Variant-Gene-Disease Network (Figure 26), as Variant-Disease Heatmap (Figure 27), as Variant-Disease Class Network (Figure 28) and as a Variant-Disease Class Heatmap (Figure 29).

plot( results,
      type = "Network", interactive=T)

Figure 25: The Variant-Disease Network for a list of variants

To obtain the Variant-Gene-Disease Network (Figure 26), change the showGenes argument to “TRUE”.

plot( results,
      type = "Network", 
      showGenes= T,
      interactive=T)

Figure 26: The Variant-Gene-Disease Network for a list of variants

The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Heatmap by changing the type argument to Heatmap (Figure 27).

plot( results,
      type = "Heatmap",
      prop = 10, interactive = TRUE, nchar=50)

Figure 27: The Variant-Disease Heatmap for a list of variants

The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Class Network by changing the class argument to DiseaseClass (Figure 28).

plot( results,
      class = "DiseaseClass", interactive=T)

Figure 28: The Variant-Disease Class Network for a list of variants

The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Class Heatmap by setting the type argument to Heatmap and the class argument to DiseaseClass (Figure 29).

plot( results,  type = "Heatmap",
      class = "DiseaseClass", interactive=T)

Figure 29: The Variant-Disease Class Heatmap for a list of variants

Searching by disease

The disease2variant function retrieves the variants associated to a disease, or a list of diseases. It takes as input the disease, or list of diseases, of interest (each in the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO) and the database (by default, CURATED). A threshold value for the score can also be set, as in the gene2disease function.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  154

Table 25 shows the variants associated to Timothy syndrome according to the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>%  dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = " Variants associated to Timothy syndrome according to ClinVar")

Table 25: Variants associated to Timothy syndrome according to ClinVar
variantid	disease_name	score	yearInitial	yearFinal
rs79891110	Timothy syndrome	0.7	1993	2018
rs786205745	Timothy syndrome	0.7	1993	2004
rs786205753	Timothy syndrome	0.6	1993	2019
rs549476254	Timothy syndrome	0.6	1993	2019
rs786205748	Timothy syndrome	0.5	1993	2020
rs1057517711	Timothy syndrome	0.5	1993	2015
rs797044881	Timothy syndrome	0.5	1993	2015
rs374528680	Timothy syndrome	0.5	1993	2015
rs80315385	Timothy syndrome	0.5	1993	2015
rs587782933	Timothy syndrome	0.5	1993	1993

The results can be further restricted to variants predicted to be deleterious by passing ranges of SIFT and PolyPhen scores to the function through the sift and polyphen arguments, as in the example below. Remember that genetic variants with SIFT scores smaller than 0.05 are predicted to be deleterious, while values of PolyPhen greater than 0.908 are classified as Probably Damaging.

results <- disease2variant(disease = c("UMLS_C1832916"),
                       database = "CLINVAR", sift = c(0,0.05), polyphen = c(0.9,1) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        UMLS_C1832916 
##  . Results:  86

Table 26 shows the deleterious variants associated to Timothy syndrome reported in the ClinVar database.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "polyphen_score", "sift_score", "yearInitial", "yearFinal")] ) %>%  dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Deleterious variants associated to Timothy syndrome according to ClinVar")

Table 26: Deleterious variants associated to Timothy syndrome according to ClinVar
variantid	disease_name	score	polyphen_score	sift_score	yearInitial	yearFinal
rs79891110	Timothy syndrome	0.7	1.000	0.00	1993	2018
rs786205745	Timothy syndrome	0.7	1.000	0.01	1993	2004
rs786205753	Timothy syndrome	0.6	0.999	0.00	1993	2019
rs549476254	Timothy syndrome	0.6	0.999	0.00	1993	2019
rs786205748	Timothy syndrome	0.5	1.000	0.00	1993	2020
rs1057517711	Timothy syndrome	0.5	0.999	0.00	1993	2015
rs797044881	Timothy syndrome	0.5	1.000	0.00	1993	2015
rs80315385	Timothy syndrome	0.5	1.000	0.00	1993	2015
rs587782933	Timothy syndrome	0.5	1.000	0.00	1993	1993
rs199473391	Timothy syndrome	0.4	1.000	0.00	1993	2023
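
The same cutoffs can also be applied locally to a table that already holds sift_score and polyphen_score columns. The sketch below uses a mock table with invented values. Note that the query above passes polyphen = c(0.9, 1), which is slightly looser than the 0.908 cutoff quoted in the text; whether the ranges passed to disease2variant include their end points is not stated in this document.

```r
# Mock variant table (invented scores, not DISGENET output)
variants <- data.frame(
  variantid      = c("rs0000001", "rs0000002", "rs0000003", "rs0000004"),
  sift_score     = c(0.00, 0.04, 0.20, 0.01),
  polyphen_score = c(1.000, 0.950, 0.990, 0.500),
  stringsAsFactors = FALSE
)

sift_cutoff     <- 0.05   # below this: predicted deleterious by SIFT
polyphen_cutoff <- 0.908  # above this: Probably Damaging by PolyPhen

# Keep variants that meet both criteria
deleterious <- subset(variants,
                      sift_score < sift_cutoff & polyphen_score > polyphen_cutoff)

deleterious$variantid  # "rs0000001" "rs0000002"
```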

Visualizing the variants associated to a single disease

The variants associated to a single disease can be visualized as a Variant-Disease Network (Figure 30).

plot( results,     
      type = "Network", interactive=T)

Figure 30: The Variant-Disease Network for a single disease

The Variant-Disease Network can be displayed as a Variant-Disease-Gene Network, by setting the showGenes parameter to TRUE (Figure 31).

plot( results, 
      type = "Network",
      showGenes = T)

Figure 31: The Variant-Gene-Disease Network for a single disease

Explore the evidences associated to a single disease

To explore the evidences supporting the VDAs for Timothy syndrome, run the disease2evidence function. You can use the argument variant to inspect the evidences for a particular variant and Timothy syndrome.

results <- disease2evidence( disease  = "UMLS_C1832916",
                           type = "VDA",
                          database = "ALL",
                          score    = c( 0.5,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     ALL 
##  . Score:        0.5-1 
##  . Term:        UMLS_C1832916 
##  . Results:  72

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>%
    select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Evidences supporting associations")

Table 27: Evidences supporting associations
pmid	associationType	Year	Sentence
39420001	GeneticVariation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
39580446	GeneticVariation	2024	It remains underexplored whether individuals with the canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) exhibit overlapping symptoms.
38968219	GeneticVariation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
38826393	GeneticVariation	2024	Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
39079396	GeneticVariation	2024	In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
38826393	GeneticVariation	2024	Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
39580446	GeneticVariation	2024	It remains underexplored whether individuals with the canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) exhibit overlapping symptoms.
38968219	GeneticVariation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
39420001	GeneticVariation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
39079396	GeneticVariation	2024	In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
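
Several rows in Table 27 are repeated. A plausible explanation, not confirmed by this document, is that the same sentence supports more than one variant, so the rows become identical once the variant column is dropped. The sketch below, on a mock table with invented values, shows how to keep one row per publication and sentence:

```r
# Mock evidence table: each sentence supports two variants (invented values)
evidence <- data.frame(
  variantid = c("rs0000001", "rs0000002", "rs0000001", "rs0000002"),
  reference = c("900001", "900001", "900002", "900002"),
  pmYear    = c(2024, 2024, 2023, 2023),
  sentence  = c("Sentence A.", "Sentence A.", "Sentence B.", "Sentence B."),
  stringsAsFactors = FALSE
)

# Dropping the variant column leaves identical rows for each shared sentence
shown <- evidence[, c("reference", "pmYear", "sentence")]
nrow(shown)          # 4

# Keep one row per publication and sentence
deduplicated <- unique(shown)
nrow(deduplicated)   # 2
```

Keeping the variant column in the table is an alternative when the variant behind each evidence matters.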

To inspect the evidences for Timothy syndrome restricted to the variants in a particular gene, use the argument gene.

results <- disease2evidence( disease  = "UMLS_C1832916",
                   gene = "775", vocabulary = "ENTREZ",
                   type = "VDA",  database = "TEXTMINING_HUMAN",
                   score    = c( 0.7,1 ) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-evidence 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.7-1 
##  . Term:        UMLS_C1832916 
##  . Results:  20

results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID")%>%
    select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear))%>% head(10)

results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption ="Selection of evidences supporting associations between C1832916 & CACNA1C")

Table 28: Selection of evidences supporting associations between C1832916 & CACNA1C
pmid	associationType	Year	Sentence
39420001	GeneticVariation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
39580446	GeneticVariation	2024	It remains underexplored whether individuals with the canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) exhibit overlapping symptoms.
38968219	GeneticVariation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
38826393	GeneticVariation	2024	Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
39079396	GeneticVariation	2024	In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).
38826393	GeneticVariation	2024	Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms.
39580446	GeneticVariation	2024	It remains underexplored whether individuals with the canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) exhibit overlapping symptoms.
38968219	GeneticVariation	2024	Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation.
39420001	GeneticVariation	2024	The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias.
39079396	GeneticVariation	2024	In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R).

Searching multiple diseases

results <- disease2variant(
              disease = paste0("UMLS_",c("C3150943",  "C1859062", "C1832916", "C4015695")),
              database = "CURATED", 
              score = c(0.6, 1) )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-variant 
##  . Database:     CURATED 
##  . Score:        0.6-1 
##  . Term:       UMLS_C3150943 ... UMLS_C4015695 
##  . Results:  144

Table 29 shows the variants associated to a list of Long QT syndromes in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>%  dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Variants associated to a list of Long QT syndromes")

Table 29: Variants associated to a list of Long QT syndromes
variantid	disease_name	score	yearInitial	yearFinal
rs137854600	LONG QT SYNDROME 3	0.8	1993	2022
rs9333649	Long Qt Syndrome 2	0.7	1993	2022
rs199473428	Long Qt Syndrome 2	0.7	1993	2022
rs199472961	Long Qt Syndrome 2	0.7	1993	2022
rs137854601	LONG QT SYNDROME 3	0.7	1993	2022
rs199473524	Long Qt Syndrome 2	0.7	1993	2022
rs79891110	Timothy syndrome	0.7	1993	2018
rs786205745	Timothy syndrome	0.7	1993	2004
rs199473108	LONG QT SYNDROME 3	0.7	1995	2018
rs199472916	Long Qt Syndrome 2	0.7
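
The identifiers passed to disease2variant follow the IDENT_ID format described under Searching by disease. Checking the prefix before sending a query can catch a missing or misspelled vocabulary. The sketch below is only an illustration: has_known_prefix is a hypothetical helper, not part of disgenet2r, and it validates the prefix only, not the identifier that follows it.

```r
# Vocabularies listed in the description of disease2variant
vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                  "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper: TRUE when the text before the first underscore
# is one of the known vocabularies
has_known_prefix <- function(ids) {
  prefix <- sub("_.*$", "", ids)
  grepl("_", ids, fixed = TRUE) & prefix %in% vocabularies
}

ids <- paste0("UMLS_", c("C3150943", "C1859062", "C1832916", "C4015695"))

has_known_prefix(ids)                       # TRUE TRUE TRUE TRUE
has_known_prefix(c("C1832916", "FOO_123"))  # FALSE FALSE
```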

Visualizing the variants associated to multiple diseases

The results of querying DISGENET variant information with a list of diseases can be visualized as a Variant-Disease Network (Figure 32), or as a Variant-Disease Heatmap (Figure 33), by changing the type argument from “Network” to “Heatmap”.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 32: The Variant-Disease Network for a list of diseases

The results can be visualized as a Heatmap (Figure 33).

plot( results,
      type = "Heatmap",
      interactive=T)

Figure 33: The Variant-Disease Heatmap for a list of diseases

Searching by gene

results <- gene2vda(
              gene = "APP",
              database = "CURATED" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        APP 
##  . Results:  17

Table 30 shows the top variants associated to the APP gene in the curated data in DISGENET.

tab <- unique(results@qresult[  ,c("variantid", "gene_symbols", "disease_name","score", "yearInitial", "yearFinal")] ) %>%  dplyr::arrange(desc(score), yearInitial, desc(yearFinal))

knitr::kable(tab[1:10,], caption = "Top variants associated to APP")

Table 30: Top variants associated to APP
variantid	gene_symbols	disease_name	score	yearInitial	yearFinal
rs63750264	APP	Alzheimer’s Disease	0.7	1991	2020
rs63750579	APP	Alzheimer’s Disease	0.6	1990	2020
rs63750066	APP	Alzheimer’s Disease	0.6	1992	2020
rs63750734	APP	Alzheimer’s Disease	0.6	1993	2020
rs193922916	APP	Alzheimer’s Disease	0.6	1993	2020
rs63750579	APP	CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED	0.6	1990	2019
rs63749964	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	1991	2020
rs63750264	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	1991	2020
rs63750671	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	1992	2020
rs63751039	APP	ALZHEIMER DISEASE, FAMILIAL, 1	0.6	1992	2020

Visualizing the variant-disease associations retrieved for a gene

The results of querying DISGENET variant information with a gene can be visualized as a Variant-Disease Network (Figure 34) or, if the input is a list of genes, as a Variant-Disease Heatmap, by changing the type argument from Network to Heatmap. The genes can be shown by setting the showGenes argument to TRUE.

plot( results,     
      type = "Network", interactive =TRUE)

Figure 34: The Variant-Disease Network for a gene

Searching by variant and chemical

results <- variant2disease( variant   = "rs121434568",
                          database = "TEXTMINING_HUMAN",
                          chemical = "CHEMBL_CHEMBL1173655")
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:        rs121434568 
##  . Results:  12

Table 31 shows the VDAs associated to rs121434568 and afatinib.

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))

knitr::kable(tab[1:10,], caption = "VDAs associated to rs121434568 and afatinib")

Table 31: VDAs associated to rs121434568 and afatinib
variantid	disease_name	chemical_name	score
rs121434568	Adenocarcinoma of lung (disorder)	Afatinib	0.7
rs121434568	Lung Neoplasms	Afatinib	0.3
rs121434568	Non-Small Cell Lung Carcinoma	Afatinib	0.3
rs121434568	Malignant neoplasm of lung	Afatinib	0.3
rs121434568	Metastatic malignant neoplasm to brain	Afatinib	0.3
rs121434568	Advanced Lung Adenocarcinoma	Afatinib	0.3
rs121434568	Adenocarcinoma of lung, stage IV	Afatinib	0.2
rs121434568	Metastatic Lung Adenocarcinoma	Afatinib	0.2
rs121434568	Metastatic Malignant Neoplasm to the Leptomeninges	Afatinib	0.2
rs121434568	Metastatic non-small cell lung cancer	Afatinib	0.2

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 35: VDAs associated to rs121434568 and afatinib

Retrieving the chemicals associated to a variant

The variant2chemical function retrieves the chemicals associated to a variant.

results <- variant2chemical( variant =  "rs1801133",
                          database = "TEXTMINING_HUMAN" , score = c(0.3,1))
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-chemical 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0.3-1 
##  . Term:        rs1801133 
##  . Results:  119

tab <- results@qresult
tab <-tab%>% dplyr::select( disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab[1:10, ] %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                        `Chemical Effect`=chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))

tab %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Chemicals associated to rs1801133" )

Table 32: Chemicals associated to rs1801133
Disease	Chemical	Chemical Effect	Sentence	pmid	Year
Multiple Sclerosis	Homocysteine	other|therapeutic|other|other	The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS.	38648773	2024
Multiple Sclerosis	VITAMIN B12	other|therapeutic|other|other	The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS.	38648773	2024
Schizophrenia	Homocysteine	other	In this study, we hypothesized that MTHFR C677T polymorphism and homocysteine concentration may play important roles in the development of depressive symptoms in schizophrenia.	32379616	2020
Schizophrenia	alpha-Linolenic acid	other	Our results demonstrated no significant differences in MTHFR Ala222Val genotype and allele distributions between the SCZ patients and controls (p > 0.05), but showed a statistical significance in the distribution of Ala/Val genotype between suicide attempters and non-attempters (p < 0.05).	32193498	2020
Schizophrenia	Homocysteine	other	Previous studies suggest that elevated total homocysteine levels and the methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism, which correlates with plasma total homocysteine levels, are risk factors for schizophrenia (SCZ).	27810229	2016
Schizophrenia	Homocysteine	other	The aim was to detect a serum level of Hcy, examine the associations between the level of Hcy, methylenetetrahydrofolate reductase (MTHFR) gene C677T polymorphism and clinical properties for patients with schizophrenia, mood disorders and in a control group.	23586533	2014
Schizophrenia	Homocysteine	other	Previous studies suggest that elevated blood homocysteine levels and the methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism are risk factors for schizophrenia.	24535549	2014
Schizophrenia	Homocysteine	other	The aim was to examine the serum levels of homocysteine (Hcy) and their associations with the methylenetetrahydrofolate reductase (MTHFR) gene C677T polymorphism in patients with schizophrenia and mood disorders as well as controls.	23091720	2012
Schizophrenia	Dopamine	other	A second polymorphism, methylenetetrahydrofolate reductase (MTHFR) 677C –> T (rs1801133), has been associated with overall schizophrenia risk and executive function impairment in patients, and may influence dopamine signaling through mechanisms upstream of COMT effects.	18988738	2008
Schizophrenia	Homocysteine	other	The elevated risk of schizophrenia associated with the homozygous genotype of the MTHFR 677C>T polymorphism provides support for causality between a disturbed homocysteine metabolism and risk of schizophrenia.	16172608	2006

To visualize the results use the plot function.

plot(results, 
     type="Network",   
     interactive=T, limit=50)

Figure 36: Chemicals associated to rs1801133

Retrieving associations involving Chemicals from DISGENET

Retrieving genes, variants, and diseases associated to chemicals

The chemical2gene function retrieves the genes associated with a specific chemical, or a list of chemicals.

results <- chemical2gene( chemical  = "CHEMBL_CHEMBL1009" , database = "ALL" , n_pags = 5)

## Notice that your query has a maximum of 16 pages.
## By indicating n_pags = 5, your query of 16 pages has been reduced to 5 pages.

## Warning in chemical2gene(chemical = "CHEMBL_CHEMBL1009", database = "ALL", : 
##  One or more chemicals in the list is not in DISGENET ( 'ALL' ):
##    - CHEMBL_CHEMBL1009

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gene 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  88

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol,gene_type , chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Genes associated to levodopa")

Table 33: Genes associated to levodopa
gene_symbol	gene_type	chemical_name	pmids_chemical
COMT	protein-coding	Levodopa	55
DDC	protein-coding	Levodopa	30
GH1	protein-coding	Levodopa	24
SLC6A3	protein-coding	Levodopa	20
MAOB	protein-coding	Levodopa	18
PRKN	protein-coding	Levodopa	18
DRD2	protein-coding	Levodopa	17
GCH1	protein-coding	Levodopa	15
TH	protein-coding	Levodopa	14
SNCA	protein-coding	Levodopa	13

The results can be visualized as a Chemical-Gene Network (Figure 37).

plot( results,
      type = "Network", interactive=T)

Figure 37: The Chemical-Gene Network for a single chemical

The chemical2disease function retrieves the diseases associated with a specific chemical, or a list of chemicals. The information can be extracted from GDAs or VDAs; use the type parameter to specify which.

results <- chemical2disease( chemical  = "CHEMBL_CHEMBL1009" , type = "GDA", database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  230

tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Diseases associated to levodopa, type GDA", align= "lllc")

Table 34: Diseases associated to levodopa, type GDA
diseaseid	disease_name	chemical_name	pmids_chemical
C0030567	Parkinson Disease	Levodopa	324
C0013384	Dyskinetic syndrome	Levodopa	195
C0242422	Parkinsonian Disorders	Levodopa	71
C0030567	Parkinson Disease	Dopamine	49
C0393593	Dystonia Disorders	Levodopa	27
C0013421	Dystonia	Levodopa	26
C0426980	motor symptom	Levodopa	16
C0013384	Dyskinetic syndrome	Dopamine	15
C0013384	Dyskinetic syndrome	OXIDOPAMINE	15
C0030567	Parkinson Disease	Carbidopa	14
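Table 34 also contains rows whose chemical_name is not Levodopa (for example Dopamine and Carbidopa). If only the rows naming one chemical are of interest, the result table can be filtered locally. The following is a minimal base R sketch on a toy data frame that copies four rows from Table 34; the object names toy and levodopa_only are illustrative and not part of disgenet2r.

```r
# Toy data frame with four rows copied from Table 34 (illustration only).
toy <- data.frame(
  diseaseid      = c("C0030567", "C0013384", "C0030567", "C0030567"),
  disease_name   = c("Parkinson Disease", "Dyskinetic syndrome",
                     "Parkinson Disease", "Parkinson Disease"),
  chemical_name  = c("Levodopa", "Levodopa", "Dopamine", "Carbidopa"),
  pmids_chemical = c(324, 195, 49, 14),
  stringsAsFactors = FALSE
)

# Keep only the rows that name the chemical of interest,
# then sort by the number of supporting publications (descending).
levodopa_only <- toy[toy$chemical_name == "Levodopa", ]
levodopa_only <- levodopa_only[order(-levodopa_only$pmids_chemical), ]
levodopa_only
```

With a real query, the same filter would be applied to results@qresult instead of the toy data frame.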

plot( results,
      type = "Network",
      interactive=T)

Figure 38: The Chemical-Disease Network for a chemical

A DiseaseClass plot is also available.

plot( results,
      type = "Network",
      class = "DiseaseClass",
      interactive=T)

Figure 39: The Chemical-Disease Class Network for a chemical

For VDAs, set type = "VDA":

results <- chemical2disease( chemical  = "CHEMBL_CHEMBL1282" , type = "VDA", database =  "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-disease 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  5

tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical)  %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Diseases associated to imiquimod, type VDA",  align= "lllc")

Table 35: Diseases associated to imiquimod, type VDA
diseaseid	disease_name	chemical_name	pmids_chemical
C4721806	Skin Basal Cell Carcinoma	Imiquimod	2
C0025202	melanoma	Imiquimod	1
C0151779	Cutaneous Melanoma	Imiquimod	1
C0524910	Hepatitis C, Chronic	Ribavirin	1
C0524910	Hepatitis C, Chronic	Polyox WSR-N 60	1
C0524910	Hepatitis C, Chronic	Imiquimod	1
C0596263	Carcinogenesis	Imiquimod	1

plot( results,
      type = "Network", interactive=T)

Figure 40: The Chemical-Disease Network for a chemical

The chemical2variant function retrieves the variants associated with a specific chemical, or a list of chemicals.

results <- chemical2variant( chemical  = "CHEMBL_CHEMBL108", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  43

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, most_severe_consequence, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")

Table 36: VDAs for carbamazepine
variantid	gene_symbols	most_severe_consequence	chemical_name	pmids_chemical
rs3812718	SCN1A	splice_donor_5th_base_variant	Carbamazepine	8
rs776746	ZSCAN25, CYP3A5	splice_acceptor_variant	Carbamazepine	6
rs1045642	ABCB1	missense_variant	Carbamazepine	5
rs1801133	MTHFR	missense_variant	Carbamazepine	4
rs2298771	SCN1A , LOC102724058	missense_variant	Carbamazepine	4
rs2032582	ABCB1	missense_variant	Carbamazepine	3
rs1051740	EPHX1	missense_variant	Carbamazepine	2
rs1389503611	EPHX1	missense_variant	Carbamazepine	2
rs15524	ZSCAN25, CYP3A5	3_prime_UTR_variant	Carbamazepine	2
rs1801131	MTHFR	missense_variant	Carbamazepine	2

The chemical2variant function also accepts the sift and polyphen parameters, each given as a range of scores, to restrict the results to variants predicted as probably deleterious.

results <- chemical2variant( chemical  = "CHEMBL_CHEMBL108", database = "ALL", sift = c(0,0.05), polyphen = c(0.9,1)  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-variant 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  9

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, sift_score, polyphen_score, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")

Table 37: VDAs for carbamazepine
variantid	gene_symbols	sift_score	polyphen_score	chemical_name	pmids_chemical
rs1045642	ABCB1	0.02	0.998	Carbamazepine	5
rs1051740	EPHX1	0.00	0.987	Carbamazepine	2
rs1389503611	EPHX1	0.01	0.995	Carbamazepine	2
rs762468188	TMEM63A, EPHX1	0.00	1.000	Carbamazepine	2
rs1045642	ABCB1	0.02	0.998	Phenytoin	1
rs1051740	EPHX1	0.00	0.987	CARBAMAZEPINE EPOXIDE	1
rs121912438	SOD1	0.00	0.967	Sod	1
rs121912438	SOD1	0.00	0.967	Carbamazepine	1
rs1553491169	SCN9A , SCN1A-AS1	0.00	0.956	Carbamazepine	1
rs1555085798	KCNA1	0.00	1.000	Carbamazepine	1

plot( results,
      type = "Network", interactive=T)

Figure 41: The Chemical-Variant Network for carbamazepine
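To make the effect of the sift and polyphen ranges concrete, the sketch below applies the same two ranges, c(0, 0.05) and c(0.9, 1), to a toy data frame in base R. The variant identifiers and scores are invented, the helper in_range is hypothetical, and treating both bounds as inclusive is an assumption; the package applies its filter on the server side and may behave differently at the boundaries.

```r
# Toy variants with invented scores (not DISGENET data).
variants <- data.frame(
  variantid      = c("rsA", "rsB", "rsC", "rsD"),
  sift_score     = c(0.02, 0.00, 0.40, 0.01),
  polyphen_score = c(0.998, 0.50, 0.95, 0.92),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where x lies inside [range[1], range[2]].
# Inclusive bounds are an assumption made for this illustration.
in_range <- function(x, range) {
  !is.na(x) & x >= range[1] & x <= range[2]
}

# A variant is kept only if BOTH scores fall inside their range:
# low SIFT (deleterious) and high PolyPhen (probably damaging).
keep <- in_range(variants$sift_score, c(0, 0.05)) &
        in_range(variants$polyphen_score, c(0.9, 1))
deleterious <- variants[keep, ]
deleterious
```

Here rsB fails the PolyPhen range and rsC fails the SIFT range, so only rsA and rsD remain.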

Retrieving GDAs and VDAs associated to chemicals

Exploring the GDAs of a chemical

The chemical2gda function retrieves the GDAs for a specific chemical, or a list of chemicals.

results <- chemical2gda( chemical  = "CHEMBL_CHEMBL809", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  2003

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score, pmids_chemical)
knitr::kable(tab[1:10,], caption = "GDAs for sertraline")

Table 38: GDAs for sertraline
gene_symbol	disease_name	chemical_name	score	pmids_chemical
SLC6A4	Depressive disorder	Serotonin	1	90
SLC6A4	Depressive disorder	demeton-S-methyl	1	5
SLC6A4	Depressive disorder	Cyclohexyl isocyanate	1	2
SLC6A4	Depressive disorder	Norepinephrine	1	6
SLC6A4	Depressive disorder	Mirtazapine	1	1
SLC6A4	Depressive disorder	PAROXETINE HYDROCHLORIDE	1	1
SLC6A4	Depressive disorder	Interferon alfa	1	3
SLC6A4	Depressive disorder	Hydrocortisone	1	4
SLC6A4	Depressive disorder	[3H]CITALOPRAM	1	1
SLC6A4	Depressive disorder	Ethanol	1	2

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 42: Network for LEPR and metformin

Exploring the VDAs of a chemical

The chemical2vda function retrieves the VDAs for a specific chemical, or a list of chemicals.

results <- chemical2vda( chemical  = "CHEMBL_CHEMBL2010601", database = "ALL"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  56

The chemical2vda function also accepts the sift and polyphen parameters, each given as a range of scores, to restrict the results to variants predicted as probably deleterious.

results <- chemical2vda( chemical  = "CHEMBL_CHEMBL2010601", 
                         database = "ALL", 
                         sift = c(0,0.05) , polyphen = c(0.9,1)  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  46

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score,pmids_chemical)
knitr::kable(tab[1:10,], caption = "VDAs associated with ivacaftor")

Table 39: VDAs associated with ivacaftor
variantid	disease_name	chemical_name	score	pmids_chemical
rs75527207	Cystic Fibrosis	Aztreonam	0.9	1
rs75527207	Cystic Fibrosis	Tobramycin	0.9	1
rs75527207	Cystic Fibrosis	Ivacaftor	0.9	37
rs75527207	Cystic Fibrosis	Colistin	0.9	1
rs75527207	Cystic Fibrosis	Mannitol	0.9	1
rs75527207	Cystic Fibrosis	Chloride ion	0.9	6
rs75527207	Cystic Fibrosis	Genistein	0.9	3
rs75527207	Cystic Fibrosis	Isoprenaline	0.9	1
rs75527207	Cystic Fibrosis	Elexacaftor	0.9	2
rs75527207	Cystic Fibrosis	Resveratrol	0.9	1

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 43: Network of VDAs

Exploring the GDA evidences of a chemical

The chemical2evidence function retrieves the evidence supporting the GDAs or VDAs for a specific chemical, or a list of chemicals.

results <- chemical2evidence( chemical  = "CHEMBL_CHEMBL3989936", type = "GDA" , database = "ALL" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-gda 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:         
##  . Results:  9

tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Disease = disease_name, Chemical = chemical_name,  `Chemical Effect` =chemical_effect,    Year=pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
head(tab, 10) %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for Vilaprisan" )

Table 40: Evidences for Vilaprisan
Gene	Disease	Chemical	Sentence	Chemical Effect	pmid	Year
PGR	Endometriosis	Vilaprisan	Vilaprisan is a highly potent selective progesterone receptor modulator in development for the treatment of symptomatic uterine fibroids and endometriosis.		34569009	2022
PGR	Endometriosis	Vilaprisan	Vilaprisan is a novel selective progesterone receptor modulator for the long-term treatment of uterine fibroids and endometriosis.		32716091	2021
PGR	Uterine Fibroids	Vilaprisan	Vilaprisan is a novel selective progesterone receptor modulator for the long-term treatment of uterine fibroids and endometriosis.		32716091	2021
PGR	Uterine Fibroids	Vilaprisan	Vilaprisan (VPR) is a new orally available selective progesterone receptor modulator (SPRM), with anti-proliferative activity against uterine fibroids (UFs).		31985366	2020
PGR	Renal Insufficiency	Vilaprisan	Pharmacokinetics and Safety of the Novel Selective Progesterone Receptor Modulator Vilaprisan in Participants With Renal Impairment.	other	32227643	2020
PGR	Uterine Fibroids	Mifepristone	Selective progesterone receptor modulators (SPRMs), such as Mifepristone, Asoprisnil, Ulipristal acetate (UPA) and Vilaprisan, were tested for their antiproliferative effects on uterine fibroids.		30845294	2018
PGR	Uterine Fibroids	Asoprisnil	Selective progesterone receptor modulators (SPRMs), such as Mifepristone, Asoprisnil, Ulipristal acetate (UPA) and Vilaprisan, were tested for their antiproliferative effects on uterine fibroids.		30845294	2018
PGR	Uterine Fibroids	ULIPRISTAL ACETATE	Selective progesterone receptor modulators (SPRMs), such as Mifepristone, Asoprisnil, Ulipristal acetate (UPA) and Vilaprisan, were tested for their antiproliferative effects on uterine fibroids.		30845294	2018
PGR	Uterine Fibroids	Vilaprisan	Selective progesterone receptor modulators (SPRMs), such as Mifepristone, Asoprisnil, Ulipristal acetate (UPA) and Vilaprisan, were tested for their antiproliferative effects on uterine fibroids.		30845294	2018

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 44: Chemicals associated to Parkinson

Exploring the VDA evidences of a chemical

results <- chemical2evidence( chemical  = "CHEMBL_CHEMBL502", type = "VDA" , database = "TEXTMINING_HUMAN" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical-vda 
##  . Database:     TEXTMINING_HUMAN 
##  . Score:        0-1 
##  . Term:         
##  . Results:  14

tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
                            `Chemical Effect` =chemical_effect,  Year=pmYear, Sentence = sentence, pmid = reference )
tab <- tab[ order(-tab$Year),]
tab[1:10,] %>%  dplyr::mutate(
    pmid = kableExtra::cell_spec(pmid,  link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>% 
  knitr::kable(format = 'markdown', row.names = F,  caption = "Evidences for Donepezil" )

Table 41: Evidences for Donepezil
variantid	Disease	Chemical	Sentence	Chemical Effect	pmid	Year
rs3793790	Alzheimer’s Disease	Donepezil	Association of CHAT Gene Polymorphism rs3793790 and rs2177370 with Donepezil Response and the Risk of Alzheimer’s Disease Continuum.	therapeutic	38894884	2024
rs2177370	Alzheimer’s Disease	Donepezil	Association of CHAT Gene Polymorphism rs3793790 and rs2177370 with Donepezil Response and the Risk of Alzheimer’s Disease Continuum.	therapeutic	38894884	2024
rs1080985	Alzheimer’s Disease	Donepezil	The CYP2D6 SNP rs1080985 might be a useful pharmacogenetic marker of the long-term therapeutic response to donepezil in patients with AD.	therapeutic	34120801	2022
rs1135840	Alzheimer’s Disease	Donepezil	Our results suggests that CYP2D6*10 strongly influences Cpss and there is a trend toward better outcomes of donepezil in patients with AD.	therapeutic	31564952	2019
rs1065852	Alzheimer’s Disease	Donepezil	Our results suggests that CYP2D6*10 strongly influences Cpss and there is a trend toward better outcomes of donepezil in patients with AD.	therapeutic	31564952	2019
rs1065852	Alzheimer’s Disease	Donepezil	The roles of apolipoprotein E3 and CYP2D6 (rs1065852) gene polymorphisms in the predictability of responses to individualized therapy with donepezil in Han Chinese patients with Alzheimer’s disease.	therapeutic	26768225	2016
rs1080985	Alzheimer’s Disease	Donepezil	Recent data have indicated that the rs1080985 single nucleotide polymorphism (SNP) of the cytochrome P450 (CYP) 2D6 and the common apolipoprotein E (APOE) gene may affect the response to donepezil in patients with Alzheimer’s disease (AD).	therapeutic	25538729	2014
rs1080985	Alzheimer’s Disease	Donepezil	Influence of rs1080985 single nucleotide polymorphism of the CYP2D6 gene on response to treatment with donepezil in patients with alzheimer’s disease.	therapeutic	23950644	2013
rs1065852	Alzheimer’s Disease	Donepezil	Effect of CYP2D6*10 and APOE polymorphisms on the efficacy of donepezil in patients with Alzheimer’s disease.	therapeutic	22986607	2013
rs1135840	Alzheimer’s Disease	Donepezil	Effect of CYP2D6*10 and APOE polymorphisms on the efficacy of donepezil in patients with Alzheimer’s disease.	therapeutic	22986607	2013
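The evidence tables above render the pmid column as a link with kableExtra::cell_spec. Where kableExtra is not available, a plain Markdown link can be built with base R. The sketch below uses three PMID and year pairs copied from Table 41; the data frame name evidence and the column name link are illustrative.

```r
# Three PMID/year pairs copied from Table 41 (illustration only).
evidence <- data.frame(
  pmid = c("23950644", "38894884", "34120801"),
  Year = c(2013, 2024, 2022),
  stringsAsFactors = FALSE
)

# Most recent publications first, as in the tables above.
evidence <- evidence[order(-evidence$Year), ]

# Markdown link of the form [PMID](https://pubmed.ncbi.nlm.nih.gov/PMID).
evidence$link <- sprintf(
  "[%s](https://pubmed.ncbi.nlm.nih.gov/%s)",
  evidence$pmid, evidence$pmid
)
evidence
```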

To visualize the results use the plot function.

plot(results, type="Network",   interactive=T, limit=50)

Figure 45: Evidence network

Exploring the attributes of a chemical

The chemical2attribute function retrieves the attributes of a specific chemical, or a list of chemicals, such as its cross-references to other vocabularies.

results <- chemical2attribute( chemical  = "CHEMBL_CHEMBL25"  )

## Warning: Unknown or uninitialised column: `chemicalid`.
## Unknown or uninitialised column: `chemicalid`.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        chemical 
##  . Database:     ALL 
##  . Score:         
##  . Term:         
##  . Results:  5

tab <- results@qresult %>% dplyr::select(chemID, chemVocabulariesCrossreferences, chemPrefName)
knitr::kable(tab, caption = "Attributes for Acetylsalicylic acid")

Table 42: Attributes for Acetylsalicylic acid
chemID	chemVocabulariesCrossreferences	chemPrefName
C-153605	CHEMBL_CHEMBL25	Acetylsalicylic acid
C-153605	CHEBI_15365	Acetylsalicylic acid
C-153605	DRUGBANK_DB00945	Acetylsalicylic acid
C-153605	MESH_D001241	Acetylsalicylic acid
C-153605	PUBCHEM_2244	Acetylsalicylic acid
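The cross-references in Table 42 follow a VOCABULARY_IDENTIFIER pattern, the same pattern used for the chemical argument (for example "CHEMBL_CHEMBL25"). A short base R sketch for splitting such strings at the first underscore is shown below; the variable names xrefs, vocabulary and identifier are illustrative, and the split rule is inferred from the five values in the table rather than from a documented specification.

```r
# Cross-references copied from Table 42.
xrefs <- c("CHEMBL_CHEMBL25", "CHEBI_15365", "DRUGBANK_DB00945",
           "MESH_D001241", "PUBCHEM_2244")

# Everything before the first underscore is taken as the vocabulary,
# everything after it as the identifier within that vocabulary.
vocabulary <- sub("_.*$", "", xrefs)
identifier <- sub("^[^_]*_", "", xrefs)

data.frame(vocabulary, identifier, stringsAsFactors = FALSE)
```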

Retrieving Disease-Disease Associations from DISGENET

The disgenet2r package can also retrieve the diseases that share genes or variants with a particular disease, or list of diseases (disease-disease associations, or DDAs).

Searching DDAs by genes for a single disease

To obtain disease-disease associations, use the disease2disease function. It takes as input a disease, in the same format as in disease2gene; the database in which to perform the search (by default, CURATED); and the argument relationship, which indicates the type of relationship of the disease pair. If relationship is set to “has_shared_genes”, further arguments can be defined, such as min_genes, the minimum number of genes shared with the disease(s) of interest, and jg, the Jaccard Index for genes. By default, min_genes = 0. If relationship is set to “has_shared_variants”, analogous arguments are available to filter the results of the search.

The output is a DataGeNET.DGN object that contains the top diseases that share genes with the disease that has been searched.

The DataGeNET.DGN object produced by the disease2disease function also contains the Jaccard Index, also known as the Jaccard similarity coefficient, for each disease pair. The Jaccard Index is a similarity metric, computed as the size of the intersection divided by the size of the union of two sets, in this case the genes associated with each disease:

\[ J(A, B) = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert} \]

We calculate a p value to estimate the significance of the Jaccard coefficient for a list of disease pairs. The p value is estimated using a Fisher exact test. The pvalue column displays the minus logarithm of the p value for the Jaccard Index, and is available for disease-disease associations by shared genes and by shared variants.
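As an illustration of these two quantities, the base R sketch below computes the Jaccard Index and a one-sided Fisher exact test for two toy gene sets. It is not the DISGENET implementation: the gene sets, the background size of 20000 genes, the one-sided alternative, and the use of base 10 for the negative logarithm are all assumptions made for the example.

```r
# Toy gene sets (illustrative, not taken from DISGENET).
genes_a <- c("CFTR", "TGFB1", "SCNN1A", "MBL2", "SERPINA1")
genes_b <- c("SERPINA1", "TGFB1", "MMP9", "TNF")
universe_size <- 20000  # assumed number of genes in the background

# Jaccard Index: size of the intersection over size of the union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}
j <- jaccard(genes_a, genes_b)

# 2 x 2 contingency table of gene membership in the two sets.
shared  <- length(intersect(genes_a, genes_b))
only_a  <- length(setdiff(genes_a, genes_b))
only_b  <- length(setdiff(genes_b, genes_a))
neither <- universe_size - shared - only_a - only_b
counts  <- matrix(c(shared, only_a, only_b, neither), nrow = 2)

# One-sided Fisher exact test: is the overlap larger than expected?
p_value <- fisher.test(counts, alternative = "greater")$p.value

# Negative logarithm of the p value (base 10 is an assumption here).
neg_log_p <- -log10(p_value)
c(jaccard = j, neg_log_p = neg_log_p)
```

The two sets share 2 genes out of 7 distinct genes, so the Jaccard Index is 2/7.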

results <- disease2disease(
  disease  = "UMLS_C0010674", relationship = "has_shared_genes",
  database = "CURATED" ,   min_genes =2 )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:        UMLS_C0010674 
##  . Results:  11

Table 43 shows the diseases that share at least two genes (min_genes = 2) with Cystic Fibrosis (UMLS_C0010674) in DISGENET CURATED.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share genes with Cystic Fibrosis")

Table 43: Diseases that share genes with Cystic Fibrosis
disease1_Name	disease2_Name	jaccard_genes	shared_genes	pvalue_jaccard_genes
Cystic Fibrosis	COPD	0.11724	17	22.4
Cystic Fibrosis	BESC1	0.13793	8	19.2
Cystic Fibrosis	SYSTEMIC LUPUS ERYTHEMATOSIS	0.08589	14	16.3
Cystic Fibrosis	CBAVD	0.11864	7	15.8
Cystic Fibrosis	Hereditary pancreatitis	0.12308	8	15.4
Cystic Fibrosis	High blood pressure	0.04971	17	14.4
Cystic Fibrosis	Alzheimer Disease	0.05534	14	12.8
Cystic Fibrosis	Adult-Onset Diabetes Mellitus	0.04043	15	11.3
Cystic Fibrosis	Obstructive azoospermia	0.05085	3	6.5
Cystic Fibrosis	Cardiomyopathy	0.02952	8	5.4

Visualizing the diseases associated to a single disease

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share genes with it (Figure 46). The node size is proportional to the number of genes associated to each disease.

plot( results, 
      type = "Network",
      interactive=T )

Figure 46: The Disease-Disease Network by shared genes for Cystic Fibrosis

Searching DDAs via genes for multiple diseases

The disease2disease function can also take as input a list of diseases in any of the previously described vocabularies. It retrieves the top diseases that share genes with each of the diseases in the input list.

Table 44 shows the disease list selected to illustrate the disease2disease function.

Table 44: Examples of Congenital metabolic diseases
UMLS_CUI	Disease_Name
C0162671	MELAS Syndrome
C0023264	Leigh Disease
C0917796	Optic Atrophy, Hereditary, Leber

diseasesOfInterest <-  paste0("UMLS_", c("C0162671", "C0023264", "C0917796"))
results <- disease2disease(
              disease = diseasesOfInterest, relationship = "has_shared_genes",
              database = "CURATED",
              min_genes  = 20, 
              order_by = "JACCARD_GENES" )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0162671 ... UMLS_C0917796 
##  . Results:  35

Table 45 shows the diseases that share at least 20 genes with the diseases of interest.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share at least 20 genes with the diseases of interest")

Table 45: Diseases that share at least 20 genes with the diseases of interest
disease1_Name	disease2_Name	jaccard_genes	shared_genes	pvalue_jaccard_genes
Leber’s optic atrophy	MELAS Syndrome	0.62963	34	84
MELAS Syndrome	Leber’s optic atrophy	0.62963	34	84
Encephalomyelopathies, Subacute Necrotizing	Mitochondrial Diseases	0.23741	66	83
MELAS Syndrome	Mitochondrial Diseases	0.20652	38	69
Leber’s optic atrophy	MC5DM1	0.55319	26	68
Leber’s optic atrophy	NEUROPATHY, ATAXIA, AND RETINITIS PIGMENTOSA	0.55319	26	68
Leber’s optic atrophy	Camptodactyly of proximal interphalangeal joint	0.54167	26	66
Leber’s optic atrophy	Wide spaced nipples (finding)	0.50980	26	63
Leber’s optic atrophy	Scrotal hypoplasia	0.50980	26	63
Leber’s optic atrophy	postaxial polydactyly hands (physical finding)	0.50000	26	63
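Table 45 lists the Leber’s optic atrophy and MELAS Syndrome pair twice, once in each direction, because both diseases are in the query list. If each unordered pair should appear only once, the rows can be deduplicated locally. The sketch below shows one way to do this in base R on a toy data frame with three rows copied from Table 45; the names pairs, pair_key and unique_pairs are illustrative and not part of disgenet2r.

```r
# Three rows copied from Table 45 (illustration only).
pairs <- data.frame(
  disease1_Name = c("Leber's optic atrophy", "MELAS Syndrome",
                    "MELAS Syndrome"),
  disease2_Name = c("MELAS Syndrome", "Leber's optic atrophy",
                    "Mitochondrial Diseases"),
  shared_genes  = c(34, 34, 38),
  stringsAsFactors = FALSE
)

# Build an order-independent key: the alphabetically smaller name first.
pair_key <- paste(
  pmin(pairs$disease1_Name, pairs$disease2_Name),
  pmax(pairs$disease1_Name, pairs$disease2_Name),
  sep = " | "
)

# Keep the first occurrence of every unordered pair.
unique_pairs <- pairs[!duplicated(pair_key), ]
unique_pairs
```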

To obtain the network, set the type argument of the plot function to “Network” (Figure 47). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them. The edge width is proportional to the number of genes shared between the diseases that the edge connects.

plot( results,
      type = "Network",
      interactive=TRUE)

Figure 47: The Disease-Disease Network by shared genes for a list of diseases

Searching DDAs via shared variants for a single disease

To obtain disease-disease associations via shared genetic variants, use the disease2disease function with the argument relationship set to “has_shared_variants”, the database in which to perform the search (by default, CURATED), and the argument min_vars, the minimum number of variants shared with the disease(s) of interest. By default, min_vars = 0. The output is a DataGeNET.DGN object that contains the top diseases that share variants with the disease that has been searched.
In the example, we have specified a minimum value for the Jaccard Index computed from the shared variants (jv = 0.01).

results <- disease2disease(
  disease  = c("UMLS_C0011860", "UMLS_C0028754"),relationship = "has_shared_variants",
  database = "CURATED", jv = 0.01 )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-variant 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0028754 
##  . Results:  34

Table 46 shows the top diseases that share variants with Obesity and NIDDM.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","jaccard_variants","shared_variants", "pvalue_jaccard_variants")] )
tab <- tab[ order(-tab$shared_variants),]

knitr::kable(tab[1:10,], caption = "Top diseases that share variants with Obesity and NIDDM", row.names = F)

Table 46: Top diseases that share variants with Obesity and NIDDM
disease1_Name	disease2_Name	jaccard_variants	shared_variants	pvalue_jaccard_variants
Adult-Onset Diabetes Mellitus	WOLFRAM SYNDROME 1	0.04687	170	315
Adult-Onset Diabetes Mellitus	DFNA38	0.04508	163	305
Adult-Onset Diabetes Mellitus	WOLFRAM-LIKE SYNDROME, AUTOSOMAL DOMINANT	0.04522	160	330
Adult-Onset Diabetes Mellitus	HYPERINSULINEMIC HYPOGLYCEMIA, FAMILIAL, 1	0.04250	160	254
Adult-Onset Diabetes Mellitus	CTRCT41	0.04509	159	330
Adult-Onset Diabetes Mellitus	Decreased HDL	0.01764	150	63
Adult-Onset Diabetes Mellitus	Maturity onset diabetes mellitus in young	0.02786	124	123
Adult-Onset Diabetes Mellitus	DIABETES MELLITUS, TRANSIENT NEONATAL, 2	0.02612	93	182
Adult-Onset Diabetes Mellitus	NAFLD - Nonalcoholic Fatty Liver Disease	0.02405	88	139
Adult-Onset Diabetes Mellitus	HYPOGLYCEMIA, LEUCINE-INDUCED	0.02497	88	201

The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in light blue are the diseases that share variants with it (Figure 48). The node size is proportional to the number of variants associated to each disease.

plot( results, 
      type = "Network",
       interactive=T )

Figure 48: The Disease-Disease Network by shared variants

Searching DDAs via semantic relationships

To obtain disease-disease associations via semantic relationships, use the disease2disease function with the argument relationship equal to one of the following types of semantic relations: has_manifestation, has_associated_morphology, manifestation_of, associated_morphology_of, is_finding_of_disease, due_to, has_definitional_manifestation, has_associated_finding, definitional_manifestation_of, disease_has_finding, cause_of, associated_finding_of.

The output is a DataGeNET.DGN object that contains the diseases that have the type of relationship defined in the query with the query disease.
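Because the relationship argument accepts many different values, it can be convenient to check a value before sending a query. The helper below is a hypothetical base R sketch, not a function of disgenet2r; its list of allowed values is copied from this section and may not match what the installed package version accepts.

```r
# Semantic relation types as listed in this section.
semantic_relations <- c(
  "has_manifestation", "has_associated_morphology", "manifestation_of",
  "associated_morphology_of", "is_finding_of_disease", "due_to",
  "has_definitional_manifestation", "has_associated_finding",
  "definitional_manifestation_of", "disease_has_finding", "cause_of",
  "associated_finding_of"
)

# Hypothetical helper: stops with an informative message when the value
# is not one of the relationship types described in this document.
check_relationship <- function(relationship) {
  allowed <- c("has_shared_genes", "has_shared_variants", semantic_relations)
  if (!is.character(relationship) || length(relationship) != 1 ||
      !relationship %in% allowed) {
    stop("Unknown relationship. Expected one of: ",
         paste(allowed, collapse = ", "))
  }
  invisible(relationship)
}

# A valid value is returned unchanged; an invalid one raises an error.
ok  <- check_relationship("has_manifestation")
bad <- try(check_relationship("causes"), silent = TRUE)
```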

results <- disease2disease(
  disease  = c("UMLS_C0011860", "UMLS_C0028754"),relationship = "has_manifestation", min_sokal = 0.7, order_by = "SOKAL",
  database = "CURATED"  )
results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-disease-rela 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       UMLS_C0011860 ... UMLS_C0028754 
##  . Results:  20

Table 47 shows the diseases associated with Obesity and non-insulin-dependent Diabetes Mellitus (NIDDM) by the relation type “has_manifestation”.

tab <- unique(results@qresult[  ,c("disease1_Name", "disease2_Name","ddaRelation","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab , caption = "Diseases associated with Obesity and NIDDM")

Table 47: Diseases associated with Obesity and NIDDM
disease1_Name	disease2_Name	ddaRelation	shared_genes	pvalue_jaccard_genes
Adult-Onset Diabetes Mellitus	KERATODERMA-ICHTHYOSIS-DEAFNESS SYNDROME, AUTOSOMAL RECESSIVE	has_manifestation	2	2.77
Obesity	OBESITY, HYPERPHAGIA, AND DEVELOPMENTAL DELAY	has_manifestation	1	1.66
Obesity	PHP1C	has_manifestation	1	1.66
Obesity	BARDET-BIEDL SYNDROME 18	has_manifestation	1	1.66
Obesity	Bardet-Biedl syndrome 4	has_manifestation	1	1.66
Obesity	SBIDDS	has_manifestation	1	1.66
Obesity	Pseudo Pseudohypoparathyroidism	has_manifestation	1	1.66
Obesity	CHOPS SYNDROME	has_manifestation	1	1.66
Adult-Onset Diabetes Mellitus	MODY, TYPE 13	has_manifestation	1	1.62
Obesity	BBS1	has_manifestation	2	1.44
Obesity	PSEUDOHYPOPARATHYROIDISM, TYPE IA	has_manifestation	1	1.36
Obesity	PWLS	has_manifestation	1	1.36
Obesity	HYPOGONADOTROPIC HYPOGONADISM 27 WITHOUT ANOSMIA	has_manifestation	1	1.36
Obesity	CORTRD2	has_manifestation	1	1.36
Adult-Onset Diabetes Mellitus	IDDHH	has_manifestation	1	1.32
Obesity	BARDET-BIEDL SYNDROME 6	has_manifestation	1	1.19
Obesity	Bardet-Biedl syndrome 2	has_manifestation	1	1.19
Obesity	WAGR Syndrome	has_manifestation	1	0.90
Obesity	9q- Syndrome	has_manifestation	1	0.84
Obesity	DiGeorge’s syndrome	has_manifestation	1	0.46

Searching diseases similar to a disease of interest

The get_similar_diseases function retrieves the diseases most similar to a disease of interest according to the Sokal-Sneath semantic similarity distance. The similarity between disease concepts is computed using the Sokal-Sneath semantic similarity distance (Sánchez and Batet 2011) on the taxonomic relations provided by the Unified Medical Language System Metathesaurus. Only relationships of type is-a (which describe the taxonomy in any ontology) are taken into account. The get_similar_diseases function takes as input a disease and, as an optional argument, min_sokal, a minimum value for the Sokal distance. By default, min_sokal = 0.1.

results <- get_similar_diseases(
  disease  = "UMLS_C0011860",
    min_sokal = 0.6)
results

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-sokal 
##  . Database:     ALL 
##  . Score:         
##  . Term:        UMLS_C0011860 
##  . Results:  143

Table 48 shows the diseases most similar to the query disease, ranked by Sokal distance.

tab <- unique(results@qresult[, c("disease1_Name", "disease2_Name", "sokal")])
knitr::kable(tab[1:10,], caption = "Diseases semantically similar to NIDDM")

Table 48: Diseases semantically similar to NIDDM
disease1_Name	disease2_Name	sokal
Adult-Onset Diabetes Mellitus	Diabetes Mellitus	0.830
Adult-Onset Diabetes Mellitus	Glucose Intolerance	0.821
Adult-Onset Diabetes Mellitus	Diabetes Mellitus, Insulin-Dependent	0.706
Adult-Onset Diabetes Mellitus	Hyperglycemia	0.695
Adult-Onset Diabetes Mellitus	Diabetic Retinopathies	0.687
Adult-Onset Diabetes Mellitus	Diabetic Nephropathies	0.685
Adult-Onset Diabetes Mellitus	Diabetes, Gestational	0.684
Adult-Onset Diabetes Mellitus	Syndrome X, Reaven	0.677
Adult-Onset Diabetes Mellitus	Prediabetic State	0.677
Adult-Onset Diabetes Mellitus	Insulin Resistance	0.668
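
To make the idea of a similarity computed over is-a relations more concrete, the following is a minimal sketch in base R. It assumes one common set-based form of the Sokal-Sneath coefficient, a / (a + 2(b + c)), applied to the ancestor sets of two concepts. The toy taxonomy, the concept labels and the helper functions ancestors and sokal_sneath are hypothetical and are not part of disgenet2r. The exact formula used by DISGENET is not given here, so the values will not reproduce those in Table 48.

```r
# Toy is-a taxonomy (hypothetical): each concept maps to its direct parents.
parents <- list(
  disease           = character(0),
  metabolic_disease = "disease",
  diabetes          = "metabolic_disease",
  type2_diabetes    = "diabetes",
  type1_diabetes    = "diabetes",
  eye_disease       = "disease",
  retinopathy       = "eye_disease"
)

# Collect a concept together with all of its ancestors by following is-a links.
ancestors <- function(concept, parents) {
  seen  <- character(0)
  queue <- concept
  while (length(queue) > 0) {
    current <- queue[1]
    queue   <- queue[-1]
    if (!(current %in% seen)) {
      seen  <- c(seen, current)
      queue <- c(queue, parents[[current]])
    }
  }
  seen
}

# Assumed form of the Sokal-Sneath coefficient on two sets:
# shared / (shared + 2 * (only in x + only in y)).
sokal_sneath <- function(x, y) {
  n_shared <- length(intersect(x, y))
  n_only_x <- length(setdiff(x, y))
  n_only_y <- length(setdiff(y, x))
  n_shared / (n_shared + 2 * (n_only_x + n_only_y))
}

t2d <- ancestors("type2_diabetes", parents)
sim_t1d <- sokal_sneath(t2d, ancestors("type1_diabetes", parents))
sim_ret <- sokal_sneath(t2d, ancestors("retinopathy", parents))
c(type1_diabetes = sim_t1d, retinopathy = sim_ret)
```

In this toy taxonomy, the sibling concept shares more ancestors with the query than the concept from a different branch, so it receives the higher score.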

Disease enrichment

The disease_enrichment function performs a disease enrichment (or over-representation) analysis. It determines whether a user-defined set of genes is statistically significantly associated with a disease gene set in DISGENET.

The function takes as input a list of entities, either genes or variants. They are compared against the gene-disease or variant-disease associations in the selected database to determine the diseases associated with the given list. Genes can be identified with HGNC, ENSEMBL or Entrez identifiers.

The database parameter allows users to choose which data source to use: CURATED for curated gene-disease associations (the default option), CLINICALTRIALS for associations extracted from ClinicalTrials.gov, or ALL to include all available databases. The number of genes (or variants, for a variant list) in the selected data source is used as the background, or universe, of the over-representation test.

The common_entities parameter sets the minimum number of entities that must be shared with a disease for it to be considered in the analysis; the default is 1. The max_pvalue parameter sets a threshold for the p-value from the Fisher test (default is 0.05).
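
The over-representation test behind these parameters can be illustrated with base R's fisher.test. This is a minimal sketch, not the package's internal code. It assumes a one-sided test for enrichment and that every query gene belongs to the background. The counts are illustrative and follow the geneRatio and bgRatio format reported in the results (19 of 58 query genes and 194 of 13581 background genes annotated to a disease).

```r
# Illustrative counts in the geneRatio / bgRatio format.
in_list_in_disease <- 19     # query genes annotated to the disease
list_size          <- 58     # query genes found in the background
disease_size       <- 194    # background genes annotated to the disease
universe_size      <- 13581  # all genes in the selected data source

# 2 x 2 contingency table: query membership (rows) by disease annotation (columns).
contingency <- matrix(
  c(in_list_in_disease, list_size - in_list_in_disease,
    disease_size - in_list_in_disease,
    universe_size - list_size - (disease_size - in_list_in_disease)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_list", "not_in_list"),
                  c("in_disease", "not_in_disease"))
)

# One-sided Fisher exact test for over-representation (assumed alternative).
ft <- fisher.test(contingency, alternative = "greater")

# A disease would be reported only if its p-value passes the threshold.
max_pvalue <- 0.05
ft$p.value <= max_pvalue
```

With this alternative, the p-value equals the upper tail of the hypergeometric distribution, that is, the probability of observing at least 19 annotated genes when drawing 58 genes at random from the background.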

For genes

Below is an example of how to perform a disease enrichment with a list of genes associated with autism, extracted from the Developmental Brain Disorder Gene Database (Gonzalez-Mantilla et al. 2016).

genes <- c("ADNP", "ANKRD11", "ANKRD17",  "ASXL1",  "BCKDK",  "BRSK2",  "CDK13",  "CDK8",  "CHD2",  "CHD7",  "CHD8",  "CLCN2",  "CREBBP",  "CSDE1",  "CTCF",  "CTNNB1",  "DDX3X",  "FOXP1",  "GFER",  "H4C3",  "HNRNPUL2",  "IQSEC2",  "ITSN1",  "JARID2",  "LRP2",  "MARK2",  "MBOAT7",  "MYT1L",  "NAA15",  "NALCN",  "NAV3",  "NEXMIF" ,  "NSD1",  "PHF21A",  "POGZ",  "PRR12",  "QRICH1",  "SCAF1",  "SCN1A",  "SCN2A",  "SETD5",  "SHANK3",  "SIN3A",  "SOX11",  "SOX6",  "TANC2",  "TBCD",  "TCF20" ,  "TCF4",  "TCF7L2",  "TRAF7",  "TRIP12",  "WAC",  "WDR26",  "ZEB2",  "ZMYM2",  "ZNF292",  "ZSWIM6" )
results <- disease_enrichment(
  entities        = genes,
  common_entities = 5,
  vocabulary      = "HGNC",
  database        = "CURATED")

## Your query has 1 page.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       ADNP ... ZSWIM6

Table 49 shows the top diseases associated with the list of genes.

tab <- unique(results@qresult[, c("diseaseName", "geneRatio", "bgRatio", "pvalue")])
# head() avoids NA padding rows when fewer than 10 diseases pass the filters
knitr::kable(head(tab, 10), caption = "Diseases significantly associated with the list of genes")

Table 49: Diseases significantly associated with the list of genes
diseaseName	geneRatio	bgRatio	pvalue
Non-specific syndromic intellectual disability	19/58	194/13581	0.00e+00
Epilepsy	9/58	147/13581	2.00e-07
Rare genetic syndromic intellectual disability	5/58	51/13581	6.44e-05

To visualize the results of the enrichment, use the plot function. Use the cutoff argument to set the p-value threshold, and the limit argument to reduce the number of records shown (Figure 49). By default, limit = 50. The node size is proportional to the number of entities shared between the user list and the disease.

plot(results, type = "Enrichment", count = 4, cutoff = 0.05)

Figure 49: The Enrichment plot for a list of genes

For variants

Below is an example of how to perform a disease enrichment with a list of variants extracted from the publication "Genomic Landscape and Mutational Signatures of Deafness-Associated Genes" (Azaiez et al. 2018).

results <- disease_enrichment(
   entities  =  c("rs80338902","rs397516871","rs368341987","rs375050157","rs111033280","rs140884994","rs201076440","rs111033439","rs1296612982","rs41281314","rs397516875","rs143282422","rs142381713","rs35818432","rs111033225","rs200104362","rs201004645","rs34988750","rs373169422","rs397517356","rs188376296","rs199897298","rs200263980","rs200416912","rs184866544","rs397517344","rs41281310","rs727503066","rs727504710","rs143240767","rs145771342","rs376898963","rs397516878","rs181255269","rs188498736","rs111033192","rs117966637","rs914189193","rs181611778","rs111033194","rs111033248","rs111033262","rs111033333","rs111033529","rs146824138","rs483353055","rs528089082","rs747131589","rs111033536","rs45629132","rs371142158","rs727504654","rs192524347","rs527236122","rs111033186","rs111033287","rs139889944","rs200454015","rs397517328","rs111033275","rs150822759","rs200038092","rs201709513","rs370155266","rs45500891","rs111033196","rs111033360","rs397517322","rs111033524","rs727505166","rs79444516","rs35730265","rs45549044","rs111033361","rs370696868","rs727504309","rs533231493"),
    vocabulary = "DBSNP", database = "CURATED")

## Your query has 1 page.

results

## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-enrichment 
##  . Database:     CURATED 
##  . Score:         
##  . Term:       rs80338902 ... rs533231493

Table 50 shows the top diseases associated with the list of variants.

tab <- unique(results@qresult[, c("diseaseName", "variantRatio", "bgRatio", "pvalue")])
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of variants")

Table 50: Diseases significantly associated with the list of variants
diseaseName	variantRatio	bgRatio
USHER SYNDROME, TYPE IIA	28/77	1461/696672
Usher Syndrome, Type I	26/77	1282/696672
RETINITIS PIGMENTOSA 39	21/77	1128/696672
Deafness, Autosomal Recessive 1A	15/77	245/696672
USHER SYNDROME, TYPE ID	12/77	513/696672
DEAFNESS, AUTOSOMAL RECESSIVE 2	12/77	536/696672
Usher Syndrome	10/77	336/696672
Deafness, Autosomal Dominant 3A	8/77	106/696672
Deafness, Autosomal Recessive 12	10/77	516/696672
Senter syndrome	6/77	61/696672
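
The ratio columns are returned as text (for example, 28/77), so they need to be parsed before they can be compared numerically. The sketch below is one possible way to do this in base R. It uses the first three rows of Table 50 as input. The helper ratio_to_numeric and the foldEnrichment column are hypothetical names introduced for illustration and are not produced by disgenet2r.

```r
# First three rows of the variant enrichment table, entered by hand.
tab <- data.frame(
  diseaseName  = c("USHER SYNDROME, TYPE IIA",
                   "Usher Syndrome, Type I",
                   "RETINITIS PIGMENTOSA 39"),
  variantRatio = c("28/77", "26/77", "21/77"),
  bgRatio      = c("1461/696672", "1282/696672", "1128/696672"),
  stringsAsFactors = FALSE
)

# Convert "numerator/denominator" strings into numeric proportions.
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Fold enrichment: proportion observed in the query list divided by
# the proportion observed in the background.
tab$foldEnrichment <- ratio_to_numeric(tab$variantRatio) /
  ratio_to_numeric(tab$bgRatio)

# Rank diseases from the strongest to the weakest enrichment.
tab[order(-tab$foldEnrichment), c("diseaseName", "foldEnrichment")]
```

A fold enrichment above 1 means that the disease is more frequent among the query variants than in the background. It complements the p-value and does not replace it.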

Figure 50 shows the results of the enrichment.

plot(results, type = "Enrichment", count = 9, cutoff = 0.05)

Figure 50: The Enrichment plot for a list of variants

Versions

Get DISGENET data version

get_disgenet_version()

## [1] "{ status : OK , payload :{ apiVersion : 1.8.0 , dataVersion : DISGENET v25.1 , lastUpdate : March 31 2025 , version : DISGENET v25.1 }, httpStatus :200}"

disgenet2r version

## Version: 1.2.3
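
The installed package version can also be checked with the packageVersion function from base R's utils package. This is a generic sketch and not necessarily the call used to produce the output above. The guard with requireNamespace is added here so that the snippet does not fail when disgenet2r is not installed.

```r
# packageVersion() raises an error for packages that are not installed,
# so check availability first.
if (requireNamespace("disgenet2r", quietly = TRUE)) {
  print(utils::packageVersion("disgenet2r"))
} else {
  message("disgenet2r is not installed")
}
```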

COPYRIGHT

License

disgenet2r is distributed under the GPL-2 license.

References

Azaiez, Hela, Kevin T. Booth, Sean S. Ephraim, Bradley Crone, Elizabeth A. Black-Ziegelbein, Robert J. Marini, A. Eliot Shearer, et al. 2018. “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes.” The American Journal of Human Genetics 103 (4): 484–97. https://doi.org/10.1016/j.ajhg.2018.08.006.

Gonzalez-Mantilla, Andrea J., Andres Moreno-De-Luca, David H. Ledbetter, and Christa Lese Martin. 2016. “A Cross-Disorder Method to Identify Novel Candidate Genes for Developmental Brain Disorders.” JAMA Psychiatry 73 (3): 275–83. https://doi.org/10.1001/jamapsychiatry.2015.2692.

Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I Furlong. 2019. “The DisGeNET knowledge platform for disease genomics: 2019 update.” Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkz1021.

Piñero, Janet, Josep Saüch, Ferran Sanz, and Laura I. Furlong. 2021. “The DisGeNET Cytoscape App: Exploring and Visualizing Disease Genomics Data.” Computational and Structural Biotechnology Journal 19: 2960–67. https://doi.org/10.1016/j.csbj.2021.05.015.

Sánchez, David, and Montserrat Batet. 2011. “Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective.” Journal of Biomedical Informatics 44 (5): 749–59. https://doi.org/10.1016/j.jbi.2011.03.013.