disgenet2r: An R package to explore the molecular underpinnings of human diseases
1 Introduction
The disgenet2r package contains a set of functions to retrieve, visualize and expand DISGENET data (Piñero et al. 2021, 2019). DISGENET is a comprehensive discovery platform that integrates more than 30 million associations between genes, variants, and human diseases. The information in DISGENET has been extracted from expert-curated resources and from the literature using state-of-the-art text mining technologies (Table 1.1).
To use DISGENET and the disgenet2r package, you need to acquire a license. Please contact us at info@disgenet.com for license conditions and pricing.
| Source_Name | Type_of_data | Description |
|---|---|---|
| ALL | GDAs/VDAs | All data sources |
| CLINICALTRIALS | GDAs | Data from ClinicalTrials.gov |
| CLINVAR | GDAs/VDAs | The ClinVar database |
| CLINGEN | GDAs/VDAs | The Clinical Genome Resource |
| CLINPGX | GDAs/VDAs | The Clinical Pharmacogenomics Resource |
| GENCC | GDAs | The Gene Curation Coalition |
| UNIPROT | GDAs/VDAs | The Universal Protein Resource (UniProt) |
| CURATED | GDAs/VDAs | Human curated sources: ClinGen, ClinVar, ClinPGX, GenCC, UniProt, Orphanet, PsyGeNET, MGD, and RGD |
| FINNGEN | GDAs/VDAs | FinnGen data |
| UK BIOBANK | GDAs/VDAs | UK Biobank GWAS data |
| GWASCAT | GDAs/VDAs | The NHGRI-EBI GWAS Catalog |
| PHEWASCAT | GDAs/VDAs | The PHEWAS Catalog |
| HPO | GDAs | Human Phenotype Ontology |
| INFERRED | GDAs | Inferred data from the HPO and the GWAS and PHEWAS Catalogs, and from UK and FinnGen biobanks |
| MGD_HUMAN | GDAs | Mouse Genome Database, human data |
| MGD_MOUSE | GDAs | Mouse Genome Database, mouse data |
| MODELS | GDAs | Data from animal models: MGD mouse, RGD rat, and text-mining models |
| ORPHANET | GDAs | The portal for rare diseases and orphan drugs (Orphanet) |
| PSYGENET | GDAs | Psychiatric disorders Gene Association NETwork (PsyGeNET) |
| RGD_HUMAN | GDAs | Rat Genome Database, human data |
| RGD_RAT | GDAs | Rat Genome Database, rat data |
| TEXT MINING HUMAN | GDAs/VDAs | Data from text mining of Medline abstracts (human) |
| TEXT MINING MODELS | GDAs | Data from text mining of Medline abstracts (animal models) |
You can test DISGENET and the disgenet2r package by registering for a free trial account.
In the following document, we illustrate how to use the disgenet2r package through a series of examples.
2 Getting Started
2.1 Installation
The package disgenet2r is available through GitLab. The package requires an R version > 3.5.
Install disgenet2r by typing in R:
To load the package:
2.2 Authentication
Once you have completed the registration process, go to your user profile and retrieve your API key.
After retrieving the API key from your user profile, run the lines below so the key is available for all the disgenet2r functions.
2.3 Quick Start
The functions in the disgenet2r package take as input a single entity (gene, disease, variant, or chemical), a list of entities (up to 100), or combinations of them. In addition, they share the following parameters:
score: A vector with two elements: the initial and the final value of the score range. Default c(0,1).
database: Name of the database to query. Default CURATED. It can take the values ‘CLINGEN’, ‘CLINPGX’, ‘CLINVAR’, ‘GENCC’, ‘ORPHANET’, ‘PSYGENET’, ‘UNIPROT’, ‘CURATED’, ‘HPO’, ‘GWASCAT’, ‘PHEWASCAT’, ‘UKBIOBANK’, ‘FINNGEN’, ‘INFERRED’, ‘MGD_HUMAN’, ‘MGD_MOUSE’, ‘RGD_HUMAN’, ‘RGD_RAT’, ‘TEXTMINING_MODELS’, ‘MODELS’, ‘TEXTMINING_HUMAN’, ‘CLINICALTRIALS’, and ‘ALL’.
n_pags: A number between 1 and 100 indicating the number of pages of results to retrieve. Default 100.
verbose: Default FALSE. Set it to TRUE to enable real-time logging from the function.
order_by: Default score. Depending on the type of query, it can take the values score, dsi, dpi, pli, pmYear, ei, yearInitial, yearFinal, and numCTsupportingAssociation.
Below is an example of a query for the BRCA1 gene across ALL data sources. This query matches over 300 pages of results, but only the first 10,000 results are retrieved (100 pages of 100 results each).
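The arithmetic behind this cap can be sketched in base R. The helper `retrieved_results()` below is hypothetical and not part of disgenet2r; it assumes 100 results per page and the 100-page limit described in this document, and the total of 30,500 results is only an illustrative value consistent with "over 300 pages".

```r
# Hypothetical helper (not part of disgenet2r): estimates how many pages and
# results a query returns, assuming a fixed page size and a cap on pages.
retrieved_results <- function(total_results, page_size = 100, max_pages = 100) {
  total_pages <- ceiling(total_results / page_size)
  pages_fetched <- min(total_pages, max_pages)
  list(
    total_pages = total_pages,
    pages_fetched = pages_fetched,
    results_fetched = min(total_results, pages_fetched * page_size)
  )
}

# Illustrative total: a query matching 30,500 GDAs (305 pages)
est <- retrieved_results(30500)
est$total_pages      # 305
est$results_fetched  # 10000
```

Queries smaller than the cap are returned in full; for example, `retrieved_results(250)` reports 3 pages and 250 results.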
3 Usage Limits
3.1 Trial account
Please note that the trial account enables you to test all the functions of the disgenet2r package, but queries to the DISGENET database have the following restrictions:
Only the top-30 results ordered by descending DISGENET score are returned (pagination is not supported).
Multiple-entity queries support at most 10 entities (genes, diseases, variants).
Access to DISGENET with a trial account expires 7 days after activation.
3.2 Academic account
Academics can access our expert-curated dataset.
3.3 Other plans
There are limits in place for the disgenet2r package to ensure smooth performance for all users. These limits apply to academic, advanced, and premium users and mirror the limits of the DISGENET REST API.
Here’s a breakdown of the limitations:
A maximum of 100 pages of results is returned.
Multiple-entity queries support at most 100 entities (genes, diseases, variants).
Important Note: The package will display a warning message if you exceed these limits.
3.4 Recommendations for Efficient Use
To improve performance and avoid exceeding limits, consider querying with smaller batches of entities. You can also use DISGENET metrics and annotations to refine your search and reduce the number of returned results.
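Batching can be done in base R before calling the query functions. `make_batches()` is a hypothetical helper, not a disgenet2r function; the batch size of 100 follows the multiple-entity limit above (use 10 for trial accounts), and the gene identifiers are placeholders.

```r
# Hypothetical helper (not part of disgenet2r): splits a vector of entity
# identifiers into consecutive batches of at most batch_size elements.
make_batches <- function(entities, batch_size = 100) {
  stopifnot(batch_size >= 1)
  entities <- unique(entities)                    # avoid querying duplicates
  batch_id <- ceiling(seq_along(entities) / batch_size)
  unname(split(entities, batch_id))
}

genes <- sprintf("GENE%03d", 1:250)  # placeholder identifiers
batches <- make_batches(genes, batch_size = 100)
lengths(batches)  # 100 100 50
```

Each batch could then be passed to a query function such as `gene2disease(gene = batches[[i]], ...)`, and the resulting `@qresult` data frames stacked with `do.call(rbind, ...)`.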
4 Entity Normalization
The entity_normalization function maps free-text biomedical terms to standardized identifiers. It takes an entity_type parameter, which specifies the target namespace (e.g., disease, gene, chemical), and a term_list containing one or more free-text expressions separated by “|” for matching. Users can control match quality with minimum_similarity_threshold, which sets the cosine similarity cutoff between 0.0 and 1.0 (default 0.8), and can set how many candidates to return with results, which accepts values from 0 to 25 (default 5).
4.1 Genes
results <- entity_normalization(entity_type = "gene", term_list = "p53",
minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab, caption = "Gene Normalization Example")
| term | entityType | normalizedId | normalizedName | similarity | matchedText |
|---|---|---|---|---|---|
| p53 | gene | 7157 | TP53 | 1.00000 | p53 |
| p53 | gene | 10042 | HMGXB4 | 0.94705 | P53N |
| p53 | gene | 8925 | HERC1 | 0.91898 | p532 |
| p53 | gene | 7158 | TP53BP1 | 0.90215 | p53B |
4.2 Diseases
results <- entity_normalization(entity_type = "disease", term_list = c("ALS", "MS"),
minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab, caption = "Disease Normalization Example")
| term | entityType | normalizedId | normalizedName | similarity | matchedText |
|---|---|---|---|---|---|
| ALS | disease | C0002736 | Amyotrophic Lateral Sclerosis | 1.00000 | ALS |
| ALS | disease | C0268425 | Alstrom Syndrome | 0.91667 | ALSS |
| MS | disease | C0026769 | Multiple Sclerosis | 1.00000 | MS |
| MS | disease | C0026269 | Mitral Valve Stenosis | 1.00000 | MS |
| MS | disease | C1868685 | MULTIPLE SCLEROSIS, SUSCEPTIBILITY TO | 1.00000 | MS |
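With short abbreviations such as MS, several candidates can share the top similarity score, so simply taking the first row is arbitrary. The base R sketch below flags such ties. It transcribes the rows of the table above into a data frame; in practice the input would be `results@qresult`, assuming it has the `term` and `similarity` columns shown.

```r
# Rows transcribed from the disease normalization table above; in practice
# the input would be results@qresult from entity_normalization().
norm <- data.frame(
  term = c("ALS", "ALS", "MS", "MS", "MS"),
  normalizedId = c("C0002736", "C0268425", "C0026769", "C0026269", "C1868685"),
  similarity = c(1.00000, 0.91667, 1.00000, 1.00000, 1.00000),
  stringsAsFactors = FALSE
)

# For each query term, count the candidates tied at the top similarity;
# a count above 1 means the best match cannot be chosen by score alone.
n_top <- sapply(split(norm, norm$term), function(d) {
  sum(d$similarity == max(d$similarity))
})
n_top                     # ALS = 1, MS = 3
names(n_top)[n_top > 1]   # "MS"
```

For ambiguous terms, adding context to the query term or inspecting normalizedName is safer than relying on row order.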
4.3 Chemicals
results <- entity_normalization(entity_type = "chemical", term_list = c("aspirin", "paracetamol"),
minimum_similarity_threshold = 0.9)
tab <- results@qresult
knitr::kable(tab, caption = "Chemical Normalization Example")
| term | entityType | normalizedId | normalizedName | similarity | matchedText |
|---|---|---|---|---|---|
| aspirin | chemical | CHEMBL25 | Acetylsalicylic acid | 1 | aspirin |
| paracetamol | chemical | CHEMBL112 | Acetaminophen | 1 | paracetamol |
5 Gene-Disease Associations (GDAs)
5.1 Searching by gene
The gene2disease function retrieves the GDAs in DISGENET for a given gene or for a list of genes. The gene(s) can be identified by the NCBI gene identifier, the official Gene Symbol, or the Ensembl gene identifier, and the type of identifier must be specified with the vocabulary parameter. By default, vocabulary = "HGNC". To use Entrez NCBI Gene identifiers, set vocabulary to ENTREZ; to use Ensembl identifiers, set it to ENSEMBL.
The source database is specified with the database argument. By default, all functions in the disgenet2r package use CURATED as the source database, which includes GDAs from ClinGen, ClinVar, ClinPGX, MGD (human data), RGD (human data), GenCC, PsyGeNET, UniProt, and Orphanet.
The results can be filtered by DISGENET score. The score argument defines the score range for the search as a two-element vector: the first element is the lower bound and the second is the upper bound. Both bounds are inclusive. By default, score = c(0,1).
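Because both bounds are inclusive, a result whose score equals the lower bound is kept. The base R sketch below reproduces that rule locally, which can be handy when re-filtering an existing result instead of issuing a new query. `in_score_range()` is a hypothetical helper, not a disgenet2r function, and the scores are those of the top LEPR associations listed in Table 5.1.

```r
# Hypothetical helper (not part of disgenet2r): TRUE for scores inside the
# closed interval [range[1], range[2]], mirroring the inclusive bounds of
# the score argument.
in_score_range <- function(score, range = c(0, 1)) {
  stopifnot(length(range) == 2, range[1] <= range[2])
  score >= range[1] & score <= range[2]
}

# Scores of the ten LEPR associations listed in Table 5.1
lepr_scores <- c(1.00, 1.00, 0.95, 0.90, 0.85, 0.85, 0.85, 0.85, 0.80, 0.80)
sum(in_score_range(lepr_scores, c(0.85, 1)))  # 8: the four 0.85 scores are kept
```

Applied to a query result, the same test could be written as `results@qresult[in_score_range(results@qresult$score, c(0.85, 1)), ]`.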
5.1.1 Single gene
In this example, the Leptin Receptor (Gene Symbol LEPR, Entrez NCBI Identifier 3953) is queried in the curated data in DISGENET.
The gene2disease function returns an object of class DataGeNET.DGN that contains the results of the query.
## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"
Type the name of the object to display its attributes: the input parameters, such as whether a single entity or a list was searched (single or list), the type of entity (gene-disease), the selected database (CURATED), the score range used in the search (0-1), and the gene NCBI identifier (3953), as well as the number of results (72).
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: CURATED
## . Score: 0-1
## . Term: 3953
## . Results: 72
The data frame with the results of the query is stored in the qresult slot of the object:
## gene_symbol geneid ensemblid geneNcbiType geneDSI geneDPI genepLI
## 1 LEPR 3953 ENSG00000116678 protein-coding 0.432 0.875 8.8607e-05
## 2 LEPR 3953 ENSG00000116678 protein-coding 0.432 0.875 8.8607e-05
## 3 LEPR 3953 ENSG00000116678 protein-coding 0.432 0.875 8.8607e-05
## uniprotids protein_classid protein_class_name
## 1 P48357 DTO_05007599 Signaling
## 2 P48357 DTO_05007599 Signaling
## 3 P48357 DTO_05007599 Signaling
## disease_name diseaseType diseaseUMLSCUI
## 1 Obesity [disease] C0028754
## 2 Diabetes Mellitus, Non-Insulin-Dependent [disease] C0011860
## 3 Hyperphagia [phenotype] C0020505
## diseaseClasses_MSH
## 1 Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)
## 2 Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)
## 3 Pathological Conditions, Signs and Symptoms (C23)
## diseaseClasses_UMLS_ST
## 1 Disease or Syndrome (T047)
## 2 Disease or Syndrome (T047)
## 3 Finding (T033)
## diseaseClasses_DO
## 1 disease of metabolism (0014667)
## 2 genetic disease (630), disease of metabolism (0014667)
## 3
## diseaseClasses_HPO
## 1 Growth abnormality (01507)
## 2 Abnormality of metabolism/homeostasis (01939), Abnormality of the endocrine system (00818)
## 3 Abnormality of the nervous system (00707)
## disease_prevalence_class disease_prevalence_geo_area disease_prevalence_type
## 1
## 2
## 3
## disease_inheritance numCTsupportingAssociation numPMIDs
## 1 19 15
## 2 4 5
## 3 1 3
## chemsIncludedInEvidenceBySource numChemsIncludedInEvidences
## 1 NA NA
## 2 NA NA
## 3 NA NA
## numPMIDSWithChemsIncludedInEvidences numNCTSWithChemsIncludedInEvidences
## 1 NA NA
## 2 NA NA
## 3 NA NA
## score yearInitial yearFinal evidence_index evidence_level diseaseid
## 1 1.00 1986 2023 0.8806306 <NA> C0028754
## 2 1.00 2010 2024 0.9569892 <NA> C0011860
## 3 0.95 1986 2007 1.0000000 <NA> C0020505
A similar query can be performed using the Gene Symbol (LEPR) and the TEXTMINING_HUMAN data source. Notice how the number of diseases associated with the Leptin Receptor increases, from 72 in the curated data to 433 in the text-mined data.
results <- gene2disease( gene = "LEPR",
vocabulary = "HGNC",
database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: LEPR
## . Results: 433
The same query can be performed using the ENSEMBL gene identifier of the LEPR gene (ENSG00000116678) by setting the vocabulary to ENSEMBL.
results <- gene2disease( gene = "ENSG00000116678",
vocabulary = "ENSEMBL",
database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: ENSG00000116678
## . Results: 433
Additionally, a minimum score threshold can be defined. In this example, the query on ALL data sources uses score = c(0.3,1) and returns 97 diseases. Restricting the score reduces the number of diseases associated with the Leptin Receptor.
results <- gene2disease( gene = "LEPR",
vocabulary = "HGNC",
database = "ALL",
score =c(0.3,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: ALL
## . Score: 0.3-1
## . Term: LEPR
## . Results: 97
Table 5.1 shows the top 10 diseases associated with the LEPR gene.
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )
knitr::kable(tab[1:10, ], caption = "Top diseases associated to LEPR")
| gene_symbol | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| LEPR | Obesity | 1.00 | 1966 | 2026 |
| LEPR | Diabetes Mellitus, Non-Insulin-Dependent | 1.00 | 1966 | 2024 |
| LEPR | Hyperphagia | 0.95 | 1986 | 2023 |
| LEPR | Diabetes Mellitus | 0.90 | 1985 | 2025 |
| LEPR | Hyperinsulinism | 0.85 | 1986 | 2023 |
| LEPR | Morbid obesity | 0.85 | 1997 | 2022 |
| LEPR | Hypertensive disease | 0.85 | 1999 | 2025 |
| LEPR | Metabolic Syndrome X | 0.85 | 2000 | 2024 |
| LEPR | Liver carcinoma | 0.80 | 1996 | 2024 |
| LEPR | Non-alcoholic Fatty Liver Disease | 0.80 | 2009 | 2025 |
5.1.1.1 Visualizing the diseases associated to a single gene
The disgenet2r package offers two options to visualize the results of querying a single gene in DISGENET: a network showing the diseases associated with the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of the diseases associated with the gene (Gene-Disease Class Network). Switch between them with the class argument of the plot function.
By default, the plot function produces a Gene-Disease Network from a DataGeNET.DGN object (Figure 5.1). In the Gene-Disease Network, the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter adjusts the size of the nodes, and the eprop parameter adjusts the width of the edges while keeping it proportional to the score.
Figure 5.1: The Gene-Disease Network for the Leptin Receptor gene
Use interactive = TRUE to display an interactive plot (Figure 5.2).
Figure 5.2: The interactive Gene-Disease Network for the Leptin Receptor gene
The results can also be visualized in a network in which diseases are grouped by MeSH Disease Class, by setting the class argument to DiseaseClass (Gene-Disease Class Network, Figure 5.3). In the Gene-Disease Class Network, the node size is proportional to the fraction of diseases in the disease class, relative to the total number of diseases with disease classes associated with the gene. In the example, the Leptin Receptor is mainly associated with Nutritional and Metabolic Diseases. Diseases without MeSH disease class annotations are reported in a warning.
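The node-size rule described above can be made concrete with a short base R sketch. `mesh_class_fractions()` is hypothetical, not part of disgenet2r, and is not claimed to be how the package computes node sizes. It parses the MeSH codes from diseaseClasses_MSH strings like those in the LEPR output shown earlier, counts each class once per disease, and divides by the number of diseases that have at least one class.

```r
# Hypothetical helper (not part of disgenet2r): fraction of annotated diseases
# that fall in each MeSH disease class, given diseaseClasses_MSH strings.
# Diseases with an empty class string are excluded from the denominator.
mesh_class_fractions <- function(class_strings) {
  # Extract codes such as "C18" from "... Diseases (C18)"; class names may
  # themselves contain commas, so splitting on "," would be unreliable.
  codes <- regmatches(class_strings,
                      gregexpr("C[0-9]+(?=\\))", class_strings, perl = TRUE))
  codes <- lapply(codes, unique)
  annotated <- lengths(codes) > 0
  if (!any(annotated)) return(numeric(0))
  counts <- table(unlist(codes[annotated]))
  fractions <- setNames(as.numeric(counts) / sum(annotated), names(counts))
  sort(fractions, decreasing = TRUE)
}

# diseaseClasses_MSH values of the three LEPR rows printed earlier, plus a
# hypothetical disease without a MeSH class
lepr_classes <- c(
  "Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)",
  "Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)",
  "Pathological Conditions, Signs and Symptoms (C23)",
  ""
)
round(mesh_class_fractions(lepr_classes), 3)
# C18 and C23: 0.667 each; C19: 0.333
```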
Figure 5.3: The Disease Class Network for the Leptin Receptor Gene
5.1.1.2 Exploring the evidences associated to a gene
You can extract the evidences associated with a particular gene using the gene2evidence function. The evidence types in DISGENET are scientific publications (PMIDs) and clinical trials (NCTIDs).
Additionally, you can explore the evidences for a specific gene-disease pair by specifying the disease identifier using the argument disease.
results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
disease ="UMLS_C3554225", database = "ALL")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-evidence
## . Database: ALL
## . Score: 0-1
## . Term: LEPR
## . Results: 23
The results are shown in Table 5.2.
library(dplyr)
tab <- results@qresult
tab <- tab %>%
  dplyr::filter(reference_type == "PMID") %>%
  dplyr::select(reference, associationType, pmYear, sentence) %>%
  dplyr::arrange(dplyr::desc(pmYear))
tab <- tab %>% dplyr::rename(Year = pmYear, Sentence = sentence, pmid = reference)
tab %>%
  dplyr::mutate(pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
  knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY")
| pmid | associationType | Year | Sentence |
|---|---|---|---|
| 25751111 | CausalMutation | 2015 | Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect. |
| 25751111 | GeneticVariation | 2015 | Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect. |
| 24319006 | CausalMutation | 2014 | Novel LEPR mutations in obese Pakistani children identified by PCR-based enrichment and next generation sequencing. |
| 24611737 | GeneticVariation | 2014 | Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population. |
| 24611737 | CausalMutation | 2014 | Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population. |
| 23616257 | CausalMutation | 2014 | Whole-exome sequencing identifies novel LEPR mutations in individuals with severe early onset obesity. |
| 22810975 | GeneticVariation | 2012 | Variants in the LEPR gene are nominally associated with higher BMI and lower 24-h energy expenditure in Pima Indians. |
| 18703626 | GeneticVariation | 2008 | Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor. |
| 18703626 | CausalMutation | 2008 | Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor. |
| 17229951 | CausalMutation | 2007 | Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor. |
| 17229951 | GeneticVariation | 2007 | Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor. |
| 17229951 | GeneticVariation | 2007 | Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor. |
| 16284652 | CausalMutation | 2005 | Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes. |
| 12646666 | GeneticVariation | 2003 | Binge eating as a major phenotype of melanocortin 4 receptor gene mutations. |
| 9537324 | CausalMutation | 1998 | A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction. |
| 9860295 | GeneticVariation | 1998 | Transmission disequilibrium and sequence variants at the leptin receptor gene in extremely obese German children and adolescents. |
| 9537324 | GeneticVariation | 1998 | A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction. |
| 9537324 | GeneticVariation | 1998 | A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction. |
| 9144432 | GeneticVariation | 1997 | Amino acid variants in the human leptin receptor: lack of association to juvenile onset obesity. |
When there are many evidences, we suggest plotting the results with the type argument set to Points (Figure 5.4). Set the limit parameter to 10,000 to include all the evidences in the plot.
results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
database = "ALL", score=c(0.7,1) )
plot(results, type = "Points", interactive = TRUE, limit = 10000)
Figure 5.4: The Evidences plot for the Leptin Receptor gene
5.1.2 Multiple genes
The gene2disease function can also take a list of genes as input, given either as Entrez NCBI Gene Identifiers or as Gene Symbols. In this example, we create a vector with the Gene Symbols of several genes of the voltage-gated potassium channel family (Table 5.3) and then apply gene2disease to it.
| Name | Description |
|---|---|
| KCNE1 | potassium channel, voltage gated subfamily E regulatory beta subunit 1 |
| KCNE2 | potassium channel, voltage gated subfamily E regulatory beta subunit 2 |
| KCNH1 | potassium channel, voltage gated eag related subfamily H, member 1 |
| KCNH2 | potassium channel, voltage gated eag related subfamily H, member 2 |
| KCNG1 | potassium voltage-gated channel modifier subfamily G member 1 |
Next, create a vector with the genes of the voltage-gated potassium channel family.
As in the single-gene case, specify the source database with the database argument and, optionally, filter the results by DISGENET score.
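One way to build that vector is shown below. The symbols are those in Table 5.3, and the name myListOfGenes is taken from the warning message printed in the output of this example. The stopifnot() check against the 100-entity limit is an added safeguard, not something the package requires. The gene2disease() call is left commented out because it needs a licensed API key; its arguments are inferred from the printed output (database ALL, score 0.5-1).

```r
# Gene Symbols from Table 5.3 (voltage-gated potassium channel family)
myListOfGenes <- c("KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

# Sanity checks before querying: no duplicates, and within the
# 100-entity limit for multiple-entity queries (10 for trial accounts)
stopifnot(!anyDuplicated(myListOfGenes), length(myListOfGenes) <= 100)

# Query sketch (requires a licensed DISGENET API key):
# results <- gene2disease(gene = myListOfGenes, database = "ALL",
#                         score = c(0.5, 1))
```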
## Your query has 1 page.
## Warning in gene2disease(gene = myListOfGenes, database = "ALL", score = c(0.5, :
## One or more of the genes in the list is not in DISGENET ( 'ALL' ):
## - KCNG1
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: gene-disease
## . Database: ALL
## . Score: 0.5-1
## . Term: KCNE1 ... KCNH2
## . Results: 46
Table 5.4 shows the top 10 diseases associated with the genes of the voltage-gated potassium channel family.
tab <- results@qresult[ , c("gene_symbol", "disease_name", "score", "yearInitial", "yearFinal")] %>% unique() %>%
  dplyr::arrange(dplyr::desc(score), yearInitial)
knitr::kable(tab[1:10, ], caption = "Top GDAs for the list of genes belonging to the voltage-gated potassium channel family")
| gene_symbol | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| KCNH2 | Long QT Syndrome | 1.00 | 1970 | 2026 |
| KCNH2 | Cardiac Arrhythmia | 1.00 | 1975 | 2026 |
| KCNH2 | Long Qt Syndrome 2 | 1.00 | 1990 | 2025 |
| KCNE1 | Jervell-Lange Nielsen Syndrome | 1.00 | 1993 | 2025 |
| KCNH2 | Short QT Syndrome 1 | 1.00 | 1999 | 2025 |
| KCNE2 | Long QT Syndrome | 1.00 | 1999 | 2024 |
| KCNH2 | Atrial Fibrillation | 0.95 | 2001 | 2025 |
| KCNE1 | LONG QT SYNDROME 5 | 0.90 | 1991 | 2022 |
| KCNH2 | Prolonged QT interval | 0.90 | 1995 | 2026 |
| KCNE1 | Long QT Syndrome | 0.90 | 1997 | 2025 |
5.1.2.1 Visualizing the diseases associated to multiple genes
By default, plotting a DataGeNET.DGN object obtained from a query with a list of genes produces a Gene-Disease Network in which the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5.5).
Figure 5.5: The Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family
Set the argument interactive = TRUE to see an interactive network (Figure 5.6).
Figure 5.6: The interactive Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family
Setting the type argument to Heatmap produces a Gene-Disease Heatmap (Figure 5.7), where the color scale is proportional to the score of the GDA. The limit argument restricts the rows to the top-scoring GDAs, and the nchars argument limits the length of the disease names. By default, the plot shows the 50 highest-scoring GDAs.
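The effect of limit and nchars can be approximated on the result data frame before plotting. `top_gdas()` below is a hypothetical helper, not the package's implementation: it sorts by score, keeps the first limit rows, and truncates disease names with substr(). The rows are transcribed from Table 5.4.

```r
# Hypothetical helper (not part of disgenet2r): keeps the `limit`
# highest-scoring rows and truncates disease names to `nchars` characters,
# approximating what the limit and nchars plot arguments are described to do.
top_gdas <- function(gdas, limit = 50, nchars = 50) {
  gdas <- gdas[order(-gdas$score), , drop = FALSE]
  gdas <- head(gdas, limit)
  gdas$disease_name <- substr(gdas$disease_name, 1, nchars)
  gdas
}

# A few rows transcribed from Table 5.4
gdas <- data.frame(
  gene_symbol = c("KCNH2", "KCNE1", "KCNH2", "KCNE1"),
  disease_name = c("Long QT Syndrome", "Jervell-Lange Nielsen Syndrome",
                   "Atrial Fibrillation", "LONG QT SYNDROME 5"),
  score = c(1.00, 1.00, 0.95, 0.90),
  stringsAsFactors = FALSE
)
res <- top_gdas(gdas, limit = 3, nchars = 10)
res$disease_name  # "Long QT Sy" "Jervell-La" "Atrial Fib"
```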
Figure 5.7: The Gene-Disease Heatmap for a list of genes belonging to the voltage-gated potassium channel family
These results can also be visualized as a Gene-Disease Class Heatmap by setting the type argument to Heatmap and the class argument to DiseaseClass (Figure 5.8). In this case, diseases are grouped by their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, the genes are mainly associated with Cardiovascular Diseases and with Congenital, Hereditary, and Neonatal Diseases and Abnormalities.
Figure 5.8: The Gene-Disease Class Heatmap for a list of genes belonging to the voltage-gated potassium channel family
Alternatively, set the type argument to Network and the class argument to DiseaseClass to generate a Gene-Disease Class Network (Figure 5.9).
Figure 5.9: The Gene-Disease Class Network for a list of genes belonging to the voltage-gated potassium channel family
5.1.2.2 Exploring the evidences associated to a list of genes
First, create a gene-evidence object using the gene2evidence function.
## Your query has 28 pages.
To visualize the results, set the type argument to Points (Figure 5.10).
Figure 5.10: The Evidences plot for a list of genes belonging to the voltage-gated potassium channel family
5.1.2.3 Exploring the Clinical trials associated to a list of genes
First, create a gene-evidence object using the gene2evidence function.
results <- gene2evidence(gene = c("MMP1", "MMP2", "MMP3", "MMP9", "MMP10"),
                         database = "CLINICALTRIALS", verbose = TRUE)
## Your query has 13 pages.
To visualize the results, set the type argument to Points and the reference_type argument to NCTID (Figure 5.11).
Figure 5.11: The Evidences plot for a list of MMPs in clinical trials
5.1.3 Filtering by chemical
You can filter GDAs by chemical by passing a chemical identifier to the chemical argument of the gene2disease function. Table 5.5 shows the diseases associated with LEPR in GDAs annotated with metformin (CHEMBL_CHEMBL1431).
results <- gene2disease( gene = "LEPR", vocabulary = "HGNC",
database = "TEXTMINING_HUMAN",
chemical = "CHEMBL_CHEMBL1431" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: LEPR
## . Results: 4
tab <- results@qresult
tab <- tab %>% dplyr::select(chemical_name, gene_symbol, disease_name, score)
knitr::kable(tab, caption = "GDAs for LEPR and metformin")
| chemical_name | gene_symbol | disease_name | score |
|---|---|---|---|
| Metformin | LEPR | Hyperinsulinism | 0.85 |
| Metformin | LEPR | Steatohepatitis | 0.35 |
| Metformin | LEPR | Increased insulin level | 0.35 |
| Metformin | LEPR | Fatty degeneration | 0.20 |
5.1.3.1 Retrieving the chemicals associated to a gene
For GDAs that have a chemical annotation, you can query a gene or a list of genes with the gene2chemical function to retrieve the chemicals annotated to these associations.
results <- gene2chemical( gene = "PDGFRA",
vocabulary = "HGNC",
database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.8-1
## . Term: PDGFRA
## . Results: 15
tab <- results@qresult
tab <- tab %>%
  dplyr::filter(reference_type == "PMID") %>%
  dplyr::select(disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>%
  dplyr::rename(Disease = disease_name,
                Chemical = chemical_name, `Chemical effect` = chemical_effect,
                Year = pmYear, Sentence = sentence, pmid = reference) %>%
  dplyr::arrange(dplyr::desc(Year))
tab[1:10, ] %>%
  dplyr::mutate(pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
  knitr::kable(format = 'markdown', row.names = FALSE, caption = "Selection of chemicals associated to PDGFRA")
| Disease | Chemical | Chemical effect | Sentence | pmid | Year |
|---|---|---|---|---|---|
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | Imatinib is the first-line treatment for advanced gastrointestinal stromal tumors (GISTs) harboring KIT or PDGFRA mutations. | 41559406 | 2026 |
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | Patients with unresectable or metastatic GISTs harboring the D842V mutation in the PDGFRA gene have a poor prognosis due to intrinsic resistance to imatinib and all other approved tyrosine kinase inhibitors. | 40349140 | 2025 |
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | First-line imatinib therapy can be employed to treat GISTs harboring mutations in the tyrosine-protein kinase KIT (KIT) and platelet-derived growth factor receptor α (PDGFRα) genes to reduce the tumor size to resectable levels and minimize surgical risks. | 40276085 | 2025 |
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | In NF1-associated GIST, KIT and PDGFRA mutations are frequently absent and imatinib is ineffective. | 39811049 | 2025 |
| Eosinophilia | Imatinib | toxicity | Clonal eosinophilia with exclusive pulmonary involvement driven by PDGFRA rearrangement treated with imatinib: A case report. | 40115037 | 2025 |
| Gastrointestinal Stromal Tumors | Ripretinib | therapeutic | Ripretinib, a broad-spectrum inhibitor of the KIT and PDGFRA receptor tyrosine kinases, is designated as a fourth-line treatment for gastrointestinal stromal tumor (GIST). | 38973363 | 2024 |
| Gastrointestinal Stromal Tumors | Avapritinib | therapeutic | Avapritinib is the only drug for adult patients with PDGFRA exon 18 mutated unresectable or metastatic gastrointestinal stromal tumor (GIST). | 38803186 | 2024 |
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | In NF1-associated GIST, KIT and PDGFRA mutations are frequently absent and imatinib is ineffective. | 37122520 | 2023 |
| Gastrointestinal Stromal Tumors | Imatinib | therapeutic | Discovery of constitutive activation of KIT/PDGFRA tyrosine kinases in gastrointestinal stromal tumors (GISTs) leads to the development of the targeted drug imatinib. | 37706279 | 2023 |
| Gastrointestinal Stromal Tumors | IMATINIB MESYLATE | therapeutic | Most gastrointestinal stromal tumors (GISTs) express constitutively activated mutant isoforms of KIT or kinase platelet-derived growth factor receptor alpha (PDGFRA) that are potential therapeutic targets for imatinib mesylate. | 37890277 | 2023 |
To visualize the results, use the plot function.
Figure 5.12: The Gene-Chemical Network for PDGFRA
5.2 Searching by disease
The disease2gene function retrieves the genes associated with a disease or a list of diseases. It takes as input the disease or list of diseases of interest, each in the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO, and ID is the identifier in that vocabulary. It also takes the database (by default, CURATED) and, as in the gene2disease function, an optional score range.
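Disease identifiers are thus built by joining the vocabulary prefix and the identifier with an underscore, as in UMLS_C0036341 or MESH_D012559 used in this section. The base R sketch below builds such strings and rejects unknown prefixes. `format_disease_id()` is a hypothetical helper, not a disgenet2r function, and the prefix list is the one given in the paragraph above.

```r
# Vocabulary prefixes accepted for disease identifiers, as listed above
disease_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                          "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper (not part of disgenet2r): builds IDENT_ID strings
# from vocabulary names and identifiers, failing on unknown vocabularies.
format_disease_id <- function(vocabulary, id) {
  vocabulary <- toupper(vocabulary)
  bad <- !vocabulary %in% disease_vocabularies
  if (any(bad)) {
    stop("Unknown vocabulary: ", paste(unique(vocabulary[bad]), collapse = ", "))
  }
  paste0(vocabulary, "_", id)
}

format_disease_id(c("umls", "MESH", "OMIM"), c("C0036341", "D012559", "181500"))
# "UMLS_C0036341" "MESH_D012559" "OMIM_181500"
```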
5.2.1 Single disease
In this example, the disease2gene function retrieves the genes associated with Schizophrenia (UMLS CUI C0036341) in the CURATED database, with a score range from 0.8 to 1.
results <- disease2gene( disease = "UMLS_C0036341",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: UMLS_C0036341
## . Results: 152
Table 5.7 shows the top 10 genes associated with Schizophrenia (UMLS CUI C0036341).
tab <- unique(results@qresult[ , c("gene_symbol", "disease_name", "score", "yearInitial", "yearFinal")]) %>%
  dplyr::arrange(dplyr::desc(score), yearInitial)
knitr::kable(tab[1:10, ], caption = "Top 10 genes associated to Schizophrenia")
| gene_symbol | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| DRD3 | Schizophrenia | 1 | 1992 | 2003 |
| DRD2 | Schizophrenia | 1 | 2000 | 2011 |
| BDNF | Schizophrenia | 1 | 2003 | 2008 |
| HTR2A | Schizophrenia | 1 | 2004 | 2008 |
| RTN4R | Schizophrenia | 1 | 2004 | 2017 |
| AKT1 | Schizophrenia | 1 | 2004 | 2011 |
| COMT | Schizophrenia | 1 | 2005 | 2010 |
| TNF | Schizophrenia | 1 | 2006 | 2006 |
| MTHFR | Schizophrenia | 1 | 2006 | 2009 |
| ZNF804A | Schizophrenia | 1 | 2008 | 2018 |
5.2.1.1 Visualizing the genes associated to a single disease
There are two options to visualize the results of searching a single disease: a Gene-Disease Network showing the genes related to the disease of interest (Figure 5.13), and a Disease-Protein Class Network with the genes grouped by their Drug Target Ontology Protein Class (Figure 5.14).
Figure 5.13 shows the default Gene-Disease Network for Schizophrenia. As with the gene2disease function, the blue node is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.
Figure 5.13: The Gene-Disease Network for genes associated to Schizophrenia
Alternatively, in the Disease-Protein Class Network, genes are grouped by their Drug Target Ontology Protein Class (Figure 5.14). This is a better choice when a large number of genes are associated with the disease. To obtain this plot, set the class argument to ProteinClass. The resulting network shows the disease in blue and the Protein Classes of the associated genes in green, with node size proportional to the number of genes in each Protein Class. In the example, G-protein coupled receptors account for the largest proportion of the genes associated with Schizophrenia. Note again that not all genes are annotated to a Protein Class.
Figure 5.14: The Protein Class-Disease Network for genes associated to Schizophrenia
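To make the node sizes in this network concrete, the sketch below counts distinct genes per protein class and drops genes without a protein class annotation. It runs on a small made-up data frame: the column name protein_class and all values are hypothetical placeholders, not the real layout of results@qresult, and this is not necessarily how disgenet2r computes the plot internally.

```r
# Made-up stand-in for a gene table; "protein_class" is a hypothetical column.
genes <- data.frame(
  gene_symbol   = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene1"),
  protein_class = c("ClassA", "ClassA", "ClassB", NA, "ClassA", "ClassA"),
  stringsAsFactors = FALSE
)

# Genes without a protein class annotation cannot be placed in any class node.
annotated <- genes[!is.na(genes$protein_class), ]

# Count each gene once per class (gene1 appears twice above).
pairs <- unique(annotated[, c("gene_symbol", "protein_class")])
class_sizes <- table(pairs$protein_class)
class_sizes
# ClassA: 3, ClassB: 1
```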
The same results are obtained when querying DISGENET with the MeSH identifier for Schizophrenia (D012559).
results <- disease2gene( disease = "MESH_D012559",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: MESH_D012559
## . Results: 152
The same results are obtained when querying DISGENET with the OMIM identifier for Schizophrenia (181500).
results <- disease2gene( disease = "OMIM_181500",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: OMIM_181500
## . Results: 152
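The same results are obtained when querying DISGENET with the ICD9-CM identifier for Schizophrenia (295).
results <- disease2gene( disease = "ICD9CM_295",
database = "CURATED",
score = c( 0.8,1 ) )
results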
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: ICD9CM_295
## . Results: 152
The same results are obtained when querying DISGENET with the NCI identifier for Schizophrenia (C3362).
results <- disease2gene( disease = "NCI_C3362",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: NCI_C3362
## . Results: 152
The same results are obtained when querying DISGENET with the HPO identifier for Schizophrenia (HP:0100753).
results <- disease2gene( disease = "HPO_HP:0100753",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: HPO_HP:0100753
## . Results: 152
5.2.1.2 Exploring the evidences associated to a disease
To explore the evidences supporting the associations for Schizophrenia, use the disease2evidence function.
results <- disease2evidence( disease = "UMLS_C0036341",
type = "GDA",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: CURATED
## . Score: 0.8-1
## . Term: UMLS_C0036341
## . Results: 426
A selection of evidences is shown in Table 5.8.
tab <- results@qresult
tab <-tab[tab$reference_type == "PMID" & tab$pmYear > 2013 & tab$source =="PSYGENET", ]
tab <- tab[ order(-tab$pmYear), c("gene_symbol","source", "associationType", "sentence", "reference", "pmYear")][1:5,]
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences supporting the association for Schizophrenia" )

| Gene | source | associationType | Sentence | pmid | Year |
|---|---|---|---|---|---|
| GRIN2A | PSYGENET | Biomarker | GRIN2A (GT)21 may play a significant role in the etiology of schizophrenia among the Chinese Han population of Shaanxi. | 25958346 | 2015 |
| NOTCH4 | PSYGENET | Biomarker | Our data indicate that NOTCH4 polymorphism can influence clinical symptoms in Slovenian patients with schizophrenia. | 25529856 | 2015 |
| PPARA | PSYGENET | Biomarker | We report significant increases in PPAR?, SREBP1, IL-6 and TNF?, and decreases in PPAR? and C/EPB? and mRNA levels from patients with schizophrenia, with additional BMI interactions, characterizing dysregulation of genes relating to metabolic-inflammation in schizophrenia. | 25433960 | 2015 |
| GRM7 | PSYGENET | Biomarker | In summary, our results indicate that the GRM7 SNPs rs13353402 and rs1531939 might be associated with schizophrenia in Chinese Han population. | 26254163 | 2015 |
| BCL2 | PSYGENET | Biomarker | ADNP haploinsufficiency in mice, which results in age-related neuronal death, cognitive and social dysfunction, exhibited reduced hippocampal beclin1 and increased Bcl2 expression (mimicking schizophrenia and normal human aging). | 24365867 | 2015 |
Additionally, you can explore the evidences for a specific gene-disease pair by specifying the gene identifier using the argument gene.
results <- disease2evidence( disease = "UMLS_C0036341",
gene = c("DRD2", "DRD3"),
type = "GDA",
database = "ALL",
score = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: ALL
## . Score: 0.5-1
## . Term: UMLS_C0036341
## . Results: 489
The most recent papers are shown in Table 5.9.
tab <- results@qresult
tab <- tab %>%
filter(reference_type == "PMID") %>%
select(gene_symbol, associationType, reference, sentence, pmYear) %>% arrange(desc(pmYear)) %>% head(10)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences supporting the association between C0036341 & DRD2,DRD3" )

| Gene | associationType | pmid | Sentence | Year |
|---|---|---|---|---|
| DRD3 | CausalOrContributing | 41588897 | Novel antipsychotic drugs with partial agonism at D2 and D3 receptors improve positive and negative schizophrenia symptoms, as well as cognitive symptoms, more effectively than second- generation antipsychotic drugs. | 2026 |
| DRD2 | AlteredExpression | 40665271 | These results suggest that hypermethylation and low expression of the DRD2 gene may be related to SCZ risk. | 2025 |
| DRD3 | GeneticVariation | 39993143 | For DRD3 polymorphisms, the rs7631540 TC genotype was associated with schizophrenia in the female subgroup. | 2025 |
| DRD2 | GeneticVariation | 40881611 | Additionally, we propose that the DRD2 Taq1 A2 allele could offer protection against SUD in certain individuals with schizophrenia, whereas the Taq1 A1 allele may heighten susceptibility to SUD due to impaired dopaminergic reward processing. | 2025 |
| DRD2 | CausalOrContributing | 40056428 | Most antipsychotics approved for schizophrenia interact with D2 DA receptors as an important part of their mechanism of action. | 2025 |
| DRD2 | GeneticVariation | 39993143 | In addition, the DRD2 rs1800497 genotype GA showed a reduced risk of schizophrenia in the male subgroup and the late-onset subgroup (>27 years of age). | 2025 |
| DRD2 | CausalOrContributing | 39618418 | Antipsychotics effective for schizophrenia approved prior to 2024 shared the common mechanism of postsynaptic dopamine D2 receptor antagonism or partial agonism. | 2024 |
| DRD2 | CausalOrContributing | 38114631 | The Drd2 gene, encoding the dopamine D2 receptor (D2R), was recently indicated as a potential target in the etiology of lowered sociability (i.e., social withdrawal), a symptom of several neuropsychiatric disorders such as Schizophrenia and Major Depression. | 2024 |
| DRD3 | GeneticVariation | 39187246 | DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia. | 2024 |
| DRD2 | GeneticVariation | 39187246 | DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia. | 2024 |
5.2.2 Multiple diseases
The disease2gene function also accepts a list of diseases as input (each in the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO), the database (by default, CURATED), and, optionally, a score range. In the example, we have selected a list of four diseases. Table 5.10 shows their UMLS CUIs and names.
| UMLS_CUI | Disease_Name |
|---|---|
| C0036341 | Schizophrenia |
| C0002395 | Alzheimer’s Disease |
| C0030567 | Parkinson Disease |
| C0005586 | Bipolar Disorder |
Creating the vector with the list of diseases.
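A minimal sketch of this step, assuming the vector simply holds the UMLS CUIs from Table 5.10 in the IDENT_ID format. It uses C0002395 for Alzheimer’s Disease, the CUI queried for that disease in the timeline example later in this vignette.

```r
# UMLS CUIs of the diseases in Table 5.10, in the IDENT_ID format.
diseasesOfInterest <- c("UMLS_C0036341",  # Schizophrenia
                        "UMLS_C0002395",  # Alzheimer's Disease
                        "UMLS_C0030567",  # Parkinson Disease
                        "UMLS_C0005586")  # Bipolar Disorder
```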
In the example, we search the CURATED data using a score range of 0.9-1.
results <- disease2gene(
disease = diseasesOfInterest,
database = "CURATED",
score =c(0.9,1),
verbose = TRUE )
## Your query has 2 pages.
Table 5.11 shows the top 10 genes associated with the list of diseases.
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top Genes associated to a list of diseases")

| gene_symbol | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| GBA1 | Parkinson Disease | 1 | 1987 | 2021 |
| SNCA | Parkinson Disease | 1 | 1989 | 2021 |
| APP | Alzheimer’s Disease | 1 | 1990 | 2023 |
| DRD3 | Schizophrenia | 1 | 1992 | 2003 |
| LRRK2 | Parkinson Disease | 1 | 1993 | 2025 |
| PSEN1 | Alzheimer’s Disease | 1 | 1993 | 2022 |
| PRKN | Parkinson Disease | 1 | 1993 | 2022 |
| GRN | Alzheimer’s Disease | 1 | 1993 | 2020 |
| MAPT | Alzheimer’s Disease | 1 | 1993 | 2020 |
| PSEN2 | Alzheimer’s Disease | 1 | 1993 | 2020 |
5.2.2.1 Visualizing the genes associated to multiple diseases
The default plot of the results of querying DISGENET with a list of diseases produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5.15).
Figure 5.15: The Gene-Disease Network associated to a list of diseases
To visualize the results as a Gene-Disease Heatmap (Figure 5.16), set the class argument to “Heatmap”. In this plot, the color scale is proportional to the score of the GDA. When the results are large, use the limit argument to restrict the plot to the top-scoring GDAs. By default, the plot shows the 50 highest-scoring GDAs.
## [1] "Dataframe of 166 rows has been reduced to 65 rows."
Figure 5.16: The Gene-Disease Heatmap for genes associated to a list of diseases
A third visualization option is a Protein Class-Disease Heatmap (Figure 5.17), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass”. In this case, the color of the heatmap is proportional to the percentage of genes for each disease in each protein class. This heatmap displays the protein classes associated to each disease.
Figure 5.17: The Protein Class-Disease Heatmap for genes associated to a list of diseases
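The percentages behind the Protein Class-Disease Heatmap can be sketched in base R: count genes per disease and protein class, then divide each disease row by its total. This is an illustration on made-up placeholder data, and it assumes the denominator is the number of annotated genes for each disease; the exact computation in disgenet2r may differ.

```r
# Made-up GDAs: diseases, genes and protein classes are placeholders.
gda <- data.frame(
  disease_name  = c("Disease1", "Disease1", "Disease1", "Disease1",
                    "Disease2", "Disease2"),
  gene_symbol   = c("gene1", "gene2", "gene3", "gene4", "gene1", "gene5"),
  protein_class = c("ClassA", "ClassA", "ClassB", "ClassC",
                    "ClassA", "ClassB"),
  stringsAsFactors = FALSE
)

# Genes per disease (rows) and protein class (columns).
counts <- table(gda$disease_name, gda$protein_class)

# Row-wise percentages: each disease row sums to 100.
pct <- prop.table(counts, margin = 1) * 100
round(pct, 1)
```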
A Protein Class-Disease Network visualization is also possible (Figure 5.18).
Figure 5.18: The Protein Class-Disease Network for genes associated to a list of diseases
To explore the evidences supporting the associations, use the function disease2evidence.
results <- disease2evidence( disease = diseasesOfInterest,
type = "GDA",
score=c(0.5,1),
database = "CURATED" )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-evidence
## . Database: CURATED
## . Score: 0.5-1
## . Term: UMLS_C0036341 ... UMLS_C0005586
## . Results: 3568
To visualize the results, use the plot function with the argument Points (Figure 5.19).
Figure 5.19: The Evidences plot for a list of diseases
5.2.3 Filtering by chemical
You can filter the results to find associations mentioned in the context of a chemical, as in the example below.
results <- disease2gene( disease = "UMLS_C0678222", chemical = "CHEMBL_CHEMBL83",
database = "ALL" , n_pags = 1 )
## Notice that your query has a maximum of 8 pages.
## By indicating n_pags = 1, your query of 8 pages has been reduced to 1 pages.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: ALL
## . Score: 0-1
## . Term: UMLS_C0678222
## . Results: 100
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "chemical_name", "chemicalid")] )%>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Breast Carcinoma")

| gene_symbol | disease_name | score | chemical_name | chemicalid |
|---|---|---|---|---|
| BRCA2 | Breast Carcinoma | 1.0 | Tamoxifen | CHEMBL83 |
| ESR1 | Breast Carcinoma | 1.0 | Tamoxifen | CHEMBL83 |
| TP53 | Breast Carcinoma | 1.0 | Tamoxifen | CHEMBL83 |
| CHEK2 | Breast Carcinoma | 1.0 | Tamoxifen | CHEMBL83 |
| ATM | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
| BRCA1 | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
| CAV1 | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
| CDH1 | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
| EGFR | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
| PIK3CA | Breast Carcinoma | 0.9 | Tamoxifen | CHEMBL83 |
5.2.3.1 Retrieving the chemicals associated to a disease
For GDAs that have a chemical annotation, you can use the disease2chemical function with a disease or a list of diseases to retrieve the chemicals annotated to these associations.
results <- disease2chemical( disease = "UMLS_C0010674",
database = "TEXTMINING_MODELS" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-chemical
## . Database: TEXTMINING_MODELS
## . Score: 0.8-1
## . Term: UMLS_C0010674
## . Results: 40
tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(gene_symbol, chemical_name,chemical_effect ,sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
`Chemical Effect`=chemical_effect , Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab[1:10,] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Top chemicals associated to Cystic Fibrosis" )

| Gene | Chemical | Chemical Effect | Sentence | pmid | Year |
|---|---|---|---|---|---|
| CFTR | Elexacaftor | therapeutic\|therapeutic\|therapeutic | Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease. | 39882833 | 2025 |
| CFTR | Ivacaftor | therapeutic\|therapeutic\|therapeutic | Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease. | 39882833 | 2025 |
| CFTR | Tezacaftor | therapeutic\|therapeutic\|therapeutic | Triple-combination CFTR modulators, including ivacaftor/tezacaftor/elexacaftor with an additional class 2 corrector, are now the standard of care for most CF patients, transforming the outlook for this disease. | 39882833 | 2025 |
| CFTR | BELNACASAN | other | Breeding this reporter line with CFTRG551D CF ferret resulted in a novel CF model, CFTRint1-eGFP(lsl)/G551D, with disease onset manageable via the administration of CFTR modulator VX770. | 39791230 | 2025 |
| CFTR | Tezacaftor | therapeutic | The CFTR modulator Trikafta has markedly improved lung disease for Cystic Fibrosis (CF) patients carrying the common delta F508 (F508del-CFTR) CFTR mutation. | 38925289 | 2024 |
| CFTR | Linaclotide | other | These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion. | 38869953 | 2024 |
| CFTR | 2,6-DIAMINOPURINE | other | The ability of DAP to correct various endogenous UGA nonsense mutations in the CFTR gene and to restore its function in mice, in organoids derived from murine or patient cells, and in cells from patients with cystic fibrosis reveals the potential of such readthrough-stimulating molecules in developing a therapeutic approach. | 36641622 | 2023 |
| CFTR | BICARBONATE | other | CFTR, the cystic fibrosis (CF) gene-encoded epithelial anion channel, has a prominent role in driving chloride, bicarbonate and fluid secretion in the ductal cells of the exocrine pancreas. | 35011616 | 2021 |
| SCNN1A | Amiloride | other | Engineered mutant α-ENaC subunit mRNA delivered by lipid nanoparticles reduces amiloride currents in cystic fibrosis-based cell and mice models. | 33208364 | 2020 |
| SCNN1B | FIBOFLAPON SODIUM | other | Scnn1b-Tg mice overexpress the epithelial Na+ channel (ENaC) in their lungs, driving increased sodium absorption that causes lung pathology similar to CF. | 32631918 | 2020 |
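The Chemical Effect column can contain several values separated by "|", as in the therapeutic|therapeutic|therapeutic entries above. A small hypothetical helper (not part of disgenet2r) can keep each distinct effect once before building the table, for example with tab$`Chemical Effect` <- collapse_effects(tab$`Chemical Effect`) after the rename step. This is a sketch; it assumes the column is a character vector.

```r
# Hypothetical helper: keep each distinct effect once, preserving order.
collapse_effects <- function(x) {
  out <- vapply(strsplit(x, "|", fixed = TRUE),
                function(v) paste(unique(v), collapse = "|"),
                character(1))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

collapse_effects(c("therapeutic|therapeutic|therapeutic", "other"))
# "therapeutic" "other"
```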
To visualize the results, use the plot function.
Figure 5.20: The Disease-Chemical Network associated to Cystic Fibrosis
5.2.3.2 Searching by disease and chemical
The disease2gene function can also be used to retrieve the genes mentioned in the context of a specific disease and chemical (Table 5.14).
results <- disease2gene( disease = "UMLS_C0030567",
database = "TEXTMINING_HUMAN",
chemical = "CHEMBL_CHEMBL1009")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: UMLS_C0030567
## . Results: 69
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Parkinson and levodopa")

| gene_symbol | disease_name | chemical_name | score |
|---|---|---|---|
| BDNF | Parkinson Disease | Levodopa | 1 |
| DRD2 | Parkinson Disease | Levodopa | 1 |
| GBA1 | Parkinson Disease | Levodopa | 1 |
| GDNF | Parkinson Disease | Levodopa | 1 |
| MAOB | Parkinson Disease | Levodopa | 1 |
| PRKN | Parkinson Disease | Levodopa | 1 |
| SNCA | Parkinson Disease | Levodopa | 1 |
| TH | Parkinson Disease | Levodopa | 1 |
| PINK1 | Parkinson Disease | Levodopa | 1 |
| LRRK2 | Parkinson Disease | Levodopa | 1 |
To visualize the results, use the plot function (Figure 5.21).
Figure 5.21: The Gene Disease Chemical Network for a disease and a drug
5.2.3.2.1 Retrieving the chemicals associated to a disease
To retrieve the chemicals mentioned in the GDAs involving a specific disease, we can use the disease2chemical function.
results <- disease2chemical( disease = "UMLS_C0030567",
database = "TEXTMINING_HUMAN" , score = c(0.5,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.5-1
## . Term: UMLS_C0030567
## . Results: 307
tab <- results@qresult
tab <-tab%>% dplyr::filter(reference_type == "PMID") %>% dplyr::select(gene_symbol, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
`Chemical Effect` = chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab[1:10,] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Top Chemicals associated to Parkinson" )

| Gene | Chemical | Chemical Effect | Sentence | pmid | Year |
|---|---|---|---|---|---|
| NFE2L2 | Curcumin | other | Identification and mechanistic exploration of a mono-carbonyl analog A1 of dietary curcumin as a potent Nrf2-dependent neuroprotective agent within a cellular model of Parkinson’s disease. | 41380918 | 2026 |
| SNCA | Fibrinogen human | other | Fibrinogen exacerbates α-synuclein aggregation and mitochondrial dysfunction via alpha5beta3 integrin in Parkinson’s disease. | 40425084 | 2025 |
| SNCA | Homocysteine thiolactone, DL- | other | Furthermore, our findings corroborated that N-homocysteinylation of α-synuclein by HcyT induces apoptosis in SH-SY5Y cells, suggesting that such a modification may indeed contribute to the onset and progression of Parkinson’s disease (PD) in patients. | 40596457 | 2025 |
| SNCA | OLEUROPEIN | other | Oleuropein Aglycone, an Olive Polyphenol, Influences Alpha-Synuclein Aggregation and Exerts Neuroprotective Effects in Different Parkinson’s Disease Models. | 40702289 | 2025 |
| SNCA | HOMOCYSTEINE | other | . α-synuclein, homocysteine (Hcy) and leucine-rich α2-glycoprotein (LRG) have been shown to correlate to Parkinson’s disease (PD). | 39738968 | 2025 |
| SNCA | MICROCYSTIN-LR | other | These results suggest that MC-LR is involved in α-syn aggregate formation and PD pathogenesis by enhancing SNCA transcriptional activity to promote α-syn elevation via the MAPK4/GATA2 pathway and inducing α-syn phosphorylation via the PP2A/GRKs pathway. | 39738876 | 2025 |
| SNCA | ISOBAVACHALCONE | other | Isobavachalcone inhibits α-synuclein fibrillogenesis and its Parkinson’s disease variants, disassembles the mature fibrils and alleviates cellular toxicity. | 40763859 | 2025 |
| SNCA | Thymol | other | Listerin promotes α-synuclein degradation to alleviate Parkinson’s disease through the ESCRT pathway. | 39937915 | 2025 |
| SNCA | L-Acetylleucine | other | These findings highlight the therapeutic potential of NALL in PD by its protective effects on α-synuclein pathology and synaptic function in vulnerable dopaminergic neurons. | 40297686 | 2025 |
| MAPT | Flortaucipir F-18 | other | PET imaging with tracers like 18F-flortaucipir provided visualization of amyloid and tau aggregates in AD and dopaminergic changes in PD. | 40657296 | 2025 |
To visualize the results, use the plot function (Figure 5.22).
Figure 5.22: The Network plot for chemicals associated to Parkinson Disease
5.3 Exploring a GDA timeline
To display the evolution of publications over time, first use the timeline function to create a timeline object containing all the evidences for a GDA.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gda-evidence
## . Database: ALL
## . Score: -
## . Term: UMLS_C0002395
## . Results: 4510
To visualize the results, use the plot function with the argument Type = "Points".
Figure 5.23: The timeline plot for APOE and Alzheimer’s Disease
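The Points plot summarizes the number of publications per year. The sketch below shows one way to compute such per-year counts yourself from an evidence table, counting each PMID once per year. The data frame is a made-up placeholder that only borrows the column names reference, reference_type and pmYear from the results@qresult tables shown earlier; the counting done by disgenet2r may differ.

```r
# Made-up evidence rows; only the column names follow results@qresult.
ev <- data.frame(
  reference      = c("1001", "1002", "1002", "1003", "1004", "1005"),
  reference_type = "PMID",
  pmYear         = c(2019, 2019, 2019, 2021, 2021, 2021),
  stringsAsFactors = FALSE
)

# Keep literature evidences; a paper with several supporting sentences
# appears in several rows, so count each (PMID, year) pair once.
pubs <- unique(ev[ev$reference_type == "PMID", c("reference", "pmYear")])
per_year <- table(pubs$pmYear)
per_year
# 2019: 2, 2021: 3
```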
6 Variant-Disease Associations (VDAs)
6.1 Searching by variant
6.1.1 Single variant
The variant2disease function takes as input a variant or a list of variants, identified by their dbSNP identifiers, and returns a DataGeNET.DGN object with Type = "variant-disease".
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: CURATED
## . Score: 0.2-1
## . Term: rs113488022
## . Results: 13
The results are shown in Table 6.1.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to variant rs113488022")

| variantid | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| rs113488022 | Colorectal Carcinoma | 0.8 | 1993 | 2024 |
| rs113488022 | Non-Small Cell Lung Carcinoma | 0.8 | 2002 | 2019 |
| rs113488022 | Papillary thyroid carcinoma | 0.8 | 2002 | 2018 |
| rs113488022 | melanoma | 0.8 | 2002 | 2018 |
| rs113488022 | Colon Carcinoma | 0.7 | 2002 | 2020 |
| rs113488022 | Multiple Myeloma | 0.7 | | |
| rs113488022 | RASopathy | 0.6 | 2011 | 2018 |
| rs113488022 | Nephroblastoma | 0.6 | | |
| rs113488022 | Nongerminomatous Germ Cell Tumor | 0.4 | 2002 | 2018 |
| rs113488022 | ASTROCYTOMA, LOW-GRADE, SOMATIC | 0.4 | 2002 | 2018 |
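Because variant2disease expects dbSNP identifiers, it can help to check the input before querying. The hypothetical helper below (not part of disgenet2r) only tests the shape of each string, lowercase rs followed by digits; it does not confirm that the variant exists in dbSNP or in DISGENET.

```r
# Hypothetical helper: does each string look like a dbSNP rs identifier?
is_rsid <- function(x) grepl("^rs[0-9]+$", x)

variants <- c("rs113488022", "rs1800629", "113488022", "rs10795668 ")
ok <- is_rsid(variants)
ok
# TRUE TRUE FALSE FALSE  (the last one has a trailing space)

# Report malformed identifiers before sending the rest to variant2disease().
if (any(!ok)) message("Skipping malformed IDs: ", paste(variants[!ok], collapse = ", "))
valid_variants <- variants[ok]
```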
6.1.1.1 Visualizing the diseases associated to a single variant
The disgenet2r package offers several options to visualize the results of querying DISGENET for a single variant: a Variant-Disease Network (Figure 6.1) showing the diseases associated to the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a network showing the MeSH Disease Classes of the diseases associated to the variant (Variant-Disease Class Network, Figure 6.2). These graphics can be obtained by changing the class argument in the plot function.
By default, the plot function applied to a DataGeNET.DGN object produces a Variant-Disease Network (Figure 6.1). In this network, the yellow nodes are variants, the blue nodes are diseases, and the width of the edges is proportional to the score of the association.
Figure 6.1: The Variant-Disease Network for the variant rs113488022
Figure 6.2: The Variant-Disease Class Network for the variant rs113488022
6.1.1.2 Exploring the evidences associated to a variant
You can extract the evidences associated to a particular variant using the function variant2evidence. Additionally, you can explore the evidences for a specific variant-disease pair by specifying the argument disease.
results <- variant2evidence( variant = "rs10795668",
disease ="UMLS_C0009402",
database = "ALL" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-evidence
## . Database: ALL
## . Score: 0-1
## . Term: rs10795668
## . Results: 15
The results are shown in Table 6.2.
results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>% select(associationType, reference, pmYear, sentence) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid=reference) %>% dplyr::arrange(desc(Year))
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption ="Evidences supporting the association between C0009402 & rs10795668")

| associationType | pmid | Year | Sentence |
|---|---|---|---|
| GeneticVariation | 40479638 | 2025 | Our findings validated rs10795668 (LOC10537640), rs4939827 (SMAD7), rs6066825 (PREX1), and rs6983267 (CCAT2) polymorphisms in CRC risk among Brazilians and suggest that lower Asian and African ancestries might influence CRC susceptibility. |
| GeneticVariation | 36653562 | 2023 | FinnGen provides genetic insights from a well-phenotyped isolated population. |
| GeneticVariation | 24801760 | 2015 | The CRC SNPs accounted for 4.3% of the variation in multiple adenoma risk, with three SNPs (rs6983267, rs10795668, rs3802842) explaining 3.0% of the variation. |
| GeneticVariation | 24968322 | 2014 | . rs4631962 and rs10795668 contribute to CRC risk in the Taiwanese and East Asian populations, and the newly identified rs1338565 was specifically associated with CRC, supporting the ethnic diversity of CRC-susceptibility SNPs. |
| GeneticVariation | 23712746 | 2013 | In conclusion, CRC susceptibility variants rs9929218 and rs10795668 may exert some influence in modulating patient’s survival and they deserve to be further tested in additional CRC cohorts in order to confirm their potential as prognosis or predictive biomarkers. |
| GeneticVariation | 23359760 | 2012 | However, no associations with CRC risk were detected for six other loci (rs9929218, rs10411210, rs12701937, rs7014346, rs6983267, and rs10795668), and one SNP, rs16892766, was not polymorphic in any of the study participants. |
| GeneticVariation | 22235025 | 2012 | In conclusion, variants at 10p14 (rs10795668), 11q23.1 (rs3802842) and 15q13.3 (rs4779584) may have a predominant role in predisposition to early-onset CRC. |
| GeneticVariation | 21351697 | 2010 | Five SNPs (rs6983267, rs4939827, rs3802842, rs4444235, rs10795668) showed an association with colon and rectal cancer. |
| GeneticVariation | 20530476 | 2010 | These results suggest that rs6983267, rs4939827, rs10795668, rs3802842, and rs961253 SNPs are associated with the risk of CRC in the Chinese population individually and jointly. |
| GeneticVariation | 18372905 | 2008 | In addition to the previously reported 8q24, 15q13 and 18q21 CRC risk loci, we identified two previously unreported associations: rs10795668, located at 10p14 (P = 2.5 x 10(-13) overall; P = 6.9 x 10(-12) replication), and rs16892766, at 8q23.3 (P = 3.3 x 10(-18) overall; P = 9.6 x 10(-17) replication), which tags a plausible causative gene, EIF3H. |
The evidences can also be visualized using the plot function with the argument Points, which shows the number of publications per year associated with a variant. The output below corresponds to the variant rs1800629; because it has many evidences, set the limit parameter to 10,000 so that all the results are included in the plot.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-evidence
## . Database: ALL
## . Score: 0-1
## . Term: rs1800629
## . Results: 1746
Figure 6.3: The Evidence plot for the variant rs1800629
6.1.2 Multiple variants
The variant2disease function also retrieves the information in DISGENET for a list of variants identified by their dbSNP identifiers. The source database is set with the database argument; by default, variant2disease uses CURATED.
results <- variant2disease(
variant = c("rs121913013", "rs1060500621",
"rs199472709", "rs72552293",
"rs74315445", "rs199472795"),
database = "ALL")
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: variant-disease
## . Database: ALL
## . Score: 0-1
## . Term: rs121913013 ... rs199472795
## . Results: 21
Table 6.3 shows the top 10 diseases associated with the list of variants.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] )%>% dplyr::arrange(desc(score), desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to the list of variants")

| variantid | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| rs74315445 | LONG QT SYNDROME 5 | 0.6 | 1993 | 2022 |
| rs199472709 | Romano-Ward Syndrome | 0.6 | 1993 | 2022 |
| rs199472795 | Romano-Ward Syndrome | 0.6 | 1993 | 2022 |
| rs74315445 | Jervell And Lange-Nielsen Syndrome 2 | 0.6 | 1993 | 2011 |
| rs72552293 | Brugada Syndrome 2 | 0.6 | 1993 | 2007 |
| rs74315445 | Jervell-Lange Nielsen Syndrome | 0.5 | 1993 | 2015 |
| rs74315445 | Long QT Syndrome | 0.5 | 1997 | 2014 |
| rs199472795 | Long QT Syndrome | 0.4 | 2000 | 2021 |
| rs199472709 | Beckwith-Wiedemann Syndrome | 0.4 | 1993 | 2020 |
| rs199472795 | Beckwith-Wiedemann Syndrome | 0.4 | 1993 | 2020 |
6.1.2.1 Visualizing the diseases associated to multiple variants
The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Network (Figure 6.4), as a Variant-Gene-Disease Network (Figure 6.5), as Variant-Disease Heatmap (Figure 6.6), as Variant-Disease Class Network (Figure 6.7) and as a Variant-Disease Class Heatmap (Figure 6.8).
Figure 6.4: The Variant-Disease Network for a list of variants
To obtain the Variant-Gene-Disease Network (Figure 6.5), set the showGenes argument to TRUE.
Figure 6.5: The Variant-Gene-Disease Network for a list of variants
The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Heatmap by changing the type argument to Heatmap (Figure 6.6).
Figure 6.6: The Variant-Disease Heatmap for a list of variants
The results of querying DISGENET with a list of variants can also be visualized as a Variant-Disease Class Network by changing the class argument to DiseaseClass (Figure 6.7).
Figure 6.7: The Variant-Disease Class Network for a list of variants
Finally, the results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Class Heatmap by changing the type argument to Heatmap (Figure 6.8).
Figure 6.8: The Variant-Disease Class Heatmap for a list of variants
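The heatmap views described above place variants against diseases in a grid. As a minimal sketch of that layout only, assuming a data frame with the variantid, disease_name and score columns used in this section (the rows below are copied from the table of top diseases for the list of variants), the score matrix can be built in base R with tapply(). This is not how the package's plot function is implemented.

```r
# Excerpt of a variant-disease result (values copied from the table above)
vda <- data.frame(
  variantid    = c("rs74315445", "rs199472709", "rs199472795",
                   "rs74315445", "rs199472795", "rs199472709"),
  disease_name = c("LONG QT SYNDROME 5", "Romano-Ward Syndrome",
                   "Romano-Ward Syndrome", "Long QT Syndrome",
                   "Long QT Syndrome", "Beckwith-Wiedemann Syndrome"),
  score        = c(0.6, 0.6, 0.6, 0.5, 0.4, 0.4),
  stringsAsFactors = FALSE
)

# One row per variant, one column per disease; cells without an
# association are left as NA (the default fill value of tapply)
score_mat <- tapply(vda$score, list(vda$variantid, vda$disease_name), max)
print(score_mat)
```

Cells that stay NA are variant-disease pairs with no association in the retrieved data, not pairs with a score of zero.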
6.2 Searching by disease
6.2.1 Single disease
The disease2variant function retrieves the variants associated with a disease, or a list of diseases. As input, it takes the disease or list of diseases of interest (each disease must have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO) and the database (by default, CURATED). As in the gene2disease function, a threshold value for the score can be set.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-variant
## . Database: CLINVAR
## . Score: 0-1
## . Term: UMLS_C1832916
## . Results: 178
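Because every disease must be passed as IDENT_ID, it can help to check identifiers before sending a query. The sketch below uses a hypothetical helper, is_valid_disease_id(), which is not part of disgenet2r; it only checks the prefix against the vocabularies listed above and does not verify that the identifier exists in DISGENET.

```r
# Vocabularies accepted for disease identifiers, as listed in the text above
disease_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                          "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper (not part of disgenet2r): TRUE when an identifier
# has the form IDENT_ID with IDENT in the accepted vocabularies
is_valid_disease_id <- function(ids) {
  prefix <- sub("_.*$", "", ids)        # text before the first underscore
  has_id <- grepl("^[^_]+_.+$", ids)    # a prefix, an underscore, then an ID
  has_id & prefix %in% disease_vocabularies
}

# Build UMLS identifiers from bare CUIs with paste0()
ids <- paste0("UMLS_", c("C1832916", "C3150943"))
is_valid_disease_id(c(ids, "C1832916", "SNOMED_12345"))
```

The same paste0() pattern is used in the multiple-disease query of Section 6.2.2.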
Table 6.4 shows the variants associated with Timothy syndrome according to the ClinVar database.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Variants associated to Timothy syndrome according to ClinVar")
| variantid | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| rs79891110 | Timothy syndrome | 0.7 | 1993 | 2016 |
| rs786205748 | Timothy syndrome | 0.6 | 1993 | 2020 |
| rs786205753 | Timothy syndrome | 0.6 | 1993 | 2019 |
| rs549476254 | Timothy syndrome | 0.6 | 1993 | 2019 |
| rs80315385 | Timothy syndrome | 0.6 | 1993 | 2015 |
| rs797044881 | Timothy syndrome | 0.5 | 1993 | 2021 |
| rs786205745 | Timothy syndrome | 0.5 | 1993 | 2018 |
| rs374528680 | Timothy syndrome | 0.5 | 1993 | 2015 |
| rs199473391 | Timothy syndrome | 0.4 | 1993 | 2023 |
| rs764212214 | Timothy syndrome | 0.4 | 1993 | 2022 |
The results can be further restricted to variants predicted to be deleterious by passing ranges of SIFT and PolyPhen scores through the sift and polyphen arguments, as in the example below. Genetic variants with SIFT scores smaller than 0.05 are predicted to be deleterious, while variants with PolyPhen scores greater than 0.908 are classified as Probably Damaging.
results <- disease2variant(disease = c("UMLS_C1832916"),
database = "CLINVAR", sift = c(0,0.05), polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-variant
## . Database: CLINVAR
## . Score: 0-1
## . Term: UMLS_C1832916
## . Results: 95
Table 6.5 shows the deleterious variants associated with Timothy syndrome reported in the ClinVar database.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "polyphen_score", "sift_score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Deleterious variants associated to Timothy syndrome according to ClinVar")
| variantid | disease_name | score | polyphen_score | sift_score | yearInitial | yearFinal |
|---|---|---|---|---|---|---|
| rs79891110 | Timothy syndrome | 0.7 | 1.000 | 0.00 | 1993 | 2016 |
| rs786205748 | Timothy syndrome | 0.6 | 1.000 | 0.00 | 1993 | 2020 |
| rs786205753 | Timothy syndrome | 0.6 | 0.999 | 0.00 | 1993 | 2019 |
| rs549476254 | Timothy syndrome | 0.6 | 0.999 | 0.00 | 1993 | 2019 |
| rs80315385 | Timothy syndrome | 0.6 | 1.000 | 0.00 | 1993 | 2015 |
| rs797044881 | Timothy syndrome | 0.5 | 1.000 | 0.00 | 1993 | 2021 |
| rs786205745 | Timothy syndrome | 0.5 | 1.000 | 0.01 | 1993 | 2018 |
| rs199473391 | Timothy syndrome | 0.4 | 1.000 | 0.00 | 1993 | 2023 |
| rs755846732 | Timothy syndrome | 0.4 | 1.000 | 0.00 | 1993 | 2021 |
| rs761966966 | Timothy syndrome | 0.4 | 1.000 | 0.00 | 1993 | 2019 |
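The same score ranges can be applied locally to a result table that has already been downloaded, for example to try a stricter cut-off without re-querying. The following is a minimal sketch in base R, assuming the qresult data frame exposes the sift_score and polyphen_score columns shown in the table above. filter_deleterious() is a hypothetical helper, not a disgenet2r function, the example rows are made up, and the sketch does not reproduce whatever server-side handling of missing scores the package applies.

```r
# Hypothetical helper: keep rows whose scores fall in closed ranges,
# mirroring the sift = c(0, 0.05) and polyphen = c(0.9, 1) arguments above.
# Rows with a missing score are dropped.
filter_deleterious <- function(df, sift = c(0, 0.05), polyphen = c(0.9, 1)) {
  in_range <- function(x, r) !is.na(x) & x >= r[1] & x <= r[2]
  keep <- in_range(df$sift_score, sift) & in_range(df$polyphen_score, polyphen)
  df[keep, , drop = FALSE]
}

# Made-up example rows, for illustration only
vda <- data.frame(
  variantid      = c("variant_1", "variant_2", "variant_3", "variant_4"),
  sift_score     = c(0.00, 0.04, 0.20, NA),
  polyphen_score = c(1.000, 0.950, 0.999, 0.990),
  stringsAsFactors = FALSE
)
filter_deleterious(vda)
```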
6.2.1.1 Visualizing the variants associated with a single disease
The results of querying DISGENET variant information with a single disease can be visualized as a Variant-Disease Network (Figure 6.9).
Figure 6.9: The Variant-Disease Network for a single disease
The Variant-Disease Network can be displayed as a Variant-Gene-Disease Network by setting the showGenes argument to TRUE (Figure 6.10).
Figure 6.10: The Variant-Gene-Disease Network for a single disease
6.2.1.2 Exploring the evidence associated with a single disease
To explore the evidence supporting the VDAs for Timothy syndrome, run the disease2evidence function. Use the variant argument to inspect the evidence for a particular variant and Timothy syndrome.
results <- disease2evidence( disease = "UMLS_C1832916",
type = "VDA",
database = "ALL",
score = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: ALL
## . Score: 0.5-1
## . Term: UMLS_C1832916
## . Results: 73
results <- results@qresult
results <- results %>% dplyr::filter(reference_type == "PMID") %>%
dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences supporting associations")
| pmid | associationType | Year | Sentence |
|---|---|---|---|
| 40568156 | GeneticVariation | 2025 | Most TS cases are caused by a de novo single amino acid substitution G406R in the CACNA1C gene that encodes the pore-forming subunit of the voltage-gated L-type calcium channel CaV1.2. |
| 39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
| 38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
| 38826393 | GeneticVariation | 2024 | Timothy syndrome patients were first identified as having a cardiac presentation of Long QT and syndactyly of the fingers and/or toes, and an identical variant in CACNA1C , Gly406Arg. |
| 37271119 | GeneticVariation | 2023 | Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS. |
| 36523353 | GeneticVariation | 2022 | TS showed a high degree of genetic homogeneity, as the p.Gly406Arg mutation either in exon 8 or exon 8A alone was responsible for 70% of the cases. |
| 36162529 | GeneticVariation | 2022 | Individuals with Timothy Syndrome (TS), a genetic disorder caused by CaV1.2 L-type Ca2+ channel (LTCC) gain-of function mutations, such as G406R, exhibit social deficits, repetitive behaviors, and cognitive impairments characteristic of ASD that are phenocopied in TS2-neo mice expressing G406R. |
| 36347939 | GeneticVariation | 2022 | A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS. |
| 33797204 | GeneticVariation | 2021 | In 2015, a variant in CACNA1C (p.R518C) was reported to cause cardiac-only Timothy syndrome, a genetic disorder with a mixed phenotype of congenital heart disease, hypertrophic cardiomyopathy (HCM), and LQTS that lacked extra-cardiac features. |
| 34163037 | CausalMutation | 2021 | Phenotypic expansion of CACNA1C-associated disorders to include isolated neurological manifestations. |
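For readers who prefer not to depend on dplyr, the filtering and sorting applied above to the evidence table can be sketched in base R. latest_pmid_evidence() is a hypothetical helper and the input rows are made up; only the column names (reference, reference_type, associationType, pmYear, sentence) are taken from the code above. The value "other" in reference_type is a placeholder, not a documented DISGENET value.

```r
# Hypothetical base-R equivalent of the dplyr pipeline above:
# keep PubMed references, sort by publication year (newest first),
# and return the first n rows
latest_pmid_evidence <- function(ev, n = 10) {
  cols <- c("reference", "associationType", "pmYear", "sentence")
  # which() also drops rows where reference_type is NA
  ev <- ev[which(ev$reference_type == "PMID"), cols, drop = FALSE]
  # newest first, like dplyr::arrange(desc(pmYear))
  ev <- ev[order(ev$pmYear, decreasing = TRUE), , drop = FALSE]
  head(ev, n)
}

# Small made-up input with the column names used above
ev <- data.frame(
  reference       = c("11111111", "99999999", "22222222", "33333333"),
  reference_type  = c("PMID", "other", "PMID", "PMID"),
  associationType = c("GeneticVariation", "GeneticVariation",
                      "CausalMutation", "GeneticVariation"),
  pmYear          = c(2019, 2025, 2023, 2021),
  sentence        = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)
latest_pmid_evidence(ev, n = 2)
```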
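To inspect the evidence for Timothy syndrome restricted to the variants in a particular gene, use the gene argument. In the example below, the gene is CACNA1C, given by its NCBI Gene (Entrez) identifier 775 together with vocabulary = “ENTREZ”.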
results <- disease2evidence( disease = "UMLS_C1832916",
gene = "775", vocabulary = "ENTREZ",
type = "VDA", database = "TEXTMINING_HUMAN",
score = c( 0.7,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: TEXTMINING_HUMAN
## . Score: 0.7-1
## . Term: UMLS_C1832916
## . Results: 15
results <- results@qresult
results <- results %>% dplyr::filter(reference_type == "PMID") %>%
dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid = reference)
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Selection of evidences supporting associations between C1832916 & CACNA1C")
| pmid | associationType | Year | Sentence |
|---|---|---|---|
| 40568156 | GeneticVariation | 2025 | Most TS cases are caused by a de novo single amino acid substitution G406R in the CACNA1C gene that encodes the pore-forming subunit of the voltage-gated L-type calcium channel CaV1.2. |
| 39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
| 38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
| 38826393 | GeneticVariation | 2024 | Timothy syndrome patients were first identified as having a cardiac presentation of Long QT and syndactyly of the fingers and/or toes, and an identical variant in CACNA1C , Gly406Arg. |
| 36523353 | GeneticVariation | 2022 | TS showed a high degree of genetic homogeneity, as the p.Gly406Arg mutation either in exon 8 or exon 8A alone was responsible for 70% of the cases. |
| 36162529 | GeneticVariation | 2022 | Individuals with Timothy Syndrome (TS), a genetic disorder caused by CaV1.2 L-type Ca2+ channel (LTCC) gain-of function mutations, such as G406R, exhibit social deficits, repetitive behaviors, and cognitive impairments characteristic of ASD that are phenocopied in TS2-neo mice expressing G406R. |
| 36347939 | GeneticVariation | 2022 | A novel CACNA1C variant, p.R412M, was found to be associated with atypical TS through the same mechanism as p.G406R, the variant responsible for classical TS. |
| 32437834 | GeneticVariation | 2020 | Timothy syndrome (TS) is a neurodevelopmental disorder caused by mutations in the pore-forming subunit α11.2 of the L-type voltage-gated Ca2+-channel Cav1.2, at positions G406R or G402S. |
| 30984024 | GeneticVariation | 2019 | Timothy syndrome (TS) is a very rare multisystem disorder almost exclusively associated with mutations G402S and G406R in helix IS6 of Cav1.2. |
| 28211989 | GeneticVariation | 2017 | On genetic analysis, the canonical TS1 causing mutation, p.Gly406Arg in exon 8A of the CACNA1C gene was detected. |
6.2.2 Multiple diseases
results <- disease2variant(
disease = paste0("UMLS_",c("C3150943", "C1859062", "C1832916", "C4015695")),
database = "CURATED",
score = c(0.6, 1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-variant
## . Database: CURATED
## . Score: 0.6-1
## . Term: UMLS_C3150943 ... UMLS_C4015695
## . Results: 159
Table 6.8 shows the variants associated with a list of Long QT syndromes in the curated data in DISGENET.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
tab[is.na(tab)] <- ""
knitr::kable(tab[1:10,], caption = "Variants associated to a list of Long QT syndromes")
| variantid | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|
| rs121912507 | Long Qt Syndrome 2 | 0.7 | 1993 | 2022 |
| rs137854600 | LONG QT SYNDROME 3 | 0.7 | 1993 | 2022 |
| rs137854601 | LONG QT SYNDROME 3 | 0.7 | 1993 | 2022 |
| rs79891110 | Timothy syndrome | 0.7 | 1993 | 2016 |
| rs199472916 | Long Qt Syndrome 2 | 0.7 |  |  |
| rs76420733 | Long Qt Syndrome 2 | 0.6 | 1990 | 2022 |
| rs199473099 | LONG QT SYNDROME 3 | 0.6 | 1991 | 2015 |
| rs199473435 | Long Qt Syndrome 2 | 0.6 | 1993 | 2023 |
| rs199473108 | LONG QT SYNDROME 3 | 0.6 | 1993 | 2022 |
| rs199473260 | LONG QT SYNDROME 3 | 0.6 | 1993 | 2022 |
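The sort used in this section (score descending, then yearInitial ascending, then yearFinal descending) can also be written in base R with order(). The code above replaces missing years with empty strings, which turns the affected year columns into character; an alternative that keeps them numeric is to leave the NA values in place and let knitr print them as blank cells through its knitr.kable.NA option. A minimal sketch on made-up rows:

```r
# Made-up rows; variant_2 has no publication years
tab <- data.frame(
  variantid   = c("variant_1", "variant_2", "variant_3", "variant_4"),
  score       = c(0.6, 0.7, 0.7, 0.7),
  yearInitial = c(1990, NA, 1993, 1993),
  yearFinal   = c(2022, NA, 2016, 2022),
  stringsAsFactors = FALSE
)

# Equivalent of arrange(desc(score), yearInitial, desc(yearFinal));
# missing years sort last within their score group
ord <- order(-tab$score, tab$yearInitial, -tab$yearFinal, na.last = TRUE)
tab <- tab[ord, ]

# Print NA cells as blanks only when the table is rendered with knitr::kable()
options(knitr.kable.NA = "")
```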
6.2.2.1 Visualizing the variants associated with multiple diseases
The results of querying DISGENET variant information with a list of diseases can be visualized as a Variant-Disease Network (Figure 6.11).
Figure 6.11: The Variant-Disease Network for a list of diseases
They can also be visualized as a Variant-Disease Heatmap by changing the type argument from “Network” to “Heatmap” (Figure 6.12).
Figure 6.12: The Variant-Disease Heatmap for a list of diseases
6.3 Searching by gene
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: CURATED
## . Score: 0-1
## . Term: APP
## . Results: 17
Table 6.9 shows the top variants associated with the APP gene in the curated data in DISGENET.
tab <- unique(results@qresult[ ,c("variantid", "gene_symbols", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top variants associated to APP")
| variantid | gene_symbols | disease_name | score | yearInitial | yearFinal |
|---|---|---|---|---|---|
| rs63750264 | APP | Alzheimer’s Disease | 0.7 | 1991 | 2020 |
| rs63750579 | APP | Alzheimer’s Disease | 0.6 | 1990 | 2020 |
| rs63750579 | APP | CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED | 0.6 | 1990 | 2019 |
| rs63749964 | APP | ALZHEIMER DISEASE, FAMILIAL, 1 | 0.6 | 1991 | 2020 |
| rs63750264 | APP | ALZHEIMER DISEASE, FAMILIAL, 1 | 0.6 | 1991 | 2020 |
| rs63750671 | APP | ALZHEIMER DISEASE, FAMILIAL, 1 | 0.6 | 1992 | 2020 |
| rs63750066 | APP | Alzheimer’s Disease | 0.6 | 1992 | 2020 |
| rs63751039 | APP | ALZHEIMER DISEASE, FAMILIAL, 1 | 0.6 | 1992 | 2020 |
| rs193922916 | APP | Alzheimer’s Disease | 0.6 | 1993 | 2020 |
| rs63750973 | APP | Alzheimer’s Disease | 0.6 | 1993 | 2020 |
6.3.1 Visualizing the variant-disease associations retrieved for a gene
The results of querying DISGENET variant information with a gene can be visualized as a Variant-Disease Network (Figure 6.13). If the input is a list of genes, the results can also be displayed as a Variant-Disease Heatmap by changing the type argument from Network to Heatmap. To show the genes in the plot, set the showGenes argument to TRUE.
Figure 6.13: The Variant-Disease Network for a gene
6.3.2 Filtering by chemical
6.3.2.1 Searching by variant and chemical
results <- variant2disease( variant = "rs121434568",
database = "TEXTMINING_HUMAN",
chemical = "CHEMBL_CHEMBL1173655")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: rs121434568
## . Results: 6
Table 6.10 shows the VDAs associated with rs121434568 and afatinib.
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))
knitr::kable(head(tab, 10), caption = "VDAs associated to rs121434568 and afatinib")
| variantid | disease_name | chemical_name | score |
|---|---|---|---|
| rs121434568 | Carcinoma of lung | Afatinib | 0.7 |
| rs121434568 | Adenocarcinoma of lung (disorder) | Afatinib | 0.7 |
| rs121434568 | Non-Small Cell Lung Carcinoma | Afatinib | 0.4 |
| rs121434568 | Malignant neoplasm of lung | Afatinib | 0.3 |
| rs121434568 | Dyspnea | Afatinib | 0.1 |
| rs121434568 | Coughing | Afatinib | 0.1 |
To visualize the results use the plot function.
Figure 6.14: VDAs associated to rs121434568 and afatinib
6.3.2.2 Retrieving the chemicals associated with a variant
The variant2chemical function retrieves the chemicals associated with a variant.
results <- variant2chemical( variant = "rs1801133",
database = "TEXTMINING_HUMAN" , score = c(0.3,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.3-1
## . Term: rs1801133
## . Results: 35
tab <- results@qresult
tab <-tab%>% dplyr::select( disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab[1:10, ] %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
`Chemical Effect`=chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Chemicals associated to rs1801133" )
| Disease | Chemical | Chemical Effect | Sentence | pmid | Year |
|---|---|---|---|---|---|
| Multiple Sclerosis | HOMOCYSTEINE | other\|other\|therapeutic | The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS). | 40929924 | 2025 |
| Multiple Sclerosis | Cyanocobalamin | other\|other\|therapeutic | The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS). | 40929924 | 2025 |
| Multiple Sclerosis | VITAMIN B12 | other\|other\|therapeutic | The MTHFR 677C>T rs1801133 genetic variant, homocysteine (Hcy), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) are factors associated with the physiopathology of multiple sclerosis (MS). | 40929924 | 2025 |
| Folic Acid Deficiency | HOMOCYSTEINE | other | Genetic analysis revealed a significant association between homozygous TT genotype of the MTHFR C677T polymorphism, elevated Hcy levels (20.4 ± 7.07; p=0.001) and vitamin B9 deficiency (4.9±3.9; p=0.001). | 39545031 | 2024 |
| Leukopenia | Pemetrexed | toxicity | Therefore, the MTHFR C677T polymorphism could be a predictive factor for leukopenia, neutropenia, nausea, and fatigue toxicities in non-sq NSCLC patients treated with single-agent PEM. | 29186089 | 2017 |
| Leukopenia | Methotrexate | toxicity | Patients with MTHFR 677TT and 677CT + 1298AC were associated with lower frequency of 6-MP and MTX dose reduction due to leukopenia (p < 0.05). | 23865834 | 2014 |
| Myocardial Infarction | VITAMIN B12 | other\|other | The MTHFR 677C>T dimorphism showed no association with MI (chi(2) = 0.25, 1df, P=0.62), serum levels of folate and vitamin B12 and plasma level of vitamin B6. | 19565010 | 2005 |
| Myocardial Infarction | Pyridoxine | other\|other | The MTHFR 677C>T dimorphism showed no association with MI (chi(2) = 0.25, 1df, P=0.62), serum levels of folate and vitamin B12 and plasma level of vitamin B6. | 19565010 | 2005 |
| Schizophrenia | HOMOCYSTEINE | other | Elevated Hcy levels and, in line with this finding, homozygosity for the 677C–>T mutation in the MTHFR gene were not associated with an increased risk for schizophrenia. | 14572619 | 2003 |
| Folic Acid Deficiency | Uracil | other | Methylenetetrahydrofolate reductase C677T polymorphism does not alter folic acid deficiency-induced uracil incorporation into primary human lymphocyte DNA in vitro. | 11408344 | 2001 |
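In the chemical_effect column, several effect labels can share a single cell, separated by "|" (for example other|other|therapeutic above). A minimal base-R sketch for tallying the labels, using a few cells taken from the table above; it only splits the text and makes no assumption about what each position in a cell refers to.

```r
# chemical_effect cells taken from the table above
effects <- c("other|other|therapeutic", "other", "toxicity", "toxicity", "other|other")

# Split each cell on the literal "|" (fixed = TRUE avoids regex alternation)
labels <- unlist(strsplit(effects, "|", fixed = TRUE))

# Count how often each label occurs across all cells
effect_counts <- table(labels)
effect_counts

# Flag cells that contain at least one "therapeutic" label
has_therapeutic <- vapply(strsplit(effects, "|", fixed = TRUE),
                          function(x) "therapeutic" %in% x, logical(1))
has_therapeutic
```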
To visualize the results use the plot function.
Figure 6.15: Chemicals associated to rs1801133
7 Associations involving Chemicals
7.1 Retrieving genes, variants, and diseases associated with chemicals
The chemical2gene function retrieves the genes associated with a specific chemical, or a list of chemicals.
## Notice that your query has a maximum of 9 pages.
## By indicating n_pags = 5, your query of 9 pages has been reduced to 5 pages.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gene
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL1009
## . Results: 119
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol,gene_type , chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Genes associated to levodopa")
| gene_symbol | gene_type | chemical_name | pmids_chemical |
|---|---|---|---|
| COMT | protein-coding | Levodopa | 20 |
| DRD1 | protein-coding | Levodopa | 17 |
| DRD3 | protein-coding | Levodopa | 15 |
| SNCA | protein-coding | Levodopa | 15 |
| PRKN | protein-coding | Levodopa | 14 |
| TH | protein-coding | Levodopa | 14 |
| DRD2 | protein-coding | Levodopa | 13 |
| GCH1 | protein-coding | Levodopa | 13 |
| GH1 | protein-coding | Levodopa | 12 |
| SLC6A3 | protein-coding | Levodopa | 10 |
The results can be visualized as a Chemical-Gene Network (Figure 7.1).
Figure 7.1: The Chemical-Gene Network for a single chemical
The chemical2disease function retrieves the diseases associated with a specific chemical, or a list of chemicals. This information can be extracted from GDAs or VDAs; use the type parameter to choose which.
results <- chemical2disease( chemical = "CHEMBL_CHEMBL1009" , type = "GDA", database = "ALL" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-disease
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL1009
## . Results: 172
tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Diseases associated to levodopa, type GDA", align= "lllc")
| diseaseid | disease_name | chemical_name | pmids_chemical |
|---|---|---|---|
| C0030567 | Parkinson Disease | Levodopa | 187 |
| C0013384 | Dyskinetic syndrome | Levodopa | 149 |
| C0242422 | Parkinsonian Disorders | Levodopa | 54 |
| C0393593 | Dystonia Disorders | Levodopa | 20 |
| C0013421 | Dystonia | Levodopa | 19 |
| C1851920 | Dopa-Responsive Dystonia | Levodopa | 11 |
| C0392702 | Abnormal involuntary movements | Levodopa | 8 |
| C5979810 | Motor dysfunction | Levodopa | 8 |
| C0033975 | Psychotic Disorders | Levodopa | 7 |
| C0349204 | Nonorganic psychosis | Levodopa | 7 |
Figure 7.2: The Chemical-Disease Network for a chemical
A DiseaseClass plot is also available.
Figure 7.3: The Chemical-Disease Class Network for a chemical
To retrieve the diseases from VDAs instead, set the type argument to VDA.
results <- chemical2disease( chemical = "CHEMBL_CHEMBL1282" , type = "VDA", database = "ALL" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-disease
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL1282
## . Results: 2
tab <- results@qresult
tab <-tab%>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Diseases associated to imiquimod, type VDA", align= "lllc")
| diseaseid | disease_name | chemical_name | pmids_chemical |
|---|---|---|---|
| C0025202 | melanoma | Imiquimod | 1 |
| C4721806 | Skin Basal Cell Carcinoma | Imiquimod | 1 |
Figure 7.4: The Chemical-Disease Network for a chemical
The chemical2variant function retrieves the variants associated with a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-variant
## . Database: ALL
## . Score: 0-1
## . Term:
## . Results: 42
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, most_severe_consequence, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")
| variantid | gene_symbols | most_severe_consequence | chemical_name | pmids_chemical |
|---|---|---|---|---|
| rs1045642 | ABCB1 | missense_variant | Carbamazepine | 8 |
| rs3812718 | SCN1A | splice_donor_5th_base_variant | Carbamazepine | 6 |
| rs2298771 | SCN1A , LOC102724058 | missense_variant | Carbamazepine | 5 |
| rs1801133 | MTHFR | missense_variant | Carbamazepine | 4 |
| rs776746 | ZSCAN25, CYP3A5 | splice_acceptor_variant | Carbamazepine | 4 |
| rs2032582 | ABCB1 | missense_variant | Carbamazepine | 3 |
| rs2234922 | EPHX1 | missense_variant | Carbamazepine | 2 |
| rs2273697 | ABCC2 | missense_variant | Carbamazepine | 2 |
| rs28365083 | ZSCAN25, CYP3A5 | missense_variant | Carbamazepine | 2 |
| rs28383479 | ZSCAN25, CYP3A5 | missense_variant | Carbamazepine | 2 |
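Some variants map to more than one gene, and the gene_symbols column then holds a comma-separated list (for example ZSCAN25, CYP3A5 above, sometimes with irregular spacing, as in SCN1A , LOC102724058). A minimal base-R sketch, assuming that a comma is the only separator used, that expands such rows into one row per variant-gene pair, which can be convenient for joining with gene-level tables:

```r
# Rows copied from the table above
tab <- data.frame(
  variantid    = c("rs1045642", "rs2298771", "rs776746"),
  gene_symbols = c("ABCB1", "SCN1A , LOC102724058", "ZSCAN25, CYP3A5"),
  stringsAsFactors = FALSE
)

# Split on commas and strip the whitespace around each symbol
genes <- lapply(strsplit(tab$gene_symbols, ",", fixed = TRUE), trimws)

# Repeat each variant once per gene it maps to
long <- data.frame(
  variantid   = rep(tab$variantid, lengths(genes)),
  gene_symbol = unlist(genes),
  stringsAsFactors = FALSE
)
long
```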
The chemical2variant function also accepts sift and polyphen score ranges, to restrict the results to variants predicted to be deleterious.
results <- chemical2variant( chemical = "CHEMBL_CHEMBL108", database = "ALL", sift = c(0,0.05), polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-variant
## . Database: ALL
## . Score: 0-1
## . Term:
## . Results: 7
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, gene_symbols, sift_score, polyphen_score, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(head(tab, 10), caption = "Deleterious variants associated to carbamazepine", align= "lllllc")
| variantid | gene_symbols | sift_score | polyphen_score | chemical_name | pmids_chemical |
|---|---|---|---|---|---|
| rs1045642 | ABCB1 | 0.02 | 0.998 | Carbamazepine | 8 |
| rs1043620 | HSPA1A, HSPA1L | 0.00 | 0.997 | Carbamazepine | 1 |
| rs1051740 | EPHX1 | 0.00 | 0.987 | Carbamazepine | 1 |
| rs121912438 | SOD1 | 0.00 | 0.967 | Carbamazepine | 1 |
| rs211037 | GABRG2 | 0.02 | 0.977 | Carbamazepine | 1 |
| rs71428908 | SCN9A | 0.00 | 0.995 | Carbamazepine | 1 |
| rs796052508 | GABRG2 | 0.03 | 0.997 | Carbamazepine | 1 |
Figure 7.5: The Chemical-Variant Network for carbamazepine
7.2 Retrieving GDAs and VDAs associated to chemicals
7.2.1 Exploring the GDAs of a chemical
The chemical2gda function retrieves the GDAs for a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL809
## . Results: 221
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score, pmids_chemical)
knitr::kable(tab[1:10,], caption = "GDAs for sertraline")
| gene_symbol | disease_name | chemical_name | score | pmids_chemical |
|---|---|---|---|---|
| ICAM1 | Inflammation | Sertraline | 1 | 1 |
| BDNF | Huntington Disease | Sertraline | 1 | 1 |
| BDNF | Mental Depression | Sertraline | 1 | 6 |
| IL1B | Inflammation | Sertraline | 1 | 1 |
| IL10 | Inflammation | Sertraline | 1 | 1 |
| CCL2 | Inflammation | Sertraline | 1 | 1 |
| CRP | Acute Coronary Syndrome | Sertraline | 1 | 2 |
| AGER | Inflammation | Sertraline | 1 | 1 |
| IL6 | Mental Depression | Sertraline | 1 | 6 |
| SLC6A4 | Mental Depression | Sertraline | 1 | 1 |
To visualize the results use the plot function.
Figure 7.6: Network of GDAs for sertraline
7.2.2 Exploring the VDAs of a chemical
The chemical2vda function retrieves the VDAs for a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL2010601
## . Results: 20
The chemical2vda function also accepts sift and polyphen score ranges, to restrict the results to variants predicted to be deleterious.
results <- chemical2vda( chemical = "CHEMBL_CHEMBL2010601",
database = "ALL",
sift = c(0,0.05) , polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL2010601
## . Results: 16
tab <- results@qresult
tab <-tab%>% dplyr::select(variantid, disease_name, chemical_name, score,pmids_chemical)
knitr::kable(tab[1:10,], caption = "VDAs associated with ivacaftor")
| variantid | disease_name | chemical_name | score | pmids_chemical |
|---|---|---|---|---|
| rs78655421 | Cystic Fibrosis | Ivacaftor | 0.9 | 2 |
| rs75527207 | Cystic Fibrosis | Ivacaftor | 0.9 | 26 |
| rs139304906 | Cystic Fibrosis | Ivacaftor | 0.8 | 1 |
| rs74503330 | Cystic Fibrosis | Ivacaftor | 0.8 | 1 |
| rs368505753 | Cystic Fibrosis | Ivacaftor | 0.8 | 1 |
| rs397508442 | Cystic Fibrosis | Ivacaftor | 0.5 | 1 |
| rs75527207 | Lung diseases | Ivacaftor | 0.2 | 2 |
| rs75527207 | Weight Gain | Ivacaftor | 0.2 | 3 |
| rs75527207 | Inflammation | Ivacaftor | 0.1 | 1 |
| rs75527207 | Hyperviscosity | Ivacaftor | 0.1 | 1 |
To visualize the results use the plot function.
Figure 7.7: Network of VDAs
7.2.3 Exploring the GDA evidences of a chemical
The chemical2evidence function retrieves the evidence supporting the GDAs or VDAs for a specific chemical, or a list of chemicals.
results <- chemical2evidence( chemical = "CHEMBL_CHEMBL1069", type = "GDA" , database = "ALL" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: ALL
## . Score: 0-1
## . Term: CHEMBL1069
## . Results: 623
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, sentence,chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Disease = disease_name, Chemical = chemical_name, `Chemical Effect` =chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
tab[1:10, ] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences for Valsartan" )
| Gene | Disease | Chemical | Sentence | Chemical Effect | pmid | Year |
|---|---|---|---|---|---|---|
| NPPB | Heart failure | Valsartan | In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group. | therapeutic\|therapeutic\|therapeutic | 41335448 | 2026 |
| NPPB | Congestive heart failure | Valsartan | In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group. | therapeutic\|therapeutic\|other | 41335448 | 2026 |
| NPPB | Chagas Disease | Valsartan | In patients with HF with reduced ejection fraction due to Chagas disease, there was no significant difference in clinical outcomes between sacubitril/valsartan and enalapril, but there was a greater reduction in NT-proBNP at 12 weeks in patients in the sacubitril/valsartan group. | other\|other\|other | 41335448 | 2026 |
| STING1 | Diabetes Mellitus, Non-Insulin-Dependent | Valsartan | A variety of inhibitors, including small-molecule compounds (fenofibrate and nicotinamide riboside), proteins (proprotein convertase subtilisin/kexin type 9 monoclonal antibody, Metrnl, Brahma-related gene 1, and irsin, interferon-stimulated gene 15), natural products (rosavin and spermidine), probiotics (ZBiotics and garlic-derived exosomes-like nanoparticles), compound drugs (sacubitril/valsartan), and nanoparticles (Mito-G and Jumonji domain-containing protein 3 inhibitory nanoparticles), can inhibit STING signal transduction, alleviate glucose dysregulation, improve lipid metabolism in T2DM, and reduce organ damage. | other\|therapeutic\|other\|therapeutic\|other\|other | 41161546 | 2026 |
| CRP | Atherosclerosis | Valsartan | High-sensitivity C-reactive protein (hs-CRP) will be colllected and evaluated at each timepoint | other\|other\|other\|other | NCT06930885 | 2025 |
| NPPB | Heart Failure, Systolic | Valsartan | Sacubitril/valsartan treatment in HFrEF leads to reduced sST2 and NT-proBNP concentrations with distinct decreasing curves, which are linked to reverse CR through LV-related parameters. | other\|other | 39889435 | 2025 |
| MME | Inflammation | Valsartan | Neprilysin inhibition by Sacubitril/Valsartan improved adverse cardiac remodelling in experimental DbCM through direct regulation of inflammation, highlighting immunomodulation as a novel mechanism underlying established its cardioprotective actions. | other\|toxicity | 40369551 | 2025 |
| MME | Diabetic Cardiomyopathies | Valsartan | Neprilysin inhibition by Sacubitril/Valsartan improved adverse cardiac remodelling in experimental DbCM through direct regulation of inflammation, highlighting immunomodulation as a novel mechanism underlying established its cardioprotective actions. | other\|other | 40369551 | 2025 |
| FGF21 | Myocardial Infarction | Valsartan | Sacubitril/Valsartan partially alleviates myocardial infarction injury by activating the FGF21 signaling pathway via PPARs. | therapeutic\|therapeutic | 39987117 | 2025 |
| PLA2G4A | Chronic kidney disease stage 5 | Valsartan | The disordered metabolism may reduce the sensitivity of patients to sacubitril/valsartan treatment, and PLA2G4A targeted inhibitors may be a promising therapeutic strategy to improve the sensitivity of patients with ESRD and HF to sacubitril/valsartan treatment. | other\|other | 40058083 | 2025 |
To visualize the results, use the plot function.
Figure 7.8: Chemicals associated to Parkinson
7.2.4 Exploring the VDA evidence of a chemical
results <- chemical2evidence(chemical = "CHEMBL_CHEMBL502", type = "VDA", database = "TEXTMINING_HUMAN")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: CHEMBL502
## . Results: 5
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, disease_name, chemical_name, sentence, chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Disease = disease_name, Chemical = chemical_name,
`Chemical Effect` = chemical_effect, Year = pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
head(tab, 10) %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
knitr::kable(format = "markdown", row.names = FALSE, caption = "Evidence for Donepezil")
| variantid | Disease | Chemical | Sentence | Chemical Effect | pmid | Year |
|---|---|---|---|---|---|---|
| rs1080985 | Alzheimer’s Disease | Donepezil | The CYP2D6 SNP rs1080985 might be a useful pharmacogenetic marker of the long-term therapeutic response to donepezil in patients with AD. | therapeutic | 34120801 | 2022 |
| rs1080985 | Alzheimer’s Disease | Donepezil | Recent data have indicated that the rs1080985 single nucleotide polymorphism (SNP) of the cytochrome P450 (CYP) 2D6 and the common apolipoprotein E (APOE) gene may affect the response to donepezil in patients with Alzheimer’s disease (AD). | therapeutic | 25538729 | 2014 |
| rs1080985 | Alzheimer’s Disease | Donepezil | Recent data indicate that the rs1080985 single nucleotide polymorphism of the cytochrome P450 (CYP) 2D6 gene may affect the response to treatment with donepezil in patients with Alzheimer’s disease. | therapeutic | 23950644 | 2013 |
| rs1080985 | Alzheimer’s Disease | Donepezil | In a sample of 415 AD cases, we found evidence of association between rs1080985 and response to donepezil after 6 months of therapy (OR [95% CI]: 1.74 [1.01-3.00], p = 0.04). | therapeutic | 22465999 | 2012 |
| rs1080985 | Alzheimer’s Disease | Donepezil | The single nucleotide polymorphism rs1080985 in the CYP2D6 gene may influence the clinical efficacy of donepezil in patients with mild to moderate Alzheimer disease (AD). | therapeutic | 19738170 | 2009 |
To visualize the results, use the plot function.
Figure 7.9: Evidence network
8 Disease-Disease Associations
The disgenet2r package also allows users to obtain the diseases that share genes or variants with a particular disease or list of diseases (disease-disease associations, or DDAs).
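To make the idea of shared genes concrete, the sketch below computes, for two toy gene sets, the number of shared genes and the Jaccard index (the size of the intersection divided by the size of the union). The gene sets and the helper name gene_overlap are hypothetical and not part of disgenet2r; how DISGENET derives the statistic reported in its pvalue_jaccard_genes column is not described in this section.

```r
# Hypothetical helper (not part of disgenet2r): overlap statistics
# between the gene sets of two diseases.
gene_overlap <- function(genes_a, genes_b) {
  genes_a <- unique(genes_a)
  genes_b <- unique(genes_b)
  shared <- intersect(genes_a, genes_b)
  all_genes <- union(genes_a, genes_b)
  list(
    shared_genes = length(shared),
    # Jaccard index: |A intersect B| / |A union B|
    jaccard = length(shared) / length(all_genes)
  )
}

# Toy gene sets, for illustration only
disease_a <- c("G1", "G2", "G3", "G4")
disease_b <- c("G4", "G5", "G6")
gene_overlap(disease_a, disease_b)
```

Here the two diseases share one gene out of six distinct genes, so the Jaccard index is 1/6.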
8.3 Searching DDAs via semantic relationships
To obtain disease-disease associations via semantic relationships, use the disease2disease function with the relationship argument set to one of the following semantic relation types: has_manifestation, has_associated_morphology, manifestation_of, associated_morphology_of, is_finding_of_disease, due_to, has_definitional_manifestation, has_associated_finding, definitional_manifestation_of, disease_has_finding, cause_of, associated_finding_of.
The output is a DataGeNET.DGN object containing the diseases linked to the query disease(s) by the specified relationship type.
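Because a misspelled relationship type would only fail once the query reaches the API, it can help to validate it locally first. The sketch below stores the twelve relation types listed above in a vector and checks a value against it; the helper check_relationship is hypothetical and not part of disgenet2r.

```r
# Semantic relationship types accepted by the relationship argument
# of disease2disease(), as listed in the text above.
dda_relationships <- c(
  "has_manifestation", "has_associated_morphology", "manifestation_of",
  "associated_morphology_of", "is_finding_of_disease", "due_to",
  "has_definitional_manifestation", "has_associated_finding",
  "definitional_manifestation_of", "disease_has_finding", "cause_of",
  "associated_finding_of"
)

# Hypothetical helper (not part of disgenet2r): stop early on a typo
# instead of sending an invalid query.
check_relationship <- function(relationship) {
  if (length(relationship) != 1 || !(relationship %in% dda_relationships)) {
    stop("Unknown relationship '", paste(relationship, collapse = ", "),
         "'. Valid types: ", paste(dda_relationships, collapse = ", "))
  }
  relationship
}

check_relationship("has_manifestation")
```

The validated value can then be passed unchanged as the relationship argument.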
results <- disease2disease(
disease_1 = c("UMLS_C0011860", "UMLS_C0028754"),
relationship = "has_manifestation",
min_sokal = 0.7,
order_by = "SOKAL",
database = "CURATED")
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-disease-rela
## . Database: CURATED
## . Score:
## . Term: UMLS_C0011860 ... UMLS_C0028754
## . Results: 26
Table 8.6 shows the diseases linked to Obesity and to Diabetes Mellitus, Non-Insulin-Dependent (NIDDM) by the relationship type “has_manifestation”.
tab <- unique(results@qresult[, c("disease1_Name", "disease2_Name", "ddaRelation", "shared_genes", "pvalue_jaccard_genes")])
knitr::kable(tab, caption = "Diseases associated with Obesity and NIDDM")
| disease1_Name | disease2_Name | ddaRelation | shared_genes | pvalue_jaccard_genes |
|---|---|---|---|---|
| Obesity | Obesity, Hyperphagia, and Developmental Delay | has_manifestation | 1 | 1.9 |
| Obesity | Obesity, Hyperphagia, and Developmental Delay | has_manifestation | 1 | 1.6 |
| Obesity | Pseudopseudohypoparathyroidism | has_manifestation | 1 | 1.9 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13 | has_manifestation | 1 | 2.4 |
| Obesity | Pseudohypoparathyroidism Type 1C | has_manifestation | 1 | 1.6 |
| Obesity | Pseudohypoparathyroidism Type 1C | has_manifestation | 1 | 1.7 |
| Obesity | Bardet-Biedl syndrome 1 | has_manifestation | 1 | 1.1 |
| Obesity | Bardet-Biedl syndrome 2 | has_manifestation | 1 | 1.7 |
| Obesity | Bardet-Biedl syndrome 4 | has_manifestation | 1 | 1.7 |
| Obesity | Pseudohypoparathyroidism, Type Ia | has_manifestation | 1 | 1.7 |
| Obesity | LUSCAN-LUMISH SYNDROME | has_manifestation | 1 | 1.9 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13 | has_manifestation | 1 | 1.6 |
| Obesity | Pseudopseudohypoparathyroidism | has_manifestation | 1 | 1.6 |
| Obesity | BARDET-BIEDL SYNDROME 6 | has_manifestation | 1 | 1.7 |
| Obesity | Pseudohypoparathyroidism, Type Ia | has_manifestation | 1 | 1.9 |
| Obesity | CORTISONE REDUCTASE DEFICIENCY 2 | has_manifestation | 1 | 1.6 |
| Obesity | BARDET-BIEDL SYNDROME 18 | has_manifestation | 1 | 2.2 |
| Obesity | MAGEL2-related Prader-Willi-like syndrome | has_manifestation | 1 | 1.9 |
| Obesity | Pseudohypoparathyroidism Type 1C | has_manifestation | 1 | 1.9 |
| Obesity | Pseudohypoparathyroidism, Type Ia | has_manifestation | 1 | 1.6 |
| Obesity | CHOPS SYNDROME | has_manifestation | 1 | 1.6 |
| Obesity | SHORT STATURE, BRACHYDACTYLY, IMPAIRED INTELLECTUAL DEVELOPMENT, AND SEIZURES | has_manifestation | 1 | 2.2 |
| Obesity | HYPOGONADOTROPIC HYPOGONADISM 27 WITHOUT ANOSMIA | has_manifestation | 1 | 1.6 |
| Obesity | Pseudopseudohypoparathyroidism | has_manifestation | 1 | 1.7 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 13 | has_manifestation | 1 | 1.5 |
| Diabetes Mellitus, Non-Insulin-Dependent | KERATODERMA-ICHTHYOSIS-DEAFNESS SYNDROME, AUTOSOMAL RECESSIVE | has_manifestation | 2 | 4.4 |
8.4 Searching semantically similar diseases
The get_similar_diseases function retrieves the diseases most similar to a query disease according to the Sokal-Sneath semantic similarity (Sánchez and Batet 2011), computed on the taxonomic relations of the Unified Medical Language System (UMLS) Metathesaurus. Only is-a relationships, which define the taxonomy of an ontology, are taken into account. The function takes a disease as input and, optionally, the min_sokal argument, the minimum Sokal-Sneath similarity a disease must reach to be returned (by default, min_sokal = 0.1). Higher values indicate more similar diseases.
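The formula is not spelled out here. The sketch below assumes a common set-based form of the Sokal-Sneath coefficient applied to a taxonomy: each concept is represented by the set of its is-a ancestors (itself included), and the similarity is a / (a + 2(b + c)), where a counts shared ancestors and b and c count the ancestors unique to each concept. The toy taxonomy is hypothetical, and DISGENET's exact computation over the UMLS Metathesaurus may differ.

```r
# Toy is-a taxonomy (hypothetical): each concept maps to its direct parent.
parents <- c(
  disease = NA,
  metabolic_disease = "disease",
  diabetes = "metabolic_disease",
  type2_diabetes = "diabetes",
  mody = "diabetes",
  obesity = "metabolic_disease"
)

# The concept plus all of its ancestors, following is-a links to the root.
ancestors <- function(concept) {
  out <- character(0)
  while (!is.na(concept)) {
    out <- c(out, concept)
    concept <- parents[[concept]]
  }
  out
}

# Set-based Sokal-Sneath coefficient: a / (a + 2 * (b + c)).
sokal_sneath <- function(c1, c2) {
  set1 <- ancestors(c1)
  set2 <- ancestors(c2)
  n_shared <- length(intersect(set1, set2))
  n_only1 <- length(setdiff(set1, set2))
  n_only2 <- length(setdiff(set2, set1))
  n_shared / (n_shared + 2 * (n_only1 + n_only2))
}

sokal_sneath("type2_diabetes", "mody")
sokal_sneath("type2_diabetes", "obesity")
```

Sibling concepts score 3/7 (about 0.43), more distant ones score 0.25, and a concept compared with itself scores 1, which is consistent with min_sokal acting as a lower bound on similarity.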
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-disease-sokal
## . Database: ALL
## . Score:
## . Term: UMLS_C0011860
## . Results: 134
Table 8.7 shows the ten diseases most similar to NIDDM according to the Sokal-Sneath similarity.
tab <- unique(results@qresult[ ,c("disease1_Name", "disease2_Name","sokal")] )
knitr::kable(tab[1:10,], caption = "Diseases semantically similar to NIDDM")
| disease1_Name | disease2_Name | sokal |
|---|---|---|
| Diabetes Mellitus, Non-Insulin-Dependent | Maturity onset diabetes mellitus in young | 0.946 |
| Diabetes Mellitus, Non-Insulin-Dependent | Lipoatrophic Diabetes Mellitus | 0.945 |
| Diabetes Mellitus, Non-Insulin-Dependent | Familial partial lipodystrophy | 0.944 |
| Diabetes Mellitus, Non-Insulin-Dependent | Type 2 diabetes mellitus in obese | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 9 (disorder) | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | Type 2 diabetes mellitus with diabetic nephropathy | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | Diabetes mellitus autosomal dominant type II (disorder) | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 3 (disorder) | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | Maturity-Onset Diabetes of the Young, Type 1 | 0.943 |
| Diabetes Mellitus, Non-Insulin-Dependent | MATURITY-ONSET DIABETES OF THE YOUNG, TYPE 6 (disorder) | 0.943 |
9 Disease enrichment
The disease_enrichment function performs a disease enrichment (or over-representation) analysis: it tests whether a user-defined set of genes or variants is significantly over-represented among the entities associated with each disease in DISGENET.
The function takes as input a list of entities, either genes or variants, and compares them against the gene-disease or variant-disease associations in the selected database to identify the diseases associated with the list. Genes can be identified with HGNC, ENSEMBL, or Entrez identifiers, and variants with dbSNP identifiers.
The database parameter selects the data source: CURATED for curated associations (the default), CLINICALTRIALS for associations extracted from ClinicalTrials.gov, or ALL to include all available sources. The number of genes (or variants) in the selected data source is used as the background, or universe, of the over-representation test.
The common_entities parameter sets the minimum number of entities a disease must share with the input list to be included in the analysis (default 1). The max_pvalue parameter sets the p-value threshold for Fisher's exact test (default 0.05).
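The over-representation test can be illustrated with base R. The sketch below builds the 2x2 contingency table for one disease and runs a one-sided Fisher's exact test; the helper enrichment_test and the counts in the usage line are hypothetical, and this is not necessarily DISGENET's exact implementation. Note that fisher.test() reports the conditional maximum-likelihood estimate of the odds ratio, which can differ from the sample odds ratio, so the oddsRatio values in the tables below need not match this sketch.

```r
# One-sided Fisher's exact test for over-representation (a sketch).
#   k: entities in the user list associated with the disease
#   n: size of the user list
#   K: entities in the background associated with the disease
#   N: size of the background (universe) of the data source
enrichment_test <- function(k, n, K, N) {
  tab <- matrix(c(k, n - k, K - k, N - n - K + k), nrow = 2,
                dimnames = list(c("in_disease", "not_in_disease"),
                                c("in_list", "not_in_list")))
  # alternative = "greater": more overlap than expected by chance
  ft <- fisher.test(tab, alternative = "greater")
  c(pvalue = ft$p.value, oddsRatio = unname(ft$estimate))
}

# Hypothetical counts: 6 of 58 list genes hit a disease that has
# 40 genes in a background of 14365 genes.
enrichment_test(k = 6, n = 58, K = 40, N = 14365)
```

The one-sided p-value equals the upper tail of the hypergeometric distribution, phyper(k - 1, K, N - K, n, lower.tail = FALSE), which is a convenient way to cross-check the result.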
9.1 For genes
The example below performs a disease enrichment analysis with a list of autism-associated genes extracted from the Developmental Brain Disorder Gene Database (Gonzalez-Mantilla et al. 2016).
genes <- c("ADNP", "ANKRD11", "ANKRD17", "ASXL1", "BCKDK", "BRSK2", "CDK13", "CDK8", "CHD2", "CHD7", "CHD8", "CLCN2", "CREBBP", "CSDE1", "CTCF", "CTNNB1", "DDX3X", "FOXP1", "GFER", "H4C3", "HNRNPUL2", "IQSEC2", "ITSN1", "JARID2", "LRP2", "MARK2", "MBOAT7", "MYT1L", "NAA15", "NALCN", "NAV3", "NEXMIF" , "NSD1", "PHF21A", "POGZ", "PRR12", "QRICH1", "SCAF1", "SCN1A", "SCN2A", "SETD5", "SHANK3", "SIN3A", "SOX11", "SOX6", "TANC2", "TBCD", "TCF20" , "TCF4", "TCF7L2", "TRAF7", "TRIP12", "WAC", "WDR26", "ZEB2", "ZMYM2", "ZNF292", "ZSWIM6" )
results <- disease_enrichment(
entities = genes,
common_entities = 5,
vocabulary = "HGNC", database = "CURATED")
## Your query has 1 page.
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-enrichment
## . Database: CURATED
## . Score:
## . Term: ADNP ... ZSWIM6
Table 9.1 shows the top diseases associated with the list of genes.
tab <- unique(results@qresult[ ,c("diseaseName", "geneRatio", "bgRatio", "oddsRatio","pvalue")] )
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of genes")
| diseaseName | geneRatio | bgRatio | oddsRatio | pvalue |
|---|---|---|---|---|
| Mild intellectual disability | 6/58 | 6/14365 | 114.16436 | 0e+00 |
| Intellectual Disability | 47/58 | 47/14365 | 68.69552 | 0e+00 |
| Rare genetic intellectual disability | 8/58 | 8/14365 | 65.81188 | 0e+00 |
| Neurodevelopmental abnormality | 14/58 | 14/14365 | 53.57193 | 0e+00 |
| Neurodevelopmental delay | 24/58 | 24/14365 | 46.00596 | 0e+00 |
| Neurodevelopmental Disorders | 37/58 | 37/14365 | 35.20002 | 0e+00 |
| Developmental Disabilities | 14/58 | 14/14365 | 33.58076 | 0e+00 |
| Delayed speech and language development | 9/58 | 9/14365 | 33.09268 | 0e+00 |
| Rare genetic syndromic intellectual disability | 8/58 | 8/14365 | 29.74817 | 1e-07 |
| Autosomal dominant non-syndromic intellectual disability | 5/58 | 5/14365 | 29.02424 | 7e-05 |
To visualize the enrichment results, use the plot function. The cutoff argument sets a p-value threshold, and the limit argument restricts the number of records shown (Figure 9.1); by default, limit = 50. Node size is proportional to the number of entities shared between the user list and the disease.
Figure 9.1: The Enrichment plot for a list of genes
9.2 For variants
The example below performs a disease enrichment analysis with a list of variants extracted from the publication “Genomic Landscape and Mutational Signatures of Deafness-Associated Genes” (Azaiez et al. 2018).
results <- disease_enrichment(
entities = c("rs80338902","rs397516871","rs368341987","rs375050157","rs111033280","rs140884994","rs201076440","rs111033439","rs1296612982","rs41281314","rs397516875","rs143282422","rs142381713","rs35818432","rs111033225","rs200104362","rs201004645","rs34988750","rs373169422","rs397517356","rs188376296","rs199897298","rs200263980","rs200416912","rs184866544","rs397517344","rs41281310","rs727503066","rs727504710","rs143240767","rs145771342","rs376898963","rs397516878","rs181255269","rs188498736","rs111033192","rs117966637","rs914189193","rs181611778","rs111033194","rs111033248","rs111033262","rs111033333","rs111033529","rs146824138","rs483353055","rs528089082","rs747131589","rs111033536","rs45629132","rs371142158","rs727504654","rs192524347","rs527236122","rs111033186","rs111033287","rs139889944","rs200454015","rs397517328","rs111033275","rs150822759","rs200038092","rs201709513","rs370155266","rs45500891","rs111033196","rs111033360","rs397517322","rs111033524","rs727505166","rs79444516","rs35730265","rs45549044","rs111033361","rs370696868","rs727504309","rs533231493"),
vocabulary = "DBSNP", database = "CURATED")
## Your query has 1 page.
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-enrichment
## . Database: CURATED
## . Score:
## . Term: rs80338902 ... rs533231493
Table 9.2 shows the top diseases associated with the list of variants.
tab <- unique(results@qresult[ ,c("diseaseName", "variantRatio", "bgRatio", "oddsRatio", "pvalue")] )
tab <- tab %>% arrange(pvalue)
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of variants")
| diseaseName | variantRatio | bgRatio | oddsRatio | pvalue |
|---|---|---|---|---|
| Usher Syndrome, Type I | 26/77 | 26/1649334 | 599.8726 | 0 |
| USHER SYNDROME, TYPE IIA | 23/77 | 23/1649334 | 417.2769 | 0 |
| Deafness, Autosomal Recessive 1A | 16/77 | 16/1649334 | 1704.8718 | 0 |
| RETINITIS PIGMENTOSA 39 | 20/77 | 20/1649334 | 415.0418 | 0 |
| DEAFNESS, AUTOSOMAL RECESSIVE 2 | 13/77 | 13/1649334 | 595.4655 | 0 |
| Usher syndrome, type 1A | 12/77 | 12/1649334 | 690.7947 | 0 |
| RETINITIS PIGMENTOSA-DEAFNESS SYNDROME | 12/77 | 12/1649334 | 619.9942 | 0 |
| Usher Syndrome, Type III | 12/77 | 12/1649334 | 604.5044 | 0 |
| Usher Syndrome, Type II | 12/77 | 12/1649334 | 549.5802 | 0 |
| Deafness, Autosomal Dominant 3A | 9/77 | 9/1649334 | 1930.0656 | 0 |
Figure 9.2 shows the results of the enrichment.
Figure 9.2: The Enrichment plot for a list of variants
10 Entity Attributes & Metadata
10.1 Gene attributes
The gene2attribute function retrieves attribute information for a gene or a list of genes.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene
## . Database: ALL
## . Score:
## . Term: 3953
For LEPR (NCBI Gene ID 3953), the result includes, among other attributes, the Disease Specificity Index (DSI) and the Disease Pleiotropy Index (DPI) of the gene (Table 10.1).
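The DSI and DPI are not defined in this section. The sketch below assumes the definitions given in the DisGeNET documentation for earlier releases: DSI = log2(Nd/NT) / log2(1/NT), where Nd is the number of diseases associated with the gene and NT the total number of diseases, and DPI = Ndc/NTc, the fraction of disease classes covered by the gene's diseases. The totals used below are hypothetical placeholders, and the current DISGENET release may use other totals or classification systems, so these values will not reproduce Table 10.1.

```r
# Disease Specificity Index (assumed definition): 1 when a gene is linked
# to a single disease, approaching 0 as it is linked to all diseases.
dsi <- function(n_diseases, total_diseases) {
  log2(n_diseases / total_diseases) / log2(1 / total_diseases)
}

# Disease Pleiotropy Index (assumed definition): fraction of disease
# classes covered by the diseases associated with the gene.
dpi <- function(n_classes, total_classes) {
  n_classes / total_classes
}

# Hypothetical totals, for illustration only
dsi(n_diseases = 10, total_diseases = 30000)
dpi(n_classes = 3, total_classes = 24)
```

Under these definitions, a gene linked to fewer diseases gets a higher DSI, and a gene whose diseases span more classes gets a higher DPI; the same logic applies to the variant DSI and DPI.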
| description | geneid | gene_symbol | ensembl_ids | uniprotids | proteinClasses | ncbi_type | numDiseasesAssociatedToGene | numVariantsAssociatedToGene | numChemicals | numPublications | numCTs | firstRef | lastRef | geneDSI | geneDPI | genepLI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| leptin receptor | 3953 | LEPR | ENSG00000116678 | P48357 | DTO_05007599, DTO , Signaling | protein-coding | 618 | 169 | 50 | 1213 | 34 | 1966 | 2026 | 0.432 | 0.875 | 8.86e-05 |
10.2 Disease attributes & vocabulary mapping
The disease2attribute function retrieves attribute information for a disease.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease
## . Database: ALL
## . Score:
## . Term: UMLS_C0036341
## . Results: 12
The results (Table 10.2) show the mappings of the disease to other vocabularies, its disease type, and its classification in the UMLS semantic types, HPO, DO, and MeSH.
tab <- results@qresult %>% arrange(desc(vocabulary)) %>% unique()
knitr::kable(tab, caption = "Disease attributes for Schizophrenia")
| vocabulary | code | disease_name | type | diseaseClasses_UMLS_ST | diseaseClasses_HPO | diseaseClasses_DO | diseaseClasses_MSH |
|---|---|---|---|---|---|---|---|
| UMLS | C0036341 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| OMIM | 181500 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| NCI | C3362 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| MSH | D012559 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| MONDO | 0005090 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| ICD9CM | 295.90 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| ICD9CM | 295.9 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| ICD9CM | 295 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| ICD10 | F20 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| ICD10 | F20.9 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| HPO | HP:0100753 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
| DO | 5419 | Schizophrenia | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
10.2.1 Retrieving the UMLS CUIs via other vocabularies
The get_umls_from_vocabulary function returns the UMLS CUIs that map to an identifier from another vocabulary (for example, ICD9CM, MSH, or OMIM).
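The queries in this vignette write identifiers as a vocabulary prefix and a code joined by an underscore, for example UMLS_C0036341, MSH_D012559, or CHEMBL_CHEMBL502 (whose code, CHEMBL502, is echoed in the Term field of the output). The helpers below build and split such identifiers; they are hypothetical, not part of disgenet2r, and assume the prefix itself never contains an underscore.

```r
# Hypothetical helpers (not part of disgenet2r).
# Build "VOCABULARY_code" identifiers; prefixes are upper case in all
# the examples in this vignette.
make_dgn_id <- function(vocabulary, code) {
  paste0(toupper(vocabulary), "_", code)
}

# Split identifiers at the first underscore into vocabulary and code.
split_dgn_id <- function(id) {
  data.frame(
    vocabulary = sub("_.*$", "", id),
    code = sub("^[^_]*_", "", id),
    stringsAsFactors = FALSE
  )
}

make_dgn_id("msh", "D012559")
split_dgn_id(c("UMLS_C0036341", "MSH_D012559", "CHEMBL_CHEMBL502"))
```

Splitting only at the first underscore keeps codes such as CHEMBL502 intact even when the vocabulary name is repeated inside the code.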
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease
## . Database: ALL
## . Score:
## . Term: MSH_D012559
## . Results: 2
The results are shown in Table 10.3.
| VOCABULARIES | code | disease_name |
|---|---|---|
| MSH | D012559 | Schizophrenia |
| UMLS | C0036341 | Schizophrenia |
10.3 Variant attributes
The variant2attribute function takes as input a variant or a list of variants, identified by their dbSNP identifiers, and returns a DataGeNET.DGN object with variant attributes such as the allele frequency in gnomAD, the most severe consequence type predicted by the Variant Effect Predictor (VEP), and the variant DSI and DPI.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant
## . Database: ALL
## . Score:
## . Term: rs113488022
The results are shown in Table 10.4.
tab <- unique(results@qresult)
tab <- tab %>% dplyr::select(-threeletterID, -oneletterID)
knitr::kable(tab, caption = "Attributes for variant rs113488022")
| variantid | ref | alt | polyphen_score | sift_score | chromosome | position | mostSevereConsequences | var_gene_symbol | geneid | geneEnsemblID | gene_symbol | numDiseasesAssociatedToVariant | numChemicals | numPublications | firstRef | lastRef | hgvsc | hgvsp | variantDSI | variantDPI | dbsnpclass | source | exome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs113488022 | A | C | 0.958 | 0 | 7 | 140753336 | missense_variant | BRAF | 673 | ENSG00000157764 | BRAF | 754 | 184 | 3860 | 1993 | 2026 | ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A | ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu | 0.354 | 0.045 | snv | ||
| rs113488022 | A | G | 0.958 | 0 | 7 | 140753336 | missense_variant | BRAF | 673 | ENSG00000157764 | BRAF | 754 | 184 | 3860 | 1993 | 2026 | ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A | ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu | 0.354 | 0.045 | snv | ||
| rs113488022 | A | T | 0.958 | 0 | 7 | 140753336 | missense_variant | BRAF | 673 | ENSG00000157764 | BRAF | 754 | 184 | 3860 | 1993 | 2026 | ENST00000646891.2:c.1799T>G, ENST00000646891.2:c.1799T>C, ENST00000646891.2:c.1799T>A | ENSP00000493543.1:p.Val600Gly, ENSP00000493543.1:p.Val600Ala, ENSP00000493543.1:p.Val600Glu | 0.354 | 0.045 | snv | GNOMAD | 1.4e-06 |
10.4 Chemical attributes
The chemical2attribute function retrieves attribute information for a chemical or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical
## . Database: ALL
## . Score:
## . Term:
## . Results: 5
tab <- results@qresult %>% select(chemID, chemVocabulariesCrossreferences, chemPrefName)
knitr::kable(tab, caption = "Attributes for Acetylsalicylic acid")
| chemID | chemVocabulariesCrossreferences | chemPrefName |
|---|---|---|
| CHEMBL25 | CHEMBL_CHEMBL25 | Acetylsalicylic acid |
| CHEMBL25 | CHEBI_15365 | Acetylsalicylic acid |
| CHEMBL25 | DRUGBANK_DB00945 | Acetylsalicylic acid |
| CHEMBL25 | MESH_D001241 | Acetylsalicylic acid |
| CHEMBL25 | PUBCHEM_2244 | Acetylsalicylic acid |
11 Versions
11.1 Get DISGENET data version
## [1] "{ status : OK , payload :{ apiVersion : 1.9.4 , dataVersion : DISGENET v26.1 , lastUpdate : Mar 30 2026 , version : DISGENET v26.1 }, httpStatus :200}"
11.2 disgenet2r version
## Version: 1.2.5
12 COPYRIGHT
©2025 MedBioinformatics Solutions SL
13 License
disgenet2r is distributed under the GPL-2 license.