disgenet2r: An R package to explore the molecular underpinnings of human diseases
Introduction
The disgenet2r package contains a set of functions to retrieve, visualize and expand DISGENET data (Piñero et al. 2021, 2019). DISGENET is a comprehensive discovery platform that integrates more than 30 million associations between genes, variants, and human diseases. The information in DISGENET has been extracted from expert-curated resources and from the literature using state-of-the-art text mining technologies (Table 1).
To use DISGENET and the disgenet2r package, you need to acquire a license. Please contact us at info@disgenet.com for license conditions and pricing.
Source name | Type of data | Description |
---|---|---|
CLINGEN | GDAs | The Clinical Genome Resource |
ORPHANET | GDAs | The portal for rare diseases and orphan drugs |
PSYGENET | GDAs | Psychiatric disorders Gene association NETwork |
HPO | GDAs | Human Phenotype Ontology |
MGD_HUMAN | GDAs | Mouse Genome Database, human data |
MGD_MOUSE | GDAs | Mouse Genome Database, mouse data |
RGD_HUMAN | GDAs | Rat Genome Database, human data |
RGD_RAT | GDAs | Rat Genome Database, rat data |
UNIPROT | GDAs/VDAs | The Universal Protein Resource |
CLINVAR | GDAs/VDAs | ClinVar Database |
GWASCAT | GDAs/VDAs | The NHGRI-EBI GWAS Catalog |
PHEWASCAT | GDAs/VDAs | The PHEWAS Catalog |
TEXTMINING_HUMAN | GDAs/VDAs | Data from text mining MEDLINE abstracts, human |
TEXTMINING_MODELS | GDAs | Data from text mining MEDLINE abstracts, animal models |
CLINICALTRIALS | GDAs | Data from ClinicalTrials.gov |
CURATED | GDAs/VDAs | Human curated sources: ClinGen, UniProt, Orphanet, PsyGeNET, ClinVar, MGD Human |
INFERRED | GDAs | Inferred data from the HPO and the GWAS Catalog |
MODELS | GDAs | Data from animal models: MGD MOUSE and TEXT MINING MODELS |
ALL | GDAs/VDAs | All data sources |
You can test DISGENET and the disgenet2r package by registering for a free trial account on the DISGENET website.
disgenet2r package usage limits
Trial account
Please note that the trial account lets you test all the functions of the disgenet2r package, but queries to the DISGENET database have the following restrictions:
Only the top-30 results ordered by descending DISGENET score are returned (pagination is not supported).
Multiple-entity queries support at most 10 entities (genes, diseases, variants).
Access to DISGENET with a trial account expires 7 days after activation.
Other plans
The disgenet2r package enforces usage limits to ensure smooth performance for all users. These limits apply to academic, advanced, and premium users, and mirror the limits of the DISGENET REST API.
Here’s a breakdown of the limitations:
At most 100 pages of results are returned.
Multiple-entity queries support at most 100 entities (genes, diseases, variants).
Important Note: The package will display a warning message if you exceed these limits.
Recommendations for Efficient Use:
To improve performance and avoid exceeding the limits, consider querying smaller batches of entities. You can also use DISGENET metrics and annotations to refine your search and reduce the number of returned results.
Installation and first run
The disgenet2r package is available through GitLab and requires R version > 3.5.
Install disgenet2r by typing in R:
To load the package:
Once you have completed the registration process, go to your user profile and retrieve your API key.
After retrieving the API key from your user profile, run the lines below so the key is available for all the disgenet2r functions.
In the following document, we illustrate how to use the disgenet2r package through a series of examples.
Quick Start
The functions in the disgenet2r package take as input a single entity (gene, disease, variant, or chemical), a list of entities (up to 100), or combinations of them. They also accept the following parameters:
score: A vector with two elements, 1) the initial value and 2) the final value of the score range. Default c(0, 1).
database: Name of the database to query. Default CURATED. It can take the values 'CLINGEN', 'CLINVAR', 'ORPHANET', 'PSYGENET', 'UNIPROT', 'CURATED', 'HPO', 'GWASCAT', 'PHEWASCAT', 'INFERRED', 'MGD_HUMAN', 'MGD_MOUSE', 'RGD_HUMAN', 'RGD_RAT', 'TEXTMINING_MODELS', 'MODELS', 'TEXTMINING_HUMAN', 'CLINICALTRIALS', and 'ALL'.
n_pags: A number between 1 and 100 indicating the number of pages of results to retrieve. Default 100. If a value larger than 100 is given, the function stops.
verbose: By default FALSE. Set it to TRUE to enable real-time logging from the function.
order_by: By default score. Depending on the type of query, it can take the following values: score, dsi, dpi, pli, pmYear, ei, yearInitial, yearFinal, numCTsupportingAssociation.
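A misspelled database name is easiest to catch before a query is sent. The sketch below checks a value against the accepted names listed above; the helper check_database() is hypothetical and not part of disgenet2r.

```r
# Accepted values of the `database` argument, copied from the list above
disgenet_databases <- c(
  "CLINGEN", "CLINVAR", "ORPHANET", "PSYGENET", "UNIPROT", "CURATED",
  "HPO", "GWASCAT", "PHEWASCAT", "INFERRED", "MGD_HUMAN", "MGD_MOUSE",
  "RGD_HUMAN", "RGD_RAT", "TEXTMINING_MODELS", "MODELS",
  "TEXTMINING_HUMAN", "CLINICALTRIALS", "ALL"
)

# Hypothetical helper (not part of disgenet2r): return `database` unchanged
# if it is an accepted value, otherwise stop with an informative error.
check_database <- function(database = "CURATED") {
  if (length(database) != 1 || !(database %in% disgenet_databases)) {
    stop("Unknown database '", paste(database, collapse = ", "),
         "'. Accepted values: ", paste(disgenet_databases, collapse = ", "))
  }
  database
}

check_database("TEXTMINING_HUMAN")
```

An exact-match test is used instead of partial matching, so a value such as "TEXT_MINING" stops with an error rather than being silently mapped to another database.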
Below is an example query for the BRCA1 gene across ALL the data. This query matches over 300 pages of results, but only the first 10,000 results are retrieved (100 pages of 100 results each).
## Notice that your query has a maximum of 342 pages.
## By using the default n_pags (100), your query of 342 pages has been reduced to 100 pages.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-evidence
## . Database: ALL
## . Score: 0-1
## . Term: BRCA1
## . Results: 10000
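The messages above follow a simple rule: at most n_pags pages (100 by default, never more than 100) are fetched, with up to 100 results per page. A minimal sketch of that arithmetic, using a hypothetical helper pages_to_fetch() that is not part of disgenet2r:

```r
# Hypothetical helper (not part of disgenet2r) mirroring the documented rule:
# n_pags must lie between 1 and 100, and at most n_pags pages are retrieved.
pages_to_fetch <- function(total_pages, n_pags = 100) {
  if (n_pags < 1 || n_pags > 100) stop("n_pags must be between 1 and 100")
  min(total_pages, n_pags)
}

results_per_page <- 100  # as stated in the text above

pages <- pages_to_fetch(total_pages = 342)
pages                     # 100 pages
pages * results_per_page  # upper bound on results: 10000
```

The product is an upper bound, since the last page of a query may be only partially filled; for the BRCA1 query all 100 retrieved pages are full, which matches the 10,000 results reported.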
Retrieving Gene-Disease Associations from DISGENET
Searching by gene
The gene2disease function retrieves the GDAs in DISGENET for a given gene or a list of genes. The gene(s) can be identified by the NCBI Gene identifier, the official Gene Symbol, or the Ensembl gene identifier, and the type of identifier must be specified with the vocabulary parameter. By default, vocabulary = "HGNC". To use Entrez NCBI Gene identifiers, set vocabulary to ENTREZ.
The source database is specified with the database argument. By default, all the functions in the disgenet2r package use CURATED as the source database, which includes GDAs from PsyGeNET, ClinGen, ClinVar, MGD Human data, UniProt, and Orphanet.
The results can be filtered using the DISGENET score. The score argument defines the score range of the search as a vector whose first element is the lower bound and whose second element is the upper bound. Both bounds are inclusive. By default, score = c(0, 1).
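Because both bounds are inclusive, a score range behaves like a closed interval. The sketch below applies that rule locally to a small hypothetical data frame with a score column, shaped like the qresult data frame returned by the query functions; it illustrates the filtering semantics only and is not how the package filters on the server.

```r
# Hypothetical toy results; only the `score` column matters here
toy <- data.frame(
  disease_name = c("Disease A", "Disease B", "Disease C", "Disease D"),
  score = c(1.0, 0.9, 0.3, 0.25),
  stringsAsFactors = FALSE
)

# TRUE for scores inside the closed interval [score[1], score[2]]
in_score_range <- function(x, score = c(0, 1)) {
  x >= score[1] & x <= score[2]
}

kept <- toy[in_score_range(toy$score, score = c(0.3, 1)), ]
kept  # Disease A, B and C; 0.3 is kept because the lower bound is included
```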
In the example, the query for the Leptin Receptor (Gene Symbol LEPR, Entrez NCBI identifier 3953) is performed on the curated data in DISGENET.
The gene2disease function produces an object of class DataGeNET.DGN that contains the results of the query.
## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"
Type the name of the object to display its attributes: the type of search (single or list), the type of entity (gene-disease), the selected database (CURATED), the score range used in the search (0-1), the gene NCBI identifier (3953), and the number of results.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: CURATED
## . Score: 0-1
## . Term: 3953
## . Results: 67
To obtain the data frame with the results of the query, access the qresult slot of the object (results@qresult):
## gene_symbol geneid ensemblid geneNcbiType geneDSI geneDPI genepLI
## 1 LEPR 3953 ENSG00000116678 protein-coding 0.413 0.957 8.8607e-05
## 2 LEPR 3953 ENSG00000116678 protein-coding 0.413 0.957 8.8607e-05
## 3 LEPR 3953 ENSG00000116678 protein-coding 0.413 0.957 8.8607e-05
## uniprotids protein_classid protein_class_name
## 1 Q4G138, P48357 DTO_05007599 Signaling
## 2 P48357, Q4G138 DTO_05007599 Signaling
## 3 P48357, Q4G138 DTO_05007599 Signaling
## disease_name diseaseType diseaseUMLSCUI
## 1 Obesity disease C0028754
## 2 Adult-Onset Diabetes Mellitus disease C0011860
## 3 Diabetes Mellitus disease C0011849
## diseaseClasses_MSH
## 1 Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)
## 2 Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)
## 3 Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)
## diseaseClasses_UMLS_ST
## 1 Disease or Syndrome (T047)
## 2 Disease or Syndrome (T047)
## 3 Disease or Syndrome (T047)
## diseaseClasses_DO
## 1 disease of metabolism (0014667)
## 2 genetic disease (630), disease of metabolism (0014667)
## 3 genetic disease (630), disease of metabolism (0014667)
## diseaseClasses_HPO
## 1 Growth abnormality (01507)
## 2 Abnormality of the endocrine system (00818), Abnormality of metabolism/homeostasis (01939)
## 3 Abnormality of the endocrine system (00818), Abnormality of metabolism/homeostasis (01939)
## numCTsupportingAssociation numPMIDs
## 1 7 14
## 2 0 4
## 3 1 1
## chemicalsIncludedInEvidence
## 1 C0039601, C0002006, C1145760, C1135174, C0245514, C0039286, C0019392, C0076275, C0045811, C0017986, C0028128, C0014942, C0041984, testosterone, aldosterone, treprostinil, H-Indol-2-one, 3-((3,5-dimethyl-1H-pyrrol-2-yl)methylene)-1,3-dihydro-, troglitazone, tamoxifen, hesperidin, orlistat, 2-amino-1-methyl-6-phenylimidazo(4,5-b)pyridine, glycyrrhetinic acid, nitric oxide, estrone, uridine
## 2 C0038432, C0021936, C1504945, C0041984, streptozocin, inulin, INO-1001, uridine
## 3 C0028193, C0001041, C0039601, C0025598, C0038432, C1307704, nitroprusside, acetylcholine, testosterone, metformin, streptozocin, RUBOXISTAURIN
## numberPmidsWithChemsIncludedInEvidenceBySource
## 1 ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 11, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 0, 1, 2, 0, 0
## 2 ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0
## 3 ALL, CURATED, INFERRED, MODELS, PSYGENET, ORPHANET, CLINGEN, UNIPROT, HPO, GWASCAT, CLINVAR, TEXTMINING_HUMAN, PHEWASCAT, TEXTMINING_MODELS, MGD_HUMAN, MGD_MOUSE, RGD_HUMAN, RGD_RAT, CLINICALTRIALS, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0, 0, 0
## score yearInitial yearFinal evidence_level evidence_index diseaseid
## 1 1.0 1986 2023 NA 0.8622754 C0028754
## 2 0.9 2010 2016 NA 0.9112903 C0011860
## 3 0.9 2003 2003 NA 0.8260870 C0011849
The same query can be performed using the Gene Symbol (LEPR) and the TEXTMINING_HUMAN data source. Notice how the number of diseases associated with the Leptin Receptor increases.
results <- gene2disease( gene = "LEPR",
vocabulary = "HGNC",
database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: LEPR
## . Results: 420
The same query can be performed with the Ensembl gene identifier of LEPR (ENSG00000116678) by setting vocabulary to ENSEMBL.
results <- gene2disease( gene = "ENSG00000116678",
vocabulary = "ENSEMBL",
database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: ENSG00000116678
## . Results: 420
Additionally, a minimum score threshold can be defined. In the example, the range score = c(0.3, 1) is used on ALL the data. Notice how the number of diseases associated with the Leptin Receptor drops when the score is restricted.
results <- gene2disease( gene = "LEPR",
vocabulary = "HGNC",
database = "ALL",
score =c(0.3,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: ALL
## . Score: 0.3-1
## . Term: LEPR
## . Results: 94
Table 2 shows the top 10 diseases associated with the LEPR gene.
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] )
knitr::kable(tab[1:10,], caption = "Top diseases associated to LEPR" )
gene_symbol | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
LEPR | Obesity | 1.00 | 1966 | 2024 |
LEPR | Adult-Onset Diabetes Mellitus | 0.90 | 1966 | 2024 |
LEPR | Diabetes Mellitus | 0.90 | 1981 | 2023 |
LEPR | High blood pressure | 0.85 | 1998 | 2022 |
LEPR | Polyphagia | 0.85 | 1986 | 2023 |
LEPR | Hyperinsulinism | 0.85 | 1986 | 2022 |
LEPR | Morbid Obesities | 0.85 | 1995 | 2024 |
LEPR | Hyperglycemia | 0.80 | 1986 | 2024 |
LEPR | Liver cell carcinoma | 0.80 | 1997 | 2024 |
LEPR | Insulin Resistance | 0.80 | 1999 | 2024 |
Visualizing the diseases associated to a single gene
The disgenet2r package offers two options to visualize the results of querying a single gene in DISGENET: a network showing the diseases associated with the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of those diseases (Gene-Disease Class Network). These graphics are obtained by changing the class argument of the plot function.
By default, the plot function produces a Gene-Disease Network from a DataGeNET.DGN object (Figure 1). In the Gene-Disease Network, the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter adjusts the size of the nodes, while the eprop parameter adjusts the width of the edges, keeping them proportional to the score.
Use interactive = TRUE to display an interactive plot (Figure 2).
The results can also be visualized in a network in which diseases are grouped by MeSH Disease Class by setting the class argument to DiseaseClass (Gene-Disease Class Network, Figure 3). In the Gene-Disease Class Network, the node size is proportional to the fraction of diseases in the disease class, relative to the total number of diseases with disease class annotations associated with the gene. In the example, the Leptin Receptor is associated mainly with Nutritional and Metabolic Diseases. Two diseases in the example lack MeSH disease class annotations (reported as a warning).
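The node sizes described above can be reproduced from the qresult data frame: split each diseaseClasses_MSH entry into its classes, count the diseases in each class, and divide by the number of diseases that have at least one class. This is a sketch of that definition, not the package's plotting code; the three annotated rows come from the LEPR output shown earlier, and the fourth, unannotated disease is hypothetical.

```r
# Three rows from the LEPR output above, plus one hypothetical disease
# without a MeSH class (NA)
toy <- data.frame(
  disease_name = c("Obesity", "Adult-Onset Diabetes Mellitus",
                   "Diabetes Mellitus", "Unclassified disease"),
  diseaseClasses_MSH = c(
    "Nutritional and Metabolic Diseases (C18), Pathological Conditions, Signs and Symptoms (C23)",
    "Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)",
    "Nutritional and Metabolic Diseases (C18), Endocrine System Diseases (C19)",
    NA
  ),
  stringsAsFactors = FALSE
)

# Class names can contain commas, so split only on commas that directly
# follow a closing parenthesis, e.g. "(C18), "
split_classes <- function(x) {
  if (is.na(x)) return(character(0))
  trimws(strsplit(x, "(?<=\\)),\\s*", perl = TRUE)[[1]])
}
classes <- lapply(toy$diseaseClasses_MSH, split_classes)

# Denominator: diseases with at least one MeSH class
n_with_class <- sum(lengths(classes) > 0)

# Diseases per class (each class counted once per disease), as a fraction
counts <- table(unlist(lapply(classes, unique)))
fractions <- setNames(as.numeric(counts) / n_with_class, names(counts))
round(fractions, 2)
```

A disease can belong to several classes, so these fractions need not sum to 1.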
Exploring the attributes of a gene
The gene2attribute function retrieves the attributes of a specific gene or a list of genes.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene
## . Database: ALL
## . Score:
## . Term: 3953
The result shows the Disease Specificity Index (DSI), the Disease Pleiotropy Index (DPI), and the pLI of the gene (Table 3).
description | geneid | gene_symbol | ensembl_ids | uniprotids | proteinClasses | ncbi_type | genepLI | geneDSI | geneDPI |
---|---|---|---|---|---|---|---|---|---|
leptin receptor | 3953 | LEPR | ENSG00000116678 | P48357 | DTO_05007599, DTO , Signaling | protein-coding | 8.86e-05 | 0.413 | 0.957 |
leptin receptor | 3953 | LEPR | ENSG00000116678 | Q4G138 | DTO_05007599, DTO , Signaling | protein-coding | 8.86e-05 | 0.413 | 0.957 |
Exploring the evidences associated to a gene
You can extract the evidence associated with a particular gene using the gene2evidence function. Additionally, you can explore the evidence for a specific gene-disease pair by specifying the disease identifier with the disease argument.
results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
disease ="UMLS_C3554225", database = "ALL")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-evidence
## . Database: ALL
## . Score: 0-1
## . Term: LEPR
## . Results: 18
The results are shown in Table 4.
library(dplyr)  # provides %>%, filter(), select(), arrange(), rename(), mutate()
tab <- results@qresult
tab <- tab %>%
  filter(reference_type == "PMID") %>%
  select(reference, associationType, pmYear, sentence) %>%
  arrange(desc(pmYear))
tab <- tab %>% rename(Year = pmYear, Sentence = sentence, pmid = reference)
tab %>%
  mutate(pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
  knitr::kable(format = "markdown", row.names = FALSE,
               caption = "Evidences supporting the association between LEPR & LEPTIN RECEPTOR DEFICIENCY")
pmid | associationType | Year | Sentence |
---|---|---|---|
37140700 | GeneticVariation | 2023 | In conclusion, we reported ten new patients with leptin and leptin receptor deficiencies and identified six novel LEPR variants expanding the mutational spectrum of this rare disorder. |
33922961 | GeneticVariation | 2021 | Recently, we discovered a spontaneous compound heterozygous mutation within the leptin receptor, resulting in a considerably more obese phenotype than described for the homozygous leptin receptor deficient mice. |
29158088 | AlteredExpression | 2018 | In this study, we demonstrate that leptin receptor activation directly affects iron metabolism by the finding that serum levels of hepcidin, the master regulator of iron in the whole body, were significantly lower in leptin-deficient (ob/ob) and leptin receptor-deficient (db/db) mice. |
25751111 | GeneticVariation | 2015 | Seven novel deleterious LEPR mutations found in early-onset obesity: a ΔExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect. |
24611737 | CausalMutation | 2014 | Novel variants in the MC4R and LEPR genes among severely obese children from the Iberian population. |
22810975 | GeneticVariation | 2012 | Variants in the LEPR gene are nominally associated with higher BMI and lower 24-h energy expenditure in Pima Indians. |
18703626 | CausalMutation | 2008 | Functional characterization of naturally occurring pathogenic mutations in the human leptin receptor. |
17229951 | CausalMutation | 2007 | Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor. |
16284652 | CausalMutation | 2005 | Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes. |
12646666 | GeneticVariation | 2003 | Binge eating as a major phenotype of melanocortin 4 receptor gene mutations. |
12031989 | AlteredExpression | 2002 | These data demonstrate that leptin is not needed for ObR gene expression, and they suggest that leptin plays a role in receptor downregulation because sObR levels are negatively correlated with leptin levels and BMI in control subjects, whereas sObR levels are not depressed in obese leptin-deficient or leptin receptor-deficient individuals. |
9860295 | GeneticVariation | 1998 | Transmission disequilibrium and sequence variants at the leptin receptor gene in extremely obese German children and adolescents. |
9537324 | GeneticVariation | 1998 | A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction. |
9537324 | CausalMutation | 1998 | A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction. |
9144432 | GeneticVariation | 1997 | Amino acid variants in the human leptin receptor: lack of association to juvenile onset obesity. |
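The PubMed links in the table code above are built with kableExtra::cell_spec(). When only Markdown output is needed, the same kind of link can be built with base R alone. A minimal sketch with a hypothetical helper pubmed_link(), reusing the URL pattern from the code above:

```r
# Hypothetical helper: Markdown links to PubMed for a vector of PMIDs
pubmed_link <- function(pmid) {
  sprintf("[%s](https://pubmed.ncbi.nlm.nih.gov/%s)", pmid, pmid)
}

pubmed_link(c("37140700", "9537324"))
```

The helper is vectorized, so it can replace the cell_spec() call inside mutate() when kableExtra is not installed.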
When there is a lot of evidence, we suggest plotting the results with type = "Points" (Figure 4). Set the limit parameter to 10,000 to include all the evidence in the plot.
results <- gene2evidence( gene = "LEPR", vocabulary = "HGNC",
database = "ALL", score=c(0.7,1) )
plot(results, type = "Points", interactive = TRUE, limit = 10000)
Searching multiple genes
The gene2disease function can also receive as input a list of genes, either as Entrez NCBI Gene identifiers or as Gene Symbols. In the example, we create a vector with the Gene Symbols of several genes belonging to the family of voltage-gated potassium channels (Table 5), and then apply the gene2disease function.
Name | Description |
---|---|
KCNE1 | potassium channel, voltage gated subfamily E regulatory beta subunit 1 |
KCNE2 | potassium channel, voltage gated subfamily E regulatory beta subunit 2 |
KCNH1 | potassium channel, voltage gated eag related subfamily H, member 1 |
KCNH2 | potassium channel, voltage gated eag related subfamily H, member 2 |
KCNG1 | potassium voltage-gated channel modifier subfamily G member 1 |
Create a vector with the genes belonging to the voltage-gated potassium channel family.
The gene2disease function also requires the user to specify the source database with the database argument, and, optionally, a DISGENET score range can be applied to filter the results.
## Your query has 1 page.
## Warning in gene2disease(gene = myListOfGenes, database = "ALL", score = c(0.5, :
## One or more of the genes in the list is not in DISGENET ( 'ALL' ):
## - KCNG1
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: gene-disease
## . Database: ALL
## . Score: 0.5-1
## . Term: KCNE1 ... KCNH2
## . Results: 43
Table 6 shows the top 10 GDAs for the list of genes belonging to the voltage-gated potassium channel family.
tab <- results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] %>% unique() %>%
arrange(desc(score), yearInitial)
knitr::kable(tab[1:10,], caption = "Top GDAs for the list of genes belonging to the voltage-gated potassium channel family")
gene_symbol | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
KCNH2 | Long QT Syndrome | 1.00 | 1970 | 2024 |
KCNH2 | Arrhythmia | 1.00 | 1975 | 2024 |
KCNE1 | Jervell Lange Nielsen Syndrome | 1.00 | 1993 | 2021 |
KCNE2 | Long QT Syndrome | 1.00 | 1999 | 2021 |
KCNH2 | LONG QT SYNDROME 2 | 0.95 | 1986 | 2024 |
KCNE2 | Arrhythmia | 0.90 | 1999 | 2024 |
KCNH2 | Cardiac Death, Sudden | 0.90 | 2000 | 2024 |
KCNE1 | Long QT Syndrome | 0.90 | 1975 | 2024 |
KCNE1 | LONG QT SYNDROME 5 | 0.90 | 1991 | 2021 |
KCNH2 | SQT1 | 0.90 | 1999 | 2022 |
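Before plotting, it can help to check which diseases are shared by several genes in the list, since they connect genes in the Gene-Disease Network. A minimal base-R sketch using the rows of Table 6 typed in by hand; in practice you would start from results@qresult.

```r
# Rows copied from Table 6 (gene_symbol, disease_name, score)
gdas <- data.frame(
  gene_symbol = c("KCNH2", "KCNH2", "KCNE1", "KCNE2", "KCNH2",
                  "KCNE2", "KCNH2", "KCNE1", "KCNE1", "KCNH2"),
  disease_name = c("Long QT Syndrome", "Arrhythmia", "Jervell Lange Nielsen Syndrome",
                   "Long QT Syndrome", "LONG QT SYNDROME 2", "Arrhythmia",
                   "Cardiac Death, Sudden", "Long QT Syndrome",
                   "LONG QT SYNDROME 5", "SQT1"),
  score = c(1, 1, 1, 1, 0.95, 0.9, 0.9, 0.9, 0.9, 0.9),
  stringsAsFactors = FALSE
)

# Number of distinct genes associated with each disease
genes_per_disease <- vapply(
  split(gdas$gene_symbol, gdas$disease_name),
  function(g) length(unique(g)),
  integer(1)
)

# Diseases linked to more than one gene, most shared first
shared <- sort(genes_per_disease[genes_per_disease > 1], decreasing = TRUE)
shared
```

In these rows, Long QT Syndrome is shared by KCNH2, KCNE1 and KCNE2, and Arrhythmia by KCNH2 and KCNE2.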
Visualizing the diseases associated to multiple genes
By default, plotting a DataGeNET.DGN object resulting from a query with a list of genes produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 5).
Set the argument interactive = TRUE to see an interactive network (Figure 6).
Setting the argument type to Heatmap produces a Gene-Disease Heatmap (Figure 7), where the color scale is proportional to the score of the GDA. The limit argument restricts the number of rows to the top-scoring GDAs, and the nchars argument limits the length of the disease names. By default, the plot shows the 50 highest-scoring GDAs.
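The limit and nchars arguments can be pictured as two steps applied before drawing: keep the highest-scoring GDAs, then shorten the disease names. The sketch below encodes that reading with a hypothetical helper prepare_heatmap_rows(); it is an assumption based on the description above, the package's own ordering and truncation rules may differ, and the default of 50 for nchars is also assumed.

```r
# Hypothetical helper (not part of disgenet2r): keep the `limit` highest-scoring
# GDAs and truncate disease names to at most `nchars` characters.
prepare_heatmap_rows <- function(gdas, limit = 50, nchars = 50) {  # nchars default assumed
  gdas <- gdas[order(gdas$score, decreasing = TRUE), , drop = FALSE]
  gdas <- head(gdas, limit)
  gdas$disease_name <- substr(gdas$disease_name, 1, nchars)
  gdas
}

# Three rows from Table 6, with distinct scores
gdas <- data.frame(
  gene_symbol = c("KCNE2", "KCNE1", "KCNH2"),
  disease_name = c("Arrhythmia", "Jervell Lange Nielsen Syndrome", "LONG QT SYNDROME 2"),
  score = c(0.90, 1.00, 0.95),
  stringsAsFactors = FALSE
)

prepare_heatmap_rows(gdas, limit = 2, nchars = 10)
```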
These results can also be visualized as a Gene-Disease Class Heatmap by setting the argument type to Heatmap and class to DiseaseClass (Figure 8). In this case, diseases are grouped by their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, the genes are associated mainly with Cardiovascular Diseases and with Congenital, Hereditary, and Neonatal Diseases and Abnormalities.
Alternatively, set the arguments type to Network and class to DiseaseClass to generate a Gene-Disease Class Network (Figure 9).
Exploring the evidences associated to a list of genes
First, create a gene-evidence object using the gene2evidence function.
## Your query has 24 pages.
To visualize the results, set the argument type = "Points" (Figure 10).
Exploring the Clinical trials associated to a list of genes
First, create a gene-evidence object using the gene2evidence function.
results <- gene2evidence(gene = c("IL3", "IL4", "IL5", "IL6", "IL0"),
database = "CLINICALTRIALS", verbose = TRUE )
## Your query has 106 pages.
## Notice that your query has a maximum of 106 pages.
## By using the default n_pags (100), your query of 106 pages has been reduced to 100 pages.
## Warning in gene2evidence(gene = c("IL3", "IL4", "IL5", "IL6", "IL0"), database = "CLINICALTRIALS", :
## One or more of the genes in the list is not in DISGENET ('CLINICALTRIALS'): IL0
To visualize the results, set the argument type = "Points" (Figure 11).
Searching by gene and chemical
You can restrict the GDAs to those reported in the context of a chemical by passing a chemical identifier to the chemical argument of the gene2disease function. Table 7 shows the diseases associated with LEPR in the context of metformin (C0025598).
results <- gene2disease( gene = "LEPR", vocabulary = "HGNC",
database = "TEXTMINING_HUMAN",
chemical = "C0025598" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: LEPR
## . Results: 4
tab <- results@qresult
tab <- tab %>% dplyr::select(chemical_name, gene_symbol, disease_name, score)
knitr::kable(tab, caption = "GDAs for LEPR and metformin")
chemical_name | gene_symbol | disease_name | score |
---|---|---|---|
metformin | LEPR | Ovary Syndrome, Polycystic | 0.45 |
metformin | LEPR | Hepatic steatosis | 0.35 |
metformin | LEPR | Schizophrenias | 0.20 |
metformin | LEPR | Pulmonary arterial hypertension | 0.10 |
Retrieving the chemicals associated to a gene
For GDAs that have chemical annotations, the gene2chemical function retrieves the chemicals annotated to these associations for a gene or a list of genes.
results <- gene2chemical( gene = "PDGFRA",
vocabulary = "HGNC",
database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: gene-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.8-1
## . Term: PDGFRA
## . Results: 14
library(dplyr)  # provides %>%
tab <- results@qresult
tab <- tab %>%
  dplyr::filter(reference_type == "PMID") %>%
  dplyr::select(disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>%
  dplyr::rename(Disease = disease_name, Chemical = chemical_name,
                `Chemical effect` = chemical_effect,
                Year = pmYear, Sentence = sentence, pmid = reference) %>%
  dplyr::arrange(dplyr::desc(Year))
tab[1:10, ] %>%
  dplyr::mutate(pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
  knitr::kable(format = "markdown", row.names = FALSE,
               caption = "Selection of chemicals associated to PDGFRA")
Disease | Chemical | Chemical effect | Sentence | pmid | Year |
---|---|---|---|---|---|
GIST | Avapritinib | therapeutic | Avapritinib is the only potent and selective inhibitor approved for the treatment of D842V-mutant gastrointestinal stromal tumors (GIST), the most common primary mutation of the platelet-derived growth factor receptor α (PDGFRA). | 38167404 | 2024 |
GIST | Avapritinib | therapeutic|therapeutic | The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively. | 38756640 | 2024 |
GIST | imatinib | therapeutic|therapeutic | The most common driver mutations are KIT and PDGFRA which can be treated with imatinib or avapritinib (for PDGFRA D842V-mutant GIST), respectively. | 38756640 | 2024 |
GIST | Avapritinib | therapeutic | Avapritinib is the only drug for adult patients with PDGFRA exon 18 mutated unresectable or metastatic gastrointestinal stromal tumor (GIST). | 38803186 | 2024 |
GIST | 1-N’-[2,5-difluoro-4-[2-(1-methylpyrazol-4-yl)pyridin-4-yl]oxyphenyl]-1-N’-phenylcyclopropane-1,1-dicarboxamide | therapeutic | Ripretinib, a broad-spectrum inhibitor of the KIT and PDGFRA receptor tyrosine kinases, is designated as a fourth-line treatment for gastrointestinal stromal tumor (GIST). | 38973363 | 2024 |
GIST | sorafenib | therapeutic | Low Dose Sorafenib in Gastric Gastrointestinal Stromal Tumour with PDGFRA p.1843-D846 Deletion in an 88-Year-Old Male. | 38576303 | 2024 |
GIST | Avapritinib | therapeutic|therapeutic | Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation. | 36155864 | 2023 |
GIST | imatinib | therapeutic|therapeutic | Approved in 2020, avapritinib is the first effective targeted therapy for advanced stage GIST harboring an imatinib-resistant PDGFRA D842V mutation. | 36155864 | 2023 |
GIST | imatinib | therapeutic | KIT and PDGFRA Mutations and Survival of Gastrointestinal Stromal Tumor Patients Treated with Adjuvant Imatinib in a Randomized Trial. | 37014660 | 2023 |
GIST | Avapritinib | therapeutic | To create an in vivo model of PDGFRA D842V-mutant gastrointestinal stromal tumor (GIST) and identify the mechanism of tumor persistence following avapritinib therapy. | 36971786 | 2023 |
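Some entries in the Chemical effect column repeat the same label, as in "therapeutic|therapeutic". If only the distinct labels matter for your analysis (an assumption; the repetition may reflect several mentions in the evidence), they can be collapsed with base R. The helper collapse_effects() below is hypothetical.

```r
# Chemical effects as shown in the table above
effects <- c("therapeutic", "therapeutic|therapeutic", "therapeutic|therapeutic")

# Hypothetical helper: keep each distinct label once per pipe-separated entry
collapse_effects <- function(x) {
  vapply(strsplit(x, "|", fixed = TRUE),
         function(v) paste(unique(v), collapse = "|"),
         character(1))
}

collapse_effects(effects)
```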
To visualize the results, use the plot function.
Searching by disease
The disease2gene function retrieves the genes associated with a disease or a list of diseases. It takes as input the disease or list of diseases of interest, each in the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO and ID is the identifier in that vocabulary, together with the database (by default, CURATED). A score range can be set, as in the gene2disease function.
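Because every disease identifier must carry its vocabulary prefix, a quick local check can catch bare identifiers before querying. A minimal sketch, assuming only the IDENT_ID rule and the vocabulary list stated above; is_disease_id() and as_disease_id() are hypothetical helpers, not disgenet2r functions.

```r
# Vocabularies accepted as prefixes, as listed above
disease_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                          "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper: TRUE when an identifier has the form IDENT_ID,
# with IDENT in the list above and a non-empty ID part
is_disease_id <- function(x) {
  pattern <- paste0("^(", paste(disease_vocabularies, collapse = "|"), ")_.+$")
  grepl(pattern, x)
}

# Hypothetical helper: prefix a bare identifier with its vocabulary
as_disease_id <- function(id, vocabulary = "UMLS") {
  stopifnot(vocabulary %in% disease_vocabularies)
  paste0(vocabulary, "_", id)
}

is_disease_id(c("UMLS_C0036341", "MESH_D012559", "HPO_HP:0100753", "C0036341"))
as_disease_id("181500", vocabulary = "OMIM")
```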
In the example, the disease2gene function retrieves the genes associated with the UMLS CUI C0036341 (Schizophrenia) from the CURATED database, using a score range from 0.8 to 1.
results <- disease2gene( disease = "UMLS_C0036341",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: UMLS_C0036341
## . Results: 130
Table 9 shows the top 10 genes associated with UMLS CUI C0036341 (Schizophrenia).
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>%
arrange(desc(score), yearInitial)
knitr::kable(tab[1:10,], caption = "Top 10 genes associated to Schizophrenia")
gene_symbol | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
DRD3 | Schizophrenias | 1 | 1999 | 1999 |
DRD2 | Schizophrenias | 1 | 2000 | 2011 |
RTN4R | Schizophrenias | 1 | 2004 | 2017 |
HTR2A | Schizophrenias | 1 | 2004 | 2008 |
COMT | Schizophrenias | 1 | 2005 | 2010 |
TNF | Schizophrenias | 1 | 2006 | 2006 |
GABBR1 | Schizophrenias | 1 | 2007 | 2013 |
ZNF804A | Schizophrenias | 1 | 2008 | 2018 |
GRIN2B | Schizophrenias | 1 | 2008 | 2008 |
GRIN2D | Schizophrenias | 1 | 2010 | 2010 |
Visualizing the genes associated to a single disease
There are two options to visualize the results of searching a single disease: a Gene-Disease Network showing the genes related to the disease of interest (Figure 13), and a Disease-Protein Class Network with the genes grouped by Drug Target Ontology Protein Class (Figure 14).
Figure 13 shows the default Gene-Disease Network for Schizophrenia. As with the gene2disease function, the blue node is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.
Alternatively, in the Disease-Protein Class Network, genes are grouped by Drug Target Ontology Protein Class (Figure 14). This is a better choice when a large number of genes is associated with the disease. This plot uses ProteinClass as the class argument. The resulting network shows the disease in blue and the Protein Classes of the associated genes in green. The node size is proportional to the number of genes in the Protein Class. In the example, the largest proportion of the genes associated with Schizophrenia are G-protein coupled receptors. Notice again that not all genes have Protein Class annotations.
The same results are obtained when querying DISGENET with the MeSH identifier for Schizophrenia (D012559).
results <- disease2gene( disease = "MESH_D012559",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: MESH_D012559
## . Results: 130
The same results are obtained when querying DISGENET with the OMIM identifier for Schizophrenia (181500).
results <- disease2gene( disease = "OMIM_181500",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: OMIM_181500
## . Results: 130
The same results are obtained when querying DISGENET with the ICD-9-CM identifier for Schizophrenia (295).
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: ICD9CM_295
## . Results: 130
The same results are obtained when querying DISGENET with the NCI identifier for Schizophrenia (C3362).
results <- disease2gene( disease = "NCI_C3362",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: NCI_C3362
## . Results: 130
The same results are obtained when querying DISGENET with the HPO identifier for Schizophrenia (HP:0100753).
results <- disease2gene( disease = "HPO_HP:0100753",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-gene
## . Database: CURATED
## . Score: 0.8-1
## . Term: HPO_HP:0100753
## . Results: 130
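All of these queries use identifiers of the form IDENT_ID, where the part before the first underscore names the vocabulary. As an illustration, the minimal base-R sketch below splits such identifiers back into vocabulary and code. parse_disease_id is a hypothetical helper, not a disgenet2r function, and it assumes that vocabulary prefixes never contain an underscore.

```r
# Hypothetical helper (not part of disgenet2r): split IDENT_ID identifiers into
# vocabulary and code. Only the first underscore separates the two parts, so
# codes such as "HP:0100753" are kept intact.
parse_disease_id <- function(ids) {
  data.frame(
    vocabulary = sub("_.*$", "", ids),
    code       = sub("^[^_]+_", "", ids),
    stringsAsFactors = FALSE
  )
}

# Identifiers for Schizophrenia used in the queries above
schizophrenia_ids <- c("UMLS_C0036341", "MESH_D012559", "OMIM_181500",
                       "ICD9CM_295", "NCI_C3362", "HPO_HP:0100753")
parse_disease_id(schizophrenia_ids)
```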
Searching by disease and chemical
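You can use the chemical argument to keep only the associations mentioned in the context of a chemical. The example below retrieves the breast cancer (UMLS_C0006142) associations mentioned together with tamoxifen (C0039286). The n_pags argument limits the number of result pages that are retrieved.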
results <- disease2gene( disease = "UMLS_C0006142", chemical = "C0039286",
database = "ALL" , n_pags = 1 )
## Notice that your query has a maximum of 9 pages.
## By indicating n_pags = 1, your query of 9 pages has been reduced to 1 pages.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: ALL
## . Score: 0-1
## . Term: UMLS_C0006142
## . Results: 100
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "chemical_name", "chemicalid")] )%>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to breast cancer")
gene_symbol | disease_name | score | chemical_name | chemicalid |
---|---|---|---|---|
BRCA1 | Cancer, Breast | 1 | tamoxifen | C0039286 |
BRCA2 | Cancer, Breast | 1 | tamoxifen | C0039286 |
CDH1 | Cancer, Breast | 1 | tamoxifen | C0039286 |
ESR1 | Cancer, Breast | 1 | tamoxifen | C0039286 |
FGFR2 | Cancer, Breast | 1 | tamoxifen | C0039286 |
PIK3CA | Cancer, Breast | 1 | tamoxifen | C0039286 |
PTEN | Cancer, Breast | 1 | tamoxifen | C0039286 |
RAD51 | Cancer, Breast | 1 | tamoxifen | C0039286 |
TP53 | Cancer, Breast | 1 | tamoxifen | C0039286 |
CHEK2 | Cancer, Breast | 1 | tamoxifen | C0039286 |
Retrieving the chemicals associated to a disease
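For GDAs that have a chemical annotation, we can query with a disease, or a list of diseases, to retrieve the chemicals annotated to these associations. The example below retrieves the chemicals for Cystic Fibrosis (UMLS_C0010674) from the TEXTMINING_MODELS database.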
results <- disease2chemical( disease = "UMLS_C0010674",
database = "TEXTMINING_MODELS" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-chemical
## . Database: TEXTMINING_MODELS
## . Score: 0.8-1
## . Term: UMLS_C0010674
## . Results: 19
tab <- results@qresult
tab <-tab %>% dplyr::filter(reference_type =="PMID") %>% dplyr::select(gene_symbol, chemical_name,chemical_effect ,sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
`Chemical Effect`=chemical_effect , Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab[1:10,] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Top chemicals associated to Cystic Fibrosis" )
Gene | Chemical | Chemical Effect | Sentence | pmid | Year |
---|---|---|---|---|---|
CFTR | linaclotide | other|other | These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion. | 38869953 | 2024 |
CFTR | phenobarbital | other|other | These data provide further insights into the action of linaclotide and how DRA may compensate for loss of CFTR in regulating luminal pH. Linaclotide may be a useful therapy for CF individuals with impaired bicarbonate secretion. | 38869953 | 2024 |
CFTR | dinoprostone | other | Additionally, the A140D polymorphism of GSTO1-1 was associated with lower levels of the antiinflammatory mediators PGE2 and 15(S)-HETE, and with lower values of the FEV1/FVC ratio in CF subjects with the homozygous CFTR ΔF508 mutation. | 33583732 | 2021 |
CFTR | lumacaftor | therapeutic|therapeutic | For CF patients and CF mice, we developed a HCO3- drinking test to assess the role of the cystic fibrosis transmembrane conductance regulator (CFTR) in urinary HCO3-excretion and applied it in the patients before and after treatment with the novel CFTR modulator drug, lumacaftor-ivacaftor. β-Intercalated cells express basolateral secretin receptors and apical CFTR and pendrin. | 32703846 | 2020 |
CFTR | ivacaftor | therapeutic|therapeutic | For CF patients and CF mice, we developed a HCO3- drinking test to assess the role of the cystic fibrosis transmembrane conductance regulator (CFTR) in urinary HCO3-excretion and applied it in the patients before and after treatment with the novel CFTR modulator drug, lumacaftor-ivacaftor. β-Intercalated cells express basolateral secretin receptors and apical CFTR and pendrin. | 32703846 | 2020 |
CFTR | lumacaftor | therapeutic | Activity of lumacaftor is not conserved in zebrafish Cftr bearing the major cystic fibrosis-causing mutation. | 32123813 | 2019 |
CFTR | lumacaftor | therapeutic|therapeutic|therapeutic | The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. | 31300729 | 2019 |
CFTR | Tezacaftor | therapeutic|therapeutic|therapeutic | The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. | 31300729 | 2019 |
CFTR | ivacaftor | therapeutic|therapeutic|therapeutic | The recent advent of the FDA-approved CFTR modulator drug ivacaftor, alone or in combination with lumacaftor or tezacaftor, has enabled treatment of the majority of patients suffering from CF. | 31300729 | 2019 |
TNF | digitoxin | therapeutic | The cardiac glycoside digitoxin, which has been shown to inhibit TNFα/NFκB signaling in CF lung epithelial cells, may serve as such a therapy. | 31864360 | 2019 |
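The chemical_effect column can hold several pipe-separated labels, such as therapeutic|therapeutic. If you only need the distinct labels, a small base-R sketch like the one below collapses them. collapse_effects is a hypothetical helper, and the sketch assumes that repeated labels add no information for your analysis.

```r
# Hypothetical helper: reduce pipe-separated chemical_effect values
# (e.g. "therapeutic|therapeutic") to their distinct labels.
collapse_effects <- function(effects) {
  vapply(strsplit(effects, "|", fixed = TRUE),
         function(x) paste(unique(x), collapse = "|"),
         character(1))
}

# Values taken from the table above
effects <- c("other|other", "other", "therapeutic|therapeutic|therapeutic")
collapse_effects(effects)
```

With live data, the same call can be applied to results@qresult$chemical_effect.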
To visualize the results, use the plot function.
Exploring the attributes of a disease
The disease2attribute function retrieves the attributes of a specific disease, here Schizophrenia (UMLS_C0036341).
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease
## . Database: ALL
## . Score:
## . Term: UMLS_C0036341
## . Results: 12
The results (Table 12) show the mappings of the disease to different vocabularies, the disease type, and the disease classes according to the UMLS semantic types, HPO, DO, and MeSH.
tab <- unique(results@qresult )
knitr::kable(tab[1:10,], caption = "Disease attributes for Schizophrenia")
vocabulary | code | disease_name | type | diseaseClasses_UMLS_ST | diseaseClasses_HPO | diseaseClasses_DO | diseaseClasses_MSH |
---|---|---|---|---|---|---|---|
MSH | D012559 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
ICD10 | F20 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
ICD10 | F20.9 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
OMIM | 181500 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
ICD9CM | 295.90 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
HPO | HP:0100753 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
NCI | C3362 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
ICD9CM | 295.9 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
ICD9CM | 295 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
DO | 5419 | Schizophrenias | disease | Mental or Behavioral Dysfunction (T048) | Abnormality of the nervous system (00707) | disease of mental health (150) | Mental Disorders (F03) |
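Because a vocabulary can contribute several codes (for example, three ICD9CM codes for Schizophrenia), it can be convenient to group the codes by vocabulary. The sketch below does this with base R's split() on rows copied from Table 12, so it runs offline; with live data you would use the vocabulary and code columns of results@qresult instead.

```r
# Rows copied from Table 12 (Schizophrenia); with live data use
# unique(results@qresult[, c("vocabulary", "code")])
mappings <- data.frame(
  vocabulary = c("MSH", "ICD10", "ICD10", "OMIM", "ICD9CM", "HPO",
                 "NCI", "ICD9CM", "ICD9CM", "DO"),
  code = c("D012559", "F20", "F20.9", "181500", "295.90", "HP:0100753",
           "C3362", "295.9", "295", "5419"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of codes per vocabulary
codes_by_vocabulary <- split(mappings$code, mappings$vocabulary)
codes_by_vocabulary$ICD9CM
```

Note that the vocabulary labels in this table (e.g. MSH) are not necessarily the same prefixes used in query identifiers (e.g. MESH_D012559).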
Retrieving the UMLS CUIs via other vocabularies
It is possible to obtain the CUIs that map to an identifier from another vocabulary (for example, ICD9CM, MSH, or OMIM) using the get_umls_from_vocabulary function. The example below maps the MeSH identifier D012559.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease
## . Database: ALL
## . Score:
## . Term: MSH_D012559
## . Results: 2
The results are shown in Table 13.
VOCABULARIES | code | disease_name |
---|---|---|
MSH | D012559 | Schizophrenias |
UMLS | C0036341 | Schizophrenias |
Finding the CUI associated to the name of a disease of interest
It is also possible to obtain the CUIs that correspond to a disease name of interest using the get_umls_from_vocabulary function, by setting the parameter vocabulary = "NAME". Use the parameter limit to change the number of CUIs retrieved. The example below searches for "long QT".
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease
## . Database: ALL
## . Score:
## . Term: long QT
## . Results: 10
The results are shown in Table 14.
tab <-results@qresult
knitr::kable(tab, caption = "List of CUIs that map to long QT", row.names = F)
VOCABULARIES | code | disease_name |
---|---|---|
UMLS | C1141890 | Inherited long QT syndrome |
UMLS | C0023976 | Long QT Syndrome |
UMLS | C2678485 | LQT9 |
UMLS | C1832916 | TIMOTHY SYNDROME |
UMLS | C2732979 | Aquired long QT syndrome (disorder) |
UMLS | C1867904 | LONG QT SYNDROME 5 |
UMLS | C1859062 | LONG QT SYNDROME 3 |
UMLS | C0151878 | Prolonged QT interval on EKG |
UMLS | C1833154 | LQT4 |
UMLS | C2931401 | Long QT syndrome type 3 |
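The CUIs returned by a name search can be fed back into other queries after adding the UMLS_ prefix. The sketch below assumes the qresult columns are named VOCABULARIES and code, as in Table 14, and rebuilds three of its rows so that it runs offline.

```r
# Three rows copied from Table 14; with live data use tab <- results@qresult
tab <- data.frame(
  VOCABULARIES = c("UMLS", "UMLS", "UMLS"),
  code = c("C1141890", "C0023976", "C2678485"),
  stringsAsFactors = FALSE
)

# Keep UMLS rows only and build unique "UMLS_<CUI>" identifiers
query_ids <- unique(paste0("UMLS_", tab$code[tab$VOCABULARIES == "UMLS"]))
query_ids
```

The resulting vector could then be passed as the disease argument of a list query, for example disease2gene(disease = query_ids, database = "CURATED").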
Exploring the evidence associated to a disease
To explore the evidence supporting the associations for Schizophrenia, use the disease2evidence function.
results <- disease2evidence( disease = "UMLS_C0036341",
type = "GDA",
database = "CURATED",
score = c( 0.8,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: CURATED
## . Score: 0.8-1
## . Term: UMLS_C0036341
## . Results: 369
Table 15 shows a selection of the evidence: the five most recent PsyGeNET publications published after 2013.
tab <- results@qresult
tab <-tab[tab$reference_type == "PMID" & tab$pmYear > 2013 & tab$source =="PSYGENET", ]
tab <- tab[ order(-tab$pmYear), c("gene_symbol","source", "associationType", "sentence", "reference", "pmYear")][1:5,]
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Evidences supporting the association for Schizophrenia" )
Gene | source | associationType | Sentence | pmid | Year |
---|---|---|---|---|---|
GRIN2A | PSYGENET | Biomarker | GRIN2A (GT)21 may play a significant role in the etiology of schizophrenia among the Chinese Han population of Shaanxi. | 25958346 | 2015 |
NOTCH4 | PSYGENET | Biomarker | Our data indicate that NOTCH4 polymorphism can influence clinical symptoms in Slovenian patients with schizophrenia. | 25529856 | 2015 |
PPARA | PSYGENET | Biomarker | We report significant increases in PPAR?, SREBP1, IL-6 and TNF?, and decreases in PPAR? and C/EPB? and mRNA levels from patients with schizophrenia, with additional BMI interactions, characterizing dysregulation of genes relating to metabolic-inflammation in schizophrenia. | 25433960 | 2015 |
MAGI2 | PSYGENET | Biomarker | One of the rare CNVs found in SZ cohorts is the duplication of Synaptic Scaffolding Molecule (S-SCAM, also called MAGI-2), which encodes a postsynaptic scaffolding protein controlling synaptic AMPA receptor levels, and thus the strength of excitatory synaptic transmission. | 25653350 | 2015 |
NCAM1 | PSYGENET | Biomarker | A growing body of evidence links aberrant levels of NCAM and polySia as well as variation in the ST8SIA2 gene to neuropsychiatric disorders, including schizophrenia. | 24057454 | 2015 |
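The tables in this section turn PMIDs into PubMed links with kableExtra::cell_spec. If you only need Markdown output, a base-R alternative is to build the links with sprintf(). pubmed_link is a hypothetical helper; the URL pattern is the one used in the code above.

```r
# Hypothetical helper: Markdown link to the PubMed page of each PMID
pubmed_link <- function(pmid) {
  sprintf("[%s](https://pubmed.ncbi.nlm.nih.gov/%s)", pmid, pmid)
}

pubmed_link(c("25958346", "25529856"))
```

For example, tab$pmid <- pubmed_link(tab$pmid) before knitr::kable(tab, format = 'markdown') should render clickable PMIDs without the kableExtra dependency.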
Additionally, you can explore the evidence for specific gene-disease pairs by passing one or more genes with the gene argument. The example below retrieves the evidence for Schizophrenia and the genes DRD2 and DRD3.
results <- disease2evidence( disease = "UMLS_C0036341",
gene = c("DRD2", "DRD3"),
type = "GDA",
database = "ALL",
score = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: ALL
## . Score: 0.5-1
## . Term: UMLS_C0036341
## . Results: 567
Table 16 shows the ten most recent publications.
tab <- results@qresult
tab <- tab %>%
dplyr::filter(reference_type == "PMID") %>%
dplyr::select(gene_symbol, associationType, reference, sentence, pmYear) %>%
dplyr::arrange(dplyr::desc(pmYear)) %>%
head(10)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Year=pmYear, Sentence = sentence, pmid = reference)
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Evidences supporting the association between C0036341 & DRD2,DRD3" )
Gene | associationType | pmid | Sentence | Year |
---|---|---|---|---|
DRD2 | GeneticVariation | 38598465 | Adult patients with schizophrenia will be randomized (2: 1) to receive PGx-assisted treatment (drug and regimen selection depending on the results of single-nucleotide polymorphisms in genes DRD2, HTR1A, HTR2C, ABCB1, CYP2D6, CYP3A5, and CYP1A2) or the standard of care. | 2024 |
DRD2 | CausalorOrContributing | 39098130 | Clinically, DRD2 inhibitors demonstrate efficacy in managing positive symptoms of schizophrenia, manic episodes in bipolar disorder, and dopaminergic imbalance in Parkinson’s disease. | 2024 |
DRD2 | CausalorOrContributing | 37422511 | We focus on schizophrenia and the dopamine D2 receptor (DRD2), hot flashes and the neurokinin B receptor (TACR3), cigarette smoking and receptors bound by nicotine (CHRNA5, CHRNA3, CHRNB4), and alcohol use and enzymes that help to break down alcohol (ADH1B, ADH1C, ADH7). | 2024 |
DRD3 | PostTranslationalModification | 38648100 | Schizophrenia subjects exhibited thousands of neuronal and non-neuronal epigenetic differences at regions that included several susceptibility genetic loci, such as NRG1, DISC1, and DRD3. | 2024 |
DRD2 | GeneticVariation | 38421437 | Our significant polymorphism findings, mainly those in DRD2 (rs1800497, rs1799978, and rs2734841), HTR2C (rs3813929), and HTR2A (rs6311), were largely consistent with earlier findings (predictors of RIS effectiveness in adult schizophrenia patients), confirming their validity for identifying ASD children with a greater likelihood of core symptom improvement compared to noncarriers/wild types. | 2024 |
DRD2 | CausalorOrContributing | 38114631 | The Drd2 gene, encoding the dopamine D2 receptor (D2R), was recently indicated as a potential target in the etiology of lowered sociability (i.e., social withdrawal), a symptom of several neuropsychiatric disorders such as Schizophrenia and Major Depression. | 2024 |
DRD2 | GeneticVariation | 39187246 | DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia. | 2024 |
DRD2 | GeneticVariation | 38810489 | Six loci including neurexin-1(NRXN1) (rs1045881), dopamine D1 receptor (DRD1) (rs686, rs4532), chitinase-3-like protein 1 (CHI3L1) (rs4950928), velocardiofacial syndrome (ARVCF) (rs165815), dopamine D2 receptor (DRD2) (rs1076560) were identified higher expression with significant difference in individuals converted into schizophrenia after two years. | 2024 |
DRD2 | CausalorOrContributing | 39036710 | TAAR1 agonists may be less efficacious than dopamine D 2 receptor antagonists already licensed for schizophrenia. | 2024 |
DRD3 | GeneticVariation | 39187246 | DRD2 (rs6276) and DRD3 (rs6280, rs963468) polymorphisms can affect amisulpride tolerability since they are associated with the observed adverse reactions, including cardiac dysfunction and endocrine disorders in Chinese patients with schizophrenia. | 2024 |
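To see how the evidence is distributed across genes and association types, a contingency table is often enough. The sketch below uses the gene and associationType values of the ten rows shown in Table 16 so that it runs offline; with live data, use the gene_symbol and associationType columns of results@qresult.

```r
# Values copied from the ten rows of Table 16
evidence <- data.frame(
  gene = c("DRD2", "DRD2", "DRD2", "DRD3", "DRD2",
           "DRD2", "DRD2", "DRD2", "DRD2", "DRD3"),
  associationType = c("GeneticVariation", "CausalorOrContributing",
                      "CausalorOrContributing", "PostTranslationalModification",
                      "GeneticVariation", "CausalorOrContributing",
                      "GeneticVariation", "GeneticVariation",
                      "CausalorOrContributing", "GeneticVariation"),
  stringsAsFactors = FALSE
)

# Rows: genes; columns: association types; cells: number of evidence rows
counts <- table(evidence$gene, evidence$associationType)
counts
```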
Searching multiple diseases
The disease2gene function also accepts as input a list of diseases (each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO), the database (by default, CURATED), and, optionally, a value range for the score. In this example, we selected four diseases. Table 17 shows their UMLS CUIs and disease names.
UMLS_CUI | Disease_Name |
---|---|
C0036341 | Schizophrenia |
C0002395 | Alzheimer’s Disease |
C0030567 | Parkinson Disease |
C0005586 | Bipolar Disorder |
Create the vector with the list of diseases:
diseasesOfInterest <- c( "UMLS_C0036341", "UMLS_C0002395",
"UMLS_C0030567", "UMLS_C0005586" )
In the example, we will search in CURATED data, using a score range of 0.8-1.
results <- disease2gene(
disease = diseasesOfInterest,
database = "CURATED",
score =c(0.8,1),
verbose = TRUE )
## Your query has 4 pages.
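Before sending a long list of identifiers, it can help to check locally that every entry follows the IDENT_ID format, using the vocabulary prefixes listed at the start of this section. The sketch below is a hypothetical helper, not part of disgenet2r; it only checks the format, not whether the code exists in DISGENET.

```r
# Vocabulary prefixes listed in the description of disease2gene
valid_prefixes <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                    "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper: TRUE when an identifier is <known prefix>_<non-empty code>
is_valid_disease_id <- function(ids) {
  pattern <- paste0("^(", paste(valid_prefixes, collapse = "|"), ")_.+$")
  grepl(pattern, ids)
}

is_valid_disease_id(c("UMLS_C0036341", "C0036341", "MESH_D012559", "HPO_HP:0100753"))
```

For example, stopifnot(all(is_valid_disease_id(diseasesOfInterest))) before the query would stop early on a missing prefix such as a bare "C0036341".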
Table 18 shows the top 10 genes associated with the list of diseases.
tab <- unique(results@qresult[ ,c("gene_symbol", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top Genes associated to a list of diseases")
gene_symbol | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
GBA1 | Parkinson Disease | 1 | 1987 | 2021 |
APP | Alzheimer Disease | 1 | 1989 | 2023 |
SNCA | Parkinson Disease | 1 | 1989 | 2021 |
LRRK2 | Parkinson Disease | 1 | 1993 | 2021 |
MAPT | Alzheimer Disease | 1 | 1993 | 2020 |
PSEN1 | Alzheimer Disease | 1 | 1993 | 2020 |
GRN | Alzheimer Disease | 1 | 1993 | 2020 |
APOE | Alzheimer Disease | 1 | 1993 | 2020 |
PSEN2 | Alzheimer Disease | 1 | 1993 | 2020 |
PRKN | Parkinson Disease | 1 | 1998 | 2022 |
Visualizing the genes associated to multiple diseases
The default plot of the results of querying DISGENET with a list of diseases is a Gene-Disease Network, where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 16).
To visualize the results as a Gene-Disease Heatmap (Figure 17), set the class argument to “Heatmap”. In this plot, the color scale is proportional to the score of the GDA. When the results are large, the limit argument restricts the plot to the top-scoring GDAs; by default, the plot shows the 50 highest-scoring GDAs.
## [1] "Dataframe of 356 rows has been reduced to 20 rows."
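The idea behind the limit argument, keeping only the highest-scoring GDAs, can be reproduced on any data frame with a score column. The sketch below mimics that behaviour in base R; it is not the package's internal code, top_by_score is a hypothetical helper, and the toy rows use placeholder names rather than DISGENET data.

```r
# Hypothetical helper: keep the n highest-scoring rows (ties keep input order)
top_by_score <- function(df, n = 50) {
  df <- df[order(-df$score), , drop = FALSE]
  head(df, n)
}

# Toy GDAs with placeholder names, not DISGENET data
gdas <- data.frame(gene_symbol  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                   disease_name = "DISEASE_X",
                   score        = c(0.55, 0.92, 0.80, 0.61),
                   stringsAsFactors = FALSE)
top_by_score(gdas, n = 2)
```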
A third visualization option is a Protein Class-Disease Heatmap (Figure 18), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass”. In this case, the color of the heatmap is proportional to the percentage of the genes of each disease that belong to each protein class.
A Protein Class-Disease Network visualization is also possible (Figure 19).
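The quantity shown in the Protein Class-Disease Heatmap, the percentage of each disease's genes in each protein class, can be computed with a row-normalized contingency table. The sketch below assumes that each gene is counted once in a single class and that genes without a protein class annotation are left out; the input is a toy, hypothetical example with placeholder names, not DISGENET data.

```r
# Toy gene-to-class assignments (placeholder names, not DISGENET data)
gene_classes <- data.frame(
  disease_name  = c("DISEASE_X", "DISEASE_X", "DISEASE_X", "DISEASE_X",
                    "DISEASE_Y", "DISEASE_Y"),
  protein_class = c("Enzyme", "Enzyme", "Kinase", "Transporter",
                    "Kinase", "Kinase"),
  stringsAsFactors = FALSE
)

# Row-wise proportions: each disease's row sums to 100%
pct <- 100 * prop.table(table(gene_classes$disease_name,
                              gene_classes$protein_class), margin = 1)
round(pct, 1)
```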
To explore the evidence supporting the associations, use the disease2evidence function.
results <- disease2evidence( disease = diseasesOfInterest,
type = "GDA",
score=c(0.5,1),
database = "CURATED" )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-evidence
## . Database: CURATED
## . Score: 0.5-1
## . Term: UMLS_C0036341 ... UMLS_C0005586
## . Results: 3403
To visualize the results, set the class argument of the plot function to Points (Figure 20).
Searching by disease and chemical
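The disease2gene function can also retrieve the genes mentioned in the context of a specific disease and chemical. The example below retrieves the genes associated with Parkinson disease (UMLS_C0030567) in the context of levodopa (C0023570) (Table 19).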
results <- disease2gene( disease = "UMLS_C0030567",
database = "TEXTMINING_HUMAN",
chemical = "C0023570")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: UMLS_C0030567
## . Results: 105
tab <- results@qresult
tab <-tab%>% dplyr::select(gene_symbol, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "Top GDAs associated to Parkinson and levodopa")
gene_symbol | disease_name | chemical_name | score |
---|---|---|---|
BDNF | Parkinson Disease | levodopa | 1.00 |
GBA1 | Parkinson Disease | levodopa | 1.00 |
GDNF | Parkinson Disease | levodopa | 1.00 |
MAOB | Parkinson Disease | levodopa | 1.00 |
PRKN | Parkinson Disease | levodopa | 1.00 |
SNCA | Parkinson Disease | levodopa | 1.00 |
PARK7 | Parkinson Disease | levodopa | 1.00 |
PINK1 | Parkinson Disease | levodopa | 1.00 |
LRRK2 | Parkinson Disease | levodopa | 1.00 |
DDC | Parkinson Disease | levodopa | 0.95 |
To visualize the results, use the plot function (Figure 21).
Retrieving the chemicals associated to a disease
To retrieve the chemicals mentioned in the GDAs involving a specific disease, we can use the disease2chemical function.
results <- disease2chemical( disease = "UMLS_C0030567",
database = "TEXTMINING_HUMAN" , score = c(0.5,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.5-1
## . Term: UMLS_C0030567
## . Results: 173
tab <- results@qresult
tab <-tab%>% dplyr::filter(reference_type == "PMID") %>% dplyr::select(gene_symbol, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Chemical = chemical_name,
`Chemical Effect` = chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab[1:10,] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Top Chemicals associated to Parkinson" )
Gene | Chemical | Chemical Effect | Sentence | pmid | Year |
---|---|---|---|---|---|
SNCA | Ganglioside GM1 | other | Research on GM1 ganglioside and its neuroprotective role in Parkinson’s disease (PD), particularly in mitigating the aggregation of α-Synuclein (aSyn), is well established across various model organisms. | 38542297 | 2024 |
SNCA | astemizole | other | We hypothesized that the proposed preclinical benefits of astemizole on PD can be associated with the attenuation of pathological α-synuclein (α-syn) aggregation. | 38540224 | 2024 |
SNCA | dopamine | therapeutic | AAV-mediated overexpression of wildtype human α-synuclein in SNc DA neurons increased the levels of α-synuclein within these cells and augmented phosphorylation of α-synuclein at serine-129, which is considered a pathological feature of PD and other synucleinopathies. | 38746104 | 2024 |
SNCA | dopamine | therapeutic | In the context of Parkinson’s disease (PD), recent advancements have been made in the development of Midbrain organoids (MBOs) models that consider key pathophysiological mechanisms such as alpha-synuclein (α-Syn), Lewy bodies, dopamine loss, and microglia activation. | 38580194 | 2024 |
SNCA | dopamine | therapeutic | Parkinson’s disease (PD) is a complicated neurodegenerative disease, characterized by the accumulation of α-synuclein (α-syn) in Lewy bodies and neurites, and massive loss of midbrain dopamine neurons. | 37755674 | 2024 |
SNCA | dopamine | therapeutic | Mitochondrial oxidative stress, defects in synaptic function, and impaired lysosomal activity have been shown to be linked in PD, resulting in a pathogenic feedback cycle involving the accumulation of toxic oxidized dopamine and alpha-synuclein. | 38306948 | 2024 |
SNCA | dopamine | therapeutic | Dopamine loss and alpha-synuclein accumulation, two hallmarks of Parkinson’s disease (PD) pathology, contribute to synaptic dysfunction and reduced synaptic density in PD. | 37814917 | 2024 |
SNCA | dopamine | therapeutic | Parkinson’s disease (PD) is characterized by the progressive death of dopamine (DA) neurons and the pathological accumulation of α-synuclein (α-syn) fibrils. | 38422699 | 2024 |
SNCA | dopamine | therapeutic | Although evidence indicates that the abnormal accumulation of α-synuclein (α-syn) in dopamine neurons of the substantia nigra is the main pathological feature of Parkinson’s disease (PD), no compounds that have both α-syn antiaggregation and α-syn degradation functions have been successful in treating the disease in the clinic. | 38696266 | 2024 |
LRRK2 | levodopa | therapeutic | This case illustrates that levodopa-responsive clinical PD caused by G2019S LRRK2 mutations can occur without Lewy bodies. | 38757351 | 2024 |
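A quick way to see which chemicals dominate the evidence is to count the rows per chemical. The sketch below uses the chemical names of the ten rows shown above so that it runs offline; with live data, use results@qresult$chemical_name, possibly after removing duplicated sentences.

```r
# Chemical names copied from the ten rows of the table above
chemicals <- c("Ganglioside GM1", "astemizole", rep("dopamine", 7), "levodopa")

# Number of evidence rows per chemical, most frequent first
sort(table(chemicals), decreasing = TRUE)
```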
To visualize the results, use the plot function.
Retrieving Variant-Disease Associations from DISGENET
Searching by variant
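The variant2disease function receives a variant, or a list of variants, identified by their dbSNP identifiers. It produces a DataGeNET.DGN object with Type = "variant-disease". The example below retrieves the diseases associated with rs113488022 in CURATED data, using a score range of 0.7-1.
results <- variant2disease( variant = "rs113488022",
database = "CURATED",
score = c( 0.7,1 ) )
results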
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: CURATED
## . Score: 0.7-1
## . Term: rs113488022
## . Results: 16
The results are shown in Table 21.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to variant rs113488022")
variantid | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
rs113488022 | CRC | 0.9 | 1993 | 2024 |
rs113488022 | Melanoma | 0.9 | 2002 | 2021 |
rs113488022 | CARCINOMA OF COLON | 0.9 | 2002 | 2020 |
rs113488022 | CARCINOMA OF LUNG | 0.9 | 2002 | 2019 |
rs113488022 | Carcinoma, Non Small Cell Lung | 0.9 | 2002 | 2019 |
rs113488022 | Papillary Thyroid Carcinoma | 0.9 | 2002 | 2018 |
rs113488022 | Colorectal Neoplasm | 0.9 | 2002 | 2016 |
rs113488022 | GIST | 0.9 | 2002 | 2014 |
rs113488022 | Brain Neoplasms | 0.9 | 2011 | 2016 |
rs113488022 | COLONIC NEOPLASM | 0.9 | 2012 | 2014 |
Visualizing the diseases associated to a single variant
The disgenet2r package offers several options to visualize the results of querying DISGENET for a single variant: a Variant-Disease Network (Figure 23) showing the diseases associated with the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a Variant-Disease Class Network (Figure 24) showing the MeSH Disease Classes of the diseases associated with the variant. These graphics are obtained by changing the class argument of the plot function.
By default, the plot function produces a Variant-Disease Network from a DataGeNET.DGN object (Figure 23). In the Variant-Disease Network, the blue nodes are diseases, the yellow nodes are variants, and the width of the edges is proportional to the score of the association.
Exploring the evidence associated to a variant
You can extract the evidence associated with a particular variant using the variant2evidence function. Additionally, you can explore the evidence for a specific variant-disease pair by specifying the disease argument. The example below retrieves the evidence for rs10795668 and colorectal cancer (UMLS_C0009402).
results <- variant2evidence( variant = "rs10795668",
disease ="UMLS_C0009402",
database = "ALL",
score =c(0,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-evidence
## . Database: ALL
## . Score: 0-1
## . Term: rs10795668
## . Results: 23
The results are shown in Table 22.
results <- results@qresult
results <- results %>% dplyr::filter(reference_type =="PMID") %>% select(associationType, reference, pmYear, sentence) %>% head(10)
results <- results %>% dplyr::rename(Year=pmYear, Sentence = sentence, pmid=reference) %>% dplyr::arrange(desc(Year))
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption ="Evidences supporting the association between C0009402 & rs10795668")
associationType | pmid | Year | Sentence |
---|---|---|---|
GeneticVariation | 23875689 | 2015 | . rs4939827, rs4779584, and rs10795668 may contribute to the risk of CRC in the Korean population as well as in European populations. |
GeneticVariation | 24801760 | 2015 | The CRC SNPs accounted for 4.3% of the variation in multiple adenoma risk, with three SNPs (rs6983267, rs10795668, rs3802842) explaining 3.0% of the variation. |
GeneticVariation | 24968322 | 2014 | . rs4631962 and rs10795668 contribute to CRC risk in the Taiwanese and East Asian populations, and the newly identified rs1338565 was specifically associated with CRC, supporting the ethnic diversity of CRC-susceptibility SNPs. |
GeneticVariation | 24066093 | 2013 | We genotyped four variants previously associated with CRC: rs10795668, rs16892766, rs3802842 and rs4939827. |
GeneticVariation | 23359760 | 2012 | However, no associations with CRC risk were detected for six other loci (rs9929218, rs10411210, rs12701937, rs7014346, rs6983267, and rs10795668), and one SNP, rs16892766, was not polymorphic in any of the study participants. |
GeneticVariation | 22367214 | 2012 | We used meta-analysis of an efficient empirical-Bayes estimator to detect potential multiplicative interactions between each of the SNPs [rs16892766 at 8q23.3 (EIF3H/UTP23), rs6983267 at 8q24 (MYC), rs10795668 at 10p14 (FLJ3802842), rs3802842 at 11q23 (LOC120376), rs4444235 at 14q22.2 (BMP4), rs4779584 at 15q13 (GREM1), rs9929218 at 16q22.1 (CDH1), rs4939827 at 18q21 (SMAD7), rs10411210 at 19q13.1 (RHPN2), and rs961253 at 20p12.3 (BMP2)] and select major CRC risk factors (sex, body mass index, height, smoking status, aspirin/nonsteroidal anti-inflammatory drug use, alcohol use, and dietary intake of calcium, folate, red meat, processed meat, vegetables, fruit, and fiber). |
GeneticVariation | 22363440 | 2012 | We observed an association between the low colorectal cancer risk allele (A) for rs10795668 at 10p14 and increased expression of ATP5C1 (q = 0.024) and between the colorectal cancer high risk allele (C) for rs4444235 at 14q22.2 and increased expression of DLGAP5 (q = 0.041), both in tumor samples. |
GeneticVariation | 22235025 | 2012 | Risk allele carriers for rs3802842 [Odds ratio (OR) = 1.5, 95% confidence interval (CI) 1.1-2.05, P = 0.0096, dominant model) and rs4779584 (OR = 1.39, 95% CI 1.02-1.9, P = 0.0396, dominant model) were more frequent in the CRC<50 group, whereas homozygotes for rs10795668 risk allele were also more frequent in the early-onset CRC (P = 0.02, codominant model). |
GeneticVariation | 21314996 | 2011 | In this study, we aimed to gain insight into the molecular basis of seven low-penetrance CRC loci tagged by rs4779584 at 15q13, rs10795668 at 10p14, rs3802842 at 11q23, rs4444235 at 14q22, rs9929218 at 16q22, rs10411210 at 19q13, and rs961253 at 20p12. |
GeneticVariation | 20659471 | 2010 | In contrast, in African Americans, the opposite allele of rs10795668 at 10p14 was associated with colorectal cancer (odds ratio, 1.35; P = .04), and altogether the odds ratios were in the opposite direction for 9 of the 22 SNPs tested. |
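The results can be visualized with the plot function by setting the class argument to Points, which shows the number of publications per year associated with the variant. Set the limit parameter to 10,000 so that all the results are included in the plot. The example below retrieves all the evidence for rs1800629.
results <- variant2evidence( variant = "rs1800629",
database = "ALL",
score = c( 0,1 ) )
results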
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-evidence
## . Database: ALL
## . Score: 0-1
## . Term: rs1800629
## . Results: 1883
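To illustrate roughly what the Points plot summarizes, the base R sketch below counts distinct publications per year from an evidence table. Based on the evidence tables elsewhere in this document, it assumes that results@qresult has a reference (PMID) column and a pmYear column. The rows are the PMIDs and years listed above for rs10795668, plus one deliberately repeated row. They are not a live query.

```r
# Stand-in for results@qresult from an evidence query; only the columns
# reference (PMID) and pmYear are assumed to exist.
evidences <- data.frame(
  reference = c("22367214", "22363440", "22235025",
                "21314996", "20659471", "22363440"),
  pmYear    = c(2012, 2012, 2012, 2011, 2010, 2012),
  stringsAsFactors = FALSE
)

# A PMID can support several associations, so drop repeated
# PMID/year pairs before counting.
unique_pubs <- unique(evidences[, c("reference", "pmYear")])

# Number of distinct publications per year.
pubs_per_year <- table(unique_pubs$pmYear)
print(pubs_per_year)
```

Calling barplot(pubs_per_year) gives a quick base graphics view. This is only a stand-in for the figure produced by the package's plot function.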
Exploring the information associated with a variant
The variant2attribute function receives a variant, or a list of variants, identified by their dbSNP identifiers. It returns a DataGeNET.DGN object with attributes of the variant(s), such as the allele frequency according to gnomAD, the most severe consequence type predicted by the Variant Effect Predictor, and the DPI and DSI.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant
## . Database: ALL
## . Score:
## . Term: rs113488022
The results are shown in Table 23.
tab <- unique(results@qresult )
tab <- tab %>% dplyr::select(-threeletterID,-source, -var_gene_symbol)
knitr::kable(tab, caption = "Attributes for variant rs113488022")
variantid | ref | alt | polyphen_score | sift_score | chromosome | coord | mostSevereConsequences | geneid | geneEnsemblID | gene_symbol | dbsnpclass | variantDSI | variantDPI | exome |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs113488022 | A | C | 0.958 | 0 | 7 | 140753336 | missense_variant | 673 | ENSG00000157764 | BRAF | snv | 0.329 | 0.045 | |
rs113488022 | A | G | 0.958 | 0 | 7 | 140753336 | missense_variant | 673 | ENSG00000157764 | BRAF | snv | 0.329 | 0.045 | |
rs113488022 | A | T | 0.958 | 0 | 7 | 140753336 | missense_variant | 673 | ENSG00000157764 | BRAF | snv | 0.329 | 0.045 | 1.4e-06 |
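rs113488022 is multi-allelic, so the attribute table contains one row per alternative allele. A compact view can be useful, and the base R sketch below collapses the table to one row per variant. It uses the column names shown in the table above (variantid, ref, alt, gene_symbol), and the data frame is a hand-typed excerpt of that table rather than the output of variant2attribute.

```r
# Hand-typed excerpt of the attribute table above: one row per
# alternative allele of the same variant.
attrs <- data.frame(
  variantid   = rep("rs113488022", 3),
  ref         = rep("A", 3),
  alt         = c("C", "G", "T"),
  gene_symbol = rep("BRAF", 3),
  stringsAsFactors = FALSE
)

# Collapse the alternative alleles of each variant/ref/gene combination
# into a single comma-separated field.
collapsed <- aggregate(alt ~ variantid + ref + gene_symbol, data = attrs,
                       FUN = function(a) paste(sort(unique(a)), collapse = ","))
collapsed
```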
Searching multiple variants
The variant2disease function retrieves the information in DISGENET for a list of variants identified by their dbSNP identifiers. The source database can be specified with the database argument; by default, variant2disease uses CURATED.
results <- variant2disease(
variant = c("rs121913013", "rs1060500621",
"rs199472709", "rs72552293",
"rs74315445", "rs199472795"),
database = "ALL")
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: variant-disease
## . Database: ALL
## . Score: 0-1
## . Term: rs121913013 ... rs199472795
## . Results: 21
Table 24 shows the top 10 diseases associated with the list of variants.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top diseases associated to the list of variants")
variantid | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
rs74315445 | LONG QT SYNDROME 5 | 0.8 | 1993 | 2023 |
rs199472709 | Romano Ward Syndrome | 0.7 | 1993 | 2022 |
rs199472795 | Romano Ward Syndrome | 0.7 | 1993 | 2022 |
rs72552293 | BRUGADA SYNDROME 2 | 0.7 | 1993 | 2007 |
rs74315445 | JLNS2 | 0.7 | 1993 | 1998 |
rs199472709 | Beckwith Wiedemann Syndrome | 0.6 | 1993 | 2020 |
rs199472795 | Beckwith Wiedemann Syndrome | 0.6 | 1993 | 2020 |
rs1060500621 | Long QT Syndrome | 0.6 | 1999 | 2016 |
rs74315445 | Brugada Syndrome | 0.6 | 1993 | 2015 |
rs74315445 | Jervell Lange Nielsen Syndrome | 0.6 | 1993 | 2015 |
Visualizing the diseases associated with multiple variants
The results of querying DISGENET with a list of variants can be visualized as a Variant-Disease Network (Figure 26), as a Variant-Gene-Disease Network (Figure 27), as a Variant-Disease Heatmap (Figure 28), as a Variant-Disease Class Network (Figure 29), and as a Variant-Disease Class Heatmap (Figure 30).
To obtain the Variant-Gene-Disease Network (Figure 27), set the showGenes argument to TRUE.
The results of querying DISGENET variant information with a list of variants can be visualized as a Variant-Disease Heatmap by changing the type argument to Heatmap (Figure 28).
The results of querying DISGENET variant information with a list of variants can also be visualized as a Variant-Disease Class Network by changing the class argument to DiseaseClass (Figure 29).
The results of querying DISGENET variant information with a list of variants can also be visualized as a Variant-Disease Class Heatmap by changing the type argument to Heatmap (Figure 30).
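Conceptually, a Variant-Disease Heatmap is a variant-by-disease matrix of association scores. The base R sketch below builds such a matrix from four rows of Table 24, typed by hand. It illustrates only the data layout and is not how disgenet2r itself draws the heatmap.

```r
# Four variant-disease associations copied from Table 24.
vda <- data.frame(
  variantid    = c("rs74315445", "rs199472709", "rs199472795", "rs74315445"),
  disease_name = c("LONG QT SYNDROME 5", "Romano Ward Syndrome",
                   "Romano Ward Syndrome", "Brugada Syndrome"),
  score        = c(0.8, 0.7, 0.7, 0.6),
  stringsAsFactors = FALSE
)

# Cross-tabulate the scores into a variant x disease matrix. Pairs with no
# association are left as NA rather than 0, and max() keeps the highest
# score if a pair occurs more than once.
mat <- tapply(vda$score, list(vda$variantid, vda$disease_name), max)
mat
```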
Searching by disease
The disease2variant function retrieves the variants associated with a disease, or a list of diseases. It takes as input the disease, or list of diseases, of interest (each disease should have the format IDENT_ID, where IDENT is one of UMLS, ICD9CM, ICD10, MESH, OMIM, DO, EFO, NCI, HPO, MONDO, or ORDO) and the database (by default, CURATED). A threshold value for the score can be set, as in the gene2disease function.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-variant
## . Database: CLINVAR
## . Score: 0-1
## . Term: UMLS_C1832916
## . Results: 152
Table 25 shows the variants associated with Timothy syndrome according to the ClinVar database.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = " Variants associated to Timothy syndrome according to ClinVar")
variantid | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
rs786205753 | TIMOTHY SYNDROME | 0.8 | 1993 | 2019 |
rs79891110 | TIMOTHY SYNDROME | 0.8 | 1993 | 2018 |
rs786205745 | TIMOTHY SYNDROME | 0.8 | 1993 | 2004 |
rs786205748 | TIMOTHY SYNDROME | 0.7 | 1993 | 2020 |
rs549476254 | TIMOTHY SYNDROME | 0.7 | 1993 | 2019 |
rs374528680 | TIMOTHY SYNDROME | 0.7 | 1993 | 2015 |
rs80315385 | TIMOTHY SYNDROME | 0.7 | 1993 | 2015 |
rs797044881 | TIMOTHY SYNDROME | 0.7 | 1993 | 2015 |
rs587782933 | TIMOTHY SYNDROME | 0.7 | 1993 | 1993 |
rs369246066 | TIMOTHY SYNDROME | 0.6 | 1993 | 2020 |
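Because disease2variant expects disease identifiers of the form IDENT_ID, it can help to check them before sending a query. The helper below, is_valid_disease_id, is hypothetical and not part of disgenet2r. It only tests the prefix against the vocabularies listed above and does not check that the identifier exists in DISGENET.

```r
# Vocabularies accepted as IDENT in IDENT_ID, as listed in the text.
disease_vocabularies <- c("UMLS", "ICD9CM", "ICD10", "MESH", "OMIM", "DO",
                          "EFO", "NCI", "HPO", "MONDO", "ORDO")

# Hypothetical helper (not a disgenet2r function): TRUE when an identifier
# starts with a known vocabulary prefix followed by "_" and a non-empty ID.
is_valid_disease_id <- function(ids) {
  pattern <- paste0("^(", paste(disease_vocabularies, collapse = "|"), ")_.+$")
  grepl(pattern, ids)
}

# Prefix bare UMLS CUIs, as done with paste0("UMLS_", ...) in the
# multiple-disease example of this document.
ids <- paste0("UMLS_", c("C3150943", "C1859062", "C1832916", "C4015695"))
is_valid_disease_id(c(ids, "C1832916", "FOO_123"))
```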
The results can be further restricted to variants predicted to be deleterious by SIFT and PolyPhen, by passing ranges of these scores to the function with the sift and polyphen arguments, as in the example below. Recall that variants with SIFT scores below 0.05 are predicted to be deleterious, while variants with PolyPhen scores greater than 0.908 are classified as probably damaging.
results <- disease2variant(disease = c("UMLS_C1832916"),
database = "CLINVAR", sift = c(0,0.05), polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-variant
## . Database: CLINVAR
## . Score: 0-1
## . Term: UMLS_C1832916
## . Results: 84
Table 26 shows the deleterious variants associated with Timothy syndrome reported in the ClinVar database.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "polyphen_score", "sift_score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Deleterious variants associated to Timothy syndrome according to ClinVar")
variantid | disease_name | score | polyphen_score | sift_score | yearInitial | yearFinal |
---|---|---|---|---|---|---|
rs786205753 | TIMOTHY SYNDROME | 0.8 | 0.999 | 0.00 | 1993 | 2019 |
rs79891110 | TIMOTHY SYNDROME | 0.8 | 1.000 | 0.00 | 1993 | 2018 |
rs786205745 | TIMOTHY SYNDROME | 0.8 | 1.000 | 0.01 | 1993 | 2004 |
rs786205748 | TIMOTHY SYNDROME | 0.7 | 1.000 | 0.00 | 1993 | 2020 |
rs549476254 | TIMOTHY SYNDROME | 0.7 | 0.999 | 0.00 | 1993 | 2019 |
rs80315385 | TIMOTHY SYNDROME | 0.7 | 1.000 | 0.00 | 1993 | 2015 |
rs797044881 | TIMOTHY SYNDROME | 0.7 | 1.000 | 0.00 | 1993 | 2015 |
rs587782933 | TIMOTHY SYNDROME | 0.7 | 1.000 | 0.00 | 1993 | 1993 |
rs199473391 | TIMOTHY SYNDROME | 0.6 | 1.000 | 0.00 | 1993 | 2019 |
rs761966966 | TIMOTHY SYNDROME | 0.6 | 1.000 | 0.00 | 1993 | 2019 |
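The thresholds quoted above can also be applied locally to a table that already contains sift_score and polyphen_score columns. In the base R sketch below, the first two rows are copied from Table 26 and the last two are hypothetical variants added to show what the filter removes. Treating the sift = c(0, 0.05) and polyphen = c(0.9, 1) ranges as inclusive is an assumption about how the package interprets them.

```r
# Rows 1-2 come from Table 26; rows 3-4 are hypothetical examples.
variants <- data.frame(
  variantid      = c("rs786205753", "rs786205745",
                     "hypothetical_variant_1", "hypothetical_variant_2"),
  sift_score     = c(0.00, 0.01, 0.30, 0.02),
  polyphen_score = c(0.999, 1.000, 0.950, 0.500),
  stringsAsFactors = FALSE
)

# Label each score with the interpretation given in the text.
variants$sift_call <- ifelse(variants$sift_score < 0.05,
                             "deleterious", "tolerated")
variants$polyphen_call <- ifelse(variants$polyphen_score > 0.908,
                                 "probably damaging", "not probably damaging")

# Local equivalent of sift = c(0, 0.05), polyphen = c(0.9, 1),
# assuming closed (inclusive) ranges.
in_range <- function(x, range) x >= range[1] & x <= range[2]
keep <- in_range(variants$sift_score, c(0, 0.05)) &
        in_range(variants$polyphen_score, c(0.9, 1))
variants[keep, c("variantid", "sift_call", "polyphen_call")]
```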
Visualizing the variants associated with a single disease
The results of querying DISGENET variant information with a single disease can be visualized as a Variant-Disease Network (Figure 31).
The Variant-Disease Network can be displayed as a Variant-Disease-Gene Network by setting the showGenes parameter to TRUE (Figure 32).
Exploring the evidence associated with a single disease
To explore the evidence supporting the VDAs for Timothy syndrome, run the disease2evidence function. Use the variant argument to inspect the evidence for a particular variant and Timothy syndrome.
results <- disease2evidence( disease = "UMLS_C1832916",
type = "VDA",
database = "ALL",
score = c( 0.5,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: ALL
## . Score: 0.5-1
## . Term: UMLS_C1832916
## . Results: 238
results <- results@qresult
results <- results %>% dplyr::filter(reference_type == "PMID") %>%
dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year = pmYear, Sentence = sentence, pmid = reference)
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Evidences supporting associations")
pmid | associationType | Year | Sentence |
---|---|---|---|
39079396 | GeneticVariation | 2024 | In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R). |
39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
38826393 | GeneticVariation | 2024 | Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms. |
38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
39079396 | GeneticVariation | 2024 | In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R). |
38826393 | GeneticVariation | 2024 | Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms. |
37271119 | GeneticVariation | 2023 | Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS. |
36162529 | GeneticVariation | 2022 | The CaV1.2 G406R mutation decreases synaptic inhibition and alters L-type Ca2+ channel-dependent LTP at hippocampal synapses in a mouse model of Timothy Syndrome. |
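Several sentences in the table above are listed twice. This is presumably because each row corresponds to a separate variant-disease association and the variant column is not displayed. To list each sentence only once, the rows can be deduplicated on PMID and sentence. The base R sketch below uses shortened placeholder sentences. With dplyr, dplyr::distinct(results, pmid, Sentence, .keep_all = TRUE) should behave similarly.

```r
# Placeholder evidence rows mirroring the PMIDs above; the sentence text
# is abbreviated to "sentence A/B/C".
evid <- data.frame(
  pmid     = c("39079396", "39420001", "39420001", "38968219", "38968219"),
  Year     = c(2024, 2024, 2024, 2024, 2024),
  Sentence = c("sentence A", "sentence B", "sentence B",
               "sentence C", "sentence C"),
  stringsAsFactors = FALSE
)

# Keep only the first row for each PMID/sentence pair.
dedup <- evid[!duplicated(evid[, c("pmid", "Sentence")]), ]
dedup
```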
To inspect the evidence for Timothy syndrome and all the variants in a particular gene, use the gene argument.
results <- disease2evidence( disease = "UMLS_C1832916",
gene = "775", vocabulary = "ENTREZ",
type = "VDA", database = "TEXTMINING_HUMAN",
score = c( 0.7,1 ) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-evidence
## . Database: TEXTMINING_HUMAN
## . Score: 0.7-1
## . Term: UMLS_C1832916
## . Results: 26
results <- results@qresult
results <- results %>% dplyr::filter(reference_type == "PMID") %>%
dplyr::select(reference, associationType, pmYear, sentence) %>% dplyr::arrange(desc(pmYear)) %>% head(10)
results <- results %>% dplyr::rename(Year = pmYear, Sentence = sentence, pmid = reference)
results %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = FALSE, caption = "Selection of evidences supporting associations between UMLS_C1832916 (Timothy syndrome) & CACNA1C")
pmid | associationType | Year | Sentence |
---|---|---|---|
39079396 | GeneticVariation | 2024 | In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R). |
39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
39420001 | GeneticVariation | 2024 | The canonical G406R mutation that increases Ca2+ influx through the CACNA1C-encoded CaV1.2 Ca2+ channel underlies the multisystem disorder Timothy syndrome (TS), characterized by life-threatening arrhythmias. |
38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
38826393 | GeneticVariation | 2024 | Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms. |
38968219 | GeneticVariation | 2024 | Long QT Syndrome type 8 (LQT8) is a cardiac arrhythmic disorder associated with Timothy Syndrome, stemming from mutations in the CACNA1C gene, particularly the G406R mutation. |
39079396 | GeneticVariation | 2024 | In this study, we generated a human induced pluripotent stem cell (iPSC) line from a Timothy syndrome infant carrying heterozygous CACNA1C mutation (transcript variant NM_000719.7c.1216G>A: p.G406R). |
38826393 | GeneticVariation | 2024 | Furthermore, it has remained underexplored whether individuals harboring canonical Gly406Arg variants in mutually exclusive exon 8A (Timothy syndrome 1) or exon 8 (Timothy syndrome 2) have additional symptoms. |
37271119 | GeneticVariation | 2023 | Some CACNA1C mutations, such as R858H described here, cause LQTS without the extracardiac manifestations observed in classic Timothy syndrome and should be included in the genetic testing for LQTS. |
36162529 | GeneticVariation | 2022 | The CaV1.2 G406R mutation decreases synaptic inhibition and alters L-type Ca2+ channel-dependent LTP at hippocampal synapses in a mouse model of Timothy Syndrome. |
Searching multiple diseases
results <- disease2variant(
disease = paste0("UMLS_",c("C3150943", "C1859062", "C1832916", "C4015695")),
database = "CURATED",
score = c(0.7, 1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-variant
## . Database: CURATED
## . Score: 0.7-1
## . Term: UMLS_C3150943 ... UMLS_C4015695
## . Results: 155
Table 29 shows the variants associated with a list of Long QT syndromes in the curated data in DISGENET.
tab <- unique(results@qresult[ ,c("variantid", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Variants associated to a list of Long QT syndromes")
variantid | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|
rs137854600 | LONG QT SYNDROME 3 | 0.9 | 1993 | 2022 |
rs199473428 | LONG QT SYNDROME 2 | 0.8 | 1993 | 2022 |
rs199473524 | LONG QT SYNDROME 2 | 0.8 | 1993 | 2022 |
rs199472961 | LONG QT SYNDROME 2 | 0.8 | 1993 | 2022 |
rs9333649 | LONG QT SYNDROME 2 | 0.8 | 1993 | 2022 |
rs137854601 | LONG QT SYNDROME 3 | 0.8 | 1993 | 2022 |
rs786205753 | TIMOTHY SYNDROME | 0.8 | 1993 | 2019 |
rs79891110 | TIMOTHY SYNDROME | 0.8 | 1993 | 2018 |
rs786205745 | TIMOTHY SYNDROME | 0.8 | 1993 | 2004 |
rs199473317 | LONG QT SYNDROME 3 | 0.8 | | |
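To see how the variants in Table 29 are distributed across the queried diseases, the result can be summarized per disease. The base R sketch below uses the ten rows of Table 29, typed by hand, in place of results@qresult.

```r
# The ten rows of Table 29 (variant, disease, score).
lqt <- data.frame(
  variantid    = c("rs137854600", "rs199473428", "rs199473524", "rs199472961",
                   "rs9333649", "rs137854601", "rs786205753", "rs79891110",
                   "rs786205745", "rs199473317"),
  disease_name = c("LONG QT SYNDROME 3", rep("LONG QT SYNDROME 2", 4),
                   "LONG QT SYNDROME 3", rep("TIMOTHY SYNDROME", 3),
                   "LONG QT SYNDROME 3"),
  score        = c(0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8),
  stringsAsFactors = FALSE
)

# Number of distinct variants per disease.
n_variants <- tapply(lqt$variantid, lqt$disease_name,
                     function(v) length(unique(v)))

# Highest association score observed per disease.
max_score <- tapply(lqt$score, lqt$disease_name, max)

cbind(n_variants, max_score)
```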
Visualizing the variants associated with multiple diseases
The results of querying DISGENET variant information with a list of diseases can be visualized as a Variant-Disease Network, or as a Variant-Disease Heatmap (Figure 33), by changing the class argument from "Network" to "Heatmap".
The results can be visualized as a Heatmap (Figure 34).
Searching by gene
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: CURATED
## . Score: 0-1
## . Term: APP
## . Results: 17
Table 30 shows the top variants associated with the APP gene in the curated data in DISGENET.
tab <- unique(results@qresult[ ,c("variantid", "gene_symbols", "disease_name","score", "yearInitial", "yearFinal")] ) %>% dplyr::arrange(desc(score), yearInitial, desc(yearFinal))
knitr::kable(tab[1:10,], caption = "Top variants associated to APP")
variantid | gene_symbols | disease_name | score | yearInitial | yearFinal |
---|---|---|---|---|---|
rs63750264 | APP | Alzheimer Disease | 0.9 | 1991 | 2020 |
rs63750579 | APP | Alzheimer Disease | 0.8 | 1990 | 2020 |
rs63750066 | APP | Alzheimer Disease | 0.8 | 1992 | 2020 |
rs193922916 | APP | Alzheimer Disease | 0.8 | 1993 | 2020 |
rs63750734 | APP | Alzheimer Disease | 0.8 | 1993 | 2020 |
rs63750579 | APP | CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED | 0.7 | 1990 | 2019 |
rs63750264 | APP | AD1 | 0.7 | 1991 | 2020 |
rs63749964 | APP | AD1 | 0.7 | 1991 | 2020 |
rs63751039 | APP | AD1 | 0.7 | 1992 | 2020 |
rs63750671 | APP | AD1 | 0.7 | 1992 | 2020 |
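A variant can be associated with several diseases. For example, rs63750264 appears with both Alzheimer Disease and AD1 in Table 30. The base R sketch below keeps, for each variant, the association with the highest score. The rows are copied from Table 30 rather than retrieved live, and ties are resolved by keeping the first row.

```r
# Five rows copied from Table 30.
app <- data.frame(
  variantid    = c("rs63750264", "rs63750579", "rs63750066",
                   "rs63750579", "rs63750264"),
  disease_name = c("Alzheimer Disease", "Alzheimer Disease", "Alzheimer Disease",
                   "CEREBRAL AMYLOID ANGIOPATHY, APP-RELATED", "AD1"),
  score        = c(0.9, 0.8, 0.8, 0.7, 0.7),
  stringsAsFactors = FALSE
)

# Split by variant, keep the highest-scoring row of each group
# (which.max returns the first maximum), then stack the groups again.
top_per_variant <- do.call(rbind, lapply(split(app, app$variantid),
                                         function(d) d[which.max(d$score), ]))
rownames(top_per_variant) <- NULL
top_per_variant
```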
Visualizing the variant-disease associations retrieved for a gene
The results of querying DISGENET variant information with a gene can be visualized as a Variant-Disease Network or, if the input is a list of genes, as a Variant-Disease Heatmap (Figure 35) by changing the class argument from "Network" to "Heatmap". The genes can be shown by setting the showGenes argument to TRUE.
Searching by variant and chemical
results <- variant2disease( variant = "rs121434568",
database = "TEXTMINING_HUMAN",
chemical = "C2987648")
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-disease
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: rs121434568
## . Results: 13
Table 31 shows the VDAs associated with rs121434568 and afatinib.
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, disease_name, chemical_name, score) %>% dplyr::arrange(desc(score))
knitr::kable(tab[1:10,], caption = "VDAs associated to rs121434568 and afatinib")
variantid | disease_name | chemical_name | score |
---|---|---|---|
rs121434568 | CARCINOMA OF LUNG | afatinib | 0.9 |
rs121434568 | Carcinoma, Non Small Cell Lung | afatinib | 0.9 |
rs121434568 | Lung adenocarcinoma | afatinib | 0.9 |
rs121434568 | Cancer, Lung | afatinib | 0.3 |
rs121434568 | Lung Neoplasm | afatinib | 0.3 |
rs121434568 | Advanced Lung Adenocarcinoma | afatinib | 0.3 |
rs121434568 | Metastatic Neoplasm to the Brain | afatinib | 0.3 |
rs121434568 | Stage IV Non-Oat Cell Carcinoma of the Lung | afatinib | 0.2 |
rs121434568 | Metastatic Neoplasm to the Leptomeninges | afatinib | 0.2 |
rs121434568 | Metastatic Lung Adenocarcinoma | afatinib | 0.2 |
To visualize the results, use the plot function.
Retrieving the chemicals associated with a variant
The variant2chemical function retrieves the chemicals associated with a variant.
results <- variant2chemical( variant = "rs1801133",
database = "TEXTMINING_HUMAN" , score = c(0.8,1))
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: variant-chemical
## . Database: TEXTMINING_HUMAN
## . Score: 0.8-1
## . Term: rs1801133
## . Results: 5
tab <- results@qresult
tab <- tab %>% dplyr::select(disease_name, chemical_name, chemical_effect, sentence, reference, pmYear)
tab <- tab %>% dplyr::rename( Disease = disease_name, Chemical = chemical_name,
`Chemical Effect`=chemical_effect, Year=pmYear, Sentence = sentence, pmid = reference) %>% dplyr::arrange(desc(Year))
tab %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Chemicals associated to rs1801133" )
Disease | Chemical | Chemical Effect | Sentence | pmid | Year |
---|---|---|---|---|---|
Multiple Sclerosis | vitamin B12 | other|other|other | The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. | 38648773 | 2024 |
Multiple Sclerosis | pyridoxine | other|other|other | The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. | 38648773 | 2024 |
Multiple Sclerosis | vitamin B6 | other|other|other | The contents of homocysteine (HCy), cyanocobalamin (vitamin B12), folic acid (vitamin B9), and pyridoxine (vitamin B6) were analyzed and the genotypes of the main gene polymorphisms associated with folate metabolism (C677T and A1298C of the MTHFR gene, A2756G of the MTR gene and A66G of the MTRR gene) were determined in children at the onset of multiple sclerosis (MS) (with disease duration of no more than six months), healthy children under 18 years (control group), healthy adults without neurological pathology, adult patients with MS at the onset of disease, and adult patients with long-term MS. | 38648773 | 2024 |
Schizophrenias | risperidone | therapeutic | C677T Polymorphism in the MTHFR Gene Is Associated With Risperidone-Induced Weight Gain in Schizophrenia. | 32714219 | 2020 |
Schizophrenias | dopamine | other | A second polymorphism, methylenetetrahydrofolate reductase (MTHFR) 677C –> T (rs1801133), has been associated with overall schizophrenia risk and executive function impairment in patients, and may influence dopamine signaling through mechanisms upstream of COMT effects. | 18988738 | 2008 |
To visualize the results, use the plot function.
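In the table above, the chemical_effect field can hold several values separated by "|" (for example, other|other|other), presumably one per underlying piece of evidence. The base R sketch below reduces each entry to its distinct labels. The input vector copies the five chemical_effect values shown in the table.

```r
# chemical_effect values as they appear in the table above.
effects <- c("other|other|other", "other|other|other", "other|other|other",
             "therapeutic", "other")

# Split on the literal "|" (fixed = TRUE avoids regex interpretation)
# and keep the distinct labels of each entry.
effect_sets <- lapply(strsplit(effects, "|", fixed = TRUE), unique)

# Collapse back to one comma-separated string per row.
clean_effects <- vapply(effect_sets, paste, character(1), collapse = ",")
clean_effects
```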
Retrieving associations involving chemicals from DISGENET
Retrieving genes, variants, and diseases associated with chemicals
The chemical2gene function retrieves the genes associated with a specific chemical, or a list of chemicals, based on GDAs.
## Notice that your query has a maximum of 17 pages.
## By indicating n_pags = 5, your query of 17 pages has been reduced to 5 pages.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gene
## . Database: ALL
## . Score: 0-1
## . Term: C0023570
## . Results: 91
tab <- results@qresult
tab <- tab %>% dplyr::select(gene_symbol, gene_type, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Genes associated to levodopa")
gene_symbol | gene_type | chemical_name | pmids_chemical |
---|---|---|---|
COMT | protein-coding | levodopa | 45 |
DDC | protein-coding | levodopa | 31 |
GCH1 | protein-coding | levodopa | 20 |
SLC6A3 | protein-coding | levodopa | 20 |
GH1 | protein-coding | levodopa | 18 |
MAOB | protein-coding | levodopa | 18 |
DRD2 | protein-coding | levodopa | 17 |
PRKN | protein-coding | levodopa | 15 |
TH | protein-coding | levodopa | 13 |
SNCA | protein-coding | levodopa | 12 |
The results can be visualized as a Chemical-Gene Network (Figure 38).
The chemical2disease function retrieves the diseases associated with a specific chemical, or a list of chemicals. The information can be extracted from GDAs or VDAs; use the type parameter to choose between them.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-disease
## . Database: CURATED
## . Score: 0-1
## . Term: C0023570
## . Results: 45
tab <- results@qresult
tab <- tab %>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "Diseases associated to levodopa, type GDA", align= "lllc")
diseaseid | disease_name | chemical_name | pmids_chemical |
---|---|---|---|
C0013386 | Drug-Induced Dyskinesia | levodopa | 12 |
C0013386 | Drug-Induced Dyskinesia | levodopa | 12 |
C0268467 | GTP cyclohydrolase I deficiency (disorder) | levodopa | 7 |
C0268467 | GTP cyclohydrolase I deficiency (disorder) | levodopa | 7 |
C1851920 | DRD | levodopa | 6 |
C0013421 | Dystonia | levodopa | 3 |
C0013384 | Dyskinesia | levodopa | 2 |
C0026650 | Movement Disorders | levodopa | 2 |
C0030567 | Parkinson Disease | levodopa | 2 |
C0393593 | Dystonia | levodopa | 2 |
A DiseaseClass plot is also available.
The diseases can also be retrieved from VDAs, for example for imiquimod:
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-disease
## . Database: ALL
## . Score: 0-1
## . Term: C0165032
## . Results: 5
tab <- results@qresult
tab <- tab %>% dplyr::select(diseaseid, disease_name, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab, caption = "Diseases associated to imiquimod, type VDA", align= "lllc")
diseaseid | disease_name | chemical_name | pmids_chemical |
---|---|---|---|
C4721806 | Basal cell carcinoma | imiquimod | 2 |
C0025202 | Melanoma | imiquimod | 1 |
C0151779 | Malignant melanoma of skin | imiquimod | 1 |
C0524910 | Chronic viral hepatitis C | imiquimod | 1 |
C0596263 | carcinogenesis | imiquimod | 1 |
The chemical2variant function retrieves the variants associated with a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-variant
## . Database: ALL
## . Score: 0-1
## . Term: C0006949
## . Results: 53
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, gene_symbols, most_severe_consequence, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")
variantid | gene_symbols | most_severe_consequence | chemical_name | pmids_chemical |
---|---|---|---|---|
rs3812718 | SCN1A | splice_donor_5th_base_variant | carbamazepine | 9 |
rs1061235 | HLA-A , LOC124901298 | 3_prime_UTR_variant | carbamazepine | 6 |
rs776746 | ZSCAN25, CYP3A5 | splice_acceptor_variant | carbamazepine | 6 |
rs1045642 | ABCB1 | missense_variant | carbamazepine | 5 |
rs1801133 | MTHFR | missense_variant | carbamazepine | 4 |
rs2298771 | SCN1A , LOC102724058 | missense_variant | carbamazepine | 4 |
rs2032582 | ABCB1 | missense_variant | carbamazepine | 3 |
rs1051740 | EPHX1 | missense_variant | carbamazepine | 2 |
rs1057910 | CYP2C9 | missense_variant | carbamazepine | 2 |
rs1389503611 | EPHX1 | missense_variant | carbamazepine | 2 |
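The gene_symbols column can list more than one gene per variant, with inconsistent spacing around the commas (for example, "HLA-A , LOC124901298"). For per-gene summaries, the table can be expanded to one row per variant-gene pair. The base R sketch below uses three rows copied from the table above.

```r
# Three rows copied from the carbamazepine table above.
vars <- data.frame(
  variantid    = c("rs3812718", "rs1061235", "rs776746"),
  gene_symbols = c("SCN1A", "HLA-A , LOC124901298", "ZSCAN25, CYP3A5"),
  stringsAsFactors = FALSE
)

# Split on commas, absorbing any surrounding whitespace.
genes <- strsplit(vars$gene_symbols, "\\s*,\\s*")

# Repeat each variant once per gene it maps to (long format).
long <- data.frame(
  variantid   = rep(vars$variantid, lengths(genes)),
  gene_symbol = unlist(genes),
  stringsAsFactors = FALSE
)
long
```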
The chemical2variant function also accepts sift and polyphen score ranges to restrict the results to variants predicted to be deleterious.
results <- chemical2variant( chemical = "C0006949", database = "ALL", sift = c(0,0.05), polyphen = c(0.9,1) )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-variant
## . Database: ALL
## . Score: 0-1
## . Term: C0006949
## . Results: 14
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, gene_symbols, sift_score, polyphen_score, chemical_name, pmids_chemical) %>% dplyr::arrange(desc(pmids_chemical))
knitr::kable(tab[1:10,], caption = "VDAs for carbamazepine", align= "llllc")
variantid | gene_symbols | sift_score | polyphen_score | chemical_name | pmids_chemical |
---|---|---|---|---|---|
rs1045642 | ABCB1 | 0.02 | 0.998 | carbamazepine | 5 |
rs1051740 | EPHX1 | 0.00 | 0.987 | carbamazepine | 2 |
rs1389503611 | EPHX1 | 0.01 | 0.995 | carbamazepine | 2 |
rs762468188 | TMEM63A, EPHX1 | 0.00 | 1.000 | carbamazepine | 2 |
rs118192218 | KCNQ2 , LOC105372721 | 0.01 | 0.999 | carbamazepine | 1 |
rs121912438 | SOD1 | 0.00 | 0.967 | carbamazepine | 1 |
rs140908982 | GRIA3 | 0.00 | 0.996 | carbamazepine | 1 |
rs1553491169 | SCN1A-AS1, SCN9A | 0.00 | 0.956 | carbamazepine | 1 |
rs1555085798 | KCNA1 | 0.00 | 1.000 | carbamazepine | 1 |
rs201682634 | ABCC8 , LOC124902641 | 0.00 | 1.000 | carbamazepine | 1 |
Retrieving GDAs and VDAs associated to chemicals
Exploring the GDAs of a chemical
The chemical2gda function retrieves the GDAs for a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: ALL
## . Score: 0-1
## . Term: C0074393
## . Results: 151
tab <- results@qresult
tab <- tab %>% dplyr::select(gene_symbol, disease_name, chemical_name, score, pmids_chemical)
knitr::kable(tab[1:10,], caption = "GDAs for sertraline")
gene_symbol | disease_name | chemical_name | score | pmids_chemical |
---|---|---|---|---|
BDNF | Chorea, Huntington | sertraline | 1.00 | 9 |
NR3C1 | Depressive neurosis | sertraline | 1.00 | 18 |
HTT | Chorea, Huntington | sertraline | 1.00 | 44 |
ZBTB20 | Primrose syndrome | sertraline | 1.00 | 2 |
BDNF | Depression | sertraline | 1.00 | 98 |
SLC6A4 | Depression | sertraline | 1.00 | 68 |
SLC6A4 | Depressive neurosis | sertraline | 1.00 | 73 |
IL6 | Depression | sertraline | 1.00 | 30 |
BCHE | Alzheimer Disease | sertraline | 1.00 | 158 |
IL6 | Depressive neurosis | sertraline | 0.95 | 30 |
To visualize the results, use the plot function.
Exploring the VDAs of a chemical
The chemical2vda function retrieves the VDAs for a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: CURATED
## . Score: 0-1
## . Term: C3264621
## . Results: 209
The chemical2vda function also accepts sift and polyphen score ranges to restrict the results to variants predicted to be deleterious.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: CURATED
## . Score: 0-1
## . Term: C3264621
## . Results: 96
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, disease_name, chemical_name, score, pmids_chemical)
knitr::kable(tab[1:10,], caption = "VDAs associated with ivacaftor")
variantid | disease_name | chemical_name | score | pmids_chemical |
---|---|---|---|---|
rs75527207 | Cystic Fibrosis | ivacaftor | 1.0 | 3 |
rs78655421 | Cystic Fibrosis | ivacaftor | 1.0 | 2 |
rs80034486 | Cystic Fibrosis | ivacaftor | 1.0 | 1 |
rs121908758 | Cystic Fibrosis | ivacaftor | 0.9 | 1 |
rs368505753 | Cystic Fibrosis | ivacaftor | 0.9 | 1 |
rs77834169 | Cystic Fibrosis | ivacaftor | 0.9 | 2 |
rs121909047 | Cystic Fibrosis | ivacaftor | 0.9 | 2 |
rs75961395 | Cystic Fibrosis | ivacaftor | 0.9 | 2 |
rs121908752 | Cystic Fibrosis | ivacaftor | 0.9 | 1 |
rs77010898 | Cystic Fibrosis | ivacaftor | 0.9 | 1 |
To visualize the results, use the plot function.
Exploring the GDA evidence of a chemical
The chemical2evidence function retrieves the evidence supporting the GDAs or VDAs for a specific chemical, or a list of chemicals.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-gda
## . Database: CURATED
## . Score: 0-1
## . Term: C0023570
## . Results: 112
tab <- results@qresult
tab <- tab %>% dplyr::select(gene_symbol, disease_name, chemical_name, sentence, chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Gene = gene_symbol, Disease = disease_name, Chemical = chemical_name, `Chemical Effect` = chemical_effect, Year = pmYear, Sentence = sentence, pmid = reference)
tab <- tab[ order(-tab$Year),]
tab[1:10, ] %>% dplyr::mutate(
pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid) )) %>%
knitr::kable(format = 'markdown', row.names = F, caption = "Evidences for levodopa" )
Gene | Disease | Chemical | Sentence | Chemical Effect | pmid | Year |
---|---|---|---|---|---|---|
PNPLA6 | SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE | levodopa | PNPLA6-Related Disorder with Levodopa-Responsive Parkinsonism. | other | 36825042 | 2023 |
PNPLA6 | SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE | levodopa | PNPLA6-Related Disorder with Levodopa-Responsive Parkinsonism. | other | 36825042 | 2023 |
CLN6 | CEROID LIPOFUSCINOSIS, NEURONAL, 6B (KUFS TYPE) | levodopa | Pearls & Oy-sters: Levodopa-Responsive Adult NCL (Type B Kufs Disease) Due to CLN6 Mutation. | other | 33875558 | 2021 |
GCH1 | DRD | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
GCH1 | DRD | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
GCH1 | GTP cyclohydrolase I deficiency (disorder) | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
GCH1 | GTP cyclohydrolase I deficiency (disorder) | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
LOC130055692 | DRD | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
LOC130055692 | GTP cyclohydrolase I deficiency (disorder) | levodopa | Residual signs of dopa-responsive dystonia with GCH1 mutation following levodopa treatment are uncommon in Korean patients. | other | 31213404 | 2019 |
SPG7 | SPASTIC PARAPLEGIA 7, AUTOSOMAL RECESSIVE | levodopa | SPG7 with parkinsonism responsive to levodopa and dopaminergic deficit. | other | 29246844 | 2018 |
To visualize the results, use the plot function.
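Because only some columns are kept, a few rows of the evidence table above are exact duplicates (for example, the two PNPLA6 rows for PMID 36825042); presumably they differ only in columns that were dropped. If only distinct evidence is needed, it can be collapsed before further analysis. A minimal base R sketch, using a few rows copied from the table (only four of its columns, for brevity), could look like this:

```r
# A few rows copied from the levodopa evidence table (subset of columns)
evidence <- data.frame(
  Gene    = c("PNPLA6", "PNPLA6", "GCH1", "GCH1"),
  Disease = c("SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE",
              "SPASTIC PARAPLEGIA 39, AUTOSOMAL RECESSIVE",
              "DRD",
              "GTP cyclohydrolase I deficiency (disorder)"),
  pmid    = c(36825042, 36825042, 31213404, 31213404),
  Year    = c(2023, 2023, 2019, 2019),
  stringsAsFactors = FALSE
)

# Drop rows that are identical across all retained columns
distinct_evidence <- unique(evidence)
nrow(distinct_evidence)

# Number of distinct publications supporting each gene
counts <- tapply(distinct_evidence$pmid, distinct_evidence$Gene,
                 function(x) length(unique(x)))
counts
```

dplyr::distinct() does the same job as unique() for data frames if you prefer to stay within a dplyr pipeline.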
Exploring the VDA evidences of a chemical
results <- chemical2evidence( chemical = "C0042291", type = "VDA" , database = "TEXTMINING_HUMAN" )
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical-vda
## . Database: TEXTMINING_HUMAN
## . Score: 0-1
## . Term: C0042291
## . Results: 220
tab <- results@qresult
tab <- tab %>% dplyr::select(variantid, disease_name, chemical_name, sentence, chemical_effect, reference, pmYear)
tab <- tab %>% dplyr::rename(Disease = disease_name, Chemical = chemical_name,
  `Chemical Effect` = chemical_effect, Year = pmYear, Sentence = sentence, pmid = reference)
# sort the evidence from the most recent to the oldest publication
tab <- tab[order(-tab$Year), ]
tab[1:10, ] %>% dplyr::mutate(
  pmid = kableExtra::cell_spec(pmid, link = paste0("https://pubmed.ncbi.nlm.nih.gov/", pmid))) %>%
  knitr::kable(format = "markdown", row.names = FALSE, caption = "Evidences for valproic acid")
variantid | Disease | Chemical | Sentence | Chemical Effect | pmid | Year |
---|---|---|---|---|---|---|
rs3812718 | Epilepsies | valproic acid | Five single nucleotide polymorphisms (SNPs), including SCN1A (rs10188577, rs2298771, rs3812718) and SCN2A (rs2304016, rs17183814), were genotyped in 233 epilepsy patients undergoing VPA therapy. | therapeutic | 38837984 | 2024 |
rs776746 | Epilepsies | valproic acid | Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. | therapeutic|therapeutic | 38381330 | 2024 |
rs2298771 | Epilepsies | valproic acid | Our study suggests the findings of this investigation indicate that the polymorphisms SCN1A rs2298771 and SCN2A rs17183814 could potentially act as predictive biomarkers for the responsiveness to VPA among Chinese epilepsy patients. | therapeutic | 38837984 | 2024 |
rs55965422 | Epilepsies | valproic acid | Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. | therapeutic|therapeutic | 38381330 | 2024 |
rs2304016 | Epilepsies | valproic acid | Five single nucleotide polymorphisms (SNPs), including SCN1A (rs10188577, rs2298771, rs3812718) and SCN2A (rs2304016, rs17183814), were genotyped in 233 epilepsy patients undergoing VPA therapy. | therapeutic | 38837984 | 2024 |
rs1401813450 | Epilepsies | valproic acid | Patients with epilepsy carrying the UGT1A6 A541G mutant genotype may have VPA-induced tremors, and early detection of this genotype will help guide the clinical individualizsation of VPA treatment. | therapeutic | 38908142 | 2024 |
rs1458644938 | Epilepsies | valproic acid | Patients with epilepsy carrying the UGT1A6 A541G mutant genotype may have VPA-induced tremors, and early detection of this genotype will help guide the clinical individualizsation of VPA treatment. | therapeutic | 38908142 | 2024 |
rs28383479 | Epilepsies | valproic acid | Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. | therapeutic|therapeutic | 38381330 | 2024 |
rs56411402 | Epilepsies | valproic acid | Age younger than 4 years, comedication with enzyme inducers or valproic acid, and possession of the CYP3A5*3 genotype potentially predicted PER exposure in pediatric patients with epilepsy. | therapeutic|therapeutic | 38381330 | 2024 |
rs17183814 | Epilepsies | valproic acid | Our study suggests the findings of this investigation indicate that the polymorphisms SCN1A rs2298771 and SCN2A rs17183814 could potentially act as predictive biomarkers for the responsiveness to VPA among Chinese epilepsy patients. | therapeutic | 38837984 | 2024 |
To visualize the results, use the plot function.
Exploring the attributes of a chemical
The chemical2attribute function retrieves the attributes of a specific chemical, or list of chemicals, such as the number of publications (numPmids), GDAs (numGDAs) and VDAs (numVDAs) in which it is involved.
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: chemical
## . Database: ALL
## . Score:
## . Term: C0023570
## . Results: 1
chemicalid | chemical_name | numPmids | numGDAs | numVDAs |
---|---|---|---|---|
C0023570 | levodopa | 690 | 1042 | 174 |
Retrieving Disease-Disease Associations from DISGENET
The disgenet2r package can also retrieve the list of diseases that share genes or variants with a particular disease, or list of diseases (disease-disease associations, or DDAs).
Searching DDAs by genes for a single disease
To obtain disease-disease associations, use the disease2disease function. This function takes as input a disease, in the same format as in disease2gene; the database on which to perform the search (by default, CURATED); and the argument relationship, which indicates the type of relationship between the diseases in each pair. If relationship is set to “has_shared_genes”, arguments such as min_genes, the minimum number of genes shared with the disease(s) of interest, and jg, the Jaccard Index for genes, can be defined. By default, min_genes = 0. If relationship is set to “has_shared_variants”, similar arguments can be defined to filter the results of the search.
The output is a DataGeNET.DGN object that contains the top diseases that share genes with the disease that has been searched.
The DataGeNET.DGN object produced by the disease2disease function also contains the Jaccard Index, also known as the Jaccard similarity coefficient, for each disease pair. The Jaccard coefficient is a similarity metric computed as the size of the intersection divided by the size of the union of two sample sets, in this case the sets of genes associated with each disease:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where A and B are the sets of genes associated with the two diseases of the pair.
A p-value is calculated to estimate the significance of the Jaccard coefficient for each disease pair, using a Fisher exact test. The pvalue column displays the negative logarithm of this p-value, and is available for disease-disease associations by shared genes and by shared variants.
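The Jaccard Index and its Fisher test can be reproduced in base R for any pair of gene sets. The sketch below is illustrative only: the gene sets and the universe size are hypothetical, and the layout of the contingency table, the one-sided alternative and the base-10 logarithm are assumptions rather than DISGENET's documented procedure.

```r
# Hypothetical gene sets for two diseases (illustrative, not DISGENET data)
genes_d1 <- c("CFTR", "SCNN1A", "SCNN1B", "TGFB1", "MBL2")
genes_d2 <- c("CFTR", "SERPINA1", "TGFB1", "MMP12")

# Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# One-sided Fisher exact test for the overlap of two gene sets drawn
# from a universe of `universe_size` genes
overlap_pvalue <- function(a, b, universe_size) {
  shared  <- length(intersect(a, b))
  only_a  <- length(setdiff(a, b))
  only_b  <- length(setdiff(b, a))
  neither <- universe_size - shared - only_a - only_b
  # rows: in a / not in a; columns: in b / not in b
  tab <- matrix(c(shared, only_a,
                  only_b, neither),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}

j <- jaccard(genes_d1, genes_d2)  # 2 shared genes out of 7 distinct genes
p <- overlap_pvalue(genes_d1, genes_d2, universe_size = 20000)
-log10(p)  # negative logarithm (base 10 assumed)
```

Note that the Jaccard Index alone ignores how large the universe is, whereas the Fisher test accounts for it: the same overlap is more surprising among 20000 genes than among 200.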
results <- disease2disease(
  disease = "UMLS_C0010674", relationship = "has_shared_genes",
  database = "CURATED", min_genes = 2)
results
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-disease-gene
## . Database: CURATED
## . Score:
## . Term: UMLS_C0010674
## . Results: 11
Table 43 shows the diseases that share at least two genes with Cystic Fibrosis (UMLS_C0010674) in DISGENET CURATED.
tab <- unique(results@qresult[ ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10,], caption = "Diseases that share genes with Cystic Fibrosis")
disease1_Name | disease2_Name | jaccard_genes | shared_genes | pvalue_jaccard_genes |
---|---|---|---|---|
Cystic Fibrosis | COPD | 0.11724 | 17 | 22.4 |
Cystic Fibrosis | BESC1 | 0.13793 | 8 | 19.2 |
Cystic Fibrosis | SYSTEMIC LUPUS ERYTHEMATOSIS | 0.08589 | 14 | 16.3 |
Cystic Fibrosis | CBAVD | 0.11864 | 7 | 15.8 |
Cystic Fibrosis | Hereditary pancreatitis | 0.12308 | 8 | 15.4 |
Cystic Fibrosis | High blood pressure | 0.04971 | 17 | 14.4 |
Cystic Fibrosis | Alzheimer Disease | 0.05534 | 14 | 12.8 |
Cystic Fibrosis | Adult-Onset Diabetes Mellitus | 0.04043 | 15 | 11.3 |
Cystic Fibrosis | Obstructive azoospermia | 0.05085 | 3 | 6.5 |
Cystic Fibrosis | Cardiomyopathy | 0.02952 | 8 | 5.4 |
Visualizing the diseases associated with a single disease
The plot function applied to the DataGeNET.DGN object generated by the disease2disease function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and the nodes in light blue are the diseases that share genes with it (Figure 47). The node size is proportional to the number of genes associated with each disease.
Searching DDAs via genes for multiple diseases
The disease2disease function can also take as input a list of diseases in any of the previously described vocabularies. It retrieves the top diseases that share genes with each of the diseases in the input list.
Table 44 shows the list of diseases selected to illustrate the disease2disease function.
UMLS_CUI | Disease_Name |
---|---|
C0162671 | MELAS Syndrome |
C0023264 | Leigh Disease |
C0917796 | Optic Atrophy, Hereditary, Leber |
diseasesOfInterest <- paste0("UMLS_", c("C0162671", "C0023264", "C0917796"))
results <- disease2disease(
disease = diseasesOfInterest, relationship = "has_shared_genes",
database = "CURATED",
min_genes = 20,
order_by = "JACCARD_GENES" )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-disease-gene
## . Database: CURATED
## . Score:
## . Term: UMLS_C0162671 ... UMLS_C0917796
## . Results: 35
Table 45 shows the diseases that share at least 20 genes with the diseases of interest.
tab <- unique(results@qresult[ ,c("disease1_Name", "disease2_Name","jaccard_genes","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab[1:10, ], caption = "Diseases that share at least 20 genes with the diseases of interest")
disease1_Name | disease2_Name | jaccard_genes | shared_genes | pvalue_jaccard_genes |
---|---|---|---|---|
Leber’s optic atrophy | MELAS Syndrome | 0.62963 | 34 | 84 |
MELAS Syndrome | Leber’s optic atrophy | 0.62963 | 34 | 84 |
Encephalomyelopathies, Subacute Necrotizing | Mitochondrial Diseases | 0.23741 | 66 | 83 |
MELAS Syndrome | Mitochondrial Diseases | 0.20652 | 38 | 69 |
Leber’s optic atrophy | MC5DM1 | 0.55319 | 26 | 68 |
Leber’s optic atrophy | NEUROPATHY, ATAXIA, AND RETINITIS PIGMENTOSA | 0.55319 | 26 | 68 |
Leber’s optic atrophy | Camptodactyly of proximal interphalangeal joint | 0.54167 | 26 | 66 |
Leber’s optic atrophy | Wide spaced nipples (finding) | 0.50980 | 26 | 63 |
Leber’s optic atrophy | Scrotal hypoplasia | 0.50980 | 26 | 63 |
Leber’s optic atrophy | postaxial polydactyly hands (physical finding) | 0.50000 | 26 | 63 |
To obtain the network, set the class argument of the plot function to Network (Figure 48). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them. The edge width, in turn, is proportional to the number of genes shared by the two diseases it connects.
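The plot function builds this network for you. As a rough sketch of how the tabular results map onto edges, the base R code below uses rows copied from the table above. Because disease2disease can report the same pair in both directions (Leber’s optic atrophy and MELAS Syndrome appear twice), the pairs are first collapsed with an order-independent key, and an edge width proportional to shared_genes is then computed. The maximum width of 10 is an arbitrary choice, and this is not necessarily how plot computes widths internally.

```r
# Rows copied from the table of diseases sharing at least 20 genes
dda <- data.frame(
  disease1 = c("Leber's optic atrophy", "MELAS Syndrome",
               "Encephalomyelopathies, Subacute Necrotizing", "MELAS Syndrome"),
  disease2 = c("MELAS Syndrome", "Leber's optic atrophy",
               "Mitochondrial Diseases", "Mitochondrial Diseases"),
  shared_genes = c(34, 34, 66, 38),
  stringsAsFactors = FALSE
)

# The same pair can appear as (A, B) and (B, A): build a key that
# does not depend on the order of the two diseases
key <- ifelse(dda$disease1 < dda$disease2,
              paste(dda$disease1, dda$disease2, sep = " -- "),
              paste(dda$disease2, dda$disease1, sep = " -- "))
edges <- dda[!duplicated(key), ]

# Edge width proportional to the number of shared genes (arbitrary max of 10)
edges$width <- 10 * edges$shared_genes / max(edges$shared_genes)
edges
```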
Searching DDAs via semantic relationships
To obtain disease-disease associations via semantic relationships, use the disease2disease function with the argument relationship set to one of the following semantic relation types: has_manifestation, has_associated_morphology, manifestation_of, associated_morphology_of, is_finding_of_disease, due_to, has_definitional_manifestation, has_associated_finding, definitional_manifestation_of, disease_has_finding, cause_of, associated_finding_of.
The output is a DataGeNET.DGN object that contains the diseases linked to the query disease by the type of relationship defined in the query.
results <- disease2disease(
disease = c("UMLS_C0011860", "UMLS_C0028754"), relationship = "has_manifestation", min_sokal = 0.7, order_by = "SOKAL",
database = "CURATED" )
results
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-disease-rela
## . Database: CURATED
## . Score:
## . Term: UMLS_C0011860 ... UMLS_C0028754
## . Results: 20
Table 47 shows the diseases associated with Obesity and with non-insulin-dependent diabetes mellitus (NIDDM) by the relation type “has_manifestation”.
tab <- unique(results@qresult[ ,c("disease1_Name", "disease2_Name","ddaRelation","shared_genes", "pvalue_jaccard_genes")] )
knitr::kable(tab , caption = "Diseases associated with Obesity and NIDDM")
disease1_Name | disease2_Name | ddaRelation | shared_genes | pvalue_jaccard_genes |
---|---|---|---|---|
Adult-Onset Diabetes Mellitus | KERATODERMA-ICHTHYOSIS-DEAFNESS SYNDROME, AUTOSOMAL RECESSIVE | has_manifestation | 2 | 2.77 |
Obesity | OBESITY, HYPERPHAGIA, AND DEVELOPMENTAL DELAY | has_manifestation | 1 | 1.66 |
Obesity | PHP1C | has_manifestation | 1 | 1.66 |
Obesity | BARDET-BIEDL SYNDROME 18 | has_manifestation | 1 | 1.66 |
Obesity | Bardet-Biedl syndrome 4 | has_manifestation | 1 | 1.66 |
Obesity | SBIDDS | has_manifestation | 1 | 1.66 |
Obesity | Pseudo Pseudohypoparathyroidism | has_manifestation | 1 | 1.66 |
Obesity | CHOPS SYNDROME | has_manifestation | 1 | 1.66 |
Adult-Onset Diabetes Mellitus | MODY, TYPE 13 | has_manifestation | 1 | 1.62 |
Obesity | BBS1 | has_manifestation | 2 | 1.44 |
Obesity | PSEUDOHYPOPARATHYROIDISM, TYPE IA | has_manifestation | 1 | 1.36 |
Obesity | PWLS | has_manifestation | 1 | 1.36 |
Obesity | HYPOGONADOTROPIC HYPOGONADISM 27 WITHOUT ANOSMIA | has_manifestation | 1 | 1.36 |
Obesity | CORTRD2 | has_manifestation | 1 | 1.36 |
Adult-Onset Diabetes Mellitus | IDDHH | has_manifestation | 1 | 1.32 |
Obesity | BARDET-BIEDL SYNDROME 6 | has_manifestation | 1 | 1.19 |
Obesity | Bardet-Biedl syndrome 2 | has_manifestation | 1 | 1.19 |
Obesity | WAGR Syndrome | has_manifestation | 1 | 0.90 |
Obesity | 9q- Syndrome | has_manifestation | 1 | 0.84 |
Obesity | DiGeorge’s syndrome | has_manifestation | 1 | 0.46 |
Searching diseases similar to a disease of interest
The get_similar_diseases function retrieves the diseases most similar to a disease of interest. The similarity between disease concepts is computed using the Sokal-Sneath semantic similarity measure (Sánchez and Batet 2011) on the taxonomic relations provided by the Unified Medical Language System (UMLS) Metathesaurus. Only relationships of type is-a (which describe the taxonomy of an ontology) are taken into account. The get_similar_diseases function takes a disease as input and, as an optional argument, min_sokal, the minimum Sokal-Sneath value. By default, min_sokal = 0.1.
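To give an intuition of how a taxonomy-based similarity behaves, the sketch below computes the set-based form of the Sokal-Sneath coefficient, a / (a + 2(b + c)), where a is the number of taxonomic features (a concept plus its is-a ancestors) shared by two concepts, and b and c are the features unique to each. The toy taxonomy and the helper functions are hypothetical, and DISGENET's exact computation may differ from this simplified set-based version.

```r
# Hypothetical toy is-a taxonomy, stored as child -> parent
parent <- c(
  "Diabetes Mellitus, Type 2" = "Diabetes Mellitus",
  "Diabetes Mellitus, Type 1" = "Diabetes Mellitus",
  "Diabetes Mellitus"         = "Endocrine Disease",
  "Hyperthyroidism"           = "Endocrine Disease",
  "Endocrine Disease"         = "Disease"
)

# Taxonomic features of a concept: the concept itself and all its ancestors
ancestors <- function(concept) {
  out <- concept
  while (!is.na(parent[concept])) {
    concept <- parent[[concept]]
    out <- c(out, concept)
  }
  out
}

# Set-based Sokal-Sneath similarity: shared / (shared + 2 * (unique1 + unique2))
sokal_sneath <- function(c1, c2) {
  f1 <- ancestors(c1)
  f2 <- ancestors(c2)
  shared <- length(intersect(f1, f2))
  only1  <- length(setdiff(f1, f2))
  only2  <- length(setdiff(f2, f1))
  shared / (shared + 2 * (only1 + only2))
}

s1 <- sokal_sneath("Diabetes Mellitus, Type 2", "Diabetes Mellitus, Type 1")
s2 <- sokal_sneath("Diabetes Mellitus, Type 2", "Hyperthyroidism")
c(siblings = s1, distant = s2)
```

Two types of diabetes, which share a direct parent, score higher (3/7) than diabetes and hyperthyroidism, which only meet at the more general "Endocrine Disease" (2/8). The factor of 2 on the non-shared features makes this coefficient penalize differences more strongly than the Jaccard Index does.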
## Object of class 'DataGeNET.DGN'
## . Search: single
## . Type: disease-disease-sokal
## . Database: ALL
## . Score:
## . Term: UMLS_C0011860
## . Results: 143
Table 48 shows the top diseases most similar to NIDDM (UMLS_C0011860), ranked by Sokal-Sneath similarity.
tab <- unique(results@qresult[ ,c("disease1_Name", "disease2_Name","sokal")] )
knitr::kable(tab[1:10,], caption = "Diseases semantically similar to NIDDM")
disease1_Name | disease2_Name | sokal |
---|---|---|
Adult-Onset Diabetes Mellitus | Diabetes Mellitus | 0.830 |
Adult-Onset Diabetes Mellitus | Glucose Intolerance | 0.821 |
Adult-Onset Diabetes Mellitus | Diabetes Mellitus, Insulin-Dependent | 0.706 |
Adult-Onset Diabetes Mellitus | Hyperglycemia | 0.695 |
Adult-Onset Diabetes Mellitus | Diabetic Retinopathies | 0.687 |
Adult-Onset Diabetes Mellitus | Diabetic Nephropathies | 0.685 |
Adult-Onset Diabetes Mellitus | Diabetes, Gestational | 0.684 |
Adult-Onset Diabetes Mellitus | Syndrome X, Reaven | 0.677 |
Adult-Onset Diabetes Mellitus | Prediabetic State | 0.677 |
Adult-Onset Diabetes Mellitus | Insulin Resistance | 0.668 |
Disease enrichment
The disease_enrichment function performs a disease enrichment (or over-representation) analysis. It determines whether a user-defined set of genes or variants is significantly associated with the genes or variants of a disease in DISGENET.
The function takes as input a list of entities, either genes or variants, which are compared against the gene- or variant-disease associations in the selected database to determine the diseases associated with the input list. Genes can be identified with HGNC, ENSEMBL or Entrez identifiers.
The database parameter allows users to choose which data source to use: CURATED for curated gene-disease associations (the default option), CLINICALTRIALS for associations extracted from ClinicalTrials.gov, or ALL to include all available databases. The number of genes (or variants) in the selected data source is used as the background, or universe, of the over-representation test.
The common_entities parameter sets the minimum number of entities that must be shared with a disease for it to be considered in the analysis; the default is 1. The max_pvalue parameter sets a threshold for the p-value of the Fisher test (default is 0.05).
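The over-representation test described above can be sketched with the hypergeometric distribution, which for a one-sided test gives the same p-value as Fisher's exact test. The list size (58) and background size (13460) are taken from the gene example in this section, while the disease gene-set size and the overlap are hypothetical; how DISGENET builds its contingency table is an assumption here.

```r
N <- 13460  # background: genes in the selected data source
n <- 58     # genes in the user list
M <- 300    # hypothetical: genes associated with the disease
k <- 10     # hypothetical: genes shared by the list and the disease

# P(X >= k) when drawing n genes out of N, of which M belong to the disease
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher exact test on a 2x2 table
# rows: in list / not in list; columns: in disease set / not in disease set
tab <- matrix(c(k,     n - k,
                M - k, N - M - n + k),
              nrow = 2, byrow = TRUE)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_hyper, fisher = p_fisher)
paste0(k, "/", n)  # a geneRatio-style string
```

With these numbers, about 1.3 shared genes would be expected by chance (58 × 300 / 13460), so observing 10 yields a very small p-value.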
For genes
Below is an example of how to perform a disease enrichment analysis with a list of genes associated with autism, extracted from the Developmental Brain Disorder Gene Database (Gonzalez-Mantilla et al. 2016).
genes <- c("ADNP", "ANKRD11", "ANKRD17", "ASXL1", "BCKDK", "BRSK2", "CDK13", "CDK8", "CHD2", "CHD7", "CHD8", "CLCN2", "CREBBP", "CSDE1", "CTCF", "CTNNB1", "DDX3X", "FOXP1", "GFER", "H4C3", "HNRNPUL2", "IQSEC2", "ITSN1", "JARID2", "LRP2", "MARK2", "MBOAT7", "MYT1L", "NAA15", "NALCN", "NAV3", "NEXMIF" , "NSD1", "PHF21A", "POGZ", "PRR12", "QRICH1", "SCAF1", "SCN1A", "SCN2A", "SETD5", "SHANK3", "SIN3A", "SOX11", "SOX6", "TANC2", "TBCD", "TCF20" , "TCF4", "TCF7L2", "TRAF7", "TRIP12", "WAC", "WDR26", "ZEB2", "ZMYM2", "ZNF292", "ZSWIM6" )
results <- disease_enrichment(
entities = genes,
common_entities = 5,
vocabulary = "HGNC", database = "CURATED")
## Your query has 1 page.
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-enrichment
## . Database: CURATED
## . Score:
## . Term: ADNP ... ZSWIM6
Table 49 shows the top diseases associated with the list of genes.
tab <- unique(results@qresult[ ,c("diseaseName", "geneRatio", "bgRatio","pvalue")] )
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of genes")
diseaseName | geneRatio | bgRatio | pvalue |
---|---|---|---|
Mental retardation, nonspecific | 44/58 | 44/13460 | 0 |
Neurodevelopmental Disorder | 36/58 | 36/13460 | 0 |
Neurodevelopmental delay | 24/58 | 24/13460 | 0 |
Non-specific syndromic intellectual disability | 18/58 | 18/13460 | 0 |
Childhood autism | 23/58 | 23/13460 | 0 |
Seizure | 22/58 | 22/13460 | 0 |
AUTISM SPECTRUM DISORDER | 22/58 | 22/13460 | 0 |
Child Development Disorder | 14/58 | 14/13460 | 0 |
Global developmental delay | 19/58 | 19/13460 | 0 |
Rare genetic intellectual disability | 8/58 | 8/13460 | 0 |
To visualize the results of the enrichment, use the plot function. Use the argument cutoff to set the p-value threshold, and the argument limit to reduce the number of records shown (Figure 50). By default, limit = 50. The node size is proportional to the number of entities shared between the user list and the disease.
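The geneRatio and bgRatio columns are returned as "k/n" strings. To sort or filter the results by them numerically, they can be converted first. A minimal base R sketch, using rows copied from the gene enrichment table above (the helper ratio_to_numeric is hypothetical, not part of disgenet2r):

```r
# A few rows copied from the gene enrichment table
enrich <- data.frame(
  diseaseName = c("Mental retardation, nonspecific",
                  "Neurodevelopmental Disorder",
                  "Rare genetic intellectual disability"),
  geneRatio   = c("44/58", "36/58", "8/58"),
  stringsAsFactors = FALSE
)

# Convert a "k/n" string into the numeric fraction k / n
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

enrich$geneFraction <- ratio_to_numeric(enrich$geneRatio)
enrich[order(-enrich$geneFraction), c("diseaseName", "geneFraction")]
```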
For variants
Below is an example of how to perform a disease enrichment analysis with a list of variants extracted from the publication Genomic Landscape and Mutational Signatures of Deafness-Associated Genes (Azaiez et al. 2018).
results <- disease_enrichment(
entities = c("rs80338902","rs397516871","rs368341987","rs375050157","rs111033280","rs140884994","rs201076440","rs111033439","rs1296612982","rs41281314","rs397516875","rs143282422","rs142381713","rs35818432","rs111033225","rs200104362","rs201004645","rs34988750","rs373169422","rs397517356","rs188376296","rs199897298","rs200263980","rs200416912","rs184866544","rs397517344","rs41281310","rs727503066","rs727504710","rs143240767","rs145771342","rs376898963","rs397516878","rs181255269","rs188498736","rs111033192","rs117966637","rs914189193","rs181611778","rs111033194","rs111033248","rs111033262","rs111033333","rs111033529","rs146824138","rs483353055","rs528089082","rs747131589","rs111033536","rs45629132","rs371142158","rs727504654","rs192524347","rs527236122","rs111033186","rs111033287","rs139889944","rs200454015","rs397517328","rs111033275","rs150822759","rs200038092","rs201709513","rs370155266","rs45500891","rs111033196","rs111033360","rs397517322","rs111033524","rs727505166","rs79444516","rs35730265","rs45549044","rs111033361","rs370696868","rs727504309","rs533231493"),
vocabulary = "DBSNP", database = "CURATED")
## Your query has 1 page.
## Object of class 'DataGeNET.DGN'
## . Search: list
## . Type: disease-enrichment
## . Database: CURATED
## . Score:
## . Term: rs80338902 ... rs533231493
Table 50 shows the top diseases associated with the list of variants.
tab <- unique(results@qresult[ ,c("diseaseName", "variantRatio", "bgRatio","pvalue")] )
knitr::kable(tab[1:10,], caption = "Diseases significantly associated with the list of variants")
diseaseName | variantRatio | bgRatio | pvalue |
---|---|---|---|
USH2A | 28/77 | 28/687727 | 0 |
USH1A, FORMERLY | 26/77 | 26/687727 | 0 |
RETINITIS PIGMENTOSA 39 | 21/77 | 21/687727 | 0 |
DFNB1A | 15/77 | 15/687727 | 0 |
USHER SYNDROME, TYPE ID | 12/77 | 12/687727 | 0 |
DFNB2 | 12/77 | 12/687727 | 0 |
DFNA3A | 8/77 | 8/687727 | 0 |
DFNB12 | 10/77 | 10/687727 | 0 |
Usher syndrome | 9/77 | 9/687727 | 0 |
Senter syndrome | 6/77 | 6/687727 | 0 |
Figure 51 shows the results of the enrichment.
Versions
Get DISGENET data version
## [1] "{ status : OK , payload :{ apiVersion : 1.7.0 , dataVersion : DISGENET v24.4 , lastUpdate : 4 Dec 2024 , version : DISGENET v24.4 }, httpStatus :200}"
disgenet2r version
## Version: 1.2.2
COPYRIGHT
©2024 MedBioinformatics Solutions SL
License
disgenet2r is distributed under the GPL-2 license.