Meet TxGNN: A New Model that Utilizes Geometric Deep Learning and Human-Centered AI to Make Zero-Shot Predictions of Therapeutic Use Across a Vast Range of 17,080 Diseases

There is an urgent need for new therapeutics to meet the healthcare needs of billions of people worldwide, yet only a small fraction of clinically recognized diseases currently have approved treatments. Alterations in gene function, and in the molecules that genes produce, are common causes of disease, and drugs that restore normal molecular activity are a potential defense against these illnesses. Unfortunately, therapeutic approaches that restore the biological activity of damaged genes remain difficult to achieve for many disorders. In addition, many diseases involve changes in multiple genes, and individuals can carry widely varying mutation patterns even within a single gene. Interactomes, networks of genes that participate in disease-associated processes and activities, are a useful tool for explaining these genetic events. To decipher the genetic architecture disrupted in disease and to help create medicines that target it, researchers have applied machine learning to high-throughput molecular interactomes and electronic medical record data.

New drug development is challenging, particularly for diseases with few treatment options, but it can replace ineffective medications with safer, more effective ones. The FDA has approved treatments for only 500 of the thousands of known human diseases. Of the 17,080 clinically recognized disorders included in the analysis, just 1,363 had any drugs indicated for them; of these, 435 had only one indicated drug, 182 had two, and 128 had three. Finding new medications is therapeutically valuable even for diseases that already have therapies: it offers more treatment alternatives, potentially with fewer adverse effects, and can replace drugs that fail in certain patient populations.
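The disease counts above translate into coverage figures with a few lines of arithmetic. This is a minimal sketch; the constant and function names are illustrative, and the only inputs are the numbers quoted in the paragraph.

```python
# Figures quoted in the paragraph above.
TOTAL_DISEASES = 17_080
DISEASES_WITH_INDICATIONS = 1_363
# Diseases with exactly one, two, or three indicated drugs.
FEW_DRUG_COUNTS = {1: 435, 2: 182, 3: 128}


def coverage_stats(total, with_drugs, few_counts):
    """Return three shares: diseases with at least one indicated drug,
    diseases with none, and treated diseases that have at most three drugs."""
    covered = with_drugs / total
    uncovered = 1 - covered
    few = sum(few_counts.values()) / with_drugs
    return covered, uncovered, few


covered, uncovered, few = coverage_stats(
    TOTAL_DISEASES, DISEASES_WITH_INDICATIONS, FEW_DRUG_COUNTS
)
print(f"{covered:.1%} of diseases have at least one indicated drug")
print(f"{uncovered:.1%} have none")
print(f"{few:.1%} of treated diseases have at most three indicated drugs")
```

Running it prints 8.0%, 92.0% and 54.7%: roughly nine in ten diseases in the analysis have no indicated drug, and more than half of the treated ones have three or fewer options.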

Researchers interested in diseases whose molecular causes and potential treatments are poorly understood have introduced TxGNN, a geometric deep learning approach for predicting therapeutic use. TxGNN is trained on a therapeutics-centered knowledge graph layered with disease-perturbed networks. This knowledge graph integrates decades of biological research on 17,080 common and rare diseases. A graph neural network, optimized to mirror the geometry of this therapeutics-centered graph, embeds therapeutic candidates and diseases into a latent representation space. Supervised deep learning normally cannot predict therapeutic use for neglected diseases, because they lack labeled examples; to get around this restriction, TxGNN uses a metric learning module that operates in the latent space and transfers what the model has learned from diseases seen during training to neglected diseases.
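The article does not describe the internals of the metric learning module. As a loose illustration only, the sketch below assumes each disease has a "signature" vector (for example, which genes it is associated with) and that an unseen disease borrows information by blending its own embedding with a similarity-weighted average of the embeddings of its most similar training diseases. The names `augment_embedding`, `k` and `gate`, and the toy data, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment_embedding(query_sig, query_emb, train_sigs, train_embs, k=2, gate=0.5):
    """Hypothetical transfer step: blend a disease's own embedding with a
    similarity-weighted average of the embeddings of its k most similar
    training diseases. `gate` is the weight kept on the disease's own embedding."""
    scored = sorted(
        ((cosine(query_sig, sig), name) for name, sig in train_sigs.items()),
        reverse=True,
    )
    # Keep only the top-k neighbours that have positive similarity.
    neighbours = [(sim, name) for sim, name in scored[:k] if sim > 0]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return list(query_emb)  # nothing similar: fall back to the original embedding
    pooled = [0.0] * len(query_emb)
    for sim, name in neighbours:
        weight = sim / total
        for i, x in enumerate(train_embs[name]):
            pooled[i] += weight * x
    return [gate * q + (1 - gate) * p for q, p in zip(query_emb, pooled)]


# Toy example: signatures are binary gene-association vectors (hypothetical).
train_sigs = {"disease_a": [1, 1, 0, 0], "disease_b": [0, 1, 1, 0], "disease_c": [0, 0, 0, 1]}
train_embs = {"disease_a": [1.0, 0.0], "disease_b": [0.0, 1.0], "disease_c": [-1.0, -1.0]}
new_emb = augment_embedding([1, 1, 1, 0], [0.0, 0.0], train_sigs, train_embs)
print(new_emb)  # [0.25, 0.25]
```

The unseen disease starts with an uninformative embedding but shares genes with `disease_a` and `disease_b`, so it ends up halfway toward their average while ignoring the unrelated `disease_c`.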

TxGNN is a graph neural network pre-trained on a knowledge graph covering 17,080 clinically recognized disorders and 7,957 treatment candidates, and it handles different therapeutic tasks within a single, unified formulation. Because it needs no fine-tuning on ground-truth labels and no extra parameters after training, TxGNN can make zero-shot predictions for diseases it never saw during training. Compared with state-of-the-art approaches, TxGNN improves accuracy by up to 49.2 percent on indication tasks and 35.1 percent on contraindication tasks.
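One common way for a single model to serve several link-prediction tasks is to share the node embeddings and give each relation (here, indication and contraindication) its own decoder weights. The sketch below assumes a DistMult-style bilinear decoder, a standard choice for knowledge-graph link prediction; the article does not say which decoder TxGNN uses, and the relation vectors and embeddings are made-up numbers.

```python
import math

# Hypothetical learned relation vectors; one decoder with a vector per relation
# lets a single model score several therapeutic tasks.
RELATIONS = {
    "indication": [0.9, 0.1, 0.5],
    "contraindication": [-0.2, 0.8, 0.3],
}


def score(drug_emb, relation, disease_emb):
    """DistMult-style bilinear score, squashed to a probability with a sigmoid."""
    rel = RELATIONS[relation]
    logit = sum(d * r * s for d, r, s in zip(drug_emb, rel, disease_emb))
    return 1.0 / (1.0 + math.exp(-logit))


def rank_drugs(disease_emb, drug_embs, relation):
    """Rank every drug for one disease. No new parameters are needed, so an
    unseen disease only requires an embedding (zero-shot)."""
    return sorted(
        drug_embs,
        key=lambda name: score(drug_embs[name], relation, disease_emb),
        reverse=True,
    )


drug_embs = {"drug_a": [1.0, 0.0, 0.0], "drug_b": [0.0, 1.0, 0.0]}
unseen_disease = [1.0, 1.0, 0.0]  # e.g. produced by the encoder, never labelled
print(rank_drugs(unseen_disease, drug_embs, "indication"))        # ['drug_a', 'drug_b']
print(rank_drugs(unseen_disease, drug_embs, "contraindication"))  # ['drug_b', 'drug_a']
```

Switching tasks only changes which relation vector is used, which is what lets one trained model answer both questions without fine-tuning.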

Experimental Design and Methodology – Partitioning Datasets for Comprehensive Performance Evaluation

  • Disease area splits:

Many diseases could benefit from treatment but have no effective therapies and little or no biological understanding. To test how well TxGNN predicts drug-disease relationships in such cases, the study team designed data splits that simulate well-studied diseases as though they were molecularly uncharacterized.

First, all diseases in the selected group, together with their drug-disease edges, are moved to the test set. As a result, TxGNN never sees during training the edges representing current indications and contraindications for the selected disease area. This mimics the difficulty of treating disorders whose underlying biological mechanisms are unknown.
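The disease-area split can be sketched as a filter over (head, relation, tail) triples. This is a minimal illustration under stated assumptions: the relation names, the triple layout and the toy graph are hypothetical, and the sketch leaves the diseases' other edges (such as disease-gene links) in the training graph, since the paragraph does not say whether the study also removes some of them.

```python
# Relation types treated as drug-disease edges in this sketch (an assumption).
DRUG_DISEASE_RELATIONS = {"indication", "contraindication"}


def disease_area_split(edges, disease_area):
    """Move every drug-disease edge whose disease lies in `disease_area`
    to the test set; all other edges stay in training.

    edges: list of (head, relation, tail) triples, with drug-disease edges
    written as (drug, relation, disease).
    """
    train, test = [], []
    for head, relation, tail in edges:
        if relation in DRUG_DISEASE_RELATIONS and tail in disease_area:
            test.append((head, relation, tail))
        else:
            train.append((head, relation, tail))
    return train, test


edges = [
    ("drug_1", "indication", "diabetes_type_1"),
    ("drug_2", "contraindication", "diabetes_type_2"),
    ("drug_3", "indication", "asthma"),
    ("diabetes_type_2", "associated_with", "gene_x"),
]
train, test = disease_area_split(edges, {"diabetes_type_1", "diabetes_type_2"})
print(len(train), len(test))  # 2 2
```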

  • Systematic dataset splits:

The machine learning model should be well suited to predicting therapies for diseases that currently have no treatments. It is far easier to predict candidate therapies for diseases that already have treatments than for those that do not, so the researchers devised this split to rigorously test the model's ability to make predictions for diseases it has never seen. They first split all diseases at random, then moved every drug-disease relation involving a test disease into the test set, so that the test set consists of distinct diseases with no therapies known during training. Each test split contains more than one hundred distinct diseases.
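The systematic split differs from the disease-area split in how test diseases are chosen: they are sampled at random rather than by category. A minimal sketch, assuming the same hypothetical triple format and relation names as above; `test_fraction` and `seed` are illustrative parameters, and a real split would hold out more than one hundred diseases.

```python
import random

DRUG_DISEASE_RELATIONS = {"indication", "contraindication"}  # assumed relation names


def systematic_split(edges, diseases, test_fraction, seed=0):
    """Randomly hold out a fraction of diseases and move every drug-disease
    edge that touches a held-out disease into the test set."""
    rng = random.Random(seed)  # seeded so a split can be reproduced
    shuffled = sorted(diseases)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test_diseases = set(shuffled[:n_test])
    train, test = [], []
    for edge in edges:
        _, relation, tail = edge
        is_test = relation in DRUG_DISEASE_RELATIONS and tail in test_diseases
        (test if is_test else train).append(edge)
    return train, test, test_diseases


diseases = [f"disease_{i}" for i in range(10)]
edges = [("drug_0", "indication", d) for d in diseases]
edges += [("drug_1", "contraindication", d) for d in diseases[:5]]
train, test, held_out = systematic_split(edges, diseases, test_fraction=0.2)
print(sorted(held_out), len(train), len(test))
```

The key property is that no held-out disease keeps a single drug-disease edge in training, so every prediction on the test set is genuinely zero-shot.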

  • Disease-centric dataset splits:

The researchers use a disease-centric assessment to model how drug candidates might be screened in the clinic. First, they pair every drug in the knowledge graph (KG) with every disease in the test set, excluding drug-disease associations already in the training set. Next, they rank all candidate pairs by their predicted likelihood of association. They then compute recall over the top K drugs, that is, the fraction of the test set's true drug-disease pairs that appear among the K highest-ranked drugs. Finally, they establish a random-screening baseline by sampling K drugs at random from the drug set and calculating recall the same way.
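The evaluation steps above can be sketched for a single test disease. The helper names and the model scores below are hypothetical; the sketch only shows the mechanics of excluding training pairs, computing recall@K and estimating the random-screening baseline by repeated sampling.

```python
import random


def rank_candidates(scores, known_in_training):
    """Sort drugs by predicted score, dropping pairs already seen in training."""
    candidates = [drug for drug in scores if drug not in known_in_training]
    return sorted(candidates, key=scores.get, reverse=True)


def recall_at_k(ranked, true_drugs, k):
    """Fraction of the disease's held-out true drugs found in the top k."""
    if not true_drugs:
        return 0.0
    return len(set(ranked[:k]) & set(true_drugs)) / len(true_drugs)


def random_screening_recall(candidates, true_drugs, k, trials=1000, seed=0):
    """Baseline: average recall when k drugs are drawn uniformly at random."""
    rng = random.Random(seed)
    total = sum(
        recall_at_k(rng.sample(candidates, k), true_drugs, k) for _ in range(trials)
    )
    return total / trials


drugs = [f"drug_{i}" for i in range(10)]
scores = {d: 1.0 - i / 10 for i, d in enumerate(drugs)}  # hypothetical model scores
ranked = rank_candidates(scores, known_in_training={"drug_0"})
true_drugs = {"drug_1", "drug_3"}
print(recall_at_k(ranked, true_drugs, 1))  # 0.5
print(recall_at_k(ranked, true_drugs, 3))  # 1.0
print(random_screening_recall(ranked, true_drugs, 3))  # should be near 3/9
```

For random screening the expected recall is simply K divided by the number of candidate drugs, which is why a useful model has to beat that line by a clear margin.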


  • Therapeutic use prediction using geometric biological priors in TxGNN. TxGNN is based on the hypothesis that drugs targeting disease-perturbed networks in the protein interactome have the greatest chance of success. TxGNN is a knowledge-grounded GNN, optimized to capture the geometry of its knowledge graph, that maps treatment candidates and disorders (disease concepts) into a latent representation space.
  • Using TxGNN for zero-shot therapeutic use prediction. The researchers test TxGNN's ability to predict indications and contraindications. Because TxGNN is designed for diseases such as Stargardt disease and hyperoxaluria, which currently have no available treatments, it is evaluated in a zero-shot setting: the model must predict therapeutic use for diseases in a hold-out (test) set that it did not see during training.
  • Predicting therapeutic use across five disease areas. Disorders with similar biological bases may be treatable with similar therapies.
  • Failing to forecast therapeutic usage in patients who routinely refuse treatment.
  • Evaluation across 1,363 disorders with known indications and 1,195 disorders with known contraindications.
  • Jointly considering which treatments are indicated and which are contraindicated.
  • Comparing TxGNN predictions with current treatment options. To show that TxGNN is not driven by confirmation bias, the researchers considered 10 newly launched drugs approved after TxGNN's dataset and model development were complete. In the TxGNN dataset, none of these drug-disease pairs is directly connected. TxGNN was then asked to make predictions for these pairs.
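The first bullet above describes TxGNN as a knowledge-grounded GNN that maps drugs and diseases into a shared latent space. The general mechanism behind such models is message passing, where each node updates its representation from its neighbours, separately for each relation type. The sketch below is a simplified, hypothetical illustration in the spirit of relational GNNs such as R-GCN, not TxGNN's actual architecture: it uses one scalar weight per relation instead of learned weight matrices, and directed edges only.

```python
from collections import defaultdict


def message_passing_layer(features, edges, rel_weights, self_weight=1.0):
    """One round of relation-aware message passing (simplified R-GCN-style).

    features: node -> list of floats; edges: (src, relation, dst) triples.
    Each node averages incoming messages per relation, scales each average by
    a relation-specific scalar weight, adds its own scaled features, and
    applies ReLU.
    """
    incoming = defaultdict(lambda: defaultdict(list))
    for src, relation, dst in edges:
        incoming[dst][relation].append(features[src])
    updated = {}
    for node, h in features.items():
        agg = [self_weight * x for x in h]
        for relation, messages in incoming[node].items():
            w = rel_weights[relation]
            for i in range(len(h)):
                agg[i] += w * sum(m[i] for m in messages) / len(messages)
        updated[node] = [max(0.0, x) for x in agg]  # ReLU
    return updated


features = {"drug_a": [1.0, 0.0], "protein_p": [0.0, 1.0], "disease_d": [0.0, 0.0]}
edges = [("drug_a", "targets", "protein_p"), ("protein_p", "associated_with", "disease_d")]
weights = {"targets": 0.5, "associated_with": 1.0}
layer1 = message_passing_layer(features, edges, weights)
layer2 = message_passing_layer(layer1, edges, weights)
print(layer1["disease_d"], layer2["disease_d"])  # [0.0, 1.0] [0.5, 2.0]
```

After two layers, information from `drug_a` reaches `disease_d` through the protein it targets, which is how a drug and a disease that share no direct edge can still end up with related embeddings.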


  • For diseases that have no existing medicines and are poorly understood at the molecular level, TxGNN can predict therapeutic use "zero-shot".
  • Even when no medicines are known for a condition and the model must extrapolate to a disease area not observed during training, TxGNN can substantially improve therapeutic use prediction across a range of disorders.
  • TxGNN's predicted therapies also agree closely with real-world electronic health record data. Using patient populations followed for several years, the model can be used to test many therapeutic hypotheses at once by identifying disease cohorts that have or have not been prescribed a particular medication.
  • TxGNN's predictions were presented to a group of physicians, who could explore the rationale behind each prediction through TxGNN's self-explaining model. A usability study showed that researchers using the interactive TxGNN Explorer could reproduce the model's predictions and more easily identify and debug its failure points, underscoring the importance of clinician-centered design in moving machine learning from development to biomedical implementation.
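Comparing cohorts that have or have not been prescribed a drug is naturally expressed as a 2x2 table. The article does not say which statistic the study used; as an assumption, the sketch below uses the log odds ratio, a standard measure for such tables, to rank several drug-disease hypotheses at once. `log_odds_ratio` and `screen_hypotheses` are hypothetical helpers, and the counts are invented for illustration, not study data.

```python
import math


def log_odds_ratio(a, b, c, d, pseudocount=0.5):
    """Log odds ratio for a 2x2 cohort table:

                        prescribed   not prescribed
        with disease        a              b
        without disease     c              d

    A pseudocount (0.5 = Haldane-Anscombe correction) avoids division by zero.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


def screen_hypotheses(tables):
    """Rank (drug, disease) hypotheses by log odds ratio, strongest first."""
    return sorted(tables, key=lambda pair: log_odds_ratio(*tables[pair]), reverse=True)


# Hypothetical counts, not study data.
tables = {
    ("drug_x", "disease_1"): (30, 70, 10, 90),
    ("drug_y", "disease_1"): (12, 88, 11, 89),
}
print(screen_hypotheses(tables))
```

A positive log odds ratio means patients with the disease are prescribed the drug more often than patients without it; in practice the cohorts would also need to be matched on confounders before such a comparison means much.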


Dhanshree Shenwai is a Computer Science Engineer with experience at FinTech companies covering the financial, card payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier in today's evolving world.