Research in Biotechnology and Pharmaceuticals Using Large Language Models

ProGen

Progen is a deep-learning LLM capable of generating protein sequences with a predictable function across large protein families. ProGen was trained on 280M protein sequences from more than 19,000 families, and the model is augmented with control tags specifying the property of the protein. ProGen can be fine-tuned to create more accurate protein sequences using specific sequences and tags.

ChemCrow

Although LLMs have shown great performance in tasks across various domains, they often struggle with chemistry-related problems. Additionally, these models do not have access to external sources, which limits their usefulness in scientific research. ChemCrow is an LLM chemistry agent that aims to solve this issue. The model is designed to accomplish tasks across drug discovery, organic synthesis, and materials design.

13 expert-designed tools have been integrated to develop ChemCrow, which augments its performance in chemistry. The model has the ability to aid expert chemists and lower barriers for non-experts. Moreover, It can facilitate scientific advancement by bridging the gap between experimental and computational chemistry.

ChatGPT in Drug Discovery

Researchers from Michigan State University have explored the use of ChatGPT in drug discovery. They have come up with the following results:

ChatGPT can be fine-tuned on scientific literature and can be used to generate summaries of the latest research on a given disease. This can help researchers identify new potential targets or better understand the current state of research in a specific area.
By training ChatGPT on a set of established drug-like molecules, it is possible to produce novel chemical structures with similar characteristics. This approach can help scientists identify new lead compounds with a higher success rate in pre-clinical and clinical studies.
ChatGPT can predict the pharmacokinetics and pharmacodynamics of new drugs and support the virtual screening of chemical libraries in early-stage drug discovery.
ChatGPT can be trained on a dataset of toxicity data and then used to predict the potential toxic effects of new drugs.

Use of ChatGPT/GPT-4 in Computational Biology

Following are some of the ways that computational biologists can optimize their workflow using ChatGPT/GPT-4:

Code readability and documentation can be improved using ChatGPT.
ChatGPT can assist in writing efficient codes.
Researchers can integrate ChatGPT into their IDEs via plugins for RStudio and Visual Studio Code.
ChatGPT can improve scientific writing by providing aid in expressing ideas more clearly.
ChatGPT can be used for cleaning and reconciling data.
Data visualization can be improved as ChatGPT can suggest new visualization techniques and enhance existing figures.
The GPT API can be used to fine-tune the system for specific applications, and parameters can be adjusted to control the creativity and repetitiveness of responses.

ChatGPT in Bioinformatics

A group of researchers has demonstrated the feasibility of using ChatGPT in bioinformatics education to assist students in generating code for scientific data analysis tasks. In their study, ChatGPT generated code to align the short reads to the human reference genome and summarized the alignments into count numbers across the genome.

ChatGPT can also assist students in phylogenetic analyses. The researchers created a phylogenetic tree for nine species using R code generated by the model. In their study, the researchers also showed that ChatGPT could act as a virtual teaching assistant to teach the divide-and-conquer approach to a student.

ChatGPT in Drug Development

A group of researchers demonstrated the effectiveness of ChatGPT in predicting and explaining common Drug-Drug Interactions (DDI). They prepared a total of 40 DDIs list from previously published literature. Their study showed that ChatGPT is partially effective in predicting and explaining DDIs.

Patients, who do not have immediate access to the healthcare facility, may take help from ChatGPT to get information about DDIs. However, occasionally, the model may provide incomplete guidance. Therefore further improvement is required for potential usage by patients to get ideas about DDI.

ChatGPT in Pharmacometrics

Following are the use cases of ChatGPT in pharmacometrics:

ChatGPT can accurately obtain typical PK parameters from the scientific literature.
The model can generate a population PK model in R.
ChatGPT is capable of developing an interactive Shiny application for visualization.
Using ChatGPT, R code can be developed with minimal coding knowledge. Moreover, debugging of errors can be easily done using the same.

GeneGPT

GeneGPT is a novel method for teaching LLMs to utilize the National Center for Biotechnology Information (NCBI) Web API for answering genomics questions. GeneGPT has achieved state-of-the-art results on 75% of one-shot tasks and 80% of zero-shot tasks in the GeneTuring dataset. GeneGPT can potentially augment LLMs with domain tools to improve access to biomedical information.

CancerGPT

CancerGPT is a first-of-its-kind few-shot learning model that utilizes LLMs to predict the drug pairs synergy in rare tissues lacking structured data and features. It contains around 124M parameters and is even comparable to the larger fine-tuned GPT-3 model with 175B parameters. CancerGPT shows the potential of LLMs to offer an alternative approach for biological inference.

ChatGPT in Medical Research

ChatGPT can analyze large volumes of data, including scientific articles, medical reports, and patient reports. All of this analysis can provide new insights into the symptoms and treatment options for orthopedic conditions.

ChatGPT can extract relevant information from the text and present it in a structured form. ChatGPT can also assist in the creation of new hypotheses for researchers. Additionally, ChatGPT can be useful in developing clinical decisions and support systems by analyzing patient records and identifying common patterns.

ChatGPT in Medicine

ChatGPT can inform researchers about the latest literature in a given area. It can write a discharge summary for patients following surgery. The model can aid with patient discharge notes, summarize recent trials, provide information on ethical guidelines, etc.