In the field of synthetic biology, making breakthroughs has traditionally relied on trial and error due to the complex nature of biological systems. However, the introduction of artificial intelligence (AI) and machine learning (ML) has the potential to revolutionize the way we approach biomanufacturing and biotechnology decision-making.
This piece explores the promising role of AI, particularly ML, in synthetic biology research and highlights the benefits it offers in terms of data collection, organization, and analysis.
Let’s dive in.
We’re aware of this, but I’ll say it anyway: machine learning, a branch of AI, uses data-driven approaches to map inputs (features) to outputs (production results), which helps in modeling and predicting the behavior of microbial cell factories.
However, ML traditionally requires a large amount of experimental data, which can be time-consuming and costly to obtain. To overcome this challenge, transfer learning (TL) offers a solution by transferring knowledge from well-studied domains to less-explored scenarios, reducing computational costs and expediting the learning process.
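To make the transfer learning idea concrete, here is a toy sketch in plain Python. It is not the method from the paper: it fits a straight line on a "well-studied" source dataset, then reuses the learned slope for a "less-explored" target and re-estimates only the intercept from two target points. All numbers and names (`fit_line`, `refit_intercept`, the substrate and titer values) are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def refit_intercept(xs, ys, slope):
    """Keep the transferred slope fixed and learn only a new intercept."""
    return mean(y - slope * x for x, y in zip(xs, ys))


# Hypothetical source domain: plenty of data (substrate in g/L, titer in g/L).
source_x = [10, 20, 30, 40, 50]
source_y = [2.1, 4.0, 6.1, 7.9, 10.0]
slope, intercept = fit_line(source_x, source_y)

# Hypothetical target domain: only two experiments are available.
target_x = [20, 40]
target_y = [5.0, 9.0]
target_intercept = refit_intercept(target_x, target_y, slope)

# Predict the target's titer at an untested substrate level.
prediction = slope * 30 + target_intercept
print(f"transferred slope={slope:.3f}, new intercept={target_intercept:.2f}")
print(f"predicted titer at 30 g/L: {prediction:.2f}")
```

Two points would normally be too few to trust a fitted slope. Borrowing the slope from the source domain means the target data only has to pin down one parameter, which is the intuition behind the reduced data and compute needs.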
One of the key advantages of utilizing AI, specifically natural language processing (NLP), is the ability to process text at a large scale. With the help of NLP tools like GPT-4, vast amounts of information can be automatically extracted from published journal articles, enabling efficient topic organization and knowledge mining. This automated extraction eliminates the labor-intensive manual effort and minimizes human errors in interpreting and structuring the data.
In the context of synthetic biology, biomanufacturing features such as thermodynamics, reaction stoichiometry, bioreactor conditions, and genetic engineering methods are of utmost importance.
While mechanistic models struggle to capture all influential factors and simulate microbial productions under realistic conditions, ML approaches can effectively leverage previous knowledge to model and optimize microbial cell factories.
By extracting and centralizing data from numerous publications, an AI-assisted database can support various applications, including but not limited to:
· biomanufacturing design,
· commercial decision-making,
· project quality/risk assessment, and
· monitoring adverse drug events from electronic health record notes.
The development of AI, particularly tools like GPT-4, has significantly accelerated the data extraction process for ML applications in synthetic biology.
GPT-4 has been described as showing "sparks" of artificial general intelligence, and it can rapidly parse text based on user-provided context. With GPT-4, relevant bioprocess features and outcomes can be extracted from published papers, which helps the database grow quickly.
A notable example of AI's contribution to synthetic biology is the case of Rhodosporidium toruloides, a novel yeast gaining attention for its high lipid content and native carotenoid production. Manually extracting data for this yeast can be time-consuming, but GPT-4 can take over much of that work.
According to the authors, extracting data from one paper typically takes 20 minutes and involves labeling, dividing article sections, entering prompts, recording GPT responses, combining results into an ML-ready dataset, and quality checking. Time-consuming, to say the least.
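The workflow steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the column schema in `FIELDS`, the blank-line section splitting, and the `ask_llm` callable are all assumptions, and `fake_llm` is a stand-in that returns canned text so the example runs without any API access. The sample values are invented.

```python
import csv
import io

# Hypothetical schema for one extracted data instance.
FIELDS = ["strain", "carbon_source", "temperature_c", "titer_g_per_l"]


def split_sections(article_text):
    """Divide an article into sections on blank lines (a simplification)."""
    return [s.strip() for s in article_text.split("\n\n") if s.strip()]


def build_prompt(section):
    """Ask for one CSV row per experiment, using the schema above."""
    return (
        "Extract one CSV row per experiment with columns "
        + ",".join(FIELDS)
        + ". Use NA for anything not reported.\n\n"
        + section
    )


def parse_response(response):
    """Turn the model's CSV reply into dicts, skipping headers and bad rows."""
    rows = []
    for record in csv.reader(io.StringIO(response)):
        record = [cell.strip() for cell in record]
        if len(record) != len(FIELDS) or record == FIELDS:
            continue
        rows.append(dict(zip(FIELDS, record)))
    return rows


def quality_check(row):
    """Numeric fields must parse as numbers or be NA; a human reviews the rest."""
    for key in ("temperature_c", "titer_g_per_l"):
        if row[key] != "NA":
            try:
                float(row[key])
            except ValueError:
                return False
    return True


def extract(article_text, ask_llm):
    """Run every section through the model and sort rows into kept vs flagged."""
    dataset, flagged = [], []
    for section in split_sections(article_text):
        for row in parse_response(ask_llm(build_prompt(section))):
            (dataset if quality_check(row) else flagged).append(row)
    return dataset, flagged


def fake_llm(prompt):
    """Stand-in for a real model call; returns made-up rows for the Results section."""
    if "Results" in prompt:
        return (
            "strain,carbon_source,temperature_c,titer_g_per_l\n"
            "R. toruloides,glucose,30,12.5\n"
            "R. toruloides,xylose,thirty,NA\n"
        )
    return ""


article = "Introduction\nYeast lipids matter.\n\nResults\nWe grew the strain on two sugars."
dataset, flagged = extract(article, fake_llm)
print(len(dataset), "rows kept;", len(flagged), "rows flagged for human review")
```

Passing the model call in as a function keeps the sketch independent of any particular API, and the flagged list reflects the point that the output still needs a human to look it over.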
GPT-4 was used to extract knowledge from 176 publications, resulting in 2037 data instances uploaded to a crowdsourcing online database. GPT-4 can summarize experiment results and methods into accessible tables when clearly conveyed in the text.
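For a sense of scale, here is some back-of-the-envelope arithmetic on the figures quoted above. It assumes the 20-minute figure holds for each of the 176 papers, which the source does not state explicitly, so treat the totals as rough estimates.

```python
PAPERS = 176
INSTANCES = 2037
MINUTES_PER_PAPER = 20  # assumption: the same time for every paper

total_minutes = PAPERS * MINUTES_PER_PAPER
total_hours = total_minutes / 60
instances_per_paper = INSTANCES / PAPERS
minutes_per_instance = total_minutes / INSTANCES

print(f"Total extraction time: about {total_hours:.1f} hours")
print(f"Average yield: about {instances_per_paper:.1f} data instances per paper")
print(f"Average cost: about {minutes_per_instance:.1f} minutes per data instance")
```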
Synthetic biology tools have been developed to engineer microbes for sustainability. GPT-4 can also accelerate data extraction for machine learning to predict microbial performance under complex conditions.
The field of synthetic biology historically relied on trial and error, but AI and ML can improve the design-build-test-learn (DBTL) cycles.
The method described in the paper has shown potential in automating information extraction from research articles and in supporting biomanufacturing and biotech commercial decisions.
Let’s have a look at the benefits of the new method:
- Inexpensive strategy for supporting ML approaches by mining knowledge from published journal articles.
- Transfer learning reduces computational costs and speeds up the learning process by transferring knowledge from well-studied domains.
- NLP tools enable large-scale text processing and organization of topics in published articles.
- AI applications supported by the resulting database can assist in biomanufacturing design, decision-making, and risk assessment.
- ML approaches can predict microbial performance, optimize bioprocesses, and recommend strain engineering approaches.
The novel method does come with its own host of challenges:
- Manual extraction of data from articles is labor-intensive and prone to errors.
- Lack of standardized formats in reported data requires substantial efforts for interpretation and organization into ML-ready data.
- Mechanistic models struggle to incorporate all influential factors for simulating microbial productions under realistic bioreactor conditions.
- Data reporting from publications is often sparse and inconsistent, making data extraction without losing important knowledge challenging.
- The accuracy of AI output is improving, but incorporating AI and ML in synthetic biology still requires human supervision and is not entirely automated at this stage.
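To illustrate the standardization problem from the list above, here is a small sketch that normalizes titers reported in different units to g/L before they go into an ML-ready table. The unit list, the `to_g_per_l` helper, and the sample strings are assumptions for illustration; real papers report values in many more ways, and anything the function cannot parse is returned as `None` so a human can check it.

```python
import re

# Conversion factors to g/L for a few common concentration units.
TO_G_PER_L = {
    "g/l": 1.0,
    "mg/l": 1e-3,
    "mg/ml": 1.0,   # 1 mg/mL equals 1 g/L
    "ug/ml": 1e-3,  # 1 ug/mL equals 1 mg/L
}

# A number followed by a unit of the form "x/y", e.g. "850 mg/L".
PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]+/[a-z]+)\s*$", re.IGNORECASE)


def to_g_per_l(text):
    """Return the value in g/L, or None if the text cannot be interpreted."""
    match = PATTERN.match(text)
    if match is None:
        return None
    factor = TO_G_PER_L.get(match.group(2).lower())
    if factor is None:
        return None
    return float(match.group(1)) * factor


# Made-up examples of how the same kind of quantity might be reported.
for reported in ["12.5 g/L", "850 mg/L", "3 mg/mL", "about 5", "2 OD600"]:
    print(repr(reported), "->", to_g_per_l(reported))
```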
What does the A.I Scientist think?
The integration of AI, particularly machine learning, in synthetic biology research and biomanufacturing processes holds immense promise. AI-powered tools have the potential to automate information extraction from research articles, streamline data organization, and advance design-build-test-learn (DBTL) cycles.
While the accuracy of AI outputs continues to improve, human supervision and validation remain crucial. By harnessing the power of AI, we can enhance our understanding of biological systems, optimize microbial performance, and accelerate advancements in synthetic biology.
Overall, the use of machine learning and AI in synthetic biology has the potential to revolutionize the field, offering more efficient data collection, organization, and analysis.
It holds promise for solving complex challenges, reducing experimental trials, and improving the effectiveness of strain development. While there are still limitations and challenges that need to be addressed, the continued development of AI and its integration into synthetic biology research opens exciting possibilities for the future.
Live long and prosper!