AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents

Fengze Liu, Haoyu Wang, Joonhyuk Cho, Dan Roth, Andrew W. Lo · June 04, 2025

Summary

AUTOCT is a framework that combines large language models with classical machine learning to generate features for clinical trial prediction autonomously. It uses Monte Carlo Tree Search to guide the optimization and needs fewer iterations than current methods. AUTOCT constructs tabular features from unstructured data to improve predictive performance. For phase 1 trial outcome prediction it proposes 10 features, covering intervention type, demographics, previous trial success, research team, funding, primary outcome, location, and eligibility criteria. Key factors for future feature design include route of administration, dosing regimen, previous exposure, safety profiles, and participant health status.
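To make the idea of a Monte Carlo Tree Search over feature sets concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each tree node holds a candidate feature set, that a proposal step (performed by an LLM in AUTOCT, replaced here by the hypothetical `propose` function) suggests features to add, and that an evaluation step (training a classical model in AUTOCT, replaced here by the hypothetical `evaluate` function with arbitrary toy weights) returns a validation score. The feature names come from the summary; the weights, the exploration constant, and all function names are illustrative assumptions, not results from the paper.

```python
import math
import random

# Hypothetical feature pool (names from the summary) with arbitrary toy
# weights; these numbers are NOT results from the paper.
POOL = ["intervention_type", "demographics", "funding", "location",
        "eligibility_criteria"]
WEIGHTS = {"intervention_type": 0.30, "demographics": 0.10, "funding": 0.05,
           "location": 0.02, "eligibility_criteria": 0.25}


class Node:
    """One node of the search tree: a candidate feature set."""

    def __init__(self, features, parent=None):
        self.features = features      # tuple of feature names
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def uct(self, c=1.4):
        """UCB1 score: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf           # always try unvisited children first
        mean = self.total_reward / self.visits
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return mean + bonus


def propose(features, pool):
    """Stand-in for an LLM proposal step: features not yet in the set."""
    return [f for f in pool if f not in features]


def evaluate(features, weights, rng):
    """Stand-in for fitting a classical model and scoring it on validation."""
    return sum(weights[f] for f in features) + rng.gauss(0, 0.01)


def search(pool, weights, iterations=200, max_features=3, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best, best_score = (), -math.inf
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: grow a leaf that was already visited (or the root).
        can_grow = len(node.features) < max_features
        if (node is root or node.visits > 0) and can_grow:
            for f in propose(node.features, pool):
                node.children.append(Node(node.features + (f,), node))
            if node.children:
                node = rng.choice(node.children)
        # Evaluation: score the feature set held by the chosen node.
        reward = evaluate(node.features, weights, rng)
        if reward > best_score:
            best, best_score = node.features, reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best, best_score


if __name__ == "__main__":
    print(search(POOL, WEIGHTS))
```

Because feature sets are stored as ordered tuples, the same set can appear under several branches; a fuller implementation would likely deduplicate them. The sketch only shows the select, expand, evaluate, and backpropagate cycle on which a tree search of this kind relies.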

Introduction

Background

Overview of clinical trial prediction challenges

Importance of accurate prediction in clinical research

Objective

Aim of the AUTOCT framework

How it addresses limitations in current clinical trial prediction methods

Method

Data Collection

Types of data utilized (structured, unstructured)

Sources of data (public databases, clinical trial registries)

Data Preprocessing

Techniques for handling unstructured data

Methods for integrating structured and unstructured data

Model Integration

How large language models are combined with classical ML

Role of Monte Carlo Tree Search in optimization

Feature Construction

Generation of tabular features from unstructured data

Selection of 10 key features for phase 1 trial outcome prediction

Evaluation

Metrics for assessing prediction accuracy

Comparison with existing methods

Key Features for Phase 1 Trial Outcome Prediction

Intervention Type

Importance in predicting trial success

Demographics

Influence on trial outcomes

Previous Trial Success

Relevance in assessing potential

Research Team

Impact on trial execution and success

Funding

Role in resource allocation and trial quality

Primary Outcome

Significance in defining trial objectives

Location

Considerations for geographical and logistical factors

Eligibility Criteria

Importance in participant selection and trial design

Future Feature Design Considerations

Route of Administration

Impact on drug efficacy and safety

Dosing Regimen

Influence on treatment outcomes and compliance

Previous Exposure

Relevance in understanding patient response

Safety Profiles

Importance in risk assessment and trial planning

Participant Health Status

Role in predicting potential complications and outcomes

Conclusion

Summary of AUTOCT's contributions

Future directions and potential improvements

Impact on clinical trial design and management

Insights

What are the 10 proposed features by AUTOCT for predicting phase 1 clinical trial outcomes, and what data sources do they utilize?

What are the key advantages of using Monte Carlo Tree Search within the AUTOCT framework compared to existing optimization methods in clinical trial feature engineering?

How does the AUTOCT framework integrate large language models with classical machine learning techniques for autonomous feature generation in clinical trial prediction?

According to the text, what are the key factors to consider when designing future features for clinical trial outcome prediction, beyond the initial 10 proposed by AUTOCT?