An Explainable Disease Surveillance System for Early Prediction of Multiple Chronic Diseases

Shaheer Ahmad Khan, Muhammad Usamah Shahid, Ahmad Abdullah, Ibrahim Hashmat, Muddassar Farooq · January 27, 2025

Summary

This paper presents an explainable disease surveillance system that uses routine EHR data (medical history, vitals, diagnoses, and medications) to predict multiple chronic diseases 3 to 12 months before diagnosis. Three models are trained for each disease. They are validated internally using F1 scores and AUROC, and their outputs are further reviewed by expert physicians for clinical relevance. Explainability comes from three sources: Shapley attributions, surrogate models, and a new rule engineering framework. The work addresses the need for a surveillance system that covers multiple chronic conditions. By relying only on routine EHR data, it aims to deliver a clinically useful, practical, and explainable predictor of risk up to one year in advance, with the goal of improving preventive care and reducing healthcare costs.
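To make the idea of Shapley attributions concrete, here is a minimal sketch that computes exact Shapley values for a toy risk score. It is not the authors' implementation: the function `toy_risk`, the feature names, and all weights are hypothetical, and treating a feature as simply "present" or "absent" is a simplifying assumption. Real systems typically approximate these values with a dedicated library rather than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for value_fn(frozenset_of_features) -> float.

    Enumerates every subset, so it is only practical for a handful of features.
    """
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


def toy_risk(present):
    """Hypothetical risk score; names and weights are illustrative only."""
    score = 0.05  # baseline risk with no risk factors present
    if "high_bmi" in present:
        score += 0.10
    if "elevated_glucose" in present:
        score += 0.20
    if "high_bmi" in present and "elevated_glucose" in present:
        score += 0.10  # interaction term shared between the two features
    if "on_statin" in present:
        score += 0.05
    return score


FEATURES = ["high_bmi", "elevated_glucose", "on_statin"]
phi = shapley_values(toy_risk, FEATURES)

for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the score with all features present and the baseline score, which is the property that makes Shapley values convenient for explaining an individual prediction. The interaction term is split evenly between the two features that jointly cause it.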

Introduction
  Background
    Overview of disease surveillance systems
    Importance of early disease prediction
    Challenges in traditional disease prediction methods
  Objective
    Aim of the explainable disease surveillance system
    Key features and benefits
Method
  Data Collection
    Sources of routine EHR data
    Data types included (medical history, vitals, diagnoses, medications)
  Data Preprocessing
    Data cleaning and normalization
    Handling missing values
  Model Training
    Selection of models for each disease
    Internal validation using F1 scores and AUROC
  Clinical Relevance Evaluation
    Expert physician review process
    Criteria for clinical relevance
Enhancing Explainability
  Shapley Attributions
    Explanation of Shapley values
    How they contribute to model interpretability
  Surrogate Models
    Use of simpler models to explain complex predictions
    Benefits and limitations
  Rule Engineering Framework
    Development of rules for model predictions
    Integration with clinical guidelines
System Evaluation
  Performance Metrics
    Metrics used for model evaluation
    Comparison with existing systems
  Clinical Utility
    Assessment of system's impact on healthcare
    Case studies or pilot project results
Conclusion
  Future Directions
    Ongoing research and development
    Potential for scalability and integration
  Impact on Healthcare
    Expected improvements in preventive measures
    Reduction in healthcare costs
  Summary of Key Findings
    Recap of system's capabilities and benefits
Basic info
papers
machine learning
artificial intelligence
Insights
What methods are used to enhance the explainability of the disease surveillance system?
What is the ultimate goal of developing this explainable disease surveillance system?
How does the system predict multiple chronic diseases 3-12 months before diagnosis?
What is the main idea behind the explainable disease surveillance system mentioned in the text?