Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs

Yushi Feng, Tsai Hor Chan, Guosheng Yin, Lequan Yu (February 19, 2025)

Summary

DemoGraph is a black-box, context-driven graph data augmentation framework built on large language models (LLMs). It prompts an LLM with text to generate latent knowledge graphs, then dynamically merges their structural interactions into the original graph during training. A granularity-aware prompting strategy, combined with an instruction fine-tuning module, controls the sparsity of the generated graphs, improving both predictive performance and interpretability, most notably on electronic health records (EHRs). Because the framework requires neither model weights nor source code, it democratizes access to the extensive knowledge encoded in commercial LLMs while remaining robust in open-world settings. Key contributions include dynamic merging, granularity-aware prompting, and sequential prompting with instruction fine-tuning. Extensive experiments show consistent gains over existing graph augmentation methods and high scalability across datasets.
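As a rough illustration of the black-box pipeline described above — prompting an LLM for domain knowledge and parsing the reply into graph triples — consider this minimal sketch. `query_llm` is a hypothetical stand-in for any chat-completion API, and the `head | relation | tail` reply format is an assumed convention, not the paper's actual prompt design.

```python
# Minimal sketch of context-driven knowledge-graph generation via a
# black-box LLM. `query_llm` is a hypothetical placeholder; a real
# implementation would call a hosted chat-completion API here.

def query_llm(prompt: str) -> str:
    # Canned reply standing in for a real LLM response.
    return ("hypertension | increases_risk_of | stroke\n"
            "metformin | treats | type_2_diabetes")

def generate_kg(context: str) -> list[tuple[str, str, str]]:
    """Prompt the LLM for knowledge about `context` and parse the reply
    into (head, relation, tail) triples."""
    prompt = (f"List domain knowledge about '{context}' as lines of "
              "'head | relation | tail'.")
    triples = []
    for line in query_llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:  # skip malformed lines from the LLM
            triples.append(tuple(parts))
    return triples

kg = generate_kg("electronic health records")
```

In practice the parsing step needs to be defensive, since a black-box LLM gives no format guarantees; the length check above is the minimal version of that.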

Introduction
  Background
    Context-driven graph data augmentation
    Role of large language models in graph learning
  Objective
    Enhancing graph learning through text prompts
    Improving predictive performance and interpretability in electronic health records
Method
  Data Collection
    Text prompts for generating knowledge graphs
  Data Preprocessing
    Integration of structural interactions into original graphs
  Training
    Training process with granularity-aware prompting strategy
    Instruction fine-tuning module for sparsity control
  Framework Components
    Dynamic merging for efficient graph representation
    Granularity-aware prompting for controlled sparsity
    Sequential prompting with instruction fine-tuning for enhanced interpretability
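The dynamic-merging component above can be pictured as folding LLM-generated triples into the original graph during training, reusing nodes that already exist. The sketch below is an assumption about the mechanism, not the paper's implementation; networkx and the function name `merge_kg` are chosen purely for illustration.

```python
import networkx as nx

def merge_kg(graph: nx.Graph,
             triples: list[tuple[str, str, str]]) -> nx.Graph:
    """Fold LLM-generated (head, relation, tail) triples into an existing
    graph, matching nodes by name and adding only new structure."""
    merged = graph.copy()
    for head, relation, tail in triples:
        merged.add_node(head)  # no-op if the node already exists
        merged.add_node(tail)
        merged.add_edge(head, tail, relation=relation)
    return merged

g = nx.Graph()
g.add_edge("patient_1", "hypertension")  # edge from the original graph
kg = [("hypertension", "increases_risk_of", "stroke")]
merged = merge_kg(g, kg)
# merged now links patient_1 - hypertension - stroke
```

The point of merging rather than training on the generated graph alone is that the new edges extend neighborhoods in the original graph, so message passing can reach the injected knowledge.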
Key Contributions
  Dynamic Merging
    Efficiently combines graph elements for better representation
  Granularity-Aware Prompting
    Controls sparsity to optimize predictive performance
  Sequential Prompting with Instruction Fine-Tuning
    Enhances interpretability in complex graph learning tasks
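One plausible reading of the sparsity control behind granularity-aware prompting is to score generated edges and keep only the top-k per node, so that denser prompts do not flood the original graph with noisy structure. This sketch is an assumption about such a mechanism, not the paper's method; the `sparsify` function and its scores are hypothetical.

```python
from collections import defaultdict

def sparsify(edges: list[tuple[str, str, float]],
             k: int) -> list[tuple[str, str, float]]:
    """Keep only the k highest-scoring generated edges per head node.
    Each edge is (head, tail, score); scores are assumed to come from
    some confidence estimate attached during generation."""
    by_head = defaultdict(list)
    for head, tail, score in edges:
        by_head[head].append((head, tail, score))
    kept = []
    for cand in by_head.values():
        cand.sort(key=lambda e: e[2], reverse=True)
        kept.extend(cand[:k])
    return kept

edges = [("a", "b", 0.9), ("a", "c", 0.4), ("a", "d", 0.7), ("x", "y", 0.5)]
pruned = sparsify(edges, k=2)  # keeps a-b and a-d, drops a-c; x-y survives
```

A per-node budget like this keeps the augmented graph's degree distribution close to the original's, which matters for graph neural networks that are sensitive to hub nodes.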
Application in Electronic Health Records
  Challenges in Open-World Settings
    Addressing scalability and interpretability issues
  Improvements in Predictive Performance
    Maximizing utilization of contextual information
  Interpretability Enhancements
    Better understanding of graph learning outcomes
Experiments and Validation
  Method Comparison
    Effectiveness over existing graph augmentation methods
  Scalability Across Datasets
    Demonstrating high scalability in diverse applications
Conclusion
  Summary of Contributions
  Future Directions
    Potential for further advancements in graph learning
    Expanding applications in healthcare and beyond