Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding

Yasha Ektefaie, Olivia Viessmann, Siddharth Narayanan, Drew Dresser, J. Mark Kim, Armen Mkrtchyan · October 22, 2024

Summary

The paper introduces RL-DIF, a protein inverse folding model that combines sequence-recovery pre-training with reinforcement learning fine-tuning that optimizes structural consistency. Evaluated on four benchmarks, RL-DIF matches prior models on sequence recovery and structural consistency while improving foldable diversity, reaching 29% on CATH 4.2 compared to 23% for models trained on the same dataset. This added diversity broadens the pool of candidate sequences available for downstream optimization in protein design.
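The two-stage recipe the summary describes, likelihood-based pre-training followed by policy-gradient fine-tuning against a structural-consistency reward, can be illustrated with a toy REINFORCE loop. This is a minimal sketch, not the paper's method: the per-position categorical "policy" stands in for the structure-conditioned diffusion model, and the reward (fraction of positions matching a reference sequence) stands in for a refold-and-compare score such as TM-score.

```python
import math
import random

random.seed(0)

AA = 4            # toy amino-acid alphabet (real models use 20)
L = 5             # toy sequence length
TARGET = [random.randrange(AA) for _ in range(L)]  # stand-in "native" sequence

# Per-position logits; a real policy would condition on the input structure
# (this is where pre-training on sequence recovery would initialize weights).
logits = [[0.0] * AA for _ in range(L)]

def probs_at(i):
    """Softmax over the logits at position i."""
    m = max(logits[i])
    exps = [math.exp(x - m) for x in logits[i]]
    z = sum(exps)
    return [e / z for e in exps]

def sample_sequence():
    return [random.choices(range(AA), weights=probs_at(i))[0] for i in range(L)]

def reward(seq):
    # Stand-in for structural consistency (e.g. TM-score between the target
    # structure and the refolded design): fraction of matching positions.
    return sum(s == t for s, t in zip(seq, TARGET)) / L

lr, baseline = 0.5, 0.0
for step in range(2000):
    seq = sample_sequence()
    r = reward(seq)
    advantage = r - baseline               # variance-reduced REINFORCE signal
    baseline = 0.9 * baseline + 0.1 * r    # moving-average reward baseline
    for i, a in enumerate(seq):
        p = probs_at(i)
        for k in range(AA):
            grad = (1.0 if k == a else 0.0) - p[k]  # d log pi / d logit
            logits[i][k] += lr * advantage * grad

# Greedy decode after fine-tuning: the policy should now favor sequences
# that score well under the (toy) structural-consistency reward.
greedy = [max(range(AA), key=lambda k: logits[i][k]) for i in range(L)]
```

The moving-average baseline is the standard variance-reduction trick for REINFORCE; without it, the all-positive reward reinforces every sampled sequence and convergence is much slower.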
