FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael P. Brenner, Peter Norgaard · April 08, 2025
Summary
FEABench evaluates how well large language models (LLMs) solve physics, mathematics, and engineering problems using finite element analysis (FEA). It features a scheme in which a language model interacts with COMSOL Multiphysics through natural language and API calls to automate tasks and improve the precision of its solutions. The paper also discusses related work on mathematical problem solving, code generation, scientific machine learning, and mechanics. Cited tools and datasets include ToolLLM, which gives models access to more than 16,000 real-world APIs, and a multimodal question-answering dataset built from scientific papers. Cited work from 2024 spans multimodal understanding, question-answering benchmarks, medical applications, and language agents, including studies on executable code actions, automated software engineering, theorem proving, and the multimodal medical capabilities of Gemini. In one FEABench example, the model computes the tangential stress along an edge, applies symmetry and load conditions, and exports the results. Code written against COMSOL's Java API computes a target quantity and saves it to a specified path. A 'main.py' script then evaluates the partial differential equation system and checks whether the resulting target value, 39, is reasonable.
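The final step, reading an exported target value and judging whether it is reasonable, can be sketched in Python. This is a minimal, hypothetical illustration and not FEABench's actual 'main.py'. The one-number text export format, the function names, the 10% relative tolerance, and the computed value 39.4 are all assumptions. Only the reference value 39 comes from the summary above.

```python
import math
import os
import tempfile


def read_target_value(path: str) -> float:
    """Read one exported number from a text file.

    Hypothetical format: the value is the last whitespace-separated token.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    return float(text.split()[-1])


def relative_error(computed: float, reference: float) -> float:
    """Relative error; falls back to absolute error when the reference is 0."""
    if reference == 0:
        return abs(computed)
    return abs(computed - reference) / abs(reference)


def is_reasonable(computed: float, reference: float, rel_tol: float = 0.1) -> bool:
    """True if computed is finite and within rel_tol (assumed 10%) of reference."""
    if not math.isfinite(computed):
        return False
    return relative_error(computed, reference) <= rel_tol


if __name__ == "__main__":
    # Write a stand-in export file so the sketch runs on its own.
    # The value 39.4 is illustrative, not a result from the paper.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "target.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("% Target quantity\n39.4\n")
        value = read_target_value(path)
        print(value, is_reasonable(value, reference=39.0))  # prints: 39.4 True
```

A relative tolerance keeps the check independent of the quantity's units and scale, while the absolute-error fallback avoids dividing by zero when the reference value is 0. Rejecting NaN and infinite values catches simulations that failed silently.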