Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma · November 06, 2024
Summary
PolyCom is a novel activation function for Transformers that composes polynomials with standard activations, allowing models to capture more complex data relationships than traditional functions such as ReLU. The paper introduces two variants, PolyReLU and PolyNorm, and shows how to integrate them into the Transformer architecture to increase model capacity. Theoretically, PolyReLU networks are at least as expressive as both ReLU and purely polynomial networks, achieving stronger approximation ability with fewer parameters. Empirically, experiments on large language models show that PolyCom variants improve accuracy and accelerate convergence, outperforming other activation functions.
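The summary does not spell out the functional form of the two variants. As a rough illustration only, assuming PolyReLU is a learnable weighted sum of powers of ReLU(x) and PolyNorm is a variant that normalizes each power before summing, a minimal PyTorch sketch might look like the following; the polynomial order, coefficient initialization, and normalization details are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn


class PolyReLU(nn.Module):
    """Sketch: learnable weighted sum of powers of ReLU(x) up to `order`."""

    def __init__(self, order: int = 3):
        super().__init__()
        # Learnable coefficients a_0 ... a_r (uniform init is illustrative).
        self.coeffs = nn.Parameter(torch.full((order + 1,), 1.0 / (order + 1)))
        self.order = order

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = torch.relu(x)
        out = self.coeffs[0] * torch.ones_like(x)  # constant term a_0
        for i in range(1, self.order + 1):
            out = out + self.coeffs[i] * r.pow(i)  # a_i * ReLU(x)^i
        return out


class PolyNorm(nn.Module):
    """Sketch: each power x^i is RMS-normalized before the weighted sum
    (the exact normalization used in the paper may differ)."""

    def __init__(self, order: int = 3, eps: float = 1e-6):
        super().__init__()
        self.coeffs = nn.Parameter(torch.full((order,), 1.0 / order))
        self.bias = nn.Parameter(torch.zeros(1))
        self.order = order
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bias * torch.ones_like(x)
        for i in range(1, self.order + 1):
            p = x.pow(i)
            # Normalize the i-th power over the feature dimension.
            p = p / (p.pow(2).mean(dim=-1, keepdim=True).sqrt() + self.eps)
            out = out + self.coeffs[i - 1] * p
        return out


if __name__ == "__main__":
    # Drop-in usage in place of ReLU/GELU inside a Transformer MLP block.
    x = torch.randn(2, 8)
    print(PolyReLU(order=3)(x).shape, PolyNorm(order=3)(x).shape)
```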