Token-based Decision Criteria Are Suboptimal in In-context Learning

Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue · June 24, 2024

Summary

This paper investigates the limitations of token-based classification in In-Context Learning (ICL) for language models, particularly its biases and under-calibration. The authors propose Hidden Calibration, which replaces token probabilities with nearest-centroid classification on hidden states, improving performance by about 20% across 10 datasets. Hidden Calibration leverages the linear separability of language models' hidden representations and outperforms token-based methods at minimal computational cost. The study shows that hidden-state-based approaches yield classification criteria with lower inter-category overlap, and that the method benefits from demonstrations, which enhance linear separability and transfer across tasks sharing a label space. The paper also suggests future research on automatic label token selection and on combining Hidden Calibration with other calibration techniques.


Paper digest

What problem does the paper attempt to solve? Is this a new problem?

The paper addresses the use of token-based decision criteria in in-context learning (ICL) and proposes Hidden Calibration to overcome the limitations of existing calibration practices. The problem is not entirely new: previous works use manually selected label tokens as the projection subspace for classification criteria, a choice the authors argue is under-guaranteed. The paper introduces Hidden Calibration as a new state-of-the-art method that removes unreliable human intuition from ICL prediction decoding and improves the classification criteria by computing centroids on hidden states instead of relying on human-selected token probabilities.


What scientific hypothesis does this paper seek to validate?

This paper seeks to validate the hypothesis that using token probability as the classification criterion in In-context Learning (ICL) is suboptimal. It proposes Hidden Calibration, which discards token-based classification criteria and derives better criteria from the hidden state. The experiments show that Hidden Calibration improves performance by reducing inter-category overlap, and that demonstrations promote hidden-state convergence, which the authors identify as the principle behind the improvement.


What new ideas, methods, or models does the paper propose? What are the characteristics and advantages compared to previous methods?

The paper proposes a novel approach called Hidden Calibration to address a common drawback of in-context learning: using token probability as the classification criterion. The method discards token-based criteria and instead decodes the hidden state directly, using a calibration dataset to compute per-category centroids; classification then reduces to a single linear decision boundary with lower inter-category overlap. The paper suggests that examining hidden states rather than token probabilities can improve both the understanding and the effectiveness of ICL calibration. Compared to previous methods, Hidden Calibration offers several key characteristics and advantages.

  1. Hidden State Focus: Hidden Calibration shifts the decision criterion from token probability to the hidden state, producing classification criteria with lower inter-category overlap and a single linear classification boundary.

  2. Reduced Overlaps: The criteria found by Hidden Calibration show significantly lower inter-category overlap than those of token-based methods, indicating better classification boundaries and higher performance potential.

  3. Efficiency and Usability: Hidden Calibration avoids the overhead associated with other token-probability-based methods, making it practical to deploy in in-context learning scenarios.

  4. State-of-the-Art Performance: Experimental comparisons show that Hidden Calibration sets a new state of the art in in-context learning, outperforming token-based methods; its reduction of inter-category overlap drives this superior classification performance.

  5. Linear Separability: Hidden Calibration exploits the linear separability of the hidden states produced by language models, which demonstrations further enhance, so simple linear boundaries suffice to classify in-context learning examples.

  6. Calibration Dataset Usage: Hidden Calibration uses a calibration dataset to compute per-category centroids for decoding the hidden state; even one example per category has been shown to improve in-context learning performance.

In summary, Hidden Calibration focuses on the hidden state, reduces overlap, improves classification criteria, and enhances the efficiency and performance of in-context learning compared with traditional token-based methods. A minimal code sketch of its decision rule follows.
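To make the decision rule concrete, here is a minimal sketch of nearest-centroid classification over last-token hidden states. It assumes the hidden states have already been extracted from the language model; the function names, array shapes, and the use of Euclidean distance are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def fit_centroids(hidden_states: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average the calibration hidden states of each category into one centroid.

    hidden_states: (n_examples, hidden_dim) last-token hidden states.
    labels:        (n_examples,) integer category labels in 0..C-1.
    """
    categories = np.unique(labels)
    return np.stack([hidden_states[labels == c].mean(axis=0) for c in categories])

def nearest_centroid_predict(hidden_state: np.ndarray, centroids: np.ndarray) -> int:
    """Assign the category whose centroid lies closest to the query hidden state."""
    distances = np.linalg.norm(centroids - hidden_state, axis=1)
    return int(np.argmin(distances))

# Toy usage with random stand-ins for real hidden states
# (hidden_dim=2560 matches OPT-2.7B, one model the digest mentions).
rng = np.random.default_rng(0)
calib_h = rng.normal(size=(16, 2560))   # hidden states of 16 calibration prompts
calib_y = rng.integers(0, 2, size=16)   # binary labels for those prompts
centroids = fit_centroids(calib_h, calib_y)
print(nearest_centroid_predict(rng.normal(size=2560), centroids))
```

The boundary between any two centroids is the hyperplane equidistant from them, which is the single linear classification boundary the digest refers to.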


Does any related research exist? Who are the noteworthy researchers in this field? What is the key to the solution mentioned in the paper?

Several related research papers and notable researchers in the field of in-context learning have been identified:

  • Noteworthy researchers in this field (drawn largely from the paper's references) include Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman, Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, Quoc V Le, Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang, Kurt Hornik, Maxwell Stinchcombe, Halbert White, Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi, Srinivasan Iyer, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Tianlu Wang, Qing Liu, Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu, P. Malo, A. Sinha, P. Korhonen, J. Wallenius, P. Takala, Sewon Min, Mike Lewis, Luke Zettlemoyer, Xinxi Lyu, Ari Holtzman, Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Punit Singh Koura, Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh, Xiang Zhang, Junbo Jake Zhao, Yann LeCun, Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue, Han Zhou, Xingchen Wan, Lev Proleev, Diana Mincu, Jilin Chen, Katherine Heller, Subhrajit Roy, Momin Abbas, Yi Zhou, Parikshit Ram, Nathalie Baracaldo, Horst Samulowitz, Theodoros Salonidis, Tianyi Chen, Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, and Manuela Sanguinetti.
  • The key solution mentioned in the paper is Hidden Calibration, a new state-of-the-art approach to the limitations of previous calibration practices in in-context learning. Hidden Calibration removes unreliable human intuition from prediction decoding by using a nearest-centroid classifier on hidden states instead of human-selected token probabilities. This yields classification criteria with less inter-category overlap, leveraging the linearly separable clusters that language models produce with the assistance of demonstrations.

How were the experiments in the paper designed?

The experiments are framed against two main categories of methods for enhancing in-context learning. The first category comprises model parameter update-based methods, which aim to bridge the gap between the in-context learning (ICL) objective and the causal language modeling objective. These methods typically rely on supervised fine-tuning, self-supervised training, or non-gradient methods, and require significant computational resources and data overhead to update large language model parameters.

The second category comprises classification criteria-based approaches, specifically calibration methods. These adjust the output label probabilities without modifying the main feed-forward computation or its parameters, with the goal of eliminating prior bias and unfaithful confidence in ICL.

The experimental design used 10 datasets, each randomly split into a calibration set and a test set. The calibration sets were used to build prompt examples and to compute centroids, and the test sets to measure performance, giving a comprehensive evaluation of the proposed method. The experiments also included visualizations, such as curves plotted against the number of demonstrations, to observe in-context learning dynamics and the linear separability of hidden states.
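As an illustration of that setup, the sketch below splits a labeled dataset into calibration and test portions and assembles a k-shot prompt. The helper names and the prompt template are assumptions made for illustration; the paper's actual prompt formats are not reproduced here.

```python
import random

def split_dataset(examples, calibration_size, seed=0):
    """Randomly split labeled examples into a calibration set and a test set."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:calibration_size], shuffled[calibration_size:]

def build_prompt(demonstrations, query, template="{text} -> {label}\n"):
    """Concatenate k demonstrations and the query into a single ICL prompt."""
    prompt = "".join(template.format(**d) for d in demonstrations)
    return prompt + query["text"] + " ->"

# Toy usage: a 2-shot prompt built from a tiny synthetic dataset.
data = [{"text": f"example {i}", "label": i % 2} for i in range(10)]
calibration, test = split_dataset(data, calibration_size=4)
print(build_prompt(calibration[:2], test[0]))
```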

Overall, the experiments were designed to compare the effectiveness of Hidden Calibration against token-based methods, analyze the impact of different calibration criteria, and provide insight into the principles underlying Hidden Calibration's performance improvements in in-context learning.


What is the dataset used for quantitative evaluation? Is the code open source?

The digest names OPT-2.7B here, but OPT-2.7B is a language model used in the experiments, not a dataset; the quantitative evaluation uses the 10 classification datasets described above, each split into calibration and test sets. As for the code, open-source status cannot be inferred from the paper being an arXiv preprint: arXiv availability makes the paper itself openly accessible, but says nothing about whether the code has been released.


Do the experiments and results in the paper provide good support for the scientific hypotheses that need to be verified? Please analyze.

The experiments and results provide substantial support for the hypotheses under test. The study examines the effectiveness of token-based decision criteria in in-context learning (ICL) and the calibration methods used to adjust predicted logits toward more accurate predictions. It highlights the limitations of relying on human intuition for label token selection and motivates automatic selection of optimal label tokens in prompts. The paper also explores combining Hidden Calibration with other probability calibrations to further improve performance, indicating avenues for future research.

Moreover, the study investigates the transferability of centroids among datasets that share a label space, revealing significant differences in hidden state distributions even then. This casts doubt on the reliability of decoding classification criteria with fixed token un-embedding vectors and argues for more robust, adaptable methods in ICL. Experiments across values of k and across datasets provide insight into the convergence of hidden states and the aggregation trends within and between categories, shedding light on the underlying principles of ICL and traditional calibrations.
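A transferability check of this kind can be run in a few lines, assuming precomputed hidden states for a source and a target dataset that share a label space; this is an illustrative sketch, not the paper's code.

```python
import numpy as np

def transfer_accuracy(src_centroids: np.ndarray,
                      tgt_hidden: np.ndarray,
                      tgt_labels: np.ndarray) -> float:
    """Accuracy on a target dataset when decoding with centroids fit on a source dataset."""
    # Pairwise distances between target hidden states and source centroids:
    # shape (n_target, n_categories).
    d = np.linalg.norm(tgt_hidden[:, None, :] - src_centroids[None, :, :], axis=-1)
    return float((d.argmin(axis=1) == tgt_labels).mean())

# Toy usage with random stand-ins; real use would load hidden states of two datasets.
rng = np.random.default_rng(1)
src_centroids = rng.normal(size=(2, 2560))
tgt_hidden = rng.normal(size=(100, 2560))
tgt_labels = rng.integers(0, 2, size=100)
print(transfer_accuracy(src_centroids, tgt_hidden, tgt_labels))
```

Chance-level transfer accuracy despite a shared label space would corroborate the digest's point that hidden state distributions differ substantially across datasets.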

Furthermore, the comparison to previous work such as probe methods showcases the advantages of Hidden Calibration in user-friendliness, efficiency, and classification performance without gradient-based training. The detailed analysis of the overlap calculation, the in-context learning dynamics, and the efficiency of Hidden Calibration under varying calibration set sizes contributes to a comprehensive understanding of the implications of token-based decision criteria in ICL.
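The overlap analysis can likewise be made concrete. One simple estimator is the overlap coefficient of two categories' empirical score distributions, sketched below; this is an assumption about the flavor of metric involved, not necessarily the paper's exact definition.

```python
import numpy as np

def overlap_coefficient(scores_a: np.ndarray, scores_b: np.ndarray, bins: int = 50) -> float:
    """Overlap of two empirical score distributions: 0 = fully separable, 1 = identical."""
    lo = min(scores_a.min(), scores_b.min())
    hi = max(scores_a.max(), scores_b.max())
    p, _ = np.histogram(scores_a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(scores_b, bins=bins, range=(lo, hi), density=True)
    bin_width = (hi - lo) / bins
    return float(np.minimum(p, q).sum() * bin_width)

# Toy usage: well-separated score distributions give a low overlap value.
rng = np.random.default_rng(2)
print(overlap_coefficient(rng.normal(-1, 1, 1000), rng.normal(2, 1, 1000)))
```

A decision criterion whose per-category scores overlap less leaves more room for a boundary that separates the categories, which is why lower overlap predicts higher attainable accuracy.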


What are the contributions of this paper?

The contributions of the paper "Token-based Decision Criteria Are Suboptimal in In-context Learning" include:

  • Exploring Two Categories of Methods: The paper surveys two main categories of methods for enhancing in-context learning: model parameter update-based methods and classification criteria-based methods.
  • Model Parameter Update-based Methods: It discusses supervised fine-tuning, self-supervised training, and non-gradient methods that bridge the gap between the in-context learning objective and the causal language modeling objective.
  • Classification Criteria-based Methods: It highlights calibration methods that adjust output category logits to eliminate bias and restore faithful confidence in in-context learning, without modifying the main feed-forward computation.
  • Experimental Details: It documents the datasets used and the calibration and test sets created for evaluation, outlining how each dataset is split into calibration and test portions for the in-context learning experiments.
  • Comparative Analysis: It compares the different methods and their impact on in-context learning, shedding light on the effectiveness of parameter update-based and classification criteria-based approaches for improving language model performance in ICL scenarios.

What work can be continued in depth?

Further research in the field of in-context learning can be expanded in several directions based on the existing work:

  • Automatic Label Token Selection: Exploring methods to automatically select optimal label tokens in prompts could further enhance in-context learning performance.
  • Combining Probability Calibrations: Investigating combinations of other probability calibrations with Hidden Calibration is a promising avenue for improved performance.
  • Theoretical and Experimental Analysis: Deeper analysis of how to enhance intra-category aggregation, and of why such aggregation does or does not occur, would advance the principles of in-context learning.
  • Transferability Studies: Further exploring the transferability of centroids across datasets and across values of k could illuminate the robustness and generalizability of in-context learning models.
  • Comparison to Previous Works: Continuing to compare the proposed method with existing probe methods, highlighting advantages and differences, would sharpen understanding of the effectiveness and efficiency of different approaches.

Outline

Introduction
Background
[1] Token-based classification in ICL limitations
[2] Biases and under-calibration issues
Objective
[3] To propose Hidden Calibration as a solution
[4] Aim for 20% performance improvement across 10 datasets
Method
Data Collection
[5] Selection of 10 diverse datasets for evaluation
Data Preprocessing
[6] Analysis of token probabilities and hidden states
[7] Identifying linear separability in hidden representations
Hidden Calibration Approach
[8] Replacing token probabilities with nearest centroid classification
[9] Minimal computational cost
[10] Leveraging linear separability
Performance Evaluation
[11] Comparison with token-based methods
[12] Better classification criteria and lower inter-category overlap
Transferability and Demonstrations
[13] Enhanced linear separability with demonstrations
[14] Transferability across tasks with same label space
Future Research Directions
[15] Automatic token selection
[16] Combining Hidden Calibration with other calibration techniques
Results and Discussion
[17] Improved performance statistics
[18] Case studies and real-world implications
Conclusion
[19] Summary of findings and contributions
[20] Implications for the field of language model calibration and ICL
References
[21] Cited works and literature review
