Probing the Emergence of Cross-lingual Alignment during LLM Training

Hetong Wang, Pasquale Minervini, Edoardo M. Ponti·June 19, 2024

Summary

This study investigates the emergence of cross-lingual alignment in Large Language Models (LLMs), specifically focusing on BLOOM, a multilingual autoregressive model. The research uses cross-lingual neuron overlap as a measure of implicit alignment, which is positively correlated with zero-shot cross-lingual transfer performance. It finds that alignment generally improves during pre-training, enabling effective transfer even without parallel data, but its growth is non-monotonic, with marked drops in the smaller models. The study examines neuron overlap across model scales (BLOOM-560m, BLOOM-1b1, and BLOOM-1b7) and across layers, probing linguistic features such as number, gender, and part-of-speech. The results suggest that beyond a certain scale larger models achieve better alignment, and the correlation between alignment and performance highlights the importance of shared neurons for cross-lingual capabilities. The study contributes to understanding the dynamics of multilingual pre-training and the role of implicit alignment in improving cross-lingual transfer.

Paper digest

What problem does the paper attempt to solve? Is this a new problem?

The paper aims to investigate the emergence of cross-lingual alignment during Large Language Model (LLM) training, specifically focusing on probing the shared neurons and their impact on cross-lingual transfer ability. This problem is not entirely new, but the paper contributes to understanding how multilingual LLMs implicitly align information across languages without the need for parallel data, shedding light on the mechanisms behind cross-lingual generalization.


What scientific hypothesis does this paper seek to validate?

This paper aims to validate the hypothesis that shared neurons in multilingual large language models (LLMs) are closely linked to the zero-shot cross-lingual transfer ability of these models. The study explores how the same sub-networks activated during inference and fine-tuning contribute to the cross-lingual generalization ability of LLMs. Additionally, the research investigates the correlation between neuron overlap and downstream task performance on syntactic and semantic tasks across different model scales.


What new ideas, methods, or models does the paper propose? What are the characteristics and advantages compared to previous methods?

The paper "Probing the Emergence of Cross-lingual Alignment during LLM Training" introduces several novel ideas, methods, and models in the field of natural language processing. Here are some key points from the paper:

  1. Intrinsic Probing Method: The paper proposes an intrinsic probing method to analyze the information encoded in the hidden representations of large language models (LLMs). This method involves training individual probes for specific morphosyntactic features in different languages to identify the neurons that carry the most relevant information (a simplified sketch of this idea follows this list).

  2. Dataset Collection: The authors collected a dataset (D) from Universal Dependencies treebanks in 13 languages, where Universal Dependencies labels were mapped to the UniMorph Schema for a unified label scheme across languages. The contextual representations of words were computed using BLOOM at selected layers, and the embedding-label pairs were grouped by linguistic features and split into train, validation, and test sets.

  3. Model Evaluation: The paper evaluates the zero-shot cross-lingual transfer ability of the checkpoint models on part-of-speech tagging and natural language inference tasks in multiple languages. The study finds a strong correlation between neuron overlap rates and downstream performance across different model scales.

  4. Latent Variable Model: The authors utilize a latent variable model for intrinsic probing to identify the subset of dimensions within a representation that encode information for specific linguistic features. This model helps in understanding how different languages activate subnetworks within LLMs.

  5. Cross-lingual Alignment: The paper explores the emergence of cross-lingual alignment during LLM training and investigates the relation between implicit alignment and downstream performance. The study reports unexpected findings, such as non-monotonic growth in neuron overlap rates and severe drops during pre-training in smaller model scales.
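
As a rough illustration of the probing idea described in points 1 and 4, the sketch below replaces the paper's latent-variable probe with a much simpler greedy search that adds one hidden dimension at a time to a logistic-regression probe and keeps the dimensions that most improve validation accuracy for one feature in one language. The function name, the classifier choice, and the budget k are assumptions made for this sketch, not the authors' implementation.

```python
# Rough sketch of intrinsic probing by greedy dimension selection.
# The paper uses a latent-variable probe; this simplified stand-in greedily
# adds hidden dimensions to a logistic-regression probe and keeps those that
# most improve validation accuracy for one feature in one language.
from sklearn.linear_model import LogisticRegression

def select_informative_neurons(X_train, y_train, X_val, y_val, k=50):
    """Return up to k hidden-dimension indices most predictive of a
    morphosyntactic label (e.g. Number or Gender); X_* are arrays of
    contextual embeddings, y_* the corresponding feature labels."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    best_so_far = 0.0
    for _ in range(k):
        best_dim, best_acc = None, best_so_far
        for d in remaining:
            dims = selected + [d]
            clf = LogisticRegression(max_iter=200).fit(X_train[:, dims], y_train)
            acc = clf.score(X_val[:, dims], y_val)
            if acc > best_acc:
                best_dim, best_acc = d, acc
        if best_dim is None:  # no remaining dimension improves the probe
            break
        selected.append(best_dim)
        remaining.remove(best_dim)
        best_so_far = best_acc
    return selected
```

The dimension indices returned for two languages can then be intersected to measure neuron overlap, as sketched later in this digest.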

Overall, the paper contributes to the understanding of cross-lingual alignment, model interpretability, and the probing of large language models to uncover the linguistic features encoded in their representations. Compared to previous methods, its main advantages are that it probes alignment intrinsically in the hidden representations rather than requiring parallel data, that it tracks alignment across intermediate pre-training checkpoints and three model scales rather than only final models, and that it relates this implicit alignment directly to zero-shot cross-lingual transfer performance on downstream tasks.


Does any related research exist? Who are the noteworthy researchers in this field? What is the key to the solution mentioned in the paper?

Several related research studies exist in the field of multilingual large language models (LLMs) and cross-lingual alignment. Noteworthy researchers in this field include Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith, Ilya Loshchilov, Frank Hutter, Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky, Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah, Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers, Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald, Telmo Pires, Eva Schlinger, Dan Garrette, Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova, Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, and many others.

The key to the solution described in the paper involves leveraging intrinsic probing techniques to identify the subsets of neurons that encode linguistic features. By correlating the degree of cross-lingual neuron overlap with zero-shot cross-lingual transfer performance, the study sheds light on the conditions leading to effective cross-lingual transfer in multilingual LLMs. The research observes a high correlation between neuron overlap and downstream performance, supporting the hypothesis that shared neurons underpin effective cross-lingual transfer. Additionally, the study detects phases during pre-training where both implicit alignment and multilingual abilities degrade, providing new insights into multilingual pre-training dynamics.
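
As a purely hypothetical illustration of this analysis, the sketch below computes neuron overlap between two languages as intersection-over-union (the paper's exact overlap definition may differ) and correlates it with zero-shot transfer scores across checkpoints; all numbers are placeholders, not results from the paper.

```python
# Rough sketch: cross-lingual alignment as the overlap between the neuron
# sets selected for the same feature in two languages, correlated with
# zero-shot transfer scores across pre-training checkpoints.
# The overlap definition (intersection-over-union) is an assumption, and all
# numbers below are made-up placeholders, not results from the paper.
from scipy.stats import spearmanr

def neuron_overlap(neurons_lang_a, neurons_lang_b):
    """Intersection-over-union of two sets of selected neuron indices."""
    a, b = set(neurons_lang_a), set(neurons_lang_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# hypothetical per-checkpoint measurements for one language pair and layer
overlap_per_checkpoint = [0.12, 0.18, 0.15, 0.22, 0.27]
transfer_accuracy = [0.41, 0.45, 0.43, 0.49, 0.52]  # e.g. zero-shot XNLI

rho, p_value = spearmanr(overlap_per_checkpoint, transfer_accuracy)
print(f"Spearman rho between overlap and transfer: {rho:.2f} (p = {p_value:.3f})")
```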


How were the experiments in the paper designed?

The experiments in the paper were designed to study the cross-lingual ability of BLOOM, an autoregressive multilingual LM trained on data from 46 natural languages and 13 programming languages. The experiments considered three model sizes, 560m, 1b1, and 1b7, each with its own set of valid intermediate model checkpoints. These models were trained on an equivalent amount of tokens from the ROOTS corpus and shared the same tokenizer, allowing for a consistent study of their training trajectories across scales. The study focused on two main metrics: neuron overlap between languages and zero-shot cross-lingual transfer performance on tasks like XNLI and POS tagging, used to assess multilingual semantic and syntactic knowledge.
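
For reference, extracting layer-wise hidden states from a released BLOOM model for probing could look like the sketch below; the final 560m, 1b1, and 1b7 models are public on the Hugging Face Hub, while the revision tag for an intermediate pre-training checkpoint is left as a commented-out placeholder, since the exact checkpoint naming should be verified on the Hub.

```python
# Rough sketch: extract layer-wise hidden states from BLOOM for probing.
# bigscience/bloom-560m, bloom-1b1 and bloom-1b7 are the released final models;
# the commented-out `revision` is a placeholder for an intermediate
# pre-training checkpoint -- check the Hugging Face Hub for the actual tags.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(
    model_name,
    # revision="global_step_XXXX",  # hypothetical intermediate-checkpoint tag
    output_hidden_states=True,
)
model.eval()

inputs = tokenizer("The cats are sleeping.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq_len, hidden_dim]; the paper probes selected layers
# (layers 13 and 17 are reported to show the highest overlap).
layer_13 = outputs.hidden_states[13]
print(layer_13.shape)
```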


What is the dataset used for quantitative evaluation? Is the code open source?

The dataset used for quantitative evaluation is the dataset D, collected from annotated sentences in Universal Dependencies (UD) treebanks v2.1 for 13 languages. The code used in the study is open source and can be accessed through the Hugging Face repository.
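
As an illustration of how such a dataset might be assembled, the sketch below extracts word/label pairs for a single feature from a CoNLL-U file using the conllu library; the paper maps full UD feature bundles to the UniMorph schema, whereas the tiny Number mapping here is a hand-written stand-in for that conversion.

```python
# Rough sketch: collect (word, feature-label) pairs from a UD treebank with
# the `conllu` library. The paper maps full UD feature bundles to the UniMorph
# schema; the tiny Number mapping below is a hand-written illustration only.
from conllu import parse_incr

UD_TO_UNIMORPH_NUMBER = {"Sing": "SG", "Plur": "PL"}  # illustrative subset

def collect_number_examples(conllu_path):
    examples = []
    with open(conllu_path, encoding="utf-8") as f:
        for sentence in parse_incr(f):
            for token in sentence:
                feats = token.get("feats") or {}
                label = UD_TO_UNIMORPH_NUMBER.get(feats.get("Number"))
                if label is not None:
                    examples.append((token["form"], label))
    return examples

# These word/label pairs would then be paired with BLOOM hidden states at the
# selected layers and split into train / validation / test sets per language.
```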


Do the experiments and results in the paper provide good support for the scientific hypotheses that need to be verified? Please analyze.

The experiments and results presented in the paper provide strong support for the scientific hypotheses that need to be verified. The study extensively examines the emergence of cross-lingual alignment during Large Language Model (LLM) training by probing intrinsic linguistic features across different layers and languages. The research delves into identifying subnetworks activated by specific linguistic features within LLMs, demonstrating a thorough analysis of the alignment process. Additionally, the study evaluates the correlation between neuron overlap and downstream task performance, highlighting a significant relationship across various model scales.

Moreover, the paper investigates the zero-shot cross-lingual transfer ability of the checkpoint models on part-of-speech tagging and natural language inference tasks, providing valuable insights into the generalization capabilities of multilingual LLMs. The findings reveal unexpected trends in alignment dynamics during pre-training, including notable drops at different stages, particularly in smaller model scales, challenging conventional assumptions. This nuanced analysis enhances the understanding of how multilingual LLMs implicitly align information across languages without the need for parallel data.

Furthermore, the study meticulously selects specific layers and linguistic features for analysis, focusing on informative features like Number and Gender at layers 13 and 17, which exhibit the highest overlap rates, contributing to a comprehensive evaluation of cross-lingual alignment. By comparing the alignment trends across different features and layers, the research provides a detailed exploration of the alignment process in LLMs, shedding light on the underlying mechanisms. Overall, the experiments and results in the paper offer robust empirical evidence to validate the scientific hypotheses related to cross-lingual alignment in LLM training, contributing significantly to the field of computational linguistics and language modeling research.


What are the contributions of this paper?

The paper "Probing the Emergence of Cross-lingual Alignment during LLM Training" makes several contributions:

  • It uses intrinsic probing of morphosyntactic features (e.g., number and gender) to identify the subsets of neurons that encode them in BLOOM's hidden representations across layers and languages.
  • It tracks cross-lingual neuron overlap throughout pre-training for three model scales (BLOOM-560m, 1b1, and 1b7), revealing non-monotonic alignment dynamics with marked drops at the smaller scales.
  • It shows a strong positive correlation between neuron overlap and zero-shot cross-lingual transfer performance on XNLI and POS tagging, supporting the hypothesis that shared neurons underpin cross-lingual transfer.
  • It detects phases during pre-training where both implicit alignment and multilingual abilities degrade, offering new insights into multilingual pre-training dynamics.

What work can be continued in depth?

To delve deeper into the research, further investigation can be conducted on the intrinsic probing data collected from Universal Dependencies treebanks across 13 languages. This pipeline involves mapping UD labels to the UniMorph Schema, computing contextual representations of words using BLOOM at selected layers, and training individual probes to identify neurons encoding morphosyntactic features in specific languages. Additionally, exploring the language-wise neuron overlap rate and zero-shot cross-lingual performance on downstream tasks could provide valuable insights for continued analysis.
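
One concrete starting point, shown below as a rough sketch with toy language codes and neuron indices, is to compute a language-by-language overlap matrix for a given feature and layer, reusing the intersection-over-union overlap from the earlier sketch.

```python
# Rough sketch of a language-by-language neuron-overlap matrix for one
# feature and layer. The language codes and neuron indices below are toy
# placeholders; `neuron_overlap` reuses the intersection-over-union idea
# from the earlier sketch.
import numpy as np

def neuron_overlap(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def overlap_matrix(selected_neurons):
    langs = sorted(selected_neurons)
    m = np.zeros((len(langs), len(langs)))
    for i, la in enumerate(langs):
        for j, lb in enumerate(langs):
            m[i, j] = neuron_overlap(selected_neurons[la], selected_neurons[lb])
    return langs, m

langs, m = overlap_matrix({"en": [1, 5, 9], "fr": [1, 5, 20], "zh": [3, 9, 40]})
print(langs)
print(np.round(m, 2))
```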


Outline

Introduction
Background
Emergence of cross-lingual capabilities in LLMs
BLOOM: a multilingual autoregressive model
Objective
To study neuron overlap as a measure of alignment
Investigate zero-shot cross-lingual transfer performance
Identify trends in model scaling and linguistic feature transfer
Method
Data Collection
Neuron activation data from Bloom-560m, Bloom-1b1, and Bloom-1b7
Zero-shot cross-lingual transfer tasks
Data Preprocessing
Extraction of neuron overlap statistics
Analysis of linguistic feature representations
Model Analysis
Model Scaling:
Bloom-560m vs. Bloom-1b1 vs. Bloom-1b7
Scaling law for alignment improvement
Layer-wise Analysis:
Different layers' contribution to alignment
Linguistic Features:
Number, gender, and part-of-speech tagging
Correlation with alignment and performance
Results
Non-monotonic behavior of alignment during pre-training
Drops in alignment in smaller models
Scaling law for optimal alignment in larger models
Strong correlation between alignment and cross-lingual transfer
Discussion
Implications for multilingual pre-training dynamics
The role of shared neurons in cross-lingual capabilities
Future directions for improving cross-lingual transfer
Conclusion
Summary of key findings
Contributions to the understanding of LLMs and cross-lingual transfer
Limitations and suggestions for future research