Rating Multi-Modal Time-Series Forecasting Models (MM-TSFM) for Robustness Through a Causal Lens
Summary
Paper digest
What problem does the paper attempt to solve? Is this a new problem?
The paper addresses the robustness of Artificial Intelligence (AI) systems, specifically time-series forecasting models, by evaluating them through a causal lens. This problem is not entirely new: the fragility and lack of robustness of AI systems have been recognized previously, hindering user trust and widespread adoption. The paper assesses the robustness of AI models by considering factors beyond raw performance, such as opaqueness and alignment with human values, to enhance decision-making and user trust in the technology.
What scientific hypothesis does this paper seek to validate?
This paper aims to validate several scientific hypotheses related to time-series forecasting models through a causal lens. The hypotheses investigated include:
- Does a specific attribute affect the residual of a system even when it has no direct effect on another variable? This hypothesis explores whether a company influences the system's residual despite having no direct effect on the perturbation.
- Does an attribute influence the relationship between two variables when it affects one of them? This hypothesis examines how a company can affect the relationship between the perturbation and the system's residual when it influences the perturbation.
- Does a variable affect the residual of a system when it is itself influenced by another variable? This hypothesis investigates how a perturbation can affect the system's residual when the perturbation is influenced by a company.
- Does a variable degrade the performance of a system in the presence of another variable? This hypothesis explores how semantic and compositional perturbations can degrade system performance.
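As a minimal illustration (using synthetic data, not the paper's models or dataset), the confounding structure behind these hypotheses can be simulated: a company variable that drives both the perturbation and the residual biases a naive estimate of the perturbation's effect, while stratifying on the company recovers the direct effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: the company acts as a confounder that influences
# both the perturbation level and the system residual.
company = rng.integers(0, 2, n)                       # binary company indicator
perturbation = 0.5 * company + rng.normal(0, 1, n)
residual = 2.0 * perturbation + 1.0 * company + rng.normal(0, 1, n)

# Naive regression of residual on perturbation absorbs the company's
# backdoor influence, biasing the slope away from the true value 2.0.
naive = np.polyfit(perturbation, residual, 1)[0]

# Adjusting for the confounder (stratifying by company) recovers
# an estimate close to the true direct effect.
adjusted = np.mean([
    np.polyfit(perturbation[company == c], residual[company == c], 1)[0]
    for c in (0, 1)
])
print(naive, adjusted)
```

With this data-generating process, the naive slope overshoots 2.0 while the stratified estimate lands near it, which is the kind of confounding the paper's causal diagrams are built to expose.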
What new ideas, methods, or models does the paper propose? What are the characteristics and advantages compared to previous methods?
The paper "Rating Multi-Modal Time-Series Forecasting Models for Robustness Through a Causal Lens" proposes several innovative ideas, methods, and models in the field of time-series forecasting and AI systems evaluation. Here are some key points from the paper:
- Evaluation Metrics: The paper introduces new evaluation metrics, namely the Weighted Rejection Score (WRS), Absolute Percentage Error (APE), and Percentage Improvement in Error (PIE %), tailored for assessing the robustness of Multi-Modal Time-Series Forecasting Models (MM-TSFM).
- Causal Analysis: The study evaluates different MM-TSFM for robustness in a causally grounded setup, specifically assessing two variations of the ViT-num-spec model introduced in previous work.
- Adversarial Attacks in the Finance Domain: The paper discusses the vulnerability of ML models in the finance domain to adversarial attacks, highlighting the need for mitigation strategies.
- Transformer-Based Approaches: The research leverages transformer-based approaches, especially multi-modal transformer-based models, which have been shown to be more efficient and accurate than traditional uni-modal neural network architectures.
- Innovative Test Systems: The paper introduces different test systems, including a biased system (Sb) and a random system (Sr), to evaluate the performance and biases of MM-TSFM models.
- Human-Centric Decision Support: The study emphasizes hypothesis-driven decision support in AI-assisted decision-making, involving human decision-makers in the process to improve the interpretability and trustworthiness of AI systems.
Overall, the paper presents a comprehensive approach to evaluating MM-TSFM models: it introduces novel evaluation metrics, addresses adversarial attacks in the finance domain, and emphasizes human involvement in AI-assisted decision-making. Compared to previous methods in time-series forecasting and AI system evaluation, the paper offers several key characteristics and advantages:
- Evaluation Metrics: The proposed metrics (WRS, APE, and PIE %) provide a more comprehensive and nuanced evaluation of model performance and bias than traditional methods.
- Causal Analysis: Evaluating MM-TSFM in a causally grounded setup, specifically assessing the impact of perturbations on model performance across different settings, allows a deeper understanding of the direct influence of various attributes on forecasting accuracy and helps mitigate confounding effects.
- Resilience and Robustness: The research demonstrates that multi-modal forecasting, which combines numeric and visual data, is more robust than purely numeric forecasting. MM-TSFM models show lower confounding bias and lower Mean Absolute Scaled Error (MASE) under perturbations, highlighting their resilience across diverse experimental conditions.
- Practical Tool for Stakeholders: The paper provides a systematic rating system that stakeholders can use to select robust solutions tailored to their specific data and operational requirements, taking the model's performance under perturbations into account.
- Advantages of Causal Analysis: Using causal rather than purely statistical analysis makes it possible to determine accountability, align with humanistic values, and quantify the direct influence of attributes on forecasting accuracy. This approach isolates specific impacts, mitigates confounding effects, and yields a deeper understanding of model behavior.
In summary, the paper's innovative characteristics lie in its introduction of new evaluation metrics, emphasis on causal analysis for model robustness, demonstration of resilience in multi-modal forecasting, provision of a practical tool for stakeholders, and the advantages of using causal analysis in assessing AI systems for robustness and bias mitigation.
Does any related research exist? Who are the noteworthy researchers on this topic in this field? What is the key to the solution mentioned in the paper?
Several related research studies exist in the field of time-series forecasting models and causal analysis. Noteworthy researchers on this topic (surnames drawn from the paper's references) include Thriemer, Price, Ghimire, Liu, Tsang, Xie, Tian, Huang, Chen, Lau, Feng, Du, Chu, Longo, Goebel, Lecue, Kieseberg, Holzinger, Miller, Mishev, Gjorgjevikj, Vodenska, Chitkushev, Trajanov, Abdia, Kulasekera, Datta, Boakye, Kong, Asan, Bayrak, Choudhury, Bernagozzi, Srivastava, Rossi, Usmani, Bica, Alaa, Van Der Schaar, Bondielli, Passaro, Boukherouaa, Shabsigh, AlAjmi, Deodoro, Farias, Iskender, Mirestean, Ravikumar, Cheng, Yang, Xiang, Xu, Gretton, Yu, Ling, Dong, Lu, Zeng, Kaur, Siddagangappa, Balch, Veloso, Zhu, Yin, Lyu, Zhang, Luo, Moraffah, Sheth, Karami, Bhattacharya, Wang, Tahir, Raglin, Nehemya, Mathov, Shabtai, Elovici, Pialla, Ismail Fawaz, Devanne, Weber, Idoumghar, Muller, Bergmeir, Schmidt, Webb, Forestier, Radford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Robins, Greenland, Hu, Rosenbaum, Rubin, Jin, Ekambaram, Manglik, Mukherjee, Sajja, Dwivedi, Raykar, Daubechies, Fang, Cui, Tang, Gu, Li, Zhou, Fursov, Morozov, Kaploukhaya, Kovtun, Rivera-Castro, Gusev, Babaev, Kireev, Zaytsev, Burnaev, Gallagher, Pitropakis, Chrysoulas, Papadopoulos, Mylonas, Katsikas, Govindarajulu, Amballa, Kulkarni, Parmar, Lakkaraju, and Valtorta.
The key to the solution is a causally grounded setup for rating different Multi-Modal Time-Series Forecasting Models (MM-TSFM) for robustness. The authors evaluate two variations of the ViT-num-spec model introduced in their work, implementing a causal analysis approach to assess the models' performance and robustness. This involves a detailed causal analysis of the impact of perturbations on the outcome of time-series forecasting systems, accounting for confounders and backdoor paths in the causal relationships. The researchers also address the vulnerability of AI models in the finance domain to adversarial attacks, highlighting the importance of mitigation strategies in the financial sector.
How were the experiments in the paper designed?
The experiments were designed to evaluate various time-series forecasting systems for robustness through a causal lens. The experimental setup compared the following forecasting models:
- ViT-num-spec (Sv) models trained on daily stock time-series data from before and during the COVID-19 outbreak, with distinct training periods and methodologies.
- The ARIMA (Sa) model, a statistical approach combining Autoregressive (AR), differencing (I), and Moving Average (MA) components for time-series forecasting.
- A biased system (Sb), designed as an extreme baseline biased towards specific technology companies.
- A random system (Sr), generating random price predictions within a specific range around each company's stock prices.
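The two extreme baseline systems can be sketched as follows; the function names, the 10% boost factor, and the ±5% range are illustrative assumptions rather than the paper's exact implementation:

```python
import random

def biased_system(prices, favored, company, boost=1.10):
    """Extreme biased baseline: inflate the last observed price for favored companies."""
    last = prices[-1]
    return last * boost if company in favored else last

def random_system(prices, spread=0.05, rng=random):
    """Random baseline: draw a prediction uniformly within +/- spread of the last price."""
    last = prices[-1]
    return rng.uniform(last * (1 - spread), last * (1 + spread))
```

Such deliberately flawed baselines anchor the rating scale: a rating method that cannot separate a real forecaster from Sb or Sr is not measuring robustness.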
The evaluation/test data for the experiments comprised daily stock prices from six companies spanning various sectors, collected from Yahoo! Finance. The data was used to predict stock prices for the following month, with input-output pairs sampled using a sliding-window technique.
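The sliding-window sampling can be sketched as below; the window lengths are illustrative (the paper forecasts roughly one month ahead from daily prices, so a horizon near 21 trading days would be a plausible choice):

```python
import numpy as np

def sliding_windows(series, input_len, horizon):
    """Sample (input, target) pairs by sliding a fixed-length window over the series."""
    X, y = [], []
    for start in range(len(series) - input_len - horizon + 1):
        X.append(series[start : start + input_len])            # model input
        y.append(series[start + input_len : start + input_len + horizon])  # target
    return np.array(X), np.array(y)

# Toy series: 10 "days" of prices yields 5 overlapping (4-in, 2-out) pairs.
X, y = sliding_windows(np.arange(10.0), input_len=4, horizon=2)
```

Each step forward by one day produces a new overlapping pair, which is how a single price history yields many evaluation samples.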
The experiments also defined evaluation metrics, namely the Weighted Rejection Score (WRS), Absolute Percentage Error (APE), and Percentage Improvement in Error (PIE %), tailored to the specific research questions. The WRS metric, originally proposed in previous work, measures statistical bias by comparing the distributions of maximum residuals across different values of the protected attribute.
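APE and PIE % can be sketched using common definitions of these terms; the paper's exact formulations may differ:

```python
def ape(actual, predicted):
    """Absolute Percentage Error (%), under the usual definition |a - p| / |a| * 100."""
    return abs(actual - predicted) / abs(actual) * 100

def pie(baseline_error, system_error):
    """Percentage Improvement in Error (%) of a system over a baseline
    (assumed definition: relative reduction in error)."""
    return (baseline_error - system_error) / baseline_error * 100
```

For example, a forecast of 90 against an actual price of 100 gives an APE of 10%, and cutting a baseline's error from 10 to 8 gives a PIE of 20%.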
What is the dataset used for quantitative evaluation? Is the code open source?
The dataset used for quantitative evaluation is daily stock prices from six companies: Meta Platforms, Inc. (META), Google (GOOG), Pfizer Inc. (PFE), Merck (MRK), Wells Fargo (WFC), and Citigroup Inc. (C). The code used in the study is not explicitly stated to be open source in the provided context.
Do the experiments and results in the paper provide good support for the scientific hypotheses that need to be verified? Please analyze.
The experiments and results presented in the paper provide strong support for the scientific hypotheses under verification. The study conducted a comprehensive analysis of the impact of perturbations and confounders on Multi-Modal Time-Series Forecasting Models (MM-TSFM) through a causal lens. The research questions were clearly defined, hypotheses were formulated, and causal diagrams were constructed to investigate the relationships between variables. The metrics used in the experiments were selected to address the specific research questions and hypotheses, ensuring a thorough evaluation.
The experiments tested the impact of perturbations and confounders on the performance of MM-TSFM models, particularly in the context of stock price prediction. Various perturbations were applied, and performance metrics such as Symmetric Mean Absolute Percentage Error (SMAPE), MASE, and Sign Accuracy were used to assess the models under different conditions. The results provided valuable insights into how perturbations and confounders affect the robustness and accuracy of the forecasting models.
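For reference, SMAPE and MASE can be computed as follows, using their standard definitions (the paper's exact variants may differ slightly):

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE (%), with the common (|a| + |f|) denominator per point."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100 * np.mean(2 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

def mase(actual, forecast, train):
    """MASE: out-of-sample MAE scaled by the in-sample one-step naive forecast MAE."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train = np.asarray(train, dtype=float)
    naive_mae = np.mean(np.abs(np.diff(train)))  # error of "predict yesterday's value"
    return np.mean(np.abs(actual - forecast)) / naive_mae
```

A MASE below 1 means the model beats the naive last-value forecast on average, which is why it is a natural scale-free yardstick across stocks with very different price levels.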
The paper utilized techniques such as propensity score matching and weighted sampling to analyze the impact of confounders on the relationship between perturbations and model outcomes. By considering different distributions and conducting thorough analyses, the study quantified biases and discrepancies in MM-TSFM models attributable to industry and company factors, enhancing the credibility and reliability of its findings.
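One simple weighted-sampling adjustment of the kind described is inverse propensity weighting; the sketch below is an illustrative implementation, not the paper's code:

```python
import numpy as np

def ipw_effect(treated, outcome, propensity):
    """Inverse-propensity-weighted difference in mean outcome between
    treated (e.g. perturbed) and untreated (unperturbed) units."""
    treated = np.asarray(treated, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    p = np.asarray(propensity, dtype=float)
    w1 = treated / p            # weights for treated units
    w0 = (1 - treated) / (1 - p)  # weights for untreated units
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)
```

Reweighting by the estimated propensity (here, the probability that a sample receives the perturbation given its confounders, such as company or industry) balances the two groups so the remaining outcome gap reflects the perturbation rather than the confounder.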
Overall, the experiments and results offer strong empirical support for the scientific hypotheses under investigation. The rigorous methodology, careful selection of metrics, and in-depth analysis of perturbations and confounders contribute to the robustness and validity of the study's conclusions.
What are the contributions of this paper?
The paper makes several contributions:
- It analyzes erroneous data entries in paper-based and electronic data collection.
- It explores preferences for artificial intelligence clinicians before and during the COVID-19 pandemic.
- It surveys visual transformers in the context of neural networks and learning systems.
- It discusses explainable artificial intelligence: its concepts, applications, research challenges, and visions.
- It evaluates sentiment analysis in various contexts.
- It delves into the use of propensity scores in observational studies for causal effects.
- It advances the automatic rating of the trustworthiness of text processing services.
- It contributes to the composable bias rating of AI systems to promote trustable applications.
- It explores the use of neural mean embedding for back-door and front-door adjustment in time-series forecasting.
- It investigates adversarial attacks on deep neural networks for time-series prediction.
- It discusses imperceptible adversarial attacks on the S channel of the HSV colorspace.
What work can be continued in depth?
To delve deeper into the research presented in the document, several avenues for further exploration and continuation of work can be considered:
- Expansion of Historical Data Collection: Future work could collect historical data for more than two companies in each industry to strengthen the analysis and evaluation.
- Exploration of Additional Confounders: Further research could incorporate other confounders, such as seasonal trends and external factors like financial news, into the causal analysis to better understand the impact of perturbations on MM-TSFM outcomes.
- Robustness Testing of Time-Series Forecasting Models: Building on the existing research, the robustness of time-series forecasting models could be explored under different attacks and perturbations, including those specific to transformer-based or multi-modal models, to assess their performance and resilience.
- Evaluation of Perturbations and Confounders: Further studies could quantify the biases created in models' predictions and measure the isolated impact of perturbations through causal analysis.
- Enhancement of Performance Metrics: Future work could refine and expand the performance metrics used for evaluating test systems across different perturbations, providing a more comprehensive assessment of model performance and robustness.