TabFSBench

Tabular Benchmark for Feature Shifts in Open Environment

1 School of Intelligence Science and Technology, Nanjing University, China
2 National Key Laboratory for Novel Software Technology, Nanjing University, China
3 School of Artificial Intelligence, Nanjing University, China
Corresponding Author



Abstract


Tabular data is widely used in machine learning tasks. Current tabular learning research predominantly focuses on closed environments, whereas real-world applications often involve open environments in which distribution and feature shifts occur, leading to significant degradation in model performance. Previous research has primarily concentrated on mitigating distribution shifts, while feature shifts, a distinctive and largely unexplored challenge of tabular data, have received limited attention. To this end, this paper conducts the first comprehensive study on feature shifts in tabular data and introduces the first tabular feature-shift benchmark (TabFSBench). TabFSBench evaluates the impact of four distinct feature-shift scenarios on four tabular model categories across various datasets and, for the first time, assesses the performance of large language models (LLMs) and tabular LLMs on a tabular benchmark. Our study yields three main observations: (1) most tabular models have limited applicability in feature-shift scenarios; (2) the importance of the shifted feature set has a linear relationship with model performance degradation; (3) model performance in closed environments correlates with feature-shift performance. Future research directions are also explored for each observation.
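To make the notion of a feature shift concrete, the following minimal Python sketch emulates one at test time. It assumes that a shifted feature is simply unavailable at test time and is replaced by its training mean. That imputation choice, and the helper names `column_means`, `apply_feature_shift`, and `performance_gap`, are illustrative assumptions and not necessarily the procedure or API used by TabFSBench.

```python
from statistics import mean

def column_means(train_rows):
    """Per-column mean of the training features."""
    return [mean(col) for col in zip(*train_rows)]

def apply_feature_shift(test_rows, shifted_cols, fill_values):
    """Return a copy of test_rows in which the shifted columns are treated
    as unavailable and replaced by fill_values (assumed: training means)."""
    shifted = set(shifted_cols)
    return [
        [fill_values[j] if j in shifted else v for j, v in enumerate(row)]
        for row in test_rows
    ]

def performance_gap(metric_closed, metric_shifted):
    """Relative degradation of a higher-is-better metric (e.g. accuracy)."""
    return (metric_closed - metric_shifted) / metric_closed

# Toy data: two features; the second one is shifted at test time.
train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 50.0], [4.0, 70.0]]

fills = column_means(train)
shifted_test = apply_feature_shift(test, [1], fills)
```

A model trained on `train` could then be scored on both `test` and `shifted_test`, with `performance_gap` summarizing the drop between the closed-environment and feature-shift results.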

News


  • [2025-03] Results from TabPFNv2 are added.
  • [2025-02] Our project page is released.
  • [2025-01] Our code is available now.
  • [2025-01] Our paper is accessible now.

If you have any questions, feel free to contact us at chengzj@lamda.nju.edu.cn or open an issue on the project's issue tracker.

BibTeX

@article{cheng2025tabfsbench,
    author       = {Zi-Jian Cheng and Zi-Yi Jia and Zhi Zhou and Lan-Zhe Guo and Yu-Feng Li},
    title        = {TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment},
    journal      = {arXiv preprint arXiv:2501.18935},
    year         = {2025}
}