Publications

2026

Not All Steps are Informative: On the Linearity of LLMs' RLVR Training
Tianle Wang, Zhongyuan Wu, Shenghao Jin, Hao Xu, Wei Chen, Ning Miao
arXiv  ·  25 Jan 2026  ·  arxiv:2601.04537

2025

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao
arXiv  ·  03 Oct 2025  ·  arxiv:2510.01925