Adaptive Layer-skipping in Pre-trained LLMs
X. Luo, W. Wang, X. Yan
COLM'25 (Conference on Language Modeling), arXiv:2503.23798v2, 2025 [arxiv]
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
W. Wang, Y. Tian, L. Yang, H. Wang, X. Yan
COLM'25 (Conference on Language Modeling), arXiv:2504.00595v2, 2025 [arxiv]
Language Models Augmented with Decoupled Memory
W. Wang, L. Dong, H. Cheng, X. Liu, X. Yan, J. Gao, F. Wei
NeurIPS'23 (The Thirty-seventh Annual Conference on Neural Information Processing Systems), 2023 [arxiv]
Visually-Augmented Language Modeling
W. Wang, L. Dong, H. Cheng, H. Song, X. Liu, X. Yan, J. Gao, F. Wei
ICLR'23 (Proceedings of Int. Conf. on Learning Representations), 2023 [pdf]
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, X. Yan
NeurIPS'19 (The Thirty-third Annual Conference on Neural Information Processing Systems), 2019 [pdf]