Adaptive Layer-skipping in Pre-trained LLMs, by X. Luo, W. Wang, X. Yan
arXiv:2503.23798v2, 2025 [arxiv]
Language Models Augmented with Decoupled Memory, by W. Wang, L. Dong, H. Cheng, X. Liu, X. Yan, J. Gao, F. Wei
NeurIPS'23 (The Thirty-seventh Annual Conference on Neural Information Processing Systems), 2023 [arxiv]
Visually-Augmented Language Modeling, by W. Wang, L. Dong, H. Cheng, H. Song, X. Liu, X. Yan, J. Gao, F. Wei
ICLR'23 (Proceedings of the International Conference on Learning Representations), 2023 [pdf]
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, by S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, X. Yan
NeurIPS'19 (The Thirty-third Annual Conference on Neural Information Processing Systems), 2019 [pdf]