Mingsheng Li

AI Researcher @ Qwen Team, Alibaba Group

Fudan University

I am currently an AI researcher at the Qwen Team, Alibaba Group. Before joining, I graduated from Fudan University, where I was supervised by Prof. Tao Chen. I have also had the wonderful experience of working with Dr. Hongyuan Zhu from A*STAR, Dr. Gang Yu, Dr. Chi Zhang, and Dr. Xin Chen from Tencent, and Dr. Bo Zhang from Shanghai AI Lab.

My current work and research focus on Large Vision-Language Models, Multi-agent Systems, Generative AI, and Embodied AI.

News

May 2026 🚀 We present Qwen-VLA, a unified VLA generalist model for manipulation, navigation, and trajectory prediction.
May 2026 📄 We release FineVLA and CalibAll for fine-grained VLA policies and cross-embodiment learning.
Apr. 2026 🛠️ StarVLA technical report released — a Lego-like development codebase for agile embodied intelligence.
Mar. 2026 🎉 StructChart accepted to T-PAMI!
Jan. 2026 🎉 VLM4VLA accepted to ICLR 2026!
Nov. 2025 📑 Qwen3-VL technical report released.
Jul. 2025 🎉 Chimera accepted to ICCV 2025!
Jan. 2025 🎉 GeoX accepted to ICLR 2025.
Jan. 2025 🚀 WI3D accepted to T-MM 2025.
Sep. 2024 🎉 3DET-Mamba accepted to NeurIPS 2024!
Jul. 2024 🎉 M3DBench accepted to ECCV 2024.
Apr. 2024 🎉 Vote2Cap-DETR++ accepted to T-PAMI 2024.
Feb. 2024 🎉 LL3DA accepted to CVPR 2024.

Collaborations

We are looking for candidates passionate about (1) native multimodal large models and (2) agents with deep reasoning and planning. We also welcome collaborations with academia and industry. Email: limingsheng.lms@alibaba-inc.com.

Selected Publications

Tech Reports

Technical Report Qwen-VLA

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Co-First Author

project paper github

An all-in-one unified vision-language-action generalist model that handles complex manipulation tasks, long-horizon navigation, and trajectory prediction within a single framework, generalizing across diverse tasks, environments, and robot embodiments.

Technical Report Qwen3-VL

Qwen3-VL Technical Report

Core Author

project paper github

A next-generation vision-language model featuring advanced visual reasoning, native dynamic-resolution processing, long-context video understanding, and multilingual support with both dense and MoE variants.

Technical Report Qwen3.5-Omni

Qwen3.5-Omni Technical Report

Author

project paper

An upgraded end-to-end omni model with enhanced multimodal reasoning, real-time streaming interaction, and stronger cross-modal understanding and generation across text, image, audio, and video modalities.

Technical Report Qwen3-Omni

Qwen3-Omni Technical Report

Author

project paper github

A natively end-to-end multilingual omni model supporting real-time streaming interaction across text, image, audio, and video with unified perception, understanding, and generation.

Technical Report StarVLA

StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

Author

project paper github

A modular, Lego-like open-source codebase for agile VLA development, supporting seamless integration of diverse VLMs, action heads, and world models with unified training, evaluation, and deployment across robotics benchmarks.

Preprints

arXiv 2026

Unify Robot Actions in Camera Frame

Sicheng Xie, Lingchen Meng, Zijie Diao, Haidong Cao, Zhiying Du, Shuyuan Tu, Jiaqi Leng, Qiuyue Wang, Mingsheng Li, Shuai Bai, Zuxuan Wu, Yu-Gang Jiang

paper github

A training-free, robot-independent pipeline that estimates camera extrinsics for offline datasets and unifies robot actions in camera frame, enabling scalable cross-embodiment learning without any manual calibration.

arXiv 2026 FineVLA

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

Xintong Hu, Xuhong Huang, Jinyu Zhang, Yutong Yao, Yuchong Sun, Qiuyue Wang, Mingsheng Li, Sicheng Xie, Yitao Liu, Junhao Chen, Yixuan Chen, Yingming Zheng, Shuai Bai, Tao Yu

paper project github benchmark

Fine-grained VLA supervision that improves steerable robot control.

Published

T-PAMI 2026 StructChart

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

Renqiu Xia, Haoyang Peng, Hancheng Ye, Mingsheng Li, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan, Bo Zhang

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2026.

project paper github

A unified approach for visual chart perception and reasoning using Structured Triplet Representations.

ICLR 2026

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models

Jianke Zhang, Xiaoyu Chen, Qiuyue Wang, Mingsheng Li, Yanjiang Guo, Yucheng Hu, Jiajun Zhang, Shuai Bai, Junyang Lin, Jianyu Chen

International Conference on Learning Representations (ICLR), 2026.

paper github

A unified framework for studying how VLMs affect VLA performance.

ICCV 2025

Chimera: Improving Generalist Model with Domain-Specific Experts

Tianshuo Peng, Mingsheng Li (co-first), Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue

IEEE/CVF International Conference on Computer Vision (ICCV), 2025.

paper github

Cost-effective general-specialist collaboration to improve LMMs.

ICLR 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Renqiu Xia, Mingsheng Li (co-first), Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

International Conference on Learning Representations (ICLR), 2025.

paper github

Formalized pre-training for geometry problem solving.

T-MM 2025

WI3D: Weakly Incremental 3D Detection via Vision Foundation Models

Mingsheng Li, Sijin Chen, Shengji Tang, Hongyuan Zhu, Yanyan Fang, Xin Chen, Zhuoyuan Li, Fukun Yin, Tao Chen

IEEE Transactions on Multimedia (T-MM), 2025.

project paper

Introducing new categories to 3D detectors via 2D foundation models.

NeurIPS 2024 3DET-Mamba

3DET-Mamba: State Space Model for End-to-End 3D Object Detection

Mingsheng Li, Jiakang Yuan, Sijin Chen, Lin Zhang, Anyu Zhu, Xin Chen, Tao Chen

Conference on Neural Information Processing Systems (NeurIPS), 2024.

project paper

End-to-end 3D detection with Mamba-based representation learning.

ECCV 2024

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

European Conference on Computer Vision (ECCV), 2024.

project paper github

Large-scale 3D-language dataset with interleaved multimodal prompts.

T-PAMI 2024 Vote2Cap-DETR++

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2024.

paper github

Decoupled queries for 3D localization and dense captioning.

CVPR 2024

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

project paper github

3D-LLMs for visual and textual interactions in complex 3D scenes.

T-MM 2024

Lightweight Model Pre-training via Language Guided Knowledge Distillation

Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen

IEEE Transactions on Multimedia (T-MM), 2024.

paper github

Language-guided knowledge distillation for lightweight model pre-training.

Experiences

Senior AI Researcher

Apr. 2025 - Present

Qwen Team, Alibaba Group

Researcher

Jun. 2024 - Apr. 2025

Tencent Tech

Researcher

Oct. 2023 - Jun. 2024

Shanghai AI Lab

Awards

2025 Tencent Rhino-Bird Elite Talent Program
2025 Outstanding Graduate of Shanghai
2024 National Scholarship
2023 First-Class Academic Scholarship
2022 Outstanding Graduate of Fudan University
2020 National 2nd Prize, China Undergraduate Mathematical Contest in Modeling
2019 National 1st Prize, Chinese Mathematics Competitions (Top 20)

Services

Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, ACM MM

Journal Reviewer: T-PAMI, T-MM