8th VSIP 2026-Video, Signal and Image Processing | The 8th International Conference on Video, Signal and Image Processing

Special Session 3: Intelligent Visual Perception and Understanding

Brief Description

Intelligent visual perception technologies, represented by vision foundation models, have profoundly transformed the research paradigms of computer vision and signal processing. Meanwhile, the rise of the low-altitude economy and the rapid development of embodied intelligence have opened vast application spaces for visual perception and understanding. This special session aims to build a high-level academic exchange platform, bringing together the latest achievements and frontier thinking of global researchers in intelligent visual perception. The session focuses on vision foundation models, low-altitude intelligent visual perception, embodied intelligence vision, multi-modal fusion and understanding, efficient visual computing, open-world visual perception, and trustworthy AI for visual perception, covering the complete chain from theoretical innovation to system implementation. We welcome research on visual perception mechanisms in complex real-world scenarios, visual understanding in open and dynamic environments, and application studies of visual intelligence in critical domains such as smart cities, low-altitude logistics, autonomous driving, intelligent robotics, and industrial inspection, pushing visual perception technologies toward greater generality, robustness, and trustworthiness.

Session Organizers

Prof. Wanli Xue, Tianjin University of Technology, China
Prof. Chunwei Tian, Harbin Institute of Technology, China
Postdoctoral Researcher Zhibin Zhang, Tianjin University of Technology, China

Special Session Topics

The topics of interest include, but are not limited to:
• Vision Foundation Models and Large-scale Vision Models
• Low-Altitude Intelligent Visual Perception and Understanding (UAV/Low-altitude Platform Vision)
• Embodied Intelligence Vision and Robotic Perception
• Multi-Modal Visual Fusion and Cross-Modal Understanding
• Open-World Visual Perception and Continual Learning
• Efficient Visual Computing and Edge Intelligence Deployment
• Trustworthy AI, Explainability and Fairness in Visual Perception
• Applications of Visual Perception in Smart Cities, Autonomous Driving and Industrial Inspection

Submission Method

Submit your full paper (at least 8 pages) or an abstract of 200–400 words (presentation only, without publication) via the Online Submission System, and choose Special Session 3 (Intelligent Visual Perception and Understanding).

Template Download

Introduction of Session Organizers

Prof. Wanli Xue
Tianjin University of Technology, China

Wanli Xue is a Professor and Ph.D. supervisor in Computer Science at Tianjin University of Technology. His research centers on computer vision for social good, especially UAV perception and continuous sign-language recognition for barrier-free communication. He has authored 30+ papers in IEEE and Elsevier journals and leading conferences, including IEEE T-IP, IEEE T-NNLS, IEEE T-CSVT, IEEE T-MM, IEEE T-ITS, Information Fusion, Pattern Recognition, CVPR, and ECCV.

Prof. Chunwei Tian
Harbin Institute of Technology, China

Chunwei Tian is a Professor and Ph.D. supervisor in the School of Computing, Harbin Institute of Technology, and was listed among the world’s top 2% scientists from 2022 to 2024. His research spans video and image restoration, recognition, and image generation. He has published 90+ papers in IEEE Transactions, Pattern Recognition, Neural Networks and Information Fusion, including 7 ESI highly cited papers and benchmark studies on image super-resolution.

Postdoctoral Researcher Zhibin Zhang
Tianjin University of Technology, China

Zhibin Zhang is a Postdoctoral Researcher at Tianjin University of Technology. His research focuses on visual object tracking, low-altitude intelligent visual perception, open-world visual understanding, and embodied intelligence vision. He has published multiple papers in top-tier journals including IEEE TNNLS, IEEE TMM, and CCF T1 Chinese journals such as Journal of Computers. He won 1st place in the Ten-Billion-Pixel Video Object Tracking Challenge (Tsinghua University, 2024) and 3rd place globally in the International Long-term Tracking Challenge (ECCV, 2022). He holds 1 invention patent and 2 software copyrights.
