Publications

Preprints

    Refereed Journal Articles

    1. Yang, W., Xiao, Q., & Zhang, Y. (2024). HAR²bot: A human-centered augmented reality robot programming method with the awareness of cognitive load. Journal of Intelligent Manufacturing, 35(5), 1985–2003.
      @article{yang2024ha,
        title = {{HAR$^2$bot}: A human-centered augmented reality robot programming method with the awareness of cognitive load},
        author = {Yang, Wenhao and Xiao, Qinqin and Zhang, Yunbo},
        journal = {Journal of Intelligent Manufacturing},
        volume = {35},
        number = {5},
        pages = {1985--2003},
        year = {2024},
        publisher = {Springer US New York}
      }
      
    2. Yang, W., Dengxiong, X., Wang, X., Hu, Y., & Zhang, Y. (2024). “I can see your password”: A case study about cybersecurity risks in mid-air interactions of mixed reality-based smart manufacturing applications. Journal of Computing and Information Science in Engineering, 24(3), 031004.
      @article{yang2024can,
        title = {“{I} can see your password”: A case study about cybersecurity risks in mid-air interactions of mixed reality-based smart manufacturing applications},
        author = {Yang, Wenhao and Dengxiong, Xiwen and Wang, Xueting and Hu, Yidan and Zhang, Yunbo},
        journal = {Journal of Computing and Information Science in Engineering},
        volume = {24},
        number = {3},
        pages = {031004},
        year = {2024},
        publisher = {American Society of Mechanical Engineers}
      }
      
    3. Yang, W., & Zhang, Y. (2024). A global correction framework for camera registration in video see-through augmented reality systems. Journal of Computing and Information Science in Engineering, 24(3), 031003.
      @article{yang2024global,
        title = {A global correction framework for camera registration in video see-through augmented reality systems},
        author = {Yang, Wenhao and Zhang, Yunbo},
        journal = {Journal of Computing and Information Science in Engineering},
        volume = {24},
        number = {3},
        pages = {031003},
        year = {2024},
        publisher = {American Society of Mechanical Engineers}
      }
      
    4. Xian, C., Zhang, J., Yang, W., & Zhang, Y. (2024). Multi-scale progressive fusion-based depth image completion and enhancement for industrial collaborative robot applications. Journal of Intelligent Manufacturing, 35(5), 2119–2135.
      @article{xian2024multi,
        title = {Multi-scale progressive fusion-based depth image completion and enhancement for industrial collaborative robot applications},
        author = {Xian, Chuhua and Zhang, Jun and Yang, Wenhao and Zhang, Yunbo},
        journal = {Journal of Intelligent Manufacturing},
        volume = {35},
        number = {5},
        pages = {2119--2135},
        year = {2024},
        publisher = {Springer US New York}
      }
      

    Refereed Conference Proceedings

    1. Liu, Y., Liang, J., Fan, H., Yang, W., Cui, Y., Han, X., Huang, L., Liu, D., Wang, Q., & Han, C. (2026). All you need is one: Capsule prompt tuning with a single vector. Advances in Neural Information Processing Systems, 38, 88139–88166.
      @inproceedings{liu2026all,
        title = {All you need is one: Capsule prompt tuning with a single vector},
        author = {Liu, Yiyang and Liang, James and Fan, Heng and Yang, Wenhao and Cui, Yiming and Han, Xiaotian and Huang, Lifu and Liu, Dongfang and Wang, Qifan and Han, Cheng},
        booktitle = {Advances in Neural Information Processing Systems},
        volume = {38},
        pages = {88139--88166},
        year = {2026},
        url = {https://proceedings.neurips.cc/paper_files/paper/2025/hash/7f8b8bc8ebac661c442c4dafd5d98c08-Abstract-Conference.html},
        pdf = {NeurIPS-2025-all-you-need-is-one-capsule-prompt-tuning-with-a-single-vector-Paper-Conference.pdf}
      }
      
      Prompt-based learning has emerged as a parameter-efficient fine-tuning (PEFT) approach to facilitate Large Language Model (LLM) adaptation to downstream tasks by conditioning generation with task-aware guidance. Despite its successes, current prompt-based learning methods rely heavily on laborious grid search for the optimal prompt length and typically require a considerable number of prompts, introducing additional computational burden. Worse yet, our pioneering findings indicate that task-aware prompt design is inherently limited by its absence of instance-aware information, leading to a subtle attention interplay with the input sequence. In contrast, simply incorporating instance-aware information as a part of the guidance can enhance the prompt-tuned model performance without additional fine-tuning. Moreover, we find an interesting phenomenon, namely the "attention anchor": incorporating instance-aware tokens at the earliest position of the sequence can successfully preserve strong attention to critical structural information and exhibit more active attention interaction with all input tokens. In light of this observation, we introduce Capsule Prompt-Tuning (CaPT), an efficient and effective solution that integrates off-the-shelf, informative instance semantics into prompt-based learning. Our approach innovatively integrates both instance-aware and task-aware information in a nearly parameter-free manner (i.e., one single capsule prompt). Empirical results demonstrate that our method can exhibit superior performance across various language tasks (e.g., 84.03% average accuracy on T5-Large), serving as an "attention anchor," while enjoying high parameter efficiency (e.g., 0.003% of model parameters on Llama3.2-1B).
    2. Wang, T., Han, C., Liang, J., Yang, W., Liu, D., Zhang, L. X., Wang, Q., Luo, J., & Tang, R. (2025). Exploring the adversarial vulnerabilities of vision-language-action models in robotics. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6948–6958. arXiv:2411.13587.
      @inproceedings{wang2025exploring,
        title = {Exploring the adversarial vulnerabilities of vision-language-action models in robotics},
        author = {Wang, Taowen and Han, Cheng and Liang, James and Yang, Wenhao and Liu, Dongfang and Zhang, Luna Xinyu and Wang, Qifan and Luo, Jiebo and Tang, Ruixiang},
        booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
        pages = {6948--6958},
        year = {2025},
        note = {arXiv:2411.13587},
        doi = {10.48550/arXiv.2411.13587},
        url = {https://openaccess.thecvf.com/content/ICCV2025/html/Wang_Exploring_the_Adversarial_Vulnerabilities_of_Vision-Language-Action_Models_in_Robotics_ICCV_2025_paper.html},
        arxiv = {https://arxiv.org/abs/2411.13587},
        pdf = {Wang_Exploring_the_Adversarial_Vulnerabilities_of_Vision-Language-Action_Models_in_Robotics_ICCV_2025_paper.pdf}
      }
      
      Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework. Despite their significant capabilities, VLA models introduce new attack surfaces. This paper systematically evaluates their robustness. Recognizing the unique demands of robotic execution, our attack objectives target the inherent spatial and functional characteristics of robotic systems. In particular, we introduce two untargeted attack objectives that leverage spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory. Additionally, we design an adversarial patch generation approach that places a small, colorful patch within the camera’s view, effectively executing the attack in both digital and physical environments. Our evaluation reveals a marked degradation in task success rates, with up to a 100% reduction across a suite of simulated robotic tasks, highlighting critical security gaps in current VLA architectures. By unveiling these vulnerabilities and proposing actionable evaluation metrics, we advance both the understanding and enhancement of safety for VLA-based robotic systems, underscoring the necessity for continuously developing robust defense strategies prior to physical-world deployments.
    3. Yang, W., Bai, S., & Zhang, Y. (2024). RADAR: Robotics Assembly by Demonstration via Augmented Reality. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7063–7070.
      @inproceedings{yang2024radar,
        title = {RADAR: Robotics Assembly by Demonstration via Augmented Reality},
        author = {Yang, Wenhao and Bai, Shi and Zhang, Yunbo},
        booktitle = {2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
        pages = {7063--7070},
        year = {2024},
        organization = {IEEE}
      }
      
    4. Yang, W., & Zhang, Y. (2022). Visualization error analysis for augmented reality stereo video see-through head-mounted displays in industry 4.0 applications. International Manufacturing Science and Engineering Conference, 85819, V002T06A016.
      @inproceedings{yang2022visualization,
        title = {Visualization error analysis for augmented reality stereo video see-through head-mounted displays in industry 4.0 applications},
        author = {Yang, Wenhao and Zhang, Yunbo},
        booktitle = {International Manufacturing Science and Engineering Conference},
        volume = {85819},
        pages = {V002T06A016},
        year = {2022},
        organization = {American Society of Mechanical Engineers}
      }
      
    5. Yang, W., Xiao, Q., & Zhang, Y. (2021). An augmented-reality based human-robot interface for robotics programming in the complex environment. International Manufacturing Science and Engineering Conference, 85079, V002T07A003.
      @inproceedings{yang2021augmented,
        title = {An augmented-reality based human-robot interface for robotics programming in the complex environment},
        author = {Yang, Wenhao and Xiao, Qinqin and Zhang, Yunbo},
        booktitle = {International Manufacturing Science and Engineering Conference},
        volume = {85079},
        pages = {V002T07A003},
        year = {2021},
        organization = {American Society of Mechanical Engineers},
        url = {https://asmedigitalcollection.asme.org/MSEC/proceedings-abstract/MSEC2021/85079/1115433},
        doi = {10.1115/MSEC2021-62468}
      }
      
      To solve the problems of complex robot programming tasks, we propose an Augmented Reality (AR) based human-robot interface for planning a collision-free path in a complex environment. Current robot programming methods usually require a high level of experience in robot programming (online programming), time-consuming 3D modeling of the working environment for collision detection (offline programming), and tedious and inefficient re-planning to adapt to environment or task changes (both online and offline programming). To address these problems, an end-to-end AR human-robot interface is proposed, which provides a new affordance to users by enabling them to plan the path in the AR environment. A set of user-interactive tools allows users to define and edit waypoints as the high-level guidance and the direct inputs for the toolpath planning package, Kinematics and Dynamics Library (KDL). With fast sensing of the workspace and accurate rendering, an in-situ simulation module is utilized for collision checking and verification by the users’ perception. Users repeat the process of 1) waypoint definition and editing, and 2) collision checking and path feasibility verification, until a satisfactory path is obtained. A preliminary test is conducted in a use case with complex obstacles to verify the effectiveness and efficiency of the proposed interface.