Qidong Huang (黄启栋)  

Ph.D. student, USTC



Information Processing Center, CAS Key Laboratory of Electromagnetic Space Information

School of Cyber Science and Technology, University of Science and Technology of China

Email: hqd0037[AT]mail.ustc.edu.cn

Currently on the job market! Seeking faculty/job opportunities starting in Summer/Fall 2025!

[Google Scholar] [GitHub] [CV] [Twitter]


News

  • 10/2024: Check out our PyramidDrop, which accelerates your LVLM with over 1.7X faster training and 2.0X faster inference!
  • 10/2024: We introduce MIR & MoCa, an LVLM pre-training indicator and a lightweight modality calibration module!
  • 04/2024: OPERA is selected as a Highlight at CVPR 2024!
  • 02/2024: Two papers are accepted by CVPR 2024. See you in Seattle!
  • 02/2024: I have one paper accepted by IEEE TIP 2024.
  • 12/2023: Please check out our new work OPERA for mitigating hallucination in MLLMs!
  • 07/2023: I have one paper accepted by ACM MM 2023.
  • 07/2023: I have one paper accepted by ICCV 2023. See you in Paris!
  • 07/2023: I have a new homepage.

About Me

I am currently a Ph.D. student at the School of Cyber Science and Technology, University of Science and Technology of China, working with Prof. Nenghai Yu and Prof. Weiming Zhang. Prior to that, I received my bachelor's degree from the School of Information Engineering, University of Science and Technology of China. My current research focuses on large vision-language models.

Biography

  • 2023.08 - Present, Research Intern at Shanghai AI Lab, supervised by Jiaqi Wang and Xiaoyi Dong.
  • 2020.09 - Present, Ph.D. in School of Cyber Science and Technology, University of Science and Technology of China.
  • 2016.09 - 2020.06, B.Eng. in School of Information Engineering, University of Science and Technology of China.

    Preprints

    Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

    Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

    [arXiv] [Code]

    PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

    Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang, Feng Wu, Dahua Lin

    [arXiv] [Code]

    Publications

    OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

    Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. (Highlight, 2.8% of submissions)

    [arXiv] [Code]

    PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

    Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Kui Zhang, Gang Hua, Nenghai Yu

    IEEE Transactions on Image Processing (TIP), 2024.

    [arXiv] [Code]

    Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

    Qidong Huang, Xiaoyi Dong, Dongdong Chen, Yinpeng Chen, Lu Yuan, Gang Hua, Weiming Zhang, Nenghai Yu

    IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

    [arXiv] [Code]

    Diversity-Aware Meta Visual Prompting

    Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

    [arXiv] [Code]

    Shape-invariant 3D Adversarial Point Clouds

    Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Nenghai Yu

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

    [arXiv] [Code]

    Initiative Defense against Facial Manipulation

    Qidong Huang*, Jie Zhang*, Wenbo Zhou, Weiming Zhang, Nenghai Yu (*Equal contribution)

    AAAI Conference on Artificial Intelligence (AAAI), 2021.

    [arXiv] [Code]

    SimAC: A Simple Anti-Customization Method against Text-to-Image Synthesis of Diffusion Models

    Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang* (Corresponding author)

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    [arXiv]

    Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion

    Kui Zhang, Hang Zhou, Jie Zhang, Qidong Huang, Weiming Zhang, Nenghai Yu

    ACM International Conference on Multimedia (MM), 2023.

    [arXiv]

    Poison Ink: Robust and Invisible Backdoor Attack

    Jie Zhang, Dongdong Chen, Qidong Huang, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

    IEEE Transactions on Image Processing (TIP), 2022.

    [arXiv] [Code]

    Deep Template-based Watermarking

    Han Fang, Dongdong Chen, Qidong Huang, Jie Zhang, Zehua Ma, Weiming Zhang* and Nenghai Yu

    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2020.

    [Paper]

    Awards

  • 2024 National Scholarship
  • 2021 National Scholarship
  • 2023 Anheng Information Scholarship

    Services

    Invited Talk:
  • [2024.07] AI SPOT@OpenMMLab. Topic: "Exploring MLLM's Hallucination from A Causal Attention Perspective"

    Invited Reviewer for:
  • TPAMI, TNNLS, TIP, Pattern Recognition (PR)
  • CVPR, ICCV, ECCV, NeurIPS, ICLR