publications | Can Qin

2025

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, and others

arXiv preprint arXiv:2505.09568, 2025

arXiv Code
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Keda Tao, Haoxuan You, Yang Sui, Can Qin, and Huan Wang

arXiv preprint arXiv:2503.16257, 2025

arXiv Code
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding

Kung-Hsiang Huang, Can Qin, Haoyi Qiu, Philippe Laban, Shafiq Joty, Caiming Xiong, and Chien-Sheng Wu

Annual Meeting of the Association for Computational Linguistics (Findings), 2025

arXiv Code
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

Keda Tao, Can Qin, Haoxuan You, Yang Sui, and Huan Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

arXiv Code

2024

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Michael S Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, Ran Xu, Caiming Xiong, and Juan Carlos Niebles

arXiv preprint arXiv:2410.16267, 2024

arXiv Website
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, and others

arXiv preprint arXiv:2408.12590, 2024

arXiv Code
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, and others

arXiv preprint arXiv:2408.08872, 2024

arXiv Code
Self-Training Large Language and Vision Assistant for Medical Question-Answering

Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, and Zhiqiang Tao

In The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

arXiv Code
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, Ran Xu, and Zhiqiang Tao

European Conference on Computer Vision (ECCV), 2024

arXiv Code
HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang*, Xinyi Yang*, Yihao Feng*, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, and Ran Xu

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv Code

2023

arXiv

Why is the state of neural network pruning so confusing? On the fairness, comparison setup, and trainability in network pruning

Huan Wang, Can Qin, Yue Bai, and Yun Fu

arXiv:2301.05219, 2023

arXiv Code
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, and Ran Xu

Advances in Neural Information Processing Systems (NeurIPS), 2023

arXiv Code Website
ICDM

Rethinking Adam: A twofold exponential moving average approach

Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, and Yun Fu

IEEE International Conference on Data Mining (ICDM), 2023

arXiv Code
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, and Ran Xu

International Conference on Computer Vision (ICCV), 2023

arXiv Code Website
TIP

Balancing biases and preserving privacy on balanced faces in the wild

Joseph P Robinson, Can Qin, Yann Henon, Samson Timoner, and Yun Fu

IEEE Transactions on Image Processing, 2023

arXiv Code
TPAMI

Global Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Huan Wang, Yulun Zhang, Can Qin, Luc Van Gool, and Yun Fu

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

PDF Code
CVPR

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M Patel, and Ran Xu

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

PDF Code
ICLR Oral

Image as Set of Points

Xu Ma, Yuqian Zhou, Huan Wang, Can Qin, Bin Sun, Chang Liu, and Yun Fu

In The Eleventh International Conference on Learning Representations (ICLR), 2023

arXiv Code

2022

arXiv

A Close Look at Spatial Modeling: From Attention to Convolution

Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, and Yun Fu

arXiv preprint arXiv:2212.12552, 2022

arXiv Code
ICDM

Making Reconstruction-based Method Great Again for Video Anomaly Detection

Yizhou Wang, Can Qin, Yue Bai, Yi Xu, Xu Ma, and Yun Fu

In 2022 IEEE International Conference on Data Mining (ICDM), 2022

arXiv Code
TIP

Semi-Supervised Domain Adaptive Structure Learning

Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, and Yun Fu

IEEE Transactions on Image Processing, 2022

arXiv Code
KDD

External Knowledge Infusion for Tabular Pre-training Models with Dual-adapters

Can Qin, Sungchul Kim, Handong Zhao, Tong Yu, Ryan A Rossi, and Yun Fu

In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2022

PDF Code
ICLR

Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework

Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu

In International Conference on Learning Representations (ICLR), 2022

Code
ICLR

Learning Efficient Image Super-Resolution Networks via Structure-Regularized Pruning

Yulun Zhang*, Huan Wang*, Can Qin, and Yun Fu

In International Conference on Learning Representations (ICLR), 2022

PDF Code
Nature Comm

Self-directed online machine learning for topology optimization

Changyu Deng, Yizhou Wang, Can Qin, Yun Fu, and Wei Lu

Nature Communications, 2022

Code

2021

FG

The 5th recognizing families in the wild data challenge: Predicting kinship from faces

Joseph P Robinson, Can Qin, Ming Shao, Matthew A Turk, Rama Chellappa, and Yun Fu

In IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021

arXiv
NeurIPS

Slow learning and fast inference: Efficient graph similarity computation via knowledge distillation

Can Qin, Handong Zhao, Lichen Wang, Huan Wang, Yulun Zhang, and Yun Fu

Advances in Neural Information Processing Systems (NeurIPS), 2021

PDF Code
NeurIPS Spotlight

Aligned structured sparsity learning for efficient image super-resolution

Yulun Zhang*, Huan Wang*, Can Qin, and Yun Fu

Advances in Neural Information Processing Systems (NeurIPS), 2021

PDF Code
ICCV

Context reasoning attention network for image super-resolution

Yulun Zhang, Donglai Wei, Can Qin, Huan Wang, Hanspeter Pfister, and Yun Fu

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

PDF
SDM

Contradictory structure learning for semi-supervised domain adaptation

Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, and Yun Fu

In SIAM International Conference on Data Mining (SDM), 2021

arXiv Code
ICLR

Neural pruning via growing regularization

Huan Wang, Can Qin, Yulun Zhang, and Yun Fu

International Conference on Learning Representations (ICLR), 2021

arXiv Code

2020

CVPRW

Face recognition: too bias, or not too bias?

Joseph P Robinson, Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner

In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020

PDF Code
ECCV

Generative view-correlation adaptation for semi-supervised multi-view learning

Yunyu Liu, Lichen Wang, Yue Bai, Can Qin, Zhengming Ding, and Yun Fu

In European Conference on Computer Vision (ECCV), 2020

PDF Code
AAAI

Dual relation semi-supervised multi-label learning

Lichen Wang, Yunyu Liu, Can Qin, Gan Sun, and Yun Fu

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020

PDF
Remote Sensing

Semi-supervised hyperspectral image classification via spatial-regulated self-training

Yue Wu, Guifeng Mu, Can Qin, Qiguang Miao, Wenping Ma, and Xiangrong Zhang

Remote Sensing, 2020

PDF

2019

NeurIPS

PointDAN: A multi-scale 3D domain adaption network for point cloud representation

Can Qin*, Haoxuan You*, Lichen Wang, C-C Jay Kuo, and Yun Fu

Advances in Neural Information Processing Systems (NeurIPS), 2019

PDF Code
ICCVW

Generatively inferential co-training for unsupervised domain adaptation

Can Qin, Lichen Wang, Yulun Zhang, and Yun Fu

In IEEE/CVF International Conference on Computer Vision Workshops, 2019

PDF Code