Can Qin

Email: cqin[at]salesforce.com or qin.ca[at]northeastern.edu

Hello and welcome! I’m currently embracing the exciting world of artificial intelligence as a Research Scientist at Salesforce AI Research. My journey is driven by a deep passion for Generative AI and Multi-modal Learning, with a focus on developing Video/Image to Text (Understanding) and Text to Video/Image (Generation) techniques.

In 2023, I earned my Ph.D. from Northeastern University in Boston, USA, where my research centered on Transfer Learning and Efficient AI.

Before my Ph.D. journey, I obtained my B.E. degree from Xidian University in Xi’an, China, in 2018. This foundation laid the groundwork for my ongoing pursuit of knowledge and innovation.

news

Sep, 2024	Our Medical MLLM paper was accepted by EMNLP 24 (Main)!
Aug, 2024	The xGen-MM (BLIP3) and xGen-VideoSyn-1 were released to the public! We had a paper accepted by TKDE; congrats to Yizhou! I have been invited to serve as a reviewer for Nature Communications.
Jul, 2024	We have one paper accepted by ECCV 24!
Feb, 2024	We have one paper accepted by CVPR 24!
Nov, 2023	Began my journey at Salesforce Research in Palo Alto!
Jun, 2023	I passed my Ph.D. dissertation defense and became Dr. Qin!

selected publications

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, and others

arXiv preprint arXiv:2408.12590, 2024

arXiv Code
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, and others

arXiv preprint arXiv:2408.08872, 2024

arXiv Code
Self-Training Large Language and Vision Assistant for Medical

Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, and Zhiqiang Tao

In The 2024 Conference on Empirical Methods in Natural Language Processing (to appear), 2024

arXiv
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, Ran Xu, and Zhiqiang Tao

European Conference on Computer Vision, 2024

arXiv Code
HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang*, Xinyi Yang*, Yihao Feng*, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, and Ran Xu

IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

arXiv Code
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, and Ran Xu

Advances in Neural Information Processing Systems, 2023

arXiv Code Website
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, and Ran Xu

International Conference on Computer Vision, 2023

arXiv Code Website