Fuxiao Liu

Hi! I'm a third-year CS Ph.D. student at the University of Maryland, College Park, working with Abhinav Shrivastava and Yaser Yacoob.
I have broad interests in vision and language tasks, including image/video captioning, multimodal semantic alignment, fact-checking, and document understanding. My recent focus is on building customizable large models that follow human intent.
Google Scholar / Semantic Scholar / LinkedIn / GitHub
  1. New! Aligning Large Multi-Modal Model with Robust Instruction Tuning
    Fuxiao Liu*, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang
    Under Review [paper] [code]
    Descriptions: Despite promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating descriptions that are inconsistent with the associated image and human instructions. We address this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction.

  2. DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
    Fuxiao Liu*, Hao Tan, Chris Tensmeyer
    Under Review [paper] [code]
    Descriptions: In this project, we apply contrastive learning to determine the document-internal connections between specific figures and body text. Our model can be applied to Adobe Liquid Mode to improve the reading experience on smartphones.

  3. COVID-VTS: Fact Extraction and Verification on Short Video Platforms
    Fuxiao Liu*, Yaser Yacoob, Abhinav Shrivastava
    EACL 2023 (~Oral presentation) [paper] [code] [bibtex]
    Descriptions: We introduce COVID-VTS, a fact-checking dataset for short video platforms. We propose an effective approach that automatically generates large-scale verifiable claims, both trustworthy and misleading, rather than employing human annotators. We also propose TwtrDetective, a new explainable fact-checking framework for short video platforms.

  4. Visual News: Benchmark and Challenges in News Image Captioning
    Fuxiao Liu*, Yinghan Wang, Tianlu Wang, Vicente Ordonez
    EMNLP 2021 (~Oral presentation) [paper] [code] [bibtex]
    Descriptions: We introduce VisualNews, the largest and most diverse news image captioning dataset. We propose VisualNews-Captioner, which improves CIDEr by 10+ points with fewer parameters than competing methods.

More About Myself
    I've been crazy about basketball since I was a little boy. I love it for its extreme technical and mental demands. No one in the world can become a master without great talent and extensive training. My favorite basketball player is Kobe Bryant, who is noted for his rapid playing style, strong will, and his ambivalent relationship with the sport. I am always captivated by his phenomenal performances on the court.