Aligning Large Multi-Modal Model with Robust Instruction Tuning
Fuxiao Liu*, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang
Description: Despite promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating descriptions that are inconsistent
with the associated image and human instructions. We address this issue by introducing the first large and diverse visual instruction
tuning dataset, named Large-scale Robust Visual (LRV)-Instruction.
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu*, Hao Tan, Chris Tensmeyer
Description: In this project, we apply contrastive learning to determine the document-internal connections between specific figures and body text. Our model can be applied to Adobe Liquid Mode to improve the reading experience on smartphones.
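The exact training objective isn't spelled out here; a common choice for this kind of figure-to-text matching is a symmetric InfoNCE loss over paired embeddings, as in CLIP-style models. A minimal NumPy sketch under that assumption (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def info_nce_loss(fig_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired figure/text embeddings.

    fig_emb, txt_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    f = fig_emb / np.linalg.norm(fig_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = f @ t.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(f))              # positives sit on the diagonal

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average figure-to-text and text-to-figure cross-entropy
    return 0.5 * (xent(logits) + xent(logits.T))
```

Training with this loss pulls each figure's embedding toward its own paragraph and pushes it away from the other paragraphs in the batch.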
COVID-VTS: Fact Extraction and Verification on Short Video Platforms
Fuxiao Liu*, Yaser Yacoob, Abhinav Shrivastava
Description: We introduce COVID-VTS, a fact-checking dataset for short video platforms.
We propose an effective approach to automatically generate large-scale, verifiable claims, both trustworthy and misleading, rather than employing human annotators.
We propose TwtrDetective, a new explainable fact-checking framework for short video platforms.
Visual News: Benchmark and Challenges in News Image Captioning
Fuxiao Liu*, Yinghan Wang, Tianlu Wang, Vicente Ordonez
Description: Introduced VisualNews, the largest and most diverse news image captioning dataset.
Proposed VisualNews-Captioner, which improves CIDEr by 10+ points with fewer parameters than competing methods.