Xiang Bai

Huazhong University of Science and Technology

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Jun 05, 2025
Via arXiv

TokBench: Evaluating Your Visual Tokenizer before Visual Generation

May 26, 2025
Via arXiv

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?

May 16, 2025
Via arXiv

Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving

May 13, 2025
Via arXiv

Tetrahedron-Net for Medical Image Registration

May 07, 2025
Via arXiv

Visual Text Processing: A Comprehensive Review and Unified Evaluation

Apr 30, 2025
Via arXiv

SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting

Apr 14, 2025
Via arXiv

A Unified Image-Dense Annotation Generation Model for Underwater Scenes

Mar 27, 2025
Via arXiv

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Mar 25, 2025
Via arXiv

Generative Compositor for Few-Shot Visual Information Extraction

Mar 21, 2025
Via arXiv