Ge Zhang

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

May 29, 2025
Via arXiv

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

May 27, 2025
Via arXiv

LIFEBench: Evaluating Length Instruction Following in Large Language Models

May 22, 2025
Via arXiv

P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

May 21, 2025
Via arXiv

General-Reasoner: Advancing LLM Reasoning Across All Domains

May 21, 2025
Via arXiv

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025
Via arXiv

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

May 20, 2025
Via arXiv

Hybrid-Emba3D: Geometry-Aware and Cross-Path Feature Hybrid Enhanced State Space Model for Point Cloud Classification

May 16, 2025
Via arXiv

Is Grokking a Computational Glass Relaxation?

May 16, 2025
Via arXiv

AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection

May 12, 2025
Via arXiv