Shan Yang

UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation

Jun 04, 2025

Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning

May 28, 2025

AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

May 28, 2025

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

May 19, 2025

Offline Reinforcement Learning for Microgrid Voltage Regulation

May 15, 2025

Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation

Apr 15, 2025

UniSep: Universal Target Audio Separation with Language Models at Scale

Mar 31, 2025

Contrast-Free Myocardial Scar Segmentation in Cine MRI using Motion and Texture Fusion

Jan 09, 2025

DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions

Jan 08, 2025

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Jan 08, 2025