Satvik Dixit

My research focuses on building efficient audio-centric models and evaluation methods for evolving real-world applications and workflows. I've worked on:

Efficient audio-language models (Mellow)
Evaluation metrics that align with human judgments (AURA Score, MACE)
Benchmarks for audio generation and understanding (FoleyBench, MMAU-Pro)

Currently, I am a founding engineer at a YC-backed startup working on voice agent evals. Previously, I was a master's student at CMU advised by Professors Chris Donahue and Bhiksha Raj, where I worked on models and evaluation methods for audio-based systems. I have also done research internships at MIT and EPFL. Before that, I did my undergrad in electrical engineering at IIT Delhi.

News

Nov 2025: FoleyBench accepted to ICASSP 2026
Nov 2025: MMAU-Pro accepted to AAAI 2026
Sept 2025: Mellow accepted to NeurIPS 2025
July 2025: Work on Morphing accepted to WASPAA 2025
Jan 2025: MACE accepted to ICASSP SALMA 2025

Selected Publications and Preprints

Mellow: a small audio language model for reasoning

Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj

NeurIPS 2025

Paper Code

FoleyBench: A Benchmark For Video-to-Audio Models

Satvik Dixit, Koichi Saito, Zhi Zhong, Yuki Mitsufuji, Chris Donahue

Submitted to ICASSP 2026

Paper Page

AURA Score: A Metric For Holistic Audio Question Answering Evaluation

Satvik Dixit, Soham Deshmukh, Bhiksha Raj

Submitted to ICASSP 2026

Paper Code

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers

Satvik Dixit, Laurie Heller, Chris Donahue

NeurIPS 2024 Audio Imagination Workshop

Paper

Contact

Email Google Scholar Semantic Scholar LinkedIn