Hello, I am Xinxi Zhang, a PhD student in the Department of Computer Science at Rutgers University, advised by Prof. Vladimir Pavlovic; I am also fortunate to work with Prof. Dimitris Metaxas. Echoing Richard Feynman's maxim that "what I cannot create, I do not understand," my research focuses on generative modeling, with a current emphasis on advancing one-step diffusion and flow-based models. More recently, I have been extending these ideas to the language domain.

T3D

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Tunyu Zhang*, Xinxi Zhang*, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas

T3D (Trajectory Self-Distillation via DDO) is a self-distillation framework for diffusion large language models (DLLMs) that trains a few-step student by matching the teacher’s generation trajectories. We show theoretically that trajectory-level distillation enables few-step decoding by reducing factorization error, and empirically that T3D consistently outperforms existing few-step decoding baselines.
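The "factorization error" the abstract refers to can be illustrated with a toy example: when a diffusion language model unmasks several tokens in one step, it samples them from per-position marginals, i.e. from a product distribution, which loses inter-token dependencies. The sketch below (hypothetical toy distributions, not the paper's formulation) measures this gap as a KL divergence:

```python
import numpy as np

def factorization_error(joint):
    """KL(joint || product of its marginals) for a 2-token joint distribution."""
    p1 = joint.sum(axis=1)          # marginal of token 1
    p2 = joint.sum(axis=0)          # marginal of token 2
    prod = np.outer(p1, p2)         # what one-step parallel decoding samples from
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

# Correlated token pair: the product of marginals cannot represent it.
correlated = np.array([[0.45, 0.05],
                       [0.05, 0.45]])
# Independent token pair: the product of marginals is exact.
independent = np.outer([0.5, 0.5], [0.5, 0.5])

print(factorization_error(correlated))   # > 0: parallel decoding loses information
print(factorization_error(independent))  # ~ 0: one-step decoding is exact here
```

Fewer decoding steps mean more tokens decoded jointly per step, so this error grows; trajectory-level distillation targets exactly this gap.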

arXiv · Code · Slides | DIFFUSION LANGUAGE MODEL · FEW-STEP GENERATION
Re-MeanFlow

Overcoming the Curvature Bottleneck in MeanFlow

Xinxi Zhang*, Shiwei Tan*, Quang Nguyen, Quan Dao, Ligong Han, Xiaoxiao He, Tunyu Zhang*, Chengzhi Mao, Dimitris Metaxas, Vladimir Pavlovic

Re-MeanFlow is a lightweight, data-free self-distillation framework for one-step flow generation that learns the mean-velocity field on rectified couplings from a pretrained model. By straightening generative trajectories, Re-MeanFlow smooths the rugged loss landscape of MeanFlow and delivers markedly stronger sample quality with much higher training efficiency.
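Why straightening helps can be seen from the average-velocity target u(r, t) = 1/(t − r) · ∫ᵣᵗ v(s) ds that MeanFlow-style models learn: on a straight (rectified) trajectory the instantaneous velocity is constant, so the mean velocity is interval-independent and trivial to fit, while a curved trajectory makes it vary with the interval. A toy 1-D sketch (hypothetical trajectories, illustration only):

```python
import numpy as np

def mean_velocity(traj, r_idx, t_idx, ts):
    """Numerical average velocity of a sampled trajectory between times ts[r_idx], ts[t_idx]."""
    return (traj[t_idx] - traj[r_idx]) / (ts[t_idx] - ts[r_idx])

ts = np.linspace(0.0, 1.0, 101)
x0, x1 = -1.0, 2.0

# Rectified (straight-line) interpolation: v(s) = x1 - x0 everywhere.
straight = (1 - ts) * x0 + ts * x1
# Curved trajectory: same endpoints, but a sinusoidal detour.
curved = (1 - ts) * x0 + ts * x1 + np.sin(np.pi * ts)

u_straight = [mean_velocity(straight, 0, k, ts) for k in (25, 50, 100)]
u_curved = [mean_velocity(curved, 0, k, ts) for k in (25, 50, 100)]
print(u_straight)  # interval-independent (all ≈ 3.0): easy one-step target
print(u_curved)    # varies with the interval: harder loss landscape
```

Re-MeanFlow's use of rectified couplings from a pretrained model pushes the trajectories toward the straight case, which is what smooths the MeanFlow objective.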

arXiv · Project Page · Code · Slides | ONE-STEP GENERATION · EFFICIENT TRAINING
SODA

SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models

Xinxi Zhang*, Song Wen*, Ligong Han*, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Vladimir Pavlovic, Dimitris Metaxas

SODA is a spectrum-aware, parameter-efficient fine-tuning framework for diffusion models. We demonstrate its effectiveness on the task of personalizing text-to-image diffusion models: given only a few images of an object, SODA can generate novel scenes of the object controlled by text prompts, using only a lightweight fine-tuning stage.

Paper (WACV 2025) | DIFFUSION PERSONALIZATION · PARAMETER-EFFICIENT FINE-TUNING