Hello, I am Xinxi Zhang, a PhD student in the Department of Computer Science at Rutgers University, advised by Prof. Vladimir Pavlovic. I am also fortunate to work with Prof. Dimitris Metaxas. In the spirit of Richard Feynman's maxim that "what I cannot create, I do not understand," my research focuses on generative modeling, with a current emphasis on advancing one-step diffusion/flow-based models. More recently, I have been extending these ideas to the language domain.

T3D

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Tunyu Zhang*, Xinxi Zhang*, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas

T3D (Trajectory Self-Distillation via DDO) is a self-distillation framework for diffusion large language models (DLLMs) that trains a few-step student by matching the teacher’s generation trajectories. We show theoretically that trajectory-level distillation enables few-step decoding by reducing factorization error, and empirically that T3D consistently outperforms existing few-step decoding baselines.
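
To give a flavor of the idea, below is a minimal PyTorch sketch of trajectory-level distillation for a masked diffusion LLM. Everything here (the `ToyDenoiser`, the timestep interface, the per-token KL as a stand-in for the DDO-based objective) is an illustrative assumption of mine, not the released T3D code.

```python
# Illustrative sketch only: a few-step student matches the teacher's
# per-token denoising distributions at states along the teacher's
# finer-grained generation trajectory. Toy interfaces, not T3D's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion LLM: (token ids, timestep) -> vocab logits."""
    def __init__(self, vocab_size=100, dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.time = nn.Linear(1, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x, t):                           # x: (B, L) ids, t: (B,)
        h = self.tok(x) + self.time(t[:, None, None])  # broadcast to (B, L, dim)
        return self.head(h)                            # (B, L, vocab)

def trajectory_distillation_step(student, teacher, x_t, t):
    """One step: KL from the teacher's to the student's per-token
    categorical distribution at a trajectory state x_t (a stand-in for
    the paper's discriminative objective)."""
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x_t, t), dim=-1)
    log_p_student = F.log_softmax(student(x_t, t), dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

teacher, student = ToyDenoiser(), ToyDenoiser()
student.load_state_dict(teacher.state_dict())   # self-distillation: warm start
x_t = torch.randint(0, 100, (4, 16))            # partially decoded sequences
t = torch.rand(4)
loss = trajectory_distillation_step(student, teacher, x_t, t)
loss.backward()
```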

Paper (Preprint) Code Slides DIFFUSION LANGUAGE MODEL FEW-STEP GENERATION
Re-MeanFlow

Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories

Xinxi Zhang*, Shiwei Tan*, Quang Nguyen, Quan Dao, Ligong Han, Xiaoxiao He, Tunyu Zhang, Alen Mrdovic, Dimitris Metaxas

Re-MeanFlow enables efficient one-step generative modeling by learning mean velocities along rectified trajectories. Combining MeanFlow with trajectory rectification yields complementary strengths that neither technique achieves alone: rectification straightens the underlying trajectories, which in turn makes the mean-velocity field easier to learn. We demonstrate its generality and effectiveness on ImageNet across various settings, where Re-MeanFlow consistently outperforms previous one-step flow-based methods.
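
For intuition, here is a minimal PyTorch sketch of a MeanFlow-style objective trained on a rectified coupling. The network interface, the flat data shape, and the time convention (t = 0 at data, t = 1 at noise) are assumptions for exposition, not the paper's implementation.

```python
# Illustrative MeanFlow-style objective on a rectified coupling (sketch).
# All names are my own toy stand-ins, not the paper's code.
import torch
import torch.nn as nn
from torch.func import jvp

class ToyVelocityNet(nn.Module):
    """u(z, r, t): predicted average velocity over the interval [r, t]."""
    def __init__(self, dim=8):
        super().__init__()
        self.f = nn.Linear(dim + 2, dim)

    def forward(self, z, r, t):
        return self.f(torch.cat([z, r, t], dim=-1))

def meanflow_loss(net, x, e, r, t):
    """x: data, e: noise, shapes (B, D); r, t: (B, 1) with r <= t.
    With a rectified coupling, (x, e) come from a straightened teacher
    trajectory rather than an independent noise-data pairing."""
    z_t = (1 - t) * x + t * e                  # interpolation path (t=0: data)
    v = e - x                                  # instantaneous velocity
    # Total derivative du/dt = v . d_z u + d_t u, via a single JVP.
    u, du_dt = jvp(net, (z_t, r, t),
                   (v, torch.zeros_like(r), torch.ones_like(t)))
    u_target = (v - (t - r) * du_dt).detach()  # MeanFlow identity
    return ((u - u_target) ** 2).mean()

net = ToyVelocityNet()
x, e = torch.randn(4, 8), torch.randn(4, 8)    # rectified pairs in practice
r = torch.rand(4, 1); t = r + (1 - r) * torch.rand(4, 1)
meanflow_loss(net, x, e, r, t).backward()
# One-step sampling after training: x_hat = z1 - net(z1, 0, 1), z1 ~ noise.
```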

Paper (Preprint) Code Slides ONE-STEP GENERATION EFFICIENT TRAINING
SODA

SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models

Xinxi Zhang*, Song Wen*, Ligong Han*, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Vladimir Pavlovic, Dimitris Metaxas

SODA is a spectrum-aware, parameter-efficient fine-tuning framework for diffusion models. We demonstrate its effectiveness on personalizing text-to-image diffusion models: given only a few images of an object, SODA generates novel, text-prompt-controlled scenes of that object after a lightweight fine-tuning stage.
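
To sketch the general recipe behind spectrum-aware fine-tuning: take the SVD of a pretrained weight, keep the singular vectors frozen, and train only a small set of spectral parameters. The toy module below is my own illustration of that idea, not SODA's exact parameterization.

```python
# Toy spectrum-aware adapter (illustrative; not SODA's exact method).
import torch
import torch.nn as nn

class SpectralAdapter(nn.Module):
    """Freeze the singular vectors of a pretrained weight W = U diag(S) Vh
    and fine-tune only a learned shift of the spectrum S."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)         # frozen left singular vectors
        self.register_buffer("S", S)         # frozen base spectrum
        self.register_buffer("Vh", Vh)       # frozen right singular vectors
        self.delta = nn.Parameter(torch.zeros_like(S))  # trainable shift

    def forward(self, x):                    # x: (..., in_features)
        W = self.U @ torch.diag(self.S + self.delta) @ self.Vh
        return x @ W.mT                      # nn.Linear-style matmul, no bias

pretrained = nn.Linear(16, 32, bias=False)
adapter = SpectralAdapter(pretrained.weight.detach())
y = adapter(torch.randn(4, 16))              # only `delta` receives gradients
```

Compared with a purely low-rank additive update in the style of LoRA, tuning in the weight's own spectral basis is the broad intuition behind spectrum-aware adaptation; see the paper for SODA's actual orthogonal decomposition.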

Paper (WACV 2025) DIFFUSION PERSONALIZATION PARAMETER-EFFICIENT FINE-TUNING