arxiv:2602.05000

EntRGi: Entropy Aware Reward Guidance for Diffusion Language Models

Published on Feb 4 · Submitted by Atula Tejaswi on Feb 5

AI-generated summary

Discrete diffusion language models use entropy-aware reward guidance to improve test-time adaptation by dynamically regulating gradient feedback from reward models.

Abstract

Reward guidance has been applied with great success in the test-time adaptation of continuous diffusion models; it updates each denoising step using the gradients from a downstream reward model. We study reward guidance for discrete diffusion language models, where one cannot differentiate through the natural outputs of the model because they are discrete tokens. Existing approaches either replace these discrete tokens with continuous relaxations, or employ techniques like the straight-through estimator. In this work, we show the downsides of both of these methods. The former degrades gradient feedback because the reward model has never been trained on continuous inputs. The latter involves incorrect optimization because the gradient evaluated at discrete tokens is used to update continuous logits. Our key innovation is to go beyond this tradeoff by introducing a novel mechanism, EntRGi (Entropy Aware Reward Guidance), that dynamically regulates the gradients from the reward model. By modulating the continuous relaxation using the model's confidence, our approach substantially improves reward guidance while providing reliable inputs to the reward model. We empirically validate our approach on a 7B-parameter diffusion language model across three diverse reward models and three multi-skill benchmarks, showing consistent improvements over state-of-the-art methods.
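
To make the mechanism concrete, below is a rough sketch, not the paper's implementation, of how an entropy-aware gate might blend hard one-hot tokens with the soft relaxation before the input reaches the reward model. The normalized-entropy gate, the straight-through mixing rule, ToyRewardModel, and guidance_scale are all illustrative assumptions.

```python
# Illustrative sketch only: entropy-gated reward guidance for one denoising step.
# The gate, mixing rule, and ToyRewardModel are assumptions, not the paper's code.
import math

import torch
import torch.nn.functional as F


class ToyRewardModel(torch.nn.Module):
    """Stand-in reward model that scores soft mixtures over the vocabulary."""

    def __init__(self, vocab_size: int, dim: int = 32):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, soft_tokens: torch.Tensor) -> torch.Tensor:
        # soft_tokens: (batch, seq, vocab) mixture weights over token embeddings.
        emb = soft_tokens @ self.embed.weight           # (batch, seq, dim)
        return self.score(emb.mean(dim=1)).squeeze(-1)  # (batch,)


def entropy_gated_guidance(logits: torch.Tensor,
                           reward_model: torch.nn.Module,
                           guidance_scale: float = 1.0) -> torch.Tensor:
    """One guidance step: nudge the logits along the reward gradient, gating the
    reward model's input between hard one-hot tokens and the soft relaxation
    by per-position confidence (1 - normalized entropy)."""
    logits = logits.detach().requires_grad_(True)
    probs = F.softmax(logits, dim=-1)                          # soft relaxation
    vocab = logits.size(-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # (batch, seq)
    confidence = (1.0 - entropy / math.log(vocab)).unsqueeze(-1)

    hard = F.one_hot(probs.argmax(-1), vocab).to(probs.dtype)
    # Confident positions feed (straight-through) hard tokens, which look like the
    # discrete inputs the reward model was trained on; uncertain positions keep
    # the soft relaxation so the gradient signal stays informative.
    reward_input = confidence * (hard + probs - probs.detach()) \
        + (1.0 - confidence) * probs

    reward = reward_model(reward_input).sum()
    (grad,) = torch.autograd.grad(reward, logits)
    return logits.detach() + guidance_scale * grad


if __name__ == "__main__":
    torch.manual_seed(0)
    rm = ToyRewardModel(vocab_size=50)
    step_logits = torch.randn(2, 8, 50)       # logits at one denoising step
    guided = entropy_gated_guidance(step_logits, rm)
    print(guided.shape)                       # torch.Size([2, 8, 50])
```

The intent of the gate, under these assumptions, is that the reward model mostly sees near-discrete inputs where the diffusion model is already confident, while gradients are still taken with respect to the continuous logits everywhere.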

Community

Paper submitter

We introduce a novel mechanism that dynamically regulates the gradients from a reward model for guiding discrete diffusion language models.
[Figure: sample output comparison, base model vs. EntRGi, for the prompt "explain why the sky is blue to a 5 year old".]
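
For orientation, here is a minimal sketch of where a per-step reward-guidance update sits in a discrete-diffusion sampling loop, using the plain continuous-relaxation baseline the abstract critiques; the toy denoiser, toy reward, step count, and step_size are placeholders, not the paper's sampler.

```python
# Minimal sketch of per-step reward guidance in a discrete-diffusion sampler.
# Everything here (denoiser, reward, schedule) is a placeholder for illustration.
import torch
import torch.nn.functional as F


@torch.no_grad()
def sample_with_guidance(denoiser, reward_fn, seq_len=16, vocab=50,
                         steps=8, step_size=0.5):
    logits = torch.zeros(1, seq_len, vocab)           # start from uniform logits
    for t in reversed(range(steps)):
        logits = denoiser(logits, t)                  # model's denoising update
        with torch.enable_grad():                     # reward gradient w.r.t. logits
            g = logits.detach().requires_grad_(True)
            reward = reward_fn(F.softmax(g, dim=-1))  # soft (relaxed) reward input
            (grad,) = torch.autograd.grad(reward, g)
        logits = logits + step_size * grad            # guided update at this step
    return logits.argmax(-1)                          # decode discrete tokens


if __name__ == "__main__":
    torch.manual_seed(0)
    toy_denoiser = lambda lg, t: lg + 0.1 * torch.randn_like(lg)
    toy_reward = lambda soft: soft[..., 0].mean()     # toy reward: mass on token 0
    print(sample_with_guidance(toy_denoiser, toy_reward).shape)  # torch.Size([1, 16])
```

An entropy-gated variant along the lines of the earlier sketch would replace the plain softmax input with the confidence-weighted mixture before calling the reward model.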

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 0