Noah

noahml

https://researchpod.netlify.app

AI & ML interests

None yet

Recent Activity

commented on a paper 1 day ago

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

commented on a paper 5 days ago

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

commented on a paper 5 days ago

$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

Organizations

None yet

commented on a paper 1 day ago

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Paper • 2601.09195 • Published 7 days ago • 12

commented on 4 papers 5 days ago

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published 7 days ago • 54

$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

Paper • 2601.09274 • Published 6 days ago • 80

MAXS: Meta-Adaptive Exploration with LLM Agents

Paper • 2601.09259 • Published 6 days ago • 91

Controlled Self-Evolution for Algorithmic Code Optimization

Paper • 2601.07348 • Published 8 days ago • 109

commented on 2 papers 7 days ago

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Paper • 2601.05110 • Published 12 days ago • 27

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 9 days ago • 202

upvoted a paper 7 days ago

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 9 days ago • 202

upvoted 2 papers 11 days ago

RelayLLM: Efficient Reasoning via Collaborative Decoding

Paper • 2601.05167 • Published 12 days ago • 28

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published 12 days ago • 200

commented on a paper 11 days ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published 12 days ago • 200