
sungyub kim

sungyub

AI & ML interests

None yet

Recent Activity

liked a Space 28 days ago

HuggingFaceFW/finephrase

upvoted an article 2 months ago

We Got Claude to Build CUDA Kernels and teach open models!

upvoted an article 2 months ago

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective


Organizations

None yet

liked a Space 28 days ago

The Synthetic Data Playbook: Generating Trillions of the Finest Tokens

📝 • 215

Explore synthetic data experiments as an interactive bookshelf

upvoted 2 articles 2 months ago

Article

We Got Claude to Build CUDA Kernels and teach open models!

Jan 28 • 152

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Jan 27

updated a collection 3 months ago

VERL QA Datasets

Collection

High-quality QA generation datasets in VERL format: document QA, table reasoning, and multi-hop reasoning tasks. • 6 items • Updated Mar 2

updated a dataset 3 months ago

sungyub/qa-verl-unified

Updated Jan 8 • 86.4k • 37

published a dataset 3 months ago

sungyub/qa-verl-unified

Updated Jan 8 • 86.4k • 37

updated 2 datasets 3 months ago

sungyub/docqa-rl-verl

Updated Jan 8 • 3.6k • 52

sungyub/code-verl-unified

Updated Jan 8 • 959k • 2.41k • 1

liked 2 Spaces 4 months ago

FineWeb: decanting the web for the finest text data at scale

🍷 • 1.32k

Read a detailed overview of the FineWeb web‑scale text dataset

Evaluation Guidebook

📝 • 299

Explore LLM benchmark trends over time

updated a dataset 5 months ago

sungyub/codev-r1-verl

Updated Nov 11, 2025 • 3.13k • 29

upvoted an article 5 months ago

Article

Let's talk about LLM evaluation

May 23, 2024 • 207

liked 2 Spaces 5 months ago

The Ultra-Scale Playbook

🌌 • 3.76k

The ultimate guide to training LLMs on large GPU clusters

The Smol Training Playbook

📚 • 3.08k

The secrets to building world-class LLMs

updated 6 datasets 5 months ago
