# kf-deberta-gen

Generative Diffusion BERT - a Korean diffusion-based generative language model
## Model Description

This model is an experimental proof-of-concept (PoC) generative model built on kakaobank/kf-deberta-base and chat fine-tuned with a Discrete Diffusion objective.

⚠️ Caution: this model is at the proof-of-concept stage. Generation quality is unstable, and issues such as repetitive output may occur.
## Key Features

| Item | Details |
|---|---|
| Base model | kakaobank/kf-deberta-base (DeBERTa-v2) |
| Training method | Masked Diffusion Language Model (MDLM) |
| Noise schedule | Cosine |
| Generation method | Iterative Denoising (Confidence-based) |
### Differences from standard MLM

- Standard MLM: fixed 15% masking → only fill-in-the-blank prediction is possible
- Diffusion: masking ratio varies continuously from 0% to 100% → full-sequence generation is possible (sketched below)
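A minimal sketch of the two masking regimes, assuming simple uniform random corruption of the input; the `mask_tokens` helper and the `MASK_ID` constant are illustrative only and are not taken from the training code:

```python
import torch

MASK_ID = 4  # placeholder id; the real tokenizer defines mask_token_id

def mask_tokens(input_ids: torch.Tensor, ratio: float) -> torch.Tensor:
    """Replace roughly a `ratio` fraction of positions with [MASK] (illustrative)."""
    noisy = input_ids.clone()
    noisy[torch.rand(input_ids.shape) < ratio] = MASK_ID
    return noisy

ids = torch.arange(10, 30)                     # stand-in for a tokenized sentence
mlm_view = mask_tokens(ids, 0.15)              # standard MLM: fixed 15% of positions
diffusion_view = mask_tokens(ids, torch.rand(()).item())  # diffusion: any ratio in [0, 1)
```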
## Usage

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("solonsophy/kf-deberta-gen")
model = AutoModelForMaskedLM.from_pretrained("solonsophy/kf-deberta-gen")
```
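Because the checkpoint is exposed as a plain masked LM, a quick fill-mask sanity check works as usual; predictions may look degraded since the weights were fine-tuned for diffusion-style generation, and the example sentence below is only an illustration:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
# Any Korean sentence containing the tokenizer's mask token will do.
print(fill(f"대한민국의 수도는 {tokenizer.mask_token}입니다."))
```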
### Diffusion Generation (Iterative Denoising)
```python
import torch
import torch.nn.functional as F

def generate_diffusion(model, tokenizer, question, num_steps=15, max_answer_len=80):
    model.eval()
    device = next(model.parameters()).device
    MASK_ID = tokenizer.mask_token_id
    CLS_ID = tokenizer.cls_token_id
    SEP_ID = tokenizer.sep_token_id

    # Tokenize the question
    q_tokens = tokenizer.encode(question, add_special_tokens=False)[:100]

    # Initial state: [CLS] Q [SEP] [MASK]*N
    input_ids = [CLS_ID] + q_tokens + [SEP_ID] + [MASK_ID] * max_answer_len
    input_ids = torch.tensor([input_ids[:256]], device=device)
    answer_start = len(q_tokens) + 2

    # Iterative denoising
    for step in range(num_steps):
        with torch.no_grad():
            logits = model(input_ids).logits

        mask_pos = (input_ids[0, answer_start:] == MASK_ID).nonzero().squeeze(-1) + answer_start
        if len(mask_pos) == 0:
            break

        # Confidence-based unmasking
        mask_logits = logits[0, mask_pos] / 0.8  # temperature
        probs = F.softmax(mask_logits, dim=-1)
        tokens = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(1, tokens.unsqueeze(-1)).squeeze(-1)

        # Commit the k most confident predictions this step
        k = max(1, len(mask_pos) // (num_steps - step))
        top_idx = conf.topk(k).indices
        input_ids[0, mask_pos[top_idx]] = tokens[top_idx]

    # Extract the answer region
    answer = input_ids[0, answer_start:]
    answer = answer[(answer != MASK_ID) & (answer != tokenizer.pad_token_id)]
    return tokenizer.decode(answer, skip_special_tokens=True)

# Usage example
answer = generate_diffusion(model, tokenizer, "인공지능이란 무엇인가요?")
print(answer)
```
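The per-step commit budget, `k = max(1, len(mask_pos) // (num_steps - step))`, spreads unmasking roughly evenly across the denoising steps, so the answer region empties out exactly by the final step. A quick standalone trace of that schedule for the default settings (80 answer tokens, 15 steps):

```python
# Trace of the unmasking budget used by generate_diffusion's k schedule.
num_steps, remaining = 15, 80
for step in range(num_steps):
    if remaining == 0:
        break
    k = max(1, remaining // (num_steps - step))
    print(f"step {step:2d}: commit {k} of {remaining} masked positions")
    remaining -= k
```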
## Training Information

### Chat Fine-tuning Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch Size | 64 |
| Learning Rate | 5e-5 |
| Max Length | 256 |
| Q Max Length | 100 |
| A Max Length | 153 |
| Noise Schedule | Cosine |
| Masking Ratio | 0% ~ 100% |
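The card does not include the training code, but a common MDLM-style reading of the "Cosine" schedule together with the 0%~100% masking range is that each training example's masking ratio is obtained by sampling a timestep and mapping it through a cosine curve. The function below is an assumption for illustration only, not the actual implementation:

```python
import math
import random

def cosine_mask_ratio(t: float) -> float:
    """Assumed cosine noise schedule: ratio grows from 0 at t=0 to 1 at t=1."""
    return 1.0 - math.cos(0.5 * math.pi * t)

# Per training example: sample t uniformly, mask that fraction of answer tokens.
t = random.random()
print(f"t={t:.2f} -> mask ratio {cosine_mask_ratio(t):.0%}")
```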
### Training Data

| Dataset | License |
|---|---|
| OpenLab-NLP/tiny-singleturn-chat-ko | MIT |
| davidkim205/kollm-converations | Apache-2.0 |
| heegyu/hh-rlhf-ko | MIT |
| nlpai-lab/kullm-v2 | Apache-2.0 |
| heegyu/OIG-small-chip2-ko | Apache-2.0 |
| AIdenU/orca_dpo_data_ko | Apache-2.0 |
## Acknowledgments

This model was trained using computing resources provided by DDOK.AI.
## License

This model is released under the Apache-2.0 license.

Base model (kakaobank/kf-deberta-base): MIT
## Citation

```bibtex
@misc{kf-deberta-gen,
  author    = {Hong Seongmin},
  title     = {Generative Diffusion BERT: Korean Discrete Diffusion Language Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/solonsophy/kf-deberta-gen}
}
```