# 🌀 kf-deberta-gen

Generative Diffusion BERT: a Korean diffusion-based generative language model



๋ชจ๋ธ ์„ค๋ช…

์ด ๋ชจ๋ธ์€ kakaobank/kf-deberta-base๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ Discrete Diffusion ๋ฐฉ์‹์œผ๋กœ chat fine-tuning์„ ์ˆ˜ํ–‰ํ•œ ์‹คํ—˜์ (PoC) ์ƒ์„ฑ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

โš ๏ธ ์ฃผ์˜: ์ด ๋ชจ๋ธ์€ ๊ฐœ๋… ๊ฒ€์ฆ(Proof of Concept) ๋‹จ๊ณ„์ž…๋‹ˆ๋‹ค. ์ƒ์„ฑ ํ’ˆ์งˆ์ด ๋ถˆ์•ˆ์ •ํ•˜๋ฉฐ, ๋ฐ˜๋ณต ์ƒ์„ฑ ๋“ฑ์˜ ๋ฌธ์ œ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

## Key Features

| Item | Details |
|------|---------|
| Base model | kakaobank/kf-deberta-base (DeBERTa-v2) |
| Training objective | Masked Diffusion Language Model (MDLM) |
| Noise schedule | Cosine |
| Generation | Iterative denoising (confidence-based) |

๊ธฐ์กด MLM๊ณผ์˜ ์ฐจ์ด์ 

๊ธฐ์กด MLM: 15% ๊ณ ์ • ๋งˆ์Šคํ‚น โ†’ ๋นˆ์นธ ์ฑ„์šฐ๊ธฐ๋งŒ ๊ฐ€๋Šฅ
Diffusion: 0~100% ์—ฐ์† ๋งˆ์Šคํ‚น โ†’ ์ „์ฒด ์‹œํ€€์Šค ์ƒ์„ฑ ๊ฐ€๋Šฅ
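The continuous masking ratio above is drawn from the noise schedule. As a rough sketch (the exact functional form used by this model is an assumption; a common MDLM-style cosine schedule is shown), the fraction of masked tokens at diffusion time `t` could look like:

```python
import math

def cosine_mask_ratio(t: float) -> float:
    """Fraction of tokens masked at diffusion time t in [0, 1].

    A common cosine form: 0 at t=0 (clean text), 1 at t=1 (fully
    masked). The schedule actually used by this model may differ.
    """
    return 1.0 - math.cos(math.pi * t / 2)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f} -> mask ratio {cosine_mask_ratio(t):.3f}")
```

Training across the full 0~100% range is what lets the model denoise an all-mask sequence into a complete answer, rather than only filling isolated blanks.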

## Usage

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("solonsophy/kf-deberta-gen")
model = AutoModelForMaskedLM.from_pretrained("solonsophy/kf-deberta-gen")
```

Diffusion ์ƒ์„ฑ (Iterative Denoising)

```python
import torch
import torch.nn.functional as F

def generate_diffusion(model, tokenizer, question, num_steps=15, max_answer_len=80):
    model.eval()
    device = next(model.parameters()).device

    MASK_ID = tokenizer.mask_token_id
    CLS_ID = tokenizer.cls_token_id
    SEP_ID = tokenizer.sep_token_id

    # Tokenize the question (truncated to 100 tokens)
    q_tokens = tokenizer.encode(question, add_special_tokens=False)[:100]

    # Initial sequence: [CLS] Q [SEP] [MASK] * N
    input_ids = [CLS_ID] + q_tokens + [SEP_ID] + [MASK_ID] * max_answer_len
    input_ids = torch.tensor([input_ids[:256]], device=device)
    answer_start = len(q_tokens) + 2

    # Iterative denoising
    for step in range(num_steps):
        with torch.no_grad():
            logits = model(input_ids).logits

        mask_pos = (input_ids[0, answer_start:] == MASK_ID).nonzero().squeeze(-1) + answer_start
        if len(mask_pos) == 0:
            break

        # Confidence-based unmasking: sample a token for every masked
        # position, then commit only the most confident predictions
        mask_logits = logits[0, mask_pos] / 0.8  # temperature
        probs = F.softmax(mask_logits, dim=-1)
        tokens = torch.multinomial(probs, 1).squeeze(-1)
        conf = probs.gather(1, tokens.unsqueeze(-1)).squeeze(-1)

        k = max(1, len(mask_pos) // (num_steps - step))
        top_idx = conf.topk(k).indices
        input_ids[0, mask_pos[top_idx]] = tokens[top_idx]

    # Extract the answer, dropping leftover mask/pad tokens
    answer = input_ids[0, answer_start:]
    answer = answer[(answer != MASK_ID) & (answer != tokenizer.pad_token_id)]
    return tokenizer.decode(answer, skip_special_tokens=True)

# Example ("What is artificial intelligence?")
answer = generate_diffusion(model, tokenizer, "์ธ๊ณต์ง€๋Šฅ์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€์š”?")
print(answer)
```
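To see the confidence-based selection in isolation, here is a toy version of the inner unmask step using random logits in place of model output (no model download needed; the 0.8 temperature matches the function above, while the sizes here are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 masked positions, vocab size 10
probs = F.softmax(logits / 0.8, dim=-1)   # temperature-scaled distribution
tokens = torch.multinomial(probs, 1).squeeze(-1)          # one sample per position
conf = probs.gather(1, tokens.unsqueeze(-1)).squeeze(-1)  # probability of each sample
k = 2
top_idx = conf.topk(k).indices            # commit only the k most confident
print(tokens.tolist(), top_idx.tolist())
```

Each denoising step repeats this: every masked position gets a candidate token, but only the highest-confidence candidates are written back, so uncertain positions stay masked and get re-predicted with more context on later steps.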

## Training Details

### Chat Fine-tuning Configuration

ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’
Epochs 3
Batch Size 64
Learning Rate 5e-5
Max Length 256
Q Max Length 100
A Max Length 153
Noise Schedule Cosine
Masking Ratio 0% ~ 100%
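The 0% ~ 100% masking ratio means each training example is corrupted to a random noise level before the model predicts the masked tokens. A minimal sketch of that corruption step (the function name and the `-100` ignore-index convention are assumptions; only answer tokens are masked, matching the Q/A layout above):

```python
import torch

def corrupt_for_training(input_ids, answer_start, mask_id, t):
    """Mask a cosine-scheduled fraction of the answer tokens at time t in [0, 1].

    Returns the noisy input and labels: -100 everywhere except masked
    positions, so cross-entropy is computed only on what was masked.
    """
    ratio = 1.0 - torch.cos(torch.tensor(t) * torch.pi / 2)
    noisy = input_ids.clone()
    labels = torch.full_like(input_ids, -100)
    mask = torch.rand_like(input_ids, dtype=torch.float) < ratio
    mask[:, :answer_start] = False            # never mask the question
    labels[mask] = input_ids[mask]
    noisy[mask] = mask_id
    return noisy, labels
```

At `t = 0` nothing is masked and at `t = 1` the whole answer is masked; sampling `t` uniformly per batch exposes the model to the full range of noise levels it will see at generation time.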

ํ•™์Šต ๋ฐ์ดํ„ฐ

๋ฐ์ดํ„ฐ์…‹ ๋ผ์ด์„ ์Šค
OpenLab-NLP/tiny-singleturn-chat-ko MIT
davidkim205/kollm-converations Apache-2.0
heegyu/hh-rlhf-ko MIT
nlpai-lab/kullm-v2 Apache-2.0
heegyu/OIG-small-chip2-ko Apache-2.0
AIdenU/orca_dpo_data_ko Apache-2.0

๐Ÿ™ Acknowledgments

๋ณธ ๋ชจ๋ธ์€ DDOK.AI๋กœ๋ถ€ํ„ฐ ์ œ๊ณต๋ฐ›์€ ์ปดํ“จํŒ… ๋ฆฌ์†Œ์Šค๋ฅผ ํ™œ์šฉํ•˜์—ฌ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


๋ผ์ด์„ ์Šค

์ด ๋ชจ๋ธ์€ Apache-2.0 ๋ผ์ด์„ ์Šค๋กœ ๋ฐฐํฌ๋ฉ๋‹ˆ๋‹ค.

๊ธฐ๋ฐ˜ ๋ชจ๋ธ (kakaobank/kf-deberta-base): MIT


## Citation

```bibtex
@misc{kf-deberta-gen,
  author = {Hong Seongmin},
  title = {Generative Diffusion BERT: Korean Discrete Diffusion Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/solonsophy/kf-deberta-gen}
}
```