metadata
language:
  - code
library_name: optimum
pipeline_tag: text-classification
tags:
  - code-detection
  - safety
  - onnx
  - hikmaai
license: apache-2.0

hikmaai-codebert-base-code-detection

A binary classifier that detects whether the input contains source code, fine-tuned from microsoft/codebert-base by HikmaAI.

Model Description

  • Task: Binary classification (label 0 = safe, label 1 = threat; here "threat" means the input contains code)
  • Base model: microsoft/codebert-base
  • Export formats: ONNX FP32 + INT8 dynamic quantization

Performance

See model_card.json for detailed metrics.

Optimized decision threshold: 0.9950 (validation recall: 0.9984)

Usage (ONNX)

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model = ORTModelForSequenceClassification.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="onnx/int8",
)
tokenizer = AutoTokenizer.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="tokenizer",
)

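# Tokenize one input, truncating to the model's maximum sequence length.
inputs = tokenizer(
    "def hello():\n    print('hi')",
    return_tensors="pt",
    truncation=True,
)
outputs = model(**inputs)
# outputs.logits has shape (1, 2): [safe_score, threat_score] (raw logits, not probabilities)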
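The logits above still need to be turned into a decision. The following is a minimal sketch, assuming the optimized threshold of 0.9950 from the Performance section applies to the softmax probability of the threat class; the model card does not state this explicitly, and model_card.json may say otherwise. The names THRESHOLD, threat_probability, and contains_code are hypothetical helpers, not part of the published artifacts.

```python
import math

# Assumption: the 0.9950 threshold is applied to the softmax
# probability of class 1 ("threat" = code detected).
THRESHOLD = 0.9950


def threat_probability(logits):
    """Return the softmax probability of the threat class for one [safe, threat] logit pair."""
    safe_logit, threat_logit = logits
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(safe_logit, threat_logit)
    e_safe = math.exp(safe_logit - m)
    e_threat = math.exp(threat_logit - m)
    return e_threat / (e_safe + e_threat)


def contains_code(logits, threshold=THRESHOLD):
    """Flag the input as code when the threat probability reaches the threshold."""
    return threat_probability(logits) >= threshold
```

With the usage snippet above, a single input could be checked with contains_code(outputs.logits[0].tolist()). A threshold this high means only very confident predictions are flagged as code.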

Training

  • Epochs: 5
  • Learning rate: 2e-05
  • Batch size: 16
  • Class weights: [1.0, 2.0]
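To illustrate what class weights of [1.0, 2.0] do, here is a minimal sketch of a class-weighted cross-entropy loss. It assumes the weights scale the per-example loss by the weight of the true class and that the batch loss is the weighted mean, which is the common convention; the model card does not say which loss function or reduction was actually used. The function name weighted_cross_entropy is hypothetical.

```python
import math

# Class weights from the Training section: errors on class 1 (code detected)
# count twice as much as errors on class 0.
CLASS_WEIGHTS = [1.0, 2.0]


def weighted_cross_entropy(logits, labels, class_weights=CLASS_WEIGHTS):
    """Weighted mean of per-example cross-entropy over a batch of logit rows."""
    total_loss = 0.0
    total_weight = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp of the logits.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-likelihood of the true class.
        nll = log_sum_exp - row[label]
        weight = class_weights[label]
        total_loss += weight * nll
        total_weight += weight
    return total_loss / total_weight
```

Under this assumption, a missed code sample contributes twice as much to the loss as a false alarm on non-code text, which pushes the classifier toward higher recall on the code class.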

License

Apache-2.0

Citation

@misc{hikmaai-code_detection-2026,
  title={hikmaai-codebert-base-code-detection},
  author={HikmaAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
}