metadata
license: llama2
language:
  - en
tags:
  - tokenizer
  - llama2
  - infigram

Llama-2 Tokenizer (Mirror for KnowRL Project)

This is a mirror of the tokenizer files from meta-llama/Llama-2-7b-hf, provided as a public, ungated alternative for users who cannot access the original gated repo.

Why this mirror exists

The KnowRL project's QuCo reward function uses an Infini-gram index built with the Llama-2 tokenizer. To query the index, the exact same tokenizer is required. Since meta-llama/Llama-2-7b-hf is gated, users without approved access cannot run QuCo.

This repo contains only the tokenizer files (no model weights):

  • tokenizer.json — fast tokenizer
  • tokenizer.model — SentencePiece model
  • tokenizer_config.json — tokenizer configuration
  • special_tokens_map.json — special token mappings
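
If the files are downloaded to a local folder, a quick completeness check can be sketched as follows. The helper name `missing_tokenizer_files` and the directory argument are hypothetical and not part of this repo; the file names are the four listed above.

```python
from pathlib import Path

# The four tokenizer files this mirror ships.
EXPECTED_FILES = (
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_tokenizer_files(directory):
    """Return the expected file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all four files are present in the folder.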

Usage

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("UIC-R2-lab/llama2-tokenizer")
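
To turn a text query into token IDs for an index lookup, something like the sketch below may help. The helper names `load_tokenizer` and `encode_query` are hypothetical, and leaving out BOS/EOS markers is an assumption about how the index was built; check the KnowRL and Infini-gram documentation for the exact settings QuCo expects.

```python
def load_tokenizer(repo_id="UIC-R2-lab/llama2-tokenizer"):
    """Load the mirrored tokenizer (downloads the files on first call)."""
    # Imported lazily so encode_query below has no hard dependency.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode_query(tokenizer, text):
    """Return the token IDs for `text` without special tokens.

    Assumption: the index stores raw token IDs, so BOS/EOS markers
    are left out of the query.
    """
    return tokenizer.encode(text, add_special_tokens=False)
```

For example, `encode_query(load_tokenizer(), "natural language processing")` returns a list of integer IDs that can then be passed to whatever index query call the project uses.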

License

These files are distributed under the original Llama 2 Community License Agreement from Meta.