Failure to reproduce QA Format response from the README
The current README (https://huggingface.co/microsoft/phi-1_5/blob/914c8fb3c681ebe3cacbe3c748858a572283ddde/README.md) shows an example prompt and response in QA format.
Trying to reproduce the response, I get nowhere close to what the README says (see output below). What am I missing?
# With transformers==4.36.2 and tokenizers==0.15.0
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/phi-1_5"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
generation = model.generate(
    **tokenizer(
        "Write a detailed analogy between mathematics and a lighthouse.\n\nAnswer:",
        return_tensors="pt",
    ),
    max_length=30,
    do_sample=True,
)
print(tokenizer.batch_decode(generation, skip_special_tokens=True))
Running this prints:
['Write a detailed analogy between mathematics and a lighthouse.\n\nAnswer:\n\n\n\n\n\n\n\n\n\n']
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
At this point, transformers 4.36.2 should print many warnings about mismatched keys: you are using the built-in Phi implementation from 4.36.2, which is not compatible with the weights in this phi-1.5 repo.
Either force transformers to load the code from this repo, use a repo with a compatible version (see https://github.com/huggingface/transformers/issues/28416 for an example), or load the weights manually with torch.load and poke them with a pointy stick until they can be loaded with load_state_dict.
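If the warnings scroll past unnoticed, one way to confirm the key mismatch described above is to ask from_pretrained for its loading report. The following is a minimal sketch, not something taken from the thread: the helper names has_key_mismatch and report_loading are hypothetical, and it assumes the installed transformers version supports the output_loading_info=True argument.

```python
def has_key_mismatch(loading_info):
    """Return True if any weights were missing, unexpected, or mismatched.

    `loading_info` is the dict returned by from_pretrained when it is
    called with output_loading_info=True.
    """
    keys = ("missing_keys", "unexpected_keys", "mismatched_keys")
    return any(loading_info.get(k) for k in keys)


def report_loading(model_name="microsoft/phi-1_5"):
    """Hypothetical helper: load the model and report whether keys lined up.

    Note: this downloads the checkpoint, so it is defined here but not called.
    """
    # Imported lazily so has_key_mismatch can be used without transformers.
    from transformers import AutoModelForCausalLM

    model, loading_info = AutoModelForCausalLM.from_pretrained(
        model_name, output_loading_info=True
    )
    if has_key_mismatch(loading_info):
        print("Checkpoint and model code disagree:")
        for name in ("missing_keys", "unexpected_keys", "mismatched_keys"):
            print(f"  {name}: {len(loading_info.get(name, []))}")
    else:
        print("All checkpoint keys matched the model code.")
    return model
```

Calling report_loading() with the default arguments should show non-empty key lists if the built-in model code and the checkpoint disagree, which would match the degenerate output in the original post.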
Hello @jamesbraza !
We just pushed a fix to the config.json and it should work now. However, as per the remark on the model card:
If you are using transformers<4.37.0, always load the model with trust_remote_code=True to prevent side-effects.
Best regards,
Gustavo.
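For readers on an older transformers release, the model card remark quoted above could be applied roughly as follows. This is only a sketch under stated assumptions: the helpers parse_version, needs_trust_remote_code, and load_phi are hypothetical names, and the simple version parser treats pre-release builds such as 4.37.0.dev0 the same as 4.37.0.

```python
def parse_version(version):
    """Turn a version string like '4.36.2' into a tuple like (4, 36, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_trust_remote_code(transformers_version):
    """Per the model card remark: transformers<4.37.0 should use remote code."""
    return parse_version(transformers_version) < (4, 37, 0)


def load_phi(model_name="microsoft/phi-1_5"):
    """Hypothetical helper: load tokenizer and model with the right flag.

    Note: this downloads the checkpoint, so it is defined here but not called.
    """
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    trust = needs_trust_remote_code(transformers.__version__)
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust)
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=trust)
    return tokenizer, model
```

With transformers==4.36.2, as in the original post, load_phi() would pass trust_remote_code=True, so the modeling code shipped in the model repo is used instead of the built-in one.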