Instructions to use defog/sqlcoder-7b-2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use defog/sqlcoder-7b-2 with Transformers:

```
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="defog/sqlcoder-7b-2")

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("defog/sqlcoder-7b-2")
model = AutoModelForCausalLM.from_pretrained("defog/sqlcoder-7b-2")
```

llama-cpp-python

How to use defog/sqlcoder-7b-2 with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="defog/sqlcoder-7b-2",
    filename="sqlcoder-7b-q5_k_m.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```
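With echo=True, the text returned by llama-cpp-python begins with the prompt itself. The call returns an OpenAI-style completion dictionary, so a small helper can strip the echoed prompt and check whether generation stopped because it reached max_tokens (finish_reason "length") rather than at a natural stop. This is a sketch that assumes that response shape; split_completion is a hypothetical helper name.

```python
def split_completion(output: dict, prompt: str) -> tuple[str, bool]:
    """Return (generated_text, was_truncated) from an OpenAI-style completion dict.

    Assumes the shape {"choices": [{"text": ..., "finish_reason": ...}], ...}.
    """
    choice = output["choices"][0]
    text = choice["text"]
    # echo=True prepends the prompt, so drop it to keep only the new text
    if text.startswith(prompt):
        text = text[len(prompt):]
    # "length" means generation hit max_tokens instead of a stop token
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


# Usage with the `output` from the call above:
# completion, truncated = split_completion(output, "Once upon a time,")
```

If truncated comes back True, raising max_tokens is the first thing to try before judging the output.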

Notebooks: Google Colab, Kaggle
Local Apps

llama.cpp

How to use defog/sqlcoder-7b-2 with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf defog/sqlcoder-7b-2:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf defog/sqlcoder-7b-2:Q5_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf defog/sqlcoder-7b-2:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf defog/sqlcoder-7b-2:Q5_K_M
```

Use pre-built binary

```
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf defog/sqlcoder-7b-2:Q5_K_M
# Run inference directly in the terminal:
./llama-cli -hf defog/sqlcoder-7b-2:Q5_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf defog/sqlcoder-7b-2:Q5_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf defog/sqlcoder-7b-2:Q5_K_M
```

Use Docker

```
docker model run hf.co/defog/sqlcoder-7b-2:Q5_K_M
```

LM Studio
Jan

vLLM

How to use defog/sqlcoder-7b-2 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "defog/sqlcoder-7b-2"
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "defog/sqlcoder-7b-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
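The same request can be sent from Python using only the standard library. A minimal sketch, assuming the OpenAI-compatible /v1/completions route used in the curl call; build_completion_request and complete are hypothetical helper names, and the defaults mirror the curl example.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"]


# Usage against the vLLM server started above:
# req = build_completion_request("http://localhost:8000", "defog/sqlcoder-7b-2", prompt)
# print(complete(req))
```

Because the request shape is the OpenAI completions format, pointing base_url at http://localhost:30000 should work the same way for the SGLang server described below.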


SGLang

How to use defog/sqlcoder-7b-2 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "defog/sqlcoder-7b-2" \
    --host 0.0.0.0 \
    --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "defog/sqlcoder-7b-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "defog/sqlcoder-7b-2" \
        --host 0.0.0.0 \
        --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "defog/sqlcoder-7b-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Ollama
How to use defog/sqlcoder-7b-2 with Ollama:
```
ollama run hf.co/defog/sqlcoder-7b-2:Q5_K_M
```
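Besides the interactive ollama run session, a running Ollama instance also serves a local REST API. A sketch assuming Ollama's default address (http://localhost:11434), its /api/generate endpoint, and that the model is addressed by the same hf.co/... name used above; build_ollama_request and generate are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local address (assumption: not changed via OLLAMA_HOST)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str,
                         model: str = "hf.co/defog/sqlcoder-7b-2:Q5_K_M",
                         num_predict: int = 512) -> urllib.request.Request:
    """Build a non-streaming /api/generate request."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"num_predict": num_predict},  # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=timeout) as resp:
        return json.load(resp)["response"]
```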

Unsloth Studio

How to use defog/sqlcoder-7b-2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for defog/sqlcoder-7b-2 to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for defog/sqlcoder-7b-2 to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for defog/sqlcoder-7b-2 to start chatting.

Docker Model Runner
How to use defog/sqlcoder-7b-2 with Docker Model Runner:
```
docker model run hf.co/defog/sqlcoder-7b-2:Q5_K_M
```

Lemonade

How to use defog/sqlcoder-7b-2 with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull defog/sqlcoder-7b-2:Q5_K_M
```

Run and chat with the model

```
lemonade run user.sqlcoder-7b-2-Q5_K_M
```

List all available models

```
lemonade list
```

Amazon SageMaker endpoint implementation doesn't work correctly

by singhmanas1 - opened Feb 18, 2024

Discussion

singhmanas1

Feb 18, 2024

Hi team, I am trying to use your model in a SageMaker endpoint (one ml.g5.2xlarge instance), following the deployment script at https://huggingface.co/defog/sqlcoder-7b-2?sagemaker_deploy=true. I used this prompt:

```
Generate a SQL query to answer [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]

### Instructions
- If you cannot answer the question with the available database schema, return 'I do not know'

### Database Schema
The query will run on a database with the following schema:
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY, -- Unique ID for each product
  name VARCHAR(50), -- Name of the product
  price DECIMAL(10,2), -- Price of each unit of the product
  quantity INTEGER -- Current quantity in stock
);

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY, -- Unique ID for each customer
  name VARCHAR(50), -- Name of the customer
  address VARCHAR(100) -- Mailing address of the customer
);

CREATE TABLE salespeople (
  salesperson_id INTEGER PRIMARY KEY, -- Unique ID for each salesperson
  name VARCHAR(50), -- Name of the salesperson
  region VARCHAR(50) -- Geographic sales region
);

CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY, -- Unique ID for each sale
  product_id INTEGER, -- ID of product sold
  customer_id INTEGER, -- ID of customer who made purchase
  salesperson_id INTEGER, -- ID of salesperson who made the sale
  sale_date DATE, -- Date the sale occurred
  quantity INTEGER -- Quantity of product sold
);

CREATE TABLE product_suppliers (
  supplier_id INTEGER PRIMARY KEY, -- Unique ID for each supplier
  product_id INTEGER, -- Product ID supplied
  supply_price DECIMAL(10,2) -- Unit price charged by supplier
);

-- sales.product_id can be joined with products.product_id
-- sales.customer_id can be joined with customers.customer_id
-- sales.salesperson_id can be joined with salespeople.salesperson_id
-- product_suppliers.product_id can be joined with products.product_id

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]Do we get more sales from customers in New York compared to customers in San Francisco? Give me the total sales for each city, and the difference between the two.[/QUESTION]
[SQL]
```

The query that the endpoint generates is:

SELECT c.city, SUM(s.quantity) AS total_sales, SUM(CASE

Any thoughts on what might be going wrong here?
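A quick way to confirm that a returned query is unusable is to compile it against the schema from the prompt. The sketch below uses Python's built-in sqlite3 module with only three of the five tables; check_query is a hypothetical helper. SQLite's dialect differs from other engines, so passing this check does not prove a query is correct elsewhere. Besides catching the truncation, the same check would flag that the customers table has no city column, so c.city cannot resolve against this schema.

```python
import sqlite3
from typing import Optional

# Subset of the schema from the prompt (products, customers, sales).
SCHEMA = """
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY,
  name VARCHAR(50),
  price DECIMAL(10,2),
  quantity INTEGER
);
CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  name VARCHAR(50),
  address VARCHAR(100)
);
CREATE TABLE sales (
  sale_id INTEGER PRIMARY KEY,
  product_id INTEGER,
  customer_id INTEGER,
  salesperson_id INTEGER,
  sale_date DATE,
  quantity INTEGER
);
"""


def check_query(sql: str, schema: str = SCHEMA) -> Optional[str]:
    """Compile `sql` against an empty in-memory copy of the schema.

    Returns None if SQLite accepts the query, otherwise SQLite's error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN compiles the statement into a query plan without reading rows
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
```

For example, check_query("SELECT c.city, SUM(s.quantity) AS total_sales, SUM(CASE") returns an error string instead of None.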

rishdotblog

Defog.ai org Feb 19, 2024

Hi there, please use the prompt in the model card for best results.

We do not use SageMaker for deployment, so I can't say what issues you are running into with it. However, it looks like you need to increase the maximum number of output tokens to get the complete answer: your output is being cut off prematurely. I would also recommend using beam search with num_beams set to around 3 or 4, if SageMaker supports it.
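Both suggestions can be expressed in the request itself. A minimal sketch that fills a prompt template and builds a JSON body in the {"inputs", "parameters"} shape used by Hugging Face inference containers. The template is modeled on the prompt quoted in this thread; its "### " heading markers are an assumption, so compare it with the model card before relying on it. build_prompt, build_payload, and the values max_new_tokens=400 and num_beams=4 are illustrative, and whether num_beams is honored depends on the serving container.

```python
import json

# Hypothetical template modeled on the prompt quoted in this thread.
# The "### " heading markers are an assumption, not copied from the model card.
PROMPT_TEMPLATE = """Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Instructions
- If you cannot answer the question with the available database schema, return 'I do not know'

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]
"""


def build_prompt(question: str, schema: str) -> str:
    """Fill the template; only the template is parsed, so braces in the schema are safe."""
    return PROMPT_TEMPLATE.format(question=question, schema=schema.strip())


def build_payload(prompt: str, max_new_tokens: int = 400, num_beams: int = 4) -> bytes:
    """Serialize a text-generation request body (assumed container format)."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # leave room for the full query
            "num_beams": num_beams,            # beam search, as suggested in the reply
            "do_sample": False,                # deterministic decoding
        },
    }
    return json.dumps(payload).encode("utf-8")
```

With boto3, such a body would be sent through boto3.client("sagemaker-runtime").invoke_endpoint(EndpointName=..., ContentType="application/json", Body=body), filling in your own endpoint name.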

rishdotblog changed discussion status to closed Feb 19, 2024

singhmanas1

Feb 20, 2024 (edited Feb 20, 2024)

Thanks for the reply, @rishdotblog. I am using the same prompt and following the steps in your inference.py file (https://github.com/defog-ai/sqlcoder/blob/main/inference.py). To be able to change generation parameters such as num_beams and the maximum number of output tokens, I have changed my approach slightly. If I deploy the model directly from the Hugging Face model hub to my SageMaker endpoint (as in your script, https://huggingface.co/defog/sqlcoder-7b-2?sagemaker_deploy=true), I have no control over the model configuration I pass. However, I can override the SageMaker handlers for model loading and prediction with my own handlers (inspired by your inference.py), which forces SageMaker to use the configuration I want, similar to this implementation: https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb.
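A handler along those lines might look like the sketch below. It assumes the SageMaker Hugging Face Inference Toolkit's override hooks, where a code/inference.py defining model_fn(model_dir) and predict_fn(data, model) replaces the default loading and prediction, as in the linked notebook. The default generation values and the float16 loading are illustrative choices, not taken from defog's inference.py; DEFAULT_GENERATION and generation_kwargs are hypothetical names.

```python
# code/inference.py (hypothetical custom handler for the SageMaker Hugging Face
# Inference Toolkit: model_fn runs once at startup, predict_fn once per request)

DEFAULT_GENERATION = {
    "max_new_tokens": 400,  # hypothetical default, long enough for a full query
    "num_beams": 4,         # beam search, as recommended in the reply above
    "do_sample": False,
}


def generation_kwargs(request_params=None):
    """Merge per-request overrides into the defaults, ignoring unknown keys."""
    kwargs = dict(DEFAULT_GENERATION)
    for key, value in (request_params or {}).items():
        if key in DEFAULT_GENERATION:
            kwargs[key] = value
    return kwargs


def model_fn(model_dir):
    # Heavy imports live inside the handlers so this module (and the helper
    # above) can be imported and tested without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,  # half precision: half the weight memory of float32
        device_map="auto",          # requires the accelerate package
    )
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    import torch

    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            **generation_kwargs(data.get("parameters")),
            pad_token_id=tokenizer.eos_token_id,  # in case no pad token is defined
        )
    # Keep only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return {"generated_text": tokenizer.decode(new_tokens, skip_special_tokens=True)}
```

A request body such as {"inputs": prompt, "parameters": {"num_beams": 3}} would then override only the beam count, while unrecognized keys are ignored.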

However, I am now facing a different error: I am running out of GPU memory on my SageMaker endpoint. I am using an ml.g5.2xlarge instance with 22 GB of GPU memory. Does sqlcoder-7b-2, with its 7B parameters, need more than 22 GB of GPU memory? Any ideas on how I can reduce the GPU memory footprint for inference? The exact error is:

```
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 22.20 GiB of which 37.12 MiB is free. Process 10864 has 22.16 GiB memory in use. Of the allocated memory 21.33 GiB is allocated by PyTorch, and 110.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF : 400
```

Any help would be greatly appreciated. Thank you :)

Fig: GPU memory utilization during inference. Note that GPU memory utilization is now at 100%.
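As a back-of-the-envelope check on the question above, weight memory is roughly the parameter count times the bytes stored per parameter. A sketch assuming a nominal 7e9 parameters (real checkpoints differ slightly) and the 22.2 GiB capacity reported in the CUDA error; it counts weights only, so activations and the KV cache come on top. N_PARAMS, GPU_GIB, and weights_gib are illustrative names.

```python
# Rough, weights-only GPU memory estimate for a ~7B-parameter model.
# Assumption: exactly 7e9 parameters; activations and the KV cache (which
# grows with num_beams and output length) are NOT included.
N_PARAMS = 7e9
GPU_GIB = 22.2  # total capacity reported in the CUDA out-of-memory error

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = weights_gib(N_PARAMS, nbytes)
    verdict = "weights fit" if gib < GPU_GIB else "weights alone exceed"
    print(f"{dtype:>17}: {gib:5.1f} GiB -> {verdict} {GPU_GIB} GiB")
```

In float32 the weights alone need about 26 GiB, more than this GPU has; older Transformers releases load weights in float32 when no dtype is passed to from_pretrained. In float16 they come to roughly 13 GiB, leaving headroom for activations and the KV cache (which beam search multiplies by roughly the number of beams). Passing torch_dtype=torch.float16 to from_pretrained, or loading in 8-bit or 4-bit through bitsandbytes, shrinks the weight footprint accordingly.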
