Instructions for using bigcode/starcoder with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use bigcode/starcoder with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigcode/starcoder")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder")
```
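Once the pipeline is created, you can call it directly on a code prompt. A minimal usage sketch (the prompt and generation parameters below are illustrative choices, not settings from the model card):

```python
# Illustrative usage of the pipeline created above; prompt and parameters are example values.
prompt = "def fibonacci(n):"
outputs = pipe(prompt, max_new_tokens=64, do_sample=True, temperature=0.2)
print(outputs[0]["generated_text"])
```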
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigcode/starcoder with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigcode/starcoder"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigcode/starcoder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
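Because the server exposes an OpenAI-compatible API, you can also call it from Python. A minimal sketch using the openai client against the local server started above (the api_key value is just a placeholder; vLLM does not check it unless configured to):

```python
# Query the local vLLM server through its OpenAI-compatible API.
# Assumes the server above is listening on localhost:8000; the api_key is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="bigcode/starcoder",
    prompt="def hello_world():",
    max_tokens=64,
    temperature=0.2,
)
print(response.choices[0].text)
```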
Use Docker:

```shell
docker model run hf.co/bigcode/starcoder
```
- SGLang
How to use bigcode/starcoder with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "bigcode/starcoder" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigcode/starcoder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
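As with vLLM, the SGLang endpoint is OpenAI-compatible, so any HTTP client works. A minimal sketch using requests against the local server started above (the prompt and parameters are illustrative):

```python
# Call the local SGLang server's OpenAI-compatible completions endpoint.
# Assumes the server above is running on localhost:30000; prompt and parameters are example values.
import requests

payload = {
    "model": "bigcode/starcoder",
    "prompt": "def quicksort(arr):",
    "max_tokens": 128,
    "temperature": 0.2,
}
resp = requests.post("http://localhost:30000/v1/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```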
Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "bigcode/starcoder" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigcode/starcoder",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use bigcode/starcoder with Docker Model Runner:
```shell
docker model run hf.co/bigcode/starcoder
```
GPU requirement
How much RAM would meet the minimum requirement? I can't wait for some language-specific models, but buying an A100 is a bit out of my price range.
I second setting load_in_8bit=True, but be careful when setting device_map to "auto" if you only have 1 GPU, since it may offload some of the layers to the CPU. The BigCode model class does not have a flag you can set to True to offload them to the CPU. I ended up setting my own device_map dict.
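The original device_map was not shared in the thread; as a generic sketch, such a map could also be derived with accelerate's infer_auto_device_map instead of being written by hand (the memory limits and CPU-offload settings below are illustrative assumptions, not the poster's actual configuration):

```python
# Sketch: derive a custom device_map for bigcode/starcoder with accelerate instead of
# hand-writing the dict. Memory limits and offload choices are illustrative assumptions.
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

config = AutoConfig.from_pretrained("bigcode/starcoder")
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "14GiB", "cpu": "48GiB"},      # leave headroom on a 16GB GPU
    no_split_module_classes=["GPTBigCodeBlock"],  # keep each transformer block on one device
    dtype=torch.int8,                             # estimate sizes for 8-bit weights
)

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map=device_map,
    quantization_config=BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,  # needed when part of the map lands on the CPU
    ),
)
```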
Is it possible to run it in a 3070?
How about 4080 (16 GB)?
@cactusthecoder8 Could you probably share the device map that worked for you?
@AV99 how many GPUs do you have, and how much memory does each of them have?
How about 4080 (16 GB)?
I tried multiple configurations of the model, and nothing runs successfully with only 16GB unfortunately.
@cactusthecoder8 I initially started out with a single 16GB GPU, and with offloading between CPU and GPU (and an hour of inference time later) I was barely able to get a "Hello World" running.
I now have 4 GPUs with 16GB of memory each. Any suggestions?
Can I run it locally on my Mac Studio (M1 Max, 32 GB)?
@LouiSum that would be fun
You can try the ggml implementation, starcoder.cpp, to run the model locally on your M1 machine.
In fp16/bf16 the model takes ~32GB on a single GPU, and in 8-bit it requires ~22GB, so with 4 GPUs you can split this memory requirement by 4 and fit it in less than 10GB on each GPU using the following code (make sure you have accelerate installed, and bitsandbytes for 8-bit mode):
from transformers import AutoModelForCausalLM
import torch
def get_gpus_max_memory(max_memory):
max_memory = {i: max_memory for i in range(torch.cuda.device_count())}
return max_memory
# for example for a max use of 10GB per GPU
# for fp16, replace `load_in_8bit=True` with `torch_dtype=torch.float16`
model = AutoModelForCausalLM.from_pretrained(
"bigcode/starcoder",
device_map="auto",
load_in_8bit=True,
max_memory=get_gpus_max_memory("10GB"),
)
To understand the logic behind this, check the documentation or the blog post on handling large model inference.
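A short follow-up sketch of running generation once the model has been loaded this way (the tokenizer, prompt, and generation settings are additions for illustration, not part of the original snippet):

```python
# Illustrative follow-up: generate with the sharded 8-bit model loaded above.
# The tokenizer, prompt, and generation settings are additions for this example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```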
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "bigcode/starcoder"
device = "gpu" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, load_in_8bit=True).to(device)
this code snippet is giving me the following error:
python3.10/site-packages/transformers/modeling_utils.py", line 2009, in to
raise ValueError(
ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype
I am unable to run it without the load_in_8bit flag. I have a single A6000 (48 GB) GPU.
Can anyone please help me with running inference on StarCoder?
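Going by the error message above, one possible fix is to let device_map place the 8-bit model and drop the .to(device) call entirely. A hedged sketch of the adjusted snippet (an illustration based on the error text, not a confirmed answer from the thread):

```python
# Possible fix suggested by the error message: an 8-bit bitsandbytes model is already
# placed on the correct devices by device_map, so the .to(device) call is removed.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
)
inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```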
To report progress after half a year:
- I was able to run multiple small models (7B) quickly and flawlessly on my RTX 4080 using the LM Studio server out of the box. I could go up to about 10B quantized, but models of that size are not common for some reason.
- Their utility is questionable, though. They are certainly not reliable enough to serve as the base of any useful system or process, even if only for personal use.
- They are usable only for experiments, learning, or just fun.