• kakes@sh.itjust.works · 27 days ago

    Never really occurred to me before how huge a 10x savings would be in terms of parameters on consumer hardware.

    Like, obviously 10x is a lot, but with the way things are going, it wouldn’t surprise me to see that kind of leap in the next year or two tbh.

  • Fisch@discuss.tchncs.de · 25 days ago

    That would actually be insane. Right now, I still need my GPU and about 8-10 gigs of VRAM to run a 7B model tho, so idk how that’s supposed to work on a phone. Still, being able to run a model that’s as good as a 70B model but with the speed and memory usage of a 7B model would be huge.

    • JackGreenEarth@lemm.ee · 25 days ago

      I only need ~4 GB of RAM/VRAM for a 7B model; my GPU only has 6 GB of VRAM anyway. 7B models are smaller than you think, or you have a very inefficient setup.
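
      The back-of-the-envelope maths backs that up (weights only, so real usage sits a bit higher):

```python
# Weights-only memory for a 7B-parameter model at different precisions.
# Real usage is higher once the KV cache and runtime overhead are added.
PARAMS = 7e9

for label, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{PARAMS * bytes_per_param / 1e9:.1f} GB")

# fp16:  ~14.0 GB -> too big for most consumer GPUs
# 8-bit:  ~7.0 GB -> roughly the 8-10 GB range once overhead is added
# 4-bit:  ~3.5 GB -> matches the ~4 GB figure above
```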

      • Fisch@discuss.tchncs.de · 25 days ago

        That’s weird, maybe I actually am doing something wrong. Could it be because I’m using GGUF models?
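
        For reference, a typical llama-cpp-python way of loading a GGUF model looks roughly like the sketch below (the file name and settings are placeholders, not my exact setup), in case someone can spot what I’m getting wrong:

```python
# Rough llama-cpp-python sketch for running a GGUF model on the GPU.
# The quant suffix in the file name (Q4_K_M, Q8_0, ...) sets how big the
# weights are; n_gpu_layers and n_ctx also affect how much VRAM gets used.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=-1,  # offload every layer to the GPU; lower this if VRAM runs out
    n_ctx=2048,       # context window; a bigger one means a bigger KV cache
)

out = llm("Say hi in one short sentence.", max_tokens=32)
print(out["choices"][0]["text"])
```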

        • Mike1576218@lemmy.ml · 24 days ago

          Llama 2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits means worse results though.
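
          Those figures sit above the raw weight sizes (7B at a nominal 2 bits is only ~1.8 GB of weights) mainly because of the KV cache and runtime overhead. A rough sketch, assuming Llama 2 7B’s published shape (32 layers, hidden size 4096) and an fp16 cache at the full 4096-token context:

```python
# Rough KV-cache size for Llama 2 7B with an fp16 cache at full context.
# Assumed shape: 32 layers, hidden size 4096 (the published 7B configuration).
n_layers, hidden, n_ctx, bytes_per_value = 32, 4096, 4096, 2

kv_cache_gb = 2 * n_layers * n_ctx * hidden * bytes_per_value / 1e9  # keys + values
print(f"KV cache at {n_ctx} tokens: ~{kv_cache_gb:.1f} GB")  # ~2.1 GB

# ~1.8 GB of 2-bit weights + ~2.1 GB of cache + overhead lands near the ~5 GB above;
# ~7 GB of 8-bit weights + the same cache pushes past 9 GB.
```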