• kakes@sh.itjust.works · 27 days ago

    Never really occurred to me before how huge a 10x savings would be in terms of parameters on consumer hardware.

    Like, obviously 10x is a lot, but with the way things are going, it wouldn’t surprise me to see that kind of leap in the next year or two tbh.

  • Fisch@discuss.tchncs.de · 25 days ago

    That would actually be insane. Right now, I still need my GPU and about 8-10 gigs of VRAM to run a 7B model tho, so idk how that’s supposed to work on a phone. Still, being able to run a model that’s as good as a 70B model but with the speed and memory usage of a 7B model would be huge.

    • JackGreenEarth@lemm.ee · 25 days ago

      I only need ~4 GB of RAM/VRAM for a 7B model; my GPU only has 6 GB of VRAM anyway. 7B models are smaller than you think, or you have a very inefficient setup.
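
      The back-of-the-envelope maths backs that up (weights only, so real usage sits a bit higher):

```python
# Weights-only memory for a 7B-parameter model at different precisions.
# Real usage is higher once the KV cache and runtime overhead are added.
PARAMS = 7e9

for label, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{PARAMS * bytes_per_param / 1e9:.1f} GB")

# fp16:  ~14.0 GB -> too big for most consumer GPUs
# 8-bit:  ~7.0 GB -> roughly the 8-10 GB range once overhead is added
# 4-bit:  ~3.5 GB -> matches the ~4 GB figure above
```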

      • Fisch@discuss.tchncs.de · 25 days ago

        That’s weird, maybe I actually am doing something wrong. Could it be because I’m using GGUF models?
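
        For reference, a typical llama-cpp-python way of loading a GGUF model looks roughly like the sketch below (the file name and settings are placeholders, not my exact setup), in case someone can spot what I’m getting wrong:

```python
# Rough llama-cpp-python sketch for running a GGUF model on the GPU.
# The quant suffix in the file name (Q4_K_M, Q8_0, ...) sets how big the
# weights are; n_gpu_layers and n_ctx also affect how much VRAM gets used.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=-1,  # offload every layer to the GPU; lower this if VRAM runs out
    n_ctx=2048,       # context window; a bigger one means a bigger KV cache
)

out = llm("Say hi in one short sentence.", max_tokens=32)
print(out["choices"][0]["text"])
```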

        • Mike1576218@lemmy.ml · 24 days ago

          Llama 2 GGUF with 2-bit quantisation only needs ~5 GB of VRAM; 8-bit needs >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not GGUF, AFAIK). Generally, fewer bits means worse results though.
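
          Those figures sit above the raw weight sizes (7B at a nominal 2 bits is only ~1.8 GB of weights) mainly because of the KV cache and runtime overhead. A rough sketch, assuming Llama 2 7B’s published shape (32 layers, hidden size 4096) and an fp16 cache at the full 4096-token context:

```python
# Rough KV-cache size for Llama 2 7B with an fp16 cache at full context.
# Assumed shape: 32 layers, hidden size 4096 (the published 7B configuration).
n_layers, hidden, n_ctx, bytes_per_value = 32, 4096, 4096, 2

kv_cache_gb = 2 * n_layers * n_ctx * hidden * bytes_per_value / 1e9  # keys + values
print(f"KV cache at {n_ctx} tokens: ~{kv_cache_gb:.1f} GB")  # ~2.1 GB

# ~1.8 GB of 2-bit weights + ~2.1 GB of cache + overhead lands near the ~5 GB above;
# ~7 GB of 8-bit weights + the same cache pushes past 9 GB.
```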