Thread - Nostr Hypermedia

Relays: 5

Replies: 20

Generated: 02:20:12

Seriously considering putting two RTX 6000 in my upcoming build machine. Puts the price up an order of magnitude, but truly private AI might be a worthwhile investment. Never played with GPUs before, so informed thoughts welcome?

2025-11-30 07:25:36 from 1 relay(s) 8 replies ↓

Replies (20)

Max max@towardsliberty.com npub1klkk...x3vt

nostr:nprofile1qqswlew3yr0ses5slf6gwflmgkkysl926drdfu3f82cxn68srlz3nqgpz4mhxue69uhhyetvv9ujuerpd46hxtnfduhszxrhwden5te0wfjkccte9ejxjum0vfjhjtnyv4mz7qgkwaehxw309aex2mrp0yhxummnw3ezumn9wshsc8za9h wdyt, are two enough?

2025-11-30 07:34:56 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Juraj juraj@bitpunk.fm npub1m2mv...r8p9

Too little VRAM I think. Look for anything with integrated ram and vram.

2025-11-30 07:45:24 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

Ape Mithrandir apemithrandir@apemithrandir.com npub16dsu...h6vy

Do it.

2025-11-30 07:50:29 from 1 relay(s) ↑ Parent Reply

Ape Mithrandir apemithrandir@apemithrandir.com npub16dsu...h6vy

96gb of VRAM is loads for a local setup to run models. Also the RTX 6000 is much faster than the Apple ram/vram approach. You get much better tokens per second from Nvidia cards.

2025-11-30 07:52:23 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Juraj juraj@bitpunk.fm npub1m2mv...r8p9

Yes, 96GB is quite ok, the problem is that you have the complexity of two cards, which is not so much fun to setup, you will need to handle this often. It's not a click and run solution, distributed inference even on the same machine adds complexity. Nvidia and Mac are not the only options. There's for example AMD Ryzen AI Max, easily with up to 128GB of shared memory. Framework for example sells these.

2025-11-30 08:01:44 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

𝕞𝕪𝕡𝕒𝕥𝕙𝕥𝕠𝕗𝕚𝕣𝕖 mypathtofire@iris.to npub1zsyt...xwsm

Did you try an Occulink setup ?

2025-11-30 08:15:32 from 1 relay(s) ↑ Parent Reply

Fritz Zorn fritz_zorn@rizful.com npub1qqqq...52qs

https://x.com/dhh/status/1993235569258709445

2025-11-30 08:41:54 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

npub179e9...lz4s

This is a good point: I shall test "local-able" models remotely first!

2025-11-30 09:02:51 from 1 relay(s) ↑ Parent Reply

Se7enZ Se7enZ@se7enz.com npub1yd3a...g8ph

Do you have a recommendable AI configuration for emacs?

2025-11-30 11:46:52 from 1 relay(s) ↑ Parent Reply

John npub1hjle...zg68

There's a pretty big gap between the light models and the heavy ones. Often theres a flagship MoE midel and a light version like GLM-4.5 and GLM-4.5 air 4.5 air fits easily on 96gb but 4.5 full needs 200gb+ just for the model (no context) (quantized to q4) CPU offloading makes them runnable but at like 10T/s which is pretty lame And then there's models like kimi k2 that are >500gb quantized I want a local rig but keep putting it off BC the reqs keep changing Stacking 5090s is nice BC they're 1/4 the price but stacking 6000s is just a nicer system (noise;/power etc)

2025-11-30 12:29:12 from 1 relay(s) ↑ Parent Reply

aljaz aljaz@nostr.si npub1alja...g9jp

technically you could get rtx 6000 pro and get 96gb on one card it is true that having two cards is not the same as one big, theres penalty to it and added complexity but also comparing 128gb shared memory vs 96gb vram on gpus is not exactly the same, theres lots to consider in terms of worload and performance, sadly afaik desktop gpus don't have nvlink support anymore so you need a motherboard with good pcie 5.0 support so that you'll have max bw between cards also depends on what you want to do with them, what kinds of models you want to run generally the hardest workload to do locally is coding agents because they tend to like to have a lot of context so you're hitting vram limits very quickly with them and its hard to have comparable performance with small gpus overall - buying great gpu's is always a good idea since the models are getting better and better and the tooling is maturing so two rtx 6000 can give you a lot of bang for the buck. will it perform the same as using opus/gemini3/codex? gonna probably need to bump another order of magnitude for that 😅 you can get decent performance out of rtx 6000 with quantized models according to the internetz

2025-11-30 15:57:59 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

lkraider null npub18c8h...353s

my take is that, unless you want to develop your own LLMs, I would wait for the consumer architecture to stabilize instead of being an early adopter and spending too much on a local rig.

2025-11-30 16:35:47 from 1 relay(s) ↑ Parent 2 replies ↓ Reply

lkraider null npub18c8h...353s

training your own LLMs on your own specific dataset is the only usecase I would suggest building a local rig for

2025-11-30 16:37:39 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

aljaz aljaz@nostr.si npub1alja...g9jp

its hard to have too much sovereign compute tho

2025-11-30 16:37:54 from 1 relay(s) ↑ Parent Reply

aljaz aljaz@nostr.si npub1alja...g9jp

i'd disagree, a lot of use cases are completely viable locally

2025-11-30 17:15:35 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Juraj juraj@bitpunk.fm npub1m2mv...r8p9

Yes, I do a lot locally too.

2025-11-30 17:38:37 from 1 relay(s) ↑ Parent Reply

szarka npub1szar...qxet

I went with one of these instead: https://www.gmktec.com/products/amd-ryzen%E2%84%A2-ai-max-395-evo-x2-ai-mini-pc?variant=64bbb08e-da87-4bed-949b-1652cd311770 YMMV, depending on what toolchain you want to work with.

2025-11-30 20:31:09 from 1 relay(s) ↑ Parent Reply

sister_sam sister_sam@primal.net npub1uqmy...xggf

I run a few models on 4090 locally. They are good enough for many tasks but there hallucination patterns are more obvious than for the larger paid models when it goes south. I mean to use langchain and vector databases on my extensive PDF library to see how much more I can get out of it.

2025-12-01 06:01:13 from 1 relay(s) ↑ Parent Reply

sister_sam sister_sam@primal.net npub1uqmy...xggf

48 GB VRAM is NOT too little to be useful.

2025-12-01 06:03:18 from 1 relay(s) ↑ Parent 1 replies ↓ Reply

Juraj juraj@bitpunk.fm npub1m2mv...r8p9

That's true. But 96GB or more is disproportionately more useful. I run a lot of inference on 32GB Mac mini. But then there are some things that I have to run on 96GB MacBook

2025-12-01 08:27:50 from 1 relay(s) ↑ Parent Reply