96GB of VRAM is plenty for a local setup to run models. The RTX 6000 is also much faster than Apple's shared RAM/VRAM approach; you get considerably better tokens per second from Nvidia cards.
Replies (1)
Yes, 96GB is quite OK; the problem is that you have the complexity of two cards, which is not much fun to set up, and you will need to deal with it often. It's not a click-and-run solution: distributed inference, even on the same machine, adds complexity.
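For a concrete sense of what that extra step looks like, here's a minimal sketch assuming a vLLM setup (the model name is just a placeholder; other runtimes like llama.cpp have their own split flags):

```python
from vllm import LLM, SamplingParams

# On a single card you could omit tensor parallelism entirely; with two
# cards you have to shard the model explicitly. tensor_parallel_size=2
# splits the weights across both GPUs, and both must be visible and
# working at launch time.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=2,
)

params = SamplingParams(max_tokens=64)
print(llm.generate(["Why is the sky blue?"], params)[0].outputs[0].text)
```

It works, but it's one more knob that has to match your hardware, unlike a single big-memory device where the model just loads.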
Nvidia and Mac are not the only options, though.
There's also AMD's Ryzen AI Max, which comes with up to 128GB of shared memory; Framework, for example, sells machines built around it.