
Local LLM memory guide

Date: December 8th, 2025 11:50 PM
Author: https://i.imgur.com/klfx8zs.jpeg


1. For doing inference with a long context window you need 24gb. Not 16 or 20gb. Nothing less than 24gb will do.

2. For doing inference with a long context window, there's very little benefit to going ABOVE 24gb. The extra 8gb on the 5090 isn't doing shit, really (rough napkin math at the end of this list).

3. For image/video generation you need 48gb. There are various ways of "pooling" VRAM across two GPUs to get there, but it's easiest with RTX 3090s (see the sharding sketch after the list).

4. The RTX 3090 with 24gb is the sweet spot right now because you can get them for $750. The 4090 has the same amount of VRAM and roughly the same memory bandwidth. The 5090 costs $2k extra and only nets you an additional 8gb of VRAM, whereas that same money buys you three 3090s in good condition.

5. If you feel compelled to run uncompressed models that require 512gb of RAM, you can still do it on old Xeons with DDR3. You'll get like 4 tokens/sec, but you can do it, that's the point (CPU-only example at the bottom).
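
Rough napkin math behind the 24gb number in points 1 and 2. This assumes a Llama-3-8B-style model (32 layers, 8 KV heads, head dim 128) quantized to 8-bit with an fp16 KV cache at a 128k context; those numbers are illustrative, plug in your own model's config.

# Estimate VRAM = quantized weights + KV cache for a long context window.
def vram_estimate_gib(n_params_b=8, weight_bits=8,
                      n_layers=32, n_kv_heads=8, head_dim=128,
                      kv_bytes=2, context_len=131072):
    weights = n_params_b * 1e9 * weight_bits / 8                    # model weights
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes  # K and V per token
    kv_cache = kv_per_token * context_len                           # full 128k context
    return (weights + kv_cache) / 1024**3

print(round(vram_estimate_gib(), 1))  # ~23.5 GiB with these assumptions -> right at the 24gb line

Halve the context and the KV cache halves, which is why 16gb cards feel fine right up until the window actually fills.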
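
On the "pooling" in point 3: it's not magic, it's just sharding the model's weights across both cards. For transformer LLMs, accelerate's device_map does it automatically, as sketched below; diffusion/video pipelines do the same thing by putting the big components (text encoders, denoiser, VAE) on different GPUs. The model name and per-card memory caps are placeholders, pick your own.

# Shard one model across two 24gb cards (needs transformers + accelerate installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # placeholder model id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                             # spread layers over cuda:0 and cuda:1
    max_memory={0: "22GiB", 1: "22GiB"},           # leave headroom on each 3090
    torch_dtype="auto",
)
tok = AutoTokenizer.from_pretrained(model_id)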
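
And for point 5, a CPU-only sketch with the llama-cpp-python bindings on a big-RAM Xeon box. Model path, thread count, and context size are placeholders; an f16 GGUF keeps the weights "uncompressed" and the whole thing lives in system RAM.

# CPU-only inference: every layer stays on the Xeons, none on a GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/big-model-f16.gguf",  # placeholder path to an f16 GGUF
    n_ctx=8192,                                # context window
    n_threads=32,                              # match your physical core count
    n_gpu_layers=0,                            # keep everything on the CPU
)
out = llm("Summarize this contract:", max_tokens=256)
print(out["choices"][0]["text"])

On DDR3 bandwidth you're looking at single-digit tokens/sec, which lines up with the ~4 tokens/sec above.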

(http://www.autoadmit.com/thread.php?thread_id=5808201&forum_id=2/#49495710)