

This is probably the easiest tool I’ve used to run them: https://lmstudio.ai/
There are tons of models available here, some of them fairly large: https://huggingface.co/
No, I’m pretty sure there’s no way to run a model larger than your RAM/VRAM, at least not automatically. You can use storage as extra RAM (swap), but that’s probably not a good idea; it’s orders of magnitude slower. You’re better off running a smaller model.
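To get a feel for why, here’s a rough back-of-the-envelope sketch. The model size and bandwidth numbers are just ballpark assumptions I’m using for illustration (a hypothetical 70B model at 2 bytes per weight), not benchmarks:

```python
# Back-of-the-envelope: to generate one token, a dense model has to read
# every weight once, so tokens/sec is roughly bandwidth / model size.
# All numbers below are rough assumptions, not measurements.

model_size_gb = 140  # hypothetical 70B model at 2 bytes per weight

bandwidth_gb_per_s = {
    "GPU VRAM":   900,  # ballpark for a modern consumer GPU
    "system RAM":  60,  # ballpark dual-channel DDR5
    "NVMe SSD":     5,  # ballpark sequential read
}

for tier, bw in bandwidth_gb_per_s.items():
    s_per_token = model_size_gb / bw
    print(f"{tier:>10}: ~{s_per_token:5.2f} s/token (~{1 / s_per_token:.2f} tok/s)")
```

Even with a fast NVMe drive you’re looking at tens of seconds per token, which is where the “orders of magnitude slower” comes from.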




The way that could be done would be significantly worse than 15x slower. That’s the issue. Even with the fastest storage, moving data between RAM and storage creates massive bottlenecks.
There are ways to reduce this overhead by intelligently timing when pieces are moved between storage and RAM, but storage is still slow. I don’t know the models well enough to say whether it’s possible to predict which weights will be needed soon, so you could start moving them into RAM before they’re needed (see the sketch below). If that can be done, it wouldn’t be impossibly bad; if it can’t, we’re talking something like 100x slower. Most of these models are already pretty slow on consumer hardware, so that would be effectively unusable. You’d be waiting hours for responses.
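If that kind of prefetching were possible, it would look something like this in spirit. This is just a toy sketch of the idea with made-up stand-in functions (load_layer_from_disk, run_layer), not how any real runtime actually does it:

```python
import threading
import time
from queue import Queue

# Toy sketch of the prefetch idea: load the next layer from storage while
# the current one is computing. load_layer_from_disk() and run_layer() are
# made-up stand-ins, not a real API.

def load_layer_from_disk(path):
    time.sleep(0.5)            # pretend this is a slow disk read
    return path

def run_layer(layer, x):
    time.sleep(0.1)            # pretend this is the actual math
    return x + 1

def stream_forward(layer_paths, x=0):
    prefetched = Queue(maxsize=1)   # keep at most one layer ahead in RAM

    def loader():
        for path in layer_paths:
            prefetched.put(load_layer_from_disk(path))  # storage -> RAM

    threading.Thread(target=loader, daemon=True).start()

    for _ in layer_paths:
        layer = prefetched.get()    # ideally already loaded when we need it
        x = run_layer(layer, x)     # compute overlaps with the next disk read
    return x

print(stream_forward([f"layer_{i}.bin" for i in range(4)]))
```

Even with perfect overlap like this, each step still can’t finish faster than the disk read for the next layer, so the whole thing stays bounded by storage speed.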