Alternatively, on an M3 Ultra Mac Studio with 256GB of unified memory, you can run a 4-bit quant of GLM-4.6 at about 20 tokens/second. That compares to about 40 t/s for a 6-bit quant of MiniMax M2. I am not sure how fast these would run on a 512GB Mac Studio, which can load the unquantized versions of the models.
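For anyone wondering how quant width relates to the memory needed, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) and ignores KV cache, activations, and per-block quantization overhead, so treat the numbers as a lower bound. The function names and the 100B-parameter figure are hypothetical and chosen only for illustration; they are not the real sizes of the models mentioned above.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB.

    Assumption: weights only, no KV cache, activations, or quantization
    metadata (scales/zero points), so real usage will be higher.
    """
    # 1 billion parameters at 8 bits per weight is 1 GB.
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_memory_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # Hypothetical 100B-parameter model, not a real model size.
    for bits in (4, 6, 16):
        size = weight_memory_gb(100, bits)
        print(f"{bits}-bit: {size:.0f} GB, fits in 256 GB: {fits_in_memory(100, bits, 256)}")
```

In practice some of the unified memory is also reserved for the OS and the context window, so the usable budget is smaller than the headline figure.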
Great performance for coding, and I snagged a pretty good deal: 50% + 20% + 10% off (with a bonus link).
60x Claude Code Pro performance on the Max plan for almost the same price. Unbelievable.
If anyone cares to subscribe, here is a link:
[1]: https://service.campaigndelivery.cn/resources/templateImages...