

I’ll vouch for Koboldcpp. I use the CUDA version currently, and it exposes a lot of what you’d need to dial in settings that work for you. Just remember to save what works best as a .kcpps file, or else you’ll be entering it all manually every time you boot it up (though saving doesn’t work on Linux afaik, and it’s a pain that it doesn’t).
It gets real messy, lol. I tried to have GPT guide me through figuring out a Node and nvm error in my Arch WSL and it made nightmare spaghetti out of my npm prefix.
It eventually got stuck in a loop, having me do the same two things over and over again and expecting different results each time.