
I'm running TheBloke's wizard-vicuna-13b-superhot-8k.ggmlv3 with 4-bit quantization on a Ryzen 5 that's probably older than the OP's laptop.
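If you'd rather script it than use a webui, a minimal sketch with llama-cpp-python looks something like this (the file path, thread count, and prompt format are placeholders, adjust to your setup):

    from llama_cpp import Llama

    # Hypothetical path to the 4-bit quantized GGML file
    llm = Llama(
        model_path="wizard-vicuna-13b-superhot-8k.ggmlv3.q4_0.bin",
        n_ctx=2048,   # stock context; 8k needs scaled RoPE support
        n_threads=6,  # match your physical core count
    )

    out = llm("### Instruction: Write a limerick about CPUs.\n### Response:",
              max_tokens=128)
    print(out["choices"][0]["text"])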

I get around 5 tokens a second using oobabooga's webui with default settings. If I understand correctly, this doesn't get me the 8k context length yet, because oobabooga doesn't have NTK-aware scaled RoPE implemented yet.
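As I understand the NTK-aware trick, instead of linearly squeezing position indices it rescales the RoPE frequency base, so low frequencies stretch to cover the longer context without fine-tuning. Roughly (a sketch of the idea; the exponent follows the commonly cited formula, treat details as an assumption):

    import numpy as np

    def rope_freqs(dim, base=10000.0, alpha=1.0):
        # NTK-aware scaling: rescale the frequency base rather than
        # compressing the position indices themselves.
        base = base * alpha ** (dim / (dim - 2))
        return 1.0 / (base ** (np.arange(0, dim, 2) / dim))

    freqs_2k = rope_freqs(128)             # stock RoPE
    freqs_8k = rope_freqs(128, alpha=4.0)  # alpha ~ 4 for a 2k -> 8k stretch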

Using the same model with the newest koboldcpp release should provide the full 8k context, but it runs significantly slower for me.
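If memory serves, koboldcpp takes the context size as a command-line flag, something like the following (flag name from memory, double-check against the current release):

    python koboldcpp.py wizard-vicuna-13b-superhot-8k.ggmlv3.q4_0.bin --contextsize 8192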

Note that this model is great at creative writing, and at sounding smart when talking about tech, but it fails badly at things like logic puzzles or (re-)producing factually correct, in-depth answers on any topic I'm an expert in. Still at least an order of magnitude below GPT-4.

The model is also uncensored, which is amusing after using GPT-4. It will happily elaborate on how to mix explosives, and it has a dirty mouth.

Interestingly, the model speaks at least half a dozen languages much better than I do, and is proficient at translating between them (still far worse than DeepL, of course). Which is mind-blowing for an 8 GB binary. It's actual black magic.




"Note that this model is great at creative writing"

Could you elaborate on what you mean by that? Like, are you telling it to write you a short story and it does a good job? My experiments with using these models for creative writing have not been particularly inspiring.


Yes, having the model write an entire short story or chapter doesn't work very well. It excels when you interact with it closely.

I tested it for creating NPCs for fantasy role-playing games. I think that's the primary reason koboldcpp exists (hence the name).

You give it an (ideally long and detailed) prompt describing the character traits of the NPCs you want, and maybe even add some back-and-forth dialogue with other characters to the prompt.

And then you just talk to those characters in the scene you set.
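To make that concrete, here's a toy example of the kind of character card I mean (the NPC, names, and layout are made up; each frontend has its own conventions):

    [Character: Marda Ironquill, dwarven innkeeper of the Gilded Tankard.
    Gruff but fair, knows every rumor in town, distrusts wizards,
    secretly funds the local thieves' guild.]

    Traveler: Evening. I hear you know everyone worth knowing around here.
    Marda: *wipes a mug without looking up* Depends who's asking, and
    what they're buying.

    Traveler:

You keep appending your lines after "Traveler:" and let the model write Marda's replies.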

There's also "story mode", where you and the model take turns writing a complete story, not just dialogue. Both of you can provide exposition and events, and the model usually only writes about ten sentences per turn.

There are communities online providing extremely complex starting prompts and objectives for the player (escape prison, assassinate someone at a party and get away, etc.), and for me the antagonistic ones (where the model controls NPCs that don't like you) are surprisingly fun.

Note that one of the main drivers behind uncensored open-source LLMs is people wanting to role-play erotica with the model. That's why the model that first shipped scaled RoPE for 8k context length is called "SuperHOT" - and the reason it has 8k context at all is that people wanted to role-play longer scenes.
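For completeness: SuperHOT's flavor of scaled RoPE is linear position interpolation rather than the NTK-aware base rescaling sketched above. Roughly (a sketch of the idea, not the actual training code):

    import numpy as np

    def rope_angles(positions, dim, base=10000.0, scale=1.0):
        # SuperHOT-style linear interpolation: squeeze positions by `scale`
        # so 8k tokens land inside the 2k range the model was trained on.
        # Unlike the NTK trick, this one needs fine-tuning to work well.
        freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(positions * scale, freqs)

    angles_8k = rope_angles(np.arange(8192), 128, scale=2048 / 8192)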



