DarthNebo's comments

HiddenBar


Yessir, Thank you very much


This is old news


They did use the Canny ControlNet Pipeline


I think the article was updated with that in response to my comment:

> P.S. As pointed out by a fellow HackerNews reader, we clearly forgot to include our code snippet for ControlNet in the article.

No other code snippet besides the one added in response uses Canny, at least so far as I can see.


Oh I see, my bad


I'm hitting 3.9 tok/s with a context of 300 tokens on Android (Snapdragon 778G) via UserLAnd, and this is with an older, unoptimized build of llama.cpp.
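For anyone who wants to try the same thing, a rough sketch of the setup inside a UserLAnd shell (the model path is a placeholder; older builds of llama.cpp used ggml-format weights and a `main` binary):

```shell
# Build llama.cpp from source (an older, CPU-only build)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Run a 4-bit quantized 7B model with a 300-token context
# -c = context size, -n = tokens to generate, -t = CPU threads
./main -m models/7B/ggml-model-q4_0.bin -c 300 -n 64 -t 4 -p "Hello"
```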


Feels like there should be two branches of system design: unit-profitable architectures for paying users vs. VC-backed architectures to support non-paying users.


For long-running stuff, https://developer.apple.com/tutorials/app-dev-training/trans... should be straightforward to translate as well using ported on-device BERT models.


JM2C

load_in_4bit=True will let you run Llama2-7B variants in 6.3GB of VRAM.
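A minimal sketch of that 4-bit load with HuggingFace Transformers + bitsandbytes (the model id and compute dtype are my assumptions; the ~6.3GB figure is weights plus KV cache and runtime overhead, since the 4-bit weights alone are only about 3.5GB):

```python
def weights_gb(n_params: float, bits: int = 4) -> float:
    """Rough size of the quantized weights alone, in GB."""
    return n_params * bits / 8 / 1e9  # 7e9 params at 4 bits ~ 3.5 GB

def load_llama2_4bit(model_id: str = "meta-llama/Llama-2-7b-hf"):
    # Heavy imports kept local so weights_gb() stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute
    )
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb,
        device_map="auto",  # place layers on the available GPU
    )
    return model, tok
```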


I'm building a tool for transcriptions where your brand, your product, other technical jargon, or heck, even your own name doesn't need a fine-tuned model all the time. It will ship both as a free native Mac app and as a SaaS tool for those who need to process in bulk. Hit me up at nebo@minusgreed.com to check it out; it will launch as FortuneSpeech.com once out of beta.

Thanks to this post, I kind of have another idea for a 'corrections to transcript' feature that Llama2-7B even on CPU can help with.
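A sketch of what such a corrections pass could look like with Llama2-7B on CPU via llama-cpp-python (the model path, prompt wording, and sampling settings are all placeholders, not the actual feature):

```python
def correct_transcript(raw: str, jargon: list[str],
                       model_path: str = "llama-2-7b.q4_0.bin") -> str:
    # Local import: llama-cpp-python runs quantized models on CPU.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048)
    prompt = (
        "Fix mis-transcribed words in the transcript below. "
        f"Prefer these known terms when they are a close match: {', '.join(jargon)}.\n\n"
        f"Transcript: {raw}\nCorrected:"
    )
    # temperature=0.0 keeps the correction deterministic
    out = llm(prompt, max_tokens=512, temperature=0.0)
    return out["choices"][0]["text"].strip()
```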


See sebastiennight's parent comment above - I think they meant to reply to you.

Curious to see your response.


Actually I wasn't, but if this can be useful to GP I shared my actual prompt in a follow-up comment.

I would be interested to see if Llama2 can perform a similar task.


It's like a repeat of C++ vs. Python. If you can find the folks and tooling for what you want to accomplish, then go for it. I don't think founders who actually have paying customers concern themselves with what works best per cent of compute spend; they care about what lets them improve the product, even though as an outsider it may seem they could reduce costs by choosing X over Y.


I would suggest getting your feet wet with HuggingFace Spaces' free/Pro plan to get started, then their APIs once you get the hang of setting things up there. After that you can start setting up LangChain pipelines or direct vector-DB queries, or, for the SQL case, have the model decide which columns to use and which queries to formulate.

As for the former (the classifier), you can try zero-shot classification between n categories plus an "other" bucket, using models like Flan-T5/T5/Flan-UL2/DistilBART (larger ~7B-40B-param LLMs can also do this, but would be overkill).

