That... doesn't really explain how they can get such a high number? Standard FLO...

danielhanchen · on Dec 2, 2023

Hey! Great question! That's what I'm confused about as well!

So in GPUs the goal is to saturate the GPU with matrix multiplies instead of data movement. I'll write a more detailed blog but approximately:

1. Flash Attention v2 reduces the time taken by 17% or so

2. RoPE Triton kernels: -7.1%

3. RMS Layernorm in Triton: -3.1%

4. Cross Entropy in Triton: -1%

5. Manual autograd for MLP: -4%

6. Manual QKV autograd: -2%

7. Manual O autograd: -2%

8. Smart cache evictions and reduced data duplications etc: -30%

9. And other tricks in the Max and Pro versions makes it 30x faster

You can see it's just tricks in each step, which accumulate together to make it go faster.

I'll write up a blog post to detail it all in the future!!!
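As a rough check on how the listed percentages add up, here is a minimal sketch of the arithmetic. The comment does not say whether each reduction is measured against the original runtime or against the runtime left after the previous steps, so both readings are computed. The names `additive_speedup` and `compounded_speedup` are made up for this illustration and are not part of any library.

```python
# Per-step time reductions quoted in the list above (items 1-8), as fractions.
# Item 9 ("other tricks in the Max and Pro versions") has no number attached,
# so it is left out.
reductions = {
    "Flash Attention v2": 0.17,
    "RoPE Triton kernels": 0.071,
    "RMS Layernorm in Triton": 0.031,
    "Cross Entropy in Triton": 0.01,
    "Manual autograd for MLP": 0.04,
    "Manual QKV autograd": 0.02,
    "Manual O autograd": 0.02,
    "Cache evictions and less data duplication": 0.30,
}


def additive_speedup(fractions):
    """Assumption A: every reduction is a share of the ORIGINAL runtime."""
    remaining = 1.0 - sum(fractions)
    return 1.0 / remaining


def compounded_speedup(fractions):
    """Assumption B: every reduction applies to the runtime left so far."""
    remaining = 1.0
    for fraction in fractions:
        remaining *= 1.0 - fraction
    return 1.0 / remaining


additive = additive_speedup(reductions.values())
compounded = compounded_speedup(reductions.values())

print(f"additive reading:   {additive:.2f}x")
print(f"compounded reading: {compounded:.2f}x")
```

Under either reading, items 1 to 8 on their own come out at roughly 2x to 3x, so the remainder of the quoted 30x would have to come from the unspecified tricks in item 9.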

demosthanos · on Dec 2, 2023

> And other tricks in the Max and Pro versions makes it 30x faster

This feels like the collecting underpants meme. Phase 1: Get to the same performance as other methods. Phase 2: ???. Phase 3: Now you're at 750%!

You may or may not actually have succeeded at what you claim to, but you're not being very persuasive. I realize that you're trying to turn these tricks into a profit and revealing them would destroy that possibility, but you're going to have a really hard time persuading people to pay for a product that does something that enormous teams of PhDs at BigTech haven't been able to pull off on the basis of "trust me".

danielhanchen · on Dec 2, 2023

I agree fully - what do you suggest then? OSS the entire code base under AGPL3? I tried that with https://github.com/danielhanchen/hyperlearn to no avail - we couldn't even monetize it at all, so I just OSSed everything.

I listed all the research articles and methods in Hyperlearn which in the end were gobbled up by other packages.

We still have to cover life expenses and stuff sadly as a startup.

Do you have any suggestions on how we could go about this? We thought maybe an actual training / inference platform, and not even OSSing any code, but we decided against this, so we OSSed some code.

Any suggestions are welcome!

wsxiaoys · on Dec 2, 2023

Wow, this is a great topic. I don't really have specific suggestions, but I'd like to contribute some thoughts on the matter.

Monetizing anything isn't inherently problematic; the challenge lies in defining what should be paid for and what should be offered for free.

In the realm of open-source products and SaaS, the common practice is to provide free self-hosting options while charging for cloud hosting or enterprise-specific features, such as access control and authentication integrations.

However, the landscape becomes significantly more challenging for LLMOps (assuming you are still focusing on training as a major aspect of your business, which can be categorized as LLMOps).

Historically, there haven't been many success stories in this area (with exceptions like wand.ai, which focuses on tracking experiments). I believe this difficulty arises from the largely ad-hoc nature of training and fine-tuning processes, making standardization a challenge, coupled with the infrequency of these tasks.

That being said, training/finetuning is a valuable technique. However, transforming it into a company that offers products is really challenging. Successful examples in this realm typically depend heavily on solution customization or consulting-oriented business models.

danielhanchen · on Dec 2, 2023

Thanks for the points! I agree monetization in the LLM Ops space is hard and complex. Agreed fully on customizing solutions or consulting.

Yep self hosting solutions like Redhat, or DBs like MongoDB or Gitlab's dashboard style approach could work - the issue is now as you mentioned we offer training and finetuning.

We do plan to offer inference as well, plus the data gathering process, and the final prompt engineering side - but we thought why not have a shot?

It's possibly best to make a training and inference platform - maybe some sort of personal ChatGPT training for the public - everyone can train their own personal ChatGPT not via ChatGPT's in-context learning or RAG, but with actual fast 30x finetuning, so a personal bot can truly be possible.

Thanks for the suggestions!

_boffin_ · on Dec 2, 2023

You have companies that are spending good money on fine-tuning and will start spending money on fine-tuning. It seems like it would almost be easier to just go directly to these companies by looking at their blog posts--they're telling you that they're doing it in some way or another. I know Plaid and friends are doing it.

It's costing them x. You can shave y off. You can get improvements to market faster and cheaper.

danielhanchen · on Dec 2, 2023

Interesting points! I shall try this with my bro!!

I was thinking along the lines of, say, the cost of A100s or H100s, plus electricity and engineering costs, then how much we save, and some discounting factor.
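One way to read that pricing idea is as a share of whatever the customer saves. The sketch below is only an illustration: `value_based_price`, its parameters, and the example figures are all hypothetical, the "discounting factor" is assumed to mean the fraction of savings charged, and engineering time is assumed to shrink in step with wall-clock training time.

```python
def value_based_price(gpu_count, baseline_hours, gpu_hourly_cost,
                      engineer_hourly_cost, speedup, discount):
    """Charge a fraction (`discount`) of the customer's estimated savings.

    gpu_hourly_cost is assumed to bundle hardware and electricity.
    All inputs are hypothetical; nothing here comes from real price lists.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")

    # Hours no longer spent waiting on the training run.
    hours_saved = baseline_hours * (1.0 - 1.0 / speedup)

    gpu_savings = gpu_count * hours_saved * gpu_hourly_cost
    engineering_savings = hours_saved * engineer_hourly_cost
    return (gpu_savings + engineering_savings) * discount


# Made-up example: 8 GPUs, a 100-hour baseline run, a 2x speedup,
# and half of the savings charged as the price.
price = value_based_price(
    gpu_count=8,
    baseline_hours=100,
    gpu_hourly_cost=2.0,
    engineer_hourly_cost=100.0,
    speedup=2.0,
    discount=0.5,
)
print(price)
```

In this made-up example the engineering term is much larger than the GPU term, which is why the split between the two is kept visible in the function.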

rmbyrro · on Dec 2, 2023

I think the time savings will be more appealing.

It allows for fast iteration and shorter go-to-market, which can generate virtually infinite value, as opposed to saving electricity, which is a limited game.

danielhanchen · on Dec 2, 2023

Fair point - I forgot to mention the time savings LOLL!!!

IanOzsvald · on Dec 2, 2023

You may want to look sideways to companies such as hedge funds. They have DNN teams and experiment with LLMs, you may find interesting optimisation opportunities with such teams. Charge according to opportunity that you open up, not electricity saved!

danielhanchen · on Dec 2, 2023

Interesting! Hedge funds - very interesting.

Oh no, yep, you're right on time saved and what opportunities it gives them, not just the electricity and capital costs :))

You can now experiment with 30 different models instead of 1 - if you have 100 GPUs, we magically made it 3000!

theptip · on Dec 2, 2023

If your sauce is algorithmic (both current and future edge) then you cannot be OSS and profitable. Google will open source all sorts of things, but never their recommender algos.

Your best bet is probably a SaaS training platform (I suspect inference is a harder business, as you need to serve high uptime APIs; I guess you have more forgiving SLAs for training batches). Sell to medium-large companies (big enough to need training, not big enough to have an established in-house platform), and if you need to bootstrap at all you can probably do profitable consulting-type work without giving up your core IP, since you can hand off the trained model weights without handing out all of your trade secrets.

Folks around here are going to gripe about this; HN has a contingent of FOSS enthusiasts but these people are not going to give you a dollar, they are not your customers. FOSS is great but you are under no obligation to give away your life work.

Honestly where you have landed (opening up some of your work) is more generous with your time than most people would be; people should be thanking you instead of complaining that it’s not more open. I think giving out enough OSS for people to realize you are the real deal while keeping the biggest wins closed is a good marketing strategy.

danielhanchen · on Dec 3, 2023

Thanks for the nice comment! Ye we tried our best to OSS stuff since that's my passion - to help the OSS community! :)

Agreed on the training platform - yess consulting is also a good point!!

I guess the main point is we don't want to be eaten up by cloud providers, and not repeat the mistakes of other OSS projects like MongoDB with AWS etc.

But thanks for the nice comment and suggestions!

welzel · on Dec 2, 2023

Finding an OSS business model is non-trivial.

Maybe you should talk to https://goodsnooze.gumroad.com/l/macwhisper to get some inspiration?

People are paying for convenience.

As for the technology itself: the B2B market is super-super early and I understand everybody is in goldrush mode, however 98% of all startups will not survive the next 3-5 years.

From the demand side: Companies are still sleeping, you can see very, very few proof-of-concept implementations, but basically nothing goes to production.

The rate of innovation is extremely high with LLMs, making it a bad investment for a company.

My idea: OSS everything, become an expert in the field, learn how to sell, survive from consulting services. Don't build products, do paid projects instead.

Focus all your energy to understand customer needs and building your target audience.

Be ready when the time is right to build a startup around LLMs.

Don't waste time building technology, develop your business instead.

danielhanchen · on Dec 2, 2023

Hmm interesting take on things - I just thought consulting would fall into the trap of Dunbar's number https://en.wikipedia.org/wiki/Dunbar%27s_number - plus consulting requires more effort so given 24 hours in a day, you can only grow so much.

It sounds like consultants will become freelancers in the future - but LLMs themselves might take over the consultant's job as well.

But on that note - that's why with my bro, we decided Unsloth was our 1st product release - we're going to be releasing tonnes of other new products! (Coincidentally a data science consultant as well!)

rmbyrro · on Dec 2, 2023

I think a training / inference platform that shares scale efficiencies would be very attractive. I'd use it for sure.

Problem with most platforms is they keep ALL scale efficiencies for themselves, which scares away big projects. They end up with only small users, which don't make unicorns in this case.

Finetuned LLMs are the future for most enterprise applications. Not every shop can possibly set up its own LLM team. If you abstract that away and let them know they'll pay less (per unit) as they scale up, it'd be a juicy proposal.

danielhanchen · on Dec 2, 2023

Fair points!! I agree fully on the platform approach since many people have already mentioned it, and if it's super duper efficient and affordable, people are willing to pay.

mdekkers · on Dec 2, 2023

You don’t need to explain every detail of how you do what you do, your product should speak for itself - outcomes count. If you go to a restaurant, you don’t demand to stand in the kitchen and inspect each ingredient and process, right?

I appreciate this probably isn’t a popular HN opinion, but as you say, you need to make a living. If you have produced something novel that is working, put the gas pedal down and monetise the absolute living daylights out of it as long as you can. Because that is what everyone with _money_ is doing. You don’t see OpenAI opening all their research and tricks now, do you?

Do your thing, buddy, and make your money. All the best with your startup, and don’t get distracted by the people clamouring for your recipes.

danielhanchen · on Dec 2, 2023

Thanks! That's what I was trying to convey, I just couldn't say it like that! :)

Sadly OpenAI did in fact open source everything, but now revenue is king - I'm sure they will open source stuff in the future once the time is right.

But thanks a lot - it means a lot - highly appreciate it!!!

Anon4174 · on Dec 4, 2023

Make an API where I can upload text files and pick any of the llama2 models and get back a trained LoRA (better yet, one for each epoch). This is something that I'm surprised doesn't already exist for llama, and if you really have these performance benefits you can mark up the GPU rental costs substantially and still be cheaper than renting the GPUs myself. Not to mention easier to use.

matthewcford · on Dec 2, 2023

Emphasizing that you have already done performance optimisation for ML algorithms would make me trust this a lot more, esp as you then open-sourced it.

danielhanchen · on Dec 2, 2023

Fair point!! I shall keep pointing this out from now on! :)