declaredapple's comments

Some people may be willing to travel but don't know where the best place to go is.


It's going to be GPT-3.5 Turbo, not GPT-4. As others have mentioned, Bing Chat/Bard are also free.

These smaller models are relatively cheap to run, especially at high batch sizes.

I'm sure there will also be aggressive rate limits.
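
Rough back-of-envelope (all numbers hypothetical, just to show the shape of the math) for why batching makes these models cheap: decode is dominated by streaming the weights through memory once per step, and a whole batch shares that one pass:

    # Hypothetical figures for a "small" model on a modern accelerator
    params = 20e9            # assumed parameter count
    bytes_per_param = 2      # fp16
    bandwidth = 2e12         # ~2 TB/s of HBM, assumed

    step_time = params * bytes_per_param / bandwidth   # one full weight pass
    for batch in (1, 64):
        print(f"batch={batch}: ~{batch / step_time:,.0f} tokens/s")
    # batch=1: ~50 tokens/s -> batch=64: ~3,200 tokens/s, same weight traffic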


They've been designing their own chips for a while now, including an NPU.

Also, because of their unified memory design, they have insane bandwidth, which is incredibly useful for LLMs. IMO they may have a head start in that respect for on-device inference of large models (e.g. 1B+ params).
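
Back-of-envelope sketch (assumed numbers) of why bandwidth is the ceiling: every generated token has to read all the weights once, so bandwidth divided by model size bounds single-stream decode speed:

    bandwidth_gbs = 400   # assumed unified-memory bandwidth, GB/s
    model_gb = 3.5        # 7B params at ~4-bit quantization
    print(bandwidth_gbs / model_gb)  # ~114 tokens/s theoretical ceiling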


I don't think people are running 1B+ models on the Neural Engine these days. The high-performance models I've seen all rely on Metal Performance Shaders, which scales with how powerful your GPU is. It's not terribly slow on iPhone, but I think some people get the wrong idea and associate an ambient, low-power processor like the Neural Engine with LLM inference.

The bigger bottleneck seems like memory, to me. iPhones have traditionally skimped on RAM more so than even cheap and midrange Android counterparts. I can imagine running an LLM in the background on my S10 - it's a bit harder to envision iOS swapping everything smoothly on a similarly-aged iPhone.


Sure, but we're discussing 1.58-bit models, which (again, I'm a layman) I assume have over an order of magnitude less memory overhead.
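
Quick sanity check on that (my own arithmetic, with fp16 as the baseline):

    params = 7e9
    fp16_gb = params * 16 / 8 / 1e9    # 14.0 GB
    b158_gb = params * 1.58 / 8 / 1e9  # ~1.4 GB, roughly 10x smaller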


The flash doesn't do the computations, though; that's just a method of getting the weights to the processor.
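
A minimal sketch of what I mean, assuming a hypothetical weights.bin sitting on flash: memory-mapping lets the OS page weights in on demand, but the arithmetic still happens on the CPU/GPU:

    import numpy as np

    # hypothetical file and shape; mode="r" maps the flash-resident file read-only
    weights = np.memmap("weights.bin", dtype=np.float16, mode="r",
                        shape=(4096, 4096))
    x = np.random.randn(4096).astype(np.float16)
    y = x @ weights  # pages fault in from flash here; the flash never computes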


It would be better to have EEPROM or some such directly attached as memory. No loading.


I think there's clear social impairment. Making friends is harder and it takes more "manual" effort to socialize "effectively".

However, much of this social impairment may or may not be a real problem. "Special interests" and "obsessing over one topic" are an impairment in social scenarios, but can be extremely beneficial for tasks related to that special interest.


And?


> We're currently working with professional to improve our domain rating, ensuring you receive the quality backlinks needed to boost your SEO. Please be patient. It might take a while.

The irony in this is magical.

I highly suggest you remarket this as a curated collection of links of some type, à la Product Hunt. Tech startups, cool apps, anything.

"backlink service" is literally just "seo spam" on it's own.


> companies like OpenAI have had access to large quantities of H100 for a few months now and Sora is being presented

From what I could tell from Nvidia's recent presentation, Nvidia works directly with OpenAI to test their next gen hardware. IIRC they had some slides showing the throughput comparisons with Hopper and Blackwell, suggesting they used OpenAI's workload for testing.

H100s have only been generally available (without a long waitlist) for several months, but all the big players already had them a year ago.

I agree with you, but I think you might be 1 generation behind.

> OpenAI used H100’s predecessor — NVIDIA A100 GPUs — to train and run ChatGPT, an AI system optimized for dialogue, which has been used by hundreds of millions of people worldwide in record time. OpenAI will be using H100 on its Azure supercomputer to power its continuing AI research.

March 21, 2023 https://nvidianews.nvidia.com/news/nvidia-hopper-gpus-expand...


Very interesting, I guess it does make sense that GPT-4 was also trained on the Hopper architecture.


What?

Are you asking if the framework automatically quantizes/prunes the model on the fly?

Or are you suggesting the LLM itself should realize it's too big to run, and prune/quantize itself? Your references to "intelligent" almost lead me to the conclusion that you think the LLM should prune itself. Not only is this a chicken-and-egg problem, but LLMs are statistical models; they aren't inherently self-bootstrapping.
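
For reference, "the framework quantizes on the fly" would look something like this minimal sketch: symmetric int8 round-to-nearest done by the runtime, with no "intelligence" from the model involved:

    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0      # one scale per tensor
        return np.round(w / scale).astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())  # small rounding error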


I realize that, but I do think it's doable to bootstrap it on a cluster and have it teach itself to self-prune, and I'm surprised nobody is actively working on this.

I hate software that complains (about dependencies, resources) when you try to run it, and I think "L5" autonomous software installation and execution should be one of the first use cases for LLMs.


Make your dreams a reality!


Worst is software that doesn't complain but fails silently.


The LLM itself should realize it's too big and only put the important parts on the GPU. If you're asking questions about literature, there's no need to have all the params on the GPU; just tell it to put only the ones for literature on there.
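
(The closest real mechanism I'm aware of is a static layer split decided up front, e.g. llama.cpp's --n-gpu-layers, chosen by the user rather than the model. Toy sketch with assumed sizes:)

    gpu_mem_gb = 8.0
    layer_gb = 0.5       # assumed per-layer footprint
    n_layers = 32
    n_gpu = min(n_layers, int(gpu_mem_gb / layer_gb))
    placement = ["gpu" if i < n_gpu else "cpu" for i in range(n_layers)]
    print(placement.count("gpu"), "layers on GPU")  # 16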


For me, GPT-4 seems to suggest more generic unit tests in Python. They're much more placeholder-style: "Put tests here" or "path.to.dependency".

Claude 3 Opus (and often Sonnet) actually fills in the full dependency paths, actually writes the tests, and overall just seems to "know what I want from it".
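
To illustrate the "generic" style I mean (my own mock-up, not actual model output):

    from unittest.mock import patch

    @patch("path.to.dependency")   # placeholder path left for me to fill in
    def test_something(mock_dep):
        ...                        # "Put tests here"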


> you do not divulge details of your training data.

FWIW, asking LLMs about their training data is generally HEAVILY prone to inaccurate responses. They generally aren't told exactly what they were trained on, so their response is completely made up: they're predicting the next token based on their training data without knowing what that data was - if that makes any sense.

Let's say it was only trained on the book 1984. Its response will be based on what text would most likely come next in 1984 - and if that book doesn't contain "This text is a fictional book called 1984", just the story itself, then the LLM will complete text as if we were still in that book.

tl;dr - LLMs complete text based on what they were trained on; they don't have actual self-awareness of their training data, so they'll happily make something up.

EDIT: Just to further elaborate - the "innocent" purpose of this could simply be to prevent the model from confidently making up answers about its training data, since it doesn't know what its training data was.
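
A toy illustration of the point, using a word-bigram "model" trained only on 1984's opening line: it can continue the text, but nothing in it can answer "what were you trained on?":

    import random
    from collections import defaultdict

    corpus = ("it was a bright cold day in april "
              "and the clocks were striking thirteen")
    words = corpus.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)   # "training": next-word counts, nothing else

    word, out = "the", ["the"]
    while word in model:
        word = random.choice(model[word])
        out.append(word)
    print(" ".join(out))  # "the clocks were striking thirteen" - pure continuation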


Yeah, I also thought that was an odd choice of word.

Hardly any of the training data exists in the context of the phrase "training data", unless Databricks is enriching their data with such words.

