Hacker News | cowsaymoo's comments

> THERE ARE THREE R'S IN STRAWBERRY

hilarious


What is the library used to profile the program?


pv

https://linux.die.net/man/1/pv

it is in the pipe command `... | pv > /dev/null`


`pv --discard` is faster by 8% (on my system).

  % pv </dev/zero >/dev/null
  54.0GiB/s

  % pv </dev/zero --discard
  58.7GiB/s


Which is suspiciously close to the speed of DDR4.
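For anyone who wants to sanity-check the numbers in this subthread, here is a rough arithmetic sketch. The two `pv` figures are the ones quoted above; the DDR4 configuration (DDR4-3200, dual channel, 64-bit channels) is purely an assumption for illustration, since the commenter's hardware is not stated.

```python
# Throughput figures quoted in the thread, in GiB/s.
redirect_gib_s = 54.0  # pv </dev/zero >/dev/null
discard_gib_s = 58.7   # pv </dev/zero --discard

GIB = 2**30  # bytes in one gibibyte
GB = 10**9   # bytes in one gigabyte


def gib_to_gb(gib_per_s: float) -> float:
    """Convert GiB/s to GB/s, the unit memory bandwidth is usually quoted in."""
    return gib_per_s * GIB / GB


def ddr4_peak_gb_s(mega_transfers_per_s: int = 3200,
                   bytes_per_transfer: int = 8,
                   channels: int = 2) -> float:
    """Theoretical peak bandwidth of a DDR4 setup.

    Defaults are an ASSUMED configuration (DDR4-3200, dual channel),
    not the commenter's actual system.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / GB


# Relative speedup of --discard over redirecting to /dev/null.
speedup_pct = (discard_gib_s / redirect_gib_s - 1) * 100

print(f"--discard speedup: {speedup_pct:.1f}%")
print(f"redirect: {gib_to_gb(redirect_gib_s):.1f} GB/s")
print(f"discard:  {gib_to_gb(discard_gib_s):.1f} GB/s")
print(f"assumed DDR4-3200 dual-channel peak: {ddr4_peak_gb_s():.1f} GB/s")
```

Under that assumed configuration the theoretical peak is in the same ballpark as the measured figures, which is all the "suspiciously close" remark needs.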


Give us something to bookmark


Tinfoil hat theory: they implanted watermarks already, so that AI-generated text can be flagged for future training runs or as a service, such that some phrases are coaxed to become statistical beacons.


That's not really a tinfoil hat theory. That's been possible for some years and OpenAI reportedly does watermark their outputs, and can detect it. They just haven't released it as a service because it'd annoy all the users who are using it for cheating :)


I believe that if that was possible to do on purpose, they wouldn’t have so much trouble preventing the LLMs from talking about things they shouldn’t.


Cheap for now. One day, once the market shares balance out, the cloud spend will increase. Local LLMs may be important to prioritize for code that may be running after multiple subscription cycles into the future.

Edit: oh you best wrote closed-source model whoops


cognitive behavioral therapy enjoyer:


Here is their safety and warnings section disclosing that. It's really interesting because of how they're presumably required by law to make a CVS-receipt-length FDA medicine warning but all the dangers are for playing a video game. I think it's pretty cool to see how effective the FDA's procedures are at capturing your concerns, through forcing them to be transparent

# Indications:

> EndeavorRx is a digital therapeutic indicated to improve attention function as measured by computer-based testing in children ages 8-17 years old with primarily inattentive or combined-type ADHD, who have a demonstrated attention issue. Patients who engage with EndeavorRx demonstrate improvements in a digitally assessed measure, Test of Variables of Attention (TOVA®), of sustained and selective attention and may not display benefits in typical behavioral symptoms, such as hyperactivity. EndeavorRx should be considered for use as part of a therapeutic program that may include clinician-directed therapy, medication, and/or educational programs, which further address symptoms of the disorder.

# Safety:

> No serious adverse events were reported. Of 342 participants who received AKL-T01 in the two clinical trials supporting EndeavorRx authorization for age ranges 8-17, 17 participants (4.97%) experienced treatment-related adverse events (TE-ADE) (possible, probable, likely). TE-ADEs reported at greater than 1% across the studies include: frustration tolerance decreased (2.34%) and headache (1.17%). Other adverse events occurred less than 1% and included dizziness, emotional disorder, nausea, and aggression. All adverse events were transient and no events led to device discontinuation. Across other studies in children and adolescents with ADHD, rates of adverse events were similarly low (<10%) and no Serious Adverse Events have been reported. All reported adverse events across all clinical trials resolved at the end of treatment. Users should consider the totality of evidence presented along with their health care provider when considering incorporating AKL-T01 into their treatment plan.

# Cautions:

> Rx only: Federal law restricts this device to sale by or on the order of a licensed health care provider. EndeavorRx should only be used by the patient for whom the prescription was written. For medical questions, please contact your child’s healthcare provider. If you are experiencing a medical emergency, please dial 911. EndeavorRx is not intended to be used as a stand-alone therapeutic and is not a substitution for your child’s medication.

> If your child experiences frustration, emotional reaction, dizziness, nausea, headache, eye-strain, or joint pain while playing EndeavorRx pause the treatment. If the problem persists contact your child’s healthcare provider. If your child experiences a seizure stop the treatment and contact your child’s healthcare provider.

> EndeavorRx may not be appropriate for patients with photo-sensitive epilepsy, color blindness, or physical limitations that restrict use of a mobile device; parents should consult with their child’s healthcare provider.

> Please follow all of your mobile device manufacturer’s instructions for the safe operation of your mobile device. For example, this may include appropriate volume settings, proper battery charging, not operating the device if damaged, and proper device disposal. Contact your mobile device manufacturer for any questions or concerns that pertain to your device.


I wish these texts were written with the intent to inform rather than cover their asses legally, it's barely readable to me.


When the Trump assassination attempt happened last week and every single post on here was still about computers, that's when I realized this place is different.


It’s not a secret HN is not a site for general news. That’s the first item in the guidelines:

> What to Submit

> (…)

> Off-Topic: Most stories about politics, or crime, or sports, or celebrities, unless they're evidence of some interesting new phenomenon. Videos of pratfalls or disasters, or cute animal pictures. If they'd cover it on TV news, it's probably off-topic.

https://news.ycombinator.com/newsguidelines.html

It is because you’d hear about that anywhere and everywhere else that it doesn’t belong here. Would you complain that a forum about cooking or sharing wallpapers didn’t cover the news as well?

Though it was submitted and discussed anyway, which always happens. That can be confirmed with your own keywords.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


We need our safe space.


There do seem to be posts.

I’m not sure of the activity level of HN at the time it occurred.

https://hn.algolia.com/?dateRange=pastWeek&page=0&prefix=fal...


Isn’t that kind of on purpose though? I think you will get flagged if you just post general news articles. It looks like political posts are only accepted if they have some relation to technology.


I wouldn't mind reading about it here in like 2 weeks to a month tbh, but clearly I don't come here to read 'worldly news'.


I’m glad that my attempts at removing USA politics from my content feeds have been so successful that this is the first time I hear about the Trump assassination attempt.


That's why I come here instead of other places.


Now this is truly the programming language that we should be using to benchmark LLM code gen in a private hold-out set. There are no substantial datasets on the internet or GitHub, and no documentation except the one provided. And that's all the model should need.

I asked GPT-4 to write a mat mul function, but that was too ambitious and it spit out outrageous nonsense.

To be more fair, I gave it in-context access to the documentation in the prompt, along with the Fibonacci example function; aka everything humans have access to. I then asked it to do the simpler task of converting a base 10 integer to binary. It was unable to write something error-free even after 4 rounds of supplying it the error messages.

I repeated this 5 times in case it generates something grammatical in the Top-K@5.

I suspected there was some confusion it couldn't surmount about string manipulation. So I changed the question to something challenging, yet something that only used function calls, conditional logic, basic math ops, and numbers. First, I asked for an nth root approximator using Newton's method. Didn't work. Asked for just the square root. Didn't work. Finally, I asked for a function that prints a student's grade given their integer percentage. Not even.
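For reference, here is roughly what those three tasks look like in a language the models know well. This is only a sketch in Python, not the prompts or any model output: the function names, the iteration count, and the grade cutoffs are all assumptions made up for illustration.

```python
def to_binary(n: int) -> str:
    """Convert a base 10 integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # least significant bit first
        n //= 2
    return sign + "".join(reversed(digits))


def nth_root(x: float, n: int, iterations: int = 50) -> float:
    """Approximate the nth root of x >= 0 with Newton's method on f(g) = g**n - x."""
    if x < 0 or n < 1:
        raise ValueError("expects x >= 0 and n >= 1")
    if x == 0:
        return 0.0
    guess = max(x, 1.0)  # start at or above the root so the iteration descends
    for _ in range(iterations):
        # Newton update: g - f(g)/f'(g), rearranged
        guess = ((n - 1) * guess + x / guess ** (n - 1)) / n
    return guess


def letter_grade(percent: int) -> str:
    """Map an integer percentage to a letter grade (cutoffs are an assumption)."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return grade
    return "F"
```

All three need nothing beyond function calls, conditionals, and basic arithmetic, which is what made them reasonable probes for an unseen syntax.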

GPT-4 also persistently hallucinated the keyword BREAKING NEWS, which I think sounds like a pretty good keyword if Tabloid were to ever get error handling.

The spooky part is that almost all the solutions at face value would get partial credit. They had the right abstract approach, being familiar with reams of example approaches in natural language or programming languages. However, in each case, GPT-4, 4o, and Claude all failed to produce something without syntax errors.

I suspect this is the case because transformers do subgraph matching, and while on one end there are rich internal connections for all the problems I requested, on the other end there is nothing similar enough for it to even get a foothold, hence the biggest struggle being syntax. If the only barrier to executing Tabloid code (or other unseen languages) is more basic syntax training, then it excitingly suggests it just needs to learn the abstract concepts from leetcode scrapes once for every syntax it knows. Prior research has shown that grammar is easy for language models. When GPT-2 was made large enough, it went from babbling to grammatical sentences very early in its training, and at that moment its loss plummeted.

All tests conducted in temporary data mode so that this eval stays dark.


Claude managed to write code successfully.

```
DISCOVER HOW TO square_root WITH x, iterations
RUMOR HAS IT
    EXPERTS CLAIM guess TO BE x DIVIDED BY 2

    DISCOVER HOW TO improve_guess WITH current_guess
    RUMOR HAS IT
        SHOCKING DEVELOPMENT (current_guess PLUS (x DIVIDED BY current_guess)) DIVIDED BY 2
    END OF STORY

    DISCOVER HOW TO iterate WITH current_guess, remaining_iterations
    RUMOR HAS IT
        WHAT IF remaining_iterations SMALLER THAN 1
            SHOCKING DEVELOPMENT current_guess
        LIES! RUMOR HAS IT
            EXPERTS CLAIM new_guess TO BE improve_guess OF current_guess
            SHOCKING DEVELOPMENT
                iterate OF new_guess, remaining_iterations MINUS 1
        END OF STORY
    END OF STORY

    SHOCKING DEVELOPMENT iterate OF guess, iterations
END OF STORY

EXPERTS CLAIM number TO BE 16
EXPERTS CLAIM num_iterations TO BE 5

YOU WON'T WANT TO MISS 'The square root of'
YOU WON'T WANT TO MISS number
YOU WON'T WANT TO MISS 'is approximately'
YOU WON'T WANT TO MISS square_root OF number, num_iterations

PLEASE LIKE AND SUBSCRIBE
```
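For readers who don't speak Tabloid, the program above reads as a recursive Newton (Babylonian) square root. Here is a line-by-line rendering in Python; it is my own translation for illustration, not something Claude produced, and it keeps the same names and the same starting guess of `x / 2`.

```python
def square_root(x: float, iterations: int) -> float:
    """Approximate sqrt(x) with a fixed number of Newton steps, recursively."""
    guess = x / 2  # EXPERTS CLAIM guess TO BE x DIVIDED BY 2

    def improve_guess(current_guess: float) -> float:
        # Average the guess with x / guess (one Newton step for g*g = x).
        return (current_guess + (x / current_guess)) / 2

    def iterate(current_guess: float, remaining_iterations: int) -> float:
        if remaining_iterations < 1:  # WHAT IF ... SMALLER THAN 1
            return current_guess
        new_guess = improve_guess(current_guess)
        return iterate(new_guess, remaining_iterations - 1)

    return iterate(guess, iterations)


number = 16
num_iterations = 5

# The Tabloid version prints each of these values with a separate statement.
print("The square root of")
print(number)
print("is approximately")
print(square_root(number, num_iterations))
```

Like the original, this divides by the current guess, so it only makes sense for positive `x`.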


This is consistent with my own experience that Claude is just downright better than ChatGPT.


Same, I've been pretty impressed as well and typically give Claude a shot. Sometimes I even pass their results back and forth in an LLM collab so they generate more diverse perspectives. However, this paper from 4 days ago shows that Claude can fall apart quickly in out-of-distribution tasks. If you ask opposite-day questions, GPT-4 is weirdly strong at it (figure 2).

https://arxiv.org/pdf/2307.02477


Ah bravo! What was the prompt and Claude model?


Great idea here. I wonder if there's potentially more demand for new programming languages now purely as benchmarks for LLMs, like you said?


Maybe they will take on that role too one day


Right, they never claimed to have found a roadmap to AGI, they just found a cool geometric tool to describe how LLMs reason through approximation. Sounds like a handy tool if you want to discover things about approximation or generalization.

