
Not that I disagree. I would, however, rank Sci-Hub, Library Genesis, and Anna's Archive higher than Wikipedia. And don't forget the Internet Archive and arXiv.

This vast amount of high-quality human knowledge was not available on the good old internet 20 years ago.


The irony is wonderful. 'Last' is a relative term, and it was poorly chosen as an attention-grabbing headline. Also worth mentioning: the Internet is not the WWW, and content is so often conflated with the network that the confusion overflows into 'academic' papers.


Actually, that's great in my opinion.

Assuming that semi-convincing misinformation spreads everywhere, people will finally have to find the original source of a certain statement, verify their "knowledge supply chain", and maybe use logic to evaluate every single statement made.


This case was presented to the House Committee on Education & the Workforce in the hearing on "How SCOTUS’s Decision on Race-Based Admissions is Shaping University Policies":

https://www.youtube.com/watch?v=4Zu5cdfv9kk&t=2587s


At least this specific case was cited in a congressional hearing:

https://www.youtube.com/watch?v=4Zu5cdfv9kk&t=2587s


The discussion related to device attestation: https://github.com/antifraudcg/meetings/blob/837baaee953bc76...

The rendered version (using URL Fragment Text Directives): https://github.com/antifraudcg/meetings/blob/main/tpac/tpac-...


Perhaps serving plain-text data is cheap enough that it can be done as a non-profit/hobbyist project?

The problem is that people don't like to read text, so the proposed social network might not see widespread adoption, which in turn weakens the network effect.


https://tildes.net might be the thing you're describing.


I'm not sure about its math, but GPT-4 fails miserably at simple arithmetic questions like 897*394=?

GPT-3.5 Turbo is fine-tuned for arithmetic according to ClosedAI (noted in one of the changelogs), so it is sometimes slightly better, but it nevertheless always fails equations like 4897*394=?


Arithmetic is a pretty pathological case for ChatGPT because of BPE. Digits just tokenize in a way that makes arithmetic way more complicated.
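
You can see the tokenization issue directly with the tiktoken library (a sketch; the exact splits depend on the encoding, cl100k_base being the one GPT-4 uses):

    # sketch: how GPT-4's tokenizer splits a digit string (assumes the tiktoken package)
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("4897*394")
    print([enc.decode([t]) for t in tokens])
    # typically multi-digit chunks like ['489', '7', '*', '394'], so the model never
    # sees aligned digit columns the way a human doing long multiplication does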

That said, I just fed both of your examples into GPT-4 and it answered them correctly without using CoT.


This was the response that I got from the GPT-4 API yesterday:

{'id': 'chatcmpl-7uVF5xGqR1oEzXITw3WZYsnB4Yzt8', 'object': 'chat.completion', 'created': 1693700819, 'model': 'gpt-4-0613', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '353538'}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 11, 'completion_tokens': 2, 'total_tokens': 13}}
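
A minimal sketch of the kind of call that produces a response in that shape (pre-1.0 openai Python client; the exact prompt and settings here are assumptions):

    # sketch only: the prompt wording below is an assumption
    import openai

    openai.api_key = "sk-..."  # placeholder
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "897*394=?"}],
    )
    print(resp["choices"][0]["message"]["content"])  # came back '353538'; the correct answer is 353418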

Maybe they fine-tuned the ChatGPT version better, or fed it to a calculator.


I would guess that they have a later model on the web than on the API (I also see worse results on the API with 0613). Further testing shows that it loses the plot after a few more digits, which wouldn't make sense if they were injecting calculations.


> I'm not sure about its math, but GPT-4 fails miserably at simple arithmetic questions like 897*394=?

That's, um, about 300,000?

...

353,418 actually. But I'm not going to blame the AI too much for failing at something I can't do either.


One can resort to traditional vertical multiplication (which requires patience), or do

897*394 = (900-3) * (400-6) = 900*400 - 6*900 - 400*3 + 3*6 = 360,000 - (5,400 + 1,200) + 18 = 360,018 - 6,600 = 353,418


   8*3=24 and 800*300 =240000
   8*9=72 and 800* 90 = 72000
   8*4=32 and 800*  4 =  3200
   9*3=27 and  90*300 = 27000
   9*9=81 and  90* 90 =  8100
   9*4=36 and  90*  4 =   360
   7*3=21 and   7*300 =  2100
   7*9=63 and   7* 90 =   630
   7*4=28 and   7*  4 =    28
   --------------------------
                       353418
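
The same digit-by-digit decomposition as a small Python sketch (the helper name is my own):

    # sum of place-value partial products, i.e. long multiplication
    def long_multiply(a: int, b: int) -> int:
        total = 0
        for i, da in enumerate(reversed(str(a))):      # digits of a, least significant first
            for j, db in enumerate(reversed(str(b))):  # digits of b, least significant first
                total += int(da) * int(db) * 10 ** (i + j)
        return total

    print(long_multiply(897, 394))  # 353418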


But you are smart enough to use a computer or calculator. And AI is a computer. So the naive expectation would be that it would be capable of doing as well as a computer.

Also, you probably could do long multiplication with paper and pencil if you needed to. So a reasoning AI (which has read many, many descriptions of how to do long multiplication) should be able to as well.


> And AI is a computer. So the naive expectation would be that it would be capable of doing as well as a computer.

Why would you judge an AI against the expectations of a naive person who doesn't understand the capabilities AIs are likely to have? If an alien came down to earth and concluded humans weren't intelligent because the first person it met couldn't simulate quantum systems in their head, would that be fair?


The original question was whether LLMs are "smart" in a human-like way. I think that if you gave a human a computer, they'd be able to solve 3-digit multiplications. If LLMs were smart in a human-like way, they could do this too.


Did someone train LLMs with "access" to a computer? If not, why would you expect them to be able to use something they have never seen?


“It’s right there, you stupid llm! Dammit, YOU’RE RUNNING ON IT!”


I mean, I'm running on incredible amounts of highly complex physics and maths, but that doesn't mean I can give you the correct answer to all questions on those.


I dunno, I simulate quantum systems (you, myself, my friends) in my head all the time


An AI is a program running on a computer.

Minecraft runs on a computer too, but you don't expect the Minecraft NPCs to be able to do math.

So it's a very naive assumption.

Most people struggle with long multiplication despite not only having learnt the rules, but having had extensive reinforcement training in applying the rules.

Getting people conditioned to stay on task for repetitive, detail-oriented work is difficult. There's little reason to believe it'd be easier to get AIs to stay on task, in part because there's a tension between wanting predictability and wanting creativity and problem-solving. Ultimately I think the best solution is the same as for humans: tool use. Recognise that the effort required to do some things "manually" is not worth it.


> But you are smart enough to use a computer or calculator. And AI is a computer. So the naive expectation would be that it would be capable of doing as well as a computer.

I disagree. The AI runs on a computer, but it isn't one (in the classical sense). Otherwise you could reduce humans the same way - technically our cells are small (non-classical) computers, and we're made up of chemistry. Yet you don't expect humans to be perfect at resolving chemical reactions, or computing complex mathematics in their heads.


They can reason through it; they just sometimes make mistakes along the way, which is not surprising. More relevant to your comment: if you give GPT-4 a calculator, it'll use it in these cases.
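
A sketch of what "giving it a calculator" can look like with the function-calling interface (pre-1.0 openai client; the schema and names are my own):

    import json
    import openai

    calculator = {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression, e.g. '897*394'",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    }

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What is 897*394?"}],
        functions=[calculator],
        function_call="auto",
    )

    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        args = json.loads(msg["function_call"]["arguments"])
        # a real tool should use a proper expression parser instead of eval
        print(eval(args["expression"], {"__builtins__": {}}))  # 353418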


I am indeed smart enough to do that. And so is the AI, if you use the right AI (i.e., Code Interpreter).


I've got an engineer's mindset for these kinds of calculations.

897 is about 900. 394 is about 400. 900×400 = 360,000. Only 2% error!


> [GPT] always fails equations like 4897 x 394=?

In some ways, I think we should treat GPT like a human without access to a calculator.

If you ask a human what 4897 x 394 is, they will struggle.


Sometimes ChatGPT fails at tasks like counting the occurrences of a letter in a short string or checking whether two nodes are connected in a simple forest of 7 nodes, even with a chain-of-thought prompt. Humans can solve those pretty easily.
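
Both tasks are trivial for a short program, which is what makes the failures stand out (a sketch; the string and the 7-node forest below are made up):

    s = "strawberry"
    print(s.count("r"))  # 3

    # forest as an adjacency list: three trees over nodes 1..7
    forest = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4, 6], 6: [5], 7: []}

    def connected(a, b):
        seen, stack = set(), [a]
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(forest[n])
        return False

    print(connected(2, 3))  # True (same tree)
    print(connected(1, 7))  # False (different trees)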


I think I understand your logic, but ChatGPT+GPT-4 gave me the correct answer for "What is 897*394?"

https://chat.openai.com/share/00f94e43-c353-400a-858a-50c10c...

(GPT-3 gave the wrong numeric answer though)


Thanks for testing it. I canceled my ChatGPT Plus a few months ago (when they changed the color from black to purple IIRC).

So I only tested that with the GPT-4 API, with the following results:

{'id': 'chatcmpl-7uVF5xGqR1oEzXITw3WZYsnB4Yzt8', 'object': 'chat.completion', 'created': 1693700819, 'model': 'gpt-4-0613', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '353538'}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 11, 'completion_tokens': 2, 'total_tokens': 13}}


It absolutely FAILS for the simple problem of 1+1 in what I like to call 'bubble math'.

1+1=1

Or actually, 1+1=1 and 1+1=2, with some probability for each outcome.

Because bubbles can be put together and either merge into one, or stay as two bubbles with a shared wall.

Obviously this can be extended and formalized, but hopefully it also shows that mathematics isn't even guaranteed to give the same answer for 1+1; it depends on the context and rules you set up (mod, etc.).
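
For instance (a tiny sketch of the "depends on the rules you set up" point):

    print(1 + 1)        # 2  (ordinary integers)
    print((1 + 1) % 2)  # 0  (addition mod 2)
    print(1 | 1)        # 1  (Boolean OR, roughly the "bubbles merge" reading)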

I should also mention that GPT-4 does astoundingly well at this type of problem, where new rules are made up on the fly. So in-context learning is powerful, and the idea that it 'just regurgitates training data' for simple problems is quite false.


The crypto scams highlight the consequences of a lack of regulation. Incorporating lessons from a century of financial turmoil could make crypto viable (implementing laws and regulations as code), though potentially less attractive to those seeking an alternative.


In the sense of replicating the results, we do have CI servers and even fuzzers running for our "code replication".


I don't want to derail the science discussion too much, but what if you actually had to reproduce the code by hand? Would that process produce anything of value? Would your habit of writing i+=1 instead of i++ matter? Or iteration instead of recursion?

Would code replication result in fewer use after free, or off by one than code review? Or would it mostly be a waste of resources including time?


I'm not sure it's meaningful to divert the topic to an analogy that is never precise. But we do already rerun the same code (the equivalent of the same procedure in a paper) in CI, or on a workstation when debugging. Replicating the result of a program doesn't mean having to rewrite the code; replicating the result of a computer-vision paper, for example, may only involve reviewing and running the code from the paper.


Then those unconfirmed results are better put on arXiv instead of being used to evaluate the performance of scientists. Tenure and grant committees should only consider replicated work.


I don't agree. A published article should not be taken for God's truth, whether or not it's replicated or peer reviewed.

Lots of "replicated", "peer-reviewed" research has been found to be wrong. That's fine; it's part of the process of discovery.

A paper should be taken for what it is: a piece of scientific work, a part of a puzzle.

