
>Surely GPT-4 would fail some exams

Some? It does hilariously badly on basic math.

With confidence, though.




> hilariously badly on basic math. With confidence, though

How does it do on the GMAT? Sounds like a good candidate for an MBA program.


It scored 680/800 in this attempt on March 23: https://www.linkedin.com/pulse/today-i-put-chatgpt-4-test-ha...

Update: GPT-4 Turbo is now up to about 770, beating most humans: https://twitter.com/airesearchtools/status/17569731696325880...


GPT-4 with code interpreter is better at math than elite Math undergrads.


>GPT-4 with code interpreter is better at math than elite Math undergrads.

Yeah, and I just spent $20 to call BS on what you said.

GPT-4 just bullshits with confidence. I don't see how the code interpreter can possibly help here.

Behold:

======================

Me: I drew three arrows on a piece of paper starting at the same point, pointing left, up, and diagonally (up and right). Do they represent linearly independent vectors?

---------------------

ChatGPT: In this case, you have described three different arrows pointing in unique directions:

One pointing left, which can be represented as a vector, say, [-1,0] in a two-dimensional space (using standard Cartesian coordinates).

One pointing up, which can be represented as a vector, say, [0, 1].

One pointing diagonally up and to the right, which could be represented as a vector, say, [1,1].

For these vectors to be linearly independent, no vector can be expressed as a linear combination of the others. In other words, there's no way to add or subtract scaled versions of any two vectors to get the third.

Looking at the vectors you've described:

The first vector represents movement purely in the horizontal direction. The second vector represents movement purely in the vertical direction. The third vector represents movement in both directions simultaneously.

Because none of these vectors can be created by scaling and adding the other two (for example, you can't add or subtract any combination of the first and second vectors to get the third one), they are linearly independent.

In mathematical terms, if we form a matrix with these vectors as columns, the determinant of this matrix would not be zero (a non-zero determinant indicates linear independence for the columns of the matrix). This is a more formal way to show their independence, but based on their directions and the explanation above, these vectors are indeed linearly independent.
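For reference, the claim in that answer can be checked mechanically. Below is a minimal sketch in Python, assuming the coordinates ChatGPT itself picked ([-1,0], [0,1], [1,1]); the `rank` helper is a hypothetical name written for this illustration, not a library function. Three vectors in a two-dimensional space can have rank at most 2, so they cannot be linearly independent, and a 2x3 matrix has no determinant to begin with.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    cols = len(m[0]) if m else 0
    for c in range(cols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Coordinates taken from the ChatGPT answer above (an assumption about the drawing).
left, up, diag = [-1, 0], [0, 1], [1, 1]
vectors = [left, up, diag]

r = rank(vectors)
independent = (r == len(vectors))

# An explicit dependence: diag = (-1) * left + 1 * up.
combo = [-1 * l + 1 * u for l, u in zip(left, up)]

print("rank:", r)
print("linearly independent:", independent)
print("(-1)*left + up == diag:", combo == diag)
```

The explicit combination is the direct counterexample to "you can't add or subtract any combination of the first and second vectors to get the third one".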


You must be using a different GPT-4 than me. I recently tried to get it to reason about probability distributions arising from combining multiple probability distributions and it immediately started hallucinating.


Enable the code interpreter. It isn't enabled by default.


I co-authored a published mathematics paper on knot theory[1] and wrote software that did the computations (and generated diagrams) in that paper as a Math undergrad, and I don't consider myself elite (though I did go on to get a PhD).

It seems like you have a very low bar for "elite", a very limited definition of "math", and a very peculiar one of "better at".

[1] https://arxiv.org/abs/0801.3253


Absolutely not. Try asking for it to solve something not well covered in the training data (e.g. a relatively performant polyhedral homotopy continuation solver) and it completely falls apart regardless of how much hand holding you give it.


Have you tried GPT recently on maths? Since they trained it to write code for maths questions, it's got a lot better.



Yes, I have.

It's hilarious.

Example

=======

Me: I have a small sack with yellow sand, and a big sack with blue sand of the same density.

I take some amount of sand from the small sack, and put it into the big sack. Then I take the same amount from the big sack, and put it into the small sack. Now both sacks contain some yellow and some blue sand. Is there more blue sand in the small sack or yellow sand in the big sack?

ChatGPT:

Let's denote the amount of sand transferred from one sack to the other and then back again as "X" kg. Here's what happens with each step:

After the first transfer: The small sack (originally all yellow sand) has its content reduced by X kg. The big sack (originally all blue sand) gains X kg of yellow sand but loses no blue sand.

After the second transfer: We take X kg from the big sack. Since this sack contains a mixture of blue and yellow sand, we need to determine the proportion of yellow and blue sand in this mixed X kg.

Before the second transfer, the big sack contains X kg of yellow sand and its original quantity of blue sand. Let's assume the original amount of blue sand doesn't change our calculation because we're considering the proportion of yellow to total sand when we take X kg out.

Therefore, when we take X kg out of the big sack, we're removing a proportion of yellow and blue sand based on their proportions in the sack at that moment. If the sack had a total mass of sand (before removing X) of "Original Blue + X", the proportion of yellow sand is X/(Original Blue + X), and the proportion of blue sand is "Original Blue"/(Original Blue + X).

However, the exact fraction of yellow and blue sands removed from the big sack will equate to the fractions present: i.e., if the sack had more blue than yellow, every scoop (in this case, the same amount X) will contain less yellow sand than blue sand by their ratios.

But here's the crucial understanding:

The amount of blue sand in the small sack is the amount of blue sand that was part of the X kg removed from the big sack during the second transfer.

The amount of yellow sand in the big sack after the second transfer is less than X, because some of the yellow sand (which came from the small sack originally) was taken back in the second transfer (since the mix removed contains both yellow and blue, not just yellow).

The critical mathematical realization comes from the mixture theory: when you remove a mix from a larger quantity containing varied parts, you dilute the concentration in the source. But since you're asking for a direct comparison after only a mutual exchange:

You will have removed less yellow from the big sack than you put in because you're removing a mix, meaning some of the original blue is also taken out. Thus, without needing to plug in specific values:

After the exchange, there will be more blue sand in the small sack than there is yellow sand in the big sack. This is because the portion taken back from the big sack contains both yellow and blue, reducing the amount of original yellow left in the big sack from X. Meanwhile, the small sack would have added to it an amount of blue sand equal to the portion of blue sand in the mixed X kg taken out during the second transfer, which has no prior removal, only an addition.
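For comparison, the bookkeeping can be done exactly. Below is a minimal sketch in Python, assuming the sand is measured in kg and, by default, that the big sack is perfectly mixed before the second scoop; the `swap` function and its `blue_share` parameter are hypothetical names made up for this illustration. Both sacks end at their starting mass, so whatever blue sand the small sack gained must equal the yellow sand it lost, and the two amounts come out the same regardless of mixing.

```python
from fractions import Fraction

def swap(small_yellow, big_blue, x, blue_share=None):
    """Return (blue sand in small sack, yellow sand in big sack) after both transfers.

    blue_share is the fraction of the return scoop that is blue.
    None means the big sack is assumed to be perfectly mixed.
    """
    small_yellow, big_blue, x = (Fraction(v) for v in (small_yellow, big_blue, x))
    if not 0 < x <= small_yellow:
        raise ValueError("x must be positive and no larger than the small sack")

    # First transfer: x kg of yellow sand goes into the big sack.
    yellow_in_big = x

    # Second transfer: x kg comes back, part blue and part yellow.
    if blue_share is None:
        blue_share = big_blue / (big_blue + x)
    blue_moved = x * Fraction(blue_share)
    yellow_returned = x - blue_moved

    return blue_moved, yellow_in_big - yellow_returned

# Perfectly mixed return scoop.
blue_small, yellow_big = swap(10, 100, 3)
print(blue_small, yellow_big, blue_small == yellow_big)

# Badly mixed return scoop: only a quarter of it is blue.
blue_small2, yellow_big2 = swap(10, 100, 3, Fraction(1, 4))
print(blue_small2, yellow_big2, blue_small2 == yellow_big2)
```

In the perfectly mixed case both quantities equal X·B/(B + X), where B is the original amount of blue sand, which contradicts the "more blue sand in the small sack" conclusion above.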


That's not a math problem; it's a logic puzzle.

It has literally zero mathematics involved.

GPT still gets lots of logic puzzles wrong.

An example maths problem that it used to get wrong but now gets right is: "what is the square root of (5943/6) times 9"
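A question like this is the kind of thing the code interpreter can hand off to Python instead of doing the arithmetic token by token. Below is a minimal sketch of such a calculation; it is not a record of what ChatGPT actually runs. The prompt is ambiguous about whether "times 9" sits inside or outside the root, so both readings are computed.

```python
import math

value = 5943 / 6  # 990.5

# Reading A: take the root first, then multiply by 9.
reading_a = math.sqrt(value) * 9

# Reading B: multiply by 9 first, then take the root.
reading_b = math.sqrt(value * 9)

print(f"sqrt(5943/6) * 9   = {reading_a:.4f}")
print(f"sqrt((5943/6) * 9) = {reading_b:.4f}")
```

Since sqrt(9) = 3, reading A is exactly three times reading B, which is a quick sanity check on either answer.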



