The future of AI according to thousands of forecasters (metaculus.com)
93 points by ddp26 on June 27, 2023 | 81 comments



Nowhere do they define "AGI". I guarantee that is a big reason why the predictions have so much variance.

For many people, what GPT-4 does would have qualified as AGI -- right up until GPT-4 came out, at which point everyone seemed to decide that AGI meant ASI.

I am guessing for many people answering this poll it means "a full emulation of a person". Or maybe it had to be "alive".

What irritates me so much is this lack of definition, or moving of goalposts, and also that people seem to assume AI can't be general purpose or useful unless it has all of those animal/human characteristics.

GPT-4 is very general. Make it, say, 10-20 times faster, open up the image modality to the public, and you will be able to do most human tasks with it. You don't need to invent a lot of other stuff to be general purpose.

You DO need to invent a lot of other stuff to become some kind of humanlike digital god. But that is not the least bit necessary to accomplish most human work output.


ChatGPT-4 is amazing. I still can't comprehend that they built this.

However, if you work with it a lot, you realize it doesn't understand reality; it predicts language.

It reminds me of YouTube videos of chess where humans start to figure out chess bots and the tricks they use to draw out time.

Understanding language is super impressive, but a far cry from general intelligence. Maybe from here OpenAI can build further, but I have a hard time believing that this will be a foundation for general intelligence.

A lot of animals have a form of general intelligence but can't do math or language, yet in some ways they are more capable than the latest self-driving cars.


Is it surprising that a system that was exclusively trained on emitted language will only do well on language?

I don't see how I can extrapolate to anything beyond that. I certainly can't extrapolate that it would be unable to learn other tasks.


Everything is defined in language. As a system learns from the corpus of texts, it also builds reasoning to a large extent. For example, GPT-4 can guide you through solving a problem like building a flying and crawling drone for inspecting an attic, or watering plants. So this goes beyond language, unless I missed your point.


Karpathy explains it far better than I did in his "State of GPT" talk about a month ago: https://www.youtube.com/watch?v=bZQun8Y4L2A

Yes, language can do that, but books and other texts, on average, leave out a lot of steps that humans have learned to do naturally. The fact that the models can be prompted to do those steps, and that their answers improve when they do, indicates that there is definitely some level of "reasoning" there.

But it has weird limitations in its current forms. I expect that to improve pretty quickly, tbh.


Not everything. For example, how did Wagner create his music? The ability to appreciate beauty stands above reasoning. But your point largely stands, as language processing alone allows one to build a "linear thinking machine" with superhuman reasoning ability. It won't be able to make music or art, but it will create a spaceship and take science to the next level. As a side effect, such an AI will erase all art, as art will look like noise in its "mind".


This is the problem here.

A claim is made -- "GPT isn't general" or "GPT isn't intelligent" or whatever -- but a testable definition is not given. That is, what is generality to you, and what bar needs to be passed? What is intelligence, and what competence level needs to be surpassed?

Without clearly stating those things, intelligence may well be anything or any goal, and your goalposts could shift anywhere.

I'm just going to tell you facts of the state we're in right now.

Any testable definition of AGI that GPT-4 fails would also be failed by a significant chunk of the human population.


> Any testable definition of AGI that GPT-4 fails would also be failed by a significant chunk of the human population.

GPT cannot function in any environment on its own. Except in response to a direct instruction it cannot plan, it cannot take action, it cannot learn, it cannot adjust to changing circumstance. It cannot acquire or process energy, it has no intentionality, it has no purpose beyond generating new text. It's not intelligent in any sense, let alone generally. It's an incredibly capable tool.

Here's a testable definition of AGI - any combination of software and hardware that can function independently of human supervision and maintenance, in response to circumstances that have not been preprogrammed.

That's it. Zero-trial learning and functioning. All adult organisms can do it; no AI can. Artificial general intelligence that's actually useful would need a bunch of additional functionality, of course; there I'll agree with you.


>GPT cannot function in any environment on its own. Except in response to a direct instruction it cannot plan, it cannot take action, it cannot learn, it cannot adjust to changing circumstance.

Sure it can. It's not the default behavior, sure, but it's fairly trivial to set up, just expensive. GPT-4 can loop on its "thoughts" and reflect, and it can take actions in the real world.

https://tidybot.cs.princeton.edu/
https://arxiv.org/abs/2304.03442
https://arxiv.org/abs/2210.03629
https://arxiv.org/abs/2303.11366
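(For context on what "loop on its thoughts" means in practice, here is a minimal sketch of the think/act loop that kind of work describes - not taken from any of those papers. call_llm() and the two stub tools are hypothetical stand-ins for whatever completion API and tools you wire in.)

    # Minimal ReAct-style think/act loop (a sketch of the pattern, not the
    # papers' implementations). call_llm() is a hypothetical stand-in for
    # whatever chat-completion API you use.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")

    TOOLS = {
        "search": lambda q: f"(stub search results for {q!r})",
        "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy calculator
    }

    def run_agent(task: str, max_steps: int = 5) -> str:
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            # Ask the model to either act with a tool or give a final answer.
            reply = call_llm(
                transcript
                + "Reply with 'ACTION: <tool> <input>' or 'FINAL: <answer>'."
            )
            transcript += reply + "\n"
            if reply.startswith("FINAL:"):
                return reply[len("FINAL:"):].strip()
            if reply.startswith("ACTION:"):
                parts = reply.split(maxsplit=2)
                tool = parts[1] if len(parts) > 1 else ""
                arg = parts[2] if len(parts) > 2 else ""
                observation = TOOLS.get(tool, lambda _: "unknown tool")(arg)
                transcript += f"OBSERVATION: {observation}\n"  # feed the result back in
        return "(no answer within the step budget)"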


> Here's a testable definition of AGI - any combination of software and hardware that can function independently of human supervision and maintenance, in response to circumstances that have not been preprogrammed.

Not sure how I feel about a significant portion of the population already not meeting that threshold. The elderly? The disabled? I think you're proving the parent's point.


> Not sure how I feel about a significant portion of the population already not meeting that threshold. The elderly? The disabled?

Anyone? How did the saying go? The minimum viable unit of reproduction for homo sapiens is a village.

None of us passes the bar if the test excludes "supervision and maintenance" by other people - and not just our peers, but our parents, and their parents, and their parents, ... all the way back until we reach some self-sufficient-ish animal life form. That's, AFAIR, way below primates on the evolutionary scale.

But that test is bad also for other reasons, including but not limited to:

- It is confusing intelligence with survival. Survival is not intelligence, it is a highly likely[0] consequence of it[1].

- It underplays the non-random aspect of it. Phrased like GP phrased it, a rock can pass this test. The power of intelligence isn't in passively enduring novel circumstances - it's in actively navigating the world to get more of what you want. Including changing the world itself.

--

[0] - A process optimizing for just about any goal will find its own survival to be beneficial towards achieving that goal.

[1] - If you extend it from human-like intelligence to general optimization, then all life is an example of this: survival is a consequence of natural selection - the most basic form of optimization, that arises when you get a self-replicating something that replicates with some variability into similar self-replicating somethings, and when that variability affects survival.


Define "reality."

Human experience consists of more than language, for sure. There is also a visual, auditory, and tactile component. But GPT-4 can also take visual inputs, from my understanding, and it shouldn't be difficult to add the other senses.

Google's experiments with PaLM-E show that LLMs are also useful in an embodied context, such as helping robots solve problems in real time.


> you will be able to do most human tasks with it. You don't need to invent a lot of other stuff to be general purpose.

I think this is where most people strongly disagree with you. A probabilistic language model is not good enough to do anything requiring context particularly well.


Agentic uses of GPT can solve the context problem by breaking problems into steps and building prompts to solve those steps.

Can't do everything, but GPT+APIs can do a lot.
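(A rough sketch of what "breaking problems into steps" can look like - purely illustrative, with a hypothetical call_llm() helper standing in for the model API:)

    # Plan-then-execute sketch: ask the model for a step list, then prompt it
    # again for each step with the results so far as context. Illustrative
    # only; call_llm() is a hypothetical stand-in for the model API.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")

    def solve(task: str) -> str:
        plan = call_llm(f"Break this task into short numbered steps:\n{task}")
        steps = [line for line in plan.splitlines() if line.strip()]
        results = []
        for step in steps:
            # Each step gets its own focused prompt, so no single prompt has
            # to carry the whole problem's context at once.
            results.append(call_llm(
                f"Task: {task}\nCompleted so far:\n" + "\n".join(results)
                + f"\nNow do this step and report the result:\n{step}"
            ))
        return call_llm(f"Task: {task}\nStep results:\n" + "\n".join(results)
                        + "\nWrite the final answer.")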


What in Chomsky's name is "agentic"?


> Nowhere do they define "AGI"

Ummm, maybe you should have looked? At the top of the very first prediction, here: https://www.metaculus.com/questions/5121/date-of-artificial-...

We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans.

Able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A single demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed to the estimation of Metaculus Admins.

Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a (or the equivalent of a) circa-2021 Ferrari 312 T4 1:8 scale automobile model. A single demonstration of this ability, or a sufficiently similar demonstration, will be considered sufficient.

High competency across diverse fields of expertise, as measured by achieving at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al.

Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected.

By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)

Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public.


I think we're reaching a point where the Turing test is no longer useful. If you get into the nitty-gritty of it (instead of just handwaving "computer should act like person"), it's about roleplaying a fake identity. Which is a specific skill, not a general test of competence.


The Turing test seems to be a product of an era where the nature and capabilities of artificial intelligence were still in the realms of the unknown. Because of that it was difficult to conceive a specific test that could measure its abilities. So the test ended up focusing on human intelligence—the most advanced form of intelligence known at that time—as the benchmark for AI.

To illustrate, imagine if an extraterrestrial race created a Turing-style test, with their intelligence serving as the gold standard. Unless their cognitive processes closely mirrored ours, it's doubtful that humans would pass such an examination.


Thank you. It was arguably never useful beyond an intuition pump. It's a test of credulity, of susceptibility to pareidolia, not reasoning ability.


Correct, which is part of the reason the "weak AGI" date is still relatively far out. Will anyone bother dumbing down an AI to pass a Turing test? "Oh, a human can't write a poem that fast -- it's an AI!"


Yup, missed that, thanks. Has anyone scored GPT-4 on the APPS benchmark?

I believe that if you take GPT-4 multimodal integrated with Eleven Labs and Whisper then there is a shot at passing that extended Turing test, if designed fairly. The wording is still a bit ambiguous.

Also, assembling that particular scale model is probably challenging but not really a general task, and something that could probably be achieved with simulated sensors and effectors given a 3-4 month engineering effort into utilizing advanced techniques (maybe training an existing multimodal LLM and integrating it with some kind of RL-based robot controller?) for interpreting and acting on those kinds of instructions. It would be possible to integrate it with the LLM such that it could report its progress and identify objects during assembly.

So my takeaway is that with some serious attempts and an honest assessment of this bar, an AI would be able to pass it this year or next. I don't know how far GPT-4 is from the 75%/90%, but I doubt it is that far, so I expect that if not GPT-4, then GPT-4.5 or 5 could pass given some engineering effort aimed at the test competencies.

If people really are thinking 2030 or 2040 when they read "AGI" and respond to that poll (I suspect some didn't read the definition) then that would indicate that people are just ignorant of the reality of how far along we are, or in denial. Or a little of both.


You do realize that many, if not most, humans would fail this test, right?


Yes you'll find that any testable definition of AGI that has not been passed yet would be unpassable for a big chunk of the human population.

In other words, General, Artificial and Intelligent have been passed. That's why a few papers/researchers opt to call these models "General Artificial Intelligence" instead

https://jamanetwork.com/journals/jama/article-abstract/28064...

https://arxiv.org/abs/2303.12003

Or some such variant like "General Purpose Technologies," as OpenAI did.

https://arxiv.org/abs/2303.10130

since "AGI" has so much baggage with posts shifting at the speed of light.


AGI is competing with human culture as a whole.

Individual humans are not exactly the best of all possible tests for AGI.


Yes, but humans as a group can do it. An AGI needs to show that a similar number of AGIs can do the same, given the same starting template.

The AGI will need to look at all of the tasks as written, determine what the success criteria are, and then combine that into a single set of answers. With the instructions in human-readable form, not machine-readable. It can use as many or as few AGIs as it needs to accomplish this.

It's the same as if we gave these instructions to a human with sufficient skill and resources to delegate.


In my opinion a critical part of an AGI is learning on the fly from few examples. Humans do this well.

When humans play (video)games against an AI they will usually find some pattern of behavior that the AI falls into. Once they find it, they can continuously take advantage of this pattern of behavior and abuse it for great success. I think current LLMs still fall into this category. But I think they might be able to handle this relatively well quite soon.


One argument against ChatGPT's AI-ness was that it's wildly unstable: a small change in the input may become a big change in the output. When that's fixed, GPT will still need memory to learn things, as an AI that can't learn isn't really AI. However, the memory can be added easily: it's just a hidden step after every prompt where GPT is asked to update its memory state (a blob of text, its scratchpad). If GPT is stable, it will update the memory more or less properly. With these two changes, and they are fairly simple, GPT-5 will earn its AI badge.

P.S. I find it interesting that "eye" and "AI" sound similar, so perhaps the symbol of the future AI deity will be an eye.
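(A bare-bones sketch of that hidden memory-update step, assuming some hypothetical call_llm() completion helper; nothing here is model-specific.)

    # One turn of chat with a text scratchpad that the model itself rewrites
    # after answering. call_llm() is a hypothetical stand-in for the model API.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")

    def chat_turn(user_msg: str, memory: str) -> tuple[str, str]:
        # 1) Answer the user with the current scratchpad as context.
        answer = call_llm(f"Memory:\n{memory}\n\nUser: {user_msg}\nAssistant:")
        # 2) Hidden step: ask the model to rewrite its own scratchpad.
        new_memory = call_llm(
            f"Old memory:\n{memory}\n\nLatest exchange:\nUser: {user_msg}\n"
            f"Assistant: {answer}\n\nRewrite the memory, keeping only what is "
            "worth remembering for future turns."
        )
        return answer, new_memory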


It's depressing to go over all this ultra-shallow chit-chat that has short-circuited any intelligent discussion about the role and trajectory of information technology (let alone any more serious problem or opportunity of the current times).

Talking about AI (and AGI) as if it's some xenomorph lurking somewhere in silicon, waiting for its inevitable escape from its human prison.

AI will not bootstrap itself with some emergent property if somebody spends gazillions of dollars and Watts to estimate petazillions of parameters.

Further progress is not going to come unless some very human brain and intelligence opens up completely new algorithmic vistas.

The future of AI is literally tied to the future development of human mental (mathematical) models around information, knowledge and its digital representation.

If not intuitively obvious, the history of mathematical thought development is crushing evidence that it follows its own dynamic over timescales that span centuries.


"We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.

...

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered."

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


I'm a little bit tired of the Bitter Lesson copypasta.

It doesn't seem to me like points 1 to 3 have been proven true about ConvNets. Sure, scaling convnets further helps; I don't think anyone would argue that training a big model on more data would not see at least some improvement.

I also think there are irrefutable lower bounds of computation that it would be important to consider when building these meta-methods. If a problem of size N requires N steps of computation, but your method can only ever perform at most K steps, no amount of data or compute you throw at it will result in the right solution.


I don't think the parent comment is irreconcilable with the Bitter Lesson. I don't think transformers are a path to AGI, but that doesn't mean we need to go back to symbolic AI. We need to iterate on the meta-methods we use to search for and encapsulate the complexity of intelligence.


I think the bitter lesson has a lot more to do with the physical limits of von Neumann computation and the structure of computing as we currently understand it than with the direction that AI research should go.

That the emergent deep learning truthers are on the ascent should not be an excuse to completely discount knowledge based approaches. We're currently grappling with a number of limitations in the latest generation of ML models, particularly surrounding the cost of computation required to train and run them, and the fact that they have absolutely no way to verify the quality or correctness of their output. Clearly there are drawbacks to relying on a system we barely understand to produce our programs for us.


Do you think the human brain has bootstrapped itself with an emergent property in a way that is impossible in silicon?


There is an entirely theoretical possibility for this to happen in some universe. It doesn't particularly concern me. It's arguing about how many angels can dance on the head of a pin.

The messianic prophecy of the imminence of the in-silico birth of the AGI is based on extreme, simplistic mental models: the brain is a computer, everything is input and output, there is some neural net black box "learning" inside etc.

The stance reminds me of the arrogant ignorance displayed by Laplace's demon [1]. Laplace postulated that if someone (the demon) knows the precise location and momentum of every atom in the universe, their past and future values for any given time are entailed - they can be calculated from the laws of classical mechanics.

Alas, he completely missed quantum mechanics, the very basis of how the universe works. It was discovered a hundred years after he died.

In fact Laplace's overconfidence sin is probably orders of magnitude smaller than what the breathless AGI enthusiasts currently commit: Classical mechanics and quantum mechanics share some formal mathematical similarities [2].

It feels very unlikely that anything resembling AGI will share much mathematical form with current machine learning models. The evolutionary, in-principle agnostic and general-purpose "wiring" of the human brain, assuming it can be abstracted somehow, lives on a different mathematical plane.

[1] https://en.wikipedia.org/wiki/Laplace%27s_demon

[2] In fact the process of conjuring up the formal apparatus required for quantum mechanics is one of the most dramatic examples of the versatile abstraction powers of the human brain.


The answer might actually be no: why do you assume silicon can do what is required?

Considering we've never seen a brain made only from silicon do much, it's likely not the right stuff.

Maybe we can brute force things that act like a brain in silicon, but considering that a bee and an ant are intelligent, communicate and solve problems in groups and run on almost zero energy compared to an entire datacenter, you have to admit, something isn't quite there.

I know this is a sensitive topic for many, so I'm not being hostile, I genuinely mean it though.

In my honest opinion, if ants or bees had thumbs and hands, they'd evolve to do some amazing things.


Counterpoint 1:

> Talking about AI (and AGI) as if its some xenomorph lurking somewhere in silicon, waiting for its inevitable escape from its human prison.

> AI will not bootstrap itself with some emergent property if somebody spends gazillions of dollars and Watts to estimate petazillions of parameters.

Not right now, though recent breakthroughs look worryingly close. But the problem isn't AGI creating itself ex nihilo, but rather us creating it. And if you look at how the world reacted to LLMs, millions of minds and billions of dollars are being thrown towards making that happen - because for the first time in history, it looks like we have an actual "angle of attack", and neither curious minds nor greedy businesses will leave it on the table.

> Further progress is not going to come unless some very human brain and intelligence opens up completely new algorithmic vistas.

All we need to do is to get to roughly average human intelligence in higher cognitive functions, running in pure software. At that point, you can have an AI programmer and an AI ML researcher of similar occupational capability as their human counterparts. But this is software we're talking about - if you can have a pair of them, you can keep horizontally scaling until you have thousands or millions of them. And if you let them directly or indirectly work on their own code... they won't stay human-level for long. You get a feedback loop that's to our technological progress what our technological progress is to biological evolution.

Counterpoint 2:

Maybe it's clearer if instead of using the word "intelligence", we distill it into "optimization". After all, at high level, this is what intelligence is - a powerful optimization process. Now, we humans are no strangers to optimization processes that get out of control. Corporations, arguably, are one. Markets are definitely one, and the aggregate market economy is a powerful optimization process that's already out of control - it's self-directing and owns us. Culture arguably is too, but it's a weak one.

Consider how often people say things like "if only we weren't so short-sighted and greedy, we would've solved poverty/war/climate change". If you take seriously the sociological and macroeconomic fact that humans follow incentives, then this really becomes "the system/economy/market makes it impossible for us to solve those problems", and... well, what is that system if not a self-directed optimization process that arose, unintentionally, from our individual needs and desires? Well, semi-unintentionally - free market proponents use exactly this strong optimizer nature to argue why the market is fair and awesome and better than any kind of central planning.

Now, as powerful optimizer as our economic system is, it's still kind of dumb. It averages away individual human intelligence. But what if it didn't? What if it had an actual human-level mind of its own, or at least a human-level mind that's able to perceive and act on the market faster than any one human or group of humans could? Literally getting inside our collective OODA loop? This influence won't get cancelled out - at this point, we'd be screwed.

And that's just one possible scenario - AI tech giving a mind to the economy itself. But the broader point here is, we've already established we can accidentally create powerful generic optimizers that end up controlling us. The ones we deal with today are slow, because they run on humans and human interaction in physical space. But when we create an optimizer that runs on our digital infrastructure, that one will be faster.


"AI" overlaps with the more profound and (potentially) far more dangerous problem of the "algorithmization" of society [1].

"Out-of-brain" information technology is something that started long time ago with the agricultural revolution and the first cuneiform tablets. It has been shaping our societies ever since. Once the automation inherent in these constructs is baked-in it doesn't require over-complex black box models to create complete havoc.

[1] Defined as the institutionalization of intermediating information technology artifacts that humans cannot opt-out from if they want to be part of society.


Profound, yes. Far more dangerous? Well, it's really the same thing. Automation slotting itself in between people and institutions can be harmful at various scales, but if that automation gets sophisticated enough - that is, once the automation is (equivalent to) powerful AI - we'll be facing an extinction-level threat.


I am of the opinion that this AI/ChatGPT, or at least its current form, is just another VC money-shuffling business with no long-term real-world consequence. Like VR and crypto: great technical experiments, but they never reached any kind of sufficient adoption. The iPhone and Bitcoin came out at basically the same time, and see how many people have an iPhone versus how many even know about Bitcoin. And yet crypto and VR have been VC darlings for all these years.

Generally I feel all the use cases for this AI are incredibly depressing and don't give us any hope that it will make the world a better place. I can totally imagine that this AI can replace some jobs, but who really benefits from it? I certainly do not, the government probably doesn't, only the big corporations. We would have more unemployment, less tax collection, and overall we would end up worse off than before. So why are we even investing in such a tech? Who would you even be selling stuff to if all of us are unemployed?


I don't know what you are smoking if you think ChatGPT doesn't have real-world consequences.


Sorry, I forgot the endless amount of spam generation.


AGI by 2040 huh. I bet it won’t happen by 2100, and I’m young enough that I can look back on this at 2040 and see how wrong/ correct I was. Future me, don’t forget!


Depending on who you ask, we will never have it or we have it already. We never had a perfect definition of AGI, but I feel like now the definition has become so murky to be useless.


My definition would frame AGI as an AI embodied in a humanoid robot that can do all the physical and cognitive tasks a regular human can. For example, I would assume 90% of humans can be an Uber driver, so the AGI system should be capable of being an Uber driver. Perhaps only 5% can be a theoretical physicist or a professional tennis player, so I wouldn't expect my AGI to be capable of that.


I kind of wonder if we'll just evolve past the fascination with it. Maybe the problems we need to solve will become so much easier through ML algorithms that we'll get past it all without the "G" in AGI, become distracted with new things, and leave behind being fascinated with computers and chatbots in general.

and I know this sounds wild to some, even unthinkable, but I can't imagine people sitting around in 2000 years talking about "computers".

We might even get past the idea of needing "computing" as we find new ways of achieving our goals. I mean who knows, maybe we'll learn to augment our own intelligence through biohacking?

Right now it feels like AI is the be-all and end-all of everything, but I like to experiment with thinking about what's over the horizon.


Some day people will realize that "intelligent automaton" is an oxymoron. Something designed and built will always operate on rails devised by its creator. Intelligence means the ability to be the creator.

Interestingly, genesis myths usually say god(s) made humans in their image. It's a metaphor for reproduction. We create intelligent beings all the time.


OP in 2040: "It's not AGI because it doesn't have a soul."


Yeah I feel like people have turned the definition of AGI into what we previously called omnipotence.


True, though clearing up some of the murkiness is part of what forecast questions like these are for, with their explicitly spelled out resolution criteria.


Murky definitions help with the marketing hype.


How cool would it be if we could build stuff that directly builds the society we want? Call it utopia or whatever who cares.

But I wish there were other people in the game (none that I know personally, at least; lots just act like accountants lol) who want to use their money to build and enable that kind of world at large, like we are playing a city builder IRL. Making money is whatever, but making a world is just _chef's kiss_.

To use the technology available today in ways where people don't even have to think, but that lead to good outcomes.

AI seems to be one of those vectors and I hope it works out. I’m personally all in with even just LLM applications.


Where's the poll option for "it continues to be overhyped junk spewing misinformation, then arguing with the user (Bard) or acquiescing when pressed regardless of the correctness of the rebuttal (ChatGPT)"?

You can surely get an indistinguishable imitation of human text by including things like Reddit comments in the LLM training data. Correctness is a hurdle I am not convinced will be surpassed.


It sure seems like people are getting value from the current crop of tools: https://fortune.com/2023/06/15/nine-of-ten-developers-using-...


Cool, now try it with a knowledge field that isn't code, where the output can't simply be tested for correctness.


I find it useful for a few different non-coding work tasks, and I don't think I'm unusual there.


I write technical finance/econ articles and the only thing I've found it useful for is getting a vague outline of typical topics that might be touched on in an entry-level article about something.

For example to GPT-4 I ask:

>is silver a good investment? what are the considerations to take into account?

Returns essentially an enumerated list of topics about supply and demand, volatility, storage considerations, and opportunity cost compared to other investments. These are very basic explanations of a sentence or two and usually correct, but too general to be of much use.

I then ask:

>how has the historical performance of silver compared to gold and the S&P 500 over the past 5 decades?

And it provides a vague history of what happened to each investment in each decade, no quantifiable data. Okay, let's ask:

>Re-consider the above prompt using quantifiable data, how much would an investment of $1,000 per month adjusted for inflation into each of these investments have compared at the end of the 5 decades?

Output: I apologize for the confusion, but as an AI language model, I don't have access to real-time data or the ability to perform specific calculations.



Is there any already resolved question in which metaculus forecasters made a surprisingly accurate prediction about an AI advance?


https://www.metaculus.com/questions/track-record/

There are only 13 resolved binary questions with longer prediction horizons (1+ year), and the accuracy is zilch in the AI category - a Brier score of 0.25, which is akin to just guessing 50% for all questions. Generally overconfident.

(Other categories much better - Brier of 0.14 1 year out)
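(For anyone unfamiliar: the Brier score is just the mean squared error between forecast probabilities and the 0/1 outcomes, so always answering 50% scores 0.25 no matter what happens. A quick check:)

    # Brier score = mean squared error between forecast probabilities and 0/1
    # outcomes. Forecasting 0.5 every time gives 0.25 regardless of outcomes,
    # which is why 0.25 is the coin-flip baseline mentioned above.

    def brier(forecasts, outcomes):
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

    print(brier([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
    print(brier([0.9, 0.2, 0.8, 0.7], [1, 0, 1, 1]))  # 0.045  (well calibrated)
    print(brier([0.1, 0.9, 0.2, 0.3], [1, 0, 1, 1]))  # 0.6875 (badly miscalibrated)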


There are several AI categories on the track record page; make sure you're not just selecting the one or you'll miss a lot. There's a careful analysis of the overall track record on AI questions here: https://www.metaculus.com/notebooks/16708/exploring-metaculu...

The short version is that the Brier score is much better than .25 for AI questions, and the weighted Metaculus Prediction is more accurate still.


Good call on the categories.

> The short version is that the Brier score is much better than .25 for AI questions, and the weighted Metaculus Prediction is more accurate still.

Added more categories. 1 year out is 0.217. I agree that's better than chance, though "much better"?

That said, this is dominated by bad community predictions pre-2020, and there's not much data recently for binary questions. I agree that CRPS is better, but it's not clear to me from that link how early they are looking at questions - accuracy gets better closer to the resolution date - and I'm claiming that longer-term predictions are shakier.


And you may want to check the weighted Metaculus Prediction if you haven't already.


Can I see the list of questions used in this analysis somewhere? Is it literally just the set of questions I see when I filter for "Resolved" and "Artificial Intelligence"?

My impression from browsing that set of questions is that it's a mix of pretty trivial things like "how expensive will chatGPT be?" or "when will Google release Bard?". There are very few questions in the bunch I'd even consider interesting, let alone ones where the metaculus prediction appears to have offered any meaningful insight.


No, it's also 'AI and Machine Learning,' 'Artificial Intelligence,' and 'Forecasting AI Progress.' The list of resolved questions is roughly this: https://www.metaculus.com/questions/?status=resolved&has_gro... (though that will include a few questions that have resolved since the analysis.)


Anyone can sign up and vote, so I assume it tends towards the majority option, tbh. Seems like obviously noisy data.


The Metaculus prediction weights predictors by how accurate they've been.
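(A toy illustration of the general idea - not Metaculus's actual algorithm - where forecasters with better historical Brier scores get more weight:)

    # Toy accuracy-weighted aggregation: weight each forecaster inversely by
    # their historical Brier score, then take the weighted mean. Illustrative
    # only; Metaculus's real weighting is more sophisticated.

    def weighted_forecast(predictions, past_brier_scores):
        # Small epsilon avoids division by zero for a perfect track record.
        weights = [1.0 / (b + 1e-6) for b in past_brier_scores]
        return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

    # Three forecasters say 0.7, 0.4, 0.9 with track records of 0.05, 0.25, 0.15.
    print(round(weighted_forecast([0.7, 0.4, 0.9], [0.05, 0.25, 0.15]), 3))  # ~0.704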


Yes, but if voters are voting the majority (and probable) decision, that just continues the trend of predicting obvious outcomes.

It’s not something to do with understanding domain knowledge and voting based on scientific research.


I wouldn't say AGI in 10 years is predicting an obvious outcome. And the same goes for many other questions on the site.


It will trend towards the majority opinion as it gets closer to 10 years out. They haven't been around long enough.


The numbers there seem odd (or did before the page broke, possibly due to the HN 'hug of death').

90% chance of near-term "human level" AI, but only a 75% chance someone tries to influence the US election with "AI-driven misinformation", which for most definitions of "AI" and "misinformation" is already happening?


Something about the page is certainly broken. I don't get the graphs to load until I use the mouse wheel on their UI elements.


The hug of death is fixed; even this modest traffic was too much for our cache.

> 90% chance of near-term "human level" AI, but only a 75% chance someone tries to influence the US election with "AI-driven misinformation", which for most definitions of "AI" and "misinformation" is already happening?

Yes, if 2040 is "near term". The majority view is that before 2025, the only major AI safety concerns are deepfakes, misinformation, and maybe social engineering.

But over the subsequent years, the consensus shifts to predicting widespread disruption.


"Forecaster" here means any rando who signs up to their service and answers a question.

Sometimes a large polling number does not equal a more accurate answer.


Anyone can forecast, sure. But there's a large body of research on the accuracy of aggregated forecasts and on the ability of forecasters to become more accurate with practice. (Thinking here in particular of work by Mellers & Tetlock.)

Metaculus provides a transparent track record of community forecasts here: https://www.metaculus.com/questions/track-record/ It's very difficult for any one person to consistently beat the community.


This isn't quite true -- on Metaculus, accounts that have a history of forecasting things well are weighted more heavily.


SISO (Shit In, Shit Out) still applies. You guys need a high-quality user base with domain knowledge, at least as a seed. There is no proof that you have that at the moment.

Edit:

Okay, that track record page avionical posted in a separate comment is actually a bit convincing now that I dig deeper into it. :-)

I suppose that for, e.g., AI/AGI, a weakness could be that the estimates for most of the users have been short-term (a few years at most), but the AGI estimates are 8-17 years away. Those estimates are a lot harder to make and hence surely a lot less accurate.


What would be good evidence of a high-quality user base with the relevant skills? A transparent, well-calibrated track record?


> Metaculus predicts a 75% likelihood of an attempt to influence the 2024 US presidential election with AI-driven misinformation.

Hasn't that already happened? Wasn't DeSantis caught running deepfaked voices in an ad? Or am I misremembering something?


It was fake images, not fake voices, but yeah, it looks like it has already happened. Not sure if it fully meets the criteria in the market.

https://www.npr.org/2023/06/08/1181097435/desantis-campaign-...


It's more specific than that once they actually operationalize the question:

This question resolves as YES if, in Meta's 2024 Q4 Quarterly Adversarial Threat Report, Meta claims that there was at least one "coordinated inauthentic behavior" that specifically pertained to the 2024 US Presidential election, and Meta suspects was primarily conducted via AI.


The operationalized question seems to be more of a bet on what Meta admits happens on its platform than on what actually happens!

We know that people aiming to influence elections use bots to inflate particular sentiments, use statistical techniques which are arguably 'AI' to target messaging, and have the capability to effortlessly generate variations on content using LLMs... it'd almost be odd if people didn't do so in a coordinated manner for the US election.



