I'm not sure how to respond to your point about Bem and Baumeister's work since ...

authorfly · 2024-08-09T07:25:33 1723188333

Fair enough, you might have indeed rejected those authors - however, vast swathes, for Baumeister the majority, did not at the time. It's almost certainly true now for existing authors we are yet to identify, or maybe never will.

I admit the point on TESS, I didn't research that enough. I'll look into that at a later point as I have an interest in learning more.

To address your studies regarding expert / study forecasting - thank you for sharing some papers. I had time and knew some papers in the area so I have formulated a response because, as you allude to later regarding cultural predictions, there is debate in the question of the usefulness of expert vs non-expert forecasts (and e.g. there is a wide base of research on recession/war predictions showing the error rate is essentially random at a certain number of years out). I have not fully comprehended the first paper but I understand the gift of it.

Economics bridges sociology and the harder science of mathematics, and I do think it makes sense for it to be more predictable than psychology studies by experts(and note the studies being predicted were not survey-response like most are in psychology), but even this one paper does not particularly support your point. Critically, one conclusion in the paper you cite is that "Forecasters with higher vertical, horizontal, or contextual expertise do not make more accurate forecasts.", "If forecasts are used just to rank treatments, non-experts, including even an easy-to-recruit online sample, do just as well as experts", and "Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank ordering treatments.". "The experts are indistinguishable with respect to absolute forecast error, as Column 7 of Table 4 also shows... Thus, various measures of expertise do not increase accuracy". Critically at a glance, of the selected statements, almost 40% are outperformed by non-experts anyhow in Table 2 (the last column). I also question the use of Mturk Workers as lay people(because of historic influences of language and culture on IQ tests, the lay person group would be better being at least geographically or WEIRD-ly similar to the expert groups), but that's a minimal point.

Another point that further domain information, simulation or other tactics does not impact the root issue of the biased dataset of published papers - "Sixth, using these measures we identify `superforecasters' among the non-experts who outperform the experts out of sample.". Might we be in danger, with some claim 8 years later with LLMs, of the very "As academics we know so little about the accuracy of expert forecasts that we appear to hold incorrect beliefs about expertise and are not well calibrated in our accuracy. " that the paper warns against?

I know what you are getting at that these are not replications, that it feels elementally exciting that GPT-4 could simulate a study taking place - rather than a replication as such - and determine the result more accurately than a human forecast. But what I am saying is, historically, we have needed replication data to assess if human forecasts (expert and non-expert) are correct long term anyhow, and we need those to be for future or current replications to avoid the training data including the results, to draw any conclusion about the method of GPT-4 in getting this accuracy in forecasting results with any method, simulation or direct answer. The idea that it is cheaper to run GPT queries than recruit human participants makes me wonder if you are actively trolling though - you can't be serious? Fields in which awful statistics and research goes on all the time, awaiting an evolution to a better basic method, and a result that is accurate 3% higher than a group of experts, when we don't even know whether those studies will replicate in the long run (and yes, even innocently pre-registered research tends to proliferate more false positives because the proportion of pre-registered studies published is not close to 100% and thus the results of false-positive publishing still occur https://www.youtube.com/watch?v=42QuXLucH3Q

The problem is until we have fundamentals more stable, small increments and large claims on behaviour are repeating the mistake of anthropomorphizing biological and computational systems before we understand them to the level we need to, to make those claims. I am saying the future is bright in this regard- we will likely understand these systems better and one day be able to make these claims, or counter-claims. And that is exciting.

Now this is a seperate topic/argument, but here is why I really care about these non-substantial, but newsworthy claims: Lets not jump the gun for credence. I read a PhD AI paper in 2011. It was the very furthest from making bold claims - people were so low-mooded about AI. That is because AI was pretty much at its lowest in 2011, especially with cuts after the recession. It was a cold part of the "AI winter". Now that AI is raring at full speed, people overclaim. This will cause a new, 3rd AI winter. Trust me, it will, so many members of faculty I know started feeling this way even back in 2020. It's harmful not only to the field but our understanding really, to do this.