
Just from the way this paper is written (badly, with all kinds of LaTeX errors), my confidence that something meaningful was proved here, that some nice mathematical theory has been developed, is low.

Example: the first 10 pages are meaningless blah


Sorry, but you're just wrong. There are issues, but the paper is written well enough. The content (whether this is really a novel enough idea) is debatable, because anyone could have told you that LLMs aren't going to develop the halting algorithm.


Have you actually read the paper, or do you know how an ML paper should be written?

Here are some of the issues:

- Section 1.4.3: can you explain how the societal consequences of LLM hallucinations are in any way relevant to a paper whose abstract claims to use mathematical theories ("computational theory", although that is an error too; they probably mean computability theory)? At best, such a section belongs in the appendix, if not in a separate paper.

- Section 1.2.2: what is with the strange subsections 1.2.2.1, 1.2.2.2 that they use for enumeration?

- Basic LaTeX errors: e.g. on page 19, at L=, the spacing is all messed up, and the authors confuse "<" with "\langle", etc.

So no, I'm afraid you are wrong. The paper violates many of the unspoken rules of how a paper should be written (which can be learned by reading a lot of ML papers, which I guess the authors haven't done), and based on this alone, as it is, it wouldn't make it into a mediocre conference, let alone ICML, ICLR, or NeurIPS.


What I don't like about this is that they had to build a separate system.

Why wasn't it possible to contact arXiv and do this in collaboration with them?


Seems like it could just fill the role of the exploitative publishers if it takes off. The community does the work (scientists volunteer peer review), the site gets the profit and locks everyone in due to the network effects of being the hub for discussion. Eventually it starts charging for boosting your paper with a guaranteed return in citations, etc. I'm just assuming it is venture-backed because it has a team of advisors and the about page is LinkedIn, but sorry if I'm off on that and it is a nonprofit or something.


This is precisely my worry!

While it says "a Stanford project" at the top, this could mean anything. It could, in particular, allude to the fact that Stanford professors are advisors.

The advisors are well-known professors, but some, like Sebastian Thrun, have an entrepreneurial background, so a company probably sits behind this.

For me, the entire project gets a NO CONFIDENCE vote because I can't tell who they are or who pays for the moderation they claim to do, etc.

I'd feel much more at ease if this was done by arXiv, or at least endorsed by arXiv.


Behind all the technical lingo, what problem does this solve that cannot be solved by sticking to a git repo that tracks your research and using some simple actions on top of GitHub for visualization etc.?


Remember the famous HN comment:

“This ‘Dropbox’ project of yours looks neat, but why wouldn’t people just use ftp and rsync?”


The fact that software engineers are the only folks with the skills to do what you just said.

When I was working on my PhD thesis 20 years ago, I had a giant makefile that generated my graphs and tables and then generated the thesis from LaTeX.

All of it was in version control, which made it so much easier, but no way anyone other than someone who uses those tools would be able to figure it out.


> The fact that software engineers are the only folks with the skills to do what you just said.

I've always been impressed by the amount of effort that people are willing to put in to avoid using version control. I used Mercurial about 18 years ago, then moved to git when that took off, and I never write much text for work or leisure without putting it in git. I don't even use branches at all outside of work - it's just so that the previous versions are always available. This applies to my reading notes, travel plans, budgeting, etc.


Version control is fantastic, and you can get quite creative with it too. Git scraping, for example (https://simonwillison.net/2021/Dec/7/git-history/). But as nice as Git is, people who are not trained as software developers or computer scientists often don't have a lot of exposure to it, and when they do, it's a relatively big step to learn to use it. In my mechanical engineering studies we had to do quite a bit of programming, but none of my group mates ever wanted to use version control, not even on bigger projects. The Jacquard notebook and other Ink&Switch projects are aimed at people with non-software backgrounds, which is quite nice to see :)


Oh, they all use version control.

It just looks like "conf_paper1.tex" "conf_paper3.tex" "conf_paper_friday.tex" "conf_paper_20240907.tex" "conf_paper_last_version.tex" "conf_paper_final.tex"

...

"conf_paper_final2.tex"

Oh, and the figures reference files in a local dir structure.

And the actual, eventually published version exists only in email back-and-forth with the publisher over style files, etc.


I once worked with a professor and some graduate students who insisted on using Box as a code repository, since it kept a log of changes to files under a folder. I tried to convince them to switch to git by making a set of tutorial videos explaining the basics, but it still wasn't enough.


When GitHub started, for most people the only purpose was just that you didn't have to manage a server holding your repository. To avoid using it at that point for private projects required nothing more than ssh and a $5/mo virtual machine somewhere, and all of their customers could follow the steps to set that up. It still succeeded.


I thought Terence Tao was the Mozart of maths. So confused

https://www.smh.com.au/lifestyle/terence-tao-the-mozart-of-m...


Yeah. Or maybe Grothendieck. Oh sorry, I forgot he's the Einstein of maths[1]. In general, calling people the Mozart of some field is pretty lazy.

[1] https://www.spectator.co.uk/article/the-einstein-of-maths/ in spite of the fact that Einstein probably would have considered himself the Einstein of maths.


:D


> Also some junk senders seem to have worked out that the sub-domain I use for the per-entity addresses is a catch-all, I need to address that at some point.

Could you elaborate how you'd address that?


There are a few options I've thought of, including these ones off the top of my head:

1. Just enable each on first use, instead of using a catch-all at all, though there is a danger there of bounces due to mistakes on my part.

2. Keep the catch-all but send to a junk folder unless the destination address is on a white-list; this has the advantage that if I forget to add the new address (or do it incorrectly) no mail is lost, as I can move it out of junk after the fact (as long as I notice within 30 days).

3. Generate the addresses either fully or as <picked-portion>.<truncated-salted-hash-of-that>@sub.domain.tld, so ea588e3be96e89.8177be49@sub.here.com or SomeShop.499ec679@sub.there.com, and use programmatic filtering to decide where the messages go (junk unless the hash matches; there's a rough sketch of this below). The disadvantages of that are needing access to the generator any time I need an address, the difficulty of giving addresses verbally (there would be transcription errors), and that it does nothing for existing addresses.

Option 2 is probably the winner there.

Every now and then I think of another option then dismiss it as overcomplicated or otherwise not workable.
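
To make option 3 concrete, here is a rough sketch of what the generation and check could look like (the secret, hash choice, truncation length, and domain are all placeholders for illustration, not my actual setup):

    # Rough sketch of option 3; secret, hash, truncation and domain are placeholders.
    import hmac
    import hashlib

    SECRET = b"some-private-secret"   # hypothetical salt/key
    DOMAIN = "sub.domain.tld"         # hypothetical catch-all sub-domain
    DIGITS = 8                        # length of the truncated hash

    def make_address(picked: str) -> str:
        """Derive <picked>.<truncated-salted-hash>@sub.domain.tld."""
        tag = hmac.new(SECRET, picked.lower().encode(), hashlib.sha256).hexdigest()[:DIGITS]
        return f"{picked}.{tag}@{DOMAIN}"

    def hash_matches(address: str) -> bool:
        """Programmatic filter: junk unless the hash matches."""
        local, _, domain = address.partition("@")
        if domain != DOMAIN or "." not in local:
            return False
        picked, _, tag = local.rpartition(".")
        expected = hmac.new(SECRET, picked.lower().encode(), hashlib.sha256).hexdigest()[:DIGITS]
        return hmac.compare_digest(tag, expected)

    print(make_address("SomeShop"))                           # e.g. SomeShop.1a2b3c4d@sub.domain.tld
    print(hash_matches(make_address("SomeShop")))             # True
    print(hash_matches("SomeShop.deadbeef@sub.domain.tld"))   # False -> junk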


I think the idea of having a token in the address could work!

Unless you’re being specifically targeted, it doesn’t need to be anything particularly long or secure though—just needs to not fit the same pattern as everyone else.

It could be as simple as “the letter before the first letter and the letter after the last letter” or something easy enough to generate in your head when filling out a paper form. (E.g., “hackernews@“ becomes “hackernews.gt@“; “google@“ becomes “google.ff@“)

Or “number of vowels mod 10” (“facebook@” becomes “facebook.4@”; “medium@” becomes “medium.3@”); both rules are sketched in code below.

You could always combine this with option 2 as well. Anything that has a valid token goes through the regular filters, everything without goes straight to junk.

… In fact, I think I’m going to do this for my own domain.
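
Purely for illustration (the whole point is that these rules fit in your head), the two examples above work out to something like this:

    # Illustration only - both rules are simple enough to do mentally.
    def letter_token(picked: str) -> str:
        # letter before the first letter + letter after the last letter
        s = picked.lower()
        return chr(ord(s[0]) - 1) + chr(ord(s[-1]) + 1)

    def vowel_token(picked: str) -> str:
        # number of vowels, mod 10
        return str(sum(c in "aeiou" for c in picked.lower()) % 10)

    print(letter_token("hackernews"))  # gt
    print(letter_token("google"))      # ff
    print(vowel_token("facebook"))     # 4
    print(vowel_token("medium"))       # 3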


You're right, that is a good simplification of that option to make it less tech dependent:

* Generate addresses with that process (that can be done in the head)

* Programmatically verify the incoming addresses to decide junk or not (a sketch is below); hopefully not difficult with postfix: just use the “standard” config-in-db setup with a stored procedure doing the mapping instead of a simple SELECT, so it can check the checksum (in fact, it should be possible with a plain SELECT, but I expect the syntax would be messier that way)

* Still have a whitelist for addresses previously used, unless using a new sub-domain for this setup

* Still send mismatches to a junk folder not /dev/null in case of mistakes

I might go that way when I finally get around to rebuilding & migrating my mail server(s)…
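
For what it's worth, a compact sketch of the filtering side (in the real setup this decision would live in the postfix config-in-db lookup, e.g. inside the stored procedure; the rule, whitelist, and folder names below are invented for illustration):

    # Illustrative only; the real check would live in the postfix lookup.
    LEGACY_WHITELIST = {"someshop", "oldnewsletter"}   # hypothetical previously-used addresses

    def expected_token(picked: str) -> str:
        # example rule from upthread: number of vowels mod 10
        return str(sum(c in "aeiou" for c in picked.lower()) % 10)

    def route(local_part: str) -> str:
        """Decide the delivery folder for an incoming local part."""
        local = local_part.lower()
        if local in LEGACY_WHITELIST:      # previously-used addresses keep working
            return "INBOX"
        picked, sep, token = local.rpartition(".")
        if sep and token == expected_token(picked):
            return "INBOX"
        return "Junk"                      # junk folder, not /dev/null, so mistakes are recoverable

    print(route("facebook.4"))   # INBOX
    print(route("facebook.7"))   # Junk
    print(route("someshop"))     # INBOX via whitelist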


Which experts said that?

I don't think that's the case at all. The writing was already on the wall.


The writing was on the wall for the last year and a half (in fact I lost a bet to an IMO medalist about AI getting IMO gold by 8/2023) but three years ago this was unimaginable.


It's bullshit. AlphaGeometry can't even solve Pythagoras' theorem.

Not open-sourcing anything.

This is a dead end on which no further research can be built.

It violates pretty much every principle of incremental improvement on which science is based. It's here just for hype, and the 300+ comments prove it.


> Everything beyond nothing is an imperfect and unstable solution held together with duct-tape.

What a depressing worldview... :(


I don't think it is depressing. I think it is marvelous and a testament to human engineering, perseverance, and collective spirit that we hold things together.

As I said down thread, it is something to be grateful for, given that we are always one major catastrophe away from robbing and killing each other in the street.

While sure, things could be better, I think it is overly cynical to consider the status quo garbage. If you wanna see some real garbage, go to Haiti, Donetsk, or Liberia and report back about how US society isn't "good and just".


I'd challenge you to attack it logically on its merits. I think the parent poster is obviously right: civilization exists on a precipice at all times. The natural state of the universe is toward entropy.

That's not to say we can't make it better, just that we should have no illusions about its stability.


Lots of folks on the internet misunderstand entropy.


Don't read that. That's one of the most horrible papers I know, a hodgepodge of well-known mathematical concepts thrown together with some vague ideas about how they connect. The mathematical parts are explained better in any undergraduate book. I'm not sure why the authors felt the need to expound on what a manifold is when that has been done better in hundreds of other texts - literally.

And it lacks a definite conclusion: they don't prove anything, don't run any particular experiment, but just loosely talk about how these ideas might be relevant to machine learning.

I'm surprised that such highly cited researchers have produced such a paper. I would be embarrassed to be on it - and I'm embarrassed on behalf of the ML community that they are citing it.


Do you have better references you could share?

