Hacker News | motoboi's comments

I suppose you didn't get the news, but Google developed an LLM that can play chess, and play it at grandmaster level: https://arxiv.org/html/2402.04494v1

That article isn't as impressive as it sounds: https://gist.github.com/yoavg/8b98bbd70eb187cf1852b3485b8cda...

In particular, it is not an LLM and it is not trained solely on observations of chess moves.


Not quite an LLM. It's a transformer model, but there's no tokenizer or words, just chess board positions (64 tokens, one per board square). It's purpose-built for chess (never sees a word of text).

In fact, the unusual aspect of this chess engine is not that it's using neural networks (even Stockfish does, these days!), but that it's only using neural networks.

Chess engines essentially do two things: calculate the value of a given position for their side, and walk the game tree while evaluating its positions in that way.

Historically, position value was a handcrafted function using win/lose criteria (e.g. being able to give checkmate is infinitely good) and elaborate heuristics informed by real chess games, e.g. having more space on the board is good, having a high-value piece threatened by a low-value one is bad etc., and the strength of engines largely resulted from being able to "search the game tree" for good positions very broadly and deeply.

Recently, neural networks (trained on many simulated games) have been replacing these hand-crafted position evaluation functions, but there's still a ton of search going on. In other words, the networks are still largely "dumb but fast", and without deep search they'll lose against even a novice player.
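The evaluation-plus-search split described above can be sketched in a few lines. This is a toy negamax, not any real engine's code; `children` and `evaluate` are stand-ins for a move generator and an evaluation function (handcrafted or neural):

```python
def negamax(pos, depth, children, evaluate):
    """Toy sketch of classic engine search: recursively explore the
    game tree, scoring leaf positions with a fast evaluation function.
    children(pos) yields the positions reachable in one move;
    evaluate(pos) scores a position from the side to move."""
    kids = children(pos)
    if depth == 0 or not kids:
        return evaluate(pos)
    # Best move for us is the one leading to the worst position
    # for the opponent, hence the sign flip at each ply.
    return max(-negamax(k, depth - 1, children, evaluate) for k in kids)
```

The paper's searchless engine, roughly speaking, collapses this whole recursion into a single forward pass of the network.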

This paper now presents a searchless chess engine, i.e. one that essentially "looks at the board once" and "intuits the best next move", without "calculating" resulting hypothetical positions at all. In the words of Capablanca, a chess world champion also cited in the paper: "I see only one move ahead, but it is always the correct one."

The fact that this is possible can be considered surprising, a testament to the power of transformers etc., but it does indeed have nothing to do with language or LLMs (other than that the best ones known to date are based on the same architecture).


It's interesting to note that the paper benchmarked its chess-playing performance against GPT-3.5-turbo-instruct, the only well-performing LLM in the posted article.

You are probably familiar with a document called the OAuth 2.0 Threat Model.

In that document, refresh token rotation is preferred, but it also addresses the obvious difficulty in clustered environments: https://datatracker.ietf.org/doc/html/rfc6819#section-5.2.2....
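One common workaround for the clustered-environment race (two nodes refreshing with the same, just-rotated token at nearly the same time) is a short reuse grace window. A hedged sketch; all the names, the grace period, and the token format here are illustrative, not from the RFC:

```python
import time

GRACE_SECONDS = 30  # illustrative window for tolerating cluster races

class TokenStore:
    """Toy in-memory refresh-token store with rotation plus a
    reuse grace window. Real deployments need shared storage,
    atomicity, and revocation on suspected theft."""

    def __init__(self):
        self.current = {}   # user -> active refresh token
        self.retired = {}   # old token -> (replacement token, retired_at)

    def rotate(self, user, old_token, now=None):
        now = now if now is not None else time.time()
        if self.current.get(user) == old_token:
            new = f"rt-{user}-{int(now * 1000)}"  # made-up token format
            self.retired[old_token] = (new, now)
            self.current[user] = new
            return new
        # A second cluster node replaying the just-rotated token
        # inside the grace window gets the same replacement back.
        if old_token in self.retired:
            new, retired_at = self.retired[old_token]
            if now - retired_at <= GRACE_SECONDS:
                return new
        # Reuse outside the window looks like token theft.
        raise PermissionError("possible token theft: revoke session")
```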


I suppose we've had cloud long enough that people who never ran a datacenter think this is a good idea?

I managed a tier 2 datacenter for some 4 years. Fire risk alone still gives me anxiety.

“Cloud? It’s more expensive, stupid!” Yeah… you know what’s way more expensive? People.

People and processes you’ll have to manage to achieve SLAs like Amazon’s? Well, good luck with that. If you manage to find the people, please write a book about how you achieved the process.

Oh, and new services on top of that every time a new technology appears? With the SLA and the consistent quality?

And the hardware, the building, the never-ending decay of physical things. The drives, for God’s sake. The never-ending pursuit of storage space.

The enterprise premium for hardware, which you’ll pay to have the manufacturer save your ass. And please don’t think your employees will let you touch the fancy open source or community versions of things. You’ll blink and you’ll be paying the premium (and thanking the vendor for the option), because the risk won’t let you sleep.

And the most amazing part: business on your heels demanding innovation, quality, uptime and… change.

Four years, 10 years ago, and my palms just got sweaty thinking about it.

Your company is lucky that someone is willing to take money to do that for you. And luckier yet that this company can do it very well. So well, in fact, that it is very easy to forget how well they do it.


> People and processes you’ll have to manage to achieve SLAs like Amazon’s?

In reality, you can have almost any people and processes. The trick is to put your servers and data in more than one place. If you have uptime of just 99% for a server (~3 days off in a year) and have them in 2 unrelated places, you will get 99.99% uptime. 3 places will give you 6 9's. The only thing that has to be ensured by people and processes is graceful fallback.
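That back-of-the-envelope arithmetic, as a quick sanity check. Note the big assumption baked in: site failures must be completely uncorrelated, which is exactly the hard part in practice:

```python
def combined_availability(p: float, sites: int) -> float:
    """Availability of N sites where the service is up as long as at
    least one site is up, assuming independent failures."""
    return 1 - (1 - p) ** sites

print(combined_availability(0.99, 2))  # ~0.9999, "four nines"
print(combined_availability(0.99, 3))  # ~0.999999, "six nines"
```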

Notice how I say uptime and not SLA. SLA just means that you will get a little bit of money back if uptime dips below the SLA level. Oh, and for EC2 it is just 99.95%. So, if you really care about your users, you will engineer your systems to stay up rather than hoping that a third-party provider's SLA will save you.


That assumes the only causes of failure are environmental. I’ve definitely seen plenty of hardware failures but software failures are common, too, and keeping things synchronized is going to require more than “any people and processes” - that’s how you learn your backup has never been tested and database replication stopped working 3 days before the failure.


Not forgetting operational failures due to human mistakes when doing delicate stuff on complex environments, and setting up on-prem infra to work like a hyperscaler does... well, it's not easy.


Yeah, if they’ve actually tried that approach well, I hope their luck holds but I wouldn’t bet on it.


All those points OP raised as difficult - physical space, staffing, capex, etc. - and your response is “yeah, now do it twice”.


Well, dedicated servers (for which you can have private cage, or VISA compliance, or ...) are a markup of say 30% over base cost, which is still 1/5th the cost of AWS. And even Hetzner will just deliver Kubernetes clusters these days.

These avoid all of the costs you were talking about.


> In reality, you can have almost any people and processes.

You've never tried it, huh?

The reality is you will need some very specific processes.

You'll want a test environment, so you can make sure that proposed router reconfiguration actually does what it's supposed to do, and a process that says to use the test environment, and a process for keeping it in a consistent enough state that the tests are representative.

You'll want a process to make sure every production change can be reversed, and that an undo procedure has been figured out and tested before deployment. When that's impossible, you'll need careful review.

You'll want a process to make sure configuration changes are made in all three production data centres, avoiding the risk of a distracted employee leaving a change part-way rolled out.

But you can't roll out to all three sites at the same time; what if the change has some typo that breaks it? So you'll want a gradual rollout process.

You'll want to monitor the load on the three systems, to make sure if one goes down that the other two have enough capacity to take over the workload. You'll have to keep monitoring this, to keep ahead of user growth.
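The headroom check behind that monitoring can be done on the back of an envelope. A minimal sketch, assuming identical sites with evenly spread load (real capacity planning also has to handle skewed load and multi-site failures):

```python
def max_safe_utilization(sites: int) -> float:
    """Highest steady-state per-site utilization such that, if one of
    `sites` identical sites fails, the survivors can absorb its load."""
    # Total load L spread over `sites`; after one failure it spreads
    # over sites - 1, so each site must normally run below (N-1)/N.
    return (sites - 1) / sites

# With 3 sites, each must stay below ~67% load to survive losing one.
print(round(max_safe_utilization(3), 3))
```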

Did I mention the user growth? Oh yeah, we're expecting a surge in demand just before Christmas. The extra servers we got last Christmas have absorbed our user growth, so we'll need more. Of course it'll take time to get them racked and set up, there will be a lead time on getting them delivered, and of course a back-and-forth sales process. So of course we'll have to kick off the server ordering process in August.

Of course, there's a chance of a partial failover. What if the web servers are still working in all data centres, but the SQL server in data centre B has failed, while the replicas in A and C are fine? If there's a software hiccup you'll need to figure out who to call - yet another process...


Take off those rose-tinted cloud spectacles and put them in the nearest trash can @michaelt !

> You'll want a test environment

You need that in the cloud too...

> You'll want a process to make sure every production change can be reversed

You need that in the cloud too...

> You'll want a process to make sure configuration changes

You need that in the cloud too....

> you'll want a gradual process.

You need that in the cloud too...

> You'll want to monitor

You need to do that in the cloud too....

> user growth / surge in demand

The problem with the cloud is everyone thinks they need to design for Google-scale from day zero.

Sure the cloud providers don't mind, more money for them ...

> there's a chance of a partial failover.

Could, and does, happen in the cloud too....


Layers, layers and more layers.

Cloud providers want to provide some layers for you. Thank god they exist and live with the layers you can’t pay someone else to do.


I've yet to see a cloud deployment that didn't have just as many people managing it as an actual datacenter. And those people were higher paid. None of the examples I have access to show a cost saving from the effort.

(And it doesn't save you from downtime, either. Cloud just introduces a new set of risks. Also, most companies don't actually need anywhere near the kind of infrastructure that AWS provides.)


This is what people don't realize.

At a certain scale you need tons of people to operate a cloud environment, especially if you take advantage of all the features.

Sure, for tiny players the devs can be forced to both write the business code and manage the infrastructure, but at a certain scale that just doesn't work, because everything becomes non-standard and managing it takes up all the time the devs should be spending on writing business-related code.

So then you spin up an infrastructure team. And it won't just be one either because the person creating your VPC/network design won't have time to also be working on your backup design or your database design or your image pipeline or your CI/CD.

So at the end of the day you end up with all the same teams you had managing your datacenter infrastructure (network, storage, OS), except now you're paying way more for the infrastructure itself, and the people cost more because they are not just specialists in their area but have to be specialists in cloud platforms as well.


The thing about cloud is that it bundles in tons of additional costs that, yes, do provide a measurably better product, but that you might not actually want.

What if I want a cloud, but I only want two or three 9s of availability?


> you’ll have to manage to achieve SLAs like Amazon’s?

Aaah yes, those AWS SLAs that ....

    - (a) Famously apply anywhere except US-EAST region which has a proven tendency to blow itself up on a scheduled basis
    - (b) Are more marketing fluff than reality.  A bit like the old Verizon leased-line 100% uptime SLA, nobody sensible who worked in IT believed it, but your boss was paying the Verizon-tax so that Verizon could afford to pay out some hard-dollars when (not if !) the SLA was inevitably breached.
I would also invite you to actually go read the AWS SLAs. The famous eleven-nines S3 SLA, for example[1]? Barely worth the paper it's written on. The S3 SLA is silent on data integrity or any other data metric. It only covers you for 500s returned by the AWS S3 API, the "Error Rate":

     "Error Rate” means, for each request type to an Amazon S3 storage class: (i) the total number of internal server errors returned by Amazon S3 for such request type to the applicable Amazon S3 storage class as error status “InternalError” or “ServiceUnavailable” divided by (ii) the total number of requests for such request type during the applicable 5-minute interval of the monthly billing cycle. We will calculate the Error Rate for each AWS account as a percentage for each 5-minute interval in the monthly billing cycle.
Also, it's got the world's best get-out clause: the SLA only applies if you actively use the service:

    "If you did not make any requests in a given 5-minute interval, that interval is assumed to have a 0% Error Rate."
So, yeah, give me a break about those "amazing" AWS SLAs. :)

Also, what happens if AWS breaches its over-confident SLAs? Oh yeah, that's right: typical corporate-style hand-wringing, an apologetic-sounding email written in corporate-speak and approved by legal, and that's that. Maybe a mea-culpa public blog post if you're lucky.

Also, when you have your own datacentre, you don't have all the stupid nickel-and-diming of the cloud providers. The major cloud providers are particularly bad about it: all sorts of obscure "hidden" charges just waiting to bite you in the backside if you don't look carefully at the bottom half of their price sheets.

[1] https://aws.amazon.com/s3/sla/


What do you think about product by Google - Google Distributed Cloud Air-Gapped?

Although the name is "air gapped", it does not have to be if the client doesn't want air gapping. It's "buy commodity hardware from vendors like HP; we will give you software and training to manage it".

Much leaner stack than whole GCP/AWS/Azure, but deployed "on-prem" with "cloud-like" experience.


> I suppose we had cloud long enough to allow for people that never had a datacenter think this is a good idea?

Centralization vs. Decentralization. And the pendulum swings on.


Yeah. Also, the young blood always assumes the gray-haired guys were just somehow stupid and lame and simply couldn't see how easy it is to set up PHP with Linux on commodity-class hardware running somewhere in the basement.


Just pass it a link to a GitHub issue and ask for a response, or even a webpage to summarize, and you'll see the beautiful hallucinations it comes up with, as the model is not web browsing yet.


Well you kinda reinvented passkeys.


Why saml instead of OIDC?


In my experience, SAML seems to be a more universally used option.


Your app isn't ready to support 'enterprise' until you can do both - you'll need to use this product + roll your own OIDC or find another service like this for OIDC. Customers will expect you to be agnostic and bring whichever they prefer.


SAML is the older protocol. It would be a design smell to see a new RP using it exclusively (people still implement SAML because enterprise customers have SAML IdPs).


What you are missing here is that ChatGPT has no internal mental state, nor a hidden place where it registers its thinking. The text it outputs is its thinking. So, the more it thinks before answering, the better.

When you ask it not to add extra commentary, you are in essence nerfing it.

Ask it to be more verbose before answering, to think step by step, to carefully consider the implications, to take its time, and promise it a $200 tip.

Those are some prompts proven to improve the answer.
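A hypothetical sketch of what that looks like in practice. The helper and the exact phrasings are illustrative folk prompt engineering, not an official API of any model:

```python
def build_prompt(question: str) -> str:
    """Wrap a question with verbosity-encouraging boilerplate.
    The idea: since the model's output IS its thinking, asking for
    step-by-step reasoning gives it more 'thinking' before the answer."""
    return (
        "Think step by step and carefully consider the implications "
        "before answering. Be as verbose as you need in your reasoning. "
        "I'll tip $200 for a great answer.\n\n"
        f"Question: {question}"
    )

print(build_prompt("Why does refresh token rotation fail in a cluster?"))
```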


Please add "and the financial result of this analysis will count toward your annual bonus" to the prompt.


> People treat vector search like magic pixie dust but the reality is that it's not that great unless you heavily tune your models to your use cases

I believe that what most people miss is that embeddings are the product of the model.

Embeddings have been getting so much better with the new models and so embedding-based search improved too.

Embeddings are coordinates in a world representation that "organizes" information, so that location in that world matters and serves as a way to differentiate meaning.

In other words, word2vec was a simple and poor world representation. OpenAI embeddings are coordinates in an astounding world representation.
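A minimal sketch of that "location differentiates meaning" idea: relatedness becomes cosine similarity between coordinates. The 3-d vectors below are invented for illustration; real embedding models output hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: how aligned two embedding vectors are."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings: related words land near each other.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
banana = [0.1, 0.2, 0.95]

print(cosine(king, queen) > cosine(king, banana))  # related words sit closer
```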


Also, I think it's about time. We don't have the time to super-fine-tune the document retrieval, and that magic pixie dust gets us 90-99% there. That extra 1% could take time management doesn't want to give.


Friend of mine interviewed there. Left the coding part of the interview with the strong impression that they were using him to train some kind of code LLM.


Heh, I did a coding interview for one of the AI companies that post here all the time (Imbue, I think it's spelled) and when I got rejected I wondered if it was just some AI training thing...


That's genius. Instead of dumping resumes in the trash, dump them into an AI training program (and then dump them in the trash)


Google interviews always felt like this, even a decade ago.

