
So your proposed system would be an extremely interesting exploration of that! I look forward to it.

Right, so let's virtualise it. Actually training AIs using real cameras and real robot arms would be really slow and expensive.

So we provide a system that renders a photorealistic graphical room with a teapot and a robot arm. A virtual camera inside the room 'sees' parts of the room, and a vision model processes what it can 'see' to feed info to the LLM. Likewise, the LLM can make the robot arm move, but it's all just simulated.
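
To make the setup concrete, here's a minimal sketch of that perception-action loop in Python. All of the names (Simulator, VisionModel, LanguageModel, run_episode) are hypothetical placeholders for whatever renderer, vision model, and LLM you'd actually plug in; it's an illustration of the architecture, not a real implementation.

    # Hypothetical sketch of the simulated perception-action loop described above.

    class Simulator:
        """Renders a photorealistic room containing a teapot and a robot arm."""
        def render_camera_frame(self):
            ...  # return an RGB image of what the virtual camera currently 'sees'
        def apply_arm_command(self, command):
            ...  # move the simulated arm and update the scene state

    class VisionModel:
        def describe(self, frame):
            ...  # e.g. "the teapot is 30cm to the left of the gripper"

    class LanguageModel:
        def next_action(self, observation_text, goal):
            ...  # return an arm command such as "rotate base 15 degrees"

    def run_episode(sim, vision, llm, goal="pick up the teapot", steps=50):
        for _ in range(steps):
            frame = sim.render_camera_frame()              # virtual camera 'sees' the room
            observation = vision.describe(frame)           # pixels -> text fed to the LLM
            command = llm.next_action(observation, goal)   # LLM decides what the arm does
            sim.apply_arm_command(command)                 # but it's all just simulated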

Does the LLM now have a relationship with reality?




Let me know when you find out.

The interesting and open question to me is what the limitations are of a language model at the center of that experience. How much of a relationship with reality can be captured by language at all, and specifically by the sort of statistical models of language we're exploring now? For some of us, the intuitive answer is "not all that much"; for others it seems to be at least as much as any human.

Whether conducted virtually or physically, coming up with an answer sounds like an empirical study, and one that we're some years away from having results for.


Ideally, in that scenario you'd have a model that unified vision, language, and an understanding of 'doing things' and manipulating objects. So it wouldn't just be an LLM; it would be a language-vision-doing-things model. There's no reason why we can't build one; a rough sketch of what that might look like is below.
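
As a rough illustration (not any particular system, Tesla's or otherwise): a unified vision-language-action model could be a single network that takes a camera frame plus a text instruction and emits both language tokens and arm-motion outputs. Sizes, names, and shapes here are made up for the sketch.

    # Hedged sketch of a single vision-language-action model in PyTorch.
    import torch
    import torch.nn as nn

    class VisionLanguageActionModel(nn.Module):
        def __init__(self, vocab_size=32000, action_dims=7, d_model=512):
            super().__init__()
            # Stand-in image encoder (a real system would use a ViT/CNN backbone).
            self.image_encoder = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=8, stride=8),
                nn.Flatten(),
                nn.LazyLinear(d_model),
            )
            self.text_embed = nn.Embedding(vocab_size, d_model)
            self.trunk = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=4,
            )
            self.text_head = nn.Linear(d_model, vocab_size)     # language output
            self.action_head = nn.Linear(d_model, action_dims)  # e.g. joint velocities

        def forward(self, image, token_ids):
            img_tok = self.image_encoder(image).unsqueeze(1)    # (B, 1, d_model)
            txt_tok = self.text_embed(token_ids)                # (B, T, d_model)
            h = self.trunk(torch.cat([img_tok, txt_tok], dim=1))
            # Language tokens from the whole sequence, an action from the final state.
            return self.text_head(h), self.action_head(h[:, -1])

The point is just that 'doing things' becomes another output head on the same trunk, rather than a separate system bolted onto an LLM.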


Come to think of it, that's kind of what Tesla are building.



