The Worm Ouroboros (semantic-domain.blogspot.com)
44 points by octosphere on Aug 3, 2018 | 12 comments



I ran into a somewhat similar issue recently. Sometimes I work on a piece of software that classifies files based on their contents, and with so many file formats implemented as "just stuff it in a zip file", it ends up identifying a lot of files by looking for well-known filenames in zip archives. An annoying quirk of the zip file format is that the authoritative list of the offsets of the files in the archive is at the end of the zip file, but not at any fixed offset from the end, because the trailing record contains an optional, variable-length comment field. That means the only way for a program to correctly parse it is to read backwards from the end of the file looking for a magic number. The folks who wrote the .NET API for enumerating zip entries knew about this and wrote their code accordingly.

For our file classification tool...we made it work from a System.IO.Stream, with the only requirement being that the stream was seekable, and we have it classifying hundreds of millions of filesystem-backed file streams per day, no problem.

So we recently shared it with a partner team, and all of a sudden, in their environment, every once in a while they get a file that takes five minutes to identify. The difference turned out to be that their stream implementation was backed by a database: even though it was seekable, every read turned into a database query. So if you passed in a big zip file with the directory information missing from the end, .NET would read through the entire stream backwards, looking for the magic number, one 32-byte read (database query) at a time...
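For anyone curious what a bounded version of that backward search can look like, here is a minimal Python sketch. It is not the .NET implementation; `find_eocd` is a hypothetical helper name. It relies on two facts about the zip format: the end-of-central-directory record starts with the magic bytes `PK\x05\x06`, and its fixed part is 22 bytes followed by a comment whose length is a 16-bit field. So the record has to start within the last 22 + 65535 bytes, and one read of that tail is enough whether or not the record is there. It assumes the stream returns the full tail in one `read` call and ignores ZIP64.

```python
import io
import struct

EOCD_SIGNATURE = b"PK\x05\x06"   # end-of-central-directory magic number
EOCD_MIN_SIZE = 22               # fixed part of the record, without the comment
MAX_COMMENT_SIZE = 0xFFFF        # the comment length is a 16-bit field


def find_eocd(stream):
    """Return the absolute offset of the EOCD record, or None if absent.

    Hypothetical helper: issues one seek to the end, one seek to the
    start of the tail, and one read, regardless of file size.
    """
    size = stream.seek(0, io.SEEK_END)
    if size < EOCD_MIN_SIZE:
        return None

    # The record cannot start earlier than this many bytes from the end.
    tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE)
    tail_start = size - tail_len
    stream.seek(tail_start)
    tail = stream.read(tail_len)  # single bounded read

    # Search backwards in memory instead of issuing many small reads.
    pos = tail.rfind(EOCD_SIGNATURE)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(tail):
            # The comment length lives at offset 20 within the record.
            (comment_len,) = struct.unpack_from("<H", tail, pos + 20)
            # Accept only a record whose comment ends exactly at end of file.
            if pos + EOCD_MIN_SIZE + comment_len == len(tail):
                return tail_start + pos
        pos = tail.rfind(EOCD_SIGNATURE, 0, pos)
    return None
```

With a database-backed stream, a file with no directory record at the end would cost one query of at most about 64 KiB instead of one query per 32 bytes of the whole file.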


> can the behaviour of the following bit of C code depend

> on a modern computer, a pointer dereference could very well lead to the execution of Python code.

So the BEHAVIOR of the C code does not change. The dereference of the pointer triggers a memory page load, and if that load is successful, a numeric value is returned and added to the array. If the load fails, you will have the undefined result of accessing uninitialized memory.

In both cases, the behaviour of the code remains squarely within the C standard - with the actual result of the computation contingent on various external factors.
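To illustrate, here is a small Python stand-in for the idea (the article's C snippet is not quoted in full here, so this is an analogue, not a translation). The hypothetical `sum_mapped_ints` reads integers through a memory mapping. The code never changes, but its result is whatever the backing file holds when the pages are loaded. The file layout (little-endian 32-bit ints) is an assumption made for the example.

```python
import mmap
import struct


def sum_mapped_ints(path, count):
    """Sum `count` little-endian 32-bit ints read through a memory mapping.

    Hypothetical helper: the reads go through mapped pages, so the values
    come from whatever backs the file at the time of access.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            values = struct.unpack_from("<%di" % count, m, 0)
            return sum(values)
```

Point `path` at a file served by a user-space file system and the same function ends up depending on whatever program implements that file system.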


The behavior isn't changed, right, but it depends on Python behavior. If the Python behavior changed somehow, then the C behavior might as well, and due to the basic "A implies B and B implies A" structure of the system, very weird, unpredictable behavior might result.


But the behavior of any data-dependent algorithm depends on any behavior change in the upstream data source. If you pipe SETI@home data into the array, the behavior of the program might depend on what some distant alien intelligence did thousands of light-travel-years ago. Is it "weird unpredictable" or is it exactly what you programmed into it?


Almost any “can” question will lead to something like this.

This can happen if the OS lets you use user-space file systems, the same way it can happen if you have custom hardware with memory-mapped temperature sensors (and suddenly your code literally “depends” on the weather).


What's new? Unless you are manipulating logic gates directly you are working at some layer of abstraction, and there is no way to tell how many layers are below you. Modern CPUs long ago ceased being directly manipulated gate machines where instructions mapped straight into logic. Instead they are basically VMs that execute microcode.


You might as well ask "what's new? the laws of physics haven't changed" about anything in computer science, or, indeed, in any technology. It's not a useful point of view.


Physical virtual machines? ;)


What is the relation of the title to the content? (no arrogance - genuine question)


I don't understand it either and almost ignored it because of that. I suggest changing the title on HN to something like "Can the behaviour of C code depend upon the semantics of Python?"


The conclusion of the article is that the correctness of the memory model is circular, feeding upon itself, like the Ouroboros of myth.


Our Ob or Ros?



