
The framing you have here is an attractive one, but I don't think it makes much sense in the context of reproducing molecules.

There is no reason to posit random DNA chains.

The statement that "the number of DNA chains that produce valid/useful protein in the space of all possible DNA chains is vanishingly small" seems reasonable (however I'm not sure how we would know these chains are the only ones that produce valid/useful proteins).

The idea that we need to choose randomly from the space of all possible DNA chains is not reasonable.

----

Once we have a reproducing molecule, we expect to see a multitude of valid reproducing molecules as descendants of that first molecule. We expect (at least some of) these descendants to eventually be extremely different from the original molecule, and by their nature valid reproducing molecules.

Once we have a reproducing molecule (like DNA) that creates other molecules (like RNA and proteins) we can expect the same of its descendants, and the descendants' by-products.

If these molecules form an ecosystem, where the reproduction of one relies on the validity of the other, the only successful variations within the ecosystem will be valid variations of the ecosystem.

----

The space that we are choosing from is not the space of all possible DNA chains, it is the space of all DNA chains adjacent to existing valid chains (or chains in a valid ecosystem).
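To put rough numbers on that, here is a minimal sketch comparing the size of the whole sequence space with the one-step neighbourhood of a single chain. It assumes a four-letter alphabet and counts only point substitutions (insertions and deletions are ignored); the function names are illustrative and not taken from any library.

```python
def total_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct chains of the given length (assumes 4 bases)."""
    return alphabet_size ** length


def substitution_neighbors(seq: str, alphabet: str = "ACGT") -> list[str]:
    """All chains that differ from seq by exactly one substituted base."""
    neighbors = []
    for i, base in enumerate(seq):
        for other in alphabet:
            if other != base:
                neighbors.append(seq[:i] + other + seq[i + 1:])
    return neighbors


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # arbitrary 10-base example, not a real gene
    print(len(substitution_neighbors(seq)))  # 3 * len(seq) one-step variants
    print(total_sequences(len(seq)))         # 4 ** len(seq) chains overall
```

Under these assumptions the neighbourhood grows linearly with chain length (3n) while the full space grows exponentially (4^n), which is why sampling "near an existing chain" is a very different problem from sampling the whole space.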

It's analogous to taking a valid x86 program that can reproduce, randomly adding/removing/mutating some bits on reproduction (with low frequency, very quickly, and in a ginormous space - think on the scale of molecules in the Earth's oceans), and asking if that new program is also valid. And then, after millions of years of this, asking if one of the programs is a valid mathematical function.

----

There are still big questions here. Questions like "how do we get the first reproducing molecule?" and "is DNA likely to arise once you have reproducing molecules, or just one out of many options?"

None of those questions give reason to invoke the number of all possible variations of DNA as evidence that the variation we see in proteins is somehow unlikely.

Once we know that there exists one valid DNA/protein system (which we do, as it exists), and we know that variations of DNA/protein ecosystems can be functional (which we do, as we've observed it), it is reasonable to expect a multitude of valid, functional DNA chains, and the proteins produced by them.




Like you, I can imagine hundreds—maybe thousands—of ways to resolve these issues.

That's hardly relevant, though; what matters are resolutions that actually work.

I agree that we are (very) likely to find the mechanisms involved, but so far, we haven't. In fact, we don't even have a theory on how DNA was originally developed, or how non-functional DNA/proteins self-replicate, or really anything at all. We only have the end product (which does—as you point out—work). The question is how did it get there, and previous hand waving about a huge, old Universe and random chance isn't sufficient.

It's going to have to be something similar to what you (and other commenters) describe: mechanisms that preferentially and relatively quickly produce valid, self-replicating DNA/protein chains. To date, no one has found anything even close to that.


You see the difference between this argument and what you wrote above though?

Perhaps I'm reading your original post too strongly, so please correct me if so.

In the first post you compare the number of valid DNA chains to the space of all possible chains, you mention the number of different proteins in the human body, and you draw an analogy to a random sequence of bits forming a valid program.

None of these speak to the probability of a reproducing molecule arising through physical processes, nor do they speak to the probability of DNA as a descendant of that original reproducing molecule (or potentially multiple original molecules).

I get that you understand the gaps in our knowledge of how these systems came to be; my point is that your original argument is misleading in the exact same way you claim the argument

"Billion of years passed since the Big Bang. If some chemical process can create life it's very likely that somewhere it did."

is a

"kind of hand-wavey statement [that] seems to convince most people. Universe is hella-old, and really big. Ergo, incredibly rare stuff has happened basically infinitely many times. Life everywhere, etc."

(this was a reply to a different post, but I think it holds to the comment you originally replied to).

In fact, I find the argument that "things reproduce, and have been reproducing for a long time in a large environment, so we expect to see complexity in those things" much more reasonable than "most random arrangements of this molecule are useless, and we can see lots of useful arrangements, therefore time and randomness can't explain them".


We're discussing how to get those "things that reproduce" in the first place. I agree that once you have useful things that reproduce, it's easy to keep it going. Similarly, if I have a running copy of Linux, I can use the tools (and source code) to produce another copy of Linux.

But how do we get the first copy, the "original reproducing molecule" as you put it?

The usual explanation is that the "first copy" arose randomly, and then kept going. Do you believe that? I suspect not—but most people do.

We know that it can't have been random (which is the argument I gave, and I suspect you agree with). We should tell people "it wasn't random, something about the fundamental nature of these molecules caused better and more complex molecules to emerge." But we have no mechanism for that, just a (valid) belief that it has to be true.

I think we should find those mechanisms, and simultaneously, stop telling people that random chance + vast universe + long timespan is sufficient.


> The usual explanation is that the "first copy" arose randomly, and then kept going. Do you believe that?

I believe a variation of that.

I believe that the first copy arose through physical processes.

Invoking 'randomness' is unnecessary and misleading.

Do you not believe this?

To my knowledge, we don't yet have a mechanism for how such a molecule came into being (though there are ideas).

We also don't have any reason to think that it must be some random single choice from a large possibility space, and we don't have any evidence at all that it could have arisen from non-physical processes (what would that even look like?).


This is what I mean by random: no DNA sequence is privileged over any other, and no (known) physical process produces anything but random DNA sequences (excluding, of course, copying already useful DNA sequences).

DNA has about as much structure as bits on a disk (with coding for one of 20 amino acids as the "bits"). No DNA sequence is more likely than any other to exist.

I think that means we need to identify strong physical processes that produce useful DNA strands; you, apparently, aren't as concerned about it. Maybe you're right, but from where I'm sitting, it's hard to imagine what those physical processes might be since the strands they must produce are extremely, unimaginably rare in practice.

DNA is basically information[0], and we literally have no example of a chemical process producing valid DNA information, nor is it at all obvious how such a process might work in practice. In the past, large amounts of time + equal likelihood of producing random DNA was considered sufficient to think "well, useful DNA strands could appear randomly." We now know that's extremely unlikely to the point of being effectively impossible, statistically speaking.

[0] https://www.nature.com/scitable/topicpage/dna-is-a-structure...


But some DNA sequences are privileged over others!

The mechanisms for producing new DNA sequences involve copying existing DNA sequences. Thus, the ones that exist are privileged over the ones that do not exist (yet), and the adjacent sequences are privileged over a random sequence.
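As a rough illustration of what "adjacent" means here, this is a toy sketch of copying with a handful of errors. It is not a model of real replication: the fixed number of substitutions per copy, the 1000-base length, and all function names are assumptions made for the example.

```python
import random

BASES = "ACGT"


def random_sequence(length: int, rng: random.Random) -> str:
    """A chain drawn uniformly from the whole sequence space."""
    return "".join(rng.choice(BASES) for _ in range(length))


def copy_with_mutations(parent: str, n_mutations: int, rng: random.Random) -> str:
    """Copy parent, substituting a different base at n_mutations distinct positions."""
    child = list(parent)
    for pos in rng.sample(range(len(parent)), n_mutations):
        child[pos] = rng.choice([b for b in BASES if b != parent[pos]])
    return "".join(child)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length chains differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    parent = random_sequence(1000, rng)
    child = copy_with_mutations(parent, 5, rng)
    unrelated = random_sequence(1000, rng)
    print(hamming(parent, child))      # exactly 5 by construction
    print(hamming(parent, unrelated))  # roughly 3/4 of the positions on average
```

The copy stays within a few substitutions of its parent, while an unrelated random chain differs at most positions. That gap is the sense in which existing sequences and their neighbours are "privileged".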

> No DNA sequence is more likely than any other to exist.

It is far more likely for a DNA sequence very similar to my own to exist than a random sequence.

> we need to identify strong physical processes that produce useful DNA strands

We have already identified those processes! We know quite well how the machinery of DNA replication works.

If we care about the first DNA molecule to ever exist it's a very different question. We don't need to find a physical process that produces a modern DNA molecule from 'raw parts', rather one that takes not-quite-DNA and converts it into DNA.

> it's hard to imagine what those physical processes might be since the strands they must produce are extremely, unimaginably rare in practice.

Can you imagine slightly simpler DNA? Say just a bit shorter? What's the simplest molecule we might still call DNA, that is reproducing? Can we imagine machinery that would produce that?

I think it's very reasonable to think such machinery could exist, even if we don't know the exact mechanisms involved. We know that RNA can self-reproduce, and also produce proteins, so it's reasonable to think that machinery to produce RNA strands could evolve to produce DNA strands (for example).

The only involvement randomness has in this whole process are (relatively) rare and infrequent changes to self-replicating molecules, and (potentially) the initial formation of a self-replicating molecule.

It is irrelevant how many possible DNA sequences there are, or how much information is stored within them, as we know new sequences are derived from previous ones.


> If we care about the first DNA molecule to ever exist it's a very different question. We don't need to find a physical process that produces a modern DNA molecule from 'raw parts', rather one that takes not-quite-DNA and converts it into DNA.

We haven't found that, and apparently aren't even close. We don't even have any idea what something like that might look like, or even more critically: given all the incredibly, insanely, unbelievably rare DNA sequences that exist in the world today, why is such a fundamental process capable of producing them not abundant as fuck already? Where'd it go? Why is this process even a mystery in 2020? It should be ubiquitous; in fact, all of the primordial soup mechanisms should be. Certainly that's what we expected when the theory was developed, and it hasn't panned out.

Anyway, I think we've exhausted this topic. Thanks for commenting.


> We haven't found that, and apparently aren't even close.

We do have ideas! Specifically, within the RNA world hypothesis, the transition period is called the virus world [0].

> given all the incredibly, insanely, unbelievably rare DNA sequences that exist in the world today,

We have a good understanding of where diversity comes from; I'm not sure what point you're making here.

> why is such a fundamental process not abundant as fuck already. Where'd it go? Why is this even a mystery?

I don't think anyone thinks this process need be 'fundamental', though it definitely is pivotal. It only really needed to happen once, and then DNA was off reproducing and spreading by itself. That said, it looks like viruses converting RNA to DNA could still be happening today.

In general, we don't expect novel self-reproducing molecules to arise today, because they are out-competed by existing self-replicating molecules. In a world where nothing is replicating the first replicator is king. In today's world a brand new replicator is food for something else.

> Maybe it's possible that your romantic view of how this all happens (pseudo-Darwinian circa 2020) isn't telling the whole story?

I don't think I, or anyone else really, is claiming to tell the whole story - just that we have good reason to believe this came about through physical processes, and no evidence to believe... well I'm not sure what else there could be.

What are you proposing?

[0] https://en.wikipedia.org/wiki/RNA_world#Evolution_of_DNA


Interesting discussion. I’d like to ask a sincere question:

Wouldn’t a system A that is capable of encoding another complex system B need to be as complex in order to encode all the information in the result?

It’s like a compression algorithm, you can encode the information, but the complexity level of that information is still there (also the difficulty in compressing the information increases very fast - exponentially or maybe even factorially).

So if the most basic protein sequence requires so many bits of information, wouldn’t anything capable of producing that (in a non-random manner) also require at least that level of information (if not more)?

It doesn’t matter what process we call systems A and B.

So it seems if randomness doesn’t solve the problem (because math), then the only conclusion is that there is a fundamental requirement for intentionality.


It's possible for a simple thing to encode something more complex, deterministically.

The prime example is The Game of Life - simple rules from which complex behaviour emerges.
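For reference, the rules fit in a few lines. This is a minimal sketch of the standard rules (a live cell survives with 2 or 3 live neighbours, a dead cell is born with exactly 3) on an unbounded grid; representing the board as a set of `(x, y)` tuples, with `y` increasing downwards, is a choice made for this example.

```python
from collections import Counter

Cell = tuple[int, int]


def step(live: set[Cell]) -> set[Cell]:
    """Advance one generation of Conway's Game of Life on an unbounded grid."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


def run(live: set[Cell], generations: int) -> set[Cell]:
    """Apply step repeatedly."""
    for _ in range(generations):
        live = step(live)
    return live


if __name__ == "__main__":
    # The five-cell glider; every 4 generations it reappears shifted by (1, 1).
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    print(sorted(run(glider, 4)))
```

Nothing in `step` mentions gliders or oscillators; those behaviours only show up when the rules are run on particular starting patterns.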

This idea of information is one we're putting onto the system, not some inherent attribute. Yes, the encoding of a protein needs to have enough information to produce that protein (or a family of proteins), but that says nothing about the process that created the encoding.

For example, a strand of RNA can be spliced in many different ways to create many different proteins [0] and this process can go weird in many ways. New sequences will arise from this process, even though they weren't 'intended' to.

[0] https://en.wikipedia.org/wiki/RNA_splicing


The Game of Life doesn't produce complex behavior from simple rules.

The complex behavior comes from a large enough random starting state combined with a very low minimal required complexity to see something interesting. Also, even for a short interesting run of local behavior, the game never produces a stable behavior that grows in complexity beyond the initial information encoded in the random state. (i.e. if there is a bubble of cool stuff happening somewhere on the 2d plane, something usually interferes with it and destroys that pattern - like waves in the ocean, even when the energy curves combine to form a wave once in a while, they are limited and temporary).

So the Game of Life is actually an example that the system is limited to the information encoded in the initial starting state.

In the starting state there is either:

- a large enough random search space (i.e. a million random attempts with a 100x100 board might get something cool looking)

- intentionality (a person can design a starting state that can produce any possible stable system)


Yes, and useful proteins are basically the equivalent of "oscillators" or "spaceships" in the game of life. But most runs of the game of life are not oscillators or spaceships, just like most proteins are useless.

That's why the "initial condition" is so important, and why DNA is so important: without a good "start state", you get useless results—just like in the game of life.

What we are trying to find is not Conway's rules for the game of life, but this: how do we produce useful starting states (DNA) with a physical system? And more importantly, how do we create those starting states preferentially (i.e. non-randomly)?

We still need a model for how useful DNA (which corresponds to the "initial state" in the game of life) gets created. And we have no model for that right now, other than assuming unique random initial states are continually occurring and letting the law of large numbers eventually "find" winners.


For DNA, at least, it could have come from RNA (as per the link in my last post).

While I don't think the pre-biotic problem is solved at all, we have a lot more models of how it could have happened than you seem to credit - this is after all a huge research area.

For example, here is one [0], and here is a whole journal issue on the subject [1].

I found these by searching for 'evolution of DNA' and 'evolution of RNA'.

Now, these models all include some randomness, but in no way does anyone assume "unique random initial states are continually occurring... letting the law of large numbers eventually "find" winners".

The models show plausible environments where pre-biotic synthesis of RNA (or RNA pre-cursors) can occur, and stabilise.

This model you keep bringing up - randomly selecting a molecule from all possible combinations of atoms and saying 'enough time will get you one that works' - is not mentioned anywhere that I have seen. Perhaps some lay-people (of which I am definitely one!) believe it, but as you point out it is so obviously implausible it falls down on first inspection.

There are other models (lots of them!) and they don't rely on this pure randomness.

[0] https://phys.org/news/2018-01-chemical-evolution-dna-rna-ear...

[1] https://www.mdpi.com/journal/life/special_issues/evolution-R...


Minor side note, but most runs of the game of life actually will produce spaceships and/or oscillators, even starting from a random configuration. (Initialize a 100 x 100 box of cells randomly, and you're virtually guaranteed to get several gliders flying off of the resulting mess.)



