A theory of the learnable (acolyer.org)
87 points by mpweiher on Jan 31, 2018 | 9 comments



Very interesting topic indeed !

Two other researchers and I have published a paper[1] using Valiant's Probably Approximately Correct (PAC) learning to learn gene regulatory networks, which may be interesting if you want to dig deeper !

If anyone has questions on the topic, feel free to ask, I'll keep an eye on the thread.

[1]A. Carcano, F. Fages, and S. Soliman, “Probably Approximately Correct Learning of Regulatory Networks from Time-Series Data,” presented at the CMSB’17 - 15th International Conference on Computational Methods for Systems Biology, 2017, vol. Lecture Notes in Computer Science, pp. 74–90. https://hal.archives-ouvertes.fr/hal-01519826v2


Thanks for the paper link! I've been digging into PAC learning so I'm taking anything I can get.

On the paper, I was hoping you could help me understand a few things:

- It seems that the main finding is that any k-CNF form, like Thomas' Boolean Regulatory Network, can be expressed within PAC learning bounds. In section 4 of the abstract [1], you mentioned that "when the dimension increases... the PAC learning algorithm can leverage available prior knowledge...". Are you referring to the time dimension adding more clauses to the k-CNF?

- I'm having trouble reconciling the PAC term "h" with "model confidence" in section 5.2. Is this allowed because the PAC learning "delta" (probability) [2] parameter is dropped for the k-CNF adaptation?

- In this concrete case, is the learning portion just mapping the stochastic traces to outputs (i.e. lookups)? I'm missing some understanding of how such a mapping handles stochasticity.

You'll have to forgive me, as I'm still trying to understand the paper. It's incredibly interesting to me, so thanks for writing it!

[1] Abstract - http://mlsb.cc/2017/abstracts/MLSB_2017_paper_11.pdf

[2] PAC Learning - https://en.wikipedia.org/wiki/Probably_approximately_correct...


Hi ! I'll try to answer your questions as best I can :

Unfortunately the final version of the paper is not the one that is on HAL -- that is an older one -- and I must concede that, being new to this whole publishing world, I don't know exactly how you can get access to it. I'll try to sort it out and get back to you. In the meantime, you can refer to the HAL version.

  In section 4 of the abstract [1], you mentioned that "when the dimension increases... the PAC learning algorithm can leverage available prior knowledge...". Are you referring to the time dimension adding more clauses to the k-CNF?
I think this sentence is actually referring to what is presented in section 5.3 of the final (and HAL) version of the paper.

  I'm having trouble reconciling the PAC term "h" with "model confidence" in section 5.2. Is this allowed because the PAC learning "delta" (probability) [2] parameter is dropped for the k-CNF adaptation?
I think there are two things here.

First, the "delta" you are referring to is indeed taken to be equal to the "epsilon", and both are what we call "h" in section 2.1.

Second, the idea of the discussion in section 5.2 is the following: let's say you fix a number A of initial states and simulate B steps for each of these states. You will have a given number of (de)activation samples for each (de)activation function. Then, across all 2n (de)activation functions, take the minimum number of samples you got and call it Lmin. Then, using results from section 2.2 with S=(2n)^(k+1), you can find h so that Lmin=2h(S+log(h)). This h will be your "model confidence".
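To make that concrete, here is a rough numerical sketch of inverting Lmin = 2h(S + log(h)) for h by bisection (the values of n, k and Lmin below are made up for illustration, not taken from the paper):

```python
import math

def samples_needed(h, S):
    # Valiant's bound: L(h, S) = 2h(S + ln h) samples suffice for
    # error <= 1/h with probability >= 1 - 1/h.
    return 2 * h * (S + math.log(h))

def confidence_from_samples(L_min, S):
    # Invert L(h, S) = L_min for h by bisection; L is increasing in h.
    lo, hi = 1.0, 1e12
    for _ in range(200):
        mid = (lo + hi) / 2
        if samples_needed(mid, S) < L_min:
            lo = mid
        else:
            hi = mid
    return lo

# Toy numbers (illustrative only): n = 5 genes, k = 2,
# so S = (2n)^(k+1) as in section 2.2.
n, k = 5, 2
S = (2 * n) ** (k + 1)
h = confidence_from_samples(L_min=100000, S=S)
```

Since L is monotone in h, the smallest Lmin across all (de)activation functions directly pins down the confidence you can claim for the whole model.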

However, if we didn't have a single "h" but a separate "delta" and "epsilon" (Wikipedia's notation), I guess we would not have a single value but only a relation between the two (i.e. fixing one would fix the other).
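To illustrate that relation, a small sketch using the standard finite-class PAC bound m >= (1/eps)(ln|H| + ln(1/delta)) -- the value of ln|H| below is an arbitrary placeholder, not our hypothesis class:

```python
import math

def sample_bound(eps, delta, log_H):
    # m >= (1/eps) * (ln|H| + ln(1/delta)) samples suffice for a
    # consistent learner over a finite hypothesis class H.
    return (log_H + math.log(1 / delta)) / eps

def delta_given_m(eps, m, log_H):
    # With m and eps fixed, the achievable delta is determined
    # (and symmetrically, fixing delta determines eps).
    return math.exp(log_H - eps * m)

log_H = 1000.0  # ln|H| for some finite class (illustrative)
m = sample_bound(0.05, 0.01, log_H)
d = delta_given_m(0.05, m, log_H)  # recovers delta = 0.01
```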

I'm afraid I don't get your last point.

I'm new to HN so I hope this answer will be somewhat correctly formatted.


Thanks, these are great responses! Also, I apologize for not fixing my own formatting sooner; it made the questions almost impossible to read. You answered my last question about stochastic traces by explaining the de/activation function relationships with Lmin.

Thanks again!


Interesting work! We definitely need far better inference models in computational biology than what we currently have. I agree with your paper that modeling is unfortunately more of an "art" than a science nowadays... and with huge societal consequences.

Having said that, on a cursory read I think you may be misapplying Valiant's algorithm...

In particular, the original (union bound) PAC guarantee relies crucially on IID samples, so you cannot straightforwardly apply it to time series data and expect the guarantee to hold unchanged. Instead, you should use block bootstrap methods to sample consecutive segments of your time series of a certain size --in which case a (possibly weaker) PAC-like guarantee might hold, provided the dependence across time decays sufficiently fast [1].
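As a rough sketch of what I mean by block bootstrap (the moving-block variant; block length and data below are arbitrary):

```python
import random

def block_bootstrap(series, block_len, rng=random.Random(0)):
    # Resample overlapping blocks of consecutive observations and
    # concatenate until we match the original length, preserving
    # short-range temporal dependence inside each block.
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]

resampled = block_bootstrap(list(range(100)), block_len=10)
```

The blocks are IID even though the points inside them are not, which is what lets a (weaker) concentration argument go through when dependence decays fast enough.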

I'm also a bit concerned about the semantics of your approach, since I thought gene regulatory inference was/is notoriously intractable, and Valiant's model is very stringent and conservative... So IMO somewhere along the line you are getting a massive free lunch simply by reducing to k-CNF!

Not saying it's wrong per se of course; but I couldn't easily tell exactly where the 'trick' is... So if I were you I would try to communicate more clearly (to dumb non-experts like me) how exactly this particular reduction captures something highly non-trivial in gene regulatory networks to achieve such a (seemingly) drastic speedup...

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551412/



OK, I think I narrowed the 'trick' down. It's actually an interesting existential question.

Correct me if I'm wrong but I think the key step is the use of "positive Boolean semantics"; which, as your Ref. 9 proves, are substantially weaker --and hence, unsurprisingly, far more tractable-- than more conventional "stochastic" or "differential" semantics...

But then Ref. 9 [1] goes on to make, I think, a frankly astonishing, Church-Turing like existential claim in Biology (Sec 3.2, infra):

  [...] if a behavior is not possible in the boolean semantics, it is surely not possible in the stochastic semantics whatever the influence forces are.

If that is the case, that would IMO have huge consequences! It would mean, then, that some of the underlying machinery of Biology may turn out to be far simpler than we think: no more pesky self-loops or bistable, mutually inhibitory modules to deal with! Tractable network inference, at last! It would potentially revolutionize computational biology, if true.

But, is it true? I think I see the intuition, but I don't think the case is as clear-cut, with that single "surely" carrying way too much of the rhetorical work... Indeed, the claim hinges on what I think is a rather interesting, non-trivial existential question: informally, if 'something' (of a given type) cannot be denoted in a certain weaker type, does that mean that 'something' cannot exist?

Anyway, not your paper per se; but I think it's an interesting debate nonetheless.

[1] https://hal.archives-ouvertes.fr/hal-01378470


On a completely unrelated note, I thought your spacing of punctuation was odd. I didn't know that it was a thing in other languages: https://english.stackexchange.com/questions/4645/is-it-ever-...


Interesting blog entry but I was a bit disappointed not to hear about Solomonoff induction, as it is often taken to determine quite rigidly what is in principle learnable by induction. What's the connection of the approach mentioned in the blog post to Solomonoff induction?


Isn't Solomonoff induction optimal if the target is learnable?

The connection to Kolmogorov complexity is interesting. It implies a perfect learner must either have infinite Kolmogorov complexity or keep increasing its own, neither of which an algorithm can do.



