I can't download the posts of forums like comp.lang.lisp, and I can't disable JavaScript in Firefox with one click, so don't expect any applause from me. The browser I would like is nothing like this: I want a JavaScript-free browsing experience in which content matters and presentation much less, sorry CSS fans. Today's web is about selling things and propagating noise, and today's browsers are a perfect tool to propagate more of it; linkbait is the norm. The pitiful technical enhancements of noise propagators are not what I am interested in. To end on a positive note, anything "ungoogled" sounds like a good thing. I am expecting the next interesting post to be about a very small company building the new web with a small browser whose code you can trust and check.
Just reading page 15, "arg max = maximal ...": I think "global maximum" or "local maximum" would be clearer than "maximal".
I would like to read all the interesting fruits of RL in just one hour. Can someone suggest a short book for someone with advanced maths skills?
Thanks a lot to the authors; the book seems really interesting.
Edit: On page 25, in the extended example (tic-tac-toe), the rule for updating the value of each state, v(s) = v(s) + alpha * (v(s') - v(s)), doesn't take into account that if a winning strategy is reachable from s' under the policy, then s is also part of a winning strategy. So if v(s') = 1 (win), then v(s) should be 1 (I can win).
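For concreteness, here is that update rule (as I read it) in a few lines of Python; the state names and initial values are made up for illustration:

```python
# The tic-tac-toe value update: V(s) <- V(s) + alpha * (V(s') - V(s)).
# The states and starting values below are invented, not from the book.

def td_update(values, s, s_next, alpha=0.1):
    """Move V(s) a fraction alpha toward V(s')."""
    values[s] += alpha * (values[s_next] - values[s])

values = {"s": 0.5, "s_win": 1.0}  # hypothetical states
td_update(values, "s", "s_win")
# V("s") moves from 0.5 to 0.55: a step toward 1.0, not a jump all the way.
```

Note that with alpha < 1 the value only moves part of the way toward v(s') on each visit, which is exactly the behavior being questioned here.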
In my very humble opinion, the author should devote a section to this very important point.
The book is hundreds of pages long; if the author digressed to talk about everything in Chapter 1, it would be a mess.
The scenario you describe is alpha = 1, and it would do poorly. Try thinking about games where the opponent doesn't play optimally. Try thinking of stochastic environments.
Let's pretend alpha = 1 on a win and alpha = 0.1 on a loss.
Imagine a scenario where you play a game, the opponent plays poorly, and you win; you then try to repeat the same thing, but this time the opponent has learned from their mistakes and beats you. You'll still keep playing that losing move significantly more often, because it worked that one time.
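That failure mode is easy to see numerically. A minimal sketch, using the asymmetric alphas pretended above (the outcome sequence is invented):

```python
# V(s) <- V(s) + alpha * (target - V(s)), with alpha = 1.0 on a win
# and alpha = 0.1 on a loss, as in the hypothetical above.

def td_update(v, target, alpha):
    return v + alpha * (target - v)

outcomes = [1.0] + [0.0] * 9  # one lucky win, then nine losses (made-up data)

v = 0.5
for target in outcomes:
    alpha = 1.0 if target == 1.0 else 0.1
    v = td_update(v, target, alpha)

# v jumps to 1.0 on the win, then decays only slowly: 1.0 * 0.9**9 ~ 0.39,
# so the move still looks far better than its observed ~10% win rate.
```

With a single symmetric alpha well below 1, the estimate would instead track a running average of the outcomes.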
I don't know why everyone wants to second-guess the first chapter of the standard textbook in this space with what seems like no experience even thinking about this topic...
Today I read about reinforcement learning on Wikipedia and in The Wealth of Humans: Work and its Absence in the Twenty-first Century. Perhaps both ideas are useful for being successful.
I don't have a German dictionary; my point is that a German speaker can discover interesting relations between English and German using grep, for example the one you suggest about ('d','v') becoming ('th','f'). The use of grep is not essential, but it is a handy tool.
Also, not sure why a genetic algorithm wouldn't be what you are looking for.
We can't really make "motivations" for the AI stuff we are doing, so making them be "motivated by" reproduction doesn't work.
So, if you just want the ones that fit some task better, that seems like you would just make the more successful ones reproduce more. A genetic algorithm.
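For anyone unfamiliar, a bare-bones genetic algorithm looks something like this; the bit-string genome and toy fitness function are illustrative choices, not anything from this thread:

```python
import random

def fitness(genome):
    return sum(genome)  # toy objective: maximize the number of 1 bits

def evolve(pop_size=20, genome_len=12, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: more successful genomes reproduce more often.
        weights = [fitness(g) + 1 for g in pop]  # +1 keeps zero-fitness alive
        new_pop = []
        for _ in range(pop_size):
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:               # occasional point mutation
                child[rng.randrange(genome_len)] ^= 1
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

best = evolve()
```

The key line is the fitness-weighted `rng.choices` call: that is the whole "more successful ones reproduce more" idea.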
I suppose maybe if you had them evaluate each other and have the reproduction be based on that, and have that evolve as well?
But I don't expect that to be a particularly effective training method.
Yes, the training method you suggest is not very effective. What I suggest is running a lot of these not-very-effective training methods and looking for mutations. By not being greedy, perhaps AI can develop a new system for evolving that we can't envision at this moment.
Just to be a little more concrete: what about a "genetic algorithm" approach for a population in which the individuals are themselves genetic algorithms, and in which the optimization function puts great weight on self-sustainability and diversity of structure?
That reminds me of the idea of training a neural net to express a function that, if used where the sigmoid function would be used in a neural net, would result in the neural networks using it being trained well and quickly.
I assumed training a neural net like that would be too expensive (each evaluation of it would require training a neural net), for not much benefit (I don't expect the result to be better than the sigmoid function).
I'm not sure quite what you mean by self-sustainability. What does it mean for a genetic-like algorithm to be self-sustainable?
This idea also sounds like it would be very computationally expensive, but it's possible I misunderstand.
If it could be modified to only use one layer of these algorithms, maybe that would be less expensive?
I am imagining neural nets that each map both image to label and label to image. They would engage in a sort of challenge-response exchange: one produces a label, requests an image, and evaluates the image received; it also provides an image, requests a label (or a confidence level for each label), and evaluates the response. When both have evaluated the responses, each outputs whether or not to reproduce, and if both agree, the genetic part happens. Sort of like a mating dance, I guess. (Of course, which challenges and responses they give, and whether they accept, would also be determined by things that are recombined like the other stuff.)
But having that by itself wouldn't be enough, because it needs some connection to real things in order for the "image recognition" to represent anything real. There has to be some sort of real fitness for the reproductive fitness evaluation for the bots to evaluate.
The mating dance and the need for some sort of real fitness are interesting. By self-sustainability I was thinking of some form of interaction with the property of not becoming blocked or extinguished easily: some ability to overcome problems and to filter out agents that don't satisfy certain interesting properties.
Perhaps humans are not a good model for teaching a computer how to be intelligent; there is too much chaos in our society, and our dreams are many times irrational. We want machines to become intelligent the human way, and this may be a bad approach. Perhaps we should focus on a two-step approach: first, what is needed for a system to evolve; and second, how that system can be used to interact with human intelligence.
In maths you learn to solve problems by expanding the abstractions, by adding new elements. There is a crucial difference between problems that are easy to solve by repetition and problems that require some insight.
Exactly; otherwise all those IOI and IMO contestants would be pure geniuses. Probably they are, but they practiced a lot to be good at a single task: problem solving in programming or mathematics competitions. That's a good foundation built on insane amounts of repetition and memorization, but still, not many turn out to be a Terence Tao or a Peter Shor.
They can solve a very hard problem (one that might take me days) in 5 minutes. If that is not some supreme pattern matching, then I don't know what it is.
A bunch of the proof strategies used in math competitions are the same.
You have to have a giant knowledge of algorithms: fast Fourier transform, BFS, DFS, iterative deepening, Dijkstra, A*, sweep-line algorithms; you practically have to master dynamic programming (there's DP on trees, hidden-Markov-model-like DPs, etc.), flood fill, topological sort, bipartite graph checks, Kruskal, Prim, Edmonds-Karp, a bunch of combinatorics, and geometry algorithms.
And data structures like segment trees, Fenwick trees, suffix trees, etc.
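As one concrete example of the kind of structure you're expected to know cold, a Fenwick (binary indexed) tree gives prefix sums and point updates in O(log n). This is a generic textbook sketch, with made-up sample data:

```python
# Fenwick (binary indexed) tree: point updates and prefix sums in O(log n).

class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)  # 1-indexed internally

    def add(self, i, delta):
        """Add delta at position i (1-indexed)."""
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i  # jump to the next node covering position i

    def prefix_sum(self, i):
        """Return the sum of positions 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return s

f = Fenwick(8)
for pos, val in enumerate([3, 1, 4, 1, 5, 9, 2, 6], start=1):
    f.add(pos, val)
f.prefix_sum(4)  # 3 + 1 + 4 + 1 = 9
```

In a contest you don't derive the `i & -i` trick on the spot; you've memorized it through repetition, which is exactly the point being made here.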
You see where I'm going: there's a lot of memorization, and a lot of repetition (which improves memorization).
The same goes for the IMO: you have to do a lot of proofs, and memorize and rehearse a bunch of proof strategies across a huge number of mathematical areas.
I would add another multiplicative factor: the probability that you succeed in what you are trying to achieve. If that probability is near zero, your effort will be in vain.
I ask as someone who is vaguely familiar with Taleb's ideas but didn't read the book:
That is a really fascinating point, especially considering this is literally a case where you are "betting your (future) life" on the low probability outcome event.
I mean, what is your hedge in that case? Does Taleb talk about that too?
Taleb's investment advice: put almost all of your savings in super stable investments, like treasuries. Use the rest to bet on unicorn startups or highly leveraged options (as long as those options have a limited downside and an unlimited upside!). I'm not quite sure how to apply this to one's career.
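Back-of-the-envelope, that "barbell" allocation looks something like this; every number here is invented for illustration:

```python
# Barbell sketch: most savings in something very safe, a small slice in bets
# with a capped downside and a large upside. All figures are made up.

def barbell_outcome(savings, safe_frac, safe_return, bet_multiplier):
    safe = savings * safe_frac * (1 + safe_return)
    bet = savings * (1 - safe_frac) * bet_multiplier  # 0 if the bet busts
    return safe + bet

savings = 100_000
# Worst case: the risky slice goes to zero, but 90% sat safely in treasuries.
floor = barbell_outcome(savings, 0.9, 0.03, 0.0)    # ~92,700
# Lucky case: the risky slice returns 50x.
upside = barbell_outcome(savings, 0.9, 0.03, 50.0)  # ~592,700
```

The point of the shape: the floor is known and tolerable, while the upside is open-ended, which is what distinguishes it from putting everything in medium-risk assets.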
I guess a hedge in that case would be to try to arrange things so they come out OK if you miss the goal. Like, if you want to be president, aim for the fail case being a successful lesser politician. Not sure what Taleb says about it.