Masters of Doom contains an anecdote on this aspect of John Carmack's life. On page 252, it mentions how he had sequestered himself in a "small, anonymous hotel room somewhere in Florida" as he researched Trinity. He had a Dolch portable computer with a Pentium II and full-length PCI slots, and subsisted only on pizza and Diet Coke.
That bit for some reason made a big impression on me when I read it on the bus ride to school. To be able to let yourself go and research and code whatever you truly believe in, or are curious or excited about (with room service, and without having to clean up after yourself, haha), seemed incredible.
I wonder if John still sticks to Florida, or if he goes to different places each year; in a city, or just a hotel off a highway or near an airport. My favorites have been Hyatts: the Hyatt Place Amsterdam Airport, the Hyatt Regency Charles de Gaulle, and Lake Tahoe. Something about sterile rooms, room service, and a hotel near (but not too close to) beautiful and historical landmarks just centers you and allows you to think.
Agreed. I guess the trick to selecting a location is to pick one where the inside of the hotel/cabin room is much more attractive than the outside.
I had a colleague who told me that his most productive period was when he was stuck in the hospital for a couple of weeks but was able to do some coding.
There was a post on HN years ago advocating coding on a cruise ship -- it's like a nice hotel, but the internet is so crappy that you'll only use it when you really need it (e.g. finding docs, syncing git), which is great for productivity!
Post may have been by Tynan (http://tynan.com); he's big on working while on transatlantic cruises. Following his advice, I have done the same thing several times.
Here are some tips:
- Find the best cruises most easily on cruisesheet.com (disclosure: it's Tynan's project)
- Royal Caribbean has the best internet at sea through O3b. In the Caribbean or Mediterranean it's about 70ms latency; in the middle of the Atlantic it's about 220ms. So Skype may work, but the delays are annoying. There's plenty of bandwidth now, though you may only get "one 9" availability on average, so scheduled conference calls are always a gamble.
- Repositioning cruises (many ships move from the Mediterranean to the Caribbean and back seasonally) in April/May and October/November are the cheapest cruises you'll ever find. 5-8 days at sea means plenty of time to get some work done, as well as goof off a bit during the evenings. For the second week (Europe or Caribbean), I find it easy enough to get an hour or two of email catch-up in on port days after getting back from a shore excursion, but it's easier to just say port day = vacation, sea day = work and great food.
A 2-week repositioning cruise may cost as little as $600 per person including taxes and fees. Add another $200pp for gratuities, a few hundred dollars for shore excursions, and a few hundred more for airfare to get back home. Depending on where you go, you may get the benefits of a 2-week vacation for the price of one on land.
Similarly, if you want to do a one-week offsite with your startup (particularly if you're all remote most of the time anyway), this is probably cheaper than flying your crew to any big city.
The seminars-at-sea model was also well proven out by geekcruises.com, now renamed insightcruises.com
My wife and I have been renting an off-season beach house near Boston for 8 months out of the year for less than the price of a basement studio in Cambridge, then traveling during the summer. This means our repositioning cruises are further discounted by the fact that we aren't paying rent or mortgage on an empty house. It's hard to beat if you both have good schedule flexibility.
Good writeup -- and one of the main reminders for me is this:
People throw around words like "revolution" for the current deep-learning push. But it's worth remembering that the fundamental concepts of neural networks have been around for decades. The current explosion is due to breakthroughs in scalability and implementation through GPUs and the like, not any sort of fundamental algorithmic paradigm shift.
This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
In computer vision at least, deep learning has been a revolution. More than half of what I knew in the field became obsolete almost overnight (it took about a year or two, I would say), and a lot of tasks received an immediate boost in terms of performance.
Yes, neural networks have been here for a while, gradually improving, but they were simply non-existent in many fields where they are now the favored solution.
There WAS a big fundamental paradigm shift in the algorithms. Many people argue that they should not be called "neural networks" but rather "differentiable function networks". DL is not your dad's neural network, even if it looks superficially similar.
The shift is that now, if you can express your problem in terms of minimization of a continuous function, there is a whole new zoo of generic algorithms that are likely to perform well and that may benefit from throwing more CPU resources at them.
Sure, it all uses transistors in the end, but revolutions do not necessarily mean a shift in hardware technology. And, by the way, if we one day switch from transistors to things like opto-thingies and it brings a measly 10x boost in performance, it still won't be on par with the DL revolution we are witnessing.
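To make the "minimization of a continuous function" point concrete, here is a toy sketch of my own (made-up function and numbers): express the problem as a differentiable loss and follow its gradient downhill. Deep learning runs the same loop with a vastly bigger function and smarter update rules.

    #include <cstdio>

    // Toy sketch: minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2 with plain gradient descent.
    // Anything you can phrase as "minimize a differentiable function" fits this loop.
    int main() {
        double x = 0.0, y = 0.0;
        const double lr = 0.1;                  // learning rate
        for (int step = 0; step < 100; ++step) {
            double dfdx = 2.0 * (x - 3.0);      // partial derivative wrt x
            double dfdy = 4.0 * (y + 1.0);      // partial derivative wrt y
            x -= lr * dfdx;
            y -= lr * dfdy;
        }
        std::printf("x = %.4f, y = %.4f\n", x, y);  // ends up near (3, -1)
        return 0;
    }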
It is indeed pretty amazing in CV. In a field nearly as old as CS itself, I'd say at least 75% of the existing techniques were made obsolete in a span of only a few years.
You could have started your PhD in 2006, made professor in 2012, and nearly everything you had learned would have been _completely_ different.
I got my diploma in 2003. Actually, I got lucky: I found a client who needed a computer vision specialist to add fancy features to their DL framework, so I could train myself in this new direction.
Yea .. if 10x compute made all the difference, we would see elastic cloud compute doing things 10x as well ;) As always, these new applications work in tandem with hardware and software advances; it seems odd to point to one or the other.
Can you elaborate on what in CV became obsolete overnight? I took a survey course in CV but I haven't kept up. Do you still do facial detection, object recognition, camera calibration, and image stitching the same way as in 2012? Or has it changed because the processing has gotten faster and the results are near real-time?
These were at the root of many detectors. They still are for some applications, but for most of them a few layers of CNN manage to learn far better and very counter-intuitive detectors.
Facial detection/recognition was based on features. This is not my specialty, so I don't know if DL got better there too, as their features were pretty advanced, but if it isn't there yet I am sure it is just a matter of time.
I can see image stitching benefiting from a deep-learned filter pass too.
Camera calibration is pretty much a solved problem by now, I don't think DL adds a lot to it.
Like I said, not everything became obsolete, but around 50% of the field was taken over by DL algorithms where, before that, hand-crafted algorithms usually had vastly superior performance.
Just to confirm: for facial recognition/detection, modern DNN algorithms outperform the 'classic' methods that took decades of continuous improvement ...
I don't think the revolution was about hardware improvement. I did some neural network research (and published a few papers) in the 1990s and switched to other research disciplines afterwards. So, I'm not really familiar with the recent developments. But to my knowledge, there was indeed a revolution in neural network research. It was about how to train a DEEP neural network.
Traditionally, neural networks were trained by backpropagation, but many implementations had only one or two hidden layers, because training a neural network with many layers (it wasn't called a "deep" NN back then) was not only hard but often led to poorer results. Hochreiter identified the reason in his 1991 thesis: the vanishing gradient problem. Now the culprit was identified, but the solution had yet to be found.
My impression is that there weren't any breakthroughs until several years later. Since I'd left that field, I don't know what exactly these breakthroughs were. Apparently, the invention of LSTM networks, CNNs, and the replacement of sigmoids by ReLUs were some important contributions. But anyway, the revolution was more about algorithmic improvement than the use of GPUs.
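For intuition, a deliberately simplified sketch of the vanishing gradient effect (my own toy example, ignoring weight terms entirely): the sigmoid derivative never exceeds 0.25, so every additional sigmoid layer multiplies the backpropagated gradient by another small factor.

    #include <cstdio>
    #include <cmath>

    // The sigmoid derivative s*(1-s) never exceeds 0.25, so in a deep stack of
    // sigmoid layers the backpropagated gradient picks up one such factor per
    // layer and shrinks roughly geometrically -- the "vanishing gradient" problem.
    int main() {
        double grad = 1.0;               // gradient at the output layer
        const double pre_act = 0.5;      // some pre-activation value
        double s = 1.0 / (1.0 + std::exp(-pre_act));
        double dsigmoid = s * (1.0 - s); // <= 0.25
        for (int layer = 1; layer <= 20; ++layer) {
            grad *= dsigmoid;            // ignoring the weights for simplicity
            if (layer % 5 == 0)
                std::printf("after %2d layers: %.3e\n", layer, grad);
        }
        return 0;
    }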
The things that have most improved training neural networks since you left were: 1. Smart (Xavier/He) initialization 2. ReLU activations 3. Batch normalization 4. Residual connections
The GPUs and dataset size were definitely very important though.
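For concreteness, the first two items on that list are only a few lines each. A rough sketch, with made-up layer sizes, of what ReLU and He initialization boil down to:

    #include <cstdio>
    #include <cmath>
    #include <random>
    #include <vector>

    // ReLU is just max(0, x); He initialization draws weights from
    // N(0, sqrt(2 / fan_in)) so activation variance stays roughly constant
    // as the network gets deeper.
    double relu(double x) { return x > 0.0 ? x : 0.0; }

    std::vector<double> he_init(int fan_in, int fan_out, std::mt19937& rng) {
        std::normal_distribution<double> dist(0.0, std::sqrt(2.0 / fan_in));
        std::vector<double> w(fan_in * fan_out);
        for (double& v : w) v = dist(rng);
        return w;
    }

    int main() {
        std::mt19937 rng(42);
        auto w = he_init(784, 100, rng);  // e.g. MNIST input to a 100-unit layer
        std::printf("w[0] = %f, relu(-1) = %f, relu(2) = %f\n",
                    w[0], relu(-1.0), relu(2.0));
        return 0;
    }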
> not any sort of fundamental algorithmic paradigm shift
I don't think I can agree with this. There have been a lot of improvements to the algorithms, and the pace has sped up thanks to GPUs. You cannot just take a neural network from 15 years ago, make it bigger, and expect it to work on modern GPUs; it is not going to work at all. Moreover, new techniques have appeared to solve other types of problems.
I am talking about things like batch normalization, ReLUs, LSTMs, or GANs. Yes, neural networks still use gradient descent, but there are people working on other algorithms now, and they seem to work, just less efficiently.
> This is similar to how the integrated circuit enabled the personal computing "revolution" but down at the transistor level, it's still using the same principles of digital logic since the 1940s.
This claim has exactly the same problem as before. You could just as well say evolution has done nothing, because the same principles at work in people were already there in the dinosaurs and even in the first cells. We are just a lot more cells than before.
Assuming you're asking specifically about the history of DL/ML, I can recommend this (4-part) blog series: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-.... It includes references for the relevant publications that identify problems (e.g., exploding gradients) and their solutions.
I have read a lot of papers, and usually you end up at the original one if you check the references. But you have to check papers in specific areas. I am not aware of any good document that has everything in one place.
There was a revolution when they started using backpropagation to optimize the gradient search.
It's also why I don't agree with calling them "neural" anything, because there is no proof brains learn using backpropagation.
I feel like the current direction has thrown away all the neurobiology and focuses too much on the mathematics.
The GP implies that this was not tried before: "There was a revolution when they started using backpropagation to optimize the gradient search".
"Back-propagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. Hochreiter's diploma thesis of 1991[1][2] formally identified the reason for this failure in the "vanishing gradient problem", which not only affects many-layered feedforward networks,[3] but also recurrent networks."
The best way to advance AI is probably to make the hardware faster, especially now that Moore's Law is in danger of going away. The people doing AI research generally seem to be fumbling around in the dark, but you can at least be certain that better hardware would make things easier.
I am not in the AI space at all, but I am under the impression that python is the most used language for it. If workload is becoming an issue, wouldn't the low hanging fruit be a more performance driven software language?
With how fast things are evolving, developing something like an ASIC ($$$) for this might be outdated before it even hits release, no?
The heavy lifting done in neural networks is all offloaded to C++ code and the GPU. Very little of the computation time is spent in python.
ASICs will definitely be very helpful, and there's currently a bit of a rush to develop them. Google's TPUs might be one of the first efforts, but several other companies and startups are looking to have offerings too.
> Google's TPUs might be one of the first efforts, but several other companies and startups are looking to have offerings too.
NVidia's Volta series includes tensor cores as well [1]. So far, I think they've only released the datacenter version, which is available on EC2 p3 instances [2].
It's actually super refreshing learning that even programming masters like Carmack are just now learning NNs and watching Youtube Stanford classes like the rest of us. These are actual people, not gods :) Everybody poops!
Yeah .. but his first impulse was to write backprop from scratch. I saw the lectures, been dabbling with NN for years, and I never thought to do it. I always thought the Stanford people made you do it on assn 1 to pay your dues or something. I continue to think of Carmack as the Master hacker.
> his first impulse was to write backprop from scratch
Backprop is a very simple algorithm, nothing to fear there. The hard part is calculating the derivatives if you want to be flexible in building your model. But for feedforward networks with sigmoid activations, the equations to update the weights are a joke.
Backpropagation is based on basic optimization (i.e., finding the maximum or minimum of a function from its derivatives, which is taught before university in my country) and on the chain rule for calculating the derivatives of composite functions (first year of university).
But I meant that if you look at the equations and the steps, even without completely understanding the insights behind them, it is a joke of an algorithm. It just does some multiplications, applies the new gradients, moves to the previous layer, and repeats.
It's been a while, but I remember backprop starting at the end of the neural net and working backwards. Each weight that contributed to a wrong answer had its value weakened or even reversed by some small factor, and each weight that contributed to a correct answer had its value strengthened by some small factor.
Exactly, and that small factor is the error of the next layer, calculated with the derivatives of the functions (just as you would find the minimum of a function using its derivatives). The tricky part is calculating the derivatives, but with automatic differentiation we can do a lot of cool stuff.
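A toy sketch of that update for a single sigmoid neuron with a squared-error loss (made-up numbers, purely an illustration of the equations, not anyone's production code):

    #include <cmath>
    #include <cstdio>

    // For a sigmoid output neuron with squared-error loss the "small factor" is
    // delta = (output - target) * output * (1 - output), and each weight moves
    // by -learning_rate * delta * its input. Deeper layers reuse the same idea,
    // with delta propagated backwards through the weights.
    int main() {
        const double lr = 0.5;
        double in[2]  = {1.0, 0.5};            // inputs to the neuron
        double w[2]   = {0.3, -0.2};           // weights
        double target = 1.0;

        double z   = w[0] * in[0] + w[1] * in[1];
        double out = 1.0 / (1.0 + std::exp(-z));             // sigmoid activation

        double delta = (out - target) * out * (1.0 - out);   // error * sigmoid'
        for (int i = 0; i < 2; ++i)
            w[i] -= lr * delta * in[i];                       // gradient step

        std::printf("out = %f, new weights = %f, %f\n", out, w[0], w[1]);
        return 0;
    }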
The differentiation is probably why I wouldn't have bothered to hack it myself. Curious how you/others would tackle it? What do you mean by auto differentiation?
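One common flavor is forward-mode automatic differentiation with dual numbers: every value carries its derivative along with it, so the chain rule is applied mechanically instead of deriving update equations by hand. A toy sketch of the idea (deep learning frameworks actually use the reverse-mode flavor, but the principle is similar):

    #include <cstdio>
    #include <cmath>

    // Each Dual carries a value and its derivative with respect to one chosen
    // input, so arithmetic on Duals applies the chain rule automatically.
    struct Dual {
        double v;  // value
        double d;  // derivative
    };

    Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
    Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
    Dual sigmoid(Dual a) {
        double s = 1.0 / (1.0 + std::exp(-a.v));
        return {s, s * (1.0 - s) * a.d};
    }

    int main() {
        Dual x{2.0, 1.0};   // seed dx/dx = 1
        Dual w{0.5, 0.0};   // treat w as a constant here
        Dual y = sigmoid(w * x + Dual{0.1, 0.0});
        std::printf("y = %f, dy/dx = %f\n", y.v, y.d);
        return 0;
    }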
I should do that sometime. I'm afraid I've lost my ability to focus, something like that might help. I'd need a solid mission though, instead of a "look into tech X" without a goal.
Pretty awesome! If I ever had to say the one thing that differentiates successful people from unsuccessful people it wouldn't be intelligence, or even perseverance, or passion. It'd be focus. With focus, you can be amazingly successful in so many types of occupations.
(That being said, passion / perseverance / intelligence can often lead to focus)
Are these even legitimate? If a person can paint a masterpiece but can barely balance their checkbook, IQ tests won't show their mastery in painting, only their mediocrity in math.
The argument you're making has often been made as "there are different kinds of genius". The research seems to have debunked this. Rather than seeing multiple ways to be a genius, we see some general intelligence ability that helps you no matter what you're trying to do.
In the case of our painter, if we're talking about Michelangelo or something, he'd likely come out with an excellent IQ.
As an aside, IQ tests don't make you do math problems and the like; it's all pattern-matching questions where you learn all of the necessary context in the test itself. Which makes sense, right? It's trying to measure your ability to learn and generalize, not what you happen to know before you take the test.
However you're defining focus, I'm sure it correlates pretty well with the personality trait of conscientiousness. (IQ is the best single-variable predictor of nice things like life success, for various definitions of success; IQ plus conscientiousness predicts even better...)
They're not called that, but they do exist. I recently went through a full day's testing for issues relating to autism and a number of the tests related to ability to hold focus rather than skill, per se.
One I recall vividly was basically a page full of weird symbols and a "key" at the top mapping symbols to numbers. I had to go through the page converting every symbol, and various elements of my performance (variance of speed, attention, etc.) were measured during the task. This does not really correlate with what I'd consider focus or "flow" during typical programming tasks, but it did try my patience! :-)
Personally, I found it took me much longer. I watched Karpathy's lectures, took notes and stewed upon the ideas, and read a bunch of other materials such as blog posts and research papers to try and truly comprehend some of the concepts mentioned in the course.
I found myself knowing how to create CNNs, but the why of the entire process still feels under-developed. But I'll admit it could be because my understanding of calculus and linear algebra was far more under-developed back when I was studying the course than it is now.
What did you read/do to develop your calc and linear algebra skills? I feel like I know calc and linear algebra fairly well on paper, but I'm unsure how to translate that to the computer.
Are these courses being taught by graduate students? The three main instructors seem like they are students themselves, with an army of undergrad and grad TAs.
Pity that you pay so much money to attend Stanford only to be taught by your peers. Not knocking Stanford, as this is how it is done pretty much everywhere at the undergrad level now.
In the first few lectures of the course you get a pretty good history of deep learning, and you'll see it didn't really take off until around 2012. The reason it took off is mostly that people got better at the black magic of training a deep network.
So these grad students are exactly the people you want to learn from because they have done the dirty work of fiddling with parameters to know what tricks work and what doesn't. It's probably preferable to a more theory-heavy course because very few people (not even the more experienced professors) understand why those tricks work.
Note: I took an older version of the course which was started by Andrej Karpathy who was a grad student at the time but is now the Director of Artificial Intelligence at Tesla.
The idea of programming something from "scratch" (whatever your definition, and programming language) is the best way to really understand something new. Reading about it, hearing someone speak about it is one thing ... but opening up a blank .c file and adopting a "ok, let's get on with this" approach is something much different.
It takes time though and one has to combat the "how come you're reinventing the wheel" comments from co-workers, spouses, bosses, etc., which can be a challenge.
This is the biggest challenge I face. As I attempt to teach myself computer science and programming, the toughest aspect is to work through SICP. I am always tempted to take the path of least resistance and follow a tutorial to build a tangible program that will impress. Must remember that this is a journey and to build brick by brick, even if that means gathering the ingredients for the clay, then mixing, then laying the bricks!
I'm not really sure it's the most effective learning paradigm. It must be easier to just copy-paste trivial programs and then try to modify them, gradually increasing the complexity, until you get to a certain level of understanding.
"I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made."
That is the bane of doing probabilistic code. Errors show up not as clear cut wrong values or crashes but as subtle biases. You are always wondering, even when it is kinda working, is it REALLY working or did I miss a crucial variable initialization somewhere?
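The cross-check being described is the classic numerical gradient check: compare the analytic gradient against a central finite difference and be suspicious of any large mismatch. A minimal sketch on a made-up one-weight loss:

    #include <cstdio>
    #include <cmath>

    // Compare the analytic gradient against a central finite difference.
    // If they disagree beyond roughly 1e-5, the backprop code (not the math)
    // is usually the culprit.
    double f(double w)      { double e = w * 2.0 - 1.0; return e * e; }  // toy loss
    double grad_f(double w) { return 2.0 * (w * 2.0 - 1.0) * 2.0; }      // analytic dL/dw

    int main() {
        const double w = 0.3, eps = 1e-6;
        double numeric  = (f(w + eps) - f(w - eps)) / (2.0 * eps);
        double analytic = grad_f(w);
        std::printf("analytic = %.6f, numeric = %.6f, diff = %.2e\n",
                    analytic, numeric, std::fabs(analytic - numeric));
        return 0;
    }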
There might be something deeper there. I am thinking of the line of research, associated with Bengio, on biologically plausible backprop: it turns out that you can backpropagate errors through fixed random feedback weights and learning still works! Which is important, because it's not too plausible that the brain is calculating exact derivatives and communicating them to each neuron individually to update them, but it can send back an error signal more easily.
It's actually not very different from graphics programming, where a simple rounding error can cause all kinds of trouble, from very small (surfaces or rays not reflecting exactly where they should) to very big (completely messing up your entire rendering).
Both activities very much resemble chaotic systems and they are both very challenging to debug.
The Recurse Center (formerly known as Hacker School) is offering one-week mini retreats (their typical program is months long). I applied and didn't get in, but after reading this, I may try again.
>On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”
I am the same kind of person. But when John Carmack approaches this with scepticism and concludes that it is indeed not over-hyped, I guess it's worth learning after all!
I admit to having the same grouchy thoughts about machine learning and AI: When is this fad going to blow over and we can all get back to writing deterministic programs instead of collecting data and training models, or whatever it is this new breed is doing? Might be time to rethink and revive my curiosity.
That attitude is quite prevalent in the games industry; anyone who spent some time among AAA devs would learn that they don't care one second about stuff like portable 3D APIs and OSes, contrary to what people in forums like HN think.
What matters is making a cool game with whatever tech is available and shipping it.
Game engine programmer here. I came up on DOS and Windows, and for a long time that's what I was comfortable with, and everything else I presumed to be alien. Once I'd worked on things that were both cross-platform and low-level, the idea that different hardware platforms and OSes were different went out the window somewhat. It was literally enlightening. In the old days, we had to be mercenaries. SNES is popular now? Port the code to that. PC getting big? Port to that. Later on, it was more like you had to be on everything. The last proprietary engine I worked on would compile with large subsets of features intact on Windows, MacOS, PS2 through 4, XBOX 1 through One, iOS, and Android. Probably more than that! Someone in the company was always adding a new backend, and I was frequently surprised to learn we ran on something I didn't even know we ran on. To a user, these platforms are totally different from each other, right? They aren't. It's all putting data in buffers, pushing parameters on the stack, and jumping. More importantly, the backends were tiny compared to the game engine.
This perspective makes it supremely frustrating anytime I try to get a 'normal' programming job, and run into the 'but do you have .NET4, ASP.NET Core, and JS6??' mentality.
There are really only a few post-game programming jobs that will use the skills you've developed creating games: science and new-technology development. That is where you'll find developers and dev shops that respect coding without frameworks (because the platform is so new there are none) or without the popular latest dev-tool toys. Working in such a shop is often supremely satisfying, because your job is to make that new platform conform to modern development standards, a job an experienced game developer does on their own anyway.
Great advice! I did front and back end web dev at Microsoft for a few years. One of many nice things about working there was I, personally, didn’t get typecast too much. Though, ironically, after my web stint, some game teams didn’t want me anymore :( But I eventually did find a role in HoloLens.
I think the subtext of your comment here is something like, "Look how productive he is without mastering the real tools we Unix hackers use! He can get by with second rate Windows stuff!"
Maybe I'm reading you wrong. But if I am right, it's good to take a look outside of the Unix bubble. Visual Studio is literally the world's most sophisticated developer tool. More human hours of engineering have been poured into it than likely any other piece of software we use on a daily basis.
I laughed when I read the tools he was using, fvwm and vi. Back in 1993 or so, I wrote a hundred thousand lines of filesystem performance benchmarks using those exact tools. They're surprisingly productive.
I'd say it's part of the plan: don't pass judgment on a system until you've tried it for some amount of time. Same with Windows and Visual Studio. It's sorta related to cargo-culting, I guess.
If anybody else is interested in doing something similar, I highly recommend Michael Nielsen's online book: Neural Networks and Deep Learning[1]. He gives really good explanations and some code examples.
I ended up writing the most basic feed-forward network in C[2]; although I didn't use base libs like Carmack :(
I think the point about NN and ray tracing being simple systems that allow for complex outcomes is something that seems to be a deeper truth about the universe. Stephen Wolfram and Max Tegmark both talk and write about this - it also shows up in old cellular automata like Conway’s game of life.
It’s pretty cool that so much complexity can come from a few small rules or equations.
Interesting to hear that C++ isn’t that well supported on OpenBSD. The story is quite the opposite with FreeBSD, where it’s really easy to use either clang or gcc. I usually spin up new jails to keep different installations sandboxed. CLion works quite nicely with most window managers on FreeBSD, but I rarely boot Xwindows these days and usually prefer to work with emacs inside tmux from the console.
C++ support is pretty good with the base Clang 5; OpenBSD uses the libc++ standard library and libc++abi (FreeBSD uses PathScale's libcxxrt). The situation with debuggers is another story: the base version of gdb is outdated (a GPLv2 version). The ports/packages version (egdb) has better support for things like C++ and threads, but may lack other features. There is currently no support for the LLVM debugger, sadly.
My gut feeling is (was) that C++ is a first-class citizen on OpenBSD. It's interesting having my beliefs challenged like this by someone so influential.
The question becomes: if it's not C++, then what is the first-class citizen on OpenBSD? And if it is C++, then how do we improve support? Just going to ports seems like a poor answer.
Sure, but is there a popular, modern C compiler out there that doesn't compile C++ as well? Clang, GCC, Intel, MSVC... all of them compile C++ as well as C. I don't think it's possible to have first-class support for C and not C++ any more, unless you ship your own compiler. It sounds like the issue here is that OpenBSD is shipping older versions of LLVM/Clang and GCC.
I find it very admirable that he can sit down for a week and just focus on one main subject. Personally, I get derailed all the time when I don't have a very well defined goal in mind.
It looks a little something like this: I'll be reading a manpage and notice another manpage referenced at the bottom. So I obviously keep crawling this suggestions tree until I bump into a utility whose purpose is unclear. So then I'll go searching online to try and figure out what kinds of problems or use-cases it's meant to help with.
I've structured my life like this. Remote employer in a time zone 10 hours away; I get months-long tasks, simply update my boss on progress, and am left to focus so deeply that my health suffers, because I naturally obsess over my work, which I love. I work from home, and my wife also works from home (freelance film producer), so we just immerse ourselves, have meals together, but otherwise we both obsess over our work. No commuting makes this very enjoyable.
I really like the attitude of picking a topic at hand and hack around it for fun. It radiates a very MIT Hacker feel. The writeup is very motivating.
John has been experimenting with a lot of stuff -- Racket, Haskell, Computer Vision and now Neural Networks. I guess there is no professional intent, but the spirit of hacking lives on.
After a several year gap, I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work. My wife has been generously offering it to me the last few years, but I’m generally bad at taking vacations from work.
As a change of pace from my current Oculus work, I wanted to write some from-scratch-in-C++ neural network implementations, and I wanted to do it with a strictly base OpenBSD system. Someone remarked that is a pretty random pairing, but it worked out ok.
Despite not having actually used it, I have always been fond of the idea of OpenBSD — a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is a lot of things, but cohesive isn’t one of them.
I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows. I thought a week of full immersion work in the old school Unix style would be interesting, even if it meant working at a slower pace. It was sort of an adventure in retro computing — this was fvwm and vi. Not vim, actual BSD vi.
In the end, I didn’t really explore the system all that much, with 95% of my time in just the basic vi / make / gdb operations. I appreciated the good man pages, as I tried to do everything within the self contained system, without resorting to internet searches. Seeing references to 30+ year old things like Tektronix terminals was amusing.
I was a little surprised that the C++ support wasn’t very good. G++ didn’t support C++11, and LLVM C++ didn’t play nicely with gdb. Gdb crashed on me a lot as well, I suspect due to C++ issues. I know you can get more recent versions through ports, but I stuck with using the base system.
In hindsight, I should have just gone full retro and done everything in ANSI C. I do have plenty of days where, like many older programmers, I think “Maybe C++ isn’t as much of a net positive as we assume...”. There is still much that I like, but it isn’t a hardship for me to build small projects in plain C.
Maybe next time I do this I will try to go full emacs, another major culture that I don’t have much exposure to.
I have a decent overview understanding of most machine learning algorithms, and I have done some linear classifier and decision tree work, but for some reason I have avoided neural networks. On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”
In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere, but I wound up watching a lot of the Stanford CS231N lectures on YouTube, and found them really valuable. Watching lecture videos is something that I very rarely do — it is normally hard for me to feel the time is justified, but on retreat it was great!
I don’t think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, solidifying “book knowledge” into real experience.
I used a common pattern for me: get first results with hacky code, then write a brand new and clean implementation with the lessons learned, so they both exist and can be cross checked.
I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.
I was pretty happy with my multi-layer neural net code; it wound up in a form that I can just drop into future efforts. Yes, for anything serious I should use an established library, but there are a lot of times when just having a single .cpp and .h file that you wrote every line of is convenient.
My conv net code just got to the hacky but working phase, I could have used another day or two to make a clean and flexible implementation.
One thing I found interesting was that when testing on MNIST with my initial NN before adding any convolutions, I was getting significantly better results than the non-convolutional NN reported for comparison in LeCun ’98 — right around 2% error on the test set with a single 100-node hidden layer, versus 3% for both wider and deeper nets back then. I attribute this to the modern best practices — ReLU, Softmax, and better initialization.
This is one of the most fascinating things about NN work — it is all so simple, and the breakthrough advances are often things that can be expressed with just a few lines of code. It feels like there are some similarities with ray tracing in the graphics world, where you can implement a physically based light transport ray tracer quite quickly, and produce state of the art images if you have the data and enough runtime patience.
I got a much better gut-level understanding of overtraining / generalization / regularization by exploring a bunch of training parameters. On the last night before I had to head home, I froze the architecture and just played with hyperparameters. “Training!” is definitely worse than “Compiling!” for staying focused.
Now I get to keep my eyes open for a work opportunity to use the new skills!
I am dreading what my email and workspace are going to look like when I get into the office tomorrow.
Better non-JS link: http://archive.is/MvKHy. This link is a snapshot of the DOM of the rendered page. The one you linked has a not-so-comfortable layout because it was made for old phones with small screens.
I've also taken the time to implement a NN in C++ and train it on the MNIST handwriting data. It's a lot of fun :)
As a result I have some pretty fast CPU NN code lying around.
I think his passion is and always will be totally immersive VR. I truly believe we are where we are today entirely on the back of Carmack's recognition of Oculus' potential. He was just trying to be friendly and gave Palmer assistance. Glad it all worked out though.
I find people's continued good feelings towards Carmack peculiar.
During the Zenimax/Oculus case:
He claimed that he never wiped his hard drive: An independent court expert found that most of his hard drive was wiped after Carmack heard about the lawsuit. So Carmack lied in his affidavit.
He claimed that no source code from Zenimax ever got transferred over to Oculus. Then he later admitted that the emails he had taken from his Zenimax laptop on his last day there did contain source code. He denies that the source code in the emails benefited him, and says he "rewrote" all the code anyway. But this runs counter to the testimony of Oculus programmers, who admitted they copied Zenimax code straight into the Oculus SDK.
He also has not outright denied the copying claims from the testimony of David Dobkin, in which Dobkin testified about the similarity between the source code at Zenimax and the source code in Oculus's SDK. Carmack instead accused him of doing it for money and argued that the methodology wasn't very robust. But no denial, just ad hominem attacks.
So why do people still fawn over him when it seems like his ethics are dubious?
(Granted it could be possible that the technical aspects of the case went over the heads of jurors. And I will admit to being wrong if the Oculus appeal ends up revealing more information.
But his HD was discovered to be wiped, and even Oculus programmers admitted to copying code. Why does he get a free pass on this?)
He doesn't. He is in the wrong. I just don't care. I'm in the wrong every time I pack a bowl of weed. Still don't care.
I understand why John doesn't own the fruit of his labor. I understand why John is willing to lie to, cheat, and steal from the people benefiting from the system that takes the fruit of his labor from him.
If you don't like it, build a better system that prevents it.
I don't know anything, at all, about this case or much about him except the standard.
But is there any point in bringing it up now? I mean, whether or not he did something illegal/unethical, does this now mean that literally every single time he writes something, someone feels the need to chime in with "but remember that bad thing he did once?".
I'm not saying this isn't a legit conversation or that you're unethical or something for bringing it up... I just really think this isn't the time or place for this.
I agree. The guy had a quarrel with someone over something. Maybe he was wrong. Maybe he was right. Maybe it's just not that simple. I don't see the importance of it one way or another.
But this seems to be a trend right now. Our tolerance for any sort of moral ambiguity or uncertainty has sunk to such low levels that we can no longer appreciate someone's work or insight without establishing that this person is flawless in every respect.
It's not a 'double standard' to treat different situations differently. If you personally want to consider sexual assault and IP theft as the same situation, then you can, but IMO (and I believe in the average worldview) sexual assault is worse.
Why is that surprising? The idea of not owning the code you wrote isn't exactly cherished in hacker culture. People may half-heartedly accept that what he did was wrong, but they will be far from feeling outraged about it.
Yeah, Carmack has a history of unethical behavior. From the book "Masters of Doom":
> Late one night Carmack and his friends snuck up to a nearby school where they knew there were Apple II machines. Carmack had read about how a thermite paste could be used to melt through glass, but he needed some kind of adhesive material, like Vaseline. He mixed the concoction and applied it to the window, dissolving the glass so they could pop out holes to crawl through. A fat friend, however, had more than a little trouble squeezing inside; he reached through the hole instead and opened the window to let himself in. Doing so, he triggered the silent alarm. The cops came in no time.
> The fourteen-year-old Carmack was sent for psychiatric evaluation to help determine his sentence. He came into the room with a sizable chip on his shoulder. The interview didn’t go well. Carmack was later told the contents of his evaluation: “Boy behaves like a walking brain with legs ... no empathy for other human beings.” At one point the man twiddled his pencil and asked Carmack, “If you hadn’t been caught, do you think you would have done something like this again?”
> “If I hadn’t been caught,” Carmack replied honestly, “yes, I probably would have done that again.”
> Later he ran into the psychiatrist, who told him, “You know, it’s not very smart to tell someone you’re going to go do a crime again.”
> “I said, ‘if I hadn’t been caught,’ goddamn it!” Carmack replied. He was sentenced to one year in a small juvenile detention home in town. Most of the kids were in for drugs. Carmack was in for an Apple II.
well... damnit, it is frustrating when people ask you something then infer something different from a question they think they asked, vs what they asked! :/
Carmack: Breaks into a building at age 14, steals a bit of code today. "Oh, he has a history of unethical behavior, we shouldn't look up to him."
Bill Gates: Steals a bulldozer to race with his buddies at age 19(?), is responsible for Microsoft's corporate culture and history of predatory behavior. Hackernews loves him and can't get enough of Microsoft. Boy, they're a fair sight better than Google, eh lads?
I pity the person who learns ethics from Hackernews.
I haven't experienced a consistent love fest here for Bill Gates. There are supporters and detractors, just as you'd expect for any public figure. Perhaps you notice more those opinions with which you disagree.
> He claimed that no source code from Zenimax ever got transferred over to Oculus. Then he later admitted that the emails he had taken from his Zenimax laptop on his last day there did contain source code.
There is a big difference between wrapping up a source control tree and having fragments in emails.
Also let's not pretend that the idea that you can remember something and re-type it is fine, but if you copy and paste it from an email, that's terrible. The line is pretty blurry.
I 100% agree that a hacker should be judged by his/her hacking.
On the other hand, I found that many people consider me wrong when I judge a hacker by his/her hacking when he/she is accused of sexual abuse. But apparently they are okay with someone accused of IP crime.
Sexual abuse and IP crime are very different in terms of their stigma, and rightly so, in my opinion, so I don't see what's surprising about your final paragraph.
You can separately judge the abuse or the hacking.
If you judge the person as such (that entity which you're calling "the hacker"), then, though both are part of the person, the abuse is more relevant than the hacking. At least, I would guess, in most people's estimation.
Referring to that person using the noun phrase "the hacker" rather than "the abuser" doesn't make the hacking a more relevant measure of the person.
> I found that many people consider me wrong when I judge a hacker by his/her hacking when he/she is accused of sexual abuse. But apparently they are okay with someone accused of IP crime.
So what should we do, burn out his synapses with mycotoxin?
The history of this business is written by IP thieves who, once they get big, turn around and lobby Congress for tougher IP laws, to criminalize the very methods they used to get big in the first place, so that the next gang of thieves will not be able to steal so effectively.
What Carmack did is not egregiously unethical for this business. If anything he embodied the hacker spirit of "fuck the rules". It's on dubious ethical ground and certainly not the cautious approach used by Stallman, who colors assiduously within the lines so no one could ever accuse him of anything untoward. But again, look at the history of this business.
He's always been one of my programming heroes. I don't know what he did or didn't do in this case. But, if he did lie, that would bring him down a notch in my mind. However, if there's anyone who could quickly rewrite a system like that, it's definitely Carmack.
Anyway, I will probably always read or listen to anything he has to say on the subject of programming.
> So why do people still fawn over him when it seems like his ethics are dubious?
> Why does he get a free pass on this?
Because many people draw a distinction between something being "unethical" vs being "illegal". What exactly do you think is the problem with his ethics? How serious do you think it is?
It's often a mistake to infer someone's ethics from their apparent behaviour within the context of a legal battle. The consequences from even very minor details slipped or misinterpreted can be huge.
Is this the same court expert who allegedly used unreadable code printouts (that nevertheless look different at a distance) to establish non-literal code copying and whose expert testimony is still under court seal instead of in the public record? It's hard to draw any inferences from that trial as an outsider. You'd do better to focus on the stories in Masters of Doom that show dubious ethics (like 'borrowing' work resources to work on games you later sell to get money and be free from the job where you were borrowing resources). Of course to some of us those ethics aren't really that dubious, it's the law that has dubious ethics.
From the post:
> a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship
While I don't believe one could argue that FreeBSD isn't focused on quality or craftsmanship, it has always struck me as lacking consistency. FreeBSD seems focused on the surface, because the base system is bundled, like with the other BSDs. They don't, however, seem to have a clear, consistent syntax for tools and configuration files. The worst offender, and this is just my personal opinion, is ZFS. ZFS makes no attempt to hide that it's bolted on; in terms of configuration, there's no doubt that it was lifted from Solaris.
OpenBSD has fewer features than FreeBSD or Linux, but the features that are available are clearly made by one team with the same focus and direction.
That being said, I don't think we should put too much emphasis on Carmack's choice of operating system in this case.