Engineering is the bottleneck in deep learning research (dennybritz.com)
77 points by sajid on Jan 18, 2017 | 24 comments



After a few years of research in DL I learned not to trust any single paper at all. 99% of DL papers are not scientific but rather 'hey guys, look, this trick is our new awesome discovery'.

Also I learned to communicate with other teams and exchange ideas proven in practice - this really helps.

To improve things, I would suggest publishing the whole setup: all the parameters used, the code, and either all the data or a reference to a large free data set (no MNIST anymore in papers, please).


> (no MNIST anymore in papers, please)

If you have a breakthrough in transfer learning then you will be able to very effectively demonstrate it with MNIST.

The race to the bottom on MNIST error rates is essentially over, but that doesn't mean MNIST can't be used to demonstrate learning.

Regarding setup and parameters: I hope AI researchers move toward something like pachyderm [https://pachyderm.io] -- providing a single docker image to completely replicate their work. However, I sincerely doubt that will happen. As "open" as research is, the details are almost always obfuscated to prevent competition with the spin-out company (or other researchers).


How about unlocking access to ImageNet? Unless one has a .edu account, its overlords seem to ignore requests to access it. Mind you, it's relatively easy to social engineer access to it, but why should this be necessary? OpenAI and Google have both knocked it out of the park with easy access to datasets and examples.

But sadly, IMO at the amateur level, TensorFlow considered harmful. I have repeatedly observed novices blow the thing up by starting from one of its many amazing and fantastic teaching examples. It's not a question of the TensorFlow API, but rather of the engineering quality of its underlying engine, which kind of sucks. Nothing ruins an enthusiastic data scientist's day like a cryptic seg fault for no apparent reason whatsoever.

And I know they're working on it, but fer cryin' out loud, the API is great, and Google has the bottomless pockets to do a lot better than this. It's been over a year and I still see people throw their hands up in frustration trying to make use of the thing. Of course, Google has never been a customer-driven company, but if we don't want an AI Fall, methinks this needs to be fixed.


I agree. I think all reference datasets should be free and open source. Also, researchers shouldn't publish on datasets that are not free and open source. That is a basic requirement for repeatability.

As far as the TensorFlow API is concerned, this may be a tradeoff between speed and robustness. Having every operation checked every time would certainly slow down the code for general use. Better how-to / setup / use guides are probably a better solution for this (unless it is a flat-out bug).
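As a rough sketch of that tradeoff (plain NumPy, with a hypothetical VALIDATE flag -- not how TensorFlow actually structures this), per-operation checking looks something like:

    import numpy as np

    VALIDATE = True  # hypothetical debug switch; frameworks usually bake this choice in

    def checked_matmul(a, b):
        # per-call validation buys friendly errors at the cost of extra work on every op
        if VALIDATE:
            if a.ndim != 2 or b.ndim != 2:
                raise ValueError("expected 2-D arrays")
            if a.shape[1] != b.shape[0]:
                raise ValueError("shape mismatch: %s x %s" % (a.shape, b.shape))
            if not (np.isfinite(a).all() and np.isfinite(b).all()):
                raise ValueError("non-finite input")  # a full O(n) scan on every call
        return np.dot(a, b)

The finiteness scan alone touches every element, which is why frameworks skip such checks on the hot path -- and why you get a cryptic crash instead of a friendly error.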


Anecdotally, my master's thesis on natural language processing was supposed to consist of first reproducing the results of a then-influential paper and then hopefully improving upon it by extending the model used.

The paper made it seem like they had been using a standard PCFG parser (which circulated in the research community at the time) to achieve their results. It turned out they hadn't; instead they had written a custom one, and in fact their results were not reproducible with the standard parser.

What was meant to be a timesaver in terms of engineering (using a standard parser instead of writing your own) turned out to be a massive time sink. It also turned out that by using a custom parser they had unintentionally diverged from a vanilla PCFG (probabilistic context-free grammar); in other words, some implementation details had led to a departure from the assumed underlying theoretical model.
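For anyone unfamiliar: "vanilla" here means the score of a parse tree is exactly the product of its rule probabilities and nothing else. A toy CKY sketch of that model (made-up grammar and probabilities):

    from math import log

    # Toy grammar in CNF with made-up probabilities. In a vanilla PCFG the
    # score of a parse is exactly the product of the probabilities of its rules.
    lexicon = {
        "people": {"NP": 0.6},
        "fish":   {"V": 0.7, "NP": 0.1},
        "tanks":  {"NP": 0.3},
    }
    binary = {  # lhs: [(left child, right child, rule probability)]
        "S":  [("NP", "VP", 1.0)],
        "VP": [("V", "NP", 1.0)],
        "NP": [("NP", "NP", 0.1)],
    }

    def cky(words):
        n = len(words)
        # chart[i][j] maps a nonterminal to its best log-probability over span (i, j)
        chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            for tag, p in lexicon.get(w, {}).items():
                chart[i][i + 1][tag] = log(p)
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for lhs, expansions in binary.items():
                        for left, right, p in expansions:
                            if left in chart[i][k] and right in chart[k][j]:
                                score = log(p) + chart[i][k][left] + chart[k][j][right]
                                if score > chart[i][j].get(lhs, float("-inf")):
                                    chart[i][j][lhs] = score
        return chart[0][n].get("S")

    print(cky("people fish tanks".split()))  # best log-probability of an S parse

Any custom tweak in a parser -- pruning heuristics, smoothing, rescoring -- quietly changes that scoring function, which is the kind of silent divergence I'm describing.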


A lot of research depends on software written by researchers, and yet writing software is not really supported or incentivised by academia. Organizations like the Software Sustainability Institute in the UK (http://software.ac.uk) are lobbying to change this, with some success, but I guess it takes a long time to effect cultural change.


See also Artifact Evaluation, a process used in several PL/SE conferences that makes evaluation of code (and datasets/studies) an explicit step in the review process:

http://evaluate.inf.usi.ch/artifacts

http://www.artifact-eval.org/


Hey thanks - I hadn't seen those before. Good stuff.


Engineering is just implementation details. That has no relevance in academia. /s


Unfortunately CS is not as rigorous as some other academic fields.

I've been to 100+ colloquia in the physics dept at Cornell and I have never been to one that I felt was a waste of time, or where I felt the speaker didn't belong.

The CS department colloquium is a different story: yes I got to see Geoff Hinton before he became a celebrity but maybe half of the talks are awful.


IMO (to be fair, some) CS people became engineering bottlenecks the day the universities switched out teaching C/C++ for Java and Python (IMO the Why Not Zoidberg? of programming languages). Those who learned C/C++ anyway became my heroes.

I have sat through too many presentations obsessing over HW-level perf/W, especially with respect to Deep Learning ASIC wannabes. Just writing one's code in C/C++ (and doing it well) guarantees at least a 2x improvement over Java and a 10-100x improvement over Python. I won't even bring up the computational coup that is CUDA.
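For a crude illustration (exact ratios vary wildly by workload, runtime and machine), compare a pure-Python inner loop with the same arithmetic done in compiled, SIMD-vectorized code:

    import time
    import numpy as np

    n = 10000000
    a = np.random.rand(n)
    b = np.random.rand(n)
    al, bl = a.tolist(), b.tolist()

    t0 = time.time()
    s = 0.0
    for x, y in zip(al, bl):  # every iteration pays interpreter dispatch and float boxing
        s += x * y
    t1 = time.time()
    np.dot(a, b)              # the same dot product in compiled, vectorized code
    t2 = time.time()
    print("python loop: %.3fs  numpy dot: %.3fs" % (t1 - t0, t2 - t1))

On a typical machine the compiled version wins by one to two orders of magnitude, which is the gap I mean; the exact multiple depends on the workload.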

But hey, let's base a mobile phone OS on Java and block low-level access to its GPU, that's a fantastic idea, right?

See also many experiences with data scientist and CS prima donnas dismissing low-level coding as "ops." I liken this to the Eloi dismissing the Morlocks as "the help."


I have seen some painful C and Python written by luminaries. On the other hand, the MiniSAT source code is 500 lines of beautiful C++.


wrong thread, take the best programming language fight elsewhere


It's not a best programming language fight. In my experience in the industry, an enormous amount of technical debt and operational inefficiency is accrued when someone ignorant of how machines and processors actually work (SIMD, cache, pipelines, threading, etc) is in a leadership position to dictate the toolset for solving problems.
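A toy demonstration of just the cache part (NumPy, default C-order array; the exact ratio depends on the machine):

    import time
    import numpy as np

    a = np.random.rand(5000, 5000)  # row-major (C order) layout

    t0 = time.time()
    for i in range(a.shape[0]):
        a[i, :].sum()   # contiguous walk: every loaded cache line is fully used
    t1 = time.time()
    for j in range(a.shape[1]):
        a[:, j].sum()   # strided walk: one useful element per cache line
    t2 = time.time()
    print("row-wise: %.3fs  column-wise: %.3fs" % (t1 - t0, t2 - t1))

Same arithmetic, same element count; the only difference is whether the traversal respects the memory layout.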

This wasn't a noticeable issue until about a decade ago. But it is now and it continues to get worse IMO. The "programming language" bit is just one of its symptoms when the root cause is ignorance of practical computer architecture.

That said, the mentality of throwing all big data problems at Hadoop clusters with 4-year-old GPUs and flaky 10 Gb interconnect (many of which could be solved faster on one.big.modern.and.cheaper.machine(tm)) is working wonders for my Amazon stock, so maybe I should just shut up and get rich?


Ah yes, the story of my master's ML thesis.

I had to gather features from multiple papers and try to select the best ones, with classification results to prove it.

A few problems included:

- Incomplete/unavailable datasets (404 on some copyrighted pictures)

- Features given only as math formulas and text descriptions (no code whatsoever)

- Classifier names only (which framework did you use? parameter values?)

In the end I couldn't contribute either, and I was instructed to keep my work in a private repo despite being funded by an EU academic scholarship.


Agreed. The tooling around deep learning is not as mature as the tooling around software development. There is a fair amount of engineering and grunt work needed to even get started, let alone build on others' research. A few problems off the top of my mind:

- Setup: Installing DL frameworks, Nvidia drivers and CUDA is an exercise in dependency hell. Trying to run someone's project, which has different dependencies than what you have, is difficult to get right. Docker images [1] and nvidia-docker make this simple, but are still not the norm.

- Reproducibility: This is big, as Denny mentions. Folks still use GitHub for sharing code. But DL pipelines need versioning of more than just code: it's code, environment, parameters, data and results (see the sketch below).

- Sharing and collaboration: I've noticed that most collaboration on deep learning research, unlike software, happens only when the folks are co-located (e.g. part of the same school or company). This likely links back to reproducibility, but there are not many good tools for effective collaboration currently IMHO.

[1] https://github.com/floydhub/dl-docker (Disclaimer: I created this)
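To make the "more than just code" point concrete, here is a minimal sketch of a per-run manifest (hypothetical file names and helper; a real setup would also capture pip freeze or a Docker image digest, plus the results):

    import hashlib
    import json
    import subprocess
    import sys

    def run_manifest(params, data_path):
        # record everything a rerun needs, not just the code revision
        with open(data_path, "rb") as f:
            data_sha = hashlib.sha256(f.read()).hexdigest()
        return {
            "git_commit": subprocess.check_output(
                ["git", "rev-parse", "HEAD"]).strip().decode(),
            "python": sys.version,
            "params": params,          # hyperparameters for this run
            "data_sha256": data_sha,   # ties the results to an exact dataset
        }

    with open("run.json", "w") as f:
        json.dump(run_manifest({"lr": 0.01, "batch": 64}, "train.csv"), f, indent=2)

Even this much makes "which data and parameters produced this number?" answerable.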


Sorry to knock the post author off his high horse, but as for "just like you wouldn't want a highly trained surgeon spending several hours a day inputting patient data from paper forms": highly trained surgeons _do_ spend several hours a day doing tedious paperwork.

As a researcher, I expect 50-90% of my time to be slogging through organizational and preparatory work.


They _do_, but they _shouldn't_. It's an inefficient allocation of skills and resources.


>It's an inefficient allocation of skills and resources

Is it? Maybe the paperwork is important for other members of the care team and the surgeon is the only one who is familiar enough with the surgery to fill out the forms. And, you can't reasonably be doing surgery round the clock.


I agree with the central thesis: engineering is a huge bottleneck. I work for a FinTech company that is building novel machine learning models and this is our experience.

We've had a few machine learning experts working here for a couple of years, but recently brought in a software engineer with a passion for machine learning. He was able to, within a few months, streamline the data acquisition pipeline to the point where we could iterate on new models in about 30 minutes, down from days. He accomplished this not just with better data but by building efficient in-memory data structures. It saves literally days per iteration by avoiding disk I/O.
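A toy sketch of the general idea (not our actual code; hypothetical names): parse the raw data once per process and keep it in RAM, so every later experiment skips the disk entirely.

    import numpy as np

    class CachedDataset:
        # parse once per process, reuse across experiments
        _cache = {}

        @classmethod
        def load(cls, path):
            if path not in cls._cache:
                # the expensive disk read and parse happens only on the first call
                cls._cache[path] = np.loadtxt(path, delimiter=",")
            return cls._cache[path]

    X = CachedDataset.load("features.csv")   # first call: hits the disk
    X = CachedDataset.load("features.csv")   # later calls: return the in-RAM array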

Before his work, the training data and the data we used in production had minor differences. Each new release required intensive manual verification to make sure that our model worked. Now we have much more certainty that the two match up.

Looking down on engineering problems is like a famous architect looking down on structural engineers. You're not gonna have a very good skyscraper if your foundation is shaky and ad-hoc.


Coming from the machine learning research community, I am in awe of the availability and relative ease of use of the deep learning frameworks. Rarely can I find comparison code in ML that I didn't have to bug the author for, or try to implement myself from the paper alone. In short, DL is on a much better path than the author realizes. Perhaps we have to thank the GitHub/Bitbucket era. The real problem with DL is that until there is a more robust theory (if that's even possible; the bane and boon of ML is the complexity of the models), much of the application research will sorta be a form of digital alchemy.


Just yesterday, browsing the Kaggle forums, I thought that we may need a GitHub for Deep Learning...

So that you could link your code to a dataset, have it automatically run, and show the result...

Not sure if it is worth the time...


I think OpenAI is working on this with Gym and Universe.

https://universe.openai.com

https://gym.openai.com

A general dataset pool in OpenAI would be nice. Kaggle has quite a few basic datasets (MNIST etc.) for evaluation.


I was playing around with the emotion data set on Kaggle using TensorFlow. Depending on the seed, I was getting between 58 and 60% accuracy on the held-out test set (what you submit against).

I thought I had come up with a good set of hyperparameters using AWS GPU instances (Python 2.7). I wanted to visualize some of the outputs, so I copied the code to my machine, ran it under Python 3.5 (Windows), and only got 57% accuracy. These swings in accuracy are huge.
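For what it's worth, pinning every RNG helps narrow this down (TF 1.x-era APIs):

    import random
    import numpy as np
    import tensorflow as tf

    SEED = 42
    random.seed(SEED)         # Python's own RNG (e.g. shuffling done in plain Python)
    np.random.seed(SEED)      # NumPy (e.g. weight init or batching outside the graph)
    tf.set_random_seed(SEED)  # TF 1.x graph-level seed; ops can add op-level seeds too

Even with all three pinned, some GPU kernels are nondeterministic, so small run-to-run swings can remain; and a 3-point swing across Python versions likely also involves different library versions, not just the seed.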



