Hacker News
Simple example of machine learning in TensorFlow (github.com/jostmey)
334 points by jostmey on March 13, 2017 | 47 comments



I like these kinds of "Hello, world!" examples for TensorFlow. As a TensorFlow beginner, I need all the references I can get. Here is what I need right now: "Hello, we meet again!". I can build and train a neural net model (albeit often badly), but saving and restoring the trained weights so that I can run the model again is giving me fits. I am clearly missing something fundamental about how to restore a TensorFlow NN model.

For your next tutorial, may I suggest: 1) a list of do's and don'ts for constructing a savable/restorable model, and 2) a wee bit of example code.

Of course, now that I have discovered Keras I'm moving away from low-level, direct TensorFlow. But I suspect I'm not the only one a bit foggy about the whole save/restore workflow.


As a person who uses TensorFlow for his day job:

I find that saving and restoring are among the weirder things in TensorFlow. You can either go all out and save all the variables, or save only the ones needed for the model.

You usually don't want to save out gradients (which are also variables) since they take up a bunch of space and aren't actually that useful to restore. On the other hand, what counts as a model variable: do you want to save the model variables plus the moving averages ... or just the averages? But then when you're loading, you'll have to "shadow" the moving averages onto the real variables that actually run in your model.

Good news though, most of the scaffolding code you can write once and re-use it over and over again.
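
For anyone hitting this for the first time, here's a minimal TF 1.x sketch of the "save only the model variables" idea; the variable names and paths are made up for illustration:

    import tensorflow as tf

    # Toy model: one trainable weight plus an optimizer that creates
    # extra slot variables (moving averages) we don't want to checkpoint.
    w = tf.Variable(0.0, name="w")
    loss = tf.square(w - 3.0)
    train_op = tf.train.AdamOptimizer(0.1).minimize(loss)

    # Passing an explicit var_list keeps the optimizer's slot variables
    # out of the checkpoint; omitting it would save everything.
    saver = tf.train.Saver(var_list=[w])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op)
        saver.save(sess, "/tmp/model.ckpt")

    # Restoring in a fresh session uses the same var_list:
    #   saver.restore(sess, "/tmp/model.ckpt")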


In Keras, it's just a simple model.save() [to an HDF5 file] and load_model(). This includes both the weights and the architecture.

Models with a few million parameters result in a file of around 50 MB, which is still reasonable for modern production use cases.
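
As a rough sketch (dummy model and filename just for illustration):

    import numpy as np
    from keras.models import Sequential, load_model
    from keras.layers import Dense

    # Tiny model trained on random data, just so there's something to save.
    model = Sequential([Dense(1, input_dim=4)])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(np.random.rand(32, 4), np.random.rand(32, 1), verbose=0)

    model.save("my_model.h5")             # architecture + weights + optimizer state
    restored = load_model("my_model.h5")  # ready to predict or resume training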


Keras makes using deep learning for simple-ish use cases sooooo easy.


I second this - I'm really excited about Keras being integrated into the core of TensorFlow (other than the chance it might lose Theano compatibility).


+1


I would put together a tutorial on how models are saved and restored if I understood it myself, which is to say it still confuses me.


This save/restore tutorial was shared earlier by someone who is writing similar TensorFlow primers:

https://blog.metaflow.fr/tensorflow-saving-restoring-and-mix...


Great! Thanks, bookmarked.


> As a TensorFlow beginner, I need all the references I can get.

I think that having to use a duck-typed language makes it so much harder than it has to be. The API is huge, and you get essentially no help from the IDE. If I could type tf.train.GradientDescentOptimizer as in the example, and then get autocomplete for what I can do next with that object, it would really help. Or list functions that take that kind of object as input. As it is, one is searching in the dark.


This sounds more like a problem with your IDE than with the language. Vim doesn't have this problem. See https://www.youtube.com/watch?v=TNMjbaimk9g and I could probably find older systems that predate the plugins mentioned in the video.


A lot of the best practices for initializing/saving/restoring/etc. are handled automatically if you wrap your model in an Estimator[1] using a model_fn. It also enforces a "clean" model specification (in a new Graph) and decoupling of the input pipeline from the model.

However, Estimators are not in TensorFlow core, which means the API isn't fixed quite yet.

[1] https://www.tensorflow.org/extend/estimators
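
Roughly, the pattern from the linked tutorial looks like the sketch below. The contrib API was still shifting between releases, so treat the return convention (a plain tuple here) and the exact imports as illustrative rather than gospel:

    import tensorflow as tf
    from tensorflow.contrib import learn

    def model_fn(features, targets, mode, params):
        # 1. Define the model: a single linear layer here.
        w = tf.get_variable("w", [])
        b = tf.get_variable("b", [])
        predictions = w * features + b

        # 2. Loss and training op; the Estimator then handles variable
        #    initialization, checkpointing, and restoring for you.
        loss = tf.reduce_mean(tf.square(predictions - targets))
        train_op = tf.contrib.layers.optimize_loss(
            loss, tf.contrib.framework.get_global_step(),
            learning_rate=params["learning_rate"], optimizer="SGD")
        return predictions, loss, train_op

    estimator = learn.Estimator(model_fn=model_fn,
                                params={"learning_rate": 0.01},
                                model_dir="/tmp/my_model")  # checkpoints land here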


That's nifty; I was looking for something like that just a few weeks ago for a work demonstration! Ended up doing https://gist.github.com/fuglede/ad04ce38e80887ddcbeb6b81e97b... instead.


Thank you to you and OP for both sharing these resources. Really helpful.


I wish there were more TensorFlow examples written in Go. I made the mistake of checking out TensorFlow as my first intro to ML and it flew about 10 miles over my head. Slowly learning now, but most of the documentation and tutorials are written in Python.

This blog series was also helpful on a conceptual level: https://medium.com/emergent-future/simple-reinforcement-lear...


As a deep learning professional: the deep learning community is something like 99% Python. You'd probably be better off learning Python at least well enough to recreate the corresponding Go code in your mind instantly.


tbh you'll be better off just sticking with python for now, until you understand what's going on.


I think it's a really good exercise to take TensorFlow examples and algorithms and reimplement them in another language.


>>> You are one buzzword away from being a professional. Instead of fitting a line to just eight datapoints, we will now fit a line to 8-million datapoints. Welcome to big data.

LOL :) (Side-note: 8 million is still not big data)


"Big Data" refers to the complexity of the data and the underlying system that the data represents, NOT to the number of datapoints.

lol


Big data is really just a buzzword; no one knows what it really means, because everyone's definition is different.

lol


I always think in terms of Munchkin: "any data that is not Big is small"


So I'm still wrapping my head around some of the math (I haven't had a math class in a handful of years)...

I get the output of the model (y_model = m*xs[i] + b); it's the y = mx + b where we know x (from the dataset) and have y be a variable.

The error is where I start to lose it. I get the idea of the first part (ys[i] - y_model): it's basically the difference between the actual y value (from the dataset) and the model's prediction. I get that we want this number to be as small as possible, because the closer it is to zero across the entire dataset, the closer the line comes to passing through (or near) all the points, and the best fit is when this total_error is nearest to zero.

What I don't get is the squaring of the difference. Is it just to make the difference a larger number so that it's a little more normalized? How do you arrive at the conclusion that it needs to be normalized? Same question for the learning rate? I believe these are correlated, but I can't tell you how...


Squaring gets you guaranteed positive numbers. Remember, we are adding all the errors together to optimize the model. If we get

sum_errors_A = 4 + -3 + -1

sum_errors_B = 1 + 1 + 1

B is obviously the better model, but when comparing raw sums it has a higher error than A (A's errors cancel out to 0). If we squared all the terms and then added, B would correctly come out as the stronger model (3 vs. A's 26).
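
In code, just to make the arithmetic concrete:

    errors_A = [4, -3, -1]
    errors_B = [1, 1, 1]

    print(sum(errors_A))                # 0 -- looks perfect, but only by cancellation
    print(sum(errors_B))                # 3

    print(sum(e**2 for e in errors_A))  # 26
    print(sum(e**2 for e in errors_B))  # 3 -- squaring reveals B as the better fit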


Ah, gotcha! Makes a lot more sense now.


>What I don't get is the squaring of the difference.

The following answer is about standard deviation, but the desired properties of squaring are also relevant to your question:

http://stats.stackexchange.com/questions/118/why-square-the-...


I don't think there is much to gain from this tutorial; then again, it doesn't pretend to offer you much either. For example, in the code you are discussing, defining a model as variables and operations, instead of as a function, was confusing to me. Presumably TensorFlow overloads the operators, but normally when you read "y = mx + b" you expect y to be computed directly and not to be stored as a model. "f = lambda m, x, b: m * x + b" seems much clearer to me.
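
(TensorFlow does indeed overload the operators; here's a minimal TF 1.x sketch of what that line actually builds:)

    import tensorflow as tf

    m = tf.Variable(0.5)
    b = tf.Variable(0.0)
    x = tf.placeholder(tf.float32)

    # * and + are overloaded, so this builds graph nodes instead of
    # computing a number right away.
    y_model = m * x + b
    print(y_model)  # a Tensor, not a float

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y_model, feed_dict={x: 2.0}))  # 1.0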


a) It ensures that they're positive so they don't just cancel each other out, and b) like you mentioned, it penalizes huge errors more heavily.

There are also historical reasons for using squared error. The square function is smooth and differentiable, so you can analytically solve for the gradient. Before fast computers this was crucial for solving regression problems, as a closed form makes everything easier.
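
Concretely, here's the closed form for a line fit (the normal equations) as a small numpy sketch with made-up data:

    import numpy as np

    # Setting the gradient of the squared error to zero gives the normal
    # equations, w = (X^T X)^{-1} X^T y: no iterative descent needed.
    xs = np.array([1.0, 2.0, 3.0, 4.0])
    ys = np.array([2.1, 3.9, 6.2, 7.8])

    X = np.column_stack([xs, np.ones_like(xs)])   # columns: slope, intercept
    m, b = np.linalg.lstsq(X, ys, rcond=None)[0]  # solves the least-squares problem
    print(m, b)  # roughly 1.94 and 0.15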


This is awesome!

I currently have a small pet project where I think some simple ML would be cool but I don't know where to start so these things are great.

Basically my use case is that I have a bunch of 64x64 images (16 colors) which I manually label as "good", "neutral" or "bad". I want to input this dataset and train the network to categorize new 64x64 images of the same type.

The closest I've found is this: https://gist.github.com/sono-bfio/89a91da65a12175fb1169240cd...

But it's still too hard for me to understand exactly how to create my own dataset and how to set it up efficiently (the example uses 32x32, but I also want to factor in that it's only 16 colors; will that give it some performance advantage?).



Kiro,

Here's how I solved a similar problem in my deep learning class. Instead of classifying between 10 animals you would just have 3 possible labels.

https://github.com/hermiti/deep_learning_project_2/blob/mast...
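
If it helps, here's a minimal Keras sketch of that shape (this is not the class solution above, just an illustrative three-class model for 64x64 inputs):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    # Small CNN for 64x64 RGB images with three labels
    # ("good", "neutral", "bad" as one-hot vectors).
    model = Sequential([
        Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=10)  # arrays you'd build yourself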


What is the meaning of the "?bare" query string in the URL? I googled around for the meaning of query strings on the GitHub site but only found random repos on GitHub (not sure how to narrow the search). The first time I tried removing it I saw another folder named "to_do", but this is gone now, so it might serve a version that is cached for longer somehow?


OK, I found out what a bare repository means, and I'm pretty sure that's what it refers to. Still can't find any documentation for the query string parameter, and I don't know how that makes sense for GitHub's repository view page.


I'm not sure about the rules, but shouldn't posts linking to one's own personal projects be prefixed with "Show HN:"? I've seen a lot of such posts lately where the poster was clearly the author as well.


No, Show HN is a different thing with its own (generally stricter) rules. You don't have to add it to things just because you happen to be the author.

https://news.ycombinator.com/showhn.html


I think you meant bare bones.


I'm guessing English is not your first language, so I just wanted to point out that "bare bottom" is generally synonymous with "uncovered buttocks", e.g. in the context of changing an infant's diaper.

Perhaps you meant to put "bare bones"? Google's definition of the latter is "reduced to or comprising only the basic or essential elements of something."

Don't want to detract from your point but I think your title is throwing some people off. I know I would be hesitant to click something at work that sounds like it could contain nudity.


I think the last example was a clASSifier, so it makes sense.


But it works sooo perfectly in the current vote count...

https://t.co/5cLU9F5MCh


Well, the title of the tutorial is "Naked Tensor"...


Thanks, we've removed that from the submission title.


"Bare bottom"? I'm not clicking on this.


Downvoted! The title might have changed by now, but the original one was completely indecipherable. As far as I could tell, it was genuinely some sort of image recognition algorithm for naked buttocks.


Simple example? Before finishing the first paragraph, it says

"The slope and y-intercept of the line are determined using gradient descent."

What on earth does that mean? Maybe they should teach mathematics in English at universities outside of English-speaking countries. German mathematics does not help here.

I wish there were a 4GL like SQL for machine learning, using dynamic programming for algorithm selection and model synthesis like a DBMS query planner. Something like:

PREDICT s as revenue LEARN FROM company.sales as s GROUP BY MONTH ORDER BY company.region


> "The slope and y-intercept of the line are determined using gradient descent."

Slope and intercept are very standard names for the parameters of a linear regression model. Gradient descent is the name of the algorithm used.
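
If it helps, here's what that sentence boils down to, as a tiny numpy sketch (made-up data points):

    import numpy as np

    # Gradient descent for y = m*x + b: repeatedly nudge the slope m and
    # intercept b in the direction that shrinks the mean squared error.
    xs = np.array([1.0, 2.0, 3.0, 4.0])
    ys = np.array([2.1, 3.9, 6.2, 7.8])

    m, b, lr = 0.0, 0.0, 0.01
    for _ in range(5000):
        error = m * xs + b - ys
        m -= lr * 2 * np.mean(error * xs)  # d(MSE)/dm
        b -= lr * 2 * np.mean(error)       # d(MSE)/db
    print(m, b)  # converges toward roughly 1.94 and 0.15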


This has been done before, though by whom eludes me. I saw it here on HN a few years ago.

And it is absolutely a good idea, as long as you include validation and QA abilities right alongside train/predict.


An article written in English assumes familiarity with English vernacular? Who would've thought!



