
In practice the functions just need to be piecewise differentiable. ReLU is the canonical example in deep learning; at the kinks a subderivative is used.
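For a concrete picture, here's a minimal NumPy sketch of ReLU and one valid choice of subderivative at the kink (returning 0 at x = 0 matches the usual framework convention; the function names are just illustrative):

    import numpy as np

    def relu(x):
        # Piecewise linear: differentiable everywhere except at x == 0.
        return np.maximum(x, 0.0)

    def relu_subgrad(x):
        # At the kink (x == 0) any value in [0, 1] is a valid subderivative;
        # this picks 0, the common convention.
        return (x > 0).astype(x.dtype)

    x = np.array([-2.0, 0.0, 3.0])
    print(relu(x))          # [0. 0. 3.]
    print(relu_subgrad(x))  # [0. 0. 1.]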


It’s a little trickier than that, but you are generally right. The relaxation can involve many different things though, not just piecewise differentiability.

For example, when processing images that are themselves intensity functions over a spatial domain, and those intensity functions can have cusps, edges, occlusions, etc., you need different weak-sense differentiability conditions, such as those from Sobolev spaces, to guarantee that numerical gradient operators will succeed / converge where needed.

https://en.m.wikipedia.org/wiki/Sobolev_space
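For reference, the textbook definition (not from the comment above, just the standard one): W^{k,p}(Ω) is the space of L^p functions whose weak derivatives up to order k are also in L^p,

    W^{k,p}(\Omega) = \{\, u \in L^{p}(\Omega) : D^{\alpha}u \in L^{p}(\Omega)
        \text{ for every multi-index } |\alpha| \le k \,\}

where D^{\alpha}u denotes the weak (distributional) derivative; it is this weak notion of derivative that lets gradient operators make sense on functions with cusps and edges.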


Actually, I'm pretty sure the Rome numbers are for double precision, whereas most numbers quoted for GPUs are for single precision or less, making Rome's 3.4 TFLOPS even more impressive.


Half of Titan V/V100 FP64 sounds unbelievable! Can't wait to get my hands on 64c Threadripper with TRX80!


Yeah, I can't wait for the announcements in the coming weeks. The best rumours I've seen so far have the new TR line topping out at 48c, but none of them have looked particularly authoritative; there just haven't been any leaks pointing at 64c yet. I suspect they'll announce up to 48c and then, a few months down the line, announce the 64c CPU. That would line up well with what appears to have caused the delays: not being able to get the quantity of chiplets they need. They'd be able to frontload all the lower-core-count demand and then start making the larger parts once they don't need nearly as many of the smaller ones.


I really hope 32c would work with my Zenith Extreme X399, and 64c with 8-channel TRX80/WRX80. Then I could upgrade the old TR to a 32c Zen 2 one and buy another, 64c one with 4TB of ECC LRDIMM for some machine learning tasks. I'd also be fine if AMD decided to do a 64c TR with Zen 3 only (4x SMT?). But based on the Blur ad, I guess they are going to release a 64c TR based on Zen 2 as well, just to completely obliterate Intel in HEDT, even if it costs $5000.


Yeah, X399 compatibility will decide when my upgrade happens. The Zen+ TRs weren't enough of a jump for me to justify it, but the Zen 2 ones finally seem to be. If I need to do a motherboard upgrade and everything that comes with it, that'll delay things for a while (I need to see how PCIe passthrough and other stuff settles out with the new chipsets), but in either case I'm going to end up upgrading to this next gen one way or another.


How much does Threadripper usually cost relative to EPYC for the same number of cores?


Usually 20-30% less. You get faster cores but no LRDIMM support (i.e. you are effectively constrained to 128GB of ECC UDIMM, at best 256GB if you are lucky enough to get 32GB ECC UDIMM modules). EPYC has a 4TB ECC LRDIMM ceiling, and the new TR on TRX80 might have the same ceiling. I am glad AMD provides TR at all, since they make far less money on it than on EPYC, but it's a great marketing tool for them. I am running some TRs as deep learning rigs (PCIe slots are what matter most) on Linux, and they are great; Titan RTXs and Teslas run without any issue. But Zen 2 should give me much better performance on classical ML with Intel MKL/BLAS in PySpark/scikit-learn, so I can't wait to get some.


Naive question: Are you able to use MKL on an AMD chip without jumping through too many hoops?


Yes, just pip install ..., but it's 2x slower than on Intel for Zen/Zen+. Only Zen 2 is close to Intel.


Intel makes rather pessimistic assumptions about AMD: MKL uses the CPU vendor string rather than the feature flags (AVX2, FMA, etc.) to pick which code path to run, so on AMD it falls back to slower, more conservative paths.

So if you want to compare performance fairly I'd use gcc (or at least a non-Intel compiler) and one of the MKL-like libraries (ACML, GotoBLAS, OpenBLAS, etc.). AMD has been directly contributing to various projects to optimize for AMD CPUs. They used to have their own compiler (a lineage that went from SGI -> Cray -> PathScale or similar), but since then I believe they have been contributing to GCC, LLVM, and various libraries.
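If you do that kind of comparison, it's worth first checking which BLAS your NumPy build is actually linked against; here's a minimal sketch (the matrix size and the GFLOP/s estimate are just illustrative):

    import time
    import numpy as np

    # Prints the BLAS/LAPACK libraries NumPy was built against (MKL, OpenBLAS, ...).
    np.show_config()

    # Rough matmul timing, large enough that the BLAS call dominates.
    n = 4096
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0
    print(f"{n}x{n} matmul took {dt:.2f}s, ~{2 * n**3 / dt / 1e9:.0f} GFLOP/s")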


Yeah. Still, Zen 2 is much faster in OpenBLAS, and faster in MKL than Zen/Zen+ as well.


It's lumpy and depends on exactly when you ask.

If shopping I'd compare the highest-end Ryzen + motherboard against the lowest-end single-socket Epyc chip + motherboard, and try to estimate the price/performance for your workload.

Generally the Threadrippers seem like a much lower volume product and the motherboards are often quite expensive (for the current generation). Both Ryzen and Epyc enjoy significantly higher volumes.

Keep in mind that Threadripper has twice the memory bandwidth of Ryzen, but half the memory bandwidth of Epyc.


Why not just get the 7702p?


I guess TR will be a bit cheaper and higher clocked? And I don't really care that much about ECC errors for ML.



Particularly since it would not be unreasonable to assume that the "mi" in mimalloc is pronounced like the "mi" in Microsoft, even though that's not the correct pronunciation.



I've got the 9550 and run Ubuntu on it. No issues whatsoever and I do very compute-intensive work on it.

One thing that I've found is very important is to clean out the fans often, otherwise dust builds up and prevents cooling. Just unscrew the plate on the underside of the laptop and blow / brush / pick out the dust that's built up (both the fan intake and the fan outlet). Doing this every few months has been a game changer for my 9550.


I think this resource could be helpful:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Essentially, RNNs and feed-forward networks are very similar - an RNN is just a feed-forward network "unrolled through time", with every timestep sharing the same weights. The activations are slightly different as well, but the core concept is the same as feed-forward networks; it's not a completely different concept or idea.
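To make the "unrolled through time with shared weights" point concrete, here's a minimal NumPy sketch of a vanilla RNN forward pass (the sizes, tanh activation, and initialization are illustrative assumptions, not anything specific from the linked post):

    import numpy as np

    # Illustrative sizes
    input_dim, hidden_dim, seq_len = 8, 16, 5

    rng = np.random.default_rng(0)
    W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input -> hidden
    W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden
    b_h = np.zeros(hidden_dim)

    xs = rng.standard_normal((seq_len, input_dim))  # one input vector per timestep
    h = np.zeros(hidden_dim)

    # "Unrolled through time": the same W_xh, W_hh, b_h are reused at every step.
    # That weight sharing is what distinguishes an RNN from a plain feed-forward
    # stack of 5 independent layers.
    for t in range(seq_len):
        h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)

    print(h.shape)  # (16,)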


I find it hard to believe that SGD would be faster than the closed-form solutions for linear regression (gels, gelsd, etc.). The closed-form solutions give a lot of other benefits in practical settings as well, which makes them more likely to be used when possible. SGD and related optimizers pay off with non-convex or non-analytical loss functions, or with non-linear layers / more than one layer.


Then why would anyone use TensorFlow with this loss function in practice? In my school's ML class, we used this technique too (in addition to the closed-form solution). Is there any practical reason to use an optimizer to solve a linear problem?


Note that it's not just the loss function. It's the loss function combined with a very specific problem formulation - namely a neural network with only linear activations (equivalent to a network with no hidden layers, i.e. ordinary linear regression). Once you go to non-linear layers or a different loss it's no longer solvable analytically.
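A quick way to see why "only linear activations" collapses back to plain linear regression (a minimal sketch; W1/W2 are illustrative names and biases are omitted, but the same collapse happens with them since affine maps compose into an affine map):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(4)
    W1 = rng.standard_normal((6, 4))   # "layer 1"
    W2 = rng.standard_normal((3, 6))   # "layer 2"

    deep = W2 @ (W1 @ x)       # two linear layers, no non-linearity between them
    shallow = (W2 @ W1) @ x    # exactly one merged linear layer

    print(np.allclose(deep, shallow))  # True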

I do see a lot of people writing tutorials like OP's. See for example:

https://towardsdatascience.com/linear-regression-using-gradi...

The existence of these articles should not be taken as an indication of best practice. They often have the goal of teaching SGD in a simplified setting, not teaching best practice for LLS. I suppose the only nice thing about using TF / SGD for such a simple problem is that you then have a starting point for solving more complex problems (ReLU activations, cross-entropy loss, more layers, etc.).

A few other points as to why you would never use SGD for LLS:

1) it's always way slower than the closed-form matrix solutions (see the rough sketch after this list)

2) if you're doing SGD instead of plain GD, there's noise in which "rows" end up in a given batch - as a result, repeated runs may not converge to exactly the same final weights. This never happens with the analytical solution, which always gives exactly the same result.

3) if you're doing this as part of a data science pipeline, which is likely the case in the real world, you'll probably want to do some cross-validation. In the SGD case you have to recompute the entire solution for each fold, whereas in the LLS case you can compute the CV folds almost immediately once you've calculated the initial X^T X / X^T y products. This makes using LLS even faster than SGD.
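As a rough illustration of point 1 (a minimal sketch, not from the comment above; np.linalg.lstsq dispatches to LAPACK's gelsd, while the SGD loop is deliberately bare-bones with made-up sizes and hyperparameters):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10_000, 20
    X = rng.standard_normal((n, d))
    true_w = rng.standard_normal(d)
    y = X @ true_w + 0.1 * rng.standard_normal(n)

    # Closed form: one LAPACK call (gelsd), deterministic result.
    w_lls, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Plain SGD on the same squared loss: many passes, a learning rate to tune,
    # and a slightly different answer on every run due to shuffling.
    w_sgd = np.zeros(d)
    lr, batch = 0.01, 32
    for epoch in range(20):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            rows = idx[start:start + batch]
            grad = X[rows].T @ (X[rows] @ w_sgd - y[rows]) / len(rows)
            w_sgd -= lr * grad

    print(np.linalg.norm(w_lls - true_w))  # small, noise-limited
    print(np.linalg.norm(w_sgd - true_w))  # similar, but depends on lr/epochs/shuffling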


The Xeons support much more RAM.


Are physics engines not yet accurate enough to enable "virtual" pre-training / full training of the networks, lighting conditions, etc? If they are, exclusively using physical robots seems somewhat inefficient.


Closest thing I can think of is Hod Lipson's self-modelling robots: http://www.creativemachineslab.com/self-modeling.html

Their system evolves a virtual body which is evaluated by comparing its predicted behaviour (e.g. if motor A is rotated by X degrees, sensor B should get response Y) to real physical movements (moving motor A and reading sensor B). Once an accurate virtual body has been made, it's used to evaluate a bunch of (again, evolved) movement styles in simulation. Once an efficient style has been found, it's used to control the physical motors on the robot.

Also related, their lab has a "universal gripper" made out of a balloon filled with coffee granules: http://creativemachines.cornell.edu/positive_pressure_grippe...


Hmmm... does anyone know if Grand Theft Auto has an API? I would like to pre-train my autonomous vehicle controller before connecting it to an actual car.


Ideally, yes, we want to pre-train in a virtual environment using as close to the real model robot as possible. I worked on such a problem as part of my PhD research on mobile robots using the Webots simulator (https://www.cyberbotics.com/overview) as my virtual environment.

In my case, I was working on biologically-inspired models for picking up distant objects. It's impractical to tune hyperparameters in hardware, so you need to be able to create a virtual version that gets you close enough. Once you can demonstrate success there, you then have to move to the physical robot, which introduces several additional challenges: 1) imperfections in your actual hardware behavior vs the idealized simulated one, 2) real-world sensor noise and constraints, 3) dealing with real-world timing and inputs instead of a clean, lock-step simulated environment, 4) having a different API for polling sensors / actuating servos on the virtual vs hardware robot, and 5) ensuring that your trained model can be transferred effectively between your virtual and hardware robot control systems.

I was able to solve these issues for my particular constrained research use case, and was pretty happy with the results. You can see a demo reel of the robot here: https://www.youtube.com/watch?v=EoIXFKVGaXw


How hot did that first dynamixel get?


Only overheated once, though I rarely had them operating continuously for more than a few minutes at a time.


That's a very interesting question. My guess is that the physics of grabbing things, especially non-rigid things, is very messy and difficult to simulate. It would be great if someone here were able to give a detailed answer to this question though.


Ok here goes.

1. The best / most recent attempt at this was for the DARPA robotics challenge and the Gazebo simulator.

This was still very buggy and prone to hilarious / depressing physics.

2. Almost all game physics engines start from rigid body and slap on particles, deformables, etc.

An exciting counterexample to this is NVIDIA FleX, which starts with unified particle simulation (much closer to the molecular dynamics simulations used for, you know, real work).

3. From the perspective of AI, accurate simulation might not be required.

Intelligence requires complexity and a certain degree of predictability. So as long as you can build a rich and consistent / learnable world, whatever simulation you have could be super useful.

From the perspective of transferring that knowledge into a robot, though, you need accurate physics.

4. Natural touch sensors are hard to do in rigid body simulators but are super important to naturalistic learning.

There's a ton of information that your sense of touch and body position provides about how the world works, and simulating the tens of thousands of soft-contact touch points you need for this kind of sensing is pretty challenging today.

Lots of physics engines do all sorts of things to minimize contact points, or to ignore them if there's no motion. You have to work against those optimizations a lot if you want mechanoreceptors and proprioception.


I agree with all the points you made, but I would add another: with external cameras for positioning and movement feedback, you don't need accurate geartrains or encoders, nor a rigid robot. Since the localization is all in software (and software is scalable / essentially free from Google's standpoint), there's potential for a lot of weight and cost savings on the hardware side. Kind of like my robot: https://github.com/jonnycowboy/YARRM


I was hoping to use a robotic arm for a project I'm working on, and I'm wondering if you guys could answer a question about motors. In my very limited research it looked like one of the factors that makes industrial robots (Kuka, etc.) so expensive is that they use backlash-free motors. What does that even mean?

I also saw a couple startups aimed at sub-$5k robots (like carbon.ai). Are they solving this problem in some novel way?


Backlash free motors are motors where the output shaft begins moving as soon as the motor starts moving. In particular, when the motor reverses direction there is no "slack" to pick up before the output shaft starts to move. The slack is called backlash when talking about gears and motors and what have you.

It's important for robots not to have backlash because, as movements are repeated, each bit of backlash adds up into a potentially big cumulative error. The robot could end up operating outside of its intended design envelope, which might be a safety problem.

I don't know what the startups are doing.


Wouldn't that only be a problem with open-loop control though? If you have an encoder and use feedback, would backlash still be an issue?


We're getting outside of my home-hobbyist experience here, but I think with feedback you could guarantee the robot ends up in a particular position; the backlash might just make it hard to say when it will get there. Using encoders on all the motors also requires having inputs for each encoder, which can get complex.

My guess is you can go pretty far with janky parts if you don't run for long periods of time and also measure where they are.


I think you mean backlash-free gearboxes (i.e. cable-driven gearboxes, harmonic drives, or spring-loaded gearboxes that always apply a minimal but constant tension).


It is difficult, but doable assuming Coulomb friction.

The two main issues are that it is computationally expensive, and that your mechanical modeling has to closely match that of the actual robot (especially the contact model); otherwise I suspect the training data will be useless in the end.

So if you can afford an actual robot, it makes sense to do the training using it.


How computationally expensive? Are we talking supercomputer time to simulate the few seconds it takes to grab an object? Advanced robots are expensive too, and usually much harder to get access to than computational resources (i.e. AWS).


It depends: rigid/non-rigid objects, stiffness for non-rigid objects, approximate/exact Coulomb model, spatial/temporal resolution, and solution precision.

On a typical desktop computer, that would probably range from real time for a fast/imprecise simulation to maybe a day for a full-blown one.

But again, most roboticists will tell you there is a world between the simulation (even an accurate one) and the actual robot.


Gazebo with Mike Sherman's physics engine might be good enough. DARPA paid to get a decent physics engine into Gazebo; the ones from games were never quite right.


Which physics engine is that? ODE?


Stanford (Simbody) is the new one.

The game engines don't handle complex friction (as between gripper and target, or foot and ground) very well.


There are things you can't simulate (yet). In my experience it's beneficial to run real live testing to gather data about the individual parts themselves. For example, I had a robot's navigation fail when it encountered a certain type of water container (the one-gallon type, in a particular color, found in US supermarkets). Like kissing, you can't replace the real thing.


