Syberia – Make R a production-ready language for deployable machine learning (syberia.io)
100 points by michaelsbradley on June 14, 2017 | 11 comments



We've tried using Docker to put R applications into production. I'm no Docker expert, but I found it pretty easy to do.

A couple of pointers:

- If you depend on a library like blast that takes a long time to compile, you can make a base Docker image that already has that library installed. That makes iteration quicker, because you only need to build the base image once.

- If you put a web interface on the image using Shiny, it is straightforward to deploy it for your users to interact with.
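
A rough sketch of how the two pointers above might fit together, as two separate Dockerfiles. Everything named here is a placeholder for illustration: the `rocker/r-ver` starting image and its version tag, the `my-r-base` tag, and the `app/` directory are assumptions, not a description of any particular setup.

```dockerfile
# Dockerfile.base: rebuilt rarely; holds the slow-to-install dependencies.
# Assumption: rocker/r-ver as the starting image, pinned to one R version.
FROM rocker/r-ver:4.3.2

# Stand-in for the expensive step (e.g. compiling a library like blast).
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install R packages once, so application builds do not repeat this work.
RUN R -e "install.packages('shiny', repos = 'https://cloud.r-project.org')"
```

```dockerfile
# Dockerfile: rebuilt on every code change; only adds the application.
# Assumption: the base image above was built and tagged my-r-base:latest.
FROM my-r-base:latest

# Hypothetical app directory containing app.R (or ui.R and server.R).
COPY app/ /srv/app/

EXPOSE 3838
CMD ["R", "-e", "shiny::runApp('/srv/app', host = '0.0.0.0', port = 3838)"]
```

With this split, something like `docker build -f Dockerfile.base -t my-r-base .` would only need to run when the heavy dependencies change, and the second image rebuilds in seconds.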


We usually dockerize outputs of our Syberia projects as well. We have several dozen internal packages consumed by the root projects. With many contributors working on constituent packages daily, we've found that frequent changes to packages can slow down a Docker-only workflow. So far, using a base Docker image, with lockbox catching us up to the most recent daily and hourly changes, has been working well.


This looks like it solves a big pain point in R. I hope that more tools like this crop up. R has a nice set of libraries, but it is still lacking in data engineering at this point.


What does this do that R on Azure Machine Learning doesn't? Not snark, genuine question.


One of the lead developers on the Azure suite wrote a blog post that might explain some of the differences: http://blog.revolutionanalytics.com/2017/06/syberia.html. A rough analogy is that Rails is to AWS/Heroku as Syberia is to Azure. You can replace the underlying components in your project with calls to Azure services, but a large developer team may prefer to work in a unified codebase over a set of UI tools.


How is this different from the caret package? Using `if(interactive()) {}` as the main function and including the extremely well-documented caret package seems to accomplish much of the same thing that Syberia does, unless I'm missing something.


Author here. Philosophically, Syberia can be thought of as an extension of caret with hopefully clearer abstractions in large projects. In particular, all of the packages supported by caret can and will eventually be parametrized into the modeling engine.

This is the 0.6 release in which we make the scaffold available. Over time, we will fill in the pieces that are currently provided by other tools like caret or Bernd Bischl's mlr.


You know, the real problem with R for production work is dealing with munged package dependencies. Writing a makefile with hand-curated package deps is something I really wish I never had to do again.


Looks like robertk wrote `lockbox` specifically to solve this.

Dependency management still sucks in R. Too many options, each a little different: Packrat, checkpoint, now lockbox. I have friends using Docker specifically to encapsulate their R package dependencies, too.


Disclosure: I work with robertk at Avant, where we've developed Syberia and lockbox specifically to have reproducible model builds and to turn R model objects into API servers in a deterministic way. Lockbox is closely modelled after Ruby's Bundler, and Syberia is like Rails for data science. If you've ever written a Ruby on Rails or Django app, you'll feel at home using lockbox. I encourage you to give it a try, and let us know in GitHub issues if you have any trouble using it.


I'd love to hear more about how it compares to other workflows. Is it worth the learning curve? There's a lot going on here, and a lot of overlap with more popular and widely used tools in the R world.

My small team is R-first and we're big fans of the tidyverse. We're exploring using Docker in tandem with tools like plumber (an R package for building simple APIs around R code) and pachyderm (language-independent, containerized data pipelining with straightforward cloud integration) for different projects. Does Syberia fit in nicely with these tools, or does it aim to replace them with its own set of conventions and philosophies?



