The typical solution is to use an interpreter (turning data into code). How often is it actually necessary to run arbitrary machine code on demand?
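
A minimal sketch of that interpreter approach in Python (the nested-tuple query format, the OPS table, and the field names are all made up for illustration): the client ships the query as plain data, and the server walks it recursively, so no client-supplied code ever executes.

    # Tiny interpreter: the "query" arrives as nested tuples (pure data),
    # and we evaluate it recursively -- no client-supplied code runs.
    OPS = {
        "and": lambda a, b: a and b,
        "gt":  lambda a, b: a > b,
        "eq":  lambda a, b: a == b,
    }

    def evaluate(expr, row):
        if isinstance(expr, tuple):
            if expr[0] == "field":      # ("field", name): column lookup
                return row[expr[1]]
            op, lhs, rhs = expr         # ("gt", lhs, rhs): operator node
            return OPS[op](evaluate(lhs, row), evaluate(rhs, row))
        return expr                     # anything else is a literal

    rows = [{"age": 41, "country": "DE"}, {"age": 17, "country": "DE"}]
    query = ("and", ("gt", ("field", "age"), 18),
                    ("eq", ("field", "country"), "DE"))
    print([r for r in rows if evaluate(query, r)])  # first row only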



I would expect about an order of magnitude speed difference between an interpreted language and optimized machine code. I can recall a case where reducing an analytical query from 10 hours to 10 minutes changed the quality of the research a company was doing, since analysts were able to run more queries and select a better dataset for the report. An order-of-magnitude difference in response time can also be the go/no-go factor for interactive analytics. In the case of clouds, where we can assume infinite resources, it can mean 1/10 of the cost. For private clusters, it can be the difference between buying 10 machines (something common in Hadoop's world) and buying 100 machines, something very few groups can afford.


In most cases in distributed computing you need to either move the data or move the code. It can also be something in between, as with a "request" or a "query": if they are simple, they are clearly data; if they are complex, they look more like code.

OK, now if we agree that something has to be moved, let's think about which makes more sense: moving the dataset or moving the code?

And my take is "it depends". Dataset size plays a role, security plays a role, and so on. Consider what happens if the dataset cannot be moved for reasons other than size: in all these cases you need to move the code.
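
To make the size argument concrete, a quick back-of-envelope in Python (all numbers are illustrative assumptions):

    # Moving 1 TB of data vs ~10 KB of code over a 1 Gbit/s link.
    link_Bps = 1e9 / 8        # 1 Gbit/s in bytes per second
    dataset_bytes = 1e12      # 1 TB dataset
    code_bytes = 10e3         # ~10 KB of code
    print(f"move data: {dataset_bytes / link_Bps / 3600:.1f} h")  # ~2.2 h
    print(f"move code: {code_bytes / link_Bps * 1e3:.2f} ms")     # ~0.08 ms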

In case the code is untrusted (suspected of being malicious, or just buggy) you will want some kind of sandbox.
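
A minimal sandboxing sketch in Python (POSIX only; the resource caps and the UNTRUSTED snippet are illustrative): run the received code in a child process with CPU and memory limits, so a buggy or hostile payload cannot hang or exhaust the host. A real sandbox would add seccomp, namespaces, or a container/VM on top of this.

    import resource
    import subprocess
    import sys

    UNTRUSTED = "print(sum(range(10**6)))"   # stands in for received code

    def set_limits():
        # Applied in the child just before exec: cap CPU time and memory.
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))             # 2 s CPU
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)  # 256 MB

    result = subprocess.run(
        [sys.executable, "-I", "-c", UNTRUSTED],  # -I: isolated mode
        preexec_fn=set_limits, capture_output=True, text=True, timeout=5,
    )
    print(result.stdout, end="")               # 499999500000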



