You can plug any compute kernel you want into Spark; that's not a pro or con of R.

Column stores are standard in any analytics pipeline today; they underpin Python's pandas, R's dplyr, and Java's DataFrame libraries. How or why does R stand out for 'massive amounts of data'?

R does not have meaningful out-of-core compute offerings that compare with something like Dask.
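
For contrast, a minimal sketch of what out-of-core work looks like with Dask (the file pattern and column names are invented for illustration; dd.read_csv, groupby and compute are the actual dask.dataframe API):

    import dask.dataframe as dd

    # Lazily read a set of CSVs that is larger than RAM; Dask partitions
    # the data and streams the partitions through memory.
    df = dd.read_csv("events-*.csv")            # hypothetical files

    # pandas-style operations only build a task graph...
    means = df.groupby("user_id")["amount"].mean()

    # ...compute() then executes it chunk by chunk, out of core.
    print(means.compute())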

Nor does R have cluster compute offerings that compare to Dask Distributed.
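
And the cluster side, again as a hedged sketch (the scheduler address and dataset path are hypothetical; Client, dd.read_parquet and compute are real Dask Distributed / dask.dataframe calls):

    from dask.distributed import Client
    import dask.dataframe as dd

    # Connect to an existing cluster; Client() with no argument would
    # start a local cluster instead.
    client = Client("tcp://scheduler:8786")      # hypothetical address

    # The same pandas-style code now fans out across the workers.
    df = dd.read_parquet("events.parquet")       # hypothetical dataset
    print(df.groupby("region")["latency"].mean().compute())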

If you want to know what real performance looks like, check out Python's cuDF, which will shortly match the pandas API in full. The raytracing example you linked would run at interactive rates with cuDF. I really don't see any basis for performance arguments in R's favour, and 'massive data' arguments are laughable here.
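
To make the pandas-API point concrete, a rough sketch (the file and column names are invented; cudf.read_csv and the groupby/mean/sort_values/head chain all exist in cuDF and mirror pandas):

    import cudf   # RAPIDS GPU DataFrame library

    # Same call shape as pandas.read_csv, but the frame lives in GPU memory.
    gdf = cudf.read_csv("trades.csv")            # hypothetical file

    # Familiar pandas-style chain, executed on the GPU.
    top = (gdf.groupby("symbol")["price"]
              .mean()
              .sort_values(ascending=False)
              .head(10))
    print(top)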

Whatever advantages R has, performance and scalability are definitely not among them.




You are arguing for Python and speed in the same breath? If you want portable speed, you'd better "warm up a chair" and master Fortran.

Bonus: modern Fortran is a joy to develop in, far more fun than Python. And you get to compile to machine code, either for a processor or a GPU.


> The raytracing example you linked would run at interactive rates with cuDF. I really don't see any basis for performance arguments in R's favour, and 'massive data' arguments are laughable here.

I don't see how the "GPU DataFrames" provided in cuDF would enhance a raytracer in any way.


You don’t see how a GPU-accelerated numeric array would speed up ray tracing?


The bottlenecks in raytracing are primarily scene traversal and intersection testing, neither of which benefits from a GPU-accelerated array structure.
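
To illustrate the mismatch, here is a rough sketch of the hot path; the classes and helpers are invented for this comment, not part of cuDF or any raytracing library:

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional

    @dataclass
    class Node:
        # Bounding-box test for this node: ray -> bool (invented signature).
        hits_bounds: Callable[[object], bool]
        # Child nodes; an empty list marks a leaf.
        children: List["Node"] = field(default_factory=list)
        # Leaf-only primitive test: ray -> nearest hit distance or None.
        leaf_intersect: Optional[Callable[[object], Optional[float]]] = None

    def closest_hit(ray, node: Node) -> Optional[float]:
        # Per-ray, data-dependent branching and pointer chasing: nothing
        # here reduces to a bulk groupby/reduce over columns.
        if not node.hits_bounds(ray):
            return None
        if not node.children:
            return node.leaf_intersect(ray)
        hits = [h for h in (closest_hit(ray, c) for c in node.children)
                if h is not None]
        return min(hits) if hits else None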



No, I'm not. I'm well aware that CUDA is used to accelerate raytracing. That cannot be accomplished simply by providing a GPU-accelerated data frame structure, which is what cuDF provides.



