StarLisp is basically Common Lisp with parallel-map, parallel-reduce, and a special vector data type (the pvar) for them. You can do data-parallel operations over arrays trivially, but it's not clear what array chunk size is the right tradeoff for a given problem on today's small-core-count machines. It might make sense to bring the approach back once there are 100-core machines around; that would basically be a CM-5 on a chip.
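
If memory serves, a few lines of StarLisp look something like this (syntax from memory, so treat the details as approximate):

    ;; A pvar holds one value per processor; !!-suffixed operators
    ;; work elementwise across all processors at once.
    (*defvar xs (self-address!!))  ; each processor holds its own index
    (*set xs (*!! xs xs))          ; parallel map: square elementwise
    (*sum xs)                      ; parallel reduce: sum over all processors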

In the meantime there's a StarLisp simulator you can play around with: http://sourceforge.net/projects/starsim/

When it comes to data parallelism, StarLisp's successor Paralations (http://www.mactech.com/articles/mactech/Vol.08/08.07/Paralat...) is much more interesting because it deals explicitly with locality. Paralations' successor, NESL (http://www.cs.cmu.edu/~scandal/nesl.html), is implemented on top of Common Lisp, but it's a functional language with ugly syntax.

It would make a lot of sense to have something like StarLisp or APL for CUDA right now. Trying to do data parallelism in C is about the most brain-damaged idea ever. I don't know if anyone is working on that, or even interested in it, though.

Overall, I think the MultiLisp (futures) approach is a bigger win for general-purpose parallel programming than StarLisp data parallelism. Doing everything with data-parallel algorithms is hard: you're stuck with APL-style array programming, and on top of that you need to come up with clever ways to parallelize every step. But the algorithms themselves are beautiful (read Hillis and Steele's Data Parallel Algorithms, my favorite algorithms paper: http://docs.google.com/viewer?a=v&q=cache:efZTS1nDfBsJ:c...)
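
For contrast, the MultiLisp idiom is just ordinary Lisp with FUTURE sprinkled in. A sketch from memory (in real MultiLisp strict operators like + touch futures implicitly, so the explicit TOUCH below is just for clarity):

    ;; FUTURE starts evaluating its argument concurrently and
    ;; returns a placeholder; TOUCH claims the value.
    (defun pfib (n)
      (if (< n 2)
          n
          (let ((a (future (pfib (- n 1))))  ; runs in another task
                (b (pfib (- n 2))))          ; runs in this task
            (+ (touch a) b))))

And to see why the Hillis/Steele algorithms are so pretty, here is their log-step prefix sum, simulated serially in Common Lisp. On the CM every iteration of the inner loop happens simultaneously, so the whole scan takes O(log n) steps:

    ;; Iterating i downward reproduces the parallel
    ;; read-before-write semantics in a serial simulation.
    (defun scan+ (v)
      (do ((d 1 (* 2 d)))
          ((>= d (length v)) v)
        (loop for i from (1- (length v)) downto d
              do (incf (aref v i) (aref v (- i d))))))
    ;; (scan+ (vector 1 1 1 1 1)) => #(1 2 3 4 5)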




> It would make a lot of sense to have something like StarLisp or APL for CUDA right now. Trying to do data parallelism in C is about the most brain-damaged idea ever. I don't know if anyone is working on that, or even interested in it, though.

You may well be right, but I challenge you to prove it. I myself am very interested in whether this would work. I have spent many sleepless nights programming GPU algorithms and have wondered whether ideas from other programming languages and paradigms (especially functional programming) could be applied to make it easier and more elegant.

The nested data-parallelism approach does look promising on paper, and many people are well aware of the theoretical possibility of its working on GPUs (including people at Nvidia itself), but so far nobody has succeeded in making it practical.
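
To be concrete about what "nested" means, here is the shape of the problem sketched serially in Lisp (my own example, not NESL syntax): the outer parallel map ranges over rows of different lengths, and each element of it is itself a parallel reduction.

    ;; An irregular nested computation: map over ragged rows,
    ;; reducing each one. NESL's flattening transform turns this
    ;; into operations on flat vectors plus segment descriptors.
    (map 'list (lambda (row) (reduce #'+ row))
         '((1 2 3) (4) (5 6)))  ; => (6 4 11)

That irregularity is exactly what is hard to map onto a GPU's fixed-width hardware directly.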

So, doing data parallelism in CUDA may be brain-damaged, but the flexibility and performance it delivers are mind-blowing. If you can achieve something comparable using a higher-level functional approach, I will be among your first users, and I'll tell everyone I know.

MultiLisp (AFAIK) is traditional task-parallelism rather than data-parallelism, and would not work well on a GPU.


I'm not talking about data parallelism being bad, but about C being a bad data-parallel language.


Of course... read my reply again please.

I am not disagreeing with you; I am saying that I would love to see a better data-parallel language than C (or CUDA, to be precise), but one does not exist right now (at least not a practical one that would run on a real GPU with reasonable performance and allow non-trivial nested and hierarchical algorithms).


If I remember correctly, NESL programs are compiled into VCODE, a special bytecode that handles vector instructions particularly well. If VCODE were ported to (and specialized for) commodity multicore machines and the SSE instruction set, one could profitably run NESL programs on contemporary machines. I do not know if there is any such ongoing initiative.
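
If I understand the flattening right, the flat primitive a VCODE-like backend provides looks roughly like this (my own illustrative Lisp helper, not actual VCODE syntax):

    ;; A nested sequence is stored as one flat value vector plus a
    ;; vector of segment lengths, so "sum each subsequence" becomes
    ;; a single segmented reduction over flat data - the kind of
    ;; loop that vectorizes well under SSE.
    (defun segmented-sum (values lengths)
      (let ((i 0))
        (map 'vector
             (lambda (len)
               (let ((s 0))
                 (dotimes (k len s)
                   (incf s (aref values i))
                   (incf i))))
             lengths)))
    ;; (segmented-sum #(1 2 3 4 5 6) #(3 1 2)) => #(6 4 11)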

It might be the case that the C source for VCODE is still available, so with a good optimizing compiler one could do the same. It won't quite match a hand-optimized assembly implementation of VCODE, but it could still be a satisfactory one.



