Julia vs. Python: Monte Carlo Simulations of Bitcoin Options

jwmerrill · on March 22, 2014

Here's a gist that optimizes the author's Julia code to give a 4x speedup over the unvectorized code, and handily beat the vectorized codes too:

https://gist.github.com/jwmerrill/9715447

In general, carefully written devectorized Julia will be faster than equivalent vectorized code because it is possible to avoid allocating containers at intermediate stages of the computation. This is surprising for people coming from e.g. Matlab, R, or Numpy, because vectorized code is often faster in those environments. In those languages, direct scalar code is slow, and vectorization is about expressing the computation in a form that will call out to fast C operations, but the C still pays a price to allocate intermediates. In Julia, the JIT is able to emit fast machine code for scalar operations, so you don't need to express your code in a form that calls out to C. There are JIT compilers for Matlab and Python too, but Julia's JIT is very high quality and everyone uses it all the time.

The main changes I made here were hoisting computations of constants out of the loop, and preallocating a 2D Float64 array to hold the results. These were worth roughly 2x each. I've chosen the storage format so that it is accessed in column order. Both optimizations have their own sections in the Julia Manual's performance tips section:

http://julia.readthedocs.org/en/latest/manual/performance-ti...

shoo · on March 23, 2014

your point at the end is good - if we're only interested in the final price, then we should be able to do this in constant memory, independent of the number of timesteps and samples.

edit: on my machine the running time seems dominated by the generation of random numbers, rather than anything else.

perhaps another optimisation is to elimate the `exp` from the inner loop, and just do addition. if you're only interested in the final price, you can do one `exp` at the end.

i.e. change

    paths[1, i] = st = s0
    for j in 2:M
        st *= exp(a + b*randn())
        paths[j, i] = st
    end

to this

    paths[1, i] = log_st = log(s0)
    for j in 2:M
        log_st += (a + b*randn())
        paths[j, i] = log_st
    end

Retric · on March 23, 2014

One solution to this is use a circular data structure filled with random numbers. Then dedicate a thread to updating that ring with new random numbers. Meanwhile you pull random numbers as fast as you want. It's not good for encryption, but for simulation it's fine. http://en.wikipedia.org/wiki/Circular_buffer

Another approach is to keep reseeding a fast RNG with a slow but higher quality RNG.

StefanKarpinski · on March 23, 2014

You don't have to do anything quite that complicated. You can just pre-generate an array of Gaussian random samples and do table lookup in the inner loop:

    paths = zeros(Float64, M, I)
    rands = zeros(Float64, M-1, 1)

    for i in 1:I
        randn!(rands)
        paths[1, i] = st = s0
        for j in 2:M
            st *= exp(a + b*rands[j-1])
            paths[j, i] = st
        end
    end

That shaves off another 35% percent of runtime on my machine.

jwmerrill · on March 23, 2014

Nice! In case it isn't obvious, randn! takes an existing vector and fills it up with new random normal variates. This allows computing a bunch of random variates at once, which is a little more efficient, without having to allocate a new vector every time through the loop.

Another option would be to generate all M-1xI random numbers ahead of time, but that would be less memory efficient.

StefanKarpinski · on March 23, 2014

I tried that too, but it wasn't any faster – you've already gotten most of the gains to be had by lifting the random number generation out of the innermost loop. So I went with the option that uses less memory.

shoo · on March 23, 2014

i tried the approach of generating them ahead of time and then loading them from disk. it only uses 80mb of memory for this example.

the difference i see between generating numbers on-the-fly using boost and reading them from disk is 6.41s vs 0.32s, ie a 20x speedup, for 10 repetitions of 100,000 paths over 100 timesteps.

edit: here's the code - https://gist.github.com/fcostin/3ed3512ad167adf9c785

jwmerrill · on March 23, 2014

I would not recommend that circular buffer idea for quantitative calculations.

Consider averaging n numbers from a buffer that you've filled with m random variates with m < n: the standard deviation of your average will decrease like 1/sqrt(m), not 1/sqrt(n), so you might as well have just averaged the independent random numbers you produced instead of reusing some of them.

Maybe the circular buffer idea would be useful in non-quantitative contexts, though, e.g. for simulating something in a game or a movie, where all you need is the appearance of randomness.

StefanKarpinski · on March 23, 2014

Agreed, that circular buffer thing is bound to do one of two things based on whether the RNG thread or the work thread is faster:

a) work slower than RNG – do too much work, wasting CPU;

b) work faster than RNG – produce very poor randomness.

Neither of these situations is good, and dealing with two threads in the first place is a bit of a complication. If you're going to do that, it's better to synchronize the two threads so that the work thread blocks when it needs more randomness than the RNG thread has produced and vice versa.

pavanky · on March 22, 2014

We have a matrix library in C++ that works on top of CUDA and OpenCL.

I recently added a monte carlo based option pricing example. which can be seen here: https://github.com/arrayfire/arrayfire_examples/blob/master/...

I am bringing this up because all the different paths can be calculated in parallel. Usually without a matrix library in C++, you will end up having to write low level CUDA / OpenCL / SSE code to get the performance.

Here is what an intel i7 CPU does using our library + OpenCL:

----

[1]: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 7875 MB, OpenCL Version: 1.2 Compute Device: [1]

Time for 100000 paths - vanilla method: 65.815 ms, barrier method: 68.271 ms

----

BTW, we are working on porting the library to other languages as well. Right now, it is supported for R, Java and Fortran.

noname123 · on March 22, 2014

Thanks pavansky for sharing. Can you tell me what is the performance and model accuracy trade off between Monte-Carlo option pricing vs. BSM vs. Binomial vs. Heston.

Currently I use BSM; however, live performance is poor in extracting implied volatility from NBBO of option spreads as I use a naive approach to iterate and converge on the IV. I assume that's what people use CUDA and GPU for to calculate the greeks and pricing of the whole US option chain series in realtime.

I'm curious if using CUDA or GPU rented from aws would make performance faster, in a) extracting IV from market ticks, b) figuring out break-even and hedging/adjustment points in different complex option spreads, c) calculating expected payoff in term's of Kelly's for a spread or potential adjustment.

Gl with your endeavor.

S4M · on March 23, 2014

> Can you tell me what is the performance and model accuracy trade off between Monte-Carlo option pricing vs. BSM vs. Binomial vs. Heston.

I don't work in finance anymore, but Heston is not a pricing method per say. In the linked article, the OP assumes that the price of a bitcoin follows a lognormal distribution (i.e: dS_t = mudt + sigmadW_t), while Heston assumes a different diffusion process (dS_t = mudt + digma_tdW_t, and d(sigma_t) = kappa(thau - sigma_t)dt+ zetasqrt(sigma_t) dW'_t) [0].

I would say that Monte Carlo is in general slower and less accurate than the other methods you mentionned, but more flexible in the sense that it allows more complicated diffusion models or payoffs (e.g: option on the max or min of a basket of underlyings).

[0]: https://en.wikipedia.org/wiki/Heston_model

pavanky · on March 22, 2014

Heh, I am a HPC programmer, I am not from the financial sector :)

If you can give me links or pseudo code for these algorithms, I can look it up (may be even make a simple example and put it on our github page!).

ajtulloch · on March 22, 2014

I wrote http://tullo.ch/articles/python-vs-julia/ a few weeks ago, which compared Cythonized implementations of particular ML algorithms in Python against Julia implementations.

There were some useful discussions on HN at https://news.ycombinator.com/item?id=7383121

tanvach · on March 22, 2014

I have written something similar comparing Julia, Python, Cython, Matlab and C++ for European options Monte Carlo simulation.

http://pithawat.com/post/monte-carlo-simulation

http://pithawat.com/post/monte-carlo-in-cython

http://pithawat.com/post/first-brush-with-julia

phpnode · on March 22, 2014

FAO amurmann - you made a slight faux pas in HN etiquette[0] and were sentenced to over 1000 days of purgatory without your knowledge. Almost no one can see your posts and no one at all can reply to them. For literally years. Sorry about that!

[0] https://news.ycombinator.com/item?id=2385595

spikels · on March 22, 2014

Anyone know why the two Bitcoin Call Option Valuations get such different results: Python 361.203 and Julia 374.496? Monte Carlo will always some error but that's a pretty big difference. Can't confirm which more correct right now.

sirsar · on March 23, 2014

This bothered me as well. That's a 3.6% error, which is far from negligible. Was the main problem too few trials?

rawrjustin · on March 23, 2014

the problem seems to be too few trials. 10x the number of paths and the calculations for both languages converge on around 361ish

lolball · on March 23, 2014

FWIW, there exists a package for calling Pandas from Julia with almost no overhead. It lets you set and get elements of Pandas timeseries as fast as native Julia arrays: https://github.com/malmaud/Pandas.jl (I'm the author).

Thus Julia can be seen as a tool that leverages Pandas, not as a rival. In fact, most Python libraries are very easy to wrap in Julia because of the PyCall package (https://github.com/stevengj/PyCall.jl).

AlwaysBCoding · on March 22, 2014

I've been using R a bunch lately, but would love to use either Python or Julia instead.

Does either Python or Julia have an IDE as good as RStudio? Where it's really easy to write code and get visualizations side by side?

gajomi · on March 22, 2014

The IPython notebook is not quite a full blown IDE, but does support precisely this feature. Notably the notebook can support a Julia backend. For certain types of work one might prefer this kind of thing to an IDE.

philsheard · on March 22, 2014

I've not personally used RStudio but the IPython Notebook is great for interactive code with inline visualisations. I believe there's also a Julia kernel for IPython too so you could have either there.

I'd recommend Python as it has a lot of feature parity with R these days (especially with the Pandas library for fast dataframe operations).

Myrmornis · on March 23, 2014

I'm a daily python user, and mostly ex-R user. Agree data-frames in pandas help. But python really doesn't have much feature parity with R when it comes to statistical data analysis.

aeroevan · on March 23, 2014

As another (mostly) ex-R user I have been trying to move to statsmodels, scikit-learn, etc. with pandas.

I still fire up R/RStudio with ggplot2 for more complicated plots though...

dagw · on March 23, 2014

Spyder (https://code.google.com/p/spyderlib/) is python's answer to Rstudio and the Matlab IDE

rawrjustin · on March 22, 2014

Not sure about python, but i've been playing around with Julia Studio: http://forio.com/products/julia-studio/

Haven't used it long enough to form an opinion on it but check it out, I think it's what you're looking for

jmpeax · on March 23, 2014

I'd like to see the non-vectorized Python version with a @jit annotation. (That is, from the Numba package: http://numba.pydata.org/)

apendleton · on March 22, 2014

I'd suggest trying the non-vectorized version on PyPy. This code seems very JIT-able, and CPython is nowhere near the state of the art for performance with this kind of code.

emidln · on March 22, 2014

On my rMBP, I did a quick back of the envelope comparison using PyPy 2.2.1 vs CPython 2.7.6 on OS X Mavericks and came up with (best of 3 runs):

    $ time pypy testcode.py

    real    0m1.529s
    user    0m1.402s
    sys     0m0.121s

    $ time python testcode.py                                                                                            

    real    0m13.880s
    user    0m13.635s
    sys     0m0.241s

    $ time python testcode_vectorized.py 

    real    0m0.746s
    user    0m0.551s
    sys     0m0.139s

phyalow · on March 23, 2014

Hmmm, practically speaking pricing European Options using simulations is not recommended when you can use a beautiful close formed solution (along the lines of a modified Garman–Kohlhagen model)! But for other styles american, quantos, barriers etc this lattice simulation is great. Curious as to why the author only focused on Europeans!

shoo · on March 23, 2014

i would also find it very interesting to see distinct mathematical approaches for solving these kinds of problems compared.

are finite-element models or some other approximation of the probability distribution used in practice as an alternative to simulation when closed-form solutions are not available?

phyalow · on March 30, 2014

Hi, apologies for the slow reply. Yes predominately Monte-carlo simulations are used. If you are interested in the field an easy entry point is Paul Wilmotts "Introduction to Quantitative Finance".

wheaties · on March 22, 2014

And I expect to see large improvements in Pandas as it is an active and vibrant project. This will be hard to replicate in Julia, not for lack of trying by the community, but rather due to the head start of the Python community.

To be honest, if Julia wanted to really captivate an audience it would need to come out with a package or library which did something needed by the community in a way that Python didn't. Then you could conceivably see larger adoption rates.

andrewcooke · on March 22, 2014

Wut? Python is going to get better because it has a head start? What's the logic in that? Surely the incumbent is less likely to improve than the challenger because it's already been polished. IOW it's usually the new guy that has room to improve.

wheaties · on March 22, 2014

I never said Python is going to be better. I merely stated that with a head start in terms of functionality you get from Pandas that Python will have the upper hand.

Also, if history is any guide, the "new" and the "hot" tend not to beat an incumbent just because they're new and hot. Having the most room to improve is not a blessing. You have to improve at a rate that is faster and greater than the incumbent can improve. And if they can't improve at a faster rate, libraries and ecosystems win by offering something compelling that is either hard or impossible to replicate in the incumbent.

StefanKarpinski · on March 23, 2014

Your argument assumes that improvement is equally easy in both systems, which is not the case. The Python ecosystem has a substantial head start, but Julia makes library development much easier – I would hazzard a 10x factor, but it's really hard to quantify. The result is that developer effort is amplified significantly, making it plausible for the newer system to "catch up" faster than you'd expect.

I don't really like to look at it so competitively, however. Open source is notoriously not a zero-sum game – everyone wins when there are more good options, whether they are in Python or Julia.

tantalor · on March 23, 2014

Is this a performance-sensitive exercise? Is the naive python implementation good enough? How frequently would you need this, and under what kind of load?

I take it the point is that Julia is generally faster without the optimization step, but the article seemed so heavily focused on this particular problem that I have to question whether it matters in this case.

jwmerrill · on March 23, 2014

Monte Carlo simulation in general is definitely performance sensitive. If you're averaging over realizations of a process, the accuracy of your estimate typically depends on the square root of the number samples you take. For a given time budget, every factor s improvement you make to the speed of the calculation earns you sqrt(s) more accuracy.

In a high frequency trading situation, if you can estimate a "correct" price faster than your competitor can, then you get to make the trade, and they don't. Doesn't get much more performance sensitive than that.

amurmann · on March 22, 2014

I just started learning R and am still in the phase of learning a new language where I feel insecure that I picked the tight language to learn. Can anyone say anything about how R relates to the examples we have seen here for Python and Julia? Also Julia seems yo be the new hotness in this space, is that right? Should I skip R and learn Julia instead?

edwinksl · on March 22, 2014

Impressive.

kartman · on March 22, 2014

For practice, you can test out your bitcoin option pricing skills in real life at our site, feedback appreciated :) https://www.cointures.com