In general, carefully written devectorized Julia will be faster than equivalent vectorized code because it is possible to avoid allocating containers at intermediate stages of the computation. This is surprising for people coming from e.g. Matlab, R, or Numpy, because vectorized code is often faster in those environments. In those languages, direct scalar code is slow, and vectorization is about expressing the computation in a form that will call out to fast C operations, but the C still pays a price to allocate intermediates. In Julia, the JIT is able to emit fast machine code for scalar operations, so you don't need to express your code in a form that calls out to C. There are JIT compilers for Matlab and Python too, but Julia's JIT is very high quality and everyone uses it all the time.
The main changes I made here were hoisting computations of constants out of the loop, and preallocating a 2D Float64 array to hold the results. These were worth roughly 2x each. I've chosen the storage format so that it is accessed in column order. Both optimizations have their own sections in the Julia Manual's performance tips section:
your point at the end is good - if we're only interested in the final price, then we should be able to do this in constant memory, independent of the number of timesteps and samples.
edit: on my machine the running time seems dominated by the generation of random numbers, rather than anything else.
perhaps another optimisation is to elimate the `exp` from the inner loop, and just do addition. if you're only interested in the final price, you can do one `exp` at the end.
i.e. change
paths[1, i] = st = s0
for j in 2:M
st *= exp(a + b*randn())
paths[j, i] = st
end
to this
paths[1, i] = log_st = log(s0)
for j in 2:M
log_st += (a + b*randn())
paths[j, i] = log_st
end
One solution to this is use a circular data structure filled with random numbers. Then dedicate a thread to updating that ring with new random numbers. Meanwhile you pull random numbers as fast as you want. It's not good for encryption, but for simulation it's fine. http://en.wikipedia.org/wiki/Circular_buffer
Another approach is to keep reseeding a fast RNG with a slow but higher quality RNG.
You don't have to do anything quite that complicated. You can just pre-generate an array of Gaussian random samples and do table lookup in the inner loop:
paths = zeros(Float64, M, I)
rands = zeros(Float64, M-1, 1)
for i in 1:I
randn!(rands)
paths[1, i] = st = s0
for j in 2:M
st *= exp(a + b*rands[j-1])
paths[j, i] = st
end
end
That shaves off another 35% percent of runtime on my machine.
Nice! In case it isn't obvious, randn! takes an existing vector and fills it up with new random normal variates. This allows computing a bunch of random variates at once, which is a little more efficient, without having to allocate a new vector every time through the loop.
Another option would be to generate all M-1xI random numbers ahead of time, but that would be less memory efficient.
I tried that too, but it wasn't any faster – you've already gotten most of the gains to be had by lifting the random number generation out of the innermost loop. So I went with the option that uses less memory.
i tried the approach of generating them ahead of time and then loading them from disk. it only uses 80mb of memory for this example.
the difference i see between generating numbers on-the-fly using boost and reading them from disk is 6.41s vs 0.32s, ie a 20x speedup, for 10 repetitions of 100,000 paths over 100 timesteps.
I would not recommend that circular buffer idea for quantitative calculations.
Consider averaging n numbers from a buffer that you've filled with m random variates with m < n: the standard deviation of your average will decrease like 1/sqrt(m), not 1/sqrt(n), so you might as well have just averaged the independent random numbers you produced instead of reusing some of them.
Maybe the circular buffer idea would be useful in non-quantitative contexts, though, e.g. for simulating something in a game or a movie, where all you need is the appearance of randomness.
Agreed, that circular buffer thing is bound to do one of two things based on whether the RNG thread or the work thread is faster:
a) work slower than RNG – do too much work, wasting CPU;
b) work faster than RNG – produce very poor randomness.
Neither of these situations is good, and dealing with two threads in the first place is a bit of a complication. If you're going to do that, it's better to synchronize the two threads so that the work thread blocks when it needs more randomness than the RNG thread has produced and vice versa.
I am bringing this up because all the different paths can be calculated in parallel. Usually without a matrix library in C++, you will end up having to write low level CUDA / OpenCL / SSE code to get the performance.
Here is what an intel i7 CPU does using our library + OpenCL:
Thanks pavansky for sharing. Can you tell me what is the performance and model accuracy trade off between Monte-Carlo option pricing vs. BSM vs. Binomial vs. Heston.
Currently I use BSM; however, live performance is poor in extracting implied volatility from NBBO of option spreads as I use a naive approach to iterate and converge on the IV. I assume that's what people use CUDA and GPU for to calculate the greeks and pricing of the whole US option chain series in realtime.
I'm curious if using CUDA or GPU rented from aws would make performance faster, in a) extracting IV from market ticks, b) figuring out break-even and hedging/adjustment points in different complex option spreads, c) calculating expected payoff in term's of Kelly's for a spread or potential adjustment.
> Can you tell me what is the performance and model accuracy trade off between Monte-Carlo option pricing vs. BSM vs. Binomial vs. Heston.
I don't work in finance anymore, but Heston is not a pricing method per say. In the linked article, the OP assumes that the price of a bitcoin follows a lognormal distribution (i.e: dS_t = mudt + sigmadW_t), while Heston assumes a different diffusion process (dS_t = mudt + digma_tdW_t, and d(sigma_t) = kappa(thau - sigma_t)dt+ zetasqrt(sigma_t) dW'_t) [0].
I would say that Monte Carlo is in general slower and less accurate than the other methods you mentionned, but more flexible in the sense that it allows more complicated diffusion models or payoffs (e.g: option on the max or min of a basket of underlyings).
I wrote http://tullo.ch/articles/python-vs-julia/ a few weeks ago, which compared Cythonized implementations of particular ML algorithms in Python against Julia implementations.
FAO amurmann - you made a slight faux pas in HN etiquette[0] and were sentenced to over 1000 days of purgatory without your knowledge. Almost no one can see your posts and no one at all can reply to them. For literally years. Sorry about that!
Anyone know why the two Bitcoin Call Option Valuations get such different results: Python 361.203 and Julia 374.496? Monte Carlo will always some error but that's a pretty big difference. Can't confirm which more correct right now.
FWIW, there exists a package for calling Pandas from Julia with almost no overhead. It lets you set and get elements of Pandas timeseries as fast as native Julia arrays: https://github.com/malmaud/Pandas.jl (I'm the author).
Thus Julia can be seen as a tool that leverages Pandas, not as a rival. In fact, most Python libraries are very easy to wrap in Julia because of the PyCall package (https://github.com/stevengj/PyCall.jl).
The IPython notebook is not quite a full blown IDE, but does support precisely this feature. Notably the notebook can support a Julia backend. For certain types of work one might prefer this kind of thing to an IDE.
I've not personally used RStudio but the IPython Notebook is great for interactive code with inline visualisations. I believe there's also a Julia kernel for IPython too so you could have either there.
I'd recommend Python as it has a lot of feature parity with R these days (especially with the Pandas library for fast dataframe operations).
I'm a daily python user, and mostly ex-R user. Agree data-frames in pandas help. But python really doesn't have much feature parity with R when it comes to statistical data analysis.
I'd suggest trying the non-vectorized version on PyPy. This code seems very JIT-able, and CPython is nowhere near the state of the art for performance with this kind of code.
On my rMBP, I did a quick back of the envelope comparison using PyPy 2.2.1 vs CPython 2.7.6 on OS X Mavericks and came up with (best of 3 runs):
$ time pypy testcode.py
real 0m1.529s
user 0m1.402s
sys 0m0.121s
$ time python testcode.py
real 0m13.880s
user 0m13.635s
sys 0m0.241s
$ time python testcode_vectorized.py
real 0m0.746s
user 0m0.551s
sys 0m0.139s
Hmmm, practically speaking pricing European Options using simulations is not recommended when you can use a beautiful close formed solution (along the lines of a modified Garman–Kohlhagen model)! But for other styles american, quantos, barriers etc this lattice simulation is great. Curious as to why the author only focused on Europeans!
i would also find it very interesting to see distinct mathematical approaches for solving these kinds of problems compared.
are finite-element models or some other approximation of the probability distribution used in practice as an alternative to simulation when closed-form solutions are not available?
Hi, apologies for the slow reply. Yes predominately Monte-carlo simulations are used. If you are interested in the field an easy entry point is Paul Wilmotts "Introduction to Quantitative Finance".
And I expect to see large improvements in Pandas as it is an active and vibrant project. This will be hard to replicate in Julia, not for lack of trying by the community, but rather due to the head start of the Python community.
To be honest, if Julia wanted to really captivate an audience it would need to come out with a package or library which did something needed by the community in a way that Python didn't. Then you could conceivably see larger adoption rates.
Wut? Python is going to get better because it has a head start? What's the logic in that? Surely the incumbent is less likely to improve than the challenger because it's already been polished. IOW it's usually the new guy that has room to improve.
I never said Python is going to be better. I merely stated that with a head start in terms of functionality you get from Pandas that Python will have the upper hand.
Also, if history is any guide, the "new" and the "hot" tend not to beat an incumbent just because they're new and hot. Having the most room to improve is not a blessing. You have to improve at a rate that is faster and greater than the incumbent can improve. And if they can't improve at a faster rate, libraries and ecosystems win by offering something compelling that is either hard or impossible to replicate in the incumbent.
Your argument assumes that improvement is equally easy in both systems, which is not the case. The Python ecosystem has a substantial head start, but Julia makes library development much easier – I would hazzard a 10x factor, but it's really hard to quantify. The result is that developer effort is amplified significantly, making it plausible for the newer system to "catch up" faster than you'd expect.
I don't really like to look at it so competitively, however. Open source is notoriously not a zero-sum game – everyone wins when there are more good options, whether they are in Python or Julia.
Is this a performance-sensitive exercise? Is the naive python implementation good enough? How frequently would you need this, and under what kind of load?
I take it the point is that Julia is generally faster without the optimization step, but the article seemed so heavily focused on this particular problem that I have to question whether it matters in this case.
Monte Carlo simulation in general is definitely performance sensitive. If you're averaging over realizations of a process, the accuracy of your estimate typically depends on the square root of the number samples you take. For a given time budget, every factor s improvement you make to the speed of the calculation earns you sqrt(s) more accuracy.
In a high frequency trading situation, if you can estimate a "correct" price faster than your competitor can, then you get to make the trade, and they don't. Doesn't get much more performance sensitive than that.
I just started learning R and am still in the phase of learning a new language where I feel insecure that I picked the tight language to learn. Can anyone say anything about how R relates to the examples we have seen here for Python and Julia? Also Julia seems yo be the new hotness in this space, is that right? Should I skip R and learn Julia instead?
https://gist.github.com/jwmerrill/9715447
In general, carefully written devectorized Julia will be faster than equivalent vectorized code because it is possible to avoid allocating containers at intermediate stages of the computation. This is surprising for people coming from e.g. Matlab, R, or Numpy, because vectorized code is often faster in those environments. In those languages, direct scalar code is slow, and vectorization is about expressing the computation in a form that will call out to fast C operations, but the C still pays a price to allocate intermediates. In Julia, the JIT is able to emit fast machine code for scalar operations, so you don't need to express your code in a form that calls out to C. There are JIT compilers for Matlab and Python too, but Julia's JIT is very high quality and everyone uses it all the time.
The main changes I made here were hoisting computations of constants out of the loop, and preallocating a 2D Float64 array to hold the results. These were worth roughly 2x each. I've chosen the storage format so that it is accessed in column order. Both optimizations have their own sections in the Julia Manual's performance tips section:
http://julia.readthedocs.org/en/latest/manual/performance-ti...