Thanks pavansky for sharing. Can you tell me about the performance and model-accuracy trade-off between Monte Carlo option pricing vs. BSM vs. binomial vs. Heston?
Currently I use BSM; however, live performance is poor when extracting implied volatility from the NBBO of option spreads, as I use a naive approach of iterating until the IV converges. I assume that's what people use CUDA and GPUs for: calculating the Greeks and prices for the whole US option chain series in real time.
I'm curious whether using CUDA or a GPU rented from AWS would improve performance in a) extracting IV from market ticks, b) finding break-even and hedging/adjustment points in complex option spreads, and c) calculating the expected payoff, in terms of the Kelly criterion, for a spread or potential adjustment.
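For context, a common alternative to naive iteration is a Newton-Raphson step on the Black-Scholes vega. This is a minimal sketch under illustrative assumptions (a European call, my own function and parameter names — not the poster's actual code):

```python
# Hypothetical sketch: Newton-Raphson implied-volatility extraction
# from a Black-Scholes call price. All names here are illustrative.
from math import log, sqrt, exp, erf, pi

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    # Black-Scholes price of a European call.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def bs_vega(S, K, T, r, sigma):
    # Sensitivity of the call price to sigma; the Newton denominator.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * sqrt(T) * exp(-0.5 * d1 * d1) / sqrt(2.0 * pi)

def implied_vol(price, S, K, T, r, sigma0=0.2, tol=1e-8, max_iter=50):
    # Newton-Raphson: typically converges in a handful of iterations
    # near the money, versus many steps for bisection-style search.
    sigma = sigma0
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        sigma -= diff / bs_vega(S, K, T, r, sigma)
    return sigma
```

Since each option's solve is independent, the whole chain can be batched across a GPU, which is where CUDA/OpenCL pays off.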
> Can you tell me about the performance and model-accuracy trade-off between Monte Carlo option pricing vs. BSM vs. binomial vs. Heston?
I don't work in finance anymore, but Heston is not a pricing method per se. In the linked article, the OP assumes that the price of a bitcoin follows a lognormal distribution (i.e. dS_t = mu S_t dt + sigma S_t dW_t), while Heston assumes a different diffusion process (dS_t = mu S_t dt + sqrt(sigma_t) S_t dW_t, with d(sigma_t) = kappa (theta - sigma_t) dt + zeta sqrt(sigma_t) dW'_t) [0].
I would say that Monte Carlo is in general slower and less accurate than the other methods you mentioned, but more flexible, in the sense that it accommodates more complicated diffusion models or payoffs (e.g. an option on the max or min of a basket of underlyings).
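To make the speed/accuracy point concrete, here is a rough sketch (my own, not from the thread) of terminal-value Monte Carlo for a vanilla call under the lognormal model, which can be checked against the closed-form BSM price:

```python
# Illustrative sketch: Monte Carlo pricing of a European call under
# geometric Brownian motion, via terminal-value sampling. For this
# payoff BSM is exact and instant; MC error shrinks only as 1/sqrt(N).
import random
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    # Closed-form Black-Scholes benchmark.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d1 - sigma * sqrt(T))

def mc_call(S, K, T, r, sigma, n_paths=100_000, seed=0):
    # Sample S_T = S * exp((r - sigma^2/2) T + sigma sqrt(T) Z)
    # and average the discounted payoff over n_paths draws.
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        ST = S * exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(ST - K, 0.0)
    return exp(-r * T) * total / n_paths
```

The flexibility shows up when you swap the one-line payoff or the sampling step for something path-dependent or multi-asset, which closed-form and lattice methods often cannot handle.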
I recently added a Monte Carlo-based option pricing example, which can be seen here: https://github.com/arrayfire/arrayfire_examples/blob/master/...
I am bringing this up because all the different paths can be computed in parallel. Without a matrix library in C++, you will usually end up writing low-level CUDA / OpenCL / SSE code to get that performance.
Here is what an Intel i7 CPU does using our library + OpenCL:
----
[1]: Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 7875 MB, OpenCL Version: 1.2 Compute Device: [1]
Time for 100000 paths - vanilla method: 65.815 ms, barrier method: 68.271 ms
----
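To illustrate the parallelism point in library-agnostic terms (this uses NumPy vectorization rather than ArrayFire, and the names are my own), each time step below updates every path in a single array operation, so a SIMD or GPU backend can execute the paths concurrently:

```python
# Hedged sketch: vectorized GBM path generation. One Gaussian draw of
# shape (n_steps, n_paths) and a cumulative sum replace the per-path
# loop, which is what lets an array library parallelize the work.
import numpy as np

def gbm_paths(S0, r, sigma, T, n_steps, n_paths, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Each row of z drives one time step across all paths at once.
    z = rng.standard_normal((n_steps, n_paths))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Cumulative log-returns, exponentiated back to price space.
    return S0 * np.exp(np.cumsum(increments, axis=0))
```

A barrier option then only needs an extra reduction over the path dimension (e.g. a running max per column) on top of the same kernel, which is consistent with the similar timings for the two methods above.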
BTW, we are working on porting the library to other languages as well. Right now, it supports R, Java, and Fortran.