I authored this package because I needed to generate confidence intervals for time series data without using SciPy. Sharing it here, as it could be a useful package for others :)
Included Models:
- Linear regression
- Ridge regression
- Linear spline
- Isotonic regression
- Bin regression
- Cubic spline
- Natural cubic spline
- Exponential moving average
- Kernel functions (Gaussian, KNN, Weighted average)
Thanks for releasing this. I was just wondering if something like this existed. I've worked on a few projects where SciPy was banned due to the large dependencies it pulls in.
I did look into statsmodels, but it is a large library with more than I needed. I did not look into R. Are there any lightweight alternatives in that language?
Not all in one package, as far as I know. For ridge regression you will want to install glmnet, for example. mgcv, which usually ships with R, provides a few common fast kernels, which seem to be the ones Python programmers are familiar with.
Am I correct to guess that you could not use sklearn/scikit-learn either, because it depends on SciPy? (I am under the impression that scikit-learn is the dominant library implementing these algorithms.)
That is correct. I had to generate confidence intervals on over 8000 univariate data sets using very small VMs, so I needed to limit large dependencies as much as I could. This package was the result of that!
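For anyone curious what a SciPy-free interval computation can look like, here is a minimal sketch for plain OLS. This is not Regressio's actual implementation; note it uses the fixed normal quantile 1.96 instead of a t quantile, precisely to avoid pulling in SciPy:

```python
import numpy as np

def ols_confidence_band(x, y, z=1.96):
    """Pointwise ~95% confidence band for the OLS mean line.
    NumPy only: uses the normal quantile z instead of a t quantile."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    n, p = X.shape
    s2 = np.sum((y - yhat) ** 2) / (n - p)      # residual variance estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    # standard error of the fitted mean at each x_i: sqrt(s2 * x_i' (X'X)^-1 x_i)
    se = np.sqrt(s2 * np.einsum("ij,jk,ik->i", X, XtX_inv, X))
    return yhat - z * se, yhat + z * se
```

With few observations the normal approximation understates the interval width compared to a proper t-based one, which is a real trade-off of going SciPy-free.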
Well, SciPy depends heavily on NumPy, which, as a CPython-specific extension, won't run on other Python interpreters in general. Although, for example, there is ulab for MicroPython, which replicates part of NumPy, and PyPy has a compatibility layer for CPython extensions.
Edit: well, Regressio itself also depends on NumPy, but it might be able to run on top of ulab, whereas I really doubt SciPy would.
Responding that there's something out there called ulab doesn't really answer my question, which was: where does OP's requirement to not use SciPy come from?
".. I had to generate confidence intervals on over 8000 univariate data sets using very small VMs, so I needed to limit large dependencies as much as I could. This package was the result of that!"
Based on the comments in this thread, it may be worth trying to make this package not dependent on NumPy as well?
Nice job -- I have examples using statsmodels for similar (not time series) data [1,2]. I typically use this for EDA before regression modelling, so dependencies in that scenario are not a big deal. But I might weep if someone told me no scipy in production.
It looks pretty good, but I'd love to see better use of the routines already in NumPy. In particular, I see you are solving the OLS problem by direct inversion when you already have QR and SVD available to call. There are other simple things that can do a lot for your results too: centering, scaling, etc. I guess it works well enough for small, well-behaved problems as is, though.
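For reference, the change being suggested is essentially a drop-in swap in NumPy. A sketch (the variable names here are made up for illustration, not the package's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)

# Direct inversion: forming X'X squares the condition number of X,
# so it is numerically fragile for ill-conditioned designs.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# QR factorization: solve R @ beta = Q' y, avoiding the normal equations.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD-based solver: np.linalg.lstsq also handles rank-deficient X.
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)
```

On a well-conditioned problem like this all three agree to machine precision; the difference shows up when columns of X are nearly collinear.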
In theory they should be useful when you know that the underlying process should be monotone. I think in the past I found them more sensitive to noise, and wondered if monotone approximation might be better than monotone interpolation for that reason.
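For what it's worth, monotone approximation in the least-squares sense is exactly what isotonic regression computes, via the pool-adjacent-violators algorithm. A minimal dependency-free sketch (not the package's implementation):

```python
def pava(y):
    """Pool Adjacent Violators: the nondecreasing sequence closest to y
    in least squares. Pure Python, no dependencies."""
    vals, wts = [], []                      # block means and block sizes
    for v in y:
        vals.append(float(v))
        wts.append(1)
        # merge adjacent blocks while monotonicity is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2] = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            wts[-2] = w
            del vals[-1], wts[-1]
    out = []
    for v, w in zip(vals, wts):
        out.extend([v] * w)                 # expand blocks back out
    return out
```

Unlike a monotone interpolant, this averages noisy violations away instead of being forced through every point, which is the robustness trade-off mentioned above.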
I added class comments to each class explaining the high-level implementation details. Clamping is supported with natural cubic splines; this is done by taking the slopes at each endpoint.
Monotonicity is currently not supported (for cubic splines).
In addition to the already mentioned GSL, there are two more libraries I know of and have used for spline interpolation in C/C++: John Burkardt's spline library[1] and Netlib[2]. The pppack library from Netlib is Fortran code, so you have to write a wrapper to use it from C++.