Can confirm, it is fast. I've been struggling with a plane equation for a plane based on isotropic segments. 3 segments, 3 complex points - only 6 equations. Almost every CAS can solve the system per se fast, it's the simplification process that costs time.
On my machine, I managed to get a ~300KB long solution from SymPy in under a minute, and a comparable solution from Maxima in a few seconds. Symbolica found a ~84K long solution in about a second... in Colab. That's impressive.
Author of Symbolica here: Symbolica is used in physics calculations to do arithmetic on rational polynomials that are hundreds of megabytes long.
For my physics research I have worked with an expression that was just shy of a terabyte long and had > 100M terms. The way that works is that you stream terms from disk, perform manipulations on them and write them to disk again. Using a mergesort, terms that add up can be identified by sorting them to be adjacent.
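A minimal sketch of that sort-and-merge idea, in Python with only the standard library. The term layout `(exponents, coefficient)` and the function names are assumptions made for illustration, not Symbolica's or FORM's actual data structures, and small in-memory lists stand in for sorted runs that would really live on disk.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def sorted_run(terms):
    """Sort one chunk of (exponents, coefficient) terms by exponents."""
    return sorted(terms, key=itemgetter(0))

def merge_runs(runs):
    """Stream-merge sorted runs and add coefficients of equal monomials."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    for exponents, group in groupby(merged, key=itemgetter(0)):
        coeff = sum(c for _, c in group)
        if coeff != 0:  # terms that cancel exactly are dropped
            yield exponents, coeff

# Two chunks of terms in the variables (x, y).
chunk_a = [((2, 0), 3), ((0, 1), -1), ((1, 1), 4)]
chunk_b = [((1, 1), -4), ((2, 0), 5), ((0, 0), 7)]

result = list(merge_runs([sorted_run(chunk_a), sorted_run(chunk_b)]))
print(result)
```

Because `heapq.merge` and `groupby` are both lazy, only one term per run needs to be in memory at a time, which is the property that makes the approach work for expressions larger than RAM.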
This is fascinating. What is the real-world application of these polynomials? I mean, what technical-scientific problems can only be solved with such large objects? I love the thought of "if I don't solve this enormous polynomial, this solar panel won't have very good efficiency" or something like that.
These polynomials appear when computing Feynman diagrams, which are used to make predictions for the Large Hadron Collider. The collider can measure collisions with such astonishing precision (<1% error) that predictions of the same order are also needed. The more precise you want to be, the more terms in a series approximation of the mathematical description of the collision you need to compute. For example, I computed the fifth-order approximation of the QCD beta function, which governs how intensely the strong force affects matter. This takes 5 days of symbolic manipulations (pattern matching, substitutions, rational polynomial arithmetic, etc.) on 32 cores.
The large polynomials appear in the middle of the computation, often referred to as intermediate expression swell. This also happens when you do a Gaussian elimination or compute greatest common divisors: the final result will be small, but intermediately, the expressions can get large.
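Intermediate expression swell is easy to reproduce on a small scale. The sketch below runs Euclid's algorithm over the rationals on a classic textbook pair of coprime integer polynomials; the helper names are made up for this example and it is not how Symbolica computes GCDs. The inputs have single- and double-digit coefficients and the final answer is just a constant, yet the remainders in between carry much larger numerators and denominators.

```python
from fractions import Fraction

def strip_zeros(p):
    """Drop leading zero coefficients (coefficients are highest degree first)."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of a divided by b over the rationals."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i, bi in enumerate(b):
            a[i] -= factor * bi
        a.pop(0)  # the leading coefficient is now exactly zero
    return strip_zeros(a)

def remainder_sequence(a, b):
    """Euclid's algorithm; returns (gcd up to a constant, all nonzero remainders)."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    remainders = []
    while b:
        r = poly_rem(a, b)
        if r:
            remainders.append(r)
        a, b = b, r
    return a, remainders

# x^8 + x^6 - 3x^4 - 3x^3 + 8x^2 + 2x - 5  and  3x^6 + 5x^4 - 4x^2 - 9x + 21
p = [1, 0, 1, 0, -3, -3, 8, 2, -5]
q = [3, 0, 5, 0, -4, -9, 21]

gcd, remainders = remainder_sequence(p, q)
for r in remainders:
    largest = max(max(abs(c.numerator), c.denominator) for c in r)
    print(f"degree {len(r) - 1}: largest integer has {len(str(largest))} digits")
print("gcd is a constant:", len(gcd) == 1)
```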
Is there a document/book you can recommend that includes a simplified example/tutorial of this computation process from beginning to end? (Or, what are the right keywords to search for such a thing?)
I'm looking for something like: here's the particle interaction we will work on, this is a very simple Feynman diagram, and here's the simplified data the LHC gave us about it, here's the resulting equation from which we'll derive a series, etc.
Not looking for how to program it, but actually for seeing the problem structure, and the solution design from beginning to end. (Familiar with high level physics concepts, and comfortable with any math).
You can even make a toy-problem with large polynomials yourself.
Just make a Taylor approximation of some e.g. transcendental function like sin(x) around some point, and use more and more terms to get a higher precision.
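A small sketch of that toy problem, assuming an expansion of sin(x) around 0 and exact rational arithmetic from Python's standard library (the function name is made up for this example). Each extra term improves the precision, and the exact partial sum gets longer as it does.

```python
import math
from fractions import Fraction

def sin_taylor(x, terms):
    """Partial sum of the series of sin(x) around 0 with `terms` nonzero terms."""
    total = 0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

x = Fraction(1)
errors = []
for terms in range(1, 8):
    approx = sin_taylor(x, terms)
    error = abs(float(approx) - math.sin(1.0))
    errors.append(error)
    print(terms, "terms, denominator digits:", len(str(approx.denominator)), "error:", error)
```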
I used polynomials and specifically special symmetric polynomials extensively during my PhD research (mathematical physics).
For instance symmetric polynomials (x_1^2 + x_2^2 + ...) can describe the state of a system of particles where exchanging any 2 particles does not change the system at all (exchange x_1 and x_2 in the previous expression, same polynomial = same system & state).
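The exchange invariance is easy to check numerically. This is only an illustrative sketch with made-up function names: it evaluates the power sum x_1^2 + x_2^2 + ... on every reordering of a small "state" and compares it with a polynomial that is not symmetric.

```python
from itertools import permutations

def power_sum(xs, k=2):
    """Symmetric polynomial x_1^k + x_2^k + ..."""
    return sum(x ** k for x in xs)

def not_symmetric(xs):
    """x_1^2 + x_2, which treats the first two particles differently."""
    return xs[0] ** 2 + xs[1]

state = (3, 5, 7)
symmetric_values = {power_sum(p) for p in permutations(state)}
other_values = {not_symmetric(p) for p in permutations(state)}

print(symmetric_values)   # a single value: relabeling particles changes nothing
print(len(other_values))  # several values: relabeling changes the result
```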
If you have a system of equations that you can solve exactly with special polynomials, you can approximate real-world systems governed by similar equations by using your special polynomials as a starting point and adding a correction to your solution.
There's so much to say about polynomials, but I'll leave you with a basic example that shows how multiplying infinite polynomials allows you to count the number of ways there are to hand you back your change at the till:
The basic units of change are 0.01$, 0.05$, 0.10$, 0.25$, 1$, 5$, 10$, ...
For example, you can always give exact change back with only 0.01$.
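So the set of all the different amounts of change you can produce with 0.01$ is given by the exponents in the following series: sum_n (q^0.01)^n = 1 + q^0.01 + q^0.02 + q^0.03 + ...
Doing the same for 0.05$ and 0.10$ and multiplying the three series together, the coefficient of q^0.10 is 4 (ten pennies; five pennies and a nickel; two nickels; one dime). There are thus 4 ways of handing back exactly 10 cents.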
So for any amount, you take the following:
product_(c in (0.01, 0.05, 0.10, 0.25, ...)) sum_(n >= 0) (q^c)^n
= sum_(a >= 0) [number of ways to give back change for amount `a`] * q^a
So that would be the "generating series" of the number of ways to hand back change.
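As a rough sketch of this generating series in code (not the commenter's own), the snippet below multiplies truncated power series stored as coefficient lists. Working in whole cents and truncating at a maximum amount are choices made for this example, and the function names are invented here.

```python
def geometric_series(coin, max_amount):
    """Coefficients of 1 + q^coin + q^(2*coin) + ..., truncated at q^max_amount."""
    return [1 if a % coin == 0 else 0 for a in range(max_amount + 1)]

def multiply(f, g, max_amount):
    """Product of two truncated power series given as coefficient lists."""
    h = [0] * (max_amount + 1)
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j, gj in enumerate(g):
            if i + j > max_amount:
                break
            h[i + j] += fi * gj
    return h

def change_counts(coins, max_amount):
    """counts[a] = number of ways to hand back `a` cents with the given coins."""
    series = [1] + [0] * max_amount  # the constant series 1
    for coin in coins:
        series = multiply(series, geometric_series(coin, max_amount), max_amount)
    return series

counts = change_counts([1, 5, 10, 25], 100)
print(counts[10])   # 4, the four ways of handing back 10 cents
print(counts[100])  # ways to hand back one dollar with these four coins
```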
In this context, polynomials bridge the gaps between combinatorics and analytical computation.
I forgot. You can do SICP over the web without installing a Scheme interpreter.
The web has a Scheme interpreter complete enough to run all of the book's exercises.
While I agree with this attitude towards obscure acronyms, SICP is widely known among those who voluntarily enter a site called Hacker News, and it's easily searchable too.
I can remember the days! I suppose as problems grow in size, it's not too surprising that older methods of coping with what seemed like lots of data are still applicable.
Indeed, it's hard for me to imagine such an expression without any of the following qualities:
- immediately converges to zero
- immediately heads to infinity
- is dominated by only a few terms (thus obviating the need for the other X million terms)
It sounds like the problem is that it’s dominated by a few terms, but finding them is tricky. If you have 1.0 x + 2.0 x - 3.0 x - 5/4 x + x - x + x - x … (etc for a few GB) and then quadratic terms, it’ll take work to figure out whether the dominant term is linear or quadratic. Cancelling out the potentially important intermediate terms (especially in higher dimensions) sounds like a mess.
I am leading an academic research group with a site-wide Symbolica license.
Because Symbolica is in early development and the work of a single person for now, one great benefit for us is the dedicated attention we receive.
Any bug or feature we're particularly interested in gets immediate attention.
And despite the product being licensed, Symbolica's source-available nature gives us full confidence in the author's long-term intent.
We therefore trust that accepting Symbolica as a dependency of the framework we are building will never constitute a point of failure.
Supporting Symbolica's nascent effort and having a direct line of contact with its single author also means it is easy to discuss particular arrangements specific to our use case, such as end-user license requirements.
He's a one-man army behind a one-year-old piece of software, so it makes sense that he would focus on core features that are most relevant to his existing customers and capable of attracting new ones.
If this was important to you or someone else who would sign up for a Symbolica license, I'm sure he would improve on this.
A streamlined C++ API is comparatively easier to achieve than beating state-of-the-art efficiency on key CAS algorithms...
We should encourage this original approach of licensed, source-available software; otherwise you end up with either black-boxy Mathematica-like software or xzlib disasters and nothing in between.
Yeah, I agree that it should be advertised differently, stressing the obvious WIP aspect.
I guess the idea is just to show that the seed for a proper C/C++ API is there, and ready to be developed further for whenever a customer requires it.
Out of curiosity, what is your use case?
I feel like this is missing some comparisons to Sympy and Mathematica. Probably it's more polished than Sympy, but if I'm going to use closed-source software, why would I pick this over MMA?
The most obvious comparison is in syntax. Easier syntax means less user time spent. In Symbolica, an example from the documentation
from symbolica import Expression
x = Expression.var('x')
f = Expression.fun('f')
e = x + f(x) + 5
e = e.replace_all(x, 6)
print(e)
In Mathematica, the same thing is
e = x + f[x] + 5
e /. x -> 6
What I find striking is this: we know that Python is the most popular language because you don't have to declare your variables or your function headers, and can just start writing the logic of your computation. Yet people keep inventing these mathematics programs in Python that need you to declare your variables before using them.
Should note the `/.` is syntactic sugar (or special syntax) for `ReplaceAll[]` function. And basically anything can be written as M-expressions. For any expression this form can be retrieved with `FullForm[expr]` (may have to use `Hold[expr]`). So for your example with no special syntax:
Of course, to refer to a specific variable of the CAS in Python, you need to bind it to a Python variable. There is no way around this.
In the end most of the time of the user is not spent on syntax, but on designing algorithms. With Mathematica, you are locked into the Mathematica ecosystem. With Symbolica / sympy and other CASs that are libraries, you can use all the familiar data structures of the language you are coding in.
It depends on what you want to do. It should be much faster than Sympy and it is often much faster than Mathematica, for example if you do pattern matching or are manipulating rational polynomials.
The idea of Symbolica is that you can use it as a library inside your existing projects, which is harder to do with MM since it is its own ecosystem.
Moreover, for personal projects Symbolica is free.
Despite Symbolica being licensed, its source code is still available which clearly shows the author's intent.
It may be important in various contexts, e.g. security or satisfying certain grant requirements in academia, or whenever custom minor modifications are necessary.
cynically, one possibility is the intent to be able to sue open-source cas authors and distributors who had access to it and then wrote substantially similar code. charitably, it's intended to allow users to resolve whatever problems they have with the system, by debugging and, if necessary, patching it
I've actually been a developer of FORM since my PhD (which I did with the author of FORM, Jos Vermaseren)! In Symbolica, I am taking the best features of FORM, while making it easier to use.
I never managed to solve a simple equation in FORM. Everything else looks easy, as most of the answers are either tautological or factor arrangements without stating explicit units or magnitudes; but I didn't know how to get numerical answers.
It is an interesting combination of open source yet paid.
But the price is "Contact us for a quote"
Can't they at least give a ballpark number of what the price is? Matlab and Mathematica are about $1000-3000 per year for commercial license, they aren't hiding their price.
Author here: thanks for the feedback. I will add a price range soon. The reason why it is not there in the first place is because I am thinking about site-wide licenses, which are often quote-based (also for Mathematica). For universities site-wide licenses are common, but for industry maybe less so. For reference, a university site-wide license is about 6000 EUR per year at the moment.
This isn't open source software, it's source available. I don't see any claim on that page that it's open source, the page only says "source available" instead.
Apart from programmatic UX, I'm unsure what the use case or competitive advantages are. Perhaps it could use a comparison matrix along with well-known CASes, i.e., Giac (Erable's successor), Mathematica, and Maple. Mathematica and Maple are among the most powerful CASes. See also: [0].
I used HP 48 (Erable) extensively in college for physics, math, chemistry, and electrical engineering courses. It could simplify systems of equations, symbolically integrate, and solve ordinary and partial diff eqs. Not quite Mathematica or Maple, but Erable was portable before laptops or smartphones were widely available.
If you want to push a CAS, try deriving a general time-dependent solution of the Schrödinger wave equation (PDF) for hydrogen (1s1). ;)
It works by allowing the project to deal with honest people and ignore assholes efficiently.
And for a small team (in this case, one person), getting rid of assholes who just want to rip the project off with minimal effort is a big win.
After all, if you just want to play around with the system you are a hobbyist and can get it for free anyway. Why go to the extra effort of stealing the code?
And if you actually are creating value with the system, don't you think you might want some support?
If your answer in either case is that you would rather cheat the author then there is nothing tangible stopping you, but you just defined who you are and selected yourself out of the class of people the maintainer has to deal with.
I don't think it is about people being assholes, but just taking the path of least resistance. So if it is easier to buy a license key and use it than to just remove the license check, then that's what people will do. Same the other way around.
I read somewhere here that the free version is restricted (single thread only?). Also the mechanism for offline work sounds like a pain, so if multithreaded and offline is your use case, the case for getting a license key just got murky.
I am wondering about this because it just seems hard to sell source available software. Everybody is complaining about price and license keys. But if you put the software behind an SaaS, and go the subscription way, you avoid all of this trouble. Of course, in this case, that just doesn't seem possible.
Personally, I'd like to sell my software source available and without an SaaS indirection, but I am wondering how to get paid in that scenario.
License keys make the software worse, so there just seems to be a logical problem here in the case of source available software: Why pay for worse software? In case of closed-source software, no such logical problem exists: license keys make the software infinitely better, because otherwise it wouldn't work at all.
I forgot the main point of my reply. If this is about filtering for honest people, why require a license key at all, especially one that needs to be checked online? Honest people will pay for the license even if no license key is necessary, right?
> a state-of-the-art commercial computer algebra system while being as open as possible
> Get a license key for offline use, generated from a licensed Symbolica session. The key will remain valid for 24 hours.
lmao
This is a really impressive project, but the practically always-on requirement - including an internet connection to go offline - is absolutely hilarious.
Also, the requirement for me to "get a quote" to get an idea of how much it'll cost me? I'm not going to even bother trying the free trial. I can tell you how much MatLab cost me for my personal projects - $149.
Now I'm not saying MatLab and this are an apples to apples comparison (not even close), but y'all are chopping yourselves off at the knees.
Author of Symbolica here: for your personal projects the cost is 0, as you'd be a hobbyist.
Symbolica is developed by only one person, so forgive me if I can't get every aspect of the project right on the first try (especially the non-technical part) :) I will see about removing the online on start-up part. It's essentially the only anti-piracy step that I have.
At the moment there is no fixed cost for use in industry, since the price will vary based on the number of users and other factors.
I don't mean to belittle your great achievement with license annoyances so forgive me if this sounds harsh, but I won't look twice at any software with an online requirement, if internet connectivity is not an integral part of the software itself. I do realise open source can be a dark, unthankful place that will not pay bills for most developers so this is not about the cost itself.
Unfortunately, pirates almost always have it easier than paying customers, since license checks will probably be removed. Though this is hearsay (anyone?), there was "legal piracy" (no, there's no such thing :P) in the license-via-printer-port-dongle-era, since the dongles sometimes failed at the worst of times (e.g. live recording at a studio). So some users/studios bought a license to be legally covered on paper, but used a pirated version.
I live in a "well-connected" country, but traveling a few kilometers in the wrong direction leaves me in complete radio silence, and sometimes work (academia) has required me to stay at such places for a couple of days. Since you know Rust, I can (and have had the need to) add `--offline` when building, with required crates locally cached. I know many researchers with the need to bring high-tech equipment + software to remote locations for work over many weeks, sometimes months. The need to find a city just to do a software license check would be crazy.
Unfortunately, I also don't have a good solution, other than trying to trust your customers. An online check once, at install, I can possibly live with, even if I've had to do reinstalls from local files in the field as well...
He's already said the fee for you is 0. It seems he's mostly going for the university site-wide licenses, so I don't think issues such as yours are very important for him.
Most people at a university who want to run a computation for a few days will do it on a computer that's permanently connected to the net, I think. Not their laptop.
Give people a ballpark single user price, and a note that discounts can be given for multiple users etc. If you don’t quote any price the assumption will be, “if you have to ask you can’t afford it.”
That's probably how it is. Given that he sells only academic licenses, if he sells 3 site licenses for €6000 each, that's a salary (a low one, but you can actually live from it).
If he advertises a single-user license for €200, let's say, then each site has to have at least 30 users to make the same kind of money. Unlikely each current site has that many users, so it doesn't make sense to offer single-user licenses if there is the danger that sites will jump to that.
What’s the use case overlap between symbolic and huge? I might expect that with most huge expressions (they talk about disk space being the limit, not memory) you’d be satisfied with a numerical approximation. Does someone need to simplify a huge expression down to a small symbolic expression?
One use case is in theoretical physics, where expressions that take up about a terabyte are generated when computing Feynman diagrams, but only in the intermediate stages. By the end of a two-month computation you get a result similar to equation 4.5 in this paper: https://arxiv.org/pdf/1707.01044
An exact answer is desired since the final result reveals some structure that can be studied, and because it is very hard to get a numerical result due to the occurrence of spurious poles. For example, evaluating
(1-x)/x - 1/x
numerically is challenging around x=0 even though symbolically it can be made regular.
Evaluating the expression naively near zero you're going to get wild numerical errors, but if you do the symbolic manipulation you're going to notice that it's just equal to -1.
Edit: e.g. consider this interaction I just had with the python interpreter
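The original interpreter session is not preserved here. As a stand-in, the following minimal sketch (written for this purpose, not the commenter's session) evaluates the same expression once in floating point and once with exact rationals from Python's standard library.

```python
from fractions import Fraction

def f(x):
    """(1 - x)/x - 1/x, which simplifies symbolically to -1 for x != 0."""
    return (1 - x) / x - 1 / x

naive = f(1e-20)                # floating point: 1 - 1e-20 rounds to 1.0
exact = f(Fraction(1, 10**20))  # exact rational arithmetic

print(naive)
print(exact)
```

In floating point both terms round to the same huge number, so the difference comes out as 0.0 rather than -1, while the exact evaluation returns -1.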
I've looked through the Rust code on GitHub, and it's amazingly readable, like one of the good Lisp systems. Esp. the matcher, e.g. in the normalize or solver.
Just much faster.
while i do not believe you have any obligation to license it under an open-source license, i think people who use it under a proprietary license are being foolish. a cas is an essential tool for day-to-day work, and becoming dependent on a proprietary software vendor for that usually ends badly
so in a sense that's a 'sage feature i'd like to see'
I feel similarly. While I myself might gamble on learning a tool like this, I'd be reluctant to recommend it to students unless I was very confident that it would always be there when they reached for it. Source-available does scratch that itch more or less, but it creates some reluctance.
Cool stuff! One comment on the landing page: the slideshow view thingy of examples is fast, and when I click one to read it, it moves to the next one so fast again, so I can’t really look at the demos.
I also can’t quickly find any details about what field/algebra/whatever these cover. If I make an Expression.var, what structure is it assumed to have?
Thanks for the feedback! I will see if I can slow down the slideshow. You can see more demos in the docs and the live notebook.
An expression is very general. Each variable and function is commutative and should hold for the complex numbers. For functions you can set if they are symmetric, linear or antisymmetric. Non-commutativity can be emulated by adding an extra argument that determines the ordering or by giving them different names.
If you use the Polynomial class instead of Expression, you can choose the field yourself. This is especially clear in Rust where most structures are generic over the field. For example:
Thanks for the quick answer! I had only looked at the python API docs, so I missed that. I’ve just been wrapping up learning a bunch of Lie theory and will keep symbolica in mind next time I want to play with some big nasty expressions!
> Non-commutativity can be emulated by adding an extra argument that determines the ordering or by giving them different names.
Would you mind giving an example? (or just tell me it’s in the docs and i’ll look deeper)
Yes, that is super annoying. I was trying to read the examples and they kept changing out from under me. To the developer: don't slow down the animation of the gallery. Make the examples static.
Yeaaaaah, that is a weird license, and if I needed non-free software for anything, I'd go with Mathematica. Happy to play with it once it's open source and the maintainer has given up on trying to sell it.
I learned about Monte Carlo integration when looking through the codebase, though, so that was cool!
Author here: you can generate the Groebner basis in lex order or compute a resultant, but it doesn't automatically backsubstitute the system at the moment. This is mostly because I haven't committed yet to a format to describe roots of polynomials. It's on the todo list though :)
I actually had an idea to do an OEM license check in build.rs that compiles out all license checks so that this version can be included in customers' software ^^
Ooh, you were right. Either something has changed since my initial viewing (unlikely) or I have incorrectly assumed that it is in build.rs because all other examples don't contain `LicenseManager` (much more likely).
Maxima is a system for the manipulation of symbolic and numerical expressions, including differentiation, integration, Taylor series, Laplace transforms, ordinary differential equations, systems of linear equations, polynomials, sets, lists, vectors, matrices and tensors.
Most of these features are included in Symbolica in some capacity (ODE solving is missing) and there are CAS features Symbolica has and that Maxima has not (like advanced pattern matching), even though it is only a year old.
It is not just a matter of whether a feature is there, it needs to be usable in practice. You cannot use Maxima to do computation with large rational polynomials as this paper shows:
Symbolica is 10 times faster and uses 60 times less memory than Maxima on a medium-sized problem. The larger sized problem does not run with Maxima. Note that this is tested with an older version of Symbolica, the latest version is even faster.
> Symbolica is 10 times faster and uses 60 times less memory than Maxima on a medium-sized problem. The larger sized problem does not run with Maxima
Hah...change "Maxima" to "Macsyma" and "Symbolica" to "SMP" and that is close to a slide I remember seeing around 1980 in a presentation by Wolfram and Cole explaining why they had developed a new computer algebra system, SMP, instead of using one of the already available systems.
I don't remember the exact numbers on their slide, but same situation. Existing systems could handle the medium problems that came up in their physics research but were slow and used a lot of memory, and could not do the large problems.
Thanks for the reply! If you don't mind me asking more: what do you use for polynomial GCD? Apparently it is quite fast, do you use some standard algorithm implemented well or is there some kind of algorithmic improvement? Is it described somewhere, say a paper or a book? Have you tried to benchmark it against NTL, for example? Thanks again!
> Sometimes a Symbolica license key is needed on a machine that is not connected to the internet. For this purpose, an offline key that is valid for 24 hours can be generated from a valid license key on a machine that is connected to the internet.
Hi. Nice work, I'm really glad to see a new CAS and this is an area that is close to my heart. The problem I have isn't so much that it's difficult to jump through the licensing hoops (although anything is more difficult than not using licenses), it is that you put restrictions on the user of the software that your competitors don't have.
I make closed-source commercial software at my dayjob, I understand you want to get paid. The trouble is that a featureful CAS either takes decades to build by hand (e.g. Axiom), relies on a huge number of FOSS libraries (e.g. Sage), or both (e.g. Mathematica).
Making me "phone home" to check I'm up to date on my subscription to some algebra manipulation algorithms is a non-starter, especially when even a PhD student has to pay. (I know you have a single-threaded free version, but that didn't make much sense when your USP is speed.)
> In the case of halted development, existing and new users can buy a perpetual license for the most recent version.
Promises, promises. I don't doubt the author's intentions, but then later the owner(s) get bought out and the new owner doesn't honor that promise. We've seen this kind of stuff way too often in the past.
> Students and hobbyists can use Symbolica for free. Once Symbolica is being used professionally, either in an academic context or in a commercial company, a license is required. The preferred license model for academic use is an institution-wide license.
while i do not believe the author has any obligation to license it under an open-source license, i think people who use it under a proprietary license are being foolish. a cas is an essential tool for day-to-day work, and becoming dependent on a proprietary software vendor for that usually ends badly