Hacker News
React, Flux, RethinkDB and SageMathCloud – Summer 2015 Update (sagemath.blogspot.com)
154 points by williamstein on Aug 31, 2015 | 62 comments



We are currently completely rewriting the entire frontend of SMC using React.js, Flux, and RethinkDB ...

... I'm living on credit cards

Are you sure a full rewrite of the front-end is a priority at this point?

I've been there. When your business is going down, it's easy to believe that a technically beautiful v2.0 is going to save the day. It didn't do anything for my business though, and in retrospect I should have spent the year on something else.


Agreed. The author states:

> There are much better approaches now, which are critical to dramatically improving the user experience with SMC, and also growing the developer base.

I seriously urge William to, at the very least, survey developers before making this conclusion. I would expect something like "we spoke to 500 developers and at least 60% of them would use the application if it had a more robust dev stack". I find it amazing that extremely analytical and rational people (including myself!) tend to make generalized conclusions without informed data.


It can be easy to justify uninformed decisions with vision. Did Steve Jobs ask people what they want? How about Henry Ford, who supposedly said that "they'd just ask for faster horses"?

I don't love data myself either. It feels better to be right by divine vision, although it sure doesn't happen that often.


Having data is fine, but even the richest dataset is wide open to misinterpretation. This is even more true when the data is a snapshot of a compression of human opinion and feelings.

The interpretation of information collected from a survey such as '60% of developers I asked say..', even assuming it's statistically valid and reasonably accurate (which is not trivial to achieve), still involves a lot of assumption, intuition, and frankly guesswork about the real problems and facts of the situation.

It might provide slightly more of a hint towards the facts than just blindly following a strategy based on vision alone, and there are many success stories of data-driven approaches that vastly outperform the predictions of human 'experts', but implicitly trusting that this will always be the case is probably unwise.

I'm for data-driven decision making in general, just saying that it's never a magic bullet and is often not even an advantage (could indeed be the opposite) if done naively.


This holds back so many projects - how expensive hosting web applications eventually becomes. I'm hoping for more decentralized options to eventually emerge (like IPFS, etc.), but in the meantime, I'd probably recommend to this guy to perhaps offer a paid app version, or just charge for accounts on his site. Or else try a kickstarter or indiegogo campaign.


How expensive would an application like this be? I don't know much about this, but I'd like to learn.

What's an algorithm for figuring out a pretty good guess of any random application's hosting costs? Also, is there a way to figure out how large the expenses could become over time? Is there some way to relate number of users to cost?

There's no upper bound for how much money one could spend, but let's use the midpoint between "extremely frugal" and "money isn't really a concern."


Hosting on GCE costs about $1700/month right now, and at this moment we have $490/month in recurring revenue (we have 56 subscribers to various plans). I've put much work into making SMC more efficient, in order to bring the hosting price down, but there are limits. The reasons it costs this much are: (1) there are often about 500 users signed in, every user is using at least one Linux account, and what users do is often very computationally and memory intensive (mathematics, number crunching, etc.). (2) I snapshot and back up all files to Google Cloud Storage and also copy backups offsite. Doing offsite backups mainly costs bandwidth -- I spent about $20 in the last 3 days on downloading offsite backups of user data (to a USB drive on my desk). (3) In addition to compute nodes, there are database and web servers, which are redundant so that two can go down and things still work; this is very important since teachers often give lectures from SageMathCloud or run computer labs, so downtime is very bad. (4) I also snapshot all the disk images regularly, which costs more, but reduces the chances of data loss. I care that users don't lose their data in case of a disaster (hackers or lightning striking Google four times), which just makes things cost more.


How important is your tool to your users?

    teachers often give lectures from SageMathCloud
    or run computer labs
OK, sounds important.

    I care that users don't lose their data in case
    of a disaster (hackers or lightning striking
    Google four times), which just makes things cost
    more.
And it sounds like you care. So, how much do you charge for such an important tool?

    we have $490/month in recurring revenue (we have
    56 subscribers to various plans)
$8.75/month. Try tacking on an extra zero to all of your plans. Or, better yet, tack on an extra zero and ALSO let your customers decide whether or not they care about things like backups.
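The arithmetic behind that $8.75 figure, and what "an extra zero" would do, can be sketched as follows. This is only a back-of-the-envelope check using the numbers quoted in this thread; it assumes, unrealistically, that all 56 subscribers stay on after a price change, and the variable names are made up for illustration.

```python
# Figures quoted in this thread (August 2015).
hosting_cost = 1700.0      # USD/month on GCE
recurring_revenue = 490.0  # USD/month
subscribers = 56

# Average price currently paid per subscriber.
avg_price = recurring_revenue / subscribers

# Average price needed for subscriptions alone to cover hosting.
break_even_price = hosting_cost / subscribers

# Monthly gap between hosting cost and subscription revenue.
shortfall = hosting_cost - recurring_revenue

# "Tack on an extra zero": ASSUMES no subscriber leaves, which is optimistic.
tenfold_revenue = 10 * recurring_revenue

print(f"average price:    ${avg_price:.2f}/month")
print(f"break-even price: ${break_even_price:.2f}/month")
print(f"shortfall:        ${shortfall:.2f}/month")
print(f"10x revenue:      ${tenfold_revenue:.2f}/month")
```

Hosting is only one cost here; the figures say nothing about development time, so the break-even price is a lower bound.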

I don't want to sound like an asshole, but your business is never going to succeed if you keep going down this path. And to be clear: I want to see you succeed.

Here are a few things by patio11 you should go read right now:

http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/

https://training.kalzumeus.com/newsletters/archive/saas_pric...

http://www.kalzumeus.com/2012/08/13/doubling-saas-revenue/


At ShareLaTeX (https://www.sharelatex.com), our hosting costs are around $1500/month (can probably be doubled once you add in backups and other supporting services). This is for a similar service (LaTeX is just a subset of what SageMathCloud does, but a resource heavy subset). However, ShareLaTeX handles orders of magnitude more traffic as far as I can tell. One of the big factors in a service like this is being able to get the cost-per-user down low enough that it's a viable business model given that a student/academic is not going to pay more than about $10/month and most won't pay at all.

One of the big wins for us has been using Docker to isolate projects. Sure, each project is resource heavy when run/compiled/executed, but if you have lots of users, they're probably not all resource heavy at the same time. The more lightweight the virtualisation/containers, the more they can share resources. It sounds like maybe each user is getting to hold on to too many resources that they aren't using, and so it's costing an order of magnitude more than if they could share all the resources perfectly?
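A rough way to see why sharing helps: if users are busy independently and only a fraction of the time, the number busy at once rarely gets close to the total. The sketch below is a toy model, not a measurement of ShareLaTeX or SMC; the function name, the independence assumption, and the 500-user / 10%-active figures are all hypothetical.

```python
def concurrent_capacity(n_users: int, p_active: float, coverage: float = 0.999) -> int:
    """Smallest k with P(active users <= k) >= coverage.

    ASSUMPTION: each user is active independently with probability p_active,
    so the number of active users follows Binomial(n_users, p_active).
    """
    if not 0.0 < p_active < 1.0:
        raise ValueError("p_active must be strictly between 0 and 1")
    pmf = (1.0 - p_active) ** n_users   # P(exactly 0 users active)
    odds = p_active / (1.0 - p_active)
    cumulative = pmf
    k = 0
    while cumulative < coverage and k < n_users:
        # Binomial recurrence: P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p)
        pmf *= (n_users - k) / (k + 1) * odds
        k += 1
        cumulative += pmf
    return k


if __name__ == "__main__":
    n, p = 500, 0.10  # hypothetical: 500 signed-in users, each busy 10% of the time
    k = concurrent_capacity(n, p)
    print(f"provision for {k} concurrently busy users instead of {n}")
```

Real workloads are correlated (a class of students all running the same lab at once), so this should be read as an optimistic bound rather than a sizing rule.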

I'd be happy to chat more about this stuff (almost all of the ShareLaTeX code is open source as well, except for the enterprisy stuff). We've also got a new project called DataJoy for Python and R (https://www.getdatajoy.com) which has similar scaling challenges that we've been working on.


The typical usage pattern we have is somebody interactively using a SageMath worksheet over the course of an hour or two. Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation. Also, people will often open many worksheets, which spawn numerous processes. We use fork for Sage processes to keep down resource usage (it maximizes shared memory). Each project is not in its own VM; instead we use cgroups extensively (the same kind of technology that Docker uses under the hood) to control resource usage. All the CPU/memory of the free computers is typically maxed out, and is shared fairly between users (controlled by cgroups). cgroups is awesome technology.


> Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation.

1. I thought Sage used a ton of RAM partly because of the huge number of statically linked libraries. I see you said you're using fork to maximize shared memory. Have you tried KSM (Kernel Samepage Merging)?

2. Have you looked at zram? Certain matrices and such may be easily compressible.
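To get a feel for the "easily compressible" point, here is a small sketch that compares how well a structured matrix and random bytes compress. It uses zlib from the Python standard library purely as a stand-in; zram uses its own kernel compressors, so the ratios are only indicative, and the identity-matrix example is an assumption about what such data might look like.

```python
import os
import zlib
from array import array


def compression_ratio(raw: bytes, level: int = 1) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, level))


n = 500

# A dense float64 identity matrix: n*n zeros with ones on the diagonal.
matrix = array("d", bytes(8 * n * n))
for i in range(n):
    matrix[i * n + i] = 1.0
structured = matrix.tobytes()

# Random bytes of the same size, as a worst case for any compressor.
noise = os.urandom(len(structured))

structured_ratio = compression_ratio(structured)
noise_ratio = compression_ratio(noise)

print(f"structured matrix: {structured_ratio:.1f}x smaller")
print(f"random bytes:      {noise_ratio:.2f}x smaller")
```

Matrices full of arbitrary floating-point values will land much closer to the random case, so whether zram helps depends heavily on the actual workload.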


Thanks -- these are both great ideas; I've opened a ticket: https://github.com/sagemathinc/smc/issues/93


From what you describe, shouldn't pricing be higher if users are costing you so much?


What do you mean 56 subscribers, 500 people are using the app concurrently?

Are the rest unpaid subscriptions? Or do the subscribers have numerous users?


At this moment there are 585 people connected to SMC (a bit higher than usual due to the Hacker News effect), and most are using it for free. We only introduced a fully automated paid plan about 10 days ago, and many of our sign-ups have been in the last week. Paying customers get enhanced support, the ability to upgrade project quotas, and can ask (it's not yet automated) to have projects moved to members-only servers. The members-only servers have an order of magnitude fewer users on them. There were also until recently obstructions to charging users due to IP and other issues involving University of Washington (my employer).


It would be great to get these paid users writing a line for SMC expenses into their NSF grants. For instance, you could charge $400 for a year's worth of supported SMC for a group (PI and her grad students, postdocs), or $200 for an REU group SMC. It may be easier to get money by asking for a rather larger amount up front, that people plan into their grants or get departments to pay for, rather than asking for $9/month, which I'd feel compelled to pay personally because the hassle of getting reimbursed $9/month is more than the 3 lattes it costs me.


We now offer $79/year and $499/year plans, which would fit perfectly the model you describe. We only started offering them a few days ago due to demand.


You REALLY need to be charging more than the ~$9 a month you are charging.

~$20 seems reasonable. Cheap even. You can always discount when you get to a scale that sustains.


> This holds back so many projects - how expensive hosting web applications eventually becomes.

Hosting web applications costs a fraction of what it did 5-10 years ago, both in terms of straight costs as well as operational expenses.


The interesting thing, to me, is not the overall cost of hosting, but the new ways that we can host.

I've got a current project that I'm working on. I have no idea how successful it will be and I'm not interested in putting a ton of my own money at risk to see if it will be successful. But using a model where I have zero of my own servers (Lambda, API Gateway, Static hosting) means my fixed costs are under $20/mo and my variable costs scale with usage. All I need to do is ensure that my per-user monetization is higher than my per-user cost and my application scales up without any effort on my part.

In the era of hosted servers, capacity planning and scale out took a ton of my time and energy on projects like this. And while I might end up paying a bit less in the long run if I followed the same methodology today, not having to worry about that kind of stuff and being able to focus solely on the application is really nice.
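The fixed-plus-variable model above boils down to a simple break-even check. The sketch below is an illustration only: the function names and the per-user revenue and cost figures are hypothetical, and only the "under $20/mo" fixed cost comes from this comment.

```python
import math
from typing import Optional


def monthly_profit(users: int, revenue_per_user: float,
                   cost_per_user: float, fixed_cost: float) -> float:
    """Profit when variable cost scales linearly with the number of users."""
    return users * (revenue_per_user - cost_per_user) - fixed_cost


def break_even_users(revenue_per_user: float, cost_per_user: float,
                     fixed_cost: float) -> Optional[int]:
    """Fewest users at which profit is non-negative, or None if each user loses money."""
    margin = revenue_per_user - cost_per_user
    if margin <= 0:
        return None  # more users only makes the loss bigger
    return math.ceil(fixed_cost / margin)


if __name__ == "__main__":
    # HYPOTHETICAL per-user numbers; $20 is the fixed-cost ceiling mentioned above.
    needed = break_even_users(revenue_per_user=1.00, cost_per_user=0.25, fixed_cost=20.0)
    print(f"break-even at {needed} users")
```

The `None` case is the one to watch: with a negative per-user margin, a usage-priced backend scales the losses just as smoothly as it scales the traffic.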


I think what you said and what the parent comment said are both entirely true.


This sort of thing would make a great Sandstorm app, for what it's worth. See https://news.ycombinator.com/item?id=10147774 for more about that.


As an outsider, here is my perspective on a number of things you could do:

Segment your users into plans. Do some user research on the archetypes and Jobs To Be Done that different people are using your product for. Segment features according to this and charge accordingly. For example all of the high availability, redundant web servers and snapshotting should go only on the paid plans. You can make the reasonable assumption that if people aren't paying for your service then they don't value their data very highly, so why should you? Some rough plans I can imagine would be: Undergraduate, Graduate, Professor. These people have different needs, desires, and fears; cater to them.

This might be a hard one, but dropping the free plan will make you cashflow positive immediately. For people who can't or won't pay, you can provide good instructions on how to set up the VM themselves. Give people 30 days' notice, and give them benefits for signing up, e.g. a 20% discount for the life of their account.

Look at partnering with teaching institutions that are using your software. Offer them a discount on a class purchase of accounts, or sell them support running the VM on their own infrastructure (perhaps integrating with Moodle or their auth system for an additional fee?).

Look at how people get value from SMC. Your costs scale linearly with usage, so it may be good to scale your prices too. One option could be to replace the free tier with a pay-as-you-go tier where people pre-pay for x hours/month, and higher tiers get unlimited usage.

On the homepage, remove the section about grants received as it sounds like that is no longer the case.

Make the homepage a lot nicer and present the benefits of SMC much more clearly.

As others have said, it's not clear at all what plans are available or what differentiates them until you put in a payment card. This is not a compelling prospect.


Many thanks for your feedback and for carefully looking over the site! I really appreciate it.


The goal appears to be to appeal to developers to entice them to contribute; not so much the end-users - although one could argue whether this is the correct business goal?


I also found that further development on SageMathCloud was becoming too slow and frustrating for me, even though I knew the system extremely well (certain important things turned out to be very hard using the original approach). In order to implement the features a lot of users demanded, it was necessary to use a better approach. It is starting to pay off.


This sounds depressingly familiar to me too!


x1000 this.


In 2009 during my Masters I was taking a course in elliptic curves and I was having a tough time getting to grips with them. I discovered Sage and was suddenly (after an eight-hour compile time!) able to easily create and manipulate these objects in an interactive environment. It was mind-blowing! I ended up doing quite well in that module and I'd say that's largely due to William Stein's work.

I'm no longer in academia and haven't used Sage for years but it's great to see how far this project has come. I hope it gets the funding and development it needs.


Thanks. The primary motivation for creating SageMathCloud is to solve the problem "(after an eight-hour compile time!)" for many people.


It's an exciting project. Suggestions:

A. Turn your billing model inside out.

- Create an AWS account for each user, and start them in an AWS free tier[1] micro instance. Let them decide when to upgrade to a bigger box, how much to spend.

- Put your images on AWS marketplace[2]. Then customers pay Amazon, and Amazon will pay you your markup.

B. Sample projects! Make some demo videos to give the general idea, and when a person creates a new account, let them try out some sample projects, just to get a feel for the system. After I register my only option is "create new project". I'm also in Seattle, happy to discuss, buy you lunch. Contact info in my profile.

[1] https://aws.amazon.com/free/

[2] https://aws.amazon.com/marketplace/management/tour/


> Put your images on AWS marketplace[2].

Since the project is 100% open source, I'm not sure this would make much money. Do you have experience with marketplace images that are 100% open source?

Your idea about automated subscribing customers is interesting and likely to generate revenue... but wouldn't they pay Amazon instead of SageMath, Inc., which wouldn't support further development?


I would keep your onboarding/signup process intact. I'm just suggesting Marketplace as a back-end billing mechanism where premium users pay hourly usage charges, their cards are billed by Amazon, and Amazon sends you a check for your percentage markup. (you define the percentage markup over standard EC2 prices, and that markup goes to you).

They're still your users. The biggest benefit is Amazon paying for the free instances for the free users. And Amazon WANTS this. They WANT people to get hooked on using cloud services.

You might even have a mode where the expensive node is only allocated when a big computation kicks off. Hours are way cheaper than months.


This is another great idea. AWS is not so expensive for the types of calculations SMC is doing, and distributing computation costs to users is very reasonable. Another item people might start writing into their grants: x dollars in computation time on SMC, with part going to AWS (oh well) and part to you/Sage.


Another option is Amazon DevPay. He could let them decide how much to spend, and Amazon takes care of billing, including his fee for the service.

https://aws.amazon.com/devpay/


I used Sage in the math department at Bard College with Prof. Greg Landweber. Honestly it felt a little half baked but when it was good, it was amazing (and my impression was that it was completely free, no option to pay). My armchair critique:

- The concept/vision are great and you could be a serious competitor to Mathematica if you got a product person on board. It has all the power of Mathematica, a more popular programming language (Python), and a good user-acquisition strategy (professors are already using it for college courses... you just need to get the students to take it with them when they leave the course).

- You need a product person / designer because there are serious UX gaps and the feature-set feels really scattered. For instance, the fact that I had no idea you could pay is a big red flag.

- Figure out what your goal is. Are you making Mathematica or Linux? My guess is it's more towards the former and you should apply to YC.

Edit: Thanks for making Sage, BTW!


If the goal was to get more developers, I would just go with, say, Rails (or node, or even Java!) and pgsql. Why use all these hipster technologies? But who says it's because of the technology stack you're using? Maybe people are not motivated enough NOT because of the tech stack but because that's how it's supposed to be. It's a big and risky assumption to think that you'll get more developers if you switch to react/flux/rethinkDB. Also, why is this the top priority when you're seriously considering quitting the project because you have no money? You should be doing one of: 1. raising investment; 2. getting more users to use your product so you can raise investment; 3. actually generating revenue so you don't need investment.


I started a project the other day, started writing the basis for an isomorphic/universal/whatever it is now app (all the app starters out there are miserable), spent two days on it...eventually realized that trying it is going to slow me the hell down and make sure I quit my project before I even start.

You can fix legacy later when it's time. And let's be honest, there are proven, good web applications out there using <that stack you're used to> that are in reality no worse than node/whatever, just not as hip.

I love react for the pure frontend and I'm still using it there for many pages (except the ones I want to just be static), but I'm sticking to my python roots for a backend and I'm 100x more productive.


Hey check this out:

https://github.com/tylerkahn/isomorphic-es6-react-martyjs-to...

This may help you out. I created this after also struggling to find a good isomorphic example app.


Two huge things I think they're all missing:

1. Some sort of backend DB integration. It's very hard to figure out how people are interacting with NodeJS databases other than MongoDB these days, frontend alone isn't the issue. Sequelize + postgres for example would be great to get an app up and running I expect, including things like connection pooling. (Bookshelf seems okay too, but I don't like it purely because it seems to automatically infer attributes from the database, which makes coding harder as you'll need to hit the REPL or schema to know what's coming out). In addition, webpack is easy to get running. Making it sane from a module size perspective (dead code elimination...etc) is harder.

2. Test coverage. Unit + integration. There's a million test runners, mock frameworks, etc etc out there. It's not trivial to get something you know is sane working (more so from integration side). Not to mention new-style imports entirely break most frameworks like rewire in my experience, so I wouldn't encourage their usage.

And, well, for me I KNOW those answers for python (I haven't done that much nodejs backend work since ~2013. I've used the node ecosystem exclusively for frontend since, a lot), so it's a lot more simple for me to get started.


The model I was shooting for was having your API be an independent HTTP rest service (whatever framework/language you want) with the nodejs server consuming it to render the initial view and then the client also consuming it in order to carry on with execution.

Martyjs (and the marty-express module) makes this very easy to do in that you can write your application without having to consider the context in which it is running (server or client).

I think the backend DB integration is a separate concern from how to build isomorphic js apps.

In terms of test coverage I don't have much experience testing js apps so I can't speak to the pain points there.


Yeah, I think that's a reasonable model where it will make sense since the client/backend can actually be more or less identical, since they'd both just be consuming REST.

But that's a pretty painful way to start a project as well. /shrug


Check this out for frontend in python: https://github.com/zoofIO/flexx


This is a really cool project. Have you tried to get funds from a university? I don't see why no universities would be interested in that.

Anyway, I hope things will get financially better for you.


Related: The NSF did provide funding which I used to support SageMathCloud during the last two years. When I applied for the funding, I hadn't started SMC, so the NSF grants weren't actually for SMC.


Hi, William; Couple-few things:

* I've been using the SMC a decent amount, and it occurs to me that I've never seen an 'ask' for contributions or a subscription.

* On that note, I just went to sign up for a subscription because it's ghastly that I haven't. The 'upgrades' tab just says that I should 'Sign up for a subscription in the billing tab', and the billing tab only asks for a credit card number. The subscription page should probably pitch a few benefits of subscription, with an easy click from a given subscription to the billing page.

* I've used SMC for writing a couple collaborative papers, and have found it fantastic for that. Checkpointed latex with easy access to Sage code is great for collaboration, far better than subversion or (shudder) dropbox. I wonder if there might be good ways to build that userbase a bit: Perhaps some interesting collaboration with the arxiv? Transparent checks for arxiv compatibility as you're building your document? Link 'public code' and data sets in an SMC project on the arxiv side?


I wish I could upvote this multiple times! Providing "public code" and data for arXiv papers would be fabulous.


Thanks. SMC projects are private by default, but it does let users easily selectively publish any file or directory tree publicly. For example: https://cloud.sagemath.com/projects/4a5f0542-5873-4eed-a85c-...


What I'm trying to get at, though, is that it may be worthwhile to try to make some high-profile connections which can increase the userbase. Building the paying userbase is the important part right now.

Making it easier/better for people to get their papers on the arxiv would be fantastic, and bring in more users. If the arxiv is bought into the effort, it gets SMC more visibility, bringing in more users than just making the features available. Now, the arxiv is extremely conservative by design, so this isn't an easy partnership to make, but it's a pretty obvious one to try in terms of getting a bigger SMC userbase.

But really, the important part is trying to bring in as large a flock of paying users as possible, for a minimum amount of effort. Maybe this is undergraduate students, maybe it's individual researchers (like me) signing up for subscriptions, and maybe it's university math/cs/physics departments signing up for big subscriptions. I know a couple departments that are running their own servers which might be able to save a decent chunk of money by moving to SMC, for example. But it requires some outreach work to get in touch with them, and some bargaining to get them to switch models.


Great ideas -- thanks for spelling them out in more detail. Do you know anybody who runs arxiv? I wouldn't know how to get started with making such a connection (if you think of anything, feel free to email wstein@sagemath.com). However, it's something I had not considered before, and if an opportunity arises, I'll be more ready now (such opportunities aren't unlikely; e.g., I always run a Sage booth at the huge joint math meetings, which can lead to such things).

The market and strategy you describe at the bottom is exactly what I've been pursuing, and I hope we turn a corner with it soon due to the new academic year.


I'm not sure how much of a priority you think this is, but the one main problem I have with SageMathCloud is the lack of good documentation. Mathematica, as a counterexample, has comprehensive, searchable and easily accessible (just hit F1) docs. I humbly submit that including a way to browse and search documentation - not just Sage-specific, but also for every library it includes - would go a long way towards making SMC more usable.


Sage is an amazing project and framework. I think the current SMC front-end works as a product, and time might be better spent on business development and trying to get paying customers, or rethinking the subscription model.

As a stopgap measure to slow your burn rate, you should start limiting the resource consumption of your accounts until your subscription revenue at least equals your costs.


I have to agree with pavlov and say that while I sympathize with your situation, I can't say I'm surprised. You fell victim to one of the classic blunders - the most famous of which is "never get involved in a land war in Asia" - but only slightly less well-known is this: You have decided to rewrite everything from scratch. http://www.joelonsoftware.com/articles/fog0000000069.html


What's the rationale? Who are the target users? I go to the site and I have no idea what exactly this does that's better than the alternatives. It sounds like it's trying to do everything.


The target users are academics who collaboratively use mathematical software like SageMath (http://sagemath.org), Octave, Cython, R, IPython, etc., in their teaching and research, but don't want to have to wrestle with installation problems and coordination with collaborators (say via Git). Numerically, most users are students taking courses from such academics. I started the project because I was teaching courses on SageMath, Cython, and LaTeX to students, and the installation burden for the students was a major problem. Also, I was frustrated by how difficult SageMath is for people to install on their own computers... even after 8 years of development (it only seems to get harder over the years, not easier!).


This may be less interesting to you, but SMC seems like an ideal platform for data-science and software related job interviews. A few years ago I interviewed with Enthought. We used Google Docs for the real time coding!


Thanks for suggesting that. I think SMC could in fact work well for those applications, though I don't have any real insight into them or know how to get into those markets myself.


I might focus more on the users and features than on rewriting if it isn't making money, unless you think you can cut costs dramatically. Also maybe look to get a little funding.


As someone who still makes a decent living off coding, I must agree with the comments here - a better backend DB is not going to save the project. Get the users, get the contributors, listen to the feedback, and don't rewrite the whole thing on a whim.


What are your peak memory, cpu, and bandwidth requirements? GAE is expensive for what you get. Have you considered dedicated hardware to offset hosting costs?


I'm using GCE (=Google Compute Engine), not GAE. I considered using GAE long ago (seriously testing it and writing a first version), but was scared off by the vendor lock-in. I don't want anything to fundamentally depend on a stack that isn't 100% open. GCE's pricing is competitive with AWS and Azure.

I did run SageMathCloud on a lot of dedicated hardware that I hosted at Univ of Washington from March 2013 until May 2015, but had to stop due to University rules. I had planned to buy computers and rent hosting in a data center, but when I looked into the costs of commercial dedicated hosting, bandwidth, and the time and people required to maintain physical hardware with the availability requirements I have, it started looking much worse than using GCE (especially as GCE prices kept dropping). I don't have any employees at all, so when something goes wrong with the hardware, I would have to drive there and fix it myself. What if I'm asleep or traveling across the country? No matter what, the odds that GCE will fix any problem in a timely manner are much higher than the chances I will. The middle ground is something like Rackspace, etc., which doesn't look that much better regarding cost than GCE. Of course, the price of hosting on GCE is a lower order term compared to the price it would cost to pay myself to admin everything, if I wasn't doing it in my spare time... and then there is development work too.


Fair point. That was a good decision; others I know who went with GAE ended up getting locked in.

Good luck with the other suggestions in this thread.



