Hacker News

You should try Scalyr (https://www.scalyr.com/). It's easier than juggling between six different tools. It was built by ex-Google DevOps engineers for the same reason you made this Slideshare: the available tools suck.

(Full disclosure: I'm working with Scalyr, but you should still try it.)




I don't think something like this should be hosted/SaaS. I don't want to send my gigabytes of logfiles (sometimes containing confidential information ...) over the internet to some unknown entity with probably questionable security.

My small startup company already has more than 10 (virtual) servers, which would cost us 500 dollars a month to monitor. That is more than the servers themselves cost.


[Scalyr founder here]

We take security very seriously, but let me turn this around into a question: what would it take for you to trust an external service to manage your logs? Some of the things we're doing:

1. SSL everywhere (including internal traffic between our backend servers).

2. We add a tag to the raw representation of every string value, so that we can verify that data never leaks across accounts. (This has never detected a problem -- except in tests, because yes, we do test it.)

3. Implementing in a "safe" language (Java), to rule out low-level buffer management bugs.

4. As Greg noted, we make it trivial for you to redact sensitive data before it leaves your server.
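To make item 2 concrete, here is a minimal sketch of what tagging string values with their owning account could look like. This is only an illustration of the idea, not Scalyr's actual implementation (which the comment says is in Java); `TaggedStr`, `CrossAccountLeak`, and `render_for_account` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedStr:
    """A string value paired with the account that owns it (hypothetical type)."""
    account_id: str
    value: str


class CrossAccountLeak(Exception):
    """Raised when a value tagged for one account is about to reach another."""


def render_for_account(account_id: str, values: list[TaggedStr]) -> list[str]:
    """Return the raw strings, verifying every tag matches the requesting account."""
    out = []
    for v in values:
        if v.account_id != account_id:
            # Assumed policy: fail loudly rather than silently dropping the value.
            raise CrossAccountLeak(
                f"value owned by {v.account_id!r} reached account {account_id!r}"
            )
        out.append(v.value)
    return out
```

The check sits at the output boundary, so a bug anywhere upstream that mixes accounts would surface as an exception instead of a leak.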

We are sometimes asked for an on-premises installable version of our service. We don't provide that because we're using economies of scale on the backend to completely change the log management experience: when you give us a query, every CPU and spindle in our entire cluster is briefly devoted to that query. This means that you aren't limited to graphing predefined metrics; you can do ad-hoc exploration of your entire log corpus on the fly. E.g. display a histogram of response latencies for all requests for url XXX on server group YYY in the last 48 hours, and expect a near-instantaneous response.
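The example query above boils down to a filter plus a bucketed count. A single-machine sketch follows, assuming log records are dicts with hypothetical fields `url`, `group`, `time`, and `latency_ms`; it says nothing about how the real backend distributes the work across a cluster.

```python
from collections import Counter
from datetime import datetime, timedelta


def latency_histogram(records, url, server_group, now, window_hours=48, bucket_ms=100):
    """Count matching requests per latency bucket (key = bucket lower bound in ms)."""
    cutoff = now - timedelta(hours=window_hours)
    hist = Counter()
    for r in records:
        # Keep only requests for the given URL and server group inside the window.
        if r["url"] == url and r["group"] == server_group and r["time"] >= cutoff:
            bucket = int(r["latency_ms"] // bucket_ms) * bucket_ms
            hist[bucket] += 1
    return dict(sorted(hist.items()))
```

The interesting part of the claim is not the query itself but answering it over an entire log corpus near-instantaneously, which this sketch does not attempt.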


>what would it take for you to trust an external service to manage your logs?

I think that's a really weird question that completely fails to address the concerns that some people might have. We do logging of sales, profit margins and stuff like that. You can't have access to that because: "You're not us". If you can read our data, then we're not going to use your service, and to do anything useful with the logs you really do need read access.

Of course you might have no reason to spy on our data, but the only safety is that you promise not to. We could separate logs for different things, so webserver logs go to you but email logs go to an internal system, but then we would need two systems.


Do you ever email about this data or put it into spreadsheets on Google's servers?


What's your point there: that because they are already exposed in some areas they shouldn't care about being exposed in more? Or are you looking for an "a-ha" moment: "you're already compromising that data"?

Because either way, that doesn't change at all the concerns he voiced regarding this particular service.


I suppose both? I think it's pretty reasonable to expect a business dealing with storing sensitive data to not look at that data, regardless of whether it is email or logs.


> what would it take for you to trust an external service to manage your logs?

An act of God. This implies, of course, certain further prerequisites that are themselves probably quite challenging to meet.

> economies of scale

Under your current plans, the money spent putting my entire infrastructure on Scalyr would pay 2-3+ developers (or some mix of developers and sysadmins) in Taiwan, where my employer is based. We are not a tiny company, but that would be a substantial and welcome manpower increase for the server team, and only a fraction of that manpower would need to be spent meeting even our "would be nices" for logging and monitoring.

This would only get worse as we grew. Realistically, I would expect us to be able to open an office in Silicon Valley and start hiring there at market rates for the amount of money we'd be giving you.


>what would it take for you to trust an external service to manage your logs?

For a publicly traded company, a relaxation of relevant law to allow arbitrary data to leave the company.


> what would it take for you to trust an external service to manage your logs?

Well the no-brainers are:

1 - Encrypt everything before leaving my infrastructure (with my key, not SSL), only decrypt it in my infrastructure when generating reports.

2 - Anything that runs on my infrastructure is open source, and widely distributed. Bonus points for a simple protocol that I can write plugins for.

3 - Make it possible for me to back everything up, and restart everything in case you go out of business.

Those are the must-have "I won't even let you get through security otherwise" features. Now that I think about it, #3 alone makes anything you can offer worse than doing it in-house.

But none of those are features that'll make your system look any good in my eyes, they are just enough for your system to not look like an enemy.


You are using economy of scale, but you're not serving customers that are afraid to give you important data, which is not economical at all.


Completely valid points. Here's how we're dealing with each:

1. With our custom parser you can replace or delete confidential information from your logs before they're stored on our servers.

2. We're exploring a different pricing structure right now that would address this scenario. If this is your only hesitation, I hope you check us out anyway and contact us about pricing.


Custom parser won't guarantee that all of the data is dumped...and some of it you may even want in the system.

Also -- lots of companies are seriously concerned about pushing their data externally. Making it a hard sell.

However, as a long tail service, this looks great.


> With our custom parser you can replace or delete confidential information from your logs before they're stored on our servers.

This is both error prone and utterly defeats the purpose. Why would I pay a bunch of money for somebody else to manage my logs when I'd just have to keep them all anyway so I can get at the unredacted versions when there's a problem?


Does the parser run on the client? If not, it defeats its own purpose.


Yes, on the client. See "Redaction" at scalyr.com/agent.
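For anyone wondering what client-side redaction amounts to in practice, here is a rough sketch: a list of regex rules applied to each log line before it is shipped anywhere. This is not Scalyr's agent configuration (see their docs for that); the rules and the `redact` helper below are made up for illustration, and as noted upthread, pattern-based redaction can miss things.

```python
import re

# Hypothetical redaction rules: (compiled pattern, replacement text).
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),   # email addresses
    (re.compile(r"\b\d{13,16}\b"), "<card>"),               # card-like digit runs
    (re.compile(r"(password=)\S+"), r"\1<redacted>"),       # password=... values
]


def redact(line: str) -> str:
    """Apply every rule in order and return the scrubbed line."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(redact("user=bob@example.com password=hunter2 card=4111111111111111"))
```

Because this runs on your own server, the unredacted line never crosses the network; the trade-off is that the hosted copy is then less useful for debugging anything that depends on the scrubbed fields.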


Agreed. Contracting out your monitoring seems to me to defeat the point of running your own infrastructure.


So what happens when my email server goes down? One of the huge advantages nagios has is that there are plugins that'll send SMSes, plugins that'll send me phone calls. Hell, people have written scripts that let them call to get nagios alerts [1]. One of the huge advantages of having something hosted in-house, and with nagios, is that it can be configured to a level of precision I can't see your tool coming close to achieving.

Yes, Nagios' configuration is ugly and occasionally requires sacrifices to elder gods. At the same time, I've never found any sort of monitoring/alerting I've needed done that it can't handle. As much as your service looks cool for a specific subset of monitoring, it is still missing half the hooks that explain why nagios is stubbornly sticking around.

[1] http://www.googlux.com/callnagios.html


There's a distinct lack of images on that website. I for one would like to see how I would diagnose problems without opening 5 tabs. A "demo" mode would be even more helpful.


I was about to say the same thing... not only are there no images, but there is such an overwhelming amount of text and bullet points on the "About" and "Feature" pages.

This looks more like documentation and less like a product landing page.

Here are some nice examples that might help: http://land-book.com/

https://www.gosquared.com/ (pulled from the first page of land-book)


We're releasing a new design in a few days, precisely because of past feedback like yours. It gets more to the point and has more UI screenshots. Thanks for the comment and examples!


Cool, you're welcome, and good luck!


Thanks for the feedback! Demoing a log management product is a bit tricky, because it's hard to come up with a representative data set that isn't sensitive. FWIW, we have put together a demo based on data scraped from the Github API -- it's not a typical server log data set, but it can be fun to explore:

https://www.scalyr.com/login?prefillEmail=demo-account%40sca...


https://www.scalyr.com/dash?page=Github-Statistics is broken. "Operation not permitted (Read Configuration permission required)"


Oop -- thanks! We'll sort this out, might take a little while though.


Yeah, no, not what I want to hear from the people hosting my logfiles.

There's your answer to "Why won't you host your server logs (which are usually key for troubleshooting flaming boxen)?"


As I probably should have clarified, this issue is specific to a single page in the demo, which we do not even link to at the moment (aside from a couple of older posts on our blog). Yes, it is embarrassing, and I apologize. However, if this issue had been standing between a customer and their data, we would have scrambled instantly.

Everything in life is a tradeoff. If you entrust your logs to us, you run the risk that we have an outage or failure of some sort. On the other hand, internal systems can fail as well. We hope to serve people who prefer not to carry the responsibility of maintaining their own monitoring infrastructure, and/or are interested in the features and performance we provide.


Fair enough. :)

Out of curiosity, what is your target customer? People running their own dedis are probably willing to grudgingly set up a monitoring solution, and people who are just using the ~=cloud=~ probably don't care.

What's your ideal customer?


Typically, someone who is already using some form of cloud-based infrastructure. From there, it's a fairly easy step to send your monitoring data to a specialized service. And cloud infrastructure can pose monitoring challenges (servers coming and going; multiple systems logging different types of data; unpredictable I/O performance causing problems for monitoring backends) that we help with.


> If you entrust your logs to us, you run the risk...[of] failure of some sort.

Well, yeah.


Maybe I'm missing something but this just looks like log analysis (akin to Splunk) and not actually server monitoring? (active health checks, notifications, snmp, etc) Pricing seems wonky too...is it cheating the licensing model to aggregate logs on a single syslog server and submit from a single agent?


Log analysis is the heart of the product, but we also gather system metrics, provide notifications (scalyr.com/helpalerts), and we recently rolled out a basic active-checks feature (scalyr.com/helpMonitors). There's lots more to be done; for instance, we don't have any SNMP support today. But the vision is to be a full-spectrum tool, and we're actively working toward that.

As for the licensing model: we're going to move to per-GB pricing anyway, so no worries there. If you'd like something more concrete today, e-mail us at contact@scalyr.com and I'm sure we can sort out any pricing concerns.


Regarding the pricing, we're in the process of revising the pricing structure so that it's based more on volume instead of how many servers you have. In that case, you wouldn't need to cheat. :)


This is interesting for us as well... We currently have "only" 50 servers (mostly virtual) that need to be monitored; per-server pricing would push the cost far too high regardless of the quality of the solution.


May I email you about this? The pricing structure will be changed soon to accommodate cases like yours--there are many of them--so if that's the only thing keeping you from trying Scalyr then I'd love to chat.


We've already started using a competitor's product (Logentries) and we're happy with it, so we're not looking to Scalyr or other log management solutions at the moment. Thanks for listening to feedback though!


The price is absolutely insane and a non-starter.


We've been hearing that loud and clear. :)

FWIW, the price actually works quite well for a lot of people. On a GB-for-GB basis, we're actually much cheaper than other hosted log management solutions -- we work hard on backend efficiency and we pass that along. But yes, if you're using small virtual servers then the pricing model breaks down. We originally went with this model to provide more predictability; log volume is often more volatile than server count. We've heard enough complaints that we've decided to move to a more volume-based pricing model; we're just working out the details.


Thanks for the reply... as an ops guy, there are a ton of layered problems with running a highly elastic infra on something like AWS:

1. Dynamic registration of ephemeral systems with a monitoring platform.

2. Security monitoring of same

3. Meaningful graphing

When we are optimizing the purchase of hundreds upon hundreds of spot instances daily, where we are looking at grabbing hosts for just a couple cents an hour, the per-host fee model of things like StackDriver, CloudPassage and your service is completely a no-go.

I don't have a good idea how these should be priced; but I think it's important for people to understand all the other costs associated with having a solid management platform for your environment that covers all the bases and doesn't require another round of funding! :)



