* Disclosure: I sometimes contribute to NSQ.

We use NSQ at Dell for the commercial side of dell.com. We've been in Production with it for about 2 years.

> what are the typical use cases for NSQ ?

In the abstract, anything that can tolerate near-real-time, at-least-once delivery and doesn't need ordering guarantees. NSQ also features retries and manual requeuing. It's typical to think ordering and exactly-once semantics are important, because that's how we tend to think when we write code and work with (most) databases, and ordering lets you make more assumptions and simplify your approach. But those guarantees typically come at the cost of coordination or a bounded window of guarantees. Depending on your workload, or on how you frame the problem, you may find ordering and exactly-once semantics aren't that important, or that they can be made unimportant (for example, by making message handling idempotent). In other cases ordering is important and worth the tradeoff; our Data Science team uses Kafka for those cases, but I'm not familiar with the details.

Here are some concrete examples of things we built using NSQ, roughly in the order they were deployed to PROD:

- Batch jobs which query services and databases to transform and store denormalized data. We process tens of millions of messages in a relatively short amount of time overnight. The queue is never the bottleneck; it's either our own code, services, or reading/writing to the database. Retries are surprisingly useful in this scenario.

- Eventing from other applications to signal that a pinpoint refresh of some data in the denormalized store is needed (for example, a user updates a setting in their store, which causes a JSON model to update).

- A purchase order message queue, used both for retries and for simulating what would happen if a customer on a legacy version of the backend were migrated to the new backend; also for verifying that a set of known-good orders continues to be good as business logic evolves (regression testing).

- Async invoice/email generation. This is a case where you have to be careful with at-least-once delivery: you need a correlation ID and a persistence layer to define a 'point of no return', past which the message must never be processed again, even if it fails. We don't want to email (or bill) customers twice. (There's a sketch of this pattern after the list.)

- Build system for distributing requests to our build farm.

- Pre-fetching data and hydrating a cache when a user logs in or browses certain pages, anticipating the likely next page so the user doesn't have to wait on an expensive service call. The client in this case is another decoupled web application; the application emitting the event is completely separate and likely on a different deployment schedule from the consuming application. The event tells us what the user did, and it's the consumer's responsibility to decide what to do about it. This is an interesting case where we use #ephemeral channels, which disappear when the last client disconnects. We append the application's version to the channel name so multiple versions running in the same environment each get their own copy of the message and process it according to that binary's logic (also sketched below). This is useful for blue/green/canary testing, and also when we're mid-deployment and have different versions running in PROD, one customer-facing and one internal still being tested. I think I refer to this image more than any other when explaining NSQ's topics and channels: https://f.cloud.github.com/assets/187441/1700696/f1434dc8-60... (from http://nsq.io/overview/design.html).
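Here's a minimal sketch of the 'point of no return' pattern from the invoice bullet above, assuming a hypothetical Store.Claim that atomically records a correlation ID (all names are made up for illustration, not our real code):

    package invoices // hypothetical names throughout; a sketch, not a real implementation

    import (
        "encoding/json"
        "log"

        nsq "github.com/nsqio/go-nsq"
    )

    type Invoice struct {
        CorrelationID string `json:"correlation_id"`
        Email         string `json:"email"`
    }

    // Store stands in for the persistence layer. Claim must be atomic
    // (e.g. an INSERT guarded by a unique constraint) and return true
    // only for the first caller with a given correlation ID.
    type Store interface {
        Claim(correlationID string) (bool, error)
    }

    type Handler struct {
        Store Store
    }

    func (h *Handler) HandleMessage(m *nsq.Message) error {
        var inv Invoice
        if err := json.Unmarshal(m.Body, &inv); err != nil {
            return nil // malformed; returning nil FINs it so it isn't requeued
        }
        claimed, err := h.Store.Claim(inv.CorrelationID)
        if err != nil {
            return err // transient failure; returning an error makes NSQ requeue
        }
        if !claimed {
            return nil // an earlier delivery already crossed the point of no return
        }
        // Past this point the message must never be processed again.
        if err := sendEmail(inv); err != nil {
            // Deliberately NOT requeued: a retry could double-send.
            // Recovery past the point of no return happens out of band.
            log.Printf("send failed for %s: %v", inv.CorrelationID, err)
        }
        return nil
    }

    func sendEmail(inv Invoice) error {
        // ... render and send the invoice email ...
        return nil
    }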
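And a sketch of the version-suffixed #ephemeral channel trick from the last bullet, using go-nsq; the topic name, version string, and lookupd address are invented for the example:

    package main

    import (
        "fmt"
        "log"

        nsq "github.com/nsqio/go-nsq"
    )

    const appVersion = "1.4.2" // hypothetical; normally stamped in at build time

    func main() {
        // Each deployed version gets its own channel, so every running
        // version receives its own copy of each message. The #ephemeral
        // suffix means the channel vanishes when the last client
        // disconnects, so nothing queues up for retired versions.
        channel := fmt.Sprintf("cache-warmer-%s#ephemeral", appVersion)

        consumer, err := nsq.NewConsumer("user_activity", channel, nsq.NewConfig())
        if err != nil {
            log.Fatal(err)
        }
        consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
            // The event says what the user did; this binary decides
            // what (if anything) to pre-fetch for it.
            log.Printf("warming cache for event: %s", m.Body)
            return nil
        }))
        if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
            log.Fatal(err)
        }
        <-consumer.StopChan
    }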

Operationally, NSQ has been not just a pleasure to work with but an inspiration for how we develop our own systems. The value of being operator-friendly cannot be overstated.

Last thing, if you do monitoring with Prometheus I recommend https://github.com/lovoo/nsq_exporter.


A point of clarification for anyone skimming the original article:

> Top Tip — Libraries should never vendor their dependencies.

Peter goes on to clarify:

> You can carve out an exception for yourself if your library has hermetically sealed its dependencies, so that none of them escape to the exported (public) API layer. No dependent types referenced in any exported functions, method signatures, structures—anything.

I think this is the way to go if you're writing a library which has its own dependencies. You get a repeatable build and free yourself to change which dependencies you rely on without impacting users of your package.

There are exceptions, such as if your dependency has an init which only makes sense to run once. Loggers come to mind, where the setup should be determined by the main package. The f.Logger point in the article is friendlier to users of your package than just using log.Printf, and frees you from having to vendor logrus, for example, if you want to support structured logging.
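A minimal sketch of the pattern, with a hypothetical package and names: the library exposes a tiny logging interface and a field for main to fill, so it never imports (let alone vendors) a concrete logging package:

    package mylib // hypothetical library package

    // Logger is the minimal surface the library needs. A *log.Logger
    // from the standard library satisfies it directly, and logrus can
    // satisfy it via a thin adapter; mylib imports neither.
    type Logger interface {
        Printf(format string, v ...interface{})
    }

    // Client does the library's work; main decides what Logger is.
    type Client struct {
        Logger Logger // nil means stay silent
    }

    func (c *Client) logf(format string, v ...interface{}) {
        if c.Logger != nil {
            c.Logger.Printf(format, v...)
        }
    }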


> having no stack for errors is insanely frustrating.

Check out https://github.com/pkg/errors

If you want richer control over how the stack trace is rendered, there's https://github.com/go-stack/stack
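For example (the file path and function here are just for illustration): wrap at each layer with pkg/errors, then print with %+v to get the message chain plus the recorded stack.

    package main

    import (
        "fmt"
        "io/ioutil"

        "github.com/pkg/errors"
    )

    func readConfig(path string) error {
        if _, err := ioutil.ReadFile(path); err != nil {
            // Wrap annotates the error and records the stack at this call site.
            return errors.Wrap(err, "reading config")
        }
        return nil
    }

    func main() {
        if err := readConfig("/no/such/file"); err != nil {
            fmt.Printf("%+v\n", err) // prints the wrapped chain with stack frames
        }
    }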


Yeah, we've started using that. Kind of amazing that you need an addon lib & Wrap() everywhere to make errors useful. Just more boilerplate! :)


> Coming from pretty heavy background in MSSQL internals, this article is really great.

Do you know of any good resources for the undocumented function fn_dblog? I'm looking to understand the structure of RowLog Contents and Log Record in different Operations/Contexts to reconstruct DDL/DML.

http://www.sqlskills.com/blogs/paul/inside-the-storage-engin... is a good example but is really just an introduction.


I have a question I haven't been able to find an answer to, hopefully someone here can help.

Why is HMAC+(hash) considered secure while being considerably faster than, say, bcrypt with a cost of 12? For example, if a service used a user-provided password to validate a "secret" (what would normally be the signed message), is that less secure than bcrypt? If so, what makes guessing the secret used in HMAC difficult?


Presumably you will use a key-stretching algorithm before applying HMAC.

[edit]

So yes, directly using a password in HMAC is a bad idea, and less secure than using some function designed for deriving keys from passwords.

You are confusing two things about sha-1. Assuming a 100% secure cryptographic hash function that is as fast as sha-1, you should not use it directly for hashing passwords (though it can be part of a larger construction like PBKDF2).

This is because the number of passwords you can check per second in an offline attack is tied to the speed of the hash function, and bcrypt (like PBKDF2, scrypt, and Argon2) is essentially a way of slowing the hashing process down.

Similarly, md5 would likely be roughly as secure as sha-256 for protecting passwords when used in PBKDF2, because the known weaknesses of md5 don't help with a dictionary attack against a hashed password.

If you have a high-quality key then HMAC is secure without needing the hash function to be slow, so if you wanted to use a password with HMAC, you would first use a KDF to generate a high-quality key from the password, and then use the key in hmac. This is similar for any cryptographic tool that wants a high-quality key as input (i.e. most of them).

[edit2]

A shorter answer is that HMAC is secure and fast because its input key already has sufficient entropy. bcrypt is slow because its whole point is to make it difficult to attack a key with low entropy.
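In Go terms, a sketch of that two-step recipe might look like the following; the PBKDF2 iteration count is illustrative, and the salt should be random and stored alongside the result:

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"

        "golang.org/x/crypto/pbkdf2"
    )

    func main() {
        password := []byte("correct horse battery staple")
        salt := []byte("a-random-per-user-salt") // generate with crypto/rand in practice

        // Step 1: stretch the low-entropy password into a 32-byte key.
        // The iteration count is what makes offline guessing slow.
        key := pbkdf2.Key(password, salt, 100000, 32, sha256.New)

        // Step 2: HMAC with the derived key. This part can be fast,
        // because the key itself now resists dictionary attacks.
        mac := hmac.New(sha256.New, key)
        mac.Write([]byte("the message to authenticate"))
        fmt.Println(hex.EncodeToString(mac.Sum(nil)))
    }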


Thanks for the varying levels of explanation (thanks to viraptor too). I think part of the reason I was confused is because GitHub's web hook setup allows for a supplied shared secret which, based on what I understand from above, is not as secure as it could be unless the user ensures the shared secret has sufficient entropy. If I'm still not getting it please let me know. Thanks again.


A quick 30-second scan of the webhooks docs suggests that's correct: if you used e.g. 12345 as your secret, you would be susceptible to a dictionary attack from anybody able to record a message; on the order of tens of billions of keys per second can be tested with a multi-GPU setup.

I suspect that the web hooks typically run over TLS, so recording the plaintext of a request would be a challenge in and of itself.


If your shared secret is vulnerable to brute forcing, it's vulnerable to brute forcing. An easy fix for this: generate your shared secret by hashing or salthashing a low-entropy password.

As a general rule though, HMAC is used with randomly generated secrets. I don't know why GitHub doesn't just generate the secret and tell you what it is.

Amazon's implementation is much more correct.
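For comparison, generating a high-entropy secret takes only a few lines in Go (and the receiving side should verify signatures with hmac.Equal for a constant-time compare):

    package main

    import (
        "crypto/rand"
        "encoding/hex"
        "fmt"
    )

    func main() {
        // 32 random bytes ~= 256 bits of entropy, far beyond the reach
        // of the multi-GPU dictionary attacks described above.
        secret := make([]byte, 32)
        if _, err := rand.Read(secret); err != nil {
            panic(err)
        }
        fmt.Println(hex.EncodeToString(secret))
    }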


I'm not a crypto master, but as I understand it, while it's possible to find an md5 collision, it's many times more difficult against HMAC, because the construction is:

    hash((secret XOR outer_pad) || hash((secret XOR inner_pad) || message))

so you would have to find a collision under one padded key that lines up with a collision under the other, which puts you back to relying on a generic birthday attack rather than any specific weakness of the hash.

If you're looking for an actual proof, it's at http://cseweb.ucsd.edu/~mihir/papers/hmac-new.html


bcrypt and HMAC fill different roles and have different security properties: bcrypt is a key-derivation function, and HMAC is a message authentication code. They're not comparable. In particular, HMAC should be used with a high-quality key; bcrypt is for deriving high-quality keys from lower-quality keys.


Please remove the access to Private Repositories, or make it optional.


https://api.monosnap.com/rpc/file/download?id=kelMhl0A6kbbaN...

Haha, yes this is a pretty high bar of entry.


What, you don't trust me with ALL YOUR REPOS? Haha. I know, I mentioned this permissions thing in the blog post. Building it for myself, that scope is the only way I could read contributor stats for my company's private repos. Definitely don't need 90% of the other things, write access especially. Optional private repository access seems like a good solution.


That's what made me not do it. To be fair, I wrote a Github app once where I wanted read access to public and private repos, and I couldn't find anything in the Github API to give me specifically that. I had to request read+write for public+private, which is horribly permissive and gave me access to a bunch of stuff my app didn't need.


Yeah, I wish there were more fine-grained controls for permissions, i.e. just let me access meta-data like stats for repos (because that's all this app really needs).


Good suggestion, I like the idea of making it optional.


and why does it need write access?


Even the wiki is a git repo, though there's no built-in search like there is for repositories and issues.


One of my most often used Google queries is "time in [city]"


Oct 4: Apple launches Siri

Oct 5: Steve Jobs dies

  One kind of side note. On October 5th, Steve Jobs died.
  He had been involved in a lot of the process leading up to it.
  We know that he was watching this launch from his house.
  I don't know what he thought about it, but I like to project
  that he saw it, said "It is good. This is the future, Apple's
  in the middle of it. I can go now." I don't know if that's true,
  but that's a projection that I like to put onto it.
I suppose this is the kind of statement you could expect from the creator of a predictive personal assistant, but wow.


I too found this extremely gratuitous.


> no good IDE

LiteIDE is open source, cross-platform and pretty enjoyable. https://code.google.com/p/liteide/


It's the one I'm currently using, but calling it enjoyable is really excessive. You can't compile or get any kind of completion while your program is running, you can't define a script to be called when hitting "run" (it always tries to launch your current file), and it doesn't have any kind of macro or refactoring features, etc.

Coming from Xcode and PyCharm, it still feels a decade behind.


+1

