Post Mortem: A single whitespace character (eatabit.com)
333 points by goleksiak on Oct 27, 2014 | 205 comments



Likely "Cowboy" is a transparent proxy added by your mobile service provider. I had a similar thing happen a year ago when the mobile provider used by most of our barcode scanners decided to add a transparent proxy into the loop (without telling anybody).

The solution for this problem: Use SSL.

I mean: There are already many good reasons to use SSL, but whenever you need to send any kind of mission-critical data over the mobile network, you practically must use SSL if you want any kind of guarantee that the data you send to the server is what actually reaches the server (and vice versa).

Here's my war story from last year: http://pilif.github.io/2013/09/when-in-doubt-ssl/


Cowboy is the name of an Erlang web server and Heroku uses Erlang for their routing. I imagine the reason Cowboy is showing up is due to Heroku's routing layer.


I would agree, but the fact that it suddenly stopped working points to carrier interference rather than Heroku (because, as a developer-focused company, they would have informed the OP's company of the change).


It seems it is on Heroku's end. https://news.ycombinator.com/item?id=8515164


Heroku isn't in a frozen state. I'm sure that they change the code daily. The fact that it suddenly stopped working doesn't really point to any specific endpoint unless you know which ones have changed. Then it might mean something.


Or perhaps they would have noticed increased error rates for a client and investigated. Monitoring, right?


We would really like to use HTTPS but it's not supported by the Arduino chipset as I understand it. Though I'm not the hardware guy here at eatabit...


Why not ROT13? Or a simple substitution cypher?

Not trying to be silly. But if the only goal is to prevent man-in-the-middle attacks such as someone mangling the data, why not "corrupt" the data such that the phone company in the middle can't read it?

You control both ends. You can make your own "security".

You're not explicitly worried about security. You're not worried about Evil Person reading your messages. You just want your carrier to stop f'ing with your data.

If the data is slightly corrupted so the carrier's crappy software can't recognize it as HTTP headers, then the carrier's software (hopefully) won't fck with it.
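
Something along those lines, as a minimal sketch (the key byte and request line here are made up): XOR the buffer with a shared key on both ends. To be clear, this only confuses middleboxes that pattern-match on HTTP; it is no defense against an actual attacker.

  #include <stddef.h>
  #include <stdio.h>

  /* XOR every byte with a shared key. Applying it twice restores the
     original, so the same call scrambles and unscrambles. */
  static void scramble(unsigned char *buf, size_t len, unsigned char key) {
      for (size_t i = 0; i < len; i++)
          buf[i] ^= key;
  }

  int main(void) {
      unsigned char msg[] = "GET /v1/printer/1/orders.txt HTTP/1.1";
      scramble(msg, sizeof msg - 1, 0x5a);  /* no longer parses as HTTP */
      scramble(msg, sizeof msg - 1, 0x5a);  /* back to the original */
      printf("%s\n", msg);
      return 0;
  }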


They could try a different port - some systems won't bother.

They might also use TLS with null cipher. That should be not-so-intensive, even on a tiny processor. And it could be enough to defeat some packet-modifiers (they may notice it's TLS and not analyze), while maintaining HTTPS compatibility.


Last I checked, you have to use some sort of special sockets add on to use raw TCP instead of HTTP over TCP with heroku. So you are making your heroku setup more complicated and potentially more expensive since they sometimes charge for add ons. It may disable some of their routing and load balancing capability as well. It is kind of silly to start rewriting standard transport layers anyway since you are going to spend a long time doing that instead of working on your product.


You guys build all the hardware - why not just use one of the pre-shared key TLS ciphersuites? No expensive public-key crypto required, just AES and SHA.


Last time I checked (about a year ago), support for PSK was pretty abysmal on the server side of things.


Why don't you guys just use a Beaglebone Black ($50), M2M cape (http://www.yantrr.com/products/m2m-cape-for-beaglebone), and a thermal printer (https://www.sparkfun.com/products/10438). BOM for that, plus project enclosure, is like $200. That's also just after a bit of quick googling...there's bound to be a much cheaper solution.

What's your price point for hardware?


We are at about $200 now for hardware. We have a custom PCB and 3d printed case. We also have an LCD and some control knobs in the mix. Checking out the Beaglebone stuff now...very interesting since there are basically no 3G/4G modems for Arduino right now...


I'm just curious why the default response is still to reach for an Arduino--much more powerful SoC chips are cheap these days.


For us, the reason is that my co-founder and I (neither of us a hardware guy) were able to build a proof of concept in my garage, and Arduino seemed like the best (easiest) choice. Since then, we hired a hardware guy who designed a custom PCB etc. HTTPS would be great but we don't transmit any personal data so it's not a high priority right now.


> HTTPS would be great but we don't transmit any personal data so it's not a high priority right now.

You are sending people's orders around the web. I'd consider that "personal".

Nonetheless, use SSL; there is little reason not to use it these days. And as others have pointed out, it's the only good and easy way to guarantee that what you send to one of these printers is what it actually receives (no carrier tampering with your packets, etc).

Just use SSL.


If SSL isn't supported by the Arduino chipset, then that sounds like more than a "little" reason not to use it. That sounds like it might be an "it would be a whole lot of work" reason not to use it.

(I don't actually know how much work would be involved, but goleksiak says they would really like to use it, so I assume it's not trivial.)


SSL on the Arduino--usually a little 8-bit (!) chip--is a bit resource intensive, in both memory and processing time.


Hey, no worries. Real artists ship. :)


One reason to still use an Arduino is realtime. The arduino's minimalist OS is realtime by default, but the BBB runs non-realtime Debian. They might need RT timings for the printer interface? You can add realtime linux extensions to BBB, but it's not a beginner task -- basically rolling your own Linux installation and writing your app as a kernel module. You can also use the PRU on the BBB to get insane RT performance, but you'll have to code it in assembly. I love the BBB, but Arduino is still an easier package for simple realtime.


Yeah, like any number of Olimex parts. Or a Raspberry Pi Model A.


larger pool of cheap developers, and a hook onto an ongoing popular trend (arduino gizmodery), cheap prototyping.


if you buy a batch of beaglebones from adafruit.com or another, perhaps larger mid-tier distributor, you can likely get a discount per unit.


SPDY is an alternative to HTTP that avoids at least Verizon's carrier shenanigans.



Your network stack is being handled by a GSM/GPRS module; it might support SSL. Using Arduino was easy in this case, but I would have gone with something a little more substantial, like an STM32F4.


Then, don't use SSL if it's too heavyweight.

I know everyone will tell you not to roll your own cryptosystem, but rolling your own is superior to having no encryption or authentication, and so long as you're sane about it the result should be no worse than passing plaintext.

Your messages are small. Encrypt (or maybe just sign) them with RSA and call it a day. You don't really need to use port 80 and a HTTP preface at all, do you?


That is pretty bad advice. RSA is slow, needs a lot of memory and is difficult to get right. Just go with AES in CTR mode if you absolutely have to. And remember that encryption != authentication.


Who was replacing numbers with asterisks, and for what purpose?


That was a "privacy" mode of some personal firewall that was protecting the user's phone number from leaking out.

Incidentally, this leaked the user's phone number, because only that specific number was being replaced with asterisks.

Welcome to the world of very crappy "security" (-theater) end-user products.


The asterisks hide this request's BasicAuth credentials between the cellular printer and our app servers...we don't transmit customer phone numbers between the printer and our app servers. Twilio transmits them to our app servers, but that is over HTTPS.


I think pavel_lishin was referring to a side note in the war story I linked in the parent comment.


Yup.


ah - my bad


I didn't want to hang our BasicAuth creds out to dry


You were sending them over http, so aren't they already kinda fucked?


y, true


Oh, I was asking pilif to comment about their experience.


What's wrong with a transparent proxy? Isn't that how HTTP caching is supposed to work? I would think the cache headers are the solution rather than SSL.

It feels like you are kind of throwing the baby out with the bathwater. IMHO, a badly configured transparent proxy does not mean the concept is bad, does it?


No, "transparent proxying" is a clear violation of HTTP specs (as well as TCP protocol, and IP's "thou shall not mess with packets in transit" principle/specs). It's essentially a MITM attack and all bets are off wrt correctness.


We'll disagree then :)

From RFC 2616 "The HTTP/1.1 protocol allows origin servers, caches, and clients to explicitly reduce transparency when necessary."

As I said, bad configurations do not mean the principle is unsound.


No, RFC 2616 uses transparency in a different sense than the common usage of "transparent proxy".

Common meaning (from https://en.wikipedia.org/wiki/Proxy_server#Transparent_proxy): "Also known as an intercepting proxy, inline proxy, or forced proxy, a transparent proxy intercepts normal communication at the network layer"

RFC 2616 uses the term to describe a property of a normal, opt-in HTTP proxy: "A 'transparent proxy' is a proxy that does not modify the request or response"

In the preceding discussion we were using the term in its common usage meaning.

Also, you misrepresent what RFC 2616 says about its concept of transparency. The part you quoted continues:

  "the protocol requires that transparency be relaxed

      - only by an explicit protocol-level request when
        relaxed by client or origin server

      - only with an explicit warning to the end user when relaxed by
        cache or client "


The problem is often the competence of the people handling these proxies. Yesterday I discovered that my ISP is blocking all HTTP DELETE calls (ACT Fibernet in India)!

And before that, they decided to enforce a reverse DNS lookup, so all the name-based virtual servers with local DNS entries stopped working - I had my staging instances set up that way.

And not only does all this happen without any prior information, it is next to impossible to climb through the support layers to finally find someone who even understands what you are talking about. They just want you to restart your modem to "resolve the problem".


Sounds like you need to VPN out of your ISP ;)


Yeah but the DELETE blocking affects any customers on this ISP as well. I am now working on moving to SSL asap.


The problem is that you have no recourse. When the carrier decides to f up your connections somehow, you can try to work around the issue, until they break it some more to the point where the one thing you really needed also stops working.

Then you can hope that you are big enough to have priority with the carrier or you know somebody who knows somebody who can fix it.

Or you don't deal with any of this and just go SSL. A certificate will cost you $100 per year in the worst case. That's about one hour of your time spent fixing proxy issues (not including customers and/or end users breathing down your neck because their software just stopped working for some as yet unknown reason).


Except the carrier, or in this case the hoster, did not break anything. It's their software that doesn't comply with the RFC. HTTP is supposed to be cacheable when it makes sense. Using SSL to work around that breaks the Internet somehow.

To me, there are valid use cases for SSL; using it to work around proxies is not one. That said, I get your point: you prefer the possibly easier and safer way. But you still might run into another set of problems (https://news.ycombinator.com/item?id=8471877).


This would require a vast, vast upgrade of client power to achieve the same communications performance. If you could achieve it all, SSL would also likely decrease reliability over a spotty GSM link.


Is client CPU actually a limiting factor? How does this affect reliability?


You cannot physically fit a whole SSL record (max size 16KB) into 8KB of RAM. SSL requires multiple passes over the data to (e.g.) decrypt and verify a record. At that point, you cannot use standard SSL at either the server side or the client side.

On the subject of reliability: an 8-bit uC running at 16MHz needs a long time to do the public-key crypto required to set up the connection. This means you need the GSM data link to be continuously available for a longer period.


This is true, but if you're using GSM you have a much beefier processor handling the GSM side and there's no need to use a tiny microcontroller. Some of the GSM modules will offload the whole HTTP(S) request for you.

Edit: or you could get a Cortex-M0 with 32K RAM for $2.


SSL supports a null cipher, so why not use that? The handshake alone may be enough to prevent packet inspection.


as someone unfamiliar with their specific workload, but who has used such boards to do VPN/SSL stuff, no -- it's not a limiting factor unless you're trying to skimp on power requirements.


Heroku came back and said:

Looking through the system, I see that you were sent two emails (in August and September) as several of your apps were migrated to the new routing stack (https://devcenter.heroku.com/articles/heroku-improved-router). As mentioned in the documentation, the new router follows stricter adherence to the RFC specification, including sensitivity to spaces.

...and sure enough, there is a line that says:

The request line expects single spaces to separate between the verb, the path, and the HTTP version.

So the lesson is: RTFM

-G


The team at Heroku (where I currently PM) is constantly trying to improve our communication and documentation. We're definitely sorry that this caused problems, and we'll work even harder to make sure that our communication calls out any potential issues. Again - thanks for reaching out to us, and let us know if we can help.


Heroku did their best here. They reached out to us (twice) advising of changes and linking to a document that describes EXACTLY the bug that we discovered (later). Honestly, I don't feel bad about this bug, because even if I had read the alert to the letter, we would not have audited the entire codebase; we don't have that luxury of time. Yes, it took our whole operation down...but we found it, fixed it, and now we're back up. ...it's all in the game. -@eatabit


That's all true but once you start accepting illegal input on a protocol for a long enough time you can't just suddenly go and break things without an automated alert to the customer when that particular thing starts acting up.

After all it would not be that hard to scan for which customers are going to be bitten by that particular change when it actually happens rather than using some fire-and-forget email.


This very example -- requests were technically illegal all the time without devs realizing, but something in the stack changed to start rejecting them -- demonstrates the fallacy of the "be liberal in what you accept, strict in what you issue" principal. If all the web servers involved had been strict in rejecting the illegal request from the start, they would have noticed the bug in development before deploying to firmware in the field.


I don't agree that "be liberal in what you accept, strict in what you issue" is a fallacy. The client actually failed to adhere to the "be strict in what you issue" principal, just as the Cowboy was not liberal in accepting. All software will sooner or later exhibit bugs or be stricter or more lenient about a standard.

I think the fallacy is to assume that once stuff works in production, only your changes can trigger a bug. There's way too much software involved in a standard webserver stack to assume anything about it. Any patch, any update to software or devices not under your control has the potential to break your stack. The thing the OP did was the right thing: Monitor, monitor, monitor.


The liberal/strict thing is a terrible idea. It introduces completely busted behavior.

Consider a client that emits \n instead of \r\n. How do you handle it? Liberally? OK, treat 'em like CRLFs. Now you read \n\n. Everything after that is content, right?

Oops, you're now ignoring headers, potentially security-sensitive ones.

I've run into this exact bug in production, leading to a security problem. The client, proxy, and endpoints had different ways of handling CRLF. Some would treat \n\n as the end of headers, some not. Exploiting this, clients could route requests through the proxy and add special headers that only the proxy should have been able to add (like X-Client-IP).

Apart from this, the whole "robustness principle" just leads to a bunch of guessing and even more incompatible implementations. See HTML as another example mess.
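
A toy illustration of that disagreement (the request bytes are invented): two scanners looking for "end of headers" in the same buffer find it in different places, so they disagree about whether the last header is a header at all.

  #include <stdio.h>
  #include <string.h>

  int main(void) {
      const char *req =
          "GET / HTTP/1.1\r\n"
          "Host: example.com\n"        /* illegal bare LF */
          "\n"                         /* liberal parser: headers end here */
          "X-Client-IP: 10.0.0.1\r\n"  /* ...so this looks like body to it */
          "\r\n";                      /* strict parser: headers end here */

      const char *strict  = strstr(req, "\r\n\r\n");
      const char *liberal = strstr(req, "\n\n");

      printf("strict sees %td header bytes, liberal sees %td\n",
             strict - req, liberal - req);
      return 0;
  }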


> The client actually failed to adhere to the "be strict in what you issue" principal

Well, that's the rub, right? How do you know how strict you're being if your tools accept things liberally? If anything, the lesson here is to test with the strictest possible tools.

> just as the Cowboy was not liberal in accepting

And this is hard too, because on what dimensions should you be liberal? How do you decide what the "real" set of inputs you're going to accept?

And that leads to my real issue with the principle: what should you, as the liberal accepter, do in those cases? Here it's easy enough to guess what the behavior should be with the extra space (just accept the damn request), but in general it's not -- you're creating implementation-specific behavior; what happens when you accept undefined or incorrect inputs will vary from implementation to implementation, creating a nightmare of uncertainty for people sending you stuff. Of course, you can always say, "they should send stricter stuff!" but then what's really the point of accepting inputs liberally?


The problem is that "be liberal in what you accept" is, by definition, saying to go beyond the standards, accepting things that are technically illegal according to the standards.

So different software will necessarily do it differently. For all software to be doing it the same, there would realistically need to be some specified standard on how to do it, and then we're no longer talking about 'be liberal in what you accept', but just 'accept exactly what the standards say.'

Of course, in this case the client software was not being 'strict in what you issue' -- I am not challenging that part, of course you should _always_ issue exactly correct according to standard requests or other protocol communications. But there will inevitably be bugs, bugs happen.

"Be liberal in what you accept" makes it harder to find those bugs, and leaves them waiting to surprise you when the (non-standard) level of "liberalness" on the receiving end changes, which it inevitably will because it was not according to standard in the first place.

I think the HTML/JS/CSS web provides another good example of the dangers of 'be liberal in what you accept', very similarly -- you may think your web page is 'correct' because one or more browsers render it correctly while being 'liberal', and not realize it's in fact buggy and will not render correctly on one or more other past, present, or future browsers. This example has been commented upon by others, and I think has led to a move away from 'be liberal in what you accept' in web user agents. http://books.google.com/books?id=5WXp4j4eV4UC&pg=PA136&lpg=P...


How about this as a middle-ground:

Be strict in what you issue (duh!), be liberal in what you accept - but both emit strong warnings when the input isn't strict, and have a strict mode.


That doesn't work. Strict mode ends up getting turned off by default, or turned off at the earliest problem. After all, what's the point in being so strict? I've seen security bugs arise from this, nicely commented in source with a "// spec says x but no need to be so pedantic".

If everyone can be strict in what's sent, then the problem is solved. But since that won't happen, even on accident, the only solution is to be harsh on receiving input and hope things fail early in the dev cycle.

Also, text-based protocols are especially prone to this poor handling, A: because spec writers (like HTTP's) go moronically overboard, being all creative (line folding? comments in HTTP headers? FFS!) and B: because text is so easy, everyone just figures anything goes and pays less attention.


I'd say the fault with HTML/JS/CSS is that the implementation of the renderer (the browser) broke the stack by not being strict in what it emitted. Put another way, a badly formed page should render badly and/or issue errors. For historical reasons, browsers did not and do not. Hence, the reason the browsers are "broken".


It's quite a game-theoretical problem. Make a strictly standards-compliant browser and nobody will use it, since it won't display most websites. You have to render badly formed pages somehow if you want your browser to compete with other browsers, since they are doing the same.


Or, maybe instead of ballooning this thread with unending hairsplitting, we should recognize the principle as a heuristic that fails on non-representative or extreme cases...


This goes double if you're hosting on Heroku as you won't be able to correlate the changes Heroku makes with issues showing up for you. They're lucky that they hadn't pushed a change at the same time as Cowboy changed, or the debugging could have taken even longer.


I have to agree. I developed a proprietary embedded web server using a streaming HTTP parser, and complying with the HTTP parsing rules is a headache, to say the least. Variable amounts of whitespace, two variants of line terminators (\r\n or \n, with the provision that the latter SHOULD be accepted by the server), and line continuations make complying with the whole specification a real pain when you only have 100 bytes to parse pieces of your request.

Maybe for a server with massive resources (I am talking about megabytes of RAM compared to kilobytes I work with) being liberal in what you accept works, but not when you are on a budget.


Every appearance of SHOULD/MAY in a spec is just begging for bugs or incompatibility. We'd be better off if those words were banned. Spec writers would be less inclined (hopefully) to come up with all sorts of arbitrary behaviour that might happen and could be maybe handled.


Should and may are spec weasel words, in specs there must (hah!) be 'MUST' and 'MUST NOT'. Otherwise a spec is just a piece of rope with a pre-tied noose.


I think Postel's law should be read in the context of “when you cannot control the outside”. It's probably the least-bad option when you are forced to support unknown clients – see e.g. http://daniel.haxx.se/blog/2014/10/26/stricter-http-1-1-fram... for a very recent example – but that clearly doesn't apply in this case where they control both sides, or in many other cases where the number of clients is small and/or there's a solid communication mechanism to tell developers when they need to fix something.


Not "principal", "principle".

Not being critical, just pointing out a common mistake.

http://blog.oxforddictionaries.com/2011/08/principle-or-prin...

Principal: Main, most important

Principle: A rule, a system of belief


It's really interesting to see something like this down-voted. There's nothing pedantic about this. It's offered with nothing but respect. Perhaps the comment writer isn't a native speaker and this was an honest point of confusion. What is wrong with trying to be helpful?

It is a common mistake I see all the time here on HN (along with "your" vs. "you're" vs. "you are"). Why is it offensive to the point of deserving a down-vote? Please help me understand.


I didn't downvote.

People downvote corrections because they're usually noise. When someone makes a typo - and homophones are usually slips equivalent to typos - it's noise to point it out.


It's only noise if it has no value. A post such as mine would not have to appear too frequently for HN readers who might be having difficulties with such words to understand the problem and correct their writing. Not going after perfect English, few of us could approach that. But I see a few common patterns on HN all the time and nobody takes a second to say "hey buddy, just in case this wasn't clear to you, here's a helpful tip". Some of these are confusing to non-native speakers. When trying to be helpful is frowned-upon what are you left with?

This comment needed to be left alone. No down vote, perhaps an up-vote by the comment writer if s/he found it helpful and that's it.

One of the things that continues to disturb me the most about HN is how thin skinned the community seems to be. It is impossible to consistently offer a contrasting point of view here without down-vote attacks that make your point of view virtually disappear. Mind you, this particular post isn't that. It just reminds me that HN is really weird.

I get down voted a lot despite the fact that I am a successful entrepreneur since age 15 who has built several companies and continues to do so. My perspective, however, seems seldom welcome here (based on how often I am down-voted) because I don't tow the line of the 20-somethings that are the bulk of this audience. Instead of learning they choose to pound what they don't like out of existence. Weird.


> But I see a few common patterns on HN all the time

> A post such as mine would not have to appear too frequently

Which is it? All the time or not too frequently?

And while you might only make rare posts some people would point out every error and mistake and difference in style. People downvote your post to dissuade those other posts.

About your downvotes: I'm guessing they're for your incredible arrogance.

https://news.ycombinator.com/item?id=8443553

https://news.ycombinator.com/item?id=8440762

https://news.ycombinator.com/item?id=8440847

People see that level of arrogance as ugly. You might want to either change your posting style or stop complaining about the downvotes.


It's easy to make someone sound arrogant by (a) taking comments completely out of context and (b) not bothering to understand the frame of reference by at least asking the question.

HN only does well with well-defined technical discussion. On everything else it has degraded to almost what happened to every USENET list in the past. USENET did not have any voting mechanism to make opposing views disappear. In that case, those who wanted command of the list and felt ownership of it simply resorted to brutal flaming attacks. Some lists were really horrible places for anyone to say "I disagree".

HN can be like that, in a different way, if you are not a 20-something drinking from the same koolaid bowl. To the point of someone taking the time to take something out of context and then using it to call someone arrogant.

So, come to HN to agree with the herd or risk being called arrogant for presenting a different point of view. Brilliant.


I agree with you 100% about the English correction, but you do an awful lot of complaining about the field rather than admitting that there was something wrong with your play, and occasionally some thinking that you know the demographics of the person you're talking to, and disqualifying that person's participation in the discussion based on that fiction.

Notice that your correction is in the black, but these complaints are in the grey.

Unpopular opinions do have a lot of trouble on HN, but I think that dang's efforts with algorithms and intervention have improved the situation, and at least show good intent.

I attract downvotes like honey attracts flies, but I deserve them. I really disagree, and am happy to repeat myself. I double-down on my most downvoted comments; people may not know quite how much they disagree with me unless I expand on what I said.

>risk being called arrogant

Not very high risk then? Sounds like a very safe place.


I'll admit that being silenced through down votes simply for presenting alternative points of view does not sit well with me and this might lead to less constructive conversations from time to time.

The huge difference I see on HN between someone like me and the HN "crowd" boils down to: life and business experience. I too was an idealist at 22 years of age. And I too thought and said a ton of dumb things for most of my twenties.

I was VERY lucky in that my first job in technology had me working within a team of engineers that were at least 10 years older than me. I was 19 years old when I got hired as a junior engineer. I hadn't finished college yet but I was able to convince the VP of Engineering that I had what it took. And I did. Not being arrogant at all. By 19 I had already designed and built (from raw chips) at least two computers and had presented a paper at an ACM conference.

Anyhow, the education I got from the "elders" was priceless. I am not talking about technology. Yes that was invaluable. No, I am talking about how to be a man rather than a child. How to think instead of reacting. How to question what amounted to indoctrination being dished out by some of the professors at school. How not to come off as a 20-something ignorant moron.

We often had pretty deep discussions about business, politics, ethics and all manner of subjects. I stayed at that job for ten years. It was an education I didn't know I needed. Over the ten years I was there I noticed how my mental process was growing apart from that of my friends. They were growing through their 20's without the benefit of a team of "elders" applying corrections and providing advice on a daily basis. I was in an environment where I had to behave like an adult and think like an adult in a serious organization. To this day I run into circumstances where some of those lessons come back to the forefront.

Surveys set the bulk of the HN audience somewhere in the mid 20's in terms of age. It has been my experience through hiring dozens of engineers and engineering students that kids today are not benefiting from the same level of interaction with adults. Yes, a 20-something man is still a kid. Women tend to grow up a lot faster than we do. Unless the kids have a strong family social group to guide them they can get to their mid twenties and still be complete juvenile morons. I've seen it on more than one occasion.

Culturally the US presents a case where kids are "kicked out of the house" at 18 or thereabouts and go off into the wild to become men and women. In other cultures the family unit stays far more connected and provides a regulating mechanism. For example, the drunken orgies around Spring Break are a uniquely American phenomenon. In other parts of the world no 20-something would even think of behaving like that and then have to answer to their family for the failed moral choices made during that time.

Kids who behave like that have no common sense or manners yet if you spoke to them at any other time (or here on HN) they'd probably tell you that they are perfectly sensible people. Kids like that go to school on their own and suck in all the crap dished out by an educational system permeated with far left extremists in some cases. It isn't my intent to turn this into a political discussion. It is a well known fact that some of our universities have, perhaps by accident, turned themselves into left wing indoctrination centers. And the kids take this shit and make it their own belief system without even making an attempt to question any of it because, well, "lord of the flies" is their environment, self regulation is hard at that age.

Long way to say that if you are an older engineer with more life and business experience posting on HN it is almost assured that the kids are going to pummel you with down votes because, well, they think they know better and are not open to considering any other ideas. On matters of technology they do well because it is often very clear cut. On matters involving life or business experience they just don't have a clue yet they think they do. Instead of taking the opportunity to learn they engage in confirming each other's biases and push back hard on anyone who is not drinking from their koolaid.


> and nobody takes a second to say "hey buddy, just in case this wasn't clear to you, here's a helpful tip".

"hey" should be capitalized, it's at the beginning of a complete sentence. Also, a comma should be before the quotation.

> No down vote, perhaps an up-vote

down-vote and up-vote should at least be hyphenated consistently.

> One of the things that continues to disturb me the most about HN is how thin skinned the community seems to be. It is impossible to consistently offer a contrasting point of view here without down-vote attacks that make your point of view virtually disappear. Mind you, this particular post isn't that. It just reminds me that HN is really weird.

Likewise, down-vote should be hyphenated consistently with the previous use.

> I get down voted a lot despite the fact that I am a successful entrepreneur since age 15 who has built several companies and continues to do so. My perspective, however, seems seldom welcome here (based on how often I am down-voted) because I don't tow the line of the 20-somethings that are the bulk of this audience. Instead of learning they choose to pound what they don't like out of existence. Weird.

I don't get what kind of prank you're trying to pull here. Is it "down voted" or "down-voted"???

Just being helpful!!


Let me guess. You are 15 years old and just ditched school to screw around on HN. Right?

The difference, in case you did not understand it, is that my earlier comment was purely constructive in nature.

Your comment was a juvenile "I'll show him. I'll rip his writing apart and put him to shame".

One is an adult constructive post. The other is what I would not allow from my eight year old kid.


This is true but typos/misspellings/etc still reflect poorly on the author in most situations. After all, isn't the subject of this post a rather critical typo?


Misspellings don't tend to bother me as much these days because of, well, the iPad. Seriously, I hate typing on that thing with a vengeance. The problem is exacerbated in my case (and those of others) because I have to turn off auto-correction. Why?

Because I communicate in multiple languages and auto-correction/completion makes it very difficult. Switching the keyboard back and forth doesn't help either because it isn't uncommon to use more than one language within a single email or comment (in other words, mixing languages).

My little post was about pointing out a mistake in usage that isn't a spelling problem but rather using the wrong words altogether. I see this A LOT in technical websites, writing, job posts and resume's.

Look around and see how many job positions are asking for a "Principle Engineer" instead of a "Principal Engineer". The first is some kind of a moral cop position within the company, I guess; the second is an engineer in charge of a project or department.

But, yes, you are right. If I know that someone is a native English speaker and they have bad typos, misspellings and generally can't communicate well in written form it does reflect poorly. If they are not native it is a matter of their position. I would expect someone with a university degree to not confuse "principle" with "principal" or "your" with "you're" (and other such examples).


Exactly. Totally agreed on all counts :)

Btw that is why I use the somewhat pretentious-sounding "Written on my tablet" or "Written on my phone" in email signatures on devices like that, in hopes that people will re-attribute typos that might otherwise reflect poorly. But it is still a good idea to proofread written communication of any significant value...


Yes! It's downright embarrassing in business communications. My signature says "Please excuse any spelling errors or strange words. This was typed on an iPhone which is a terrible text entry device."

I never had to worry about this when I had a Blackberry with a physical keyboard. In fact, to this day I have never understood why Blackberry didn't create a campaign of really funny TV ads with people sending hilarious or out-of-place text messages because of the issues with screen-based typing. The ad would end with some kind of a catch phrase pointing out that this won't happen to you on a Blackberry.

Ditto for other smart phones. The easiest way to compete with the iPhone is to offer what I will call a "true business smart phone" with a fold-out keyboard for accurate text entry. You could drive that point home with ads until everyone is blue in the face. Typing on a touch screen is a horrible experience.


> resume's


Doesn't it just demonstrate that you shouldn't switch from being liberal to being strict?

For it to hold up, you need to provide the further argument that you frequently need to switch from liberal to strict.


Or... here the problem is that the "be liberal in what you accept" design principle failed to be applied by the HTTP specification writers, who forced a single SP character. It looks like a specification issue to me to use a syntax which is very prone to errors, and not even very visible (you can't easily spot double spaces in protocol traces when checking just with your eyes), and then be strict about it. Even changing the separator, if you want to be strict, already helps, like in "foo|bar|zap" compared to "foo bar zap".

Humans are strange: many will spot "foo||bar|zap" as an error, but not "foo bar zap" as an error as serious as the previous one.


Especially when you have display technologies like HTML that will actively compress whitespace... As happened in your last example.


I agree http://www.win-vector.com/blog/2010/02/postels-law-not-sure-... . Correct code remains correct under various compositions and transformations (that may happen in the future). Code that is working only due to pity often does not have this property. Some Netflix style chaos-monkey that turns on and off strictness during testing would be cool.


In particular, this philosophy is rejected in the Erlang community, where they prefer to "crash if anything is not what you expect it to be".


This doesn't demonstrate a fallacy in "be liberal in what you accept" any more than closed source software demonstrates fallacies in Linus's Law.

The problem wasn't liberal acceptance, it was that liberal acceptance ended when Cowboy was added to the mix.

Strict acceptance would have shown the error earlier, but continued liberal acceptance would have allowed continued functionality.


You mean so long as everyone standardizes around a non-standard, rather than the actual specifications of the standard, it'll work?

I think I prefer just adhering to the standard in the first place.


That's what "strict in what you issue" means


If you are always going to blame the issuer for not being strict... what's the point of the accepter being liberal?

I agree that the issuer should always be strict; and if accepters were strict too, then buggy issuers would be detected immediately and never make it into production. Instead they make it into production, where they will sometimes work and sometimes not, depending on the accepter stack in use at the time and context and how the accepter stack chooses to interpret 'liberal'.


It locks you into the particular "liberal" implementation you started with, or at least significantly increases the risk of changing implementations.

"liberal" by definition here means _beyond the spec_, according to no spec. So different implementations may have different varieties or extents of 'liberalness,' and switching implementations will almost necessarily give you a different set of acceptable requests. If they were all the same, that'd be adhering to some spec, not being liberal in your acceptance of it.

"Liberal acceptance" may or may not have ended -- we don't really know if Cowboy accepts only exactly what is legal according to spec or not -- but the bounds of what is liberally accepted definitely changed. As it necessarily will any time you switch implementations, since 'liberal' is by definition not according to any spec.


The right thing, I think, is to "accept but warn". Like those web browsers that used to show a yellow exclamation mark in the status bar when something was off; web devs could check for this and fix it, but normal users were unaffected. More protocols should include a way to indicate "nonfatal errors".


I recall reading that Postel's law did not mean "accept input that flagrantly ignores the standard", but merely wherever the standard might be read differently, accept all conceivable interpretations of the standard. Unfortunately, I can't remember for sure where I read this, or how authoritative it was.

Postel's original formulation is not written in an essay, but an RFC, and does not elaborate on what he meant: https://tools.ietf.org/html/rfc761

Here's one discussion that suggests this interpretation, without precisely ascribing it to Postel: http://cacm.acm.org/magazines/2011/8/114933-the-robustness-p...


> the fallacy of the "be liberal in what you accept, strict in what you issue" principal

The market (players) (can) manipulate it to create a (perceived) competitive advantage.

It's also one of the places where "evil" in IT comes from.


SIP takes this to the next level. http://tools.ietf.org/html/rfc4475 is a spec for "torture tests", where the SIP authors revel in the hideously complex parsing rules they've come up with (which are basically HTTP parsing).

They even suggest that code should infer the meaning of messages. So I suppose you need some sort of AI to really handle things well.

Binary protocols would be a better choice. Or, a well-defined text format. JSON, XML, anything, really, would eliminate this class of bugs.
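
For comparison, a minimal sketch of that kind of length-prefixed binary framing (the frame layout is invented): one type byte, a big-endian length, then the payload. There is no whitespace or line-ending grammar for a receiver to be "liberal" about.

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* Hypothetical frame: 1-byte type, 2-byte big-endian payload length,
     then the payload itself. A receiver either gets exactly this shape
     or rejects the frame. */
  static size_t encode_frame(uint8_t *out, size_t cap, uint8_t type,
                             const void *payload, uint16_t len) {
      if (cap < (size_t)len + 3)
          return 0;                        /* refuse to overflow */
      out[0] = type;
      out[1] = (uint8_t)(len >> 8);
      out[2] = (uint8_t)(len & 0xff);
      memcpy(out + 3, payload, len);
      return (size_t)len + 3;
  }

  int main(void) {
      uint8_t buf[64];
      size_t n = encode_frame(buf, sizeof buf, 0x01, "orders", 6);
      printf("frame is %zu bytes\n", n);
      return 0;
  }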


So SIP is like IRC?


I imagine IRC has the excuse of wanting the UI to be a simple line-based system, and thus in-band signalling is an unavoidable evil.

This same reasoning is why mail and other messages have the Header: Value format - you can compose in plain text. This is documented at least as far back as RFC 561, in 1973. Again, that's a good excuse: Users must compose by hand. Also, the RFC is just codifying what people were already doing.

HTTP, SIP, and others do not have these excuses. HTTP headers are essentially never written by hand, and the extremely few times they are, we don't need conveniences such as comments, line folding, lenient grammar, etc. (Proof: People write XML and JSON by hand far more often, without major ordeals.)


I think the core issue here is that we're directly manipulating strings instead of using DSLs and tooling based around grammars to build our responses (this has been a solved problem for more than 10 years!).

I'm a strong proponent of "do not manipulate strings". Having library writers be the only one doing that would greatly reduce the attack surface/bug potential.


The Server: cowboy tag is from an Erlang web server:

https://github.com/ninenines/cowboy/blob/master/src/cowboy_p...

I'm guessing somewhere around here would be an interesting place to add a test case.

As far as whose server this is? I'd guess Heroku or AWS. It's plenty possible T-Mobile could have devised some proxy to inspect traffic, but it seems unlikely they would do so with Cowboy.


It's simple enough to single out Heroku:

  $ cat <<EOF | nc example.herokuapp.com 80
  GET /test  HTTP/1.1
  
  EOF
  ----
  HTTP/1.1 505 HTTP Version Not Supported


Your example fails with or without the whitespace. These work though:

Request

  printf 'GET / HTTP/1.1\r\nHost: example.herokuapp.com\r\n\r\n' |  nc example.herokuapp.com 80
Response

  HTTP/1.1 200 OK
  Connection: keep-alive
  Server: SimpleHTTP/0.6 Python/2.7.6
Request

  printf 'GET /  HTTP/1.1\r\nHost: example.herokuapp.com\r\n\r\n' |  nc example.herokuapp.com 80
Response

  HTTP/1.1 505 HTTP Version Not Supported
  Connection: close
  Server: Cowboy


Ah right, forgot about the newline specification. I guess, for reference, the smallest string I can come up with to get Cowboy to spit that error message is '\x20\x20\n'. Parsers are fun.



T-mobile is known to have, in the past at least, used Erlang. And Cowboy is one of the most popular web servers within that community at this point.


Didn't heroku turn off legacy routing last week? I was getting emails warning me to update to their new routing rules by Monday. Seems like it could be related.


Erlang is from Ericsson, a telecom company. T-Mobile is a telecom company. Doesn't seem a stretch that they would use Erlang.


      strcpy( ( char * ) commsOrderBuffer, "GET /v1/printer/");
  
      strcat( ( char * ) commsOrderBuffer, ( char * ) settings.getIMEI());
      strcat( ( char * ) commsOrderBuffer, "/orders.txt  HTTP/1.1\r\n");
      strcat( ( char * ) commsOrderBuffer, "HOST: ");
      strcat( ( char * ) commsOrderBuffer, SERVER_NAME);
      strcat( ( char * ) commsOrderBuffer, "\r\n");
      strcat( ( char * ) commsOrderBuffer, "Authorization: Basic ");
What the.... O(n) string concatenations, unnecessary pointer casts, no bounds checking... I think extra whitespace in an HTTP request is not their only problem.


Those would be "safe" (assuming that settings.getIMEI() is completely under your control; everything else is string literals), but yeah, snprintf seems way better here (though it's been well over 20 years since I wrote any significant C code).
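
For what it's worth, a sketch of what that might look like (buffer size, host, and IMEI are stand-ins for the real values): one bounded snprintf builds the whole request, reports truncation, and incidentally has no room for a stray double space.

  #include <stdio.h>

  int main(void) {
      char commsOrderBuffer[256];
      const char *imei = "123456789012345";  /* stand-in for settings.getIMEI() */
      const char *host = "api.example.com";  /* stand-in for SERVER_NAME */

      int n = snprintf(commsOrderBuffer, sizeof commsOrderBuffer,
                       "GET /v1/printer/%s/orders.txt HTTP/1.1\r\n"
                       "Host: %s\r\n"
                       "Authorization: Basic ",
                       imei, host);
      if (n < 0 || (size_t)n >= sizeof commsOrderBuffer)
          return 1;                          /* truncated: don't send it */
      fputs(commsOrderBuffer, stdout);
      return 0;
  }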


Possibly safe but definitely inefficient, since it has to find the end of the string to know where the destination pointer starts. The right way is to keep a pointer to the end.

(Or, since they are already using std::string in other places, maybe just do that everywhere; I'm sure it makes better choices than they did here.)

The pointer cast thing is glaring. Why not simply declare the buffer as a char array and be done with it, instead of casting at every use? IMO over-use of pointer casts is a clear sign someone is lost in the language, your goal should be to reduce them.


Yeah agree, to me casts like that are a smell that someone is trying to squash compiler complaints rather than understanding them. It also has every appearance of "copy/paste" code writing.


Since these are all string literals you really don't need any concatenation function at all except to concatenate with the output of getIMEI().

  char *a = "Hello " "world!";
Works just fine.

Edit to add: You can really see the difference in code between someone coming to C/C++ from a high level language and someone who learned assembly first, where a list of literals is a common idiom. The original style is not functionally wrong, but it does look like Java :-)

Also: DON'T post your potentially insecure string handling code on the Internet; are you crazy?


The Arduino embedded C library (which I'm assuming they're using) isn't as rich as glibc or uClibc. Sometimes I have to fall back on very old-school methods to build complex strings.


This is a very weak excuse to use strcat(). Writing something that actually tracks the length instead of recomputing it every time takes 5 minutes.


I'd hesitate to suggest it only for efficiency in some embedded device that is fast enough, if it sacrificed readability at all.

Yes, I'm saying that I'm ok with the code above assuming that there are no user inputs that can exceed the bounds (though casting away the const is strange, I assume the stdlib is not correctly consted?)


It's not casting away const, the buffer is declared as uint8_t and they are casting that to char... Otherwise known as it should have been char to begin with.


O what now? O(7) is the same as O(1)


Yeah, I'll admit I was a little fast and loose with that. O(n + m), where n is the length of the (pre-concatenation) destination and m is the length of the source. Do that enough times and you get a quadratic-looking curve. My point was it's easy to get each append down to O(m).


I saw it right away - "that HTTP/1.1 looks a bit farther away than it should be..." - and confirmed it by selecting the spaces. I thought it would be a bit more subtle than that... I remember working with a server that violated the HTTP spec by not accepting allowed extra spaces in headers.

According to the new HTTP/1.1 RFC 7230, it should be a single space - the previous RFC didn't specify this clearly in the wording, although it is implied by the grammar (SP and not 1*SP).

https://tools.ietf.org/html/rfc7230#section-3.1.1

"A request-line begins with a method token, followed by a single space (SP), the request-target, another single space (SP), the protocol version, and ends with CRLF."

I'm surprised there doesn't seem to be any widely-used and easily available HTTP conformance checker - unlike the well-known HTML validators.

This is also why monospace fonts are ideal for seeing small but significant differences like this.
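
As a starting point, a sketch of such a check for the request line alone (method and target validation omitted): exactly two single SPs, then the HTTP-version.

  #include <stdio.h>
  #include <string.h>

  /* Returns 1 only for "method SP request-target SP HTTP-version". */
  static int request_line_ok(const char *line) {
      const char *sp1 = strchr(line, ' ');
      if (!sp1 || sp1 == line) return 0;     /* missing or empty method */
      const char *sp2 = strchr(sp1 + 1, ' ');
      if (!sp2 || sp2 == sp1 + 1) return 0;  /* "GET  /": doubled SP */
      if (strchr(sp2 + 1, ' ')) return 0;    /* any extra SP after target */
      return strncmp(sp2 + 1, "HTTP/", 5) == 0;
  }

  int main(void) {
      printf("%d\n", request_line_ok("GET /orders.txt HTTP/1.1"));   /* 1 */
      printf("%d\n", request_line_ok("GET /orders.txt  HTTP/1.1"));  /* 0 */
      return 0;
  }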


> I'm surprised there doesn't seem to be any widely-used and easily available HTTP conformance checker - unlike the well-known HTML validators.

There is one called Co-Advisor [1] that can be used to test web proxies. It is commercial and pretty expensive, but the online version might be free for open source projects. Squid and Apache Traffic Server are tested with it [2][3]. There was a USENIX talk that showed some Co-Advisor results [4]

1. http://coad.measurement-factory.com/details.html

2. http://wiki.squid-cache.org/Features/HTTP11

3. http://trafficserver.apache.org/acknowledgements

4. https://www.usenix.org/conference/lisa12/rolling-d2o-choosin... (at 31:16 in to the video).


That's an interesting idea. It would be useful to have a Web server where the output is just a conformance check of the request. That might be a fun project for a rainy day :)


Sounds like something that could be added to http://httpbin.org


True. The only problem is that you would have to test requests only to a particular endpoint. It would be nice if you could test all incoming requests. Then you could do things like modify your DNS so any requests go to the testing server and you could see the output.


That runs on Python/Flask, which is already a layer of abstraction above where HTTP conformance testing would be; what you need is something that listens on a TCP socket and parses the requests itself.


Actually, thinking about it, didn't Zed Shaw make a Ragel-based strict-conformance HTTP parser?

> Simply being more explicit about what is valid HTTP means that most of the security attacks that worked on Apache were rejected outright when tried on Mongrel.

Which I guess is a qualified "sounds like it, maybe?"


This confirms a very important pet peeve of mine: your modern application has a highly dynamic operating point. There is no way you can deploy a system and expect it to be static for eternity. Back in the day, with low interconnectivity, you could. But today it is impossible.

When you build stacks on top of system for which you have no direct control, you must be able to adapt your system. This means you can't statically deploy code without an upgrade path in one way or the other.


You're combining two issues.

Yes, if you let other people run your infrastructure, you are beholden to their operations decisions and schedules.

It is not impossible to design around that (new) problem, but it is sometimes expensive.

The trick is to know what external dependencies you have, and that is almost impossible to fully quantify in the XaaS and cloud model.


True but that doesn't bother me. Nothing is static on the web these days and everyone plays under the same rule set. Keeps things interesting...


It shouldn't bother you. It is just how modern systems are.


Cowboy is quite a well-respected web server of the Erlang flavor. I'd guess Heroku rejiggered something in their stack, perhaps adding Cowboy as a reverse proxy or load balancer in front of their junk.

Cowboy apparently shot yor no-good dirty sidewinding web requests in the face.


It is well known that you can't (or at least should not) rely on bugs (or internal APIs).


It's technically correct: according to the HTTP spec, there must be a single "SP" character between the elements in the Request-Line:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1


Another broken network device which takes it upon itself to mess with TCP connections passing through.

I ran into this a few years ago with Coyote Point load balancers. It turns out that if you send HTTP headers to a Coyote Point load balancer, and the last header field is "User-agent", and that field ends with "m" but does not otherwise contain "m", the connection does not go through the load balancer.

Complaining to Coyote Point produced typical clueless responses such as "Upgrade your software". (The problem wasn't at my end, but at sites with Coyote Point devices. Fortunately, I knew someone who had a Coyote Point unit, and we were able to force the situation there.) I had our system ("Sitetruth.com site rating system", note the "m") put an unnecessary "Accept" header field at the end of the header to work around the problem.

Coyote Point's filtering software is regular-expression based, and I suspect that somewhere, there is a rule with a "\m" instead of "\n".

A current issue: there are some sites where, if you make three HTTP requests for the same URL from the same IP address in a short period, further requests are ignored for about 15 seconds. You can make this happen with three "wget" requests. Try "wget http://bitcointalk.org" three times in quick succession. Amusingly, this limiter only applies for HTTP sessions, not HTTPS.


That series of strcat's caught my eye as bad practice. Fine in this case since the destination string is short but horrible in general. Every single one of those calls needs to iterate over the entire existing string to find the string size. The code could be much cleaner with a small macro hiding the incrementation and the casts.


Basically it's an O(n^2) algorithm... well-known story about that here:

http://www.joelonsoftware.com/articles/fog0000000319.html

The design of strcat() itself is partially to blame for this - the return value could've been more useful, like the number of characters in the resulting string or a pointer to the end of the appended string so it could be used to chain concatenations, but instead they chose to return the exact same pointer that was passed in as the destination.
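
POSIX later added stpcpy(), which returns a pointer to the terminating NUL -- roughly the return value strcat() should have had. A small sketch of the chaining that enables (the IMEI is made up); each append starts where the last one ended, with no rescan.

  #define _POSIX_C_SOURCE 200809L
  #include <stdio.h>
  #include <string.h>

  int main(void) {
      char buf[128];
      char *p = buf;                     /* cursor at the end of the string */
      p = stpcpy(p, "GET /v1/printer/");
      p = stpcpy(p, "123456789012345");  /* made-up IMEI */
      p = stpcpy(p, "/orders.txt HTTP/1.1\r\n");
      printf("built %td bytes in one left-to-right pass\n", p - buf);
      return 0;
  }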


A sufficiently smart compiler could optimize a string of strcat calls to remove the redundant length finding. I have no idea if real compilers actually would....


Java does, for the "+" operator.


Oh yes, nice example! So there's precedent for C compilers doing something similar.


Yeah- implicit concatenation + snprintf seems like the way to go. Although you'd have to calculate a length, I suspect avoiding that is the primary virtue of this approach.


What's the deal with all the scrollbars on this page?


Yeah, why is every image, heading, and paragraph on that page surrounded by scrollbars where most don't work and are not necessary?


In their CSS, they have a rule for every <p> tag to have "overflow: scroll" for some reason. Not sure why they didn't use the default value for overflow, since there's nothing on that page that needs to be specifically told to scroll.


Probably, they tried it and it worked.

This incidentally led to the bug that they're blogging about too.


Yeah, it's likely that they're all using Macs and have never checked their website on a Windows machine.


Yup


What browser are you using? I'm not seeing that here on my devices...? We are using pretty vanilla Bootstrap.


http://i.imgur.com/v0b1LKC.png

This is what I see in Chrome on OSX.


Are your devices Apple devices by any chance?

We had a similar bug reported at work recently, and it turned out that Windows browsers will always show scrollbars but the ones running on OS X/iOS will hide them until you start scrolling.

To turn them on in OS X, go to System Preferences > General and set Show scroll bars to "Always".


The issue is you've got the following CSS on lines 31-33 of blog.css:

body.blog .blog-post p { overflow: scroll; }


I initially thought that it was because of a single whitespace character... but it's because

p { overflow: scroll; }


Fixed. Thanks.


It scares me to think all of these requests run over unencrypted HTTP.


Why? It's just pizza


Pizza has been used as a tool of harassment in the past. People order lots of pizza from different places for the victim, who then has to deal with a bunch of angry pizza drivers and being black-listed from those pizza places.

Pizza drivers are often the victims of crime. Not only for the small amounts of cash that they carry, but sometimes just for the pizzas.

edit: I should say that my comment here was a kneejerk reaction to "it's just pizza", and has nothing to do with how eatabit.com deals with this kind of harassment. I agree with other commenters that blog posts like these are a great way to promote the company.


How are you going to exploit the fact that these pizza orders aren't encrypted to achieve either of those things?


That's the wrong question to ask.

We're not (all) in the "think-of-things-to-do-with-stolen-information" business like so many others are; but many of us are in the "encrypt-all-the-things-so-that-information-isn't-stolen" business.


Encrypt the pizzas.


PIZZA IS SRS BSNS


We use Twilio so we know our customers' phone numbers. If there were ever an issue like this, we could easily assist with a request by law enforcement.


Something something $1 SIM card from eBay.


Yes, it's just pizza - no ccards or personal information is transmitted - except your choice of toppings. Could that be used to profile you? "...paging Dr. Freud"


But aren't the API credentials transmitted there as well? You don't care if your API is compromised?


Kudos for sorting this out quickly. Problems like this one can be really difficult to debug.

I remember one case where the coefficient table for a polyphase FIR filter we implemented in an FPGA caused huge instability problems in a design. The coefficient table, if I remember correctly, was 32 wide (32 multipliers) and 128 phases long. That's 4096 numbers. The design had about 40 of these tables that would be loaded from firmware into FPGA registers in real time as needed. We built a tool in Excel to be able to compute these tables of FIR coefficients.

We got word from a customer that things were not behaving correctly under certain circumstances. We were able to reproduce the problem in the lab but could not find anything wrong with the FPGA, microcontroller or Excel code after about three weeks of work by three engineers. This quickly became a nightmare, as it threatened several lucrative contracts and our ability to service our existing customer base adequately.

I had to put our other two hardware engineers back to work on their existing projects, so I took on the debugging process. This was the most intense debugging I've had to do in thirty years of software and hardware development. There was a lot at stake: the very reputation and financial well-being of my business. Enter 18 hour days, 7 days a week.

FOUR MONTHS LATER, at 2:00 AM on a fine Sunday morning, without having slept for three days looking at code, the bug jumped out at me. We've all had that moment, but this one was, well, "one of those". The problem? We used "ROUND()" instead of "ROUNDUP()" in a calculation that had nothing to do with the FIR filter coefficients but rather affected the programming of counters related to them. This caused timing errors in a state machine that drove the FIR filters. If this were software, this would be exactly like having the wrong count in a loop counter. Yup.
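
A hypothetical illustration of that class of bug (not the actual design): a counter that must cover some number of samples in fixed-size chunks. Rounding to nearest can round down, silently leaving the last partial chunk uncovered; rounding up is what's required.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double total_samples = 130.0, chunk_len = 32.0;
        int with_round = (int)round(total_samples / chunk_len);  /* 4 */
        int with_ceil  = (int)ceil(total_samples / chunk_len);   /* 5 */
        /* 4 chunks cover only 128 of the 130 samples; 5 cover them all */
        printf("ROUND():   %d chunks = %d samples\n", with_round, with_round * 32);
        printf("ROUNDUP(): %d chunks = %d samples\n", with_ceil, with_ceil * 32);
        return 0;
    }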

I re-calculated after making the change and everything worked as advertised. That was the best Monday I've had in years. And I took a long vacation after that.

Over four months to find a bug.

That's why sometimes it is impossible and even unreasonable to create budgets for software development. One little bug can set you back weeks, if not months.


Way to abuse :first-letter.


Assuming the problem originates from something relating to eatabit's infrastructure, the important takeaway (for me) would be: Depend as little on 3rd parties as possible.

I know this is not a popular opinion among the HN crowd, mainly due to the entire web's love of linking to some other site's js/css to offload cost from their own site. But this makes no sense; you're not really reducing costs, you're just delaying them.

People talk about how 3rd parties speed up development or (potentially) reduce costs. But if the success of your business depends on providing a service all the time that has to be reliable, the reliability of your product is directly proportional to the reliability of the 3rd party. And each 3rd party adds additional points of failure. If you don't control whatever service or product the 3rd party is giving you, you will be unable to even attempt to isolate and fix it yourself.

Typically the answer to this problem is 'buy a better service contract'. But if the 3rd party doesn't provide 24/7 365 support along with multiple contact methods and harsh penalties for failing to supply you with timely service, you're wasting your money. You don't want to be the guy who has to tell the CIO "Sorry, I can't get a hold of our service provider or they aren't giving me timely updates, so I do not know when our product will be up again."


> Depend as little on 3rd parties as possible.

This attitude has many a startup reinventing and supporting commodity infrastructure instead of focusing on developing unique products and value for their customers.


When learning OCaml, I decided to write a little web client that would brute-force the password on my own home router. I wrote a client, and my router wasn't responding, so I tried having my client fetch pages from Yahoo, and it worked fine.

I fired up wireshark and saw that everything looked fine... except that all of my line terminators were shift-in-formfeed instead of carriage-return-newline. It turns out that OCaml uses decimal character escapes instead of octal. (This was back when I was under the impression that portable code avoided use of \n in string literals because someone who misunderstood text mode file handles had told me that Microsoft compilers expanded \n to \015\012.)
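
A quick C demonstration of the mismatch (the OCaml decimal behavior is per its manual):

    #include <stdio.h>

    int main(void) {
        /* In C, numeric escapes are OCTAL: \015 is 13 (CR), \012 is 10 (LF).
           In OCaml, "\015\012" is DECIMAL: 15 (shift-in) and 12 (formfeed),
           exactly the pair that showed up in the capture. */
        const char *crlf = "\015\012";
        printf("%d %d\n", crlf[0], crlf[1]);  /* prints: 13 10 */
        return 0;
    }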

Apparently someone at Yahoo had experienced enough terribly, terribly written web clients that they wrote their HTTP server to accept any two non-space whitespace characters as a line ending.


"our cellular printing api has printed over 9300 food orders for our client restaurants, stadiums and golf courses"

Am I the only one who read this as a system using 3D printing to print food? Disappointed to discover it's not that kind of cellular.


Tangentially, why didn't curl escape the trailing space to %20?


I experienced a similar problem with a POP3 utility that I had written years ago. I had been appending an extra space to the end of each text line (before the CRLF).

There were a few people using this utility with no problems until one day a particular POP3 server no longer tolerated my utility's malformed requests.


I have some advice. Hire a real C programmer. This code is _awful_ and probably full of vulns.


I've had the same issues when developing with Flask in Python. I forgot to URL encode some query parameters and it worked fine with the local HTTP server.

But when I put nginx in front as a proxy, it denied all requests.


The thttpd webserver doesn't handle requests with too many slashes either, which I only found out recently.

This is treated as an invalid request:

      http://example.com//robots.txt


Unless I'm reading RFC 3986 incorrectly, that's valid behavior, because you can't have an empty segment in the path part of a URI.


I think you're reading it incorrectly.

You can have an empty segment in the path. The BNF for a segment is:

    segment       = *pchar
Which, according to RFC 2234 section 3.6, means zero or more repetitions.


But then the server may still decide that an empty segment is so meaningless that it will refuse it.

In fact, it would not be a smart move to just treat double slashes the same as single ones, because of relative URLs: a ".." segment only removes one slash, so the hierarchy levels would get messed up. (For example, resolving "../x" against "/a//" gives "/a/x", but against "/a/" it gives "/x".) thttpd is doing the smart thing here.

As one of my teachers at university would say: the empty segment is also a segment.


The server can of course interpret the path as it wants, but it should allow an application running under the server to give 'foo//bar' a meaning if that application wants to, IMO.


True. I was writing about the case when the URL simply mapped to a file system location. Applications should be able to apply their own interpretation.


Agreed.

(The problem in my case was just stupid spiders that were crawling my sites.)


Yes, it's a valid URI, but //robots.txt is a different resource from /robots.txt. It seems thttpd is probably doing the right thing.


The difference between `path-abempty` and `path-absolute` is bloody confusing but I think you're right.


Slightly off-topic, but this is why dev posts like this are important. I didn't know eatabit.com was a thing, and it sounds like a great service.


Dev #2 here, thanks for the compliment. Where are you? Maybe we should expand to your area. :)


Buffalo, NY. We're quite proud of our local restaurant industry. So, yes, you should!


If this were my team, I would be unsettled by the fact that we never caught it in testing. Did no one write tests to exercise this part of the app - the one where we're handcrafting HTTP requests?

Objectively, you need to write more tests. At the minimum, this bug should have a regression test so that it can never accidentally happen again (say when a dev merges an old branch in for whatever reason).


What test would you have written to catch this? One that checks the exact contents of headers passed along? It's possible they even had tests around this, but were expecting the same output that they were inputting (copy+pasta). Perhaps they had a more "integration"-ee test that actually hit the web with that bad header. At the point they wrote it, that test would have been passing. It wasn't until the parsing server changed (to Cowboy, it seems) that the test would have started to fail.


Yes, I would have written a test to confirm that input_a generates output_b. The first half of that function is nothing but a string builder and easily testable. If they were copy-and-pasting the actual output to get the expected output, then yes: they screwed that part up.
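
For instance, a minimal sketch of such a test; build_request() here is a made-up stand-in for the string builder from the post:

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical stand-in for the app's request builder. */
    static void build_request(char *out, size_t cap, const char *path) {
        snprintf(out, cap, "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n", path);
    }

    int main(void) {
        char req[256];
        build_request(req, sizeof req, "/api/v1/jobs");

        /* The regression test for this exact bug: no stray space right
           before any CRLF. */
        assert(strstr(req, " \r\n") == NULL);
        /* And the request line must end in " HTTP/1.1\r\n". */
        assert(strstr(req, " HTTP/1.1\r\n") != NULL);

        puts("ok");
        return 0;
    }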

I'm far from a TDD purist, but it's clearly true that they're not sufficiently validating their code. If they had been, this would not have happened. I'm not saying this as an attack on their skills as programmers, but as caution to others reading the story: you have to - have to - test your stuff.

It's one thing to lean on third-party libraries and expect them to mostly Do The Right Thing, especially if they're popular and come from a culture of valuing test coverage. If you're writing a Rails app, for instance, you might be forgiven for not writing your own independent validations of the Ruby methods you call. But writing string-building code to implement RFC-defined network protocols? You should have some confidence that your program is generating the output that the other party will be expecting. Especially with something as commonly proxied as unencrypted HTTP; you just have to assume that your data will traverse, and be analyzed by, systems 100% outside of your control.


At first I was thinking that suggesting that you exactly check the output of a request might be a bit much, especially since it could be entirely variable and cause your tests to break at any point during refactoring. If that was done by a third-party framework, as you point out, you might not get a whole lot of value from testing its output. However, if you're constructing your own HTTP requests, as they seem to be, then yeah, you probably need to explicitly check that they are being built up correctly. Or, since this appears to be a single build-up, and not common/shared functionality, it could probably be abstracted into a common function/utility that does it for you. That should be easily unit-testable. Fair enough.


>Objectively, you need to write more tests

That is precisely the opposite of objective. Personal thoughts, feelings and opinions are subjective by definition. 2+2=4 is objective. "You need to put more cheese on that pizza" is subjective.


"We don't have enough test coverage to ensure that we're generating RFC-compliant output" is objective. That's not a personal though, feeling, or opinion. If you wish, you may generalize that to "we don't have enough tests to catch an error that made it into the shipped product".


That would only be objective if it were possible to have enough test coverage to ensure that. But that is not possible. So it is purely a subjective question of how much test coverage person A thinks is "good enough".


Are there any languages out there that handle scale and many connections like Erlang does, but with an easier to swallow syntax?


Erlang. The syntax really isn't that bad, once you get over the initial shock. In all honesty, grasping that the variables are immutable and how you need to change your thinking is much more difficult than the syntax itself.


http://elixir-lang.org/

It's the Erlang VM you love, but with the Ruby syntax we all enjoy!


I've been very happy writing these things in Scala using Spray. Honestly there are plenty of event-driven I/O frameworks in many languages, and almost as many green-threading systems. The Erlang supervision system and the ability to replace code on the fly, not so much.





I'm just glad my city made it to HN.


Charleston represent!



