Ask HN: How to secure website for public launch

gerardnico · 2024-03-30T15:55:24 1711814124

The biggest security hole is user input.

Just escape every input:

For SQL, to avoid SQL injection: https://datacadamia.com/data/type/relation/sql/parameter

For HTML, in case somebody tries to inject HTML: https://datacadamia.com/web/html/entity
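
For the HTML half, a minimal Python sketch of what escaping looks like. The function name `render_comment` is made up for illustration; only `html.escape` comes from the standard library, and a real site would more likely rely on a template engine that escapes by default.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Return an HTML fragment with the user-supplied text escaped."""
    # html.escape replaces &, < and > (and, by default, both quote
    # characters) with entities, so the browser shows the markup as text
    # instead of interpreting it.
    safe_author = html.escape(author)
    safe_body = html.escape(body)
    return f"<p><b>{safe_author}</b>: {safe_body}</p>"


if __name__ == "__main__":
    print(render_comment("mallory", "<script>alert(1)</script>"))
```

The escaping happens at output time, right where the value is placed into HTML, so the stored data stays unchanged.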

You got 99% of security holes patched.

All the best

r0s · 2024-03-30T16:28:00 1711816080

This is on point.

One other thing is to limit input frequency, only allow a certain amount of posts over some period of time. Enforce this on both the front and back-end.

A little more complex, you can set a lifetime limit per user by IP address, which won't stop a truly dedicated attacker but will definitely block most of the random web crawler scripts that find your site.
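
A rough sketch of the back-end half of that idea in Python, assuming a single process and an in-memory store. The class name, the limits and the choice of key (IP, session ID, user ID) are all placeholders, not a recommendation.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_posts` per `window_seconds` for each client key."""

    def __init__(self, max_posts: int, window_seconds: float):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        # client key -> timestamps of that client's recent accepted posts
        self._hits = defaultdict(deque)

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_key]
        # Forget posts that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_posts:
            return False  # over the limit: reject this post
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_posts=2, window_seconds=60)
    print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 61)])
```

With several server processes the counters would have to live in a shared store instead of a Python dict.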

IggleSniggle · 2024-03-30T16:52:53 1711817573

IP limiting is not so simple anymore if you anticipate much traffic, since services like iCloud Private Relay or Cloudflare WARP forward requests through single regional IPs. You can still do some limiting, you just might bounce some of your legitimate visitors. But for that reason alone lifetime limiting seems like a bad idea to me.

devwastaken · 2024-03-30T21:42:03 1711834923

Specifically, if using SQL, then use prepared statements or an equivalent, and ensure that the SQL user account used for queries is restricted to doing just that.
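
As an illustration, here is a small Python sketch using the standard-library `sqlite3` module with a `?` placeholder. The table and the `find_user` helper are invented for the example. SQLite has no user accounts, so the restricted-account part is not shown; on a server database that would be done with that database's own permission system.

```python
import sqlite3

# In-memory database with a toy table, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder passes `name` to the database as a value,
    # so it is never parsed as part of the SQL statement.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


if __name__ == "__main__":
    print(find_user(conn, "alice"))
    # A classic injection string is treated as an odd user name and matches nothing.
    print(find_user(conn, "' OR '1'='1"))
```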

awinter-py · 2024-03-30T17:59:02 1711821542

delete 99% of users, patch 99% of security holes

razodactyl · 2024-04-05T09:45:38 1712310338

I feel I'm not supposed to upvote this as much as I have...

solardev · 2024-03-30T14:11:27 1711807887

If you're not sure, put everything behind Cloudflare and don't expose your origin at all. Proxy the API requests through workers or at least shield them behind page rules.

Implement some basic rate limiting by IP so you don't get your Google Maps API DoSed. Block China and Russia altogether unless you expect customers from there (sadly, many bots & drive-by scans originate there). Sanitize your inputs, especially if you have any that will reach one of your own endpoints like for a database lookup (and look into SQL injection prevention in general). Use prepared statements in PHP if you use that for DB access. Not sure about Python.

You can read OWASP guidelines for other best practices (https://owasp.org/www-project-top-ten/) or ask ChatGPT to summarize. But realistically, Cloudflare takes care of so much that it seems a bit foolhardy to try to DIY it these days...

If it were me doing this, I wouldn't self-host anything at all, and just use managed services all the way down, including the DBs. A lot less maintenance that way, especially for solo devs. Lets you focus on the business logic instead of trying to reinvent your own secure little nano cloud. It takes serious manpower to stay on top of the latest vulnerabilities and zero-days, and IMO it's not worth spending your limited time on that when the big clouds can do it much more cheaply and much more thoroughly... it's a full-time job in and of itself, and you still probably wouldn't keep up with all the latest attacks =/

Of course you end up learning less this way because other professionals do all the hard work for you. But unless you want to become a backend/security professional yourself and REALLY dive deep into this stuff, I don't think just having basic security skills is going to do you much good anyway, since it takes all of 30 seconds to spin up a pre-hardened cloud host these days, usually for free, and they will have much more exhaustive coverage. Just my 2c.

anticrymactic · 2024-03-30T15:49:43 1711813783

> If you're not sure, put everything behind Cloudflare and don't expose your origin at all.

While I very much understand where this sentiment comes from, please do not blindly recommend CF.

Cloudflare seems invisible for gullible users, but is unusable and hostile to humans.

I use a VPN to a static IP by Hetzner, not to hide my true identity, but because I have to: my current living situation has my (only available) internet running through a corporate network, packet filtering/logging and all. (Yes, this is all legal and I am grateful.)

But still, to retain any kind of privacy I have to use a VPN. My public IP is registered directly to my full name and has not changed in 3 years.

I also try and limit the amount of unnecessary data my browser transmits.

The combination of those has CF absolutely convinced that I am an existential threat to any site they so honorably "protect".

I simply cannot use ANY site with the default CF configuration. And no, I'm not the only one. This is a very common problem among humans that don't want to share everything about them to pass a human verification.

Cloudflare is the cancer of the Internet. They protect and enable criminals, only to sell the solution later. All the while, ridiculing humans into giving up more and more data in the name of safety. They trick users with promises of "Securing the connection" when they are just matching the browser to their database to sell another page visit. The internet used to be a free and open connection to the world; Cloudflare has built a panopticon of surveillance and false security, and they are being praised for it.

pocketarc · 2024-03-30T16:08:27 1711814907

I've seen this criticism a lot here in HN, and it's something that's always concerned me.

There's a CloudFlare "essentially off" option that I've always hoped would make a difference when it comes to that. I always set it to that when setting websites up with CloudFlare, in hopes that it makes a difference.

That way I can still make use of the CDN and all the other features of CloudFlare without actually bugging visitors.

Would you be willing to load one of my websites[0] and let me know if "essentially off" actually works for you? If it does, great, but if it doesn't, I'll at least be aware that CF is a problem no matter what setting you put it at.

[0]: https://pocketarc.com

solardev · 2024-03-30T16:09:20 1711814960

Unfortunately, it's not so much a "blind" suggestion, but a cost-benefit thing. For many sites/businesses, Cloudflare is a conscious decision because it's worth the tradeoff to the site owner, even if it incurs a few false positives (i.e. blocks a few legitimate, privacy-conscious users).

Yes, it sucks that a few (very few, in my experience) real users might get affected, but that's outweighed by the thousands if not millions of other useless bot visits that would otherwise get through. None of the small orgs I've worked for had the time or personnel to manually filter through those otherwise... it's just too much.

That said, whenever I could, I would happily tweak the rules or make an IP whitelist exception for real users who emailed us complaining they couldn't access something because of Cloudflare, but that only ever happened once or twice as far as I can remember.

--------------

> The combination of those has CF absolutely convinced that I am an existential threat to any site they so honorably "protect".

I'm sure you know this, but CF isn't a targeted attack towards you. Your usage patterns are just different from most people's, and unfortunately you get treated as a bot because your traffic looks like one. You can email the site operators to ask for an exception, or... frankly... probably they'd just rather lose you as a customer than deal with making the website work for you :(

If the alternative is to either spend 10x more time on securing the website manually, or loosen security such that it impacts all their other customers... it's usually a no-brainer to choose to just live with the false positives instead and deal with them on a case-by-case basis as they come in.

> Cloudflare is the cancer of the Internet. They protect and enable criminals, only to sell the solution later. All the while, ridiculing humans into giving up more and more data in the name of safety.

I think our experiences have been different in this regard. IMO they are one of the most useful service providers on the Web, not just for WAF stuff but also their excellent CDN and serverless products, etc. You don't have to agree, but they didn't become this big by offering a bad product... probably most site operators would value overall server stability more than an atypical user's needs.

aborsy · 2024-03-30T15:41:23 1711813283

Considering that you suggest managed services, what’s a good version of the cloudflare tunnels and access, with the same features except that it does not terminate the TLS?

andersa · 2024-03-30T16:57:05 1711817825

That doesn't exist, for it to work it has to terminate TLS. You can't do something like Access without decrypting the connection.

solardev · 2024-03-30T17:00:04 1711818004

Presumably there are other ways to tunnel encrypted traffic (SSH, VPN protocols, etc.?) that don't necessarily rely on TLS?

andersa · 2024-03-30T17:46:47 1711820807

Those typically require custom client-side code; for a website, you have the requirement that a web browser must be able to connect to it using TLS. Or maybe I'm not getting what your suggestion is - Access is supposed to intercept the connection and display a custom authentication page, with requests not reaching your server at all until they are actually authenticated.

aborsy · 2024-03-30T18:07:51 1711822071

Reverse proxies sometimes support TLS passthrough (see Traefik). If the reverse proxy puts an authentication page in front, sure, TLS passthrough may not work. But it could work if all you need from Cloudflare is its firewalls, restricting the IP range, hiding your IP, rate limiting, DDoS mitigation, not having to open ports on internal servers, etc.

Bognar · 2024-03-30T18:20:16 1711822816

CloudFlare has some TCP proxying features, but most of what you actually get from adopting CF (or any CDN) requires decrypting traffic because most of the features depend on understanding the HTTP requests.

solardev · 2024-03-30T15:58:23 1711814303

Sorry, I don't know. That's not a use case I'm personally familiar with. Maybe others have ideas?

mycentstoo · 2024-03-30T15:28:02 1711812482

A few infrastructure things:

- Serve traffic behind a load balancer that has a WAF

- Network segregation for database (separate subnets)

- Make sure you serve https and have a cert that’s valid. Redirect to https if http

- Restrict ports on LB

At some point later:

- Endpoint monitoring and threat detection

- VPC flow logging

- Execute backend as non root

- Dependency / artifact scanning

- Cloud SIEM to monitor common actions taken

- Make sure there are no hard-coded creds, i.e., use role-based auth with cloud providers

- Reproducible infrastructure builds with infra as code

- Email domain protection

- Grab misspellings of domain names to prevent squatting

swyx · 2024-03-30T16:46:19 1711817179

> Serve traffic behind a load balancer that has a WAF

whats the cheapest non aws way to do this? cloudflare on everything? is there another option? just trying to learn whats out there. WAF mainly protects against ddos right?

KronisLV · 2024-03-30T20:37:25 1711831045

> is there another option? just trying to learn whats out there.

The cheapest option would be self-hosting something ModSecurity compatible: https://en.wikipedia.org/wiki/ModSecurity

You'd also need a ruleset, for which the OWASP one might be a starting point: https://owasp.org/www-project-modsecurity-core-rule-set/

There are also some projects like Coraza in the works: https://coraza.io/

Probably not what you're looking for if you want a cloud service to take care of everything for you, though, because of the question below (just thought that it might be useful to point out that anyone can run their own WAF if need be).

> WAF mainly protects against ddos right?

Typically WAF might be offered as a part of a larger cloud service that would include DDoS protection.

However, on its own, it is meant to filter traffic that might be harmful and attempt to exploit various vulnerabilities. A bit like an anti-virus in a sense, but for web requests. Some people argue that WAF solutions can be problematic because they encourage an attitude of "so what if there's a log4j vulnerability in the codebase, the WAF will take care of it" instead of making sure that the actual code is secure, but opinions are split there (defense in depth and the Swiss cheese model).

swyx · 2024-03-30T21:35:12 1711834512

lovely answer, thanks so much! hope others learn too.

fsloth · 2024-03-30T16:29:39 1711816179

Is there some plug’n’play vendor that would offer most of these out of the box (like Netlify etc)?

starwatch · 2024-03-30T16:44:28 1711817068

GP has some good suggestions. For implementation of these, Cloudflare is a decent first stop - though they are a little hostile to non-vanilla internet users. Their free plan offers sensible security (SSL termination, WAF, DDoS protection) out of the box, with a straightforward UI.

Network segregation for database (separate subnets) would be a config option wherever you're hosting (AWS/Google Cloud/etc.) said database/application.

g4zj · 2024-03-30T16:48:15 1711817295

> Serve traffic behind a load balancer that has a WAF

What is a WAF?

samtho · 2024-03-30T16:55:37 1711817737

Web Application Firewall.

It’s a feature of an LB that consolidates the actions of blocking ports except for the ones you are using, fail-fast on paths that scrapers tend to check (e.g. /wp-admin, /phpMyAdmin) so it doesn’t end up in normal request logging, set rate limits, fail-to-ban conditions, etc.

blipvert · 2024-03-30T18:06:12 1711821972

Has anyone had any luck with Coraza on HAProxy?

NomDePlum · 2024-03-30T16:53:31 1711817611

Web application firewall: https://en.m.wikipedia.org/wiki/Web_application_firewall

d3m0t3p · 2024-03-30T16:54:03 1711817643

Web application firewall

kqr · 2024-03-30T16:21:59 1711815719

The fact that you're concerned in the first place is a great indicator that you have already avoided the gravest and most common mistake!

mrkeen · 2024-03-30T13:03:16 1711803796

* keep your software & dependencies patched

* Disable SSH access for 'root' username.

* If you're using JWTs anywhere, don't mistake them for encryption - they are not.

* Check you're only serving over https.

* Don't trust your frontend. Any security check built into the frontend is near-useless, as the user can reprogram it however they like.

* Strings are how you let the baddies in, especially if you manipulate and concatenate them. Read about SQL injection to find out more.

anonymouse008 · 2024-03-30T15:43:33 1711813413

> * If you're using JWTs anywhere, don't mistake them for encryption - they are not.

I would love to understand the assumptions that lead to this belief. It makes negative sense?

Retr0id · 2024-03-30T16:09:07 1711814947

Creating a JWT takes a key or other secret as a parameter, and the resulting token is not superficially human-readable, so it's plausible that a developer might mistake it for encryption based on the high-level "shape" of the API.

mrkeen · 2024-03-30T17:03:57 1711818237

Yep. A few years ago I used my credentials in some in-house back-office app that a coworker wrote. Later I was able to see my http calls in the company-wide logging system, with my username and password 'hidden' in a jwt.
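
To make that concrete, here is a standard-library-only Python sketch that builds an HS256-signed token and then reads the payload back without the secret. The helper names and the sample claims are invented for the example; a real application should use a maintained JWT library rather than this hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_hs256_jwt(payload: dict, secret: bytes) -> str:
    """Build a signed (not encrypted) JWT in compact form."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def read_payload_without_key(token: str) -> dict:
    """What anyone holding the token can do: decode the middle segment."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


if __name__ == "__main__":
    token = make_hs256_jwt({"user": "alice", "password": "hunter2"}, b"server-secret")
    print(read_payload_without_key(token))  # no secret needed to read this
```

The secret only protects the signature, i.e. it stops someone from altering the claims undetected. It does nothing to hide them.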

mosselman · 2024-03-30T15:55:47 1711814147

Do you mean to say that you believe jwt payloads are encrypted? They are most certainly not.

arealaccount · 2024-03-30T16:26:35 1711815995

What do you mean they’re base64 encrypted

PhilipRoman · 2024-03-30T17:06:27 1711818387

Personally I wouldn't use base64 these days. Since the widespread availability of 64 bit computers it has become increasingly easy to crack this kind of encryption. I recommend using at least base256.

waldrews · 2024-03-30T21:27:42 1711834062

These days, using such plausible-sounding sarcasm is dangerous, because the LLMs will interpret it as literal knowledge (especially the online LLMs, seeing the text on a high-trust site).

PhilipRoman · 2024-03-30T21:45:20 1711835120

Don't threaten me with a good time

catoc · 2024-03-30T16:51:22 1711817482

encoding != encryption. Totally different things.

solardev · 2024-03-30T16:56:03 1711817763

It's only secure if you ROT13 the base64

catoc · 2024-03-30T17:25:48 1711819548

To make it quantum resistant you should rotate at least 26 times

anonymouse008 · 2024-03-30T19:25:11 1711826711

I’m saying no person who writes JWT anything should have the belief that a JWT is by any means associated with encryption. It breaks my brain; like, nowhere in any spec are there these claims (pun).

Semionilo · 2024-03-30T17:03:36 1711818216

Use some software input fuzzer against it like SQL fuzzer etc.

Never trust your frontend data ever!

Always assume the attacker can talk to your API.

Don't do auth or login yourself. Use known libs, workflows, etc.

Have unit tests to verify your endpoints need auth (a valid user, not just an anonymous user).
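
A sketch of what such tests could look like in Python with `unittest`. The `handle` function, the token store and the paths are stand-ins for a real application, invented here so the example runs on its own; with a real framework you would point its test client at your actual routes instead.

```python
import unittest

# Hypothetical session store and protected routes, for illustration only.
VALID_TOKENS = {"token-alice": "alice"}
PUBLIC_PATHS = {"/health"}
PROTECTED_PATHS = ["/api/profile", "/api/orders"]


def handle(path: str, headers: dict) -> int:
    """Toy request handler that returns an HTTP status code."""
    if path in PUBLIC_PATHS:
        return 200
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    if token not in VALID_TOKENS:
        return 401  # anonymous or unknown caller
    return 200


class AuthRequiredTest(unittest.TestCase):
    def test_anonymous_is_rejected(self):
        for path in PROTECTED_PATHS:
            with self.subTest(path=path):
                self.assertEqual(handle(path, {}), 401)

    def test_bogus_token_is_rejected(self):
        for path in PROTECTED_PATHS:
            with self.subTest(path=path):
                headers = {"Authorization": "Bearer not-a-real-token"}
                self.assertEqual(handle(path, headers), 401)

    def test_valid_user_is_accepted(self):
        headers = {"Authorization": "Bearer token-alice"}
        for path in PROTECTED_PATHS:
            with self.subTest(path=path):
                self.assertEqual(handle(path, headers), 200)


if __name__ == "__main__":
    unittest.main()
```

Looping over a list of protected paths makes it harder to add a new endpoint and forget its auth check, provided the list is kept in sync with the routes.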

apwheele · 2024-03-30T20:30:06 1711830606

I have a similar background, and I just use a $5 a month Hostinger plan that manages the PHP server and I am quite happy with it. So it is just keeping my server side secrets in PHP in a way that makes sense.

Now, this does not allow me to, say, do Python web apps (that are not WASM). Hostinger has quite cheap VPS plans that I would consider if I needed that (if AWS Lambda does not make sense; I did a Python Google Cloud App Engine app for a month, https://crimede-coder.com/graphs/Dallas_Dashboard, and that was pricey, like $80 a month, whereas the WASM app is no additional cost). And I am sure there are other vendors that are similar (I am just happy with Hostinger).

So in terms of DDOS protection this is not so great, but that would not be a big deal to me. So site goes down, but I do not rack up a bill or anything.

For a Google Maps application, I not uncommonly see people put API keys in client-side JavaScript (not good!). I mean, it depends on what exactly you are doing, but if it is a public service that users do not sign into, just rate limiting the number of API queries with some PHP + database logic server side should not be too much work, and is a reasonable way to not rack up a surprise bill (I forget if Google allows you to limit the API keys directly or if they will just rack up bills).

precommunicator · 2024-03-30T16:22:49 1711815769

Read OWASP ASVS. That's a really good start. If you did everything yourself, you will find many issues even without further analysis of the code.

forgotmyinfo · 2024-03-30T19:16:19 1711826179

Get someone else to manage it for you while you learn. Security is an emergent property of every part of the stack, not a separate thing you can do after the fact. Get a handle on the fundamentals, too: fundamentals of TCP/IP, HTTP/S, etc.

chrisjshull · 2024-03-30T20:12:26 1711829546

If you are using the Google Maps JavaScript API: https://developers.google.com/maps/api-security-best-practic...

rgbimbochamp · 2024-03-30T18:14:48 1711822488

This might help: https://smunshi.net/secure-infrastructure-design-interview-c...

g_p · 2024-03-30T15:52:39 1711813959

You've got some other good advice in other replies on specific steps to take around infrastructure and software/ dependencies.

To turn the question around a bit - you've identified the possible routes of compromise/exploitation (i.e. untrusted user input). The first step to me is a threat model. Work out the "so what" of why someone would try to attack you. What would be their end-goal?

To give you a few first steps, you've mentioned using a Google Maps API, and searching based on device location. Presumably your use of the Maps API is paid, and therefore a potential motivation for an attacker is financial, coming from your use of that API. Therefore treat that (i.e. the ability to make requests using your Google Maps API key) as a "target" in your architecture.

From there, you can do things to be a less attractive target (rate limiting, limiting results shown, if you are charged per-result). You could also review your code logic to ensure that only the right kind of request can be made (i.e. that someone modifying the client-side can't trick your server into accidentally making entirely arbitrary paid maps API requests on their behalf).

At this point, you'd also want to figure out your threat model between client-side and server-side, and what is exposed where. Assuming your server-side makes the API requests to Google Maps (and if not, then you're presumably exposing your API creds to clients, which is a "stop right here, don't proceed" moment!), what is allowed to flow from client to server? Can a rogue client get your server to make an arbitrary query? Would that let them use you as a free Google Maps API broker?
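
One way to keep a rogue client from steering the server into arbitrary paid queries is to accept only a small, fixed set of parameters and validate each one before anything is sent upstream. A Python sketch follows; the field names, the allowed radii and the function name are all assumptions made for the example, not anything taken from the Maps API.

```python
# Hypothetical allow-list: the only search radii (in metres) the app offers.
ALLOWED_RADII_M = (500, 1000, 5000)
ALLOWED_FIELDS = {"lat", "lng", "radius"}


def validate_nearby_request(params: dict) -> dict:
    """Return a cleaned query dict, or raise ValueError for anything unexpected."""
    unexpected = set(params) - ALLOWED_FIELDS
    if unexpected:
        # Refuse extra fields rather than forwarding them upstream.
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    try:
        lat = float(params["lat"])
        lng = float(params["lng"])
        radius = int(params["radius"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("lat, lng and radius are required numbers") from exc
    # NaN and infinity fail these comparisons and are rejected as well.
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("coordinates out of range")
    if radius not in ALLOWED_RADII_M:
        raise ValueError("radius not allowed")
    return {"lat": lat, "lng": lng, "radius": radius}


if __name__ == "__main__":
    print(validate_nearby_request({"lat": "52.52", "lng": "13.40", "radius": "1000"}))
```

The server then builds the upstream request only from the returned dict, so the client never gets to choose the endpoint, the API key or any other parameter.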

Understanding the trust architecture between front and back-end is (for me at least) key, as that's the primary exposed attack surface to an end user. Open up developer tools (F12), and look around requests as you use the app. Is there anything here that you wouldn't want users to see? As attackers will definitely see that, and it will be the first place they go to look at what you are doing!

Other ways to mitigate these risks could be (if you have sufficiently constrained input sets) to implement caching to avoid the ability to rack up queries against the underlying maps API. Given you are using arbitrary user locations, that's a bit harder. If users have a session or other short to medium term identifier, you could do some smart rate limiting to detect rampant scanning of large areas by making API requests that spoof the device location to be loads of different locations.
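
One possible way to get some caching even with arbitrary locations is to snap coordinates to a coarse grid, so that nearby requests share a cache entry. This is an illustrative technique rather than something specific to the Maps API; the 0.01-degree cell size and the `fetch_places_upstream` stub are assumptions, and it is worth checking whether the provider's terms allow caching results at all.

```python
from functools import lru_cache

# Counts upstream calls so the effect of the cache is visible in the demo.
CALLS = {"count": 0}


def fetch_places_upstream(lat: float, lng: float) -> tuple:
    """Stand-in for the paid request; a real app would call the maps API here."""
    CALLS["count"] += 1
    return (f"places near {lat:.2f},{lng:.2f}",)


@lru_cache(maxsize=10_000)
def _cached_lookup(lat_cell: int, lng_cell: int) -> tuple:
    # Integer cell indices are the cache key; convert back to degrees upstream.
    return fetch_places_upstream(lat_cell / 100, lng_cell / 100)


def nearby_places(lat: float, lng: float) -> tuple:
    # Snap to a 0.01-degree grid (roughly 1 km of latitude) so that
    # slightly different positions map to the same cached answer.
    return _cached_lookup(round(lat * 100), round(lng * 100))


if __name__ == "__main__":
    nearby_places(52.5201, 13.4041)
    nearby_places(52.5203, 13.4044)  # same grid cell, served from the cache
    print(CALLS["count"])
```

The trade-off is precision: every position in a cell gets the answer computed for the cell's snapped coordinates, and an attacker sweeping many distinct cells still generates upstream calls, so this complements rate limiting rather than replacing it.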

If you follow this process, and work out what's worth attacking (your infrastructure will be one of them - even just to compromise the site, post spam, etc, as will things like any database you run), then you can begin to understand those risks, and work out where there are attack vectors, and mitigate them methodically. The OWASP top 10 guidelines are a good starting point - often the biggest issues are design mistakes, omissions of basic measures, or flawed attempts to implement them. If you have authenticated API endpoints, for example, is the authentication logic correct, and meaningful? Does it actually do what you intend, and is what you intend sufficient for the level of security you want to have?

traviswingo · 2024-03-30T15:43:04 1711813384

This sounds an awful lot like analysis paralysis to me. My recommendation: just launch. You probably won’t run into any of the problems you’re worried about and, if you do, you can just patch them up.

As you launch more and spend more time dealing with users the default things to do will become second nature, and you’ll find yourself using the built in tools from AWS, DigitalOcean, CloudFlare, etc. rather than rolling them yourself.

But seriously, just launch. There’s a really good chance you won’t have any problems.

solardev · 2024-03-30T16:57:55 1711817875

Please don't do "just launch" if you accept any user accounts or PII =/ You're responsible for their data and security too, and should at least exercise some minimum security... doesn't have to be the most secure site in the world but soooome bare effort would be appreciated.

wolframhempel · 2024-03-30T17:16:56 1711819016

I'm actually with traviswingo. Just launch. Chances are, no one will care about your website for quite a while. Unless you're building a product with a lot of hype around it, there's likely going to be a huge gap between launching and seeing any traffic at all. This gives you plenty of time to implement some of the great recommendations given here. But don't delay the launch for it.

forgotmyinfo · 2024-03-30T19:18:06 1711826286

There are a million bots scanning all of IPv4 space every minute looking for automated exploits. You don't need someone dedicated looking to get into trouble.

forgotmyinfo · 2024-03-30T19:17:14 1711826234

Please don't listen to this advice, this is precisely how services get pwned.

smarri · 2024-03-31T16:38:52 1711903132

Thank you all for the advice and suggestions.

phonon · 2024-03-31T06:19:33 1711865973

Use Django.

joshxyz · 2024-03-30T12:41:43 1711802503

you can always rollback.

tatersolid · 2024-03-30T15:28:55 1711812535

And get compromised again immediately after said roll-back?

Fast roll-back/restore is a useful feature for improving availability but does nothing to improve security.