I’d like to learn more about backend development but most of what I find while searching are results like “Build a nodejs backend in 20 minutes” or “learn backend development in an hour.” I think part of the problem is I don’t know enough to ask the right questions.
There are a number of concepts that are worth learning about at a high-level if you want to learn about building large-scale projects. Most modern/large companies use some/all of the following to build their backends:
If you dig into any of these there's a ton to learn, especially around looking into the underlying technologies used to build these higher-level systems.
There are also more conceptual things that are part of building/maintaining backend systems. These are a bit fuzzier, but I would say are also as important as the specific technologies used:
- Reliability
- Monitoring
- Observability
- Error/failure handling
- Migration strategies
- Data normalization/denormalization
- Horizontal vs. vertical scalability
This is by no means a complete list, but these terms are enough to get you in the right ballpark of ideas and start learning. I think highscalability.com is a great place to read about how other companies have built backend systems to solve specific problems. They have a massive list of quality articles written about various backend systems at scale.
Cron jobs are the definition of the anti-pattern of treating servers like pets and not cattle. You have to worry about that one non-redundant server running your cron jobs. There are other ways to skin the cat, but my favorite is HashiCorp’s Nomad. I like to call it “distributed cron”. Together with Consul for configuration it’s dead simple to schedule jobs across app servers - the jobs can be executables, Docker containers, shell scripts, anything.
These days I think of "cron" as more of a type than an actual implementation. When somebody says, "I need a cron job" my answer these days would never be, "ssh into www1 and add it to the crontab." It would be, "create a CronJob type in Kubernetes."
But I agree and concede, a user with zero back-end experience will just google "cron", which will take them to a crontab example, so they will likely be misled into the anti-pattern, as you said.
I agree that is definitely the way to go at some scale. :) I really just put cron on there as an example of how someone might think about scheduled jobs, as most of the more advanced things are conceptually similar to cron, except that you don't have to worry about where your job is actually running and how the environment was set up.
I think it is worth noting for any of the systems above that there's a spectrum of possibilities around how much you automate/offload the management of them, as well as plenty of backend systems for managing those.
That is only true if your jobs aren't doing a stateless operation. I use cron all the time on cattle VMs. No reason to cargo cult extra stuff into the mix.
I think the parent poster was referring to the fact that Cron jobs can act purely on local state, and that's OK.
For example, if I had a traditionally deployed (i.e. not in K8S / a PaaS / similar) backend app that accepted file uploads, then passed those off to something else, I'd be streaming the uploads to a temporary holding directory on disk. I'd then have a CronJob that clears stale items from the temp dir. If the server fails, that's OK that the CronJob didn't run.
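A cleanup job like that can be very small. Here's a rough sketch in Python, assuming the holding directory only contains plain files and that "stale" simply means "not modified for a while". The function name, the path in the usage note, and the one-hour threshold are all made up for illustration.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_files(directory: Path, max_age_seconds: float,
                       now: Optional[float] = None) -> List[Path]:
    """Delete plain files in `directory` that were last modified more than
    `max_age_seconds` ago. Returns the paths that were removed."""
    now = time.time() if now is None else now
    removed = []
    for path in directory.iterdir():
        # Leave subdirectories and symlinks alone; only plain files are uploads.
        if path.is_symlink() or not path.is_file():
            continue
        try:
            if now - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # Something else removed the file between listing and deleting.
            continue
    return removed
```

A crontab entry would then run a small script that calls something like `remove_stale_files(Path("/var/tmp/uploads"), 3600)` every few minutes. Since it only touches local state, a missed run just means the files get cleaned up on the next one.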
There are still plenty of use cases for traditional cron.
In the case of Nomad, you run it as a cluster of three. But if one of my app servers went down in the middle of the night, when I was using Nomad, the next day I might notice a degradation of performance but everything still ran.
The server going down was never really the issue, though, honestly. The issue was usually a process taking more CPU/memory than expected; in that case Nomad could intelligently schedule jobs based on available resources across the fleet of app servers.
These days with AWS, I don’t use Nomad, I just use CloudWatch and for the processes that aren’t Lambda based, I use autoscaling groups with the appropriate metrics for scaling in and out.
That also means if a server goes wonky, I can just take it out of the group for troubleshooting later and another instance will automatically be launched.
Yes, jobs that run unattended via cron should have logging and watchdog processes that check for completion and possibly do reconciliation - these can alert if something goes wrong, and the logs help diagnosis.
> Cron jobs are the definition of the anti pattern of treating servers like pets and not cattle
What's old is new again. Much of clean distributed systems development is now built on what are essentially scheduled periodic operations. They're pretty much the least complicated way to loosely couple domain logic in distributed systems that follow eventually-consistent semantics. They're also a good model for the functionality of many distributed scheduling systems like Kubernetes, AWS' ECS Scheduled Tasks, and more.
Kubernetes even goes so far as to have Jobs (batch operations) and CronJobs (a scheduler that creates Jobs).
I use AWS’s native services most of the time like CloudWatch to schedule jobs, lambdas, and step functions these days, but to keep the post generic, I mentioned Nomad/Consul that I have used in the past for an on prem implementation that kept us from having to bring Docker into the mix. Since we were already using Consul, using Nomad just made sense.
That’s not cron’s job. Different jobs have different requirements and tradeoffs. Ownership of retry logic and distribution design belongs with whatever cron kicked off.
Learning the cron format is still necessary / useful - most distributed deployment systems have some form of schedule job running. Even on a purely serverless model like AWS Lambda, it’s still possible to do distributed crons. I actually think the word ‘cron’ has already been repurposed - when using it my team actually refers to the distributed version, not the per server version.
Yes: reliability, monitoring, and error handling were the types of things I’m looking for more information on. Do you have any recommendations for more information on these topics? I should have clarified that my question was geared towards important concepts agnostic of languages/frameworks/etc. This is a great list of further reading, thank you.
Also what does observability mean in this context?
"Also what does observability mean in this context?"
Something went wrong, and now your site is serving 500 server errors to everybody at the rate of 25,000 per minute. The ops team already tried "just reboot it" and it didn't help. How are you going to figure out what is going on and fix it?
It's (mostly) too late to add anything, so all you've got is the logs you already had, the metrics you already had, etc. That's the "observable" stuff in a system. There's an art to recording what it is you need to know, while at the same time not recording so much that you can't find what you need in the mess.
(The "mostly" is that if you have a good enough setup, you might be able to bring up a new system and route some very small fraction of traffic to it to examine it more intensely in real-time with a debugger or something, though in my experience, on those occasions I've had the opportunity to try this, it's never been a problem that would manifest on a new system receiving a vanishing fraction of a percent of the scale of a production box. But maybe you'll get lucky.)
You certainly want to do everything you can to not be in that mess in the first place, but it won't be enough. You need a system sufficiently observable that you can find the problem and find some sort of solution.
Oh thank you, I didn't know that was referred to as "observability" I thought it was just logging. This article from Etsy's engineering blog [1] was part of the inspiration for this question. Funnily enough when I googled "Etsy engineering logging" the 5th result was for a position on Etsy's observability team.
When it comes to "measure everything" I've found services that have clients that already grok popular frameworks to be a godsend. We use New Relic and its ability to automatically instrument all REST APIs and DB transactions is delightful. I could not imagine going back to having to do it manually or guess what information might be useful later.
I haven't read it personally, but I've heard good things from others and looking over it briefly the advice there lines up with what I've experienced in practice.
For some of these concepts - take a look at what Envoy + Istio, Linkerd (and other service meshes) are trying to solve and conceptualize: load balancing, auth(n/z), monitoring, logging, etc.
I'm a bit terrified that no one had security as their first item on the list. Many answers here are great and contain a lot of important concepts, frameworks and tools. But all of these are meaningless unless you have a strong spider sense for security. Not just the OWASP top 10, or top 100. But also criticizing your own business logic, not leaking information to the client side, not pushing things to git that should not be pushed. How to securely store passwords in the database. How to handle DDOS. How to prepare for the worst case, limit blast radius, all this while not hurting your productivity, as well as the end user's.
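To make one of those points concrete (storing passwords), here's a minimal sketch using only Python's standard library. It assumes PBKDF2-HMAC-SHA256 is acceptable for your threat model; the iteration count and the stored-string layout are my own assumptions, and a purpose-built password hashing library is often the better choice in practice.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 600_000


def hash_password(password: str) -> str:
    """Return a self-describing string: scheme$iterations$salt$digest."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

The database only ever sees the output of `hash_password`, so a leaked table does not directly reveal anyone's password, and the per-password salt means two users with the same password get different stored values.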
In the official sense, Security also includes availability (see the CIA triad), so a large portion of the bullets others have mentioned focuses on that.
I think a security mindset (including availability, which leads to thinking about disaster recovery, performance, redundancy, high availability, distributed systems as a mechanism to achieve it, etc.) is the first aspect that I look for in a backend developer.
Other things are important, but security mindset in my opinion is the first layer in the foundation.
Others on my list that may be helpful to deep dive into:
1. HTTP, REST, know it well.
2. GraphQL as a complement/alternative to REST.
3. If using relational databases, learn what an N+1 selects issue is and how to solve it
4. If using NoSQL databases, learn about the CAP theorem and understand the tradeoffs etc
5. Learn to avoid premature optimization. Measure and profile before jumping to conclusions about theoretical bottlenecks that don't exist.
6. Unit test all the things, learn how to mock and what to mock, learn the difference between unit and integration tests.
7. Invest time in good design, read some other open source projects, see how they organize the code, what packages, what modules. Learn about dependency injection, inversion of control.
8. Learn some cloud patterns, such as exponential backoff, throttling
9. Know everything about cookies, localStorage, XSS, CSRF, JWTs, and session cookies, stateless vs stateful architecture etc.
10. DevOps: Look into containers and serverless, CI/CD
11. Multithreading if you are in a language that has them.
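As an illustration of item 8, here's a rough sketch of exponential backoff with "full jitter" in Python. The function name, the defaults, and the choice to retry on any exception are assumptions for the example; real code would normally retry only on errors known to be transient.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    The wait ceiling doubles after each failure (capped at `max_delay`), and
    the actual wait is a random value below that ceiling so that many clients
    failing at once do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is only there so the behaviour can be tested without actually waiting.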
Security is absolutely important. You’re right that most programmers don’t consider it, and even my CS degree program never emphasized security. However, I did a cybersec emphasis, so I always consider the security implications of a system at the design stage. It’s significantly easier to build a secure system than it is to retrofit an insecure one.
One of the most important concepts is the principle of least privilege, which is rarely ever discussed. Every tutorial I’ve seen, even paid ones (this isn’t to say all of them), gives its app master DB credentials. You could vastly improve your application’s security just by leveraging your DB engine’s native access control.
How much of the typical application server authorization logic can we put into the database? Is it possible to set up authorization at the cell level?
Example: All users may change their own passwords, but only team admins may change the passwords of other users in their team.
I have never attempted this particular use case, but as far as I can tell this isn’t possible. While it is possible to grant privileges to change another user’s password, you would have to either make the team admin a superuser (which grants all possible privileges) or grant the Create Role privilege (which would allow them to create a new role with privileges they aren’t intended to have and switch to that role), so neither of these options is really any good.
A clarification on my previous comment:
The way I’ve used Postgres RBAC is to create roles for each service in my application that needs DB access. For example, say you have a service (in my case typically a Lambda function) that only ever reads data from the DB and only from specific tables. I would create a role that only grants SELECT privileges on those specific tables. This also disallows UPDATE, DELETE, ALTER, etc. Then assign this service that role. This mitigates the possible damage done if that service is compromised and shrinks your application’s attack surface.
Spot on. Security should be baked into the way you work. It should be baked into a formal CS education too. I have joined teams full of seniors and "leads" who on paper should be miles ahead of me in terms of experience and competence and have had to teach them the basics you mentioned. It immediately creates an overhead of constant auditing, nagging and unnecessary stress in the team when security is introduced after the fact.
Yep. I've worked on a multi-tenant product in which I've seen senior engineers write APIs that would have allowed one tenant (enterprise customer) to access the data of another tenant!! Insane.
My perspective, having sold a company: security isn’t that important until you’re big. When you’re small you’re a small target, your resources are better spent on features and sales, and if you get hacked you can change your name and pretend you’re a new entity. I know this isn’t the ethical answer but it is the realpolitik answer. Invested maybe 50 dev-hours in security over 4 years.
No matter what you're going to read here (and it's all good stuff), numbers 1-3 on my list are simple requirements engineering:
- Understanding in depth what you are supposed to be building, as directly as possible from the stakeholders. Nothing is worse than building the wrong thing perfectly right.
- Coding that understanding into a data model (read: database schema) with great, clear naming, correct entity relationships and as many constraints as possible (including correct types) early on.
- Understanding the front end part that will serve your application and how it interfaces with the users. Only that will give you the details on how to build the API that will serve it.
Iterate these three points until you're done with the application skeleton and only then write the first line of code. If there's anything that will make you insanely productive as a backend developer, it's these crucial first steps of requirements engineering.
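To show what "as many constraints as possible" can look like in the second point, here's a small sketch using Python's built-in `sqlite3` module. The tables and columns (`team`, `app_user`, `role`) are invented for the example and the email check is deliberately crude; the idea is just that the schema itself rejects data that doesn't match the model.

```python
import sqlite3

# Hypothetical schema: every rule we know about the data is written down
# as a type, NOT NULL, UNIQUE, CHECK, or foreign key constraint.
SCHEMA = """
CREATE TABLE team (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE app_user (
    id      INTEGER PRIMARY KEY,
    team_id INTEGER NOT NULL REFERENCES team(id),
    email   TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),
    role    TEXT NOT NULL CHECK (role IN ('member', 'admin'))
);
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with the schema applied and foreign keys enforced."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless asked to, per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this in place, inserting a user that points at a missing team, has an unknown role, or reuses an email raises `sqlite3.IntegrityError` instead of quietly corrupting the data, whatever the application code does.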
It sounds like you're just getting started, so I would say: pay attention to the "seams" between layers/modules/components/machines/etc. Especially these days when we are building higher and higher abstractions, you will understand a lot better if you examine how things interface with each other. You will learn more deeply & more correctly, and it's a great way to debug problems. For instance:
Find out how to make your ORM log the SQL it's running, and try running that in your database directly. If you are having trouble, get the SQL working first, and then figure out how to make your ORM generate that SQL.
On the web, use View Source and/or Inspect Element to see what is really in the DOM.
Look at the Javascript you're serving. Is it what you think it should be? Maybe the problem is in your webpack config or caching or something.
If SCSS is giving you trouble, look at the CSS it's generating. Is it what you expect?
If there is a problem submitting a form, look at the Network tab to verify the right things are being sent. Understand how HTTP works here. Try putting the same request into `curl -v`. (Browsers even have a "Copy as cURL" command nowadays.) If that looks okay, see if Rails is turning your submission into the right params (or whatever your framework is).
One nice thing about checking the seams is you can debug via binary search: first see if the problem is in the browser or the server. Next see if it's in your app or the database. Next see if it's in your controller or your model. Etc.
Inspecting the DOM, client side javascript and generated CSS are all front end concerns, not stuff for back end development, even if many of us do both.
> Find out how to make your ORM log the SQL it's running, and try running that in your database directly. If you are having trouble, get the SQL working first, and then figure out how to make your ORM generate that SQL.
Time would be much better spent learning SQL properly. ORMs have their place, they can cut down a tonne of boilerplate, but anyone using an ORM should be somewhat competent in SQL already. If you learn SQL from ORM output then in many cases you'll be learning what not to do.
This. Visibility into what is actually going on, and the drive to understand it. Troubleshooting, debugging, learning: all the same thing, but tarred with different parts of the process. Dig into what's actually going on and understand the layer under you.
tl;dr: Keep digging, because everything's worth knowing at some point.
It's generally much easier to fix a bug in your own code or infrastructure if you're familiar with the innards of the things you depend upon directly, and learning your preferred platform beneath the current frameworks is a solid advantage because it helps you move sideways when the next New Framework comes along.
But there's also the conceptual side, the general concepts which have remained unchanged for decades, which a detailed knowledge of a specific platform (eg. Java, .NET, Node.js) doesn't necessarily aid with. These probably won't even feel relevant on a daily basis, but will definitely multiply your effectiveness over the long term.
Some things which might fall into that category:
* SQL and relational databases (especially if you're not using SQL, because you'll be reinventing it for any nontrivial amount of data)
* Consistency models and distributed systems (capabilities and limitations; some things can never be waved away by technology)
* The operating system, and principles of its design (not necessarily in any great detail, but knowing roughly how software talks to hardware is very useful, especially when assessing the marketing blurb for any cloud platform)
* Complexity theory (stringing three 'optimal' algorithms together might be textbook, but often you can apply some domain knowledge to cut out a step or two to reduce actual runtime and maintenance overhead)
I've found stuff like graph theory, principles of compiler design, and a working knowledge of assembly code to be useful too, but that's probably the far end of the bell curve for web server stuff :P
Computer Science is a subset of mathematics and it's amazing which bits of pure maths turn out to be useful in my day job. Shame I'm not very good at that stuff... but reading enough about it to recognise when it's applicable can mean the difference between 'picking a good library off the shelf' and 'hacking together a terrible solution myself'.
It sounds easy ("derp, back-end is server, front-end is client") but it's really quite a mind bender when you think about it. Take a given piece of website code in a given language. Where is it being executed? Some "front-end" techs (Node, Webpack, Uglify, etc.) will be executed invisibly on the server. Some "back-end" web techs (cookies, redirects, OPTIONS requests, TLS) enable some invisible client behavior. Some (PHP) are dual-purpose, executed in two stages, some (conditional comments) are hacks that exploit this duality. Some (OAuth2) are orchestrated dances between back-end and front-end. It's really not so simple when you think about it.
It gets even more complicated because your server is often a client, too.
Modern backend code will usually connect to databases and call REST APIs. The databases themselves might be part of a cluster. Moreover, the backend code is rarely exposed directly to the public; there's usually a proper http server sitting in front of it, and there might be another load balancer or CDN between the user and your http server. Every connection has a client and a server, and sometimes the same component plays both roles. Remembering which role(s) each component is playing in any given context is very important for security, not to mention debugging.
I'm not sure what OP meant. But for Java, your app runs inside an application server like Tomcat which handles the low level networking for you, I believe this fulfills a similar role as web servers fill for other languages like PHP.
Basically these servers provide a separation of concerns and are built to be more scalable than a naive application listening on a port for requests.
Postgres has been my go to DB for my projects. It's fast, simple and has decades of research and battle tested systems to pull knowledge from.
Your comment reminds me of a PyCon 2017 talk by Raymond Hettinger where he details the improvements in Python dictionaries from 2.7 to 3.6 which effectively amounted to rediscovering and re-implementing what was standard in databases decades ago.
I can wholeheartedly second natdempk's concept list, but would like to add these. They should provide more than enough inspiration.
---
The failure modes; what they are and how they manifest: read contention, thundering herd, cascade, ...
Mitigation strategies and where to apply them: throttling, backpressure, load-shifting, graceful degradation - and of course how to signal the upper layers that these are taking place.
Decoupling reads and writes. (Search for material on CQRS and see where the rabbit hole takes you.)
Error handling and tracing. This is particularly devious, because by definition error path is the unhappy path. It will be more expensive when hit, and has to spend more time serialising data. See also read contention, thundering herd and mitigation strategies.
Learn and understand the differences between: telemetry, instrumentation and monitoring.
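As one concrete example of the mitigation strategies above, here's a rough sketch of throttling with a token bucket in Python. The class name and parameters are my own, and the injectable `clock` exists only to make it testable; treat it as an illustration of the idea, not a production rate limiter (it isn't thread-safe, for one).

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self._clock()
        # Refill for the time elapsed since the last call, up to capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

When `allow()` returns False, that's the point where you signal the upper layers, for example by answering with HTTP 429 rather than silently queueing more work.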
Those are great things to know. I wish a lot more people I worked with knew and understood them.
However, I don't think they're "must know concepts for back end development". Most backend development is small in scale: a few servers, maybe more, a database, maybe something running cron/equivalent jobs or a queue worker, maybe some caches. Not much besides that. While people learning to develop systems at that scale might incidentally learn some of the above, I don't think most of those concepts will be useful until people are designing systems at much greater scale. Informally, many of them are present in small systems: e.g. when your cache server is down but connections take 10sec to time out, that's a form of backpressure, technically, but not in a super formal or useful way to understand the concept.
In sum, I think those are important (essential, even) areas of knowledge for intermediate-experience back end developers, or developers looking to increase the scale of their infrastructure or projects (or work on very large-scale applications), but I don't think they're in the must-know, 101-level tier of back end knowledge. People can and do build successful, stable, easy-to-work-on back ends without knowledge of any of those things. In fact, this is the norm in our industry.
The lack of knowledge of these areas is not a significant handicap (to productivity, understanding, or code/output quality) until you get beyond small scale. Most software projects do not.
> I don't think they're in the must-know, 101-level tier of back end knowledge.
Fair enough, I admit I am looking at things from behind glasses tinted in a certain way.
But I do believe some kind of familiarity with the concepts is essential, even at relatively small scale. Read contention in particular is really easy to hit the first time you have to deal with an increase in traffic (it doesn't need to be external, it could be a minor change that triples the number of cross-service calls). Because the first observation is that the worker systems are running hot, the instinctive reaction is to add more workers.
The two core problem areas in backend development - which in this context can mean anything that requires non-trivial server side processing - are I/O latency and processing capacity. Regardless of scale. Being unaware of (or worse, ignoring) them is not tenable.
The O'Reilly book Designing Data-Intensive Applications is a good overview of a lot of the concepts mentioned in other comments, and most things you come across in typical backend development and operations. If that doesn't answer your questions, it'll certainly point you in the right direction.
I've been looking at this book the past couple of days; have you read it yourself? If so what did you think?
Edit: I decided to go ahead and buy it since it was pretty cheap on Amazon. The table of contents had a lot of the information from the other comments. Thanks for the recommendation!
It does a great job of explaining the underlying ideas behind different databases and data processing systems. Even if your needs are met by, say, a traditional RDBMS like PostgreSQL, it’s helpful to know about the alternatives and what problems they solve. It’s the sort of book I wish I had read years ago; I’m sure you’ll be happy you picked it up.
You won't be disappointed. The book is packed full of useful content, and I found that the presentation of that content is the sweet spot of technical and approachable.
I am currently reading this book and it gives a clear explanation of each concept that you have to consider when designing a database for your application. Definitely one of the best reads that broadens your outlook.
I'm a backend engineer with 10 years experience, and now a team lead. Everything I learned is through experience and piecing stuff together... This looks to be the book I'm missing! Thanks.
Surprised it hasn't been mentioned yet... API design. (Perhaps people consider it middle-ware of sorts?) Anyways, your API is basically the interface to your back-end and goes hand-in-hand with your database as the core components of your data model, which in turn, is probably the most critical aspect of most software.
My company asks an API design question in interviews (in fact, I designed and ask this question). We don't expect the candidate to know anything about HTTP, REST, or databases (if the candidate is junior). We have a question prompt that essentially summarizes what candidates need to know to solve the problem. Then we give them an example API and a real-life problem. We expect them to design the API between app and backend using RESTful HTTP requests, and for extra points they can design the database structure too (how to store the data coming from the APIs, and how to make it efficient, e.g. which indices to use). The candidate has full freedom, and I try to see whether they can (1) read and understand a technical document, (2) design a structure and process to solve a problem, and (3) collaborate with me to improve their ideas. In practice I've found this question pretty useful. There are some hiccups, though (if the candidate is clueless about what REST is, you need to give them 5 minutes to read and understand it, which causes some silence, which I don't think is ideal), but so far we think it's a good question.
If by REST you mean full-on HATEOAS then I definitely would not use that as an interview test. Advocates tend to be religious in their intensity, yet few real-world APIs are full REST in that sense. Discussions around HATEOAS could be enlightening, but requiring people to buy into it is not, IMO.
I would structure interview questions around APIs rather about distributed systems/transactions, circuit breakers, etc. - practical aspects of running multiple systems talking to each other through pipes.
I let candidate design an API for a problem we solved in real life (and compare their API to the API and db I designed). It really isn't about HATEOAS purism, we chose REST simply as an abstraction candidate should operate in, but they're free to break some principles if they like (for caching etc we already break it). We're mindful of these and are very lenient with candidate's answer. As long as they can show signs of being able to solve the problem in real life, they're good to go. I should note that we use this question mostly for junior candidates, and this is the very final round of interview so all candidates will be asked ~6 45min questions and this is only one of them.
Build everything to be as easy as possible to debug, because everything else you can do at work after a nice cup of coffee, debugging is the only thing you might have to do at 3AM while drunk.
So:
- Set up some kind of logging, by default use rsyslog and set it up so your logs are available somewhere other than the machine where something is running. You will be surprised how often something breaks and takes the server down with it so you can't log in.
- Later on, when you have to debug stuff, READ THE LOGS. I can't explain it, but after years of working on production systems I have noticed that almost nobody actually goes back and just reads the logs when things go wrong. You will find the problem spelled out in the logs 99% of the time. YOU CAN'T JUST GUESS WHAT WENT WRONG.
- Use transactions properly.
- If you use an ORM, do whatever you need to do to keep track of how many actual queries are done. You are going to find that you do some order of magnitude more queries than you thought. Learn how to give the ORM hints to avoid this.
And work on your logs. Tweak the info/error/debug levels so that the log files read nice. Too often we find that a common error/exception will spew hundreds of lines of shite into the log files over and over again, obfuscating the real information. Take pride in having meaningful, concise but informative log files.
YES! At one company I was at their web service framework would log thousands of lines of nonsense for every request, most of it errors about not being able to connect to services that no longer existed or thousands of lines of debugging that were never relevant to anything. Make sure you are only logging sensible things (usually this means when you see something being logged for no reason or which is no longer relevant, fix things), and if you want to log a ton of garbage, set things up so you can tell the difference and throw out the junk.
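Here's a small sketch of what that tidying can look like with Python's standard `logging` module: one concise line format, a default level that hides debug chatter, and a filter that drops a message when it just repeats the previous one. `SuppressRepeats` and `build_logger` are names I made up for the example.

```python
import logging


class SuppressRepeats(logging.Filter):
    """Drop a record if it has the same level and message as the previous one."""

    def __init__(self) -> None:
        super().__init__()
        self._last = None

    def filter(self, record: logging.LogRecord) -> bool:
        key = (record.levelno, record.getMessage())
        if key == self._last:
            return False  # same thing again: keep it out of the log
        self._last = key
        return True


def build_logger(name: str, stream=None, level: int = logging.INFO) -> logging.Logger:
    """Create a logger with a concise one-line format and repeat suppression."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False      # avoid duplicate lines via the root logger
    logger.handlers.clear()       # make repeated calls idempotent
    handler = logging.StreamHandler(stream)  # stderr when stream is None
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"))
    handler.addFilter(SuppressRepeats())
    logger.addHandler(handler)
    return logger
```

For getting logs off the machine, the standard library also ships `logging.handlers.SysLogHandler`, which could be attached in place of (or next to) the stream handler.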
Yeah blindly using ORM query builders can result in a bunch of N+1 query issues, and retrieving unnecessary data, which can impact the database load. Good advice
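For anyone who hasn't seen an N+1 in the wild, here's a rough sketch using Python's built-in `sqlite3` module instead of a real ORM. The `author`/`book` tables and the helper names are invented; `set_trace_callback` is used to count the statements that actually reach the database, which is the same thing you'd want your ORM's query log to tell you.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with three authors and four books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES author(id),
            title TEXT NOT NULL
        );
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1'), (4, 3, 'C1');
    """)
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id",
            (author_id,)).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """The same data from one JOIN (authors without books would need LEFT JOIN)."""
    result = {}
    query = ("SELECT a.name, b.title FROM author a "
             "JOIN book b ON b.author_id = a.id ORDER BY a.id, b.id")
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result


def count_selects(conn, fn):
    """Run fn(conn) and return (its result, number of SELECTs it executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        value = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return value, len(selects)
```

Both functions return the same dictionary, but the first issues one query plus one per author while the second issues a single query. With three authors nobody notices; with thirty thousand the database does.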
You seem like a novice and I think a lot of the other answers here are focussing on fairly random/advanced concepts that are probably a bit difficult to tackle in a meaningful way without getting your head around the following:
1) Get a grasp of what a "back end" _actually is_. When learning, be mindful of what you are going to tackle and what you are going to leave alone, so that you don't get overwhelmed. (In heavier technology environments, you don't really talk about "back end development" since the back end is subdivided into so many separate disciplines. Therefore "back end development" tends to be something that smaller web shops use to encompass everything that is going on on application servers. DON'T FEEL THAT YOU HAVE TO BE AN EXPERT IN ALL OF THIS STUFF.)
2) Have a reasonable grasp of all of the available hosting platforms and their relative pros and cons: Windows/Linux, cloud/local, AWS/Azure, etc.
3) Get familiar with the lumps of code and/or services that you need to make an application server available to actually do "a thing".
4) Get familiar with standard software development workflow- version control, bug tracking, testing and deployment.
5) Develop a healthy attitude towards application security- understand that it takes a lot of knowledge and effort to make an application 99% secure. Learn how to do what you can with the resources you have at your disposal. On one end of the spectrum you need to know stuff like "never store a password in cleartext", on the other you have to be aware of essentially unsolvable problems like 0-day exploits and social engineering.
6) Get good at communication and teamwork. A backend, by its very nature, needs to talk and get along with other systems so therefore a back end developer needs to talk and get along with other humans.
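To make the "never store a password in cleartext" end of point 5 concrete, here is a small sketch using only Python's standard library (`hashlib.pbkdf2_hmac` with a random salt). The iteration count and function names are assumptions for illustration; in a real application you would likely reach for a maintained password-hashing library and follow current guidance on parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for the example, tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest). Store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the attempt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

What ends up in the database is the salt and the digest, so a leaked table does not directly reveal anyone's password.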
If you really want to understand the fundamentals, look into writing a toy HTTP server from scratch, starting with sockets. The language you choose to do this in doesn't matter quite as much, but bonus points if you use C and the POSIX API directly.
Once you have accepted a connection, you then have to decide how to handle multiple concurrent connections. Do you use threads? Sub-processes? select/poll?
You can then move on to reading/writing to shared state. Do you use memory? Files on disk? A SQL DB? If using threads/processes, is concurrent access to those resources done safely?
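As a rough starting point for that exercise, here is a sketch in Python rather than C. It uses the same socket calls (bind, listen, accept, recv, send) and picks one of the options mentioned above, a thread per connection. The function names are made up, and it ignores most of HTTP (no request bodies, no keep-alive, no error responses).

```python
import socket
import threading


def open_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket. Port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    return listener


def handle_client(conn):
    """Read one request head, answer with a plain-text body, close the connection."""
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # the head ends with a blank line
            chunk = conn.recv(1024)
            if not chunk:
                return  # client went away before finishing the request
            request += chunk
        request_line = request.split(b"\r\n", 1)[0].decode("ascii", "replace")
        parts = request_line.split(" ")  # e.g. ["GET", "/hello", "HTTP/1.1"]
        path = parts[1] if len(parts) >= 2 else "/"
        body = f"you asked for {path}\n".encode("utf-8")
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain; charset=utf-8\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        ).encode("ascii")
        conn.sendall(head + body)


def serve(listener, max_requests):
    """Accept max_requests connections, handling each one in its own thread."""
    threads = []
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        thread = threading.Thread(target=handle_client, args=(conn,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
```

Calling `serve(open_listener(port=8080), max_requests=10)` and pointing a browser at `http://127.0.0.1:8080/hello` is enough to see it work. Swapping the thread-per-connection loop for `select`/`poll` or sub-processes is a good next step.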
My #1 thing would be learning databases and data modeling - most of the time I've made mistakes at work has been because of a bad data model. Most of the time I've made slow software was because I lacked understanding of SQL queries.
Most of the crap software I have worked on has been bad because of a poor data model. I don't think of database design / data modelling as particularly hard, but working on other people's software, some people make a really bad job of it.
In addition to development, there is always a ton of system-related work that cannot be avoided unless you pay big bucks. Do not forget the basics of operating systems:
There are things that are absolute basics:
- Understanding bash/Linux scripting
- Usage of basic Linux commands: grep, awk, sed...
- The basics of POSIX (stdin/stdout/stderr, pipes)
- Monitor your system: observe CPU/network
- Configure firewall, ssh services, cronjobs, set up a systemd job
And from there, learn how to deploy anything with whichever stack/framework is popular this year: Kubernetes, Docker, Ansible, or whatever infrastructure sits behind the backend.
Going to give the advice I would have given to my younger developer self:
General:
- Use a debugger. Stepping through new or existing code is an incredibly quick way to grasp what is going on.
- Exercise healthy paranoia. Expect your code paths to fail. How will you handle failures?
- Code hygiene. Legibility of a codebase makes me respect it a lot more. I feel that I subconsciously handle it with more care when well written.
- Interfaces. Whether you're writing an API, a data access utility, etc. -- think about the design in your current and other contexts.
- Language. Whatever language you work in (Node, Ruby, Go, etc.), know it and its quirks well.
Backend specific:
- Docker. It really changed the way I work. I think the learning curve can be a little steep, but once you're comfortable it leads to much more productive development that is ready for production.
- Load testing. You should be able to identify potential bottlenecks in your application and decide whether they're worth fixing or not.
- Observability. Think about how you will profile and gather metrics of your application when deployed.
- DevOps. A lot of workplaces will expect their software engineers to help with or even be DevOps. It's a big surface area, so I would focus on knowing the parts of the stack that make your application work (containers, databases, virtual networking).
- Pushing work elsewhere. Sometimes you're going to run into a problem that can't be solved by code changes alone. It helps to have a basic understanding of how to push workloads elsewhere, such as to message queues or serverless functions, and of the patterns associated with them. For example, a user uploads an image to your server and you need to compress it.
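To sketch the "pushing work elsewhere" pattern from the last bullet: the request handler only enqueues a job and returns, and a separate worker does the slow part. Here an in-process `queue.Queue` and a thread stand in for a real message queue and worker service, `zlib.compress` stands in for image compression, and `handle_upload` and the `results` dict are hypothetical names.

```python
import queue
import threading
import zlib

jobs = queue.Queue()  # stand-in for a message queue
results = {}          # stand-in for wherever processed files end up


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        upload_id, raw_bytes = job
        results[upload_id] = zlib.compress(raw_bytes)  # the slow part
        jobs.task_done()


def handle_upload(upload_id, raw_bytes):
    """What the web request does: enqueue the job and answer right away."""
    jobs.put((upload_id, raw_bytes))
    return {"status": "accepted", "id": upload_id}


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The caller gets `{"status": "accepted", ...}` immediately and the compressed result shows up later, which is the trade-off with this pattern: the client needs some way to find out when the work is done.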
> "- Docker. It really changed the way I work. I think the learning curve can be a little steep, but once you're comfortable it leads to much more productive development that is ready for production."
I know it’s the big lock in boogeyman on HN, but learn about at least one cloud service - not just how to host a few VMs. Learn about their hosted, managed versions of open source software, managed network infrastructure (load balancers, autoscaling, etc.) and even proprietary services.
You can’t come up with a real scalable fault tolerant solution if you don’t understand the underlying infrastructure and cloud hosting is the fastest way to do it.
There are a lot of parts and roles involved with back-end development. It should be of no surprise that each response here is unique. You could make an entire career by developing deep specialization in just a part of what is discussed. Since you've said "back end" then that means you are considering client-server scenarios, potentially of the web application variety. If this were true, it's reasonable to feel confused and possibly overwhelmed by information. Because of this, many beginners have started with popular web application framework-platforms, such as Django or Ruby on Rails, and grown from there. This is a tried and true approach to learning back-end web dev.
The concepts that must be known are those that apply to a role. For instance, a "web application developer" will likely be a generalist who is able to build an entire web app back end but may not have ever opened the hood and rebuilt "the engine" or "the transmission" (car metaphors, not literal parts). A popular web framework will have a reasonably designed public API that abstracts away the complexity under the hood.
This is a tough question to answer. What constitutes "back end" has changed significantly over the last 5-10 years - an Amazon AWS infrastructure engineer does vastly different day to day work from a small startup engineer.
For you specifically, I'd recommend picking up a web development framework like Ruby on Rails. It will teach you every aspect of building websites: Interacting with databases, writing server endpoints, creating front end web pages, user authentication, deployment, and probably version control. I would consider all of these things to be the bread and butter of typical "back end" engineers (except for maybe the front end stuff.)
From there, you can broaden your knowledge in any direction that interests you. If you like building interactive applications, you can look into front end frameworks like React or Vue. If you want to focus more on back end, you can learn more about relational databases (Head First SQL is a great beginner resource.) Lots of directions you can go.
Agreed, for me learning Django covered a very broad area and pointed me in the right direction regarding best practices. Deploying it is difficult enough that you will need to learn a bit about servers (it's not desperately difficult but you will need to understand various things to get it running). I would suggest making a point of learning SQL properly and understanding how to optimize queries.
I feel the pragmatic approach would be to start with "what happens when I type a URL in a browser and hit enter?". Set up a simple HTTP server in any language and log what the browser actually sends you - the various header fields as well as the body text. From these fundamentals, you can increasingly build up an understanding of how to serve clients - layer by layer.
I appreciate this sounds like trivial advice, but I've seen countless developers (including myself initially) that e.g. start with Ruby or PHP and don't understand what happens before their script is called, what receives their request, how paths are mapped, how this magical $_POST object is populated and so on.
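Something along these lines is enough to see what actually arrives, here with Python's built-in `http.server`. The handler echoes the request line, the headers, the raw body, and the body parsed as a URL-encoded form back to the client, which is roughly the work PHP does before `$_POST` appears in your script. `InspectHandler` and `make_server` are names made up for this sketch, and it only covers POST.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


class InspectHandler(BaseHTTPRequestHandler):
    """Answer every POST with a JSON report of what the client sent."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length).decode("utf-8", "replace")
        report = {
            "request_line": f"{self.command} {self.path} {self.request_version}",
            "headers": dict(self.headers.items()),
            "raw_body": raw_body,
            # Parsing the URL-encoded body into a name -> values mapping.
            "form": parse_qs(raw_body),
        }
        payload = json.dumps(report, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request line on stderr


def make_server(port=0):
    """Bind to localhost. Port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), InspectHandler)
```

Run `make_server(8000).serve_forever()`, submit an HTML form to it from a browser, and read the response: every header in there was sent by the browser without you asking for it.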
What I'm about to suggest is more applicable to relatively more complex and B2B applications but:
As a backend engineer, what you create is basically the core of the product. Data structures and relations you define will be the limitations of your product.
Coming up with intuitive and good data structures for a complex application can actually be a very challenging task. As a backend engineer, most of my time is not spent on writing JS or SQL, but on product design and specs, so I can create something that is stable and scalable, yet not too rigid, considering that iteration is part of the process and everything you create might be subject to change, especially in startup environments.
Data migrations make my palms sweaty (not schema migrations but the kind of migration that requires rewriting large amounts of existing data). Picking the wrong data structure is a fast path to data migrations.
Database migrations. Especially zero downtime multi-phase schema migrations. Rather than make all changes at once, do smaller incremental changes to allow old code and new code to work with the database at the same time, before removing old code, and finally removing the parts of the database schema that are no longer used.
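A toy walk-through of those phases with `sqlite3`, assuming a made-up `users` table where a `fullname` column is being replaced by `display_name`. The two insert functions stand in for the old and new versions of the application code; in production each phase would be its own deploy.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
db.execute("INSERT INTO users (fullname) VALUES ('Ann Lee')")


def old_code_insert(name):
    """Old application version: only knows about fullname."""
    db.execute("INSERT INTO users (fullname) VALUES (?)", (name,))


def new_code_insert(name):
    """New application version: writes both columns during the transition."""
    db.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


# Phase 1 (expand): add the new column as nullable, so old code keeps working.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
old_code_insert("Bob Ray")

# Phase 2: deploy new code that writes both columns.
new_code_insert("Cy Dow")

# Phase 3: backfill rows written before (or by) the old code.
db.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Phase 4 (contract), not executed here: once no running code reads or writes
# fullname, stop writing it and drop the column in a later migration.
```

At every point between phases, both the old and the new insert function succeed, which is the property that lets you roll code forward or back without taking the database down.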
>We change the schema in a way that is still compatible with the old version of the schema. Sure, this won’t work in many cases, but in some.
In my experience, doing DB migrations w/o downtime is rarely worth it, and it carries a big risk of ending up with long unplanned downtime because the process is error-prone. The large majority of schema changes can be performed very quickly, and a couple of seconds of downtime is acceptable in most cases.
Long story short: Don't do DB migrations without downtime unless you absolutely need to.
P.S. If your business guy is requesting deployment w/o downtime, make sure to have a conversation with him to understand why that is. In the majority of cases they make it more dramatic than it really is.
TBH get comfortable with writing a web server in your favorite language and a popular framework, e.g. Flask (Python), gorilla/mux (Go), Express (Node.js). Then try using it to make calls to an external system like a database. As a backend developer you should have a good feel for the data flow through the application stack.
There's nothing special about backend development, it's mostly about the "plumbing" between systems. Lots of (de)serializing messages (e.g. JSON, protobuf) into objects and validating those messages.
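A small example of that plumbing, assuming a made-up `CreateOrder` message with two fields: parse the JSON, validate it, and only then hand a typed object to the rest of the code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CreateOrder:
    """Hypothetical incoming message: which item, and how many."""

    sku: str
    quantity: int

    @classmethod
    def from_json(cls, raw):
        """Deserialize and validate. Raises ValueError on any bad input."""
        data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        sku = data.get("sku")
        quantity = data.get("quantity")
        if not isinstance(sku, str) or not sku:
            raise ValueError("sku must be a non-empty string")
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
            raise ValueError("quantity must be a positive integer")
        return cls(sku=sku, quantity=quantity)

    def to_json(self):
        """Serialize back to JSON with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Everything past `from_json` can then assume a well-formed `CreateOrder`, so the validation lives in one place at the edge of the system.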
I don't think this is the original source (I seem to remember a blog post with better diagrams), but this post [0] does a good job introducing pieces you need as your number of users increases. It is specific to Amazon, but the concepts are universal.
1. By definition, on the Web every application is a distributed system. Even the simplest HTML-only Web page with just one visitor involves at least two separate machines -- the server, which will at the very least run an HTTP server, and the client, which not only needs to run a browser, but also has to parse and execute your HTML code (and possibly CSS, Javascript etc) in real time. This means that all of Web development -- back-end and front-end -- needs to take into consideration all the caveats and gotchas of distributed systems (including, but not limited to, the eight fallacies [0]).
2. Many Web frameworks -- Django and Ruby on Rails come to mind -- treat the database as an integral part of an application. In practice, however, the database in most cases runs as a separate service, more often than not on a completely separate machine. It is useful to view a database as just another part of your distributed system, with a very clear responsibility (persistence of state), that is accessed via a special API (SQL).
There are a lot of good answers in the comments. One thing I always do from the beginning that I see a lot of people forget is to decouple the API classes from the Database model.
For example, I use Java Spring a lot in combination with Hibernate. I always have a layer with my ORM classes mapped to the database, and then in my API layer I will have lots of mirror classes that initially look very similar to the database classes with a conversion step from one to the other. You can use libraries like ModelMapper for the tedious conversions.
With this setup it is easy to evolve the database schema without impacting the API and vice versa. This is something that (in my opinion) some web frameworks like Ruby on Rails do wrong by default.
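The same idea sketched in Python instead of Java/Hibernate, with plain dataclasses standing in for the ORM entity and the API class. `UserRow`, `UserResponse`, and `to_response` are invented names, and the hand-written conversion function plays the role a mapper library would.

```python
from dataclasses import asdict, dataclass


@dataclass
class UserRow:
    """Persistence model: mirrors the database table."""

    id: int
    email: str
    password_hash: str
    full_name: str


@dataclass
class UserResponse:
    """API model: what clients are allowed to see, named for them."""

    id: int
    email: str
    name: str


def to_response(row):
    """The explicit conversion step between the two layers."""
    return UserResponse(id=row.id, email=row.email, name=row.full_name)
```

If the table later splits `full_name` into two columns, only `UserRow` and `to_response` change and the API payload stays the same. It also means `password_hash` cannot leak into a response by accident, because the API class has no such field.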
- Learn about modeling the data of the application you're gonna work on. You're going to be very sorry if you picked a non-relational database for a relational model.
- Avoid NIH: There's a very good chance that the code you're writing could instead be imported from libraries/frameworks that already offer it. Avoid the overhead of reinventing the wheel and focus exclusively on business logic (and sometimes even that might already exist in some oss projects you can use).
> - Avoid NIH: There's a very good chance that the code you're writing could instead be imported from libraries/frameworks that already offer it. Avoid the overhead of reinventing the wheel and focus exclusively on business logic (and sometimes even that might already exist in some oss projects you can use).
I partially agree. Sometimes it's better to reinvent the wheel: you get better control and fewer risks.
Well, I'm not saying that people should just add the first library they find as a dependency. We should review different options and pick the most mature one that fits our case.
Ultimately, software used and contributed to by many people in a community is far less likely to have risks than something a single developer cobbled together to deliver a feature as fast as possible.
> How can I properly quote someone on HN?
You can't, I just use the markdown syntax and hope for the best =D
Make sure you really learn TCP and the unix toolchain for digging into network problems e.g. tcpdump. It's always important to isolate things: is it the latest push causing things or is it network problems?
Read through the OWASP top ten web app vulnerabilities, and make sure you are--at least--familiar with them. https://www.owasp.org/index.php/Top_10-2017_Top_10 You're not paranoid: highly motivated criminals smarter than you are plotting against you trying to crack your systems.
I really like the discussion this thread initiated. Are there places (websites/forums) where discussions like this are common, and where they go into detail on the topics mentioned here?
It starts with a hard problem like “we need to predict the price of plane tickets”. If you don’t have a hard problem or serious scale, just use whatever works and forget about the backend.
Depends on what you want to do. PHP and MySQL, available on any shared hosting provider, can take you far. In 2008 I wrote an online food-delivery platform that served a busy college town. Plain old PHP and MySQL. It's still running today.
I cannot upvote this enough. Not for the PHP and MySQL specifically, but for the general, sadly unfashionable good sense to keep things simple; to steer clear of buzzword-driven development and complexity for its own sake, which seems to be the bane of so many projects these days.
I agree with keeping things simple however I found that Django provided me with an opinionated enough way of doing things, that it gave my application a decent architecture and didn't end up a complete mess. It pointed me in the right direction for many things where free form PHP might not have had much structure.
- Load balancers
- Web servers
- Caches (eg. Redis, memcached)
- Databases (relational, non-relational, document)
- Search datastores (eg. Elasticsearch, Solr)
- Log/event/message processors (eg. Kafka)
- Task queues/task processing libraries
- Periodic jobs (eg. cron)