CWE Top 25 Most Dangerous Software Weaknesses (mitre.org)
155 points by dlor on July 13, 2023 | hide | past | favorite | 128 comments



It's somewhat disheartening as a software developer focused on security that the top four elements are still:

* Out-of-bounds Write

* Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

* Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')

* Use After Free


The gap between knowing what a CWE is and actually knowing, on code level, how it manifests and how you avoid these things is very large. Given how much the software industry has grown in the past 10 years, it's not particularly surprising.


> and actually knowing, on code level, how it manifests and how you avoid these things

You avoid them by using tools that make it difficult or impossible to introduce such vulnerabilities to begin with. Such as modern, memory safe programming languages.

For many decades, carpenters have been educated about table saw safety. But what finally stopped thousands of fingers getting chopped off every year was the introduction of the SawStop, and similar technologies.

Safety is a matter of using the right tools, not of "taking better care".


> For many decades, carpenters have been educated about table saw safety. But what finally stopped thousands of fingers getting chopped off every year was the introduction of the SawStop, and similar technologies.

Afaik the technology isn’t widespread and there are still 10s of thousands of injuries per year.


You mean technology like bounds checking, invented in the 1950s with the creation of Fortran, Lisp and Algol, and present in every other language derived from them, with the exception of C, C++ and Objective-C?


And why did the whole world write so much code in C, C++ and Objective-C, when bounds checking existed long before these languages without it?


It started like this,

"Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own."

-- https://www.bell-labs.com/usr/dmr/www/chist.html

Then source tapes, licensed at an almost symbolic price for the time, and a commentary book did the rest.


With bounds checking, an out-of-range index still triggers an exception or runtime error. Many of these result in DoS.
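For illustration, using Python as a stand-in for any bounds-checked language: the out-of-range access raises instead of reading whatever happens to sit past the array, so the worst case is a clean crash (the DoS angle), not silent corruption.

```python
items = [1, 2, 3]

try:
    value = items[10]  # out-of-bounds read: raises instead of touching stray memory
except IndexError:
    value = None  # the program can recover here, or crash cleanly if uncaught

print(value)
```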


Much better than silent data corruption.

Then there is the whole issue of making it more interesting to look elsewhere instead.

When a door is locked I can still break in by throwing a rock to the window, yet most people do lock the door nonetheless, while most thieves only bother to break the window if there is anything actually valuable in doing so.


Yeah, at least in the US, it looks like table saw accidents that put people in the ER are about as common as they were 15 years ago. I have a buddy who just lost 6 months of work because of a table saw accident.


Also SawStop doesn't prevent kickback, one of the other major sources of injury from a table saw.


Wow! SawStop is incredible tech. The blade stops within 5 ms. That's insane.


Those two things aren't mutually exclusive. I'll bet a non-trivial number of XSS and SQL injection vulnerabilities came from people disabling input and output sanitization on solid frameworks and libraries because they didn't know why they shouldn't. Tools won't solve all of your problems-- you need knowledge, diligence, and tools that make doing the right thing easy.


> I'll bet a non-trivial number of XSS and SQL injection vulnerabilities came from people disabling input and output sanitization on solid frameworks and libraries because they didn't know why they shouldn't.

I will take this bet.


Searching Google for disabled sanitization "vulnerability", the first two hits are articles admonishing developers not to do it, and the third is a CVE, CVE-2023-1159, from a month ago, which affects WordPress installations on which the developer disabled unfiltered_html, its built-in sanitization functionality.


Memory safety won't stop you writing SQL queries or dynamically generating HTML that accepts unsanitised user input.


You're right. Those things are stopped by other tools, such as query builders and web frameworks.


>Those things are stopped by other tools, such as query builders and web frameworks.

No. All tools can be used with an improper attitude which leads to the creation of weak points.

The proper way is to have a deep understanding of the role of design rules.

A programmer who does not pay attention to design (the very basic principles of the design process) can create a good game, and even if this game contains weaknesses, the related risk isn't a reason not to use it. The same programmer, when creating critical infrastructure software, is a source of potential nightmares.

Unfortunately, the software business accepts such specialists for both kinds of projects. Why? Who knows? Perhaps because of legal regulations? Why is it that when an engineer designs a car, they don't try to "Move fast and break things"?


> Why when an engineer designs a car they don't try to "Move fast and break things"

They do when they design submersibles or rockets.


C is the table saw of programming. C++ is the band saw.


I've used both saws and both programming languages. I still don't know which is worse.


XSS and SQLi can happen independently of the memory safety of your chosen programming language. You can use relatively safe frameworks or ORMs to generate HTML and interact with your DB, but there will sometimes be complex use cases that require you to extend or otherwise not use those safeguards.

Similarly, I imagine that there are cases where someone needs to do complex woodworking tasks that involve dangers that are less obvious than with a table saw.


I agree 100%, but in reality most people work with the language they are presented with.


XSS is a great example of that. On paper a ton of people know exactly what XSS is and does. In practice... simply don't allow user-controlled input to be emitted unescaped, ever. Good luck!

The reason XSS (and CORS) are tricky is that they fundamentally don't work in a world where a website may be spread over a couple different domains. I get a taste of this in my day job, where we have to manage cookie scoping across a couple different region domains and have several different subdomains for different cookie behaviors. It's easy to be clean on paper up until you need to interface with some piece of software that insists on doing it its own way - for example, the Azure Excel embedded functionality requires the ID token to be passed in the request body, meaning you have to pull in the request body and parse it in your gateway layer (or delegate that to a microservice)... potentially with multi-GB files being sent in the body as well!

It's super easy on paper to start from greenfield and design something that is sane and clean, bing boom so simple. But once you acquire a couple of these fixed requirements, the cleanliness of the system degrades quite a bit, because that domain uses a format that's not shared by anything else in the system, and it's a bad one, and we can't do anything about it, and now that's a whole separate identity token that has to be managed in parallel.

Anyway, you could say that buffer overflow or use-after-free are kind of an impedance mismatch for memory management/ownership in C. Well, XSS and CORS are an impedance mismatch for domain-based scoping models in a REST-based world. Obviously the correct answer is to simply not write vulnerable systems, but is domain-based scoping making that easier or harder?
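For what it's worth, the mechanical part of the "never emit unescaped" rule is a one-liner in most languages; it's the "ever, anywhere" part that's hard. A minimal Python sketch using only the standard library:

```python
import html

user_input = '<script>alert("xss")</script>'

# Unescaped emission: the input becomes live markup in the page.
unsafe = f"<p>Hello, {user_input}</p>"

# Escaped emission: special characters are neutralized before output.
safe = f"<p>Hello, {html.escape(user_input)}</p>"

print(safe)  # <p>Hello, &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The catch, as the parent comment says, is that this must happen at every single emission point, which is exactly why frameworks that escape by default fare better than discipline.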


Great examples. 1) You have to deal with your own complex systems where it becomes difficult and 2) you have to deal with external complex systems which enforce bad practice on you. One can see how it becomes borderline impossible not to slip once in a while.


Two of those four are things there's no need to make easy to do by mistake, but two popular programming languages choose to do so anyway and they reap the consequences.

Actually the SQL one is arguably in that category too, to a lesser extent. Libraries could, and should, make it obvious how to do parametrized SQL queries in your language. I would guess that for every extra minute of their day a programmer in your language must spend to get the parametrized version to work over just lazy string mangling, you're significantly adding to the resulting vulnerability count because some of them won't bother.

Bonus points if your example code, which people will copy-paste, just uses a fixed query string because it was only an example and surely they'll change that.
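Here's what the two paths look like side by side with Python's built-in sqlite3 module (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

attacker = "nobody' OR '1'='1"

# Lazy string mangling: the attacker's quote breaks out of the literal.
leaked = conn.execute(
    f"SELECT secret FROM users WHERE name = '{attacker}'"
).fetchall()

# Parameterized: the input is bound as data and never parsed as SQL.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (attacker,)
).fetchall()

print(leaked)  # [('hunter2',)] -- every secret leaks
print(safe)    # [] -- nobody is literally named that
```

Note that the safe version is barely longer than the unsafe one here, which is exactly the property you want a library to have.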


I feel there would be some value in SQL client libraries that just flat out ban all literals.

I know it's the nuclear option, but decades of experience have shown that the wider industry just cannot be trusted. People won't ever change[1], so the tools must change to account for that.

[1] Unfortunately, LLMs learned from people... so... sigh.


Our industry is ageist and anti-intellectual. These are the symptoms of those.


While I agree that the software industry suffers from ageism and anti-intellectualism, these vulnerabilities are actually the symptoms of elitism, cargo culting, and traditionalism, which it also suffers from.


Maybe not ageist, but I do think it's easier to get younger people to work slavishly and pay them relatively less (on average, not everywhere pays like Bay area).


It's easy because there has never been a greater backlog of junior candidates trying to break into the industry.


Could be worded as Low barrier to entry and highly compensated.

Kids get into it just by having the tenacity to do whatever it takes to make it chooch. It's all that counts.


Ageist against old people? young people? middle-age people? I see at least these 3 categories are facing age related issues.


"But modern c++ is safe, preventing all those errors is as easy as not making them!..."


Is there some authoritative source for what is considered modern C++ and what is old? Most projects I've seen use a wide mix of C++ features of varying age. Using some C++23 features would not make a project modern if you still use C++98 features you're not supposed to use.


Originally it referred to what was already possible in C++98, once one leaves behind the legacy ways of coding C with a C++ compiler.

It started with the publication of "Modern C++ Design" by Andrei Alexandrescu in 2001.

https://en.wikipedia.org/wiki/Modern_C%2B%2B_Design

When ISO C++11 came to be, many re-used the term to mean C++11 or higher.

Given that many keep updating this to mean more modern versions, a well-known developer in the community (Tony Van Eerd) has made the joke that by C++17 we were in Postmodern C++.

https://www.youtube.com/watch?v=QTLn3goa3A8

No idea what kind of modernism to call C++23, when C++17 was already postmodern, maybe Revivalist C++.

However, it basically comes back to Andrei Alexandrescu's original ideas of programming in C++ as its own language, leave the C ways and pitfalls of resource management behind, and learn to embrace a modern language for systems programming.

I should also note that there are developers against this philosophy; they advocate that the C++ as understood by CFront is what one should care about, and thus the Orthodox C++ movement was born.

https://gist.github.com/bkaradzic/2e39896bc7d8c34e042b


> programming in C++ as its own language, leave the C ways

I'm with Kate Gregory on the "Stop teaching C" (actually Kate specifically means in order to then teach C++ but I also think it's probably fine to stop teaching C outright)

But whilst Kate is right in terms of pedagogy, as a larger philosophy this is inadequate. As a language C++ is obviously defective and the explanation is almost invariably "Because C" which only makes sense once you appreciate C++ in terms of C.

The built-in array type in C++ is garbage. Why is it garbage? This is a language with all these powerful features, why doesn't its array type leverage any of them? It's because this is actually the array type from C.

OK, maybe just the array type is trash, that's obviously not good, but it's one defect. How about string literals. Oops. C++ does sort of technically have the string literals you actually wanted, but the syntax for them is weird and you need the standard library not the core language... the ones you get for "Some text" are C's constant strings, an array of bytes with an extra zero byte, and well, the array type sucks.

This carries on, the language doesn't provide real tuples, it doesn't provide a real sum type, its built-in types don't believe in methods but user types do, everywhere there are weird choices which are non-sensical except for the reality that it's what C does.

And then at the end of that, the language isn't actually compatible with C. It's close, a lot of stuff works, and more stuff kinda-sorta works enough that you may be surprised when it fails, but there isn't the sort of robust compatibility you might expect given the enormous sacrifices made for this goal.


I mostly agree with you.

The issue is how "worse is better" culture tends to win, and if the option is between C and C++ for a given scenario, then I definitely take C++.

However, if the option pool is widened to more alternatives, then yeah, there should be a sound reason for still picking them for greenfield development, e.g. CUDA, a language toolchain based on LLVM,...


No, there's no such authoritative source - depending on context C++ fans will mix and match what is 'modern'.

It's somewhat similar to the C/C++ split. When it is convenient it's "C/C++" because "you can easily migrate your old C codebase to C++". But in other situations it's "C++", because C is old and more error prone and "we no longer manipulate raw pointers".


“Modern C++” is not necessarily tied to any specific standard, it is more a collection of ideas and philosophies. Although if I had to pick I’d say it really started with C++11.


not authoritative, but the really big c++ change was with c++11 - changes after that have been important, but perhaps more or less transparent to the average c++ user. and compiler support for c++11 is very good.


In fairness, only 2 of those 4 are actually memory-related.


And both have existing tools to find those bugs that people often just don't use.


Since 1979 with the invention of lint by Stephen Johnson at Bell Labs.

https://en.wikipedia.org/wiki/Lint_(software)


Static analysis as a bugfinding tool has proven to be insufficient, especially for large C++ binaries and JS programs. Both languages are nightmares for precise and scalable analysis.

Coverity exists. They've got a great product. But it doesn't solve the problem.


It doesn't solve everything, it solves even less when it isn't used.


Of course. But these issues will remain near the top of the list indefinitely if people just leverage traditional analysis tools.

I love static analysis. I did my PhD in it. But we'll still be talking about use after free in 2073 if we just try to chase higher K in our analysis implementations.


Naturally static analysis alone doesn't fix use after free in all possible cases, however it already does fix several of them when the analyser can see everything on the existing source code.

The main issue is the community sub-culture of not adopting tooling as it isn't perfect 100% of the time.

Many of the C++ security conscious folks end up being polyglot, as this subculture eventually wears one out.


Having spent well over a decade in this space, I assure you that the root cause of limitations for static analyzers doing lifetime analysis in C++ is not separate compilation or partial program analysis caused by shared libraries.


Naturally it isn't.

Again, even if it's not perfect and doesn't cover all use cases, how about people actually using something at all?

During that decade, how much time did you spend looking at the human side of the problem instead of what the tools can achieve?


Nowhere did I suggest that we shouldn’t use these tools or spend time improving them. UX for tools more powerful than local AST matching indeed tends to be quite bad because explaining the chain of reasoning for an alarm is difficult.

My only point is that without a different approach we will continue to have the same problems in 2073.


In fairness, only C/C++ of all the currently commonly used languages can have half of the 4 top dangerous software weaknesses.


JavaScript routinely has the other half of the top 4.


So do C and C++ when used in web or database applications. So they get 4/4.


I agree that in principle the neutralization bugs aren't something C++ is necessarily making worse than, say, Python. But it'd be fascinating to see a study to figure out whether C++ programmers make these mistakes more often, or less often, or roughly the same.

An argument for more often: C++ is so complicated, maybe you're too busy with other problems to address the neutralization issue

An argument for less often: C++ teaches you to be careful and check everything to avoid nasty outcomes so that carries over to neutralization


It's somewhat disheartening as a security enthusiast that people only focus on "popular" security bugs and ignore the rest. The other top 21 bug classes aren't as "cool" but they will let me hack your app just the same.


Sure, but SQL Injection will let a script kiddie steal and/or drop your entire poorly configured production DB.


It also provides several paths to RCE depending on the environment, not just exfil.


SQL Injection is weird because it's been known for so long, and modern frameworks usually have so many ways of avoiding it by default, that one has to go out of their way to create an injection vulnerability -- but it still happens often with greenfield code.


> Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

This is often ignored as it simply takes too much time and it often does not hurt much as it’s ‘internal’ (to the company using the saas or whatever).


Then ask yourself: how much have you done to prevent people choosing the wrong programming language? Because the PL has such a major influence, it's by far the lowest-hanging fruit for tackling many of those issues.


Personally? I've done quite a bit here although there's always more. I worked at Google to fund Rust development internally and externally, helped sponsor the work that eventually led to getting Rust adopted in the Linux kernel, and now run a company that's building a new Linux distribution that prioritizes shipping code written in memory safe languages.

https://security.googleblog.com/2021/02/mitigating-memory-sa...

https://www.chainguard.dev/unchained/building-the-first-memo...


Oh awesome! Then I take off my hat. :-)


How many of these can Rust solve?

(Not on the "use Rust for everything" bandwagon, genuinely curious)


SQL injection and XSS are typically solved at a library/framework level instead of a programming language one, although type systems can help make those frameworks usable and work well.

Either way, they're effectively "solved" from a programmer's perspective if you're willing to adopt modern frameworks instead of string-concatenating HTML or SQL manually.


Judging from my limited experience the first and fourth are either caught by the compiler or at least result in a panic in some cases.

The middle two are out of reach of a typical PL or type system (there are exceptions like Ur, but I don't think it's adopted widely). It's a problem that is typically solved via libraries and Rust is not unique in terms of providing safe libraries around generating SQL or HTML.


2 of the 4 listed.


With a bit of creativity, you can use static typing systems to at least slant the table in your favor with SQL, HTML, and in general, structured text output. It's hard to completely ban string concatenation because you will eventually need it, but you can make it so doing the right thing is easier than the wrong thing.

However, existing libraries for statically-typed languages often don't do the work or apply the creativity and end up roughly as unsafe as the dynamically typed languages.

It's a bit of a pet peeve of mine.
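A toy sketch of that "slanting" in Python (a hypothetical `SafeHtml` wrapper, not any real library): plain strings are treated as untrusted and escaped on concatenation, so the lazy path is the safe one.

```python
import html

class SafeHtml:
    """Markup wrapper that escapes plain strings on concatenation (toy sketch)."""

    def __init__(self, markup: str):
        self.markup = markup  # trusted, already-escaped markup

    def __add__(self, other):
        if isinstance(other, SafeHtml):
            return SafeHtml(self.markup + other.markup)
        # Plain strings are assumed untrusted text and escaped automatically.
        return SafeHtml(self.markup + html.escape(str(other)))

page = SafeHtml("<p>") + '<script>evil()</script>' + SafeHtml("</p>")
print(page.markup)  # <p>&lt;script&gt;evil()&lt;/script&gt;</p>
```

Real implementations of this idea (e.g. auto-escaping template engines) work the same way: the unsafe operation requires an explicit opt-out rather than an opt-in.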


It could, but it will be decades before Rust adoption is where C/C++ is today, so in the meantime it would be nice to see some other, more practical and short-term solution to these problems. Otherwise I can predict at least 50% of the top 4 for a decade ahead.


Hence why all major OS vendors are embracing designs with hardware memory tagging; that is the last frontier of possible mitigations.


Items 4 and 12, and only in obvious cases.


For 1: scan the whole code base, warn on any use of strcpy/strncpy/etc., and replace them with snprintf; no APIs without a length argument should be allowed.

For 4: the static analyzer should help, and also set your pointer to NULL immediately after free (to guard against double free).


Static detection of UAF is grossly incapable of actually protecting real C++ applications. It can find some bugs, sure. But a sound analysis is going to just throw red all over a codebase and get people to disable it immediately.

Changing everything to take lengths is definitely a good change - but challenging to retrofit into existing codebases. Apple has a neat idea for automatically passing lengths along via compilation changes rather than source changes, but if you want to do things in source you have to deal with the fact that there is some function somewhere that takes a void*, increments it locally, reinterpret_casts it to some type, and then accesses one of its fields and you've got a fucking mess of a refactor on your hands.


> top four elements are still

Use after free is actually gaining popularity, up 3 since last year.


Aside from Memory Management, there's another general category that always comes up in these lists, but is not talked about much: in-band signaling (i.e., "Strings are Evil"):

- Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') (#2)

- Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection') (#3)

- Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection') (#4)

- Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') (#8)

- Improper Neutralization of Special Elements used in a Command ('Command Injection') (#16)

- Improper Control of Generation of Code ('Code Injection') (#23)

All of these came from trying to avoid structured data, and instead using strings with "special characters". It's crazy how many times this mistake has been repeated: file paths, URLs, log files, CSV, HTML, HTTP (cookies, headers, query strings), domain names, SQL, shell commands, shell pipelines... One unescaped character, from anywhere in the stack, and it all blows up.
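The OS-command entries illustrate the point well: the injection exists only because data and commands travel in-band through one string, and it disappears when arguments travel as a structured list. In Python terms (hostile filename invented for illustration):

```python
import shlex

filename = "report.txt; rm -rf /"  # hostile input carrying an in-band separator

# In-band: splicing into one shell string turns ';' into a second command.
dangerous = f"cat {filename}"

# Out-of-band: an argv list keeps the whole filename as a single argument.
argv = ["cat", filename]

# If a shell string is unavoidable, escaping confines the special characters.
quoted = f"cat {shlex.quote(filename)}"
print(quoted)  # cat 'report.txt; rm -rf /'
```

The same split applies to every format in the list above: pass structure where you can, escape at the boundary where you can't.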

One could say "at least it's human-readable", but that's not reliable either. Take file names, for example. Two visually identical file names may map to different files (because of confusables[1] or surrounding spaces), or two different names may map to the same file (because of normalization[2]), or the ".jpg" at the end may not actually be the extension (because of right-to-left override[3]).

So the computer interpretation of a string might be wrong because a special character sneaked in. And even if everyone was perfectly careful, the human interpretation might still be wrong. For the sake of the next generations, I hope we leave strings for human text and nothing more.

[1] https://unicode.org/cldr/utility/confusables.jsp

[2] https://developer.apple.com/library/archive/qa/qa1173/_index...

[3] https://krebsonsecurity.com/2011/09/right-to-left-override-a...


Out of this frustration I've built: https://github.com/Endava/cats. It's for APIs, but it mostly addresses exactly this case: don't use strings for everything; if you choose to anyway, make sure you add patterns for checking that things are valid, think about all the corner cases and all the weird characters that can break your app, and so on.


And it's even worse when everything is a map, rather than specific object schemas.


What's the alternative though? For URLs for example, would you have to put a JSON structure into the browser? That's obviously not going to happen.


Sure, most of these decisions are too entrenched to be fixed.

But yes, URLs should have been structured. We already see paths rendered with breadcrumbs, the protocol replaced with an icon, `www` auto-inserted and hidden, and the domain highlighted. If that's not a structure, I don't know what is.

By cramming everything into the same string, we open ourselves to phishing attacks by domains like `www.google.com.evil.com`, malicious traversal, 404s from mangled relative paths, and much more.
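A quick sketch of why the flat string invites that phishing mistake: a naive substring check passes the hostile domain, while asking the parsed structure for the actual host does not.

```python
from urllib.parse import urlsplit

url = "https://www.google.com.evil.com/login"

# Naive string check: the in-band trick passes it.
naive_ok = "google.com" in url

# Structured check: ask the parsed URL for the actual host.
host = urlsplit(url).hostname
real_ok = host == "google.com" or host.endswith(".google.com")

print(naive_ok, real_ok)  # True False
```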


URLs are structured. But when you need to send them across the network or store them on disk or even just send them between different processes on the same machine you need to define what the byte level representation is.

I don't see how you can get away from having a defined serialisation format. People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.

But I'm not sure exactly what you mean by "should have been structured". Eventually you've gotta define the bytes if you want to interoperate with other software.


> I don't see how you can get away from having a defined serialisation format.

Yep, that's exactly it. Your TLS certificate is not sent as string, and neither are your TCP packets, nor the images contained in them. Your URLs shouldn't be either, but it's probably too late for that.

> People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.

That's a whole lot better than the current footgun we have, where

    http://http://http://@http://http://?http://#http://
is a valid URL. People don't operate directly on string URLs without trouble either, so at least the structured data is not inviting incorrect usage.


> > I don't see how you can get away from having a defined serialisation format.

> Yep, that's exactly it. Your TLS certificate is not sent as string, and neither are your TCP packets, nor the images contained in them.

...all of those things mentioned have defined serialization. I expect all of them have had security issues because of problems with deserialization code.


Yes, of course. Everything that is stored or transmitted must have a defined serialization. And any piece of code as widely used as this is going to have security issues.

What is your point? That strings don't need defined formats? That they have less security issues?


Your certificate isn't entered by hand, though?

That is, it is easy to see that the reason we have URLs sent as strings, is that we collect them from the user. And it makes perfect sense that we would collect strings of characters from users.


How many URLs, as a percent of all browser navigation, do you think are typed by hand? And I don't mean "news.ycombinator.com", I mean the full URL, like "https://news.ycombinator.com/news".

And in those rare cases, of course you can collect strings from the user. But then they have to be parsed, and that's what should be on the wire. IP addresses are also sometimes entered by hand, but we don't send those strings in TCP packets.


Fewer today than when it started, for sure. Though, I'm not clear that "copy pasted between applications" doesn't have its own problems. I have never seen that done in a "you are passing objects around" way that didn't have terrible security.


Humans think in strings so it's not surprising we carry this thinking to code where it blows up in our face.


Some humans think in strings. I don't, generally I think in pictures.


No, IMHO escaping is an elegantly simple concept; it's just that for some reason (like basic arithmetic) people don't seem to be taught enough about it to understand.

> Two visually identical file names may map to different files (because confusables[1]), or two different names map to the same file (because normalization[2]), or the ".jpg" at the end may not actually be the extension (because right-to-left override[3])

Those are all because of Unicode, which is an even worse idea in general.


Escaping is a cute solution, but it doesn't belong in infrastructure.

> it's just that for some reason (like basic arithmetic) people don't seem to be taught enough about it to understand.

That's the same argument used to defend manual memory management. But education is not enough. Escaping is something you have to remember to do every time*, or it'll blow up spectacularly. Even knowledgeable professionals mess it up, or it wouldn't occupy 6 of the 25 spots in this list.

> Those are all because of Unicode, which is an even worse idea in general.

What's the alternative? Japanese speakers writing file names in ASCII? Unicode is a modern marvel, it's our fault we use it where it doesn't belong.

* Not necessarily every input/output, but at least every system that interacts with it.


You are going to be sorely disappointed with LLMs. :(

We make it look like it is a request response with a chat bot, but it is more realistic to say we are making a single document and having the model fill out the rest. That is, there is no out of band. There is only the document.


Mitre really lost a lot of respect with CVE-2016-1000027. Every few weeks a warning that any SpringBoot 2.x project has a CVSS 9.8, which causes all sorts of heartache for those of us bound to CVE remediation. Every blasted security tool reports this one. Spring reviewed and rejected, as did our very, very large organization. Comically, this has become the CVE we use to see how our tools allow us to white/black list entries.

Thank god Spring dropped this interface in the Framework 6.x / Boot 3.x release, and the end for non-commercial support is this year for the old stuff.

https://github.com/spring-projects/spring-framework/issues/2... https://github.com/advisories/GHSA-4wrc-f8pq-fpqp


What would you rather have? It seems to make sense to rate these with such a high CVSS. All auditing tools I know of have a way to whitelist CVEs to say either "We've looked into this and it doesn't impact us" or "We are willing to accept the risk". From your post it sounds like you're in the first camp, but others might not be and need those notifications.

RCE via deserialization seems like a valid 9.8 even if it requires the developer to use less common APIs or to use them in strange ways. In the bug they have a comment that the documentation warns about these APIs, but that doesn't really impact a CVSS score. Am I missing something about this specific CVE on why you think it's unfair?


It should be considered a failure of our profession that after all these years the number 1 issue is still out of bounds write, a memory safety issue. In any true engineering profession a failure of this sort would be unacceptable, but in ours it's tolerated and explained away as a necessary byproduct of certain tools. How much personal information has been compromised due to these low standards? How many people put at risk? It's shameful.


In any true engineering profession, we would still be using C, but with big orange safety vests on.


So you want to be a licensed engineer to write software?


This will eventually happen.


Here are Language-Specific ones:

1. CWE-787 Out-of-bounds Write: C, C++, Assembly

4. CWE-416 Use After Free: C, C++

7. CWE-125 Out-of-bounds Read: C, C++

10. CWE-434 Unrestricted Upload of File with Dangerous Type: ASP.NET, PHP, Class: Not Language-Specific

12. CWE-476 NULL Pointer Dereference: C, C++, Java, C#, Go

15. CWE-502 Deserialization of Untrusted Data: Java, Ruby, PHP, Python, JavaScript

17. CWE-119 Improper Restriction of Operations within the Bounds of a Memory Buffer: C, C++, Assembly

21. CWE-362 Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition'): C, C++, Java

23. CWE-94 Improper Control of Generation of Code ('Code Injection'): Interpreted


>12. Null pointer deref.

In Java you'll get an exception, while in C you might disappear your cat. Those two are hardly comparable when talking about the "dangerous-ness" of a mistake.


And C# is making references non-nullable by default.


Kind of. It doesn't work that well with existing libraries, and because of that, even when you enable it, violations are only a warning.


> 15. CWE-502 Deserialization of Untrusted Data: Java, Ruby, PHP, Python, JavaScript

> 21. CWE-362 Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition'): C, C++, Java

Why those languages specifically? I would say these two issues apply to all languages.


So, the memory related ones are in position 1, 4, 7, 12, 17, and 21.

I understand memory safety is important, but still: only one on the podium (though it is first), only 3 in the top 10… clearly security is about much more than memory safety.


Of course. Security is the exercise of making programs not do things. Since the very beginning of computer science we've understood that programs want to be able to do anything at a very fundamental level. We'll never solve security completely.

But it is embarrassing that we've been living with memory safety issues for 50 years and they still remain very common and very severe, despite being addressable via type systems in ways that something like a logical bug that leads to data leakage isn't.


This isn't about choosing security measures from a menu. This is about the foundations of what you build.

To the extent that memory safety is slowly, oh so slowly, but steadily dropping down the list, it is because we are taking it seriously as a foundational issue and actually addressing it. To turn around and then use the success we've had as evidence that it isn't important is making a serious error.

There is no reason to use a memory unsafe language anymore, except legacy codebases, and that exception is also slowly but surely diminishing. I have yet to hear the amazingly compelling reason that you just need memory unsafe languages. In terms of cost/benefit analysis, memory unsafety is literally all costs. Even if you do have one of the rare cases where you need it, you only need a very particular variant of it (reinterpreting the bytes of one type as another; you never need to write outside the bounds of an array or dereference into an unallocated memory page), and you can still get that through the explicit unsafe support that every language has one way or another. You do not need a language that is pervasively unsafe in every line you write so that on the three lines of code out of millions where you actually need it, you can have it with slightly less ceremony. That's just a mind-blowingly bad tradeoff and engineering decision.

How are we supposed to address the other issues from a foundation of a memory unsafe language? If we can't even have such a basic guarantee, we sure aren't going to get more complicated ones later.


Safety is nice, but it often costs significant performance. So what you are saying is… if I need performance my only choice is Rust?

Don’t get me wrong, I absolutely hate the insanity of Undefined Behaviour™ in C and C++ (my pet peeve being signed integer overflow), and I’m totally behind systematic bounds checking (which with compile time support tends to lie between free and cheap). I’m less sold on ensuring the safety of memory shared between threads because I tend to prefer message passing, and I’m not sure how to best address use-after-frees: using the general allocator for each and every object is often even more wasteful than just using a GC, so RAII based schemes aren’t quite enough. I have yet to really test Rust’s borrow checker however.

One thing I have noted, is that C, despite its expressive weakness and its unsafe insanity, remains pretty capable at some niches. Low-level cryptographic code for instance is hardly affected by its flaws (having no heap allocation and constant time code helps a ton).


"I’m less sold on ensuring the safety of memory shared between threads because I tend to prefer message passing"

Memory safety, at least to my eyes, has not traditionally encompassed that as a requirement. I don't consider this a solved problem, in that it has a lot of solutions and consensus about them is still developing. (e.g., I still expect async as it has been implemented in Node & Rust to eventually be considered a gigantic mistake but clearly that is not an uncontroversial opinion in 2023; check in with me in 2033 or 2043). So I'd advise trying to use one of the better solutions but I'm not quite to "there's no reason to not use one of these things".

So my passion is mostly about out-of-bounds access and use-after-free. If it costs you performance... take the hit. It's not a lot. And if you do need unsafe approaches, they are almost always some tight loop somewhere or something where you can selectively take the gloves off and drop down to assembler or something. You don't need your entire language to be unsafe just so you don't have to wrap "unsafe { }" around your tight inner loop.


> So my passion is mostly about out-of-bounds access and use-after-free.

Yeah, those are the big ones indeed, and I am willing to take a performance hit to get there. If that’s the only hit I take I’ll still be much better than paying an Electron tax.

I do however still feel some discomfort about use-after-free, because to be honest I just don’t know enough about the relevant use cases, compilation techniques, and runtime checks. So far my only relevant experiences have been GC, RAII, and stack-only. They all solve my problem (or at least I can see how I could write a compiler that would solve each use case for me). But I know those aren’t the only use cases, and I’m not familiar enough with the other allocation patterns (pool, arena…) to have a relevant opinion.

But perhaps I’m just stressing over nothing? The problem is easily stated after all: no object should be accessed after its backing memory has been freed. One way to do that is to make sure the object (and any reference to it) goes out of scope before the backing storage is freed. Which sounds doable enough if the backing storage itself follows a stack discipline…

Hey, I can glimpse here a way to allow allocations and statically guarantee a limit on memory usage (barring input dependant allocation amounts). Perhaps even avoiding fragmentation, which would be terrific for embedded use cases.


> There is no reason to use a memory unsafe language anymore, except legacy codebases, and that is also slowly but surely diminishing. I'm still yet to hear this amazingly compelling reason that you just need memory unsafe languages. In terms of cost/benefits analysis, memory unsafety is literally all costs.

Tell that to the authors of new memory unsafe languages (like Zig) and creators of new projects in those languages (like https://tigerbeetle.com) :(


I do tell them that. I see no reason to be memory unsafe.

It is a huge uphill battle to become a new general purpose language, and the smallest thing can kill it. The fact that Zig is memory unsafe means that I, who am not a bleeding-edge adopter, but am an early adopter and in a position to make decisions about what is used at work, have disqualified it and lost all interest. I have no use for such a language for greenfield projects. Simply offering a more convenient onramp to the sorts of problems that C has is not a compelling value proposition for me.

I extremely strongly suspect I am not even remotely alone.


People have language blinders on. It's not like if you only focus on the ones that affect your language specifically, suddenly you're secure. There's still another 16 bug classes to worry about.

If you don't think about the other classes, I'm still gonna escalate privileges, root your box, ransom your data, send spam, charge a half million dollars in cloud spend to your account, steal your customers' PII/PHI, etc etc etc. Without ever using a language specific exploit.


Yes, but such neglect of other bug classes suggests that those developers aren't focusing on security anyways. For those who do want reasonable security, using a memory-safe language suddenly makes the most pervasive errors go away, and then it's easier to focus on building robust applications.


PHP is uniquely vulnerable to things like XSS and others on that list because, by default, it does not escape strings that are used in templating.

Escaping by default has become a standard practice with HTML templating languages, see the Go html template standard library for a very detailed breakdown of what is escaped where.

More modern PHP frameworks like Laravel provide their own templating solution in part because of this. But the vast majority of websites run on default PHP templates, so it's not surprising that these kinds of vulnerabilities are so high up in the list.


Laravel has had their own share of XSS issues with their Blade templating engine.

The whole problem is that you mix code and data, and that third party resource loading is 'on' by default in browsers, especially for scripts and things that can embed scripts. This is not something you can fix once and for all at the library level.


Isn’t #17 the same as #1 and #7 combined?


Is anyone even using Valgrind anymore these days?

I've noticed that using Valgrind on Python systems is almost impossible because most modules have not been built with Valgrind in mind and thus you get swamped in noise.

I suppose the same is true for any large system that uses many different third party libraries.


ASan is better for finding memory corruption afaik


I use valgrind regularly, and prefer it over asan. asan will result in a faster executable, which is nice, but I far prefer valgrind's output to asan's (this might be preference, but I find it clearer), and various things break when building with asan, so I never make it the default. Being able to valgrind stuff without recompiling is very convenient.

I'm also not sure if asan has an equivalent to --leak-check=full


Ok. But I'm guessing it has the same problems. I.e., if half your libraries/modules have never seen it, then you'll get a lot of noise. Happy to be proved wrong.


The compiler can't add checks into code it hasn't compiled. External modules, unless they are doing weird things which you do want to know about, should not generate ASan reports... on Linux.


Absolutely. I enable Valgrind on every default debug build of mine. It's my favorite tool.

I have even made it recognize my custom allocators and report bugs with them too.

When combined with my second favorite tool, AFL++, I have a good shot at eliminating most memory bugs. AFL++ finds paths through the software, and I run every single one of those paths through Valgrind. It's beautiful.


It is useful but quite limited by itself for security bugs - it's dynamic instrumentation so you'd need to test with the input triggering the vulnerability. (But useful in combination with fuzzing, similarly to the compiler sanitizer options)


Valgrind, ASAN and AFL are my holy trinity when it comes to bug squashing. I'm surprised that you have a problem with python modules - I use Valgrind specifically when I need to test executables without recompiling.


I'm really waiting to see all those shift-left startup founders that will craft a new world of developer-oriented products from this list. IMHO, the real way to look at it is how we can influence developers (by choosing the suitable languages, platforms, architectures, etc.) and then measure them after they find the vulns.

From the optimistic side, it looks like the safest language to write an app today with is TypeScript.


Typescript applications suffer from many of these vulnerabilities. JS apps have a specific class of critical vulnerabilities as well, prototype pollution. If I had to write a web application with security in mind, I personally would pick Python. It’s possible to make mistakes in any language though, and the environment an app is deployed in can independently introduce many vulnerabilities.


When I wrote TypeScript, it was half a joke, based on the language rankings in one of the comments. As you said, the most important factor is the platform, not the language itself. Writing the software in a language that runs well on the platform you're targeting is the right decision.


Wordle sold for $1M+ and put all the solutions for future games into the JavaScript file. Security is important, but it's a spectrum.


CSRF higher than improper auth? Yeah...don't think so.



