A little sloppy with the error handling, but pretty neat.
E.g. log_append() doesn't check the fopen() return value, the malloc() return value isn't checked, and a write() can return a partial write, so it needs a loop. fcntl().
Also, write() should check for EINTR and EAGAIN.
And there's no handling for nonblocking writes.
If the user types too much while the network glitches, it seems this could cause the client to exit().
Probably the correct way is to not read from stdin unless poll() says the socket is safe to write to, and vice versa.
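Roughly this kind of wrapper is what I mean. A sketch only, not tested against the client's code, and it assumes the socket was already put in nonblocking mode (e.g. via fcntl() and O_NONBLOCK):

    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>

    /* Sketch: write all `len` bytes, looping over partial writes, EINTR and EAGAIN. */
    static int write_all(int fd, const char *buf, size_t len) {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;                        /* interrupted by a signal: retry */
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    struct pollfd p = { .fd = fd, .events = POLLOUT };
                    poll(&p, 1, -1);                 /* wait until the socket is writable again */
                    continue;
                }
                return -1;                           /* real error: let the caller decide */
            }
            buf += n;                                /* partial write: advance and keep going */
            len -= (size_t)n;
        }
        return 0;
    }

And then every call site still needs a couple of lines to decide what to do when it returns -1, which is where the line count starts to grow.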
connect() errors should print where they failed to connect.
And:
$ ./kirc
Nick not specified: Success
And in my first test I got:
Write to socket: Resource temporarily unavailable
And I see a lot of NULL pointers being printed when I connect to e.g. freenode.
And so on, and so on…
For these things I recommend re-reading the manpage for every libc and syscall you call, and check how they can fail, and consider how you can handle that failure.
307 lines of pure C for a minimal, actually usable client is pretty neat, so I'm not saying it's not well done. But that bit about "… and do it well" (from the readme) means doing all of the above.
The problem, of course, is that fixing these problems well is "the other 90% of the work", especially when coding in C.
And this is the main reason I avoid C when I can. You can't just call "write()". You have to write a 5-10 line wrapper function, and then all callers need a few lines of error handling. And so it goes for everything, until your program is no longer the nice 300 line neat thing you can almost write from memory.
So it's a good start. But it's pretty fragile in its current form.
i appreciate the honest feedback! definitely a work in progress and i have learned a lot since i started ~1mo ago on this. will take your suggestions and add them to my ever growing “todo” list. ;)
small digression, but I thought that at least on Linux, malloc() never returns an error because the actual allocation happens lazily, when the memory is first used?
Cases where memory allocations fail on Linux include: hitting memory resource limits (ulimit), overcommit behaviour set stricter than the default via sysctl, kernel heuristics with the default overcommit settings ending up failing the allocation, container / cgroups limits, 32-bit virtual address space being exhausted. I'm sure there are more.
The default overcommit-within-reason algorithm is designed to deny allocations that are obviously unrealistic, I think.
Using semi-conservative ulimit settings is pretty common in interactive use to catch runaway / swapped-to-death situations.
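A quick way to see a real NULL on Linux; a throwaway sketch, with an arbitrary 64 MiB cap:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        rl.rlim_cur = rl.rlim_max = 64UL * 1024 * 1024;  /* arbitrary 64 MiB address-space cap */
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            perror("setrlimit");
        void *p = malloc((size_t)1 << 30);               /* ask for 1 GiB */
        printf("malloc(1 GiB) returned %p\n", p);        /* prints (nil) under the limit */
        free(p);
        return 0;
    }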
Take a look at vm.max_map_count. A lot of systems have that set artificially low and that will force a memory allocator to fail and return NULL. I can confirm this is the case for jemalloc.
Also, on Linux, only 48 of 64 bits are available for VM addressing. That leaves you with about 250TB of address space. Seems like a lot until you start using VM for disk mmaps. I have actually seen this limit hit.
Also, 48-bit is just how current amd64 chips work; it's not an architectural amd64 property. Your binary is compatible with the >48-bit VA space of the future (unless you yourself misguidedly bake the 48-bit assumption into it). But I guess the point was to point out lower bounds.
It's sloppy coding and not portable. If you really don't want to check the malloc() return value because you consider that it shouldn't fail (or that you won't be able to recover from it anyway), just implement an xmalloc() that aborts on failure and use that. At least it'll crash "cleanly" this way.
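A minimal sketch of that:

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: never return NULL; crash loudly and early instead. */
    static void *xmalloc(size_t n) {
        void *p = malloc(n);
        if (p == NULL && n != 0) {
            fprintf(stderr, "out of memory (%zu bytes requested)\n", n);
            abort();
        }
        return p;
    }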
On 64 bits malloc basically always succeeds, because you are very likely not running out of virtual memory. You can totally go ahead and malloc terabytes of RAM.
On 32 bits malloc is going to fail once your virtual memory space is full (~3 GB).
Error checking is somewhat redundant for most applications, since you're going to abort anyway, and if there are no pages available on access, well, you're getting SIGSEGV anyway, just like you would accessing a null pointer (except on embedded devices). But beware of the exceptions. In C especially it's common to use null pointers as a flag... one of the reasons why new/delete in C++ is preferable; it throws and thus aborts when it fails, which you don't care to handle, so that's about perfect for a C++ exception.
In addition to what others have said, which basically boils down to "it'll never return an error… unless it does", there's also the aspect that if you start assuming things about the kernel environment you're not coding to "POSIX C99" like the title says, but to "Debian GNU/Linux circa 2020".
And that's part of the "and do it well". A thousand years from now, if the C99 and POSIX specs survive, they will still support correctly written code.
Dereferencing a null pointer can also be a security issue (mostly for kernels, though) so one should assume a malicious general environment.
IRC in 307 lines of C code without dependencies, pretty cool. Of course for that to work they had to sacrifice secure TLS support. Based on the "do one thing well" idea, I was thinking you could set up a TLS tunnel from localhost:6667 to <irc-server>:6697
Not really perfect but my WIP attempt using `socat`:
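Something along these lines, assuming irc.example.net as the server and a Debian-style CA bundle path (both are placeholders):

    socat TCP-LISTEN:6667,bind=127.0.0.1,reuseaddr,fork \
        OPENSSL:irc.example.net:6697,verify=1,cafile=/etc/ssl/certs/ca-certificates.crt

Then point the client at localhost:6667 and the tunnel handles the TLS side, including verifying the server certificate against the CA bundle.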
The IRC protocol is both text-based and simple enough that you can use something like netcat as a client (and I have, many times.) The line-based format fits IM perfectly, and the overhead of the protocol is a tiny fraction of the bloated proprietary ones that have filled this use-case today (except MSNP, which in its earlier versions was also delightfully simple and easy to write a client for, but definitely beyond the threshold of being usable "raw".)
The problem with that is that IRC networks generally send you a ping every 5 minutes or so, and if you don't send a PONG back in time you are disconnected.
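So beyond raw netcat, a client needs at least something like this (a sketch; assume `line` is one CRLF-stripped line read from the socket):

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <string.h>

    /* Sketch: minimal keep-alive logic to avoid the ping timeout. */
    static void handle_ping(int fd, const char *line) {
        if (strncmp(line, "PING", 4) == 0)
            dprintf(fd, "PONG%s\r\n", line + 4);   /* echo the server's token back */
    }

    int main(void) {
        handle_ping(1, "PING :abc123");   /* demo on stdout; a real client passes the socket fd */
        return 0;
    }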
You could use netcat (or something akin to it). Back in the day I tunneled that via stunnel to get TLS [1] support. One could do the same with POP3 and IMAP.
One problem with this workflow is that in some clients (e.g. an IRC client supporting TLS) the user had no way to identify/verify the certificate. If a client just automatically accepts self-signed certificates, it's just snake oil.
[1] Everyone still called it SSL back then. Oh, wait...
> If a client just automatically accepts self-signed certificates, it's just snake oil.
It forces attackers to use an active attack rather than a passive one. Which is the only security most IRC can have anyway, since the attacker could just join the channel and listen in that way, since most IRC networks are public.
> It forces attackers to use an active attack rather than a passive one.
The MITM or eavesdropping can happen at a bridge. If the client doesn't check the certificate and accepts any, it's about as good as plaintext. It could be worse, even, due to the false sense of security.
> Which is the only security most IRC can have anyway, since the attacker could just join the channel and listen in that way, since most IRC networks are public.
Whether the IRC network is private or public is irrelevant.
There were, for sure, private channels back in the day (the 90s). Back then you could set a channel to secret (hidden) and set a password on it, effectively making it a private channel (it would not show up in /whois or /list). Bots could kick unknown people based on filters. For example, without auth to an Eggdrop, you could get instantly kickbanned even _with_ the correct password.
Then there are PMs, which are one-on-one (except for the server(s)).
If one of the IRC servers is compromised though (or tapped, or whatever), that makes sniffing a channel or PMs child's play.
There's also the problem of data integrity. If you are asking for (or giving) help in #linux and someone can change the data on the fly, [...]
FWIW, UnrealIRCd, even back then, innovated (or invented) a lot of new features on top of IRC. Some of these added security, though I can't recall examples off the top of my head.
I never did SSL from telnet for sure, in fact I didn't know IRC supported encryption back then. There was also the ident response that was needed - can't recall the particulars as it's been 20+ years.
I was talking end of the 90s. UnrealIRCd (mainly Carsten Munk / stskeeps) was one of the first to support TLS/SSL. At some point in the early 00s the popular IRC networks slowly but surely started to support it. There's also the case of cryptography between servers, something UnrealIRCd was also quick to adopt.
Some servers required ident(d) response which required a server running on privileged port 113.
The one issue for the longest time was that networks would use self-signed certs, and often different certs per server in the network, so it was hard to have any kind of trust. I did see a few using Let's Encrypt in recent years, but I've moved over to Matrix these days.
So you can play RPGs remotely from any old '80s hardware with serial support and an 80x24 display (via a WIFI232 adapter); even a Spectrum +3 would suffice.
I generally support end-to-end encryption for everything, but I'm not sure that it makes sense in the context of IRC. IRC networks are usually public, so anyone could join your channel and listen in, even with end-to-end encryption. It seems like E2E would make for a lot of complexity and overhead without tangibly increasing the privacy of the users.
Before E2EE was used in IM clients, IRC already had IRC over TLS, and also OTR (which was also used in Gaim/Pidgin).
IRC over TLS doesn't have the same threat model as E2EE. With IRC over TLS, the server(s) can read the data in plaintext. With proper E2EE (not the marketing version) that's not the case; only the clients can read the data. I'm talking about actual data/content here, not metadata.
Hence the "usually" public, I presume. While this doesn't invalidate your point that IRC could use E2E encryption, I personally only use IRC for communication on public channels, where it would be largely pointless, unless you're assuming a really paranoid threat model, in which case public group conversation is probably not a good idea anyway.
Cool project. I would relax the requirements from C99 to C89. You get more portability to cool retro systems that way, and C99 doesn't really add that much. Also, C++ compilers are generally more able to compile C89 in C++ mode than C99, e.g. MSVC.
I thought MSVC added support for C99 at long last a few years ago? IIRC because it was effectively needed for a newer C++ standard, but still.
Also it's 2020, I wasn't even coding when C99 came out and I'm now a "senior" developer, whatever that means. You'll have to pry C99 from my cold, dead hands. Whatever compilers don't support it by now, I don't want to support them.
What do you think C99 adds that's so important? The only thing that comes to mind is variable-length arrays on the stack, where the length is only known when the function is called. But using malloc isn't so hard regardless.
I haven't programmed in C in over 10 years though, so I could be missing something.
Agreed, also "retro system compiler" doesn't mean it only supports old C standards, e.g. SDCC is C99 and C11 compatible. IMHO strict C89 is a much less enjoyable language to read and write than C99, for instance designated initialization and compound literals are massive improvements.
Also MSVC's C99 support is pretty good since ca. VS2015.
When I mentioned retro systems, I had in mind obsolete compilers in stock installations of IRIX, AIX, Solaris, BeOS, NextStep, Amiga Unix etc. Not modern compilers that target 8-bit microcontrollers like SDCC.
The MSVC C compiler supported the full C99 designated initialization and compound literal features in VS2015; both are not part of the common C/C++ subset but are exclusively C99 features. So all in all the C99 support in MSVC hasn't been that bad since ca. 2015, it just wasn't complete enough to be called "standard C99".
But yeah, those C99 features weren't consistent at all with Herb Sutter's 2012 blog post about MSVC only supporting C features that are needed for the C++ compiler.
C and C++ are more strictly separated in the Microsoft compilers compared to gcc and clang (which both support more modern C features in C++ mode via non-standard extensions), I think that's what's confusing many people. It's not a problem in mixed-language projects though, just put all the C code into .c files and all C++ code into .cpp and you're set, compiling C code with a C++ compiler isn't such a great idea anyway, since it limits you to a ca. 1995 version of C.
Because despite the name this "betterC" is not a C dialect, but a D dialect which is an entirely different language than C99?
Could just as well ask why not write it in Nim, Zig, Rust, Swift, Kotlin etc... This means a different audience, different target platforms, different trade-offs.
the suckless IRC clients are awesome! in fact, `sic` was my “go to” before writing `kirc`. I’m definitely not trying to compete with those, especially their file-based approach (which is great for users that work across channels) but rather offer a lightweight and “clean-looking” solution for the casual user.
Not trying to be snarky. Honest question: Why use C11 over C99? Or even why use C99 over C89? What significant advantages do the new standards provide that cannot be done in plain old C89?
C11 gives you noreturn and alignas. Alignas can be pretty useful for low-level development in particular. Just hope you don't need variable-length arrays because those got changed to optional.
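For illustration, a small sketch; the 64-byte figure and the `die()` helper are made up:

    #include <stdalign.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdnoreturn.h>

    /* Hypothetical fatal-error helper: noreturn tells the compiler it never comes back. */
    static noreturn void die(const char *msg) {
        fprintf(stderr, "fatal: %s\n", msg);
        exit(1);
    }

    /* alignas pins the buffer to a 64-byte boundary (the 64 is an arbitrary example). */
    static alignas(64) unsigned char buf[256];

    int main(void) {
        if (sizeof buf != 256)
            die("impossible");                 /* just to exercise die() */
        printf("buf is at %p\n", (void *)buf);
        return 0;
    }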
> Or even why use C99 over C89?
Several very big things: Native bool, stdint.h (fixed-width int types with known sizes ahead of time), long long, snprintf, not having to declare all variables at the top of the block (and now you can do for (size_t i = 0; i < sizeof(strbuf); ++i) because of it).
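All of which looks roughly like this (a throwaway sketch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        char strbuf[32];
        uint32_t id = 42;                                  /* fixed-width type from <stdint.h> */
        int n = snprintf(strbuf, sizeof strbuf, "user-%u", (unsigned)id);
        bool ok = n > 0 && (size_t)n < sizeof strbuf;      /* native bool from <stdbool.h> */
        for (size_t i = 0; ok && strbuf[i] != '\0'; ++i)   /* declaration inside the for */
            putchar(strbuf[i]);
        putchar('\n');
        return ok ? 0 : 1;
    }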
You don't need any "significant" advantage. Even a very small advantage ("an anonymous struct would be handy here") is enough; why would you _not_ use it when it's free? For the fun of the constraint? I'm not a C expert, but I don't think there's any downside to using the C11 standard compared to C99.
...which is lower than the number of platforms with a C89 compiler. A lot of popular projects known for their high portability are C89 for this reason.
There's also the fact that there are far more compilers for C89, and it is easier to write one than for the newer standards. This becomes important if you are interested in avoiding Ken Thompson attacks.
Personally, I still stick to C89, and the only newer feature that I've found to be useful is mixed declarations and statements. But it's no big loss: doing without it avoids the "variable proliferation" that some codebases seem to be afflicted with, and blocks can be used to start a new inner scope if you really need a new set of declarations anyway.
I'm so glad to not be the only weirdo out there just sticking to plain old C89. I concur to all your reasoning. C89 is simple, readable, gets the job done.
My only pain point is indeed stdint.h. Though it's usually available even where it's not standard per se.
They're actually called trusting trust attacks (the original paper on the topic is "Reflections on Trusting Trust" if you want a guaranteed search term); I'm not sure why userbinator used an eponym instead.
I'm also not sure why they would be relevant for a general project, since the source language being easy to write an alternate compiler for only matters for the compiler itself: once you have a non-infected compiler, you can bootstrap gcc or whatever and compile everything else at whatever C standard you like.
When you start using new features you break backwards compatibility. For C11 that means distros as new as Ubuntu 10.04 (which I still use as my main desktop) and the like are going to have problems compiling it (the GCC it ships with only supports parts of C11, as C1X). This will also apply to older embedded systems where a tiny client would be useful.
In the past a compiler and ecosystem would last a decade before it couldn't compile something. These days changes are coming out, and being used, every 3 years. It's future shock and the major cause of container usage on the desktop and in academia. Sticking with a well established older standard means everyone can avoid the massive increase in complexity and problems that containers bring.
>That must have horrible security implications, surely?
Let's just say it's a matter of taste. I keep my attack surfaces to a minimum and backport what I can, "patch and statically compiled deps for userspace"-wise. On the other hand, I browse the web with JavaScript disabled, so my old box probably has fewer "horrible security implications" than a completely up-to-date distro with the user blindly executing all code they're sent. Security is behavior more than software.
Older standards typically have a larger pool of people who can contribute because the standard has been around longer.
A programmer might have more experience with an older standard due to the length of time it has been out, or because the toolchain they use elsewhere (personal projects, embedded comes to mind, or work) hasn't updated to the new standard.
Coming up to speed with the new standard is not free. The tooling may be free for the most common targets (embedded usually lags), but taking the time to learn isn't free.
It certainly is free. The standards are generally backwards compatible and the changes are simple. You do not even need to be aware of the differences between C89 and C11 to contribute to a C11 project.
> or because the toolchain they use elsewhere (personal projects, embedded comes to mind, or work) hasn't updated to the new standard.
I'm currently porting some "C99"-ish code, and I had a few instances where I would have loved to just use C11's `_Generic` to replace a macro hell with statement expressions and accompanying `typeof()`s all over the place. Fun fact: `typeof` is a GNU extension, and the target compiler doesn't have it.
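For the record, the kind of thing `_Generic` buys you; a sketch, not the actual code I'm porting:

    #include <stdio.h>

    /* Sketch: pick a printf format from the argument's type at compile time. */
    #define FMT_OF(x) _Generic((x), int: "%d\n", double: "%f\n", char *: "%s\n")
    #define PRINT_VAL(x) printf(FMT_OF(x), (x))

    int main(void) {
        PRINT_VAL(42);
        PRINT_VAL(3.5);
        PRINT_VAL((char *)"hello");
        return 0;
    }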
C99 over C89: designated initialization and compound literals are the biggies, plus all the small accumulated improvements that had been added to C during the 90's (e.g. variable declaration anywhere, for (int...), winged comments...)
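A quick illustration of those two; the `irc_msg` struct is made up for the example:

    #include <stdio.h>

    struct irc_msg { const char *prefix, *cmd, *params; };

    static void show(struct irc_msg m) {
        printf("%s %s %s\n", m.prefix ? m.prefix : "-", m.cmd, m.params);
    }

    int main(void) {
        struct irc_msg join = { .cmd = "JOIN", .params = "#kirc" };   /* unnamed fields become NULL */
        show(join);
        show((struct irc_msg){ .cmd = "PING", .params = ":token" });  /* compound literal, no temp variable */
        return 0;
    }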
I actually feel bad about pointing out issues with this code since the project is very neat. But, I'm still going to do it.
There is a problem where the code uses explicit escape sequences for colour instead of using terminfo. This is a pet peeve of mine, because it prevents things like controlling whether or not to use colour by setting TERM to the appropriate values. Or to completely disable highlighting by setting TERM to "dumb". Or even use a completely different terminal type, like if you have an old vt52 hooked up to your computer.
Terminfo is a really nice library that abstracts away all the terminal codes. It's really what should be used here.
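A minimal sketch of what that looks like with the terminfo API from ncurses (link with -lncurses; the capabilities may be absent, e.g. for TERM=dumb, hence the checks):

    #include <curses.h>
    #include <term.h>
    #include <stdio.h>

    int main(void) {
        int err;
        if (setupterm(NULL, 1, &err) != OK)        /* reads $TERM and loads its terminfo entry */
            return 1;
        char *bold = tigetstr("bold");             /* enter bold mode, if the terminal has it */
        char *sgr0 = tigetstr("sgr0");             /* reset attributes */
        if (bold && bold != (char *)-1) tputs(bold, 1, putchar);
        printf("hello");
        if (sgr0 && sgr0 != (char *)-1) tputs(sgr0, 1, putchar);
        putchar('\n');
        return 0;
    }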
Looks cool. I wonder whether users can overflow your buffers by inputting commands parsed with sscanf. Also, why malloc/free cmd_str in raw? You're automatically or statically allocating all the other buffers.
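For what it's worth, bounding the conversion is usually enough; a sketch where the `/nick` format is just an example rather than kirc's actual parsing:

    #include <stdio.h>

    int main(void) {
        char nick[128];
        const char *input = "/nick someverylongnickname";
        /* The field width (127) must be the buffer size minus one, or sscanf can overflow it. */
        if (sscanf(input, "/nick %127s", nick) == 1)
            printf("nick = %s\n", nick);
        return 0;
    }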
Great client, I have been using it for about a week. I would suggest adding a channel indicator before the nickname so you can see which channel a message is being sent from, but otherwise great work.