
Does the committee have plans to deprecate (as in: give compilers license to complain, so that compiler developers can appeal to the standard when users complain back) locale-sensitive functions like isdigit, which is useless for processing protocol syntax, because it is locale-sensitive, and useless for processing natural-language text, because it examines only one UTF-8 code unit?



isdigit is likely to remain, because much existing code does use it (perhaps in different contexts from the one you cited). If you need a different function specification to do something different, it could be added in a future release, but that doesn't mean that we need to force programmers to change their existing code.
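For the protocol-syntax case raised above, a "different function" does not need library support at all. The following is a minimal sketch, not anything the committee has proposed: is_ascii_digit and parse_decimal are hypothetical names. It relies only on the ISO C guarantee that the characters '0' through '9' have consecutive values, so the result does not depend on the locale or on whether plain char is signed.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of ISO C): true only for '0'..'9'.
   Safe for any char value, negative or not, and never consults the locale. */
bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical helper: parse an unsigned decimal field, such as a length
   in a protocol header. Returns the number of characters consumed, or 0
   if there is no digit at s[0] or the value would overflow. */
size_t parse_decimal(const char *s, unsigned long *out)
{
    unsigned long value = 0;
    size_t i = 0;

    while (is_ascii_digit(s[i])) {
        unsigned long d = (unsigned long)(s[i] - '0');
        if (value > (ULONG_MAX - d) / 10)
            return 0;               /* value * 10 + d would overflow */
        value = value * 10 + d;
        i++;
    }
    if (i > 0)
        *out = value;
    return i;
}
```

Parsing stops at the first non-digit, so a caller can continue from s + n to check the delimiter that the protocol requires.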


What about giving isdigit and friends defined behavior for any argument value that's within the range of any of char, signed char, or unsigned char?

The background (I know Doug knows this): isdigit() takes an argument of type int, which is required to be either within the range of unsigned char, or have the value EOF (required to be negative, typically -1).

The problem: plain char is often signed, typically with a range of -128..+127. You might have a negative char value in a string -- but passing any negative value other than EOF to isdigit() has undefined behavior. Thus to use isdigit() safely on arbitrary data, you have to cast the argument to unsigned char:

    if (isdigit((unsigned char)s[i])) ...

A lot of C programmers aren't aware of this and will pass arbitrary char values to isdigit() and friends -- which works fine most of the time, but risks going kaboom.
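One way to avoid repeating the cast at every call site is to put it in a small wrapper. This is only a sketch: is_digit_char and count_digits are hypothetical names, not standard functions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: isdigit() that accepts any plain char value. */
int is_digit_char(char c)
{
    /* Convert to unsigned char first, so that a negative char never
       reaches isdigit() as a negative int. */
    return isdigit((unsigned char)c) != 0;
}

/* Count the decimal digits in a NUL-terminated string that may contain
   bytes with the high bit set. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (is_digit_char(s[i]))
            n++;
    }
    return n;
}
```

Because the parameter type is char, the conversion happens in one place and callers can pass s[i] directly.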

Changing this could raise issues if -1 is a valid character value and also the value of EOF, but practically speaking -1 or 0xff will almost never be a digit in any real-world character set. (It's ÿ in Unicode and Latin-1, which might cause problems for islower and isalnum.)


This proposal is very difficult to implement: it would break the ABI, because the isdigit() macro (and its friends) exposes the representation of the ctype internals.


I remember that the various is* man pages noted that most of them are only defined if isascii() is true. So I always used e.g. (isascii(x) && ispunct(x))

FWIW, just looked at the man page (macos) and iswdigit() and isnumber() are mentioned.


isascii() is not defined by ISO C. (It is defined by POSIX, but POSIX says it may be removed in a future version.)

I see that POSIX explicitly says that isascii(x) is "defined on all integer values" (it should have said "all int values").
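For code that cannot rely on POSIX, the guard idiom from this subthread can be sketched with a plain range test in place of isascii(). ascii_ispunct is a hypothetical name, and the 0..0x7f range is assumed here as the 7-bit ASCII range that isascii() tests for.

```c
#include <ctype.h>

/* Hypothetical helper: ispunct() restricted to 7-bit values.
   The range test stands in for POSIX isascii(), so ispunct() is only
   ever called with an argument in 0..127, which is always within the
   range of unsigned char. */
int ascii_ispunct(int x)
{
    return x >= 0 && x <= 0x7f && ispunct(x) != 0;
}
```

Like the isascii() version, this reports every value outside the 7-bit range as "not punctuation", including negative char values, rather than invoking undefined behavior.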

Personally I'd rather cast to unsigned char.


Does there exist a use case in portable code such that use of isdigit is not a bug?

How does the committee view non-portable existing code generally when considering changes?


Code can be non-portable for various reasons, not all of them bad. I just grepped a recent release of DWB and found about 100 uses of isdigit, most of which were not input from random text but rather were used internally, such as "register" names (limited to a specified range). Other packages are likely to have similar usage patterns. I really don't want to have to edit that code just for aesthetics.



