> The calculation was using the HUGE_VAL constant, which is defined as an ISO C constant with a value of positive infinity [...].
Only if the floating point implementation has infinities, which ISO C does not require. C99 also defines INFINITY, but that one’s also not required to actually be an infinity, unfortunately.
More seriously:
> "Nowadays, outside museums, it's hard to find computers which don't implement IEEE 754."
AFAIU there are plenty of implementations that either run faster without or outright refuse to implement the “gradual underflow” semantics, which 754 technically requires with no equivocation. (Wikipedia tells me the Intel compiler turns on SSE “denormals as zeros” when optimizations are on, for example.)
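To make "gradual underflow" concrete, here's a minimal Python sketch, assuming a typical IEEE 754 binary64 double (the exact constants are platform assumptions):

```python
import sys

# Smallest normal double vs. the smallest subnormal double on a typical
# IEEE 754 binary64 platform (both literals are assumptions about the platform).
smallest_normal = sys.float_info.min      # ~2.2250738585072014e-308
smallest_subnormal = 5e-324               # 2**-1074

# Gradual underflow: halving the smallest normal lands on a subnormal,
# not on zero.
print(smallest_normal / 2)                # 1.1125369292536007e-308 (subnormal, nonzero)
print(smallest_subnormal / 2)             # 0.0 -- only now do we hit zero

# With flush-to-zero / denormals-are-zero modes enabled for speed, that
# subnormal result would be replaced by 0.0 instead.
```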
Zero and nonzero mantissas always mean different things, no matter what the exponent is. I guess I am not sure how this could be described as an optimization... an optimization relative to what? Some hypothetical floating-point format which has extra bits?
> Zero and nonzero mantissas always mean different things, no matter what the exponent is.
They don't normally mean different types of value. 4.0000000 and 4.0000001 are just different adjacent numbers.
> I guess I am not sure how this could be described as an optimization... an optimization relative to what? Some hypothetical floating-point format which has extra bits?
Using the exponent to distinguish, like the format uses for normal/subnormal/nonfinite. Or using extra bits, sure.
They took a bit pattern that would have been a NaN if not for Infinity, and used it to store Infinity.
You might think of floating-point numbers coming in five varieties:
- normal (exponent not minimum or maximum)
- zero (minimum exponent, mantissa == 0)
- subnormal (minimum exponent, mantissa != 0)
- infinity (maximum exponent, mantissa == 0)
- nan (maximum exponent, mantissa != 0)
NaN is only the last category, strictly. Finite is everything except infinity and nan. Subnormal and zero are collectively denormal. The distinction between "subnormal" and "denormal" is a bit esoteric.
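A rough Python sketch of that five-way classification for a 64-bit double (the field widths here are the binary64 ones; this is just for illustration):

```python
import struct

def classify(x: float) -> str:
    # Reinterpret the double as a 64-bit integer and pull out the fields.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)      # 52-bit fraction

    if exponent == 0:
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0x7FF:
        return "infinity" if mantissa == 0 else "nan"
    return "normal"

for value in (1.5, 0.0, 5e-324, float("inf"), float("nan")):
    print(value, "->", classify(value))
```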
I don't know why GCC-12 was in the picture, but I hope all of you can understand one thing: Python's source code must be compatible with GCC 6. And no so-called "Threading [Optional]" features. It is all because CentOS is dead. CentOS has been sold to IBM. There is no CentOS 8 anymore. The free lunch has ended. And there won't be another OSS project to replace it.
Linux kernel developers want to use C11. CPython developers want to use C11. But does your compiler support it? If you were a C++ developer, would you request C++17/C++20 also?
CPython must be aligned with the manylinux project. manylinux builds every Python minor release from source. Historically manylinux only used CentOS. CentOS has devtoolset, which lets you use the latest GCC on a very old Linux release. Red Hat has spent a lot of time on it. Now you can't get it for free because CentOS is dead. So the manylinux community is switching to Ubuntu. Ubuntu can only provide GCC 6 to them. If any code in CPython is not compatible with GCC 6, then manylinux will have to drop Ubuntu too. And they don't have any other alternative. As I said, for now Red Hat devtoolset is the only solution. When IBM discontinued the CentOS project, most people do not understand what it means to the OSS community.
(I work for Microsoft. It doesn't stop me from loving Linux.)
> If you were a C++ developer, would you request C++17/C++20 also?
yes, absolutely. I think it is wild that some people stay with older compilers solely because they happen to be on an old distro and don't want to update. Tying compiler versions to operating system versions is absolutely braindead. A compiler is just a program that takes text and outputs a binary which is supposed to work on anything with the expected ABI - you could even cross-compile from Windows if you wanted; at least I've done the opposite (cross-compile Windows binaries from a Linux host) a few times.
e.g. personally I mostly use clang-13, soon 14, for development with every possible C++20 goody, and ship software that works back to Windows 7, macOS 10.13 and, until recently, a centos:7-era Linux userland (recently upgraded to 8). There is zero reason to use an older compiler.
> I don't know why GCC-12 was in the picture, but I hope all of you can understand one thing: Python's source code must be compatible with GCC 6. And no so-called "Threading [Optional]" features. It is all because CentOS is dead. CentOS has been sold to IBM. There is no CentOS 8 anymore. The free lunch has ended. And there won't be another OSS project to replace it.
the day centos:8 ended I replaced centos:8 with rockylinux:8 in my docker image for my builds and everything continued working fine with the latest GCC and Clang versions.
Honestly your pains regarding gcc-6 are entirely self-inflicted. Just ship a more recent GCC binary with the manylinux project or something; you can build it on CentOS 5 if you fancy, in order to get it to work on older libcs.
I agree, and especially in the context of Python. My Mac comes with Python 2.7 and 3.8, but it seems like almost everyone uses Homebrew or pyenv to locally install a newer version. I rarely hear of anyone going out of their way to stick with the system version outside of specific scenarios (like "IT locks down developer laptops") or such.
And while I'm keenly aware that Python and C++ are very different languages, if someone asked if I'd insist on using the "new" Python17 (aka 3.6), like that was wildly and unreasonably bleeding edge, I'd literally laugh at them.
If everyone just used the latest macOS and the latest iOS, I'd be less concerned. But I haven't heard of any IT manager enforcing that. macOS 12 has come out, but usually they still allow you to use macOS 11 as long as you have installed all the security bug fixes.
For macOS and iOS, the problem is more complicated. Let's say I want to publish a package to pypi.org. The package contains some binaries compiled from C++ and it requires C++17. Then what is the lowest macOS version the binary can support? It's very tricky: if the build machine which generates the package runs macOS 11, then it can have Xcode 12.5; otherwise it has to use Xcode 12.4 or lower. And Xcode 12.4 says std::optional, which is a C++17 feature, is only supported on macOS 10.14+. But if the build machine has macOS 11 and Xcode 12.5, then the binary can run on 10.13 too. And in either case, it can't support 10.12 or lower.
I believe you don't want to build tensorflow/pytorch packages from source. So the maintainers of these two OSS projects must consider the things above. If you wonder why they don't use C++17, this is the reason. For the same reason they want to avoid C11 optional features too.
So you are conflating tons of things here, and none of it really makes sense. The reason the macOS toolchain tends to claim that C++ features are tied to operating system targets is that Apple doesn't really understand how to do toolchain architecture well, and so has managed to do this awkward thing where the operating system--from their perspective--is a massive monolithic brick that contains a ton of functionality that is classically introduced by the toolchain, which I admit has some advantages but generally feels ill-intentioned.
The result of this is that Apple really assumes you are going to use the copy of libc++ that ships on the system, and that copy might be missing exported symbols that the library headers assume are there. But this isn't related to the toolchain you are using. I do remember some weird corner case with std::optional, and I think it might have been that Apple forgot to mark something correctly as unsupported in a previous toolchain build? The newer compiler is thereby actually giving you the more correct understanding of what can actually be presumed to work on older systems.
But like, the core issue here is that you really shouldn't use Apple's stupid toolchain setup, nor do you have to: just embed your own copy of libc++ and call it a day. I routinely use the latest versions of Xcode's copy of clang (though I prefer to compile for macOS now using the copy of clang that comes with the Android NDK for various reasons: I recommend avoiding Xcode except to get their system headers) to target ancient systems with modern C++ features as I'm not relying on the system copy of libc++ to even exist or work (10.7's doesn't) much less be complete.
> Then what is the lowest macOS version the binary can support? It's very tricky [...]
No it's not. The oldest supported version of macOS your binary will run on will be the one you set in your CFLAGS with the -mmacosx-version-min=10.x flag when you build.
The OS you are running the build on and the Xcode version don't influence that; you should just run the most recent Xcode you can, build against the most recent macOS SDK available, and set that flag. You can target 10.8 from Big Sur and Xcode 12 with afaik no issues.
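For Python wheels in particular, a sketch of that workflow might look like the following; MACOSX_DEPLOYMENT_TARGET is the environment-variable form of the same minimum-version setting, and the 10.13 floor is just an example, not a recommendation:

```python
# Hypothetical build helper: pin the oldest macOS the wheel should support,
# then build as usual. The build machinery picks up MACOSX_DEPLOYMENT_TARGET
# and passes the corresponding minimum-version flag to the compiler/linker.
import os
import subprocess

os.environ["MACOSX_DEPLOYMENT_TARGET"] = "10.13"   # example floor
subprocess.run(
    ["python", "-m", "pip", "wheel", ".", "-w", "dist/"],
    check=True,
)
```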
> The OS you are running the build on and the Xcode version don't influence that
Usually it doesn't. But in this special case it does. I have set the flag you mention. When C++17 wasn't in the picture, that was enough. But now we are talking about how to let developers use the new language features. New C++ features typically need a new runtime. But do the old systems have it? One of the difficulties is figuring out which macOS versions have it and which don't.
The std::optional / std::variant case is unfortunate indeed; it's sadly an argument for using non-std versions of these types. It's also possible to cheat a bit with libc++ macros: -D_LIBCPP_NO_EXCEPTIONS=1 does the trick (but turns exceptions into asserts).
Thanks for the info. Would there likely be a downside to saying "if you want to use the precompiled packages, you need to be on macOS 11+ (but feel free to DIY if you're stuck on an older version)"?
I agree that there is little reason to not use a newer compiler to target an older distro, but there is one issue: the standard library. With newer language versions the standard library also grows. Older distros ship with older standard libraries, and they probably don't have the newer language facilities in them.
At work we sidestep this by building the new C++ standard library implementation against an old libc and shipping it with our product. I can imagine that this could be problematic for other software though, especially ones that normally ship with the distro.
> I can imagine that this could be problematic for other software though, especially ones that normally ship with the distro.
why would it be an issue? That's how pretty much every piece of software does it on Windows, and most people can surely agree that things are much saner there for the end users.
This is not a technical issue. But if you aim for the software to be packaged in official distro repositories, they possibly don't want you to have backported dependencies, such as a new version of libstdc++, or to vendor those libraries in.
In practice compilers and the things they compile are insanely large. If we started updating them more... who knows what bugs we'd get with different GCCs on different OSes...
Up-to-date Linux distros already compile things with latest compilers. The bugs are already caught there. And in my experience, with GCC and clang, updating to new versions consistently reduces bugs.
(The reason this is significant is because the manylinux infrastructure is responsible for approximately 100% of the binary Python extension builds provided as wheels)
I'm a little confused by this. I'm on the last Ubuntu LTS version (20.04) and the default GCC version appears to be 9.3, with 10.3 available as an option. So why are they stuck on 6?
As a user, if you build every Python package from source, it's OK. But if you are the maintainer of an OSS project and you need to publish binary packages for it, then you will hit trouble. Binaries built on Ubuntu 20.04 can only support Ubuntu 20.04 and newer. So you'd better choose an older Linux release to reach a broader set of users. Right now most Python packages choose CentOS 6 or 7. See https://github.com/pypa/manylinux/issues/1012 for more details. They need help!
Evidently the point of the manylinux project is to make it easy to distribute Python binary wheels for Linux, so that basically is their use case. Though they're not building for arbitrary old versions — they're building for a concrete set of old OS versions defined by Python standards. I think the idea is that you don't want to require everyone installing your Python module to have a full set of build tools installed, so they're providing a way to distribute a binary module that supports a guaranteed set of supported Linuxes.
The manylinux wheels distributed on PyPI need to be built against the oldest glibc imaginable, which is defined to be the glibc on some ancient version of CentOS (a different version for each different manylinux platform tag).
If you build your own wheels, of course no one’s going to stop you from building against anything.
This is how it used to be back when manylinux was separately versioned. Since https://www.python.org/dev/peps/pep-0600/, you just say which version of glibc you expect: manylinux_2_17_x86_64 etc.
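For illustration, here's roughly what a manylinux_X_Y tag is promising, as a hedged Python sketch (platform.libc_ver() is stdlib; real installers like pip use a more careful check, and this ignores the architecture part of the tag):

```python
import platform

def glibc_at_least(major: int, minor: int) -> bool:
    """Rough check: does this system's glibc satisfy a manylinux_X_Y tag?"""
    libc, version = platform.libc_ver()       # e.g. ('glibc', '2.31')
    if libc != "glibc":
        return False                          # musl etc. -> not "manylinux"
    have_major, have_minor = (int(p) for p in version.split(".")[:2])
    return (have_major, have_minor) >= (major, minor)

# manylinux_2_17_x86_64 just means "needs glibc >= 2.17 on x86_64"
print(glibc_at_least(2, 17))
```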
Isn't it possible for more recent versions of GCC to build binaries (linked libraries or otherwise) that can be distributed and run correctly in older operating system environments?
(in other words: my understanding is that software languages and tools can continue to evolve while maintaining the ability to build backwards-compatible binaries)
I don't understand any of this logic: you can trivially run new versions of gcc on any distribution you want, and you can trivially use new versions of gcc on new distributions (which is sane) to target arbitrarily-old distributions. I routinely use bleeding edge compilers to target pretty ancient systems, and have for decades now... like, the entire point of gcc is that it is easy to compile it to run on whatever crazy system you have and then it should be able to target whatever other crazy system you have as long as you have a sysroot for it (which is of course somewhat easy to obtain as it is generally equivalent to whatever system you would have used to compile "natively" for that system).
As I see it, the real problem is the lack of a stable Linux ABI for binary programs. Because Linux does not have one, people need to play around with old distros and compilers to build something that will work on anything released after $SomeOldDistro.
> for now Red Hat devtoolset is the only solution. When IBM discontinued the CentOS project, most people do not understand what it means to the OSS community.
>> Build Python wheels for all the platforms on CI with minimal configuration.
>> Python wheels are great. Building them across Mac, Linux, Windows, on multiple versions of Python, is not.
>> cibuildwheel is here to help. cibuildwheel runs on your CI server - currently it supports GitHub Actions, Azure Pipelines, Travis CI, AppVeyor, CircleCI, and GitLab CI - and it builds and tests your wheels across all of your platforms.
Using an insecure C standard does not improve the situation, and neither do broken compilers.
C is broken from C11 to C26 by committee.
GCC was broken from 9 to 11.
Python at least has the option to use GitHub, which can eventually detect PRs with unidentifiable identifiers. Linux will need to use linters to detect homoglyphs or bidi attacks, because reviewing emailed patches is impossible.
They don't handle bugs. If the bug is reproducible on RHEL, they will suggest you report the bug to RHEL. It scares people away. At this moment, the manylinux community is considering Rocky Linux and Alma Linux, but it's very hard for them to make the decision because of the lack of direct support.
The "manylinux" platform tag definition means GNU libc. musl libc systems are not within the "manylinux" Python platform, they are a separate "musllinux" platform tag.
See: PEP 600, PEP 656.
While the manylinux infrastructure project may use Alpine to produce "musllinux" binaries, Alpine will never be usable to produce "manylinux" binaries. So Alpine would only be useful in addition to another distro, not instead of another distro.
Alpine is a supplement. It can't replace the others. It hasn't been widely accepted by the community yet. I believe many Python projects can't afford to switch libc.
Which has a different release model than CentOS, so it's not a replacement for the business cases CentOS was used for (basically: long-term stability and support).
I just wish python would switch float and Decimal.
Most of the time float is just an implementation detail when you really want a decimal. I think the literals should be decimals, and you could explicitly cast to float when you actually need to do floating-point math, or as an optimization. But I know it's too late for that.
Why do you say that "most of the time... you really want a decimal"? I can't think of any situation where I'd prefer a decimal.Decimal over an ordinary float, except for the textbook example of dollars and cents. In most situations, I'm pretty sure decimal.Decimals would be unacceptably slower.
If you write a decimal literal, the sensible default behavior is to preserve precision rather than throw it out for a speed optimization. Sure, there is a long history of programming languages mostly choosing the reverse default, but it's a bad choice for correctness.
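For example, with Python's stdlib decimal module, the digits you write are the digits you get, unlike binary float literals:

```python
from decimal import Decimal

# Binary floats can't represent 0.1, 0.2, or 0.3 exactly, so the literals
# are rounded before you ever get to use them.
print(0.1 + 0.2 == 0.3)        # False
print(0.1 + 0.2)               # 0.30000000000000004

# Decimal preserves the decimal digits of the literal.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```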
That works well in Haskell, which turns literals into rationals (pairs of big integers) and calls an overloaded function fromRational on them implicitly. But Python can't do this kind of inference to give you a nice syntax, so I think this "sensible default behavior" would be a huge pain for users.
(Also, "correctness" seems like the wrong word: making rationals the default type would give you perfectly precise literals, but you'd immediately lose that once you perform almost any operation on them.)
If you write literals such as 0xa.bp-3, 1/3 or 22/7, why would those be allowed to throw out precision for speed? (That hex literal is valid C++17; the fractions aren’t literal in the standards, but IMO are literals in human heads)
⇒ If you think the “sensible default behavior is to preserve precision”, I think you would have to call for (at least) rationals as the default non-integer numeric type. (If you think of √2 as a literal, it gets more complicated. In the end, you might have to use a representation that’s similar to that used in computer algebra systems.)
Edit: you could also require constants to have an exact representation in your floating-point format. That would be annoying too, though maybe not too annoying if you had a way to specify “the number closest to this one”, say by requiring one to write ~2.1~ instead of 2.1.
> If you write literals such as 0xa.bp-3, 1/3 or 22/7, why would those be allowed to throw out precision for speed? (That hex literal is valid C++17; the fractions aren’t literal in the standards, but IMO are literals in human heads)
IMO decimal rounding of those fractions, while not ideal, is a lot more understandable to a typical person. If 1/3 * 3 = 0.99999999999999999 that's annoying, but not crazy in the same way that the floating-point equivalent is.
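To make the comparison concrete (Decimal and Fraction are both in the Python stdlib; the default Decimal context is 28 significant digits):

```python
from decimal import Decimal
from fractions import Fraction

# Binary float: 1/3 is rounded, but the error happens to cancel on the way back.
print((1 / 3) * 3)                       # 1.0

# Decimal: the rounding is visible, but it's the "pencil and paper" kind.
print((Decimal(1) / Decimal(3)) * 3)     # 0.9999999999999999999999999999

# Rationals: exact, as long as you stick to +, -, *, /.
print(Fraction(1, 3) * 3)                # 1
```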
Correctness is relative to expectations though. It might be worse for bug count to have some languages use decimals and others use floats. People might make even more mistakes.
You might as well use a Decimal at that point, which usually combines an integer mantissa with a (negative) power-of-ten exponent. Otherwise you're applying that exponent everywhere you use the value, whether you display it or use it in a mathematical expression.
I think the advantage of ints over Decimal is interoperability between systems.
Anything with a REST API or JSON doesn’t have native support for Decimals, so to use them you have to represent them in component form as an object or as a string.
Ints just make that easier, at a slight cost at display time. Stripe is a good example of this.
> Otherwise you're applying that exponent everywhere you use the value, whether you display it or use it in a mathematical expression.
Slightly disagree: I think it’s only when displaying that you have to convert to a “human readable” decimal form. They don’t need converting to Decimal for processing.
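A small sketch of that split, assuming whole cents as the stored unit (the names are just for illustration):

```python
from decimal import Decimal

price_cents = 1999                       # stored and computed as a plain int

# Business logic stays in integer math:
total_cents = price_cents * 3            # 5997

# Conversion to a decimal form only happens at the display boundary:
print(f"${total_cents // 100}.{total_cents % 100:02d}")      # $59.97
print(Decimal(total_cents).scaleb(-2))                       # 59.97, if you want a Decimal
```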
It’s also worth noting that there are actually two non-decimal currencies still in use:
“Today, only two countries have non-decimal currencies: Mauritania, where 1 ouguiya = 5 khoums, and Madagascar, where 1 ariary = 5 iraimbilanja.”
> Anything with a REST API or JSON doesn’t have native support for Decimals
This is a very slight nitpick (that might be wrong) but I think it's perfectly in line with the JSON specification to interpret JSON numbers as decimals.
You are right; both the ECMA standard and the RFC leave the implementation of numbers open to the parser and language:
"JSON is agnostic about the semantics of numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange."
And most JSON parsers allow you to extend and change how types are handled. I think however from a developer point of view by standardising on ints you reduce the risk of making a mistake in your implementation.
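Python's stdlib json module is one example: you can hand json.loads a parse_float hook and never see a binary float at all.

```python
import json
from decimal import Decimal

payload = '{"amount": 19.99, "qty": 3}'

# parse_float is called with the raw digit string, so no precision is lost
# on the way in.
data = json.loads(payload, parse_float=Decimal)
print(data["amount"] * data["qty"])      # 59.97, exactly

# Going back out, json.dumps won't serialize Decimal by default -- you still
# have to choose a representation (string, int cents, float, ...).
```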
I think the point, though, is that as long as your system doesn't need to handle fractions of a cent/pence/etc., using whole cents/pence as the base representation and only doing integer math makes the system simpler. It moves the complexity of currency to the display layer, away from the business layer. Obviously there are times when you need to be able to handle smaller units, and a decimal type would be correct there.
you're right that they would be slower, but if I'm using a high level language I already have made the decision to have things be slower, so that I can focus on my logic instead of how the hardware works. I think maybe what I really want is a number type that abstracts away ints, floats, fractions, and maybe others. Then once I have it working I can optimize it to use specific types if it's not running fast enough.
In CPython 2, a `str` was bytes, period; no encoding was enforced, these bytes were not necessarily valid UTF-8, and a `unicode` had either a UTF-16 or UTF-32 in-memory representation (decided at compilation time). In CPython 3, a `bytes` is bytes and a `str` has a Latin-1/UCS-2/UCS-4 in-memory representation (chosen per string at runtime, per PEP 393).
So CPython 2 strings were not “bytes of UTF-8”, and that is why strings like `'\xfc\xf7\xe9'` worked fine; they even printed fine in the proper environments.
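In Python 3 terms, the same bytes look like this (decoding as Latin-1 here just stands in for "the proper environment"):

```python
raw = b'\xfc\xf7\xe9'             # three bytes; not valid UTF-8
text = raw.decode('latin-1')      # explicit decode -> the 3-character str 'ü÷é'
print(len(raw), len(text))        # 3 3
print(text)
```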
My understanding is that even decimals still do rounding, so they don't give users accurate arbitrary-precision numbers.
A default number format that is much slower than built-in floating point but still subject to unexpected deviation from real arithmetic numbers doesn't seem like a win for users.
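For example, Python's default decimal context rounds results to 28 significant digits; the precision is configurable, but always finite:

```python
from decimal import Decimal, getcontext

print(getcontext().prec)            # 28 by default
print(Decimal(1) / Decimal(3))      # 0.3333333333333333333333333333 (rounded)

getcontext().prec = 6
print(Decimal(1) / Decimal(3))      # 0.333333 -- still rounding, just configurable
```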
It gives users rounding that they're used to, from calculators etc. Floats defy common expectations. The fact that we still have to talk about this stuff is exactly the point.
So much of computing practice is based on the idea that floats aren't quite real numbers, they're like fuzzy analog voltages almost.
Almost nobody thinks about float rounding, they just use something precise enough that it doesn't matter, or use integers.
Why break that standard? Everyone expects computer numbers to be just a bit off. If we didn't we would do a lot of stuff differently in incompatible ways.
Seems like it would hurt the Python ecosystem if people wrote stuff that assumed floats were perfect decimals, and encouraged code that depends on that.
You'd get stuff like serializing to JSON, and reading in some other language with a JSON parser that doesn't know the numbers need to be precise.
I kinda hate writing real programs in C. The only time it's vaguely enjoyable is with sub-100-line embedded stuff. Otherwise, it's near the bottom of my list of languages I'd like to use.
Not actually at the bottom, I'd choose it over Forth and ASM, but.... I haven't written it for anything but a microcontroller in years.
I don't really do much outside of Python and JS in general these days. Python's performance is basically the same as C, because... There's already a C extension for everything!
I'm really not even a fan of compiled languages in general, especially not for open source work. Things like plugin architectures are a lot easier in dynamic, everything-is-a-dict languages.
But C sure is good at being ubiquitous! I appreciate the heavy standardization.
I would stay away from writing any program in C unless it is absolutely 100% necessary for your use case. For example, if you’re working on a large existing C code base like the Linux kernel.
There are much better modern alternatives to get comparable speed for new projects. Many projects that used to be written in C are being rewritten in either C++ or Rust for better memory safety and reasoning about the memory model.
Disclaimer: I did my PhD (several years ago) in a systems and networking lab, and most of the code I wrote was in C. Now, for example, modern kernel modules tend to be written in Rust if possible, an option that was not available while I was working there.
Glad to see Havelock Vetinari, ruler of Ankh-Morpork on Discworld, is such a renaissance man as to contribute insight into standards support of compilers.
(see also other prolific pseudonymous Linux contributor "George Spelvin <linux@horizon.net>")
I can't believe Python is reluctant to mandate floating-point when they require threads. I'm still on Python 3.6 because there's no simple consistent cross-platform way to do threads via system interfaces.
What exactly is the bikeshed here? Naturally Python doesn't use the newest standards the second they are released because that would be chaos. So they need to consider when they use newer standards. In this case they found bugs and had a discussion that ended in everyone supporting adopting a newer standard.
What exactly do you feel is the "which color should the shed be" point of this discussion?
Leaving aside whether this is a bikeshed or not, I'm not at all surprised that a community that passed through the 2->3 project would subsequently be concerned about even the barest repeat of that.
This is a very good point. At my workplace we haven't had a 2->3 level project event, but our discussions _are_ always shaped and influenced by not only how previous agreements have worked in practice, but also how previous discussions (and arguments) leading up to those agreements played out.
It is to the point that a sizable amount of my value is derived from having been around long enough, with my ear to the ground, to be able to recount the logic behind arguments from each side and give recommendations on how to broach remaining tensions.
At work we had a very-late-technical-debt-move from “Python 2.7/Django 1.7” to “Python 3.8/Django 3.2” (with lots of associated libraries moving forward too) and we managed to reach a point where our codebase was common and functioning in either environment; the process was… bloody, to say the least, and obviously I wouldn't wish for others to ever have to go through a similar ordeal, but it was feasible.
In any case, I don't think that in the Python ecosystem there will ever be another jump of the same magnitude as 2 to 3.
> Multiplying infinity by any number is defined to be a NaN for IEEE 754
No, multiplying infinity by any number other than zero or NaN produces an infinity. Multiplying infinity by zero produces a NaN.
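Python floats follow the same IEEE 754 rules, for what it's worth:

```python
import math

print(math.inf * 2)                 # inf
print(math.inf * -0.5)              # -inf
print(math.inf * 0)                 # nan
print(math.isnan(math.inf * 0))     # True
```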