> However, rather than use the native toolchains directly, such as Xcode for macOS, we delegated the creation of platform-compliant binaries to py2exe for Windows, py2app for macOS, and bbfreeze for Linux.
I wish the authors of more Python tools would deploy standalone applications. I do not like having to maintain various sets of Python installers/package managers (because every Python tool seems to use a different installer). Especially on cloud servers that often lack a whole set of dependencies that Python developers just seem to take for granted.
I’m not a Python developer. I don’t have the time or the inclination to repackage various tools and untangle dependencies.
Given the choice between trying to figure out how to get multiple Python tools to behave together, or using another tool, I’ll almost always choose an alternative.
You install all 3rd party dependencies into some directory. The command line entry point is then a simple BASH script which sets the PYTHONPATH to the appropriate installation location and then does the appropriate exec call.
You then have a functionally portable python installation.
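For illustration, here is a minimal sketch of that launcher idea, written in Python rather than Bash. The layout (`vendor/` for third-party packages, `app/main.py` as the entry point) and the names `build_launch` and `launch` are assumptions made up for this example, not anything the parent described.

```python
import os
import sys
from pathlib import Path


def build_launch(app_dir, base_env=None, python=sys.executable):
    """Return (argv, env) for running the app against its bundled dependencies.

    Hypothetical layout: <app_dir>/vendor holds the third-party packages and
    <app_dir>/app/main.py is the entry point.
    """
    app_dir = Path(app_dir)
    env = dict(os.environ if base_env is None else base_env)
    # Put the bundled dependencies ahead of anything already on PYTHONPATH.
    parts = [str(app_dir / "vendor")]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    argv = [python, str(app_dir / "app" / "main.py")]
    return argv, env


def launch(app_dir, extra_args=()):
    """Replace the current process with the app, like `exec` in a shell script."""
    argv, env = build_launch(app_dir)
    os.execve(argv[0], argv + list(extra_args), env)
```

Note that this still relies on whatever interpreter `python` points at; only the libraries are bundled.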
This exchange is a good demonstration that the word "trivial" has lost its meaning in the same way "literal" has. Much like I usually hear someone use the term "literally" for figurative emphasis, these days I mostly hear the word "trivial" used to describe something which is clearly nontrivial.
Math textbooks have been doing this for decades, but it's leaked into common parlance with online discussion.
I believe in the mathematical community the potentially offensive term for outsiders is non-trivial, used in a technically correct manner, but oftentimes applied as a synonym for epically hard, which unexpectedly throws people off, especially the author of that non-trivial work!
I think the best translation of "trivial" when used amongst mathematicians is that it is something you should be able to figure out with your current knowledge without too much difficulty (though it might require an hour of thought). Said in another way, you don't need to learn/develop new tools or techniques for something that is trivial.
Of course this is not what most people think when they hear the word, so really it's a term of art that should probably be avoided when talking to non-mathematicians.
> I think the best translation of "trivial" when used amongst mathematicians is that it is something you should be able to figure out with your current knowledge without too much difficulty
As I read through my old uni maths notes there are often wild leaps from a to e along with a little scrawl saying "trivially" or "obviously". They may have been true once, but god dammit 21 year old me was a knobber
It takes approximately a year to learn to walk after you are born. But almost everybody figures it out. And once they know, they practically never forget. So I don't know, is it not trivial?
I have never had to support all of the mobile environments of DropBox nor the scale, so I cannot claim that my 99% solution would ever meet their 99.999% requirements. But I have been able to package Python apps for Mac OSX, WinVista, Win7, Ubuntu, and CentOS at the same time using that strategy.
Being charitable, I think the parent means that the python developer sticks all the dependencies in a directory and creates a bash script to set PYTHONPATH and launch. The user receives a directory rather than an executable, but only has to use the bash script, rather than worry about any of the Python in the directory.
There are a lot of minor differences, but the biggest difference is that you're able to be completely independent of the system python install. You bundle a complete python interpreter, all libraries needed, etc.
The user doesn't need to have python installed at all, and if they have 2.x instead of 3.x or 3.3 when you're expecting features that are only present in >= 3.5, it's no issue.
This may sound trivial, but it's a _huge_ deal, particularly when you need to deploy something that runs on multiple different OSes and versions of OSes.
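To make the version problem concrete: without a bundled interpreter, a tool has to guard against whatever Python the user happens to have. A minimal sketch, assuming a hypothetical minimum of 3.5 (taken from the example above) and made-up function names:

```python
import sys

REQUIRED = (3, 5)  # hypothetical minimum version for this example


def interpreter_is_new_enough(version_info=sys.version_info, required=REQUIRED):
    """True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(required)


def check_or_exit():
    """Fail early with a readable message instead of a confusing error later."""
    if not interpreter_is_new_enough():
        sys.exit("This tool needs Python %d.%d or newer" % REQUIRED)
```

With a bundled interpreter this check goes away entirely, since the version is fixed at packaging time.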
Other than that, the "directory full of libs, binaries, and code" approach is a lot easier to package into something that will work well with the native package manager (e.g. an .msi for windows, etc).
In the ideal, those making the tools would provide this packaging, instead of asking all users to have `pip` properly set up. Similar issues exist w/r/t npm in my opinion.
I work in a Python shop. The Docker images we build are nearly 1 GB. I just built a Go service whose image is only 2.5MB. Admittedly it’s much simpler than the Python apps, but even a complex Go app would never reach the size of our Python app for a number of reasons:
1. Python apps require a distro base image while Go can run on scratch
2. Python images ship with the full standard library; not just the bits you import
3. In Python, if you add a dependency only to use 1 function or variable, you still end up with the whole dependency in your Docker image, while I’m pretty sure Go’s linker strips unused code.
I agree, I'm not a python fan. Python has a few good libraries I can't find in other languages, but it's slow, bad for multi core utilization, hard to distribute, is very wasteful with how many dependencies need to be included, has a terrible package manager and I prefer static strong typing.
I'd like type annotations to be used to optimise performance. After all, if it has been statically verified that a particular variable is always an instance of class X, why not use that to optimise code?
This is an argument for type annotations to be integrated into every dynamically typed language, rather than tacked on via an external tool.
TL;DR - I continue to root for Python's typing story, but it's just not there yet.
I have, and I wanted to like it. On its face it seems like it should be a lot better than Go's--after all, it supports generics and union types! But it falls over in trivial cases, like recursive types (i.e., there's no way to model tree structures such as JSON or linked lists). A few other hard/impossible/confusing things come to mind:
1. How do you declare a typevar for a certain scope? If I define a type parameter `T` for function `foo`, I only want `T` to be scoped to `foo`. I don't want the type checker getting confused with `T`s for other functions/classes/etc.
2. What is the signature for a function that takes args/kwargs?
3. It straight up doesn't work with popular libraries like SQLAlchemy (last I checked, these were simply not supported because the likes of SQLAlchemy are "too magical"--this is a fair take, but frustratingly limiting for users).
These are just a few because my memory is poor, but I run into these sorts of things by the dozens every time I try to use mypy. It's just not ready for prime time. Go's type system is limiting, but its limitations are much more predictable and even less limiting (it turns out recursive types and poor-man's union types are quite a bit better than first-class, non-recursive union types, for example).
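For reference, here is a rough sketch of how these cases can be written with the standard `typing` module. It assumes a type checker that accepts recursive aliases (as far as I know newer mypy releases do, which would not have helped at the time of the complaint above); all function names are made up for the example, and the SQLAlchemy point is not addressed.

```python
from typing import Any, Callable, Dict, List, TypeVar, Union

# Recursive alias via forward references. Assumption: the type checker
# supports recursive aliases; at runtime the string is just a forward reference.
JSON = Union[Dict[str, "JSON"], List["JSON"], str, int, float, bool, None]

T = TypeVar("T")
R = TypeVar("R")


def depth(value: JSON) -> int:
    """Nesting depth of a JSON-like value (scalars have depth 0)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def first(items: List[T]) -> T:
    # T is solved separately for each call to `first`.
    return items[0]


def last(items: List[T]) -> T:
    # Same module-level T, but independent of its use in `first`.
    return items[-1]


def call_logged(func: Callable[..., R], *args: Any, **kwargs: Any) -> R:
    # `*args: Any` annotates each positional argument,
    # `**kwargs: Any` annotates each keyword argument's value.
    return func(*args, **kwargs)
```

A `TypeVar` declared at module level is bound per generic function or class, so reusing `T` in `first` and `last` does not link the two signatures.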
It's also tacked on. It's a bit too optional. If one team member doesn't care about typing he makes all his colleagues do the work to make his code work with mypy.
While it can't compete with 2.5MB, python:3.6-alpine (which includes points 1 and 2) weighs less than 100MB. You need a lot of Python code to get to 1GB.
Fair point. Our largest Python image has only 255 MB of Python dependencies and ~50 MB of source code. If we could use alpine (our compliance auditors strongly prefer centos base images), it would only be ~400 MB. This is easily an order of magnitude bigger than an equivalent Go program, but still quite a lot better.
Is that really just Python code?? Must be over a million lines, no? I've worked with Odoo, which is a bit of a kitchen-sink (ERP, CRM, POS, sales, accounting, invoice, stock management, manufacturing control, website builder, marketing and a bunch more) and its Python code weighs just 15MB, the rest is JavaScript or data files.
It's not just source code--it's also docs and test code and other things that are tedious to omit given our current Docker image hierarchy and repository structure. Our docs are largely Sphinx docs in Python docstrings; a decent minifier could probably reduce this, but it's probably not worthwhile for a ~5% improvement on overall image size.
I’m almost certain that only applies to individual compilation units and not the whole AST. In other words, if I use reflection in my main package, code pruning still works on dependencies, which is quite a lot better than the Python situation.
I hope some day some big corp decides to write a decent (open source) Python to C compiler, which makes optional optimizations based on type annotations.
I find this project: https://github.com/Nuitka/Nuitka very interesting, but it's written and maintained by only a single person and I never got it to work with any of my apps.
Cython is a pretty good Python to C compiler and in its latest release is using type annotations... the thing is, you shouldn't need to compile your whole program (compiling is really slow and interpreted Python is fast enough for most of the code).
Good point. I never understood why the developers of every language don't make it trivial to build a .exe or .app file that users can double-click to run. Seems like the Python team is penalising developers for using Python :)
It can internally use an interpreter, JIT, bytecode or full native code like C. That's a different discussion. Just don't make it a pain to distribute.
Even when I am a python developer, there is still a difference between (a) this dependency that I explicitly rely on and that I need to sort out packaging issues for, and (b) code I treat as a black box and simply expect to work.
More often than not, a typical developer's python environment tends towards https://xkcd.com/1987/
Interesting write-up, though it leaves me terrified how a relatively small, more or less single purpose application like the Dropbox client has over 1 million lines of code in Python alone.
This led me to find another loosely related but very entertaining piece of dropbox history. The original "Show HN" post: [1]. It's funny to see so much skepticism knowing now what the company became.
Yes, this is one of the classics - right up there with the "less space than a Nomad, no wireless, lame" comment (which wasn't on HN I don't think - but we all know it could have been :)
Edit: I see the motherlode is in place earlier in the thread "Especially when you could build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem"
You might only use a small part of Dropbox, but I bet there is a lot of functionality that you don't care about but which is critical to Dropbox as a business/product for others.
The fact that you think it's small means they're probably doing something right!
(FWIW I don't use Dropbox myself, but I definitely had people ask me why Google needed 3,000 employees back in the day. Apparently it now has nearly 90K employees.)
>There's also a wide body of research that's found that decreasing latency has a roughly linear effect on revenue over a pretty wide range of latencies for some businesses. Increasing performance also has the benefit of reducing costs.
I wish he cited some of that research, because Google doesn't show much except for this Amazon study with the 100ms.
I'm especially interested in whether there's any research on engineering tools and their latency (long build times, etc.), which are chronically under-addressed in quite a few large corporations. I'm just wondering if there are any studies that would make the case for me if I were to present this to management.
Especially when you could build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem!
Sarcasm doesn't translate well on the internet, so I'm really not sure if suggesting using CVS (of all things) over a mounted FTP share as a replacement for Dropbox is a joke!
What does the Dropbox client do for me other than syncing files and exposing a bit of the online functionality such as generating share links? (Serious question...)
We can simply start by asking what does “syncing files” include?
Watching files. Keeping backup of files. Keeping conflicts resolved. Watching Selective Sync files and folders. Watching Smart Sync files and folders. Notifications for synced files. Etc. etc.
There’s way more the client does than what I mention.
I don't think there is much code for conflict resolution in Dropbox. Usually, in case of a conflict, it renames one of the files involved, adds a message about the conflict and the date to the name, and moves on.
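A rough sketch of that rename-on-conflict strategy, to show how little code the naming step itself needs. The "(<host>'s conflicted copy <date>)" pattern and the function name are assumptions for illustration, not taken from Dropbox's source.

```python
from datetime import date
from pathlib import PurePosixPath


def conflicted_copy_name(path, host, when):
    """Build a name for the losing side of a sync conflict.

    The label format is an assumption for this example; the file keeps its
    directory and extension, and the label is appended to the stem.
    """
    p = PurePosixPath(path)
    label = " ({}'s conflicted copy {})".format(host, when.isoformat())
    return str(p.with_name(p.stem + label + p.suffix))
```

For example, `conflicted_copy_name("notes/todo.txt", "laptop", date(2018, 9, 25))` yields `notes/todo (laptop's conflicted copy 2018-09-25).txt`. Detecting that a conflict happened in the first place is the harder part and is not covered here.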
It’s funny, after Mojave was released recently I thought it might finally have a python3 installed, even if it’s not the default. Nope, still Python 2.7.
This is good for me since I deployed a Python-dependent app under the assumption that the system Python would be stable and reliable. It allows relatively complex things to be achieved with a tiny download package.
I’ve been prepared to adopt Python 3 for a while but it just isn’t necessary when using system defaults.
It'll be very interesting to see what happens with the next macOS when Python 2 will be EOL (which is roughly 3 months after its release). The upgrade to Python 3 is long overdue.
Python being EOL sounds scary but actually won't matter to Apple. They already apply custom patches, they can carry on running python 2.7 forever, with minor bug fixes where really required.
the `xattr` terminal command relies on the system installed Python, though this seems to be the only example.
If you look at the source in `/usr/bin/xattr` it does some work to deal with different versions of Python. All the work ultimately gets handled by the xattr module preinstalled with the system Python. This module has Apple's copyright in it and is different than the `xattr` module on pypi.
Wonder how this Python one-off in macOS came to be.
As a python engineer still living in 2.7, it's great to see major codebases making the move. I know I have a similar experience coming in my future, and I appreciate hearing what seems like more or less a success story come from it.
>>> On the surface, the application would more closely resemble what the platform expects, while behind various libraries, teams would have more flexibility to use their choice of programming language or tooling.
I'm always fascinated by how the implementation of the core principles of an application is dictated by factors alien to it, such as OS, company organisation, etc. Therefore, the job of coding is often a small part compared to the amounts of trivialities, project management decisions, customer's ideas, corporate policies, etc. Although my soul is a coder's one, I always realize how much coding is just a small part of what I call application development.
Here's a nice overview of how Facebook migrated their codebase to Python 3. While it's different in nature (server side vs. client side), it's rather interesting.
I hope this kills (or helps killing) the 100+ thread count I have always seen in macOS. It surpasses any other thread count from far more important/sophisticated processes.
I'd say that's a waste (if not abuse) of the system's resources and scheduling system.
The idea behind embedding is you might have a Python shell in a larger app. But you can also use it to tightly control the execution of the interpreter.
I'm surprised 10% wasn't enough - 10% bottom-line improvement in programming language implementation is normally massive. Twitter is singing from the roof-tops about 10% improvement in Java performance from the new Graal JIT compiler.
I would imagine that for code that is performance sensitive enough that a 10% improvement matters, they would be porting to a language with saner performance instead?
I'm by no means an expert on the subject, but isn't there an advantage to having a JIT compiler use LLVM?
On another note, I believe one thing that has been problematic for pypy adoption is that it does not automatically work with C extensions or Cython, and generally if someone already had performance issues with CPython, they would have written some C/Cython extensions?
It's a tool that allows you to write an entire interpreter in RPython (a subset of Python) and then have it build a native binary with a free jit compiler included, with the specifics of your language encoded within. The reference implementation for this project is a Python interpreter.
Maybe because 10% on a server is much more valuable than 10% on a client.
On a server, you’re paying for that 10%. On a client, you’re not. If it was 10% for nearly free then sure - but maintaining a separate implementation of a language is costly.
and also calculating the cost savings if any. On a technical level the whole exercise seems to lead to a slightly more elegant, consistent code base, but it's still a long way from earning or saving any actual money.
Great write-up and those two graphs are interesting. It's cool to learn how different companies treat their beta users. I wish this article touched upon more of the technical difficulties with switching from Python 2 to 3 too.
As other commenters have noted, a lot of their Python use for large scale systems was an artifact of history and available choices at the time, but from my experience during my time there and following as an outside observer since leaving, they seem to make reasonable infrastructure and language decisions for their core product.
Go wasn't created until 2009 and Dropbox already had tons of Python code by then as well as extremely accomplished Python programmers (they hired the creator of Python only a few years later). I also don't think Go would have allowed the tight integration with, for example, OS X where Dropbox actually superimposes their icons onto your Finder icons. As far as I understand, it was only because of smart programming and use of OS X's Python / Ruby bridge functionality that they were able to do it.
Performance depends very much on what you are doing. Native Python code is much slower than Go, but Python code execution is not the bottleneck in many Python programs. NumPy may be faster than Go and disk IO is the same speed in each.
Go performance would be next tier from Python but memory usage a little less so. The Go team has made great strides in that regard from what it was. The runtime is always getting better, and it's fun to watch.
Rust or native platform (Swift/C# .Net Native, depending) are going to be even more ideal for battery usage.
Proper algorithmic choices are even more important and paramount no matter what is used. It goes without saying that poorly implemented Rust can be bested by well implemented Python.
I'm not sure why it is downvoted. This is true. I write both Golang and Python as part of my day job. I love both languages, but this is definitely the truth.
Go is more verbose. It also gives you the wonder of the compiler telling you about doing stupid things. That doesn't make it a worse language.
Go has something in the same general area of error handling called panic/recover. It's not exceptions in the normal sense, and devs expect that nothing should panic across API boundaries.
Only in the Google sense i.e. "for building big systems… like servers". Not as in "close to your operating system" or "you should write an operating system in it".
There are few comments worse on HN than "why didn't you implement in [rival language]?" as if [rival language] was a default choice that could only be deviated from if it was extensively motivated.
The claim that Go is "perfect" doesn't make things better.