More and more I want someone to create a new language that amounts to a strict subset of Python, with mypy built-in, and is compilable into machine code. Python has by far my favorite syntax, community, and in my experience leads to the greatest productivity. There just happen to be a lot of overly dynamic features that aren't even used by most, but are used just enough to hold back optimization and structural improvement.
What would you say are the biggest blockers to this becoming realistic? I saw on the README that they need tools in the Python ecosystem to start utilizing them, which I can help with, starting with isort; beyond that I'd want to do whatever I can to help the project succeed.
Probably the biggest issue is that it can't run many libraries and frameworks because they use a lot of dynamic features, i.e. reflection and metaprogramming.
To be more specific: getattr, operator overloading, descriptors, heterogeneous dicts, decorators, etc.
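For anyone who hasn't bumped into these, a quick illustrative sketch (all names invented) of why they resist static analysis: each feature below defers a decision to runtime.

import os

class Env:
    def __getattr__(self, name):
        # Reflection: the attribute name is only known at runtime.
        return os.environ.get(name.upper(), "")

class Meters:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        # Operator overloading: what "+" means depends on the operand type.
        return Meters(self.value + other.value)

settings = {"debug": True, "retries": 3, "hosts": ["a", "b"]}  # heterogeneous dict

def logged(fn):
    # Decorator: rebinds (and may arbitrarily replace) a function at import time.
    def wrapper(*args, **kwargs):
        print("calling", fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@logged
def greet():
    return "hi"

greet()                               # prints "calling greet"
print(getattr(Env(), "home"))         # attribute chosen by a runtime string
print((Meters(1) + Meters(2)).value)  # 3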
Type checking and metaprogramming are fundamentally at odds [1]. Dynamic languages like Python have more of a focus on the latter. They later added type checking, but it comes at the "cost" of ruling out the more idiomatic metaprogramming and reflection features. In other words, static typing makes your source code bigger.
Well, optional typing to some degree lets you have the best of both worlds -- you can skip type checking of the hard parts. But optional typing doesn't let you compile your program to make it faster -- you need a fully-typed program for that.
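A tiny sketch of that tradeoff (function names made up): a checker like mypy is happy with both of these, but only the first gives a compiler enough to work with.

from typing import Any, List

def scale(xs: List[float], k: float) -> List[float]:
    # Fully typed: a compiler could specialize this into a tight native loop.
    return [x * k for x in xs]

def visit(node: Any) -> Any:
    # The "hard part", skipped via Any: still passes the checker, stays
    # dynamic, and can't be compiled to anything faster than generic dispatch.
    return node.children[0] if node.children else node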
----
I'm doing something similar to mypyc with https://www.oilshell.org/ (I actually visited Dropbox and chatted with them about it back in the spring.)
The difference is that I'm compiling Oil's Python source to C++ rather than to Python-C extension modules. So it doesn't depend on the Python runtime. It's not done but it's working well so far, and it's given me a lot of appreciation for which dynamic features Python programs actually use! (both my own and others')
Also note that mypyc was used to speed up mypy, which is a type checker. A type checker is a very particular kind of program that's different than 99% of the use cases of Python. So success on speeding it up is super impressive but it's not clear it generalizes.
The same is true for Oil -- my translation work doesn't generalize to arbitrary Python programs. Lots of people have died on that hill because it's really hard. You have a hard tradeoff between the kinds of Python programs you can support and the speedup you can give them. There are 10-20 projects over the last 2 decades at various points along that spectrum. In addition to mypyc, Oil's strategy was also inspired by Shed Skin, which is an impressive but mostly dormant Python-to-C++ compiler.
----
So in short I would say the problem is that nobody will be able to agree on a subset. You will have a lot of different fragments of Python geared toward particular use cases.
But Python will very often be more appealing than any of those fragments because it has a bigger ecosystem. One thing that I've appreciated more and more while designing a language is how much the network effects and inertia matter. It's why we're still using C and C++ after almost 50 years. I'm sure every day there is still a lot more C++ written than Go, Rust, Swift, and D combined, etc.
Python has a similar network effect and it will be around basically forever in its current form. Software doesn't really get rewritten or reduced -- more stuff just gets added on top.
> type checking and metaprogramming are fundamentally at odds [1]
I would say that dynamic metaprogramming is at odds with type checking (and optimizations, and in general understanding the behaviour of a program statically). But of course metaprogramming can be done perfectly fine in a statically typed language.
There isn't anything specific to static type systems in those two links. Both OCaml and Rust had powerful plugin systems that exposed the internals of the compiler AST, and they decided that they do not want to expose those as a stable interface. But exposing implementation details is not required to have a powerful metaprogramming environment. As far as I understand, Rust macros never did expose these internals and had no such issues.
Also compare with C++ metaprogramming, whose syntax is certainly awful (although it has been continually improving) but which works perfectly fine with its type system (in fact most metaprogramming in C++ is done via the type system).
On the other hand CPython also exposes the runtime internals to plugins and de facto that prevents the language from evolving and alternative implementations to gain a foothold, so the issue of exposing implementation details preventing language evolution is not restricted to static languages.
edit: the hard part of typing and metaprogramming is making sure your metaprograms are well typed, i.e. the generated program is guaranteed to typecheck. This is great, but it is not a strict requirement; if you are happy with syntax macros a la Rust or unconstrained templates a la C++, there is no particular issue. Your generated program will still be typechecked at compile time, which is still better than having a runtime error because of a bad metaprogram.
The point is that they all struggle with the design of metaprogramming. Why are there 4+ different systems in OCaml as "addons" whereas in Lisp and Python it's integrated in the language?
It's exactly analogous to types being integrated in OCaml, Rust, etc., whereas in Python, JS, Ruby, and PHP static typing is an "add-on".
I like Python, but I often wonder how many developers use Python because they actually use dynamic language features versus just liking the language's clean syntax and library ecosystem. I'm surprised languages that offer both a REPL (for development) and AOT native compilation (for production), like OCaml, are not more popular. Evidence that syntax matters, I guess. :)
mypy and mypyc are interesting but their compile-time checks and optimizations are still hampered by Python's dynamic language semantics.
Don’t underestimate inertia. I’ve worked with Python and Django for seven years. I know the libraries in the ecosystem. I know the framework. It’s far easier for me to start a project with Django than to learn another framework or language.
I think a great deal of this sort of thing could be done by just doing some eval in a dynamic state, then stopping the VM and compiling its stable state, rather than compiling the actual source code.
I think you missed part of the point of what the article was trying to say - or rather, what they hoped to do with this strict Python. One of those things being some form of hot code loading. A snapshot of the state can't be incrementally rebuilt - it's very much all or nothing; whereas if we know our modules are side-effect free, or at least some useful part of module loading is, we could cache that part and get faster start-up times on incremental changes.
I have not used TypeScript, but looking at its documentation, the syntax for type annotations looks identical. Would you be willing to expand on why you think its approach is better / how it's different?
> Inspired by Scala language [5], this proposal adds operator __or__() in the root type. With this new operator, it is possible to write int | str in place of Union[int,str]
Watch the difference: A variable that can be an object with two elements (one of them being a list of strings), or a tuple of exactly two strings.
// typescript
let t: {a: string[], b: number} | [string, string]
# python 3.8
from typing import List, Tuple, TypedDict, Union

class SomeTypedDict(TypedDict):
    a: List[str]
    b: Union[float, int]

t: Union[SomeTypedDict, Tuple[str, str]]
I had to google a bunch to figure out how to write the Python version, whereas the typescript one was completely natural to write. It takes one line and requires no imports. The interface is inlined. All of this also makes it more readable when you come across it eg. in an IDE tooltip.
Granted, in Python, I'd call the use of a typed dict a smell. If you're able to spend the time creating the typed dict, just promote it to a dataclass. Using Python 3.10+ (for the PEP 604 union syntax), this will look like
import dataclasses
from typing import List, Tuple

@dataclasses.dataclass  # or @attr.s
class MyStruct:
    a: List[str]
    b: int | float

t: MyStruct | Tuple[str, str]
But MyStruct will be an actual object that can be manipulated as an object. And if you want to accept any object that fits that interface, instead of just instances of MyStruct,
from typing import List, Protocol

class MyStructTmpl(Protocol):
    a: List[str]
    b: int | float
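To make the structural part concrete, a small usage sketch (names invented): anything with matching attributes satisfies the protocol, no inheritance or registration needed.

def total_entries(s: MyStructTmpl) -> int:
    return len(s.a)

class Unrelated:
    def __init__(self) -> None:
        self.a = ["x", "y"]
        self.b = 1.5

total_entries(MyStruct(a=["x"], b=2))  # fine: a real dataclass instance
total_entries(Unrelated())             # also fine: matches structurally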
In JS having the typed-dict type makes sense because you're often working with arbitrary objects with who knows what attributes, but in Python that isn't the case. There are fairly succinct and powerful tools (now, anyway) to define record types.
I'll give you that in controlled code the use of typed dicts would be a symptom of a code smell, but in less controlled environments where you're dealing with eg. JSON inputs, form inputs, SQL table results and so on … not so.
I'm also not onboard the "it's a code smell, it doesn't matter" train. IMO if python adopted the typescript typing syntax we'd all be better for it.
I also forgot to mention the atrocious typing syntax for functions. Once again, TypeScript is a lot more succinct and readable.
> but in less controlled environments where you're dealing with eg. JSON inputs, form inputs, SQL table results and so on … not so.
I more or less agree with this, but then again, IMO you should be isolating the less controlled code behind a controlled api. And the marginal value of converting `_ConvertQueryResultDictToQueryResult(qr: Dict[str, Any]) -> QueryResult` to something that uses a typeddict instead (which may not be possible, since that function is probably generic) is low.
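For concreteness, a rough sketch of what keeping that conversion at the boundary looks like (the field and key names are invented):

import dataclasses
from typing import Any, Dict, List

@dataclasses.dataclass
class QueryResult:
    rows: List[Dict[str, Any]]
    row_count: int

def query_result_from_dict(qr: Dict[str, Any]) -> QueryResult:
    # The single place that touches the untyped dict; everything
    # downstream works with a typed object.
    return QueryResult(rows=list(qr["rows"]), row_count=int(qr["rowCount"]))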
> I'm also not onboard the "it's a code smell, it doesn't matter" train.
Emphatically, this isn't what I'm saying. What I will say is that ergonomics encourage certain methods of development. From experience, I'm strongly against the pattern of using a dict as a weak struct. The best comparison I can give is tuple -> namedtuple -> attrs. Namedtuple has absolutely valid uses (when you need tuple semantics, usually for backwards compatibility). But people often use it for any record type, because it's easy and familiar. Dataclasses are usually better, and I'd be happier (and the average python code would be better) if the friction to add a dataclass was lower than the friction to add a namedtuple.
Similarly, if the friction to use a dict in place of an object is much lower, people will be encouraged to use dicts in place of objects. This isn't a good thing. That doesn't mean that we absolutely shouldn't try to improve ergonomics across the board, but I'm a strong believer that the language should make doing the right thing easier than doing the wrong thing, and this is often (but not always!) the wrong thing.
Yeah I get what you're saying. And indeed. I seldom use dataclasses even though I should. Or namedtuples for that matter. It feels like the fact they're an import away makes them harder to use.
strongly agree. I love the extra clarity type annotations bring to the code, though going back to the start of the file every time I want to add an import is a slight deterrent.
> I think the killer language will be typescript with access to both the python and JavaScript ecosystems. We'll see what that looks like.
I think this is an extremely good idea. Python is horrible but forced on a huge number of developers because of its ecosystem ... I think a bridging layer from typescript to python could be built in a way similar to swift’s Python Interop — and I don’t think it would require any special language support ...
I think one could actually make a better/easier-to-use/more robust design than Swift's by requiring all interactions with the Python interpreter from Node be async.
> Python is horrible but forced on a huge number of developers because of its ecosystem
This is a really interesting perspective to me. Coming from Python circles, I've heard too often how horrible JavaScript is as a language and how it's only used because the web has dictated it. Doing web development, I've used both, and generally am inclined to agree. I know TypeScript adds some niceties on top of it, but it is still stuck with JavaScript baggage. My perspective has always been that Python is by far the better language, which is why people have written that ecosystem in it despite the fact it doesn't have a built-in monopoly on the browser.
I don't think there is any sort of long-term future in anything "Python". I think a successful modern language has to have the potential for efficient concurrency baked in, which isn't really possible without breaking compatibility, and the Python community would never survive another round like the 2->3 transition. (And I'm not convinced the community really survived that one either, given the amount of ongoing bitterness about the whole situation).
Python is very healthy at the moment, and still growing! However, I think that makes it even more important as a community that we don't rest on our laurels and we fix the issues we do have. CPU concurrency is definitely one of those issues.
In my assessment, not really. They are surprisingly spot on (for the top spots and singling out contenders).
Not to mention the language landscape is almost static at the top. Nobody's gonna come and take Python, Java, C, C++, or JS out in the next 10-15 years...
Only a huge self-blunder, like the Perl 5 -> 6 transition, and only at a much more volatile time (when paradigms change) can do any serious damage to a top language. E.g. when web dev changed from CGI, Perl 5 had already lost the web framework scene to PHP, Rails, Django and the like, even before losing its main niche back then - admin work.
Well, for instance, TIOBE has Rust at #34, below such mainstays as ABAP and COBOL.
Do you believe that to be incorrect? I think you're probably underestimating how widely used those languages are in massive, "boring" companies around the world. Rust may be the new cool kid, and may even be the future, but the number of companies around the world that have adopted Rust for anything significant today is minuscule.
It has Groovy above Ruby.
Again, Java is everywhere, and many Java shops have added Groovy to their workflow where it makes sense. Ruby is barely used outside a small number of tech companies.
Well, Rust is probably well under COBOL, that's for certain.
Not in momentum, but there are tons of installations, and billions of lines of COBOL ever churning. If LOC were the main criterion (and not just a factor), and if COBOL projects were hosted on GitHub, most languages would be dwarfed by it.
And Groovy is semi-popular in the Java world, which is huge itself.
But as I said, TIOBE is very good in the top-10 languages, and for spotting new major contenders (by how they jump up spots).
It's not great for relative ranking of the longer tail of languages above the top-10 / top-20...
> I think a successful modern language has to have the potential for efficient concurrency baked in
I agree with this!
> I don't think there is any sort of long-term future in anything "Python".
I disagree with this :)
I think Python has efficient IO concurrency built in already with async, and I feel it is likely that it finds a way to work out CPU-bound concurrency long-term, as projects like subinterpreters with channel communication (PEP 554) demonstrate.
Efficient CPU concurrency needs either fine-grained locking (and removal of GIL), or data immutability.
Both seem rather hard to implement.
I predict that for CPU-intensive tasks, you'll keep using extensions in native code (like numpy or pytorch), or keep passing serialized objects through queues in multiprocessing setups.
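A minimal sketch of that last pattern, stdlib only (the worker function is a stand-in):

import multiprocessing

def crunch(chunk):
    # CPU-bound work runs in a separate process, sidestepping the GIL.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = [range(1_000_000)] * 8
    with multiprocessing.Pool() as pool:
        # Arguments and results are pickled and shipped through queues
        # behind the scenes -- the serialization cost mentioned above.
        print(sum(pool.map(crunch, data)))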
https://github.com/jreese/aiomultiprocess
The above is sufficient at FB scale for fairly intensive processing. Python is also sufficient for running quite a good bit of Instagram. I know some startups like to deploy over-engineered solutions, but in reality Python is sufficient in many use cases. You can always drop down to Cython if you have some hot path you need to optimize (or Rust).
Aside from having a two-decade history of using C#, types are the only thing preventing us from going full Python. Even so, the dynamic types in Python are more often a benefit than a disadvantage, because Python is so great at handling them automatically.
We build our employee database, and from there our IDM, from a single XML file in a really shitty format plus three txt files in even worse formats (they are single-line output files from an old mainframe system predating SAP). We used to do it in a rather complicated Microsoft SSIS workflow with a lot of C# services. All in all it was a 30-minute nightly runtime. I recently replaced it with around 500 lines of Python and a 1-5 minute runtime (sometimes at the beginning of a school year we'll see changes to around 1000 positions).
Python eats the XML like it wasn't shit. It takes things like terrible date formats, we're talking output-of-a-SAP-free-text-box shitty, and ports them seamlessly into a SQL date field. This alone was a nightmare in C#, and Python just does it.
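Not the parent's actual code, but a sketch of the flavor (the XML shape is invented; dateutil is the third-party python-dateutil package):

import xml.etree.ElementTree as ET
from dateutil import parser  # pip install python-dateutil

doc = ET.fromstring(
    "<employees><emp><name>Ann</name><start>03.01.19</start></emp></employees>"
)
for emp in doc.iter("emp"):
    # dateutil copes with most free-text date formats and returns a
    # datetime, which DB drivers can bind to a SQL date column directly.
    start = parser.parse(emp.findtext("start"), dayfirst=True).date()
    print(emp.findtext("name"), start)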
Still, after two decades of strict types it feels dangerous.
I can imagine the next big programming language will be one that is split into two language-variants: the "low-level-variant" and the "high-level-variant".
The high-level-variant is a dynamic language with optional typing, which is good for scripting, fast prototyping, fast time-to-market, etc.
The low-level-variant is similar to the high-level-variant (same syntax, same features mostly, same documentation), but it has no garbage collector, typing is mandatory and it runs fast like C/C++/Rust. Compiled packages that are written in the low-level-variant can be used from the high-level-variant without additional effort at all. The tooling to achieve this comes with the language.
A key consideration here would probably be how managed instances spawned in the high-level variant are expressed and passed around in low-level code. Would you explicitly retain and release them? Etc. I think this needs an ergonomic solution for such a language to provide an edge over just using C / C++ / etc. with Lua / Python / etc.
Very much this! For numerical computing, Numba + llvmlite attempts to do it.
I don't know, however, if this approach could be extended to other domains - say, making a web framework. Given how much tinkering Python classes let you do, any attempt to port existing code will probably need a lot of rewriting?
> I've been tracking nim, and would agree it's the most promising so far! I feel though that it's trying to be too flexible in many ways. Examples of this include allowing multiple different garbage collectors and encouraging heavy ast manipulation. I'm also afraid it is different enough to keep it from attracting a significant amount of developers from the Python community. Nonetheless, it's something I plan on using and contributing to, since it's the best option so far.
Though, now that another commenter pointed out mypyc: https://github.com/mypyc/mypyc I believe I'll invest my limited free time in that project instead, as it will allow me to stay within the Python community and ecosystem that I love so much.
It's certainly interesting to use! However, its type checker still has a long way to go, since you can easily segfault by using a nil reference.
I completely agree. With Python I need ten packages. With the shit show that is JavaScript I need 100 conflicting packages. Why bother with a backend framework in JS? It's a worthless language for backend development.
I use Cython a lot! But mostly to speed up existing Python code and build C extensions faster. I don't see it as a strict subset of Python or a new language to build a community around. Nuitka I just started experimenting with to build standalone Python executables, and I really like the direction and roadmap they are following. In the end, though, both of these technologies seem like ways to somewhat speed up existing Python code, and not attempts to introduce a strict language subset that would allow the greatest amount of optimization and finally fix long-running issues, like the inability to have multiple versions of a package installed.
As far as I understand it, RPython isn't really meant for actually writing programs in:
> Do I have to rewrite my programs in RPython?
> No, and you shouldn’t try. First and foremost, RPython is a language designed for writing interpreters. It is a restricted subset of Python. If your program is not an interpreter but tries to do “real things”, like use any part of the standard Python library or any 3rd-party library, then it is not RPython to start with. You should only look at RPython if you try to write your own interpreter.
I've been tracking nim, and would agree it's the most promising so far! I feel though that it's trying to be too flexible in many ways. Examples of this include allowing multiple different garbage collectors and encouraging heavy ast manipulation. I'm also afraid it is different enough to keep it from attracting a significant amount of developers from the Python community. Nonetheless, it's something I plan on using and contributing to, since it's the best option so far.
Sounds like Go. ;) This is a cheeky remark, but I use Python and Go, and Go very much feels like an improved Python in most ways. Especially when it comes to static analysis, build tooling, distribution, performance, etc. In particular, I love that there are no venvs, pipenvs, virtualenvs, pyenvs, wheels, eggs, setuptools, easy_installs, etc.
Pypy is super cool, but it doesn’t solve for maintainability and it only improves performance by one order of magnitude, leaving it 1-2 orders slower than Go. Besides, IMO, goroutines are so much nicer than Python’s async.
Yeah, but typically only locally. Like using a for loop instead of a list comprehension, or handling errors. So more keystrokes, but in most cases not more complexity. In some cases (generic programming), Python really is more expressive, but those ~5% of cases aren’t worth the tooling/perf/maintainability tradeoffs most of the time.
Code reviewers glazing over copy-pasted boilerplate blocks can more easily lose track of the whole, and miss an error which is obvious when the whole is expressed in 10 lines.
There is some optimal range of expressive density for comfortable use by humans. APL or K are likely above that level, and Go feels below it - not as low as COBOL, but still.
The opposite is true in my experience. Most of that boilerplate is brackets and indentation, which visually frame the interesting bits, drawing your eye to them. This is, of course, subjective, but I use both regularly and at worst this is not a problem for Go.
The problem here is that there is boilerplate at all. There shouldn't be.
Boilerplate distracts from what is actually going on. I can generally identify code smells from the shape of python code (like, blur all the text so I can't read the words, and the shape of the blocks tells me everything I need), I can't do the same in go, because there's so much more indentation and visual stuff happening, and most of it (boilerplate error handling) isn't interesting.
Like I said, I disagree with this. I suspect this is either because you're very experienced with Python and relatively inexperienced with Go, or perhaps you're simply an outlier. I think if you surveyed developers who are very experienced with Python and have at least a few months of experience with Go, you'll find people say that it's easier to identify issues in Go code--and I think this largely comes down to the role the boilerplate has in visually "framing" or "structuring" (i.e., providing "shape" to) the code.
Have a look at Haskell which goes to great lengths to eliminate boilerplate and I think you'll experience the opposite--Haskell becomes very difficult to read precisely because the code is so dense. Similarly, take the indentation, newlines, etc out of a JavaScript file or JSON blob (minify it, more or less) and see if it's more or less readable as a result. I think you'll find that visual structure is actually important.
At this point I've written fairly little go code, but reviewed quite a bit. Among those I work with, my opinion seems to be shared.
> I think you'll find that visual structure is actually important.
I didn't say otherwise. What I did say is that Go adds visual noise that isn't present in Python. (And it is noise: the proposal to add try! shows that the error handling style is noisy. It can be basically entirely removed by an automated transformation.) Actual pattern matching like Rust has, or even what Google C++ has with StatusOr and [1] our nonsense RETURN_IF_ERROR macros, is better than what Go does, and just as explicit (actually often more so, since it's more difficult to forget an error condition)
> Among those I work with, my opinion seems to be shared.
Yeah, preference distributions are hard to assess. Either of us could be wrong.
> I didn't say otherwise. What I did say is that go adds visual noise that isn't present in python. (and it is noise: the proposal to add try! shows that the error handling style is noisy. It can be basically entirely removed by an automated transformation).
I’m glad we agree that terseness is not readability and visual structure is valuable. How do we meaningfully debate whether some boilerplate is noise or useful visual structure? Why is Python’s implicit propagation of errors elegant and beautiful visual structure while Go’s explicit error handling is ugly noise? Specifically how do we know that you aren’t prejudiced by your disproportionate experience with Python (even assuming my disproportionate experience with Python and preference for Go is an outlier)? What are the criteria?
Build tooling (“go build” vs setup.py), type checker, text editor support (hovering over a symbol for the type and docstring), documentation generator / godoc.org, dependency management (pip is great but it’s not reproducible; go’s toolchain is only modestly better here IMO), no need for virtualenvs, etc. I’m sure I’m missing several.
All my projects now use poetry for the full build tooling and I love it. No setup.py needed; just include any settings in the standard pyproject.toml file (example: https://github.com/timothycrosley/portray/blob/master/pyproj...), which can be generated with poetry's help using poetry init.
> text-editor support
I feel like Python with type hints (for all their current flaws) does give you this exactly.
> dependency management
Again I think poetry solves the problems here very nicely
I write a lot of Python tools so I'm genuinely curious because if there were unfilled needs I would want to address them as one of my 52 projects: https://timothycrosley.com/
> All my projects now use poetry for the full build tooling and I love it. No setup.py needed; just include any settings in the standard pyproject.toml file (example: https://github.com/timothycrosley/portray/blob/master/pyproj...), which can be generated with poetry's help using poetry init.
We have yet to try poetry in our org. I'm hesitant to stray off the well-trodden path, but it might be worth a shot. Any idea about installing packages with system dependencies? Packages like `pygraphviz` (which depends on the `graphviz` or `graphviz-devel` system library) have always given us a lot of trouble, for example.
> I feel like Python with type hints (for all their current flaws) does give you this exactly.
I've noticed that some editors try to use these hints, but they seem to have a hard time in many cases loading the modules. It's possible that the editor extensions (e.g., VS Code) are just buggy, but it's still a problem. Further, they require that all of your dependencies have annotations or type stubs.
The killer thing about Go's documentation generation is that it uses type annotations and exposes them in the generated documentation. This is critical because 95% of the reason I'm looking at documentation (especially in Python) is because I need to know the type signatures (and often Python docs omit types, or the types are wrong or vague--e.g., "the type is 'binary'" with no indication if that means a bytestring or a BytesIO or what). This is tablestakes for documentation systems in statically typed languages, but I have yet to find a Python tool that does this well. Further, `godoc.org` also generates links to types including across packages--this is _not_ tablestakes for statically typed languages--so you just have to click the type name and it will take you to the docs for other packages. Further, there is no CI needed to build/publish your documentation; `godoc.org` just needs access to your repo on github or elsewhere (you can run your own godoc.org inside your corporate firewall). Another nice-to-have feature is that documentation is just comments; there's no formal/obscure syntax a la sphinx.
> I write a lot of Python tools so I'm genuinely curious because if there were unfilled needs I would want to address them as one of my 52 projects: https://timothycrosley.com/
I hate the fact that you may be right, because I really don't like Go in many ways:
- I hate its module system and package ecosystem story.
- I don't like its syntax.
- I don't like its error handling.
- I'd much prefer gradual typing.
- I want to maintain the ability to use interactive interpreters.
- I don't like the fact that instead of being community driven it is Google driven.
But, anecdotally, I see go being used as a second language to Python more than anything else and at an ever accelerating rate.
These are all fair points. I really enjoy Python, but there are too many things I fight with on a regular basis that simply aren’t issues in Go. It could be so much better if (1) there was a better type system (mypy is unnecessarily shoehorned into the syntax and still very broken—can’t even express recursive types like JSON), (2) a good way to constrain the dynamism so performance could be improved, and (3) a better environment/package management and distribution story (so far pantsbuild.org and PEX files are the best I’ve found). Then there are a long tail of more minor issues, like async/await vs goroutines, real parallelism, etc.
I agree, but if you for instance look at the TypeScript comparison sub-thread, you'll see that all the issues with both the syntax and implementation of the type-system are being aggressively resolved, and likely will all be so by 3.9.
> Good way to constrain the dynamism so performance could be improved
Couldn't agree more!
> environment
I find poetry a joy to use. If you want to bypass venvs altogether, there's a lot of work to make that a reality, such as https://github.com/David-OConnor/pyflow.
> packaging
Python in 3.5 added complete zip app support, which has improved this dramatically from my perspective. Extended by things like shiv https://github.com/linkedin/shiv make it fairly complete.
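For reference, the stdlib API behind that is tiny; a sketch with hypothetical paths (shiv layers dependency handling on top of the same format):

import zipapp

# Bundle the "myapp" package directory into one executable archive;
# "python myapp.pyz" then runs myapp.cli:main.
zipapp.create_archive(
    "myapp",
    target="myapp.pyz",
    interpreter="/usr/bin/env python3",
    main="myapp.cli:main",
)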
> async/await
This is interesting to me. I prefer async/await in general, because it has become a standard across programming languages and I find it really easy to reason about. I also find channels to be too widely seen as a cure-all, when the only study so far has shown they actually led to an increased bug count. But I don't discount the value of real parallelism, and am glad to see that Python has been pushing harder on that lately, with things like subinterpreters that allow bypassing the GIL in a single process.
> I agree, but if you for instance look at the TypeScript comparison sub-thread, you'll see that all the issues with both the syntax and implementation of the type-system are being aggressively resolved, and likely will all be so by 3.9.
I'm happy to hear that; hopefully the efforts really do address these issues well.
> I find poetry a joy to use. If you want to bypass venvs altogether, there's a lot of work to make that a reality, such as https://github.com/David-OConnor/pyflow.
I'll have to check those out, but one inherent problem is that even if these tools really do solve my pain points, adopting them means I'm leaving my org on a relatively small island, isolated from the Python community. If these really are the holy grail, why isn't the broader Python community adopting them? Please don't take this as me looking for something wrong--whatever Python build tool I use, I'll eventually need support and there's a lot to be said for having a thriving community that has almost always run into my exact problem before.
> Python in 3.5 added complete zip app support, which has improved this dramatically from my perspective. Extended by things like shiv https://github.com/linkedin/shiv make it fairly complete.
We're currently using this via pex. It mostly works, but we still run into problems occasionally (system dependencies, for example). Figuring out how to integrate these tools into the broader build process is another problem to solve--we're using `pants` which supports pex out of the box, but we're running into lots of bugs or other problems. I'll keep an eye on shiv.
> This is interesting to me. I prefer async/await in general, because it has become a standard across programming languages and I find it really easy to reason about. I also find channels to be too widely seen as a cure-all, when the only study so far has shown they actually led to an increased bug count. But I don't discount the value of real parallelism, and am glad to see that Python has been pushing harder on that lately, with things like subinterpreters that allow bypassing the GIL in a single process.
My biggest issues with async/await are
(1) every package needs an async variant (async boto, async docker, etc. etc.). We work around this by running them in a thread pool executor (a sketch follows this list), and I think that works, but I don't know if I'm holding the GIL unnecessarily and causing performance issues (fundamentally difficult to diagnose). This is roughly the "what color is my function" problem.
(2) it's really easy to starve the event loop by calling into something that transitively makes a sync call or otherwise just does a lot of CPU-heavy work. We've run into both kinds in production and they've been really hard to troubleshoot (because the requests that time out often aren't the ones that are actually causing the problems).
(3) dynamic typing means it's super easy to forget to await things (also sketched below). Tests should catch this, but we find ourselves writing tests _just_ to catch this (e.g., we now write tests for entrypoints that _just_ `await lib_function(params)`; we would normally not write tests for such simple functions, but now we have to). Static typing is the right way to solve this and mypy does, but mypy has too many other issues (at the moment) for our org.
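The workaround from (1), as a rough sketch (the blocking call is a stand-in for a sync-only client like boto):

import asyncio
import time

def fetch_blocking(key: str) -> str:
    time.sleep(1)  # stand-in for a sync-only client call
    return "value for " + key

async def fetch(key: str) -> str:
    loop = asyncio.get_running_loop()
    # Runs the sync call on a worker thread so the event loop stays
    # responsive; whether the GIL is held meanwhile depends on what the
    # library does internally, which is exactly the diagnosis problem above.
    return await loop.run_in_executor(None, fetch_blocking, key)

print(asyncio.run(fetch("answer")))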
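And the failure mode from (3): calling a coroutine function without await silently does nothing; CPython only emits a RuntimeWarning ("coroutine ... was never awaited"), typically long after the fact.

import asyncio

async def save(record: dict) -> None:
    await asyncio.sleep(0)  # stand-in for a real write
    print("saved", record)

async def handler() -> None:
    save({"id": 1})        # BUG: no await -- returns a coroutine, never runs
    await save({"id": 2})  # correct

asyncio.run(handler())     # prints only "saved {'id': 2}"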
One substantial criticism of goroutines is that they're less safe than async/await because you need to make sure the code you're running is threadsafe. I appreciate this criticism, but I think it's the right tradeoff for Go's performance aspirations (another great high-performance alternative is Rust's borrow checker, but that's the wrong tradeoff for Go's developer productivity aspirations).
> I'll have to check those out, but one inherent problem is that even if these tools really do solve my pain points, adopting them means I'm leaving my org on a relatively small island, isolated from the Python community. If these really are the holy grail, why isn't the broader Python community adopting them? Please don't take this as me looking for something wrong--whatever Python build tool I use, I'll eventually need support and there's a lot to be said for having a thriving community that has almost always run into my exact problem before.
Only because they are so new. portray was built a few weeks ago and already has a thriving community building around it - but of course it's still a small drop in the whole ecosystem.
Older tools I've built like isort, are now ingrained into the community: https://github.com/timothycrosley/isort, but that took years, even without major issues or complaints being present. It just takes people time to adopt new things.
Go may "feel" like Python, but it's almost nothing like Python in actual practice. It's not dynamic (and doesn't even have generics), and its error handling is dramatically different.
It _is_ like Python in practice (I use both languages all the time). That’s largely why you see it used in many of the same places as Python. It has dynamic features by way of interface{}, which is every bit as “generic” as what Python has to offer. :) But yes, the error handling is different—values vs exceptions.
I am of the view that interface{} is the worst of both worlds with regards to static/dynamic typing. Dynamically typed languages typically have type coercion and structures that make dealing with vars with unknown types easy.
However, golang doesn't have that. So you get the danger of a dynamic language without the features that make it powerful.
Go has those features in the reflect package (so as far as I know, Go is just as powerful as Python), but you’re right that they aren’t easy to use. If you do use them, it’s quite clear, and will be addressed in code review so you don’t have nearly as many dynamic typing bugs as Python—it’s not anywhere close.
Very dynamic code shouldn’t be easy; the happy path should encourage clear, simple code. By encouraging people to stay on the happy path, their code is more performant, maintainable, etc and it keeps the average code quality quite high across the ecosystem.
It is rare that I have had problems with dynamic typing errors in P* languages. I am of the view that dynamic code should be one of two things:
1) Super easy. That way doing it right is trivial.
2) Impossibly difficult, so the only people who are doing it can be trusted to do it right.
To me Go falls between those two. It's real easy to say interface{} (indeed it is more difficult to make a non-empty interface), but doing it in a way that is safe isn't easy.
I don’t think expressive power is the point here, as they are both complete languages. More it is an issue of what trade-offs and compromises have been made.
I'm not sure (1) exists, probably by definition. And I certainly don't agree that Python makes it easy to "do it correctly". Our Python app has daily 500s due to typing errors. We also suffered for years because we would build magical things that we thought would work in every scenario but ended up being untestable and/or failed to consider numerous edge cases ("what happens if someone inherits from my magical class?") and/or which failed to extend properly ("oops, someone renamed this attribute and now all of our hasattr checks are broken, and the tests didn't catch it because they passed mocks"). Eventually we built a culture that mostly discourages magic/gratuitous dynamism, but it took years and we're still suffering from that legacy code.
These problems simply don't crop up in Go, or at least they're in a different ballpark in terms of frequency and severity. So yeah, Go lacks typesafe generics, but I'll make that tradeoff all day every day in exchange for the maintainability, performance, tooling, distribution, etc improvements that Go offers today. No contest.
Go has generic types the same way Python has macros, or the same way C++ templates are a functional programming language.
C has void *, writing generic code using it is hell. Enough so that people went through a lot of trouble creating C++ and later Rust to escape it.
I'd say the type casting from interface{} to whatever you assume is in there qualifies as different.
Pretty much every single aspect of these languages is different from what I can see, the only thing they have in common is included batteries, the rest is growing popularity and consequences thereof.
Yeah, I get it. It’s a little disingenuous of me to say that interface{} qualifies as generics, but I can’t quite put my finger on why it is different than Python. Neither are typesafe (although mypy supports generics, but has many other issues), but in any case typesafe generics would I think improve Go.
> This means that just by importing this module, we're mutating global state somewhere else.
Yes, this !
That's what I hate Django and some Flask apps the most for: the fact that by importing a module, you're implicitly creating a database connection and a lot of other magic stuff, which means that now I can't import a constant defined in said module outside of `python manage.py`.
Also, as said below in the article, suddenly it's much harder to smoothly handle "the database is momentarily unavailable" (because someone has put the line starting the database connection in the global space of a module somewhere).
I much prefer frameworks/modules for which code is executed only once you invoke their "setup" function
Django doesn’t create database connections on import. That would be madness.
It does create an object that can (lazily) connect to the database, so it needs the required database drivers installed. It also needs the required information about _how_ to connect to the database, so it needs the settings loaded.
That's why you need to use `django.setup()` before, to tell it what settings to load. You should never be importing random Django models without this configured, simply because they cannot be used and will not work. We think an exception saying "don't do this, call django.setup()" at import time is less confusing than "Databases not configured" at runtime. Not that it would even reach that, because you might be using a field from a third-party application that needs to be initialized (i.e. INSTALLED_APPS configured) or that relies on a configured setting (maybe an encrypted field that needs your SECRET_KEY available).
Stop making it hard, just write a management command. It's super easy.
Every time I hear a comment like the parent's, it makes me think how many times a day I actually read a comment in the same fashion, but about something I actually know nothing about.
With credit to the original poster, they might be complaining about the fact that Django is a monolithic framework and you can't really use Django code without spinning up the i/o portion. Which is legitimate criticism, but frankly if that's what you need then you shouldn't be using Django.
Without calling setup, you cannot import anything that touches Django models, like constants defined in a file that transitively imports a Django model.
In practice, this means that any script that depends indirectly on Django code will incur a lengthy startup cost (from having to call setup()), and will fail to run if there's no database connection, even if the script itself doesn't need the db.
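For anyone hitting this, the usual standalone-script preamble looks roughly like the following (the settings module and model names are hypothetical):

import os
import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()  # must run before anything that imports a model

from myapp.models import Employee  # deliberately imported after setup()

print(Employee.objects.count())  # a DB connection is only opened here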
I'm not sure about Django, but Flask's application object has a before_first_request method which takes a function designed to do this type of initialization operation.
I'm a huge fan of Django, but I always felt that this was true. I wish there was more of a push to decouple parts of the framework. Keep the magic, but allow usage without it.
I love the idea, but it feels like just an idea at this point. I'd rather read about them releasing their 'compile-time' analyzer and revealing their measurements for how much startup time it saves.
In our codebase, we have pretty strict developer-enforced rules about not doing I/O at the module level, usually through the use of simple "Lazy" wrappers for module-level objects. I'd be curious to know what other approaches people have taken with Python here.
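One common shape for such a wrapper, as a rough sketch (not our production code; the usage line is hypothetical):

class Lazy:
    """Defers construction until first attribute access."""
    def __init__(self, factory):
        self._factory = factory
        self._obj = None

    def __getattr__(self, name):
        if self._obj is None:
            self._obj = self._factory()  # I/O happens here, not at import
        return getattr(self._obj, name)

# Module level: importing this module no longer opens a connection.
# db = Lazy(lambda: psycopg2.connect(DSN))  # hypothetical usage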
It is an interesting approach, though I feel like this could introduce some nasty unintended consequences given how dynamic and introspective Python can be (admittedly I haven't studied this particular implementation).
I always treated this a bit like single underscore private functions/methods, i.e., follow a convention that produces code that's easy to reason about, even if it's not strictly enforced by the language/compiler. So in practice this equates to separating out modules that mutate global state, and placing the majority of logic in "strict" modules that only declare a bunch of "pure" classes/routines. So the "non strict" code is really just a thin layer of wiring gluing everything together. For instance my Celery task files tend to be very thin.
It's interesting to me that they are going down this path instead of the microservices path. This seems like something ripe for slowly breaking down into microservices.
Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.
To be clear, I'm not saying microservices will solve all their problems or be less work. I'm just saying that with an equal level of effort, they would probably get more overall reliability by having multiple services, they'd be able to use multiple languages, whatever is suited to the task at hand, be able to deploy even more often with less risk, and be able to isolate these types of "change on import" behavior to a much smaller surface on any given deployment.
>Someone made a change that took down production because of non-deterministic outcomes? How about breaking out whatever they were changing into its own service? With proper fallbacks, breaking that part shouldn't take down all of production again.
Yeah, now you'll have 10 interconnected services, 10x the complexity, and everything will have the ability to take down all or large parts of production, plus all the extra pain points of a distributed system...
You won't have 10 times the complexity if you are taking a monolith and making each section a service. You'll have the same dependency graph; it will just use the network to make calls between them instead of being local.
You'll have added complexity with the network calls, which is why I said it wouldn't be any less work, just different work.
>You won't have 10 times the complexity if you are taking a monolith and making each section a service. You'll have the same dependency graph; it will just use the network to make calls between them instead of being local.
Merely "use the network to make calls between them instead of being local" will add 10 times the complexity -- you suddenly have a distributed system, latency, delays, parts that can be on or off, de-centralized configuration (which can also get out of sync), and so on.
>it will just use the network to make calls between them
meaning that you get to throw network and server errors into the mix of things that can go wrong, and you get the fun of tracing failures back 3 hops to a server that decides to take too long to run a process one day and times out a connection downstream.
Beyond increasing complexity, I think this also assumes a dependency graph that _can_ be broken down into microservices by the author/the author's team. From my experience a lot of things at this scale have such complex dependencies that teasing them apart is difficult if not impossible without asking several teams to do something differently. And who knows how long that will take?
That's why you do it slowly. You take a small part of the monolith and make a service that does the same thing. Then you replace the code in the monolith with a call to the service, while keeping track of how often it is called in the monolith.
As you keep moving along, some things that depend on that first service will start calling the new service directly, and some will still call it in the monolith. But your tracking will tell you how often and who is doing that, so you can find out why.
In the meantime, nothing will break, because the monolith is still a pass-through proxy to your service.
However, at their scale and with their engineering resources, I can only imagine an attitude of "we can make this work" (the monolith) is easier to justify. The same goes for the micro-services approach (except here you have to justify changing what has been working so far?)
I'd love to read more about the history behind this approach at Instagram.
Regardless of whether the monolith or microservices approach is the right way to go for their use case: I could very well imagine that it is too late for such a migration, and that it would hold them back for too long.
> How do we know that the log_to_network or route functions are not safe to call at module level? We assume that anything imported from a non-strict module is unsafe, except for certain standard library functions that are known safe.
It's hard to know anything about the stdlib as it can be monkey patched, e.g. [1]
That said, you could solve this with diagnostics; calculate signatures of stdlib functions and classes to find any known safe ones that were patched. Run that check in your test suite to find problematic imports.
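A rough sketch of that check, using identity comparison as a cheap stand-in for full signature hashing (it assumes the snapshot module is imported before any application code runs, e.g. early in a pytest conftest.py):

import json
import os.path

# Captured before application imports run.
_SNAPSHOT = {
    ("json", "dumps"): json.dumps,
    ("os.path", "join"): os.path.join,
}

def test_stdlib_not_monkeypatched():
    for (mod, attr), original in _SNAPSHOT.items():
        module = __import__(mod, fromlist=[attr])
        assert getattr(module, attr) is original, f"{mod}.{attr} was patched"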
> If the utils module is strict, then we’d rely on the analysis of that module to tell us in turn whether log_to_network is safe.
I like this. It seems far more usable than proposals like adding const decorators.[2]
This is yet another example of the divide between wizarding and engineering[1]. When you're a small startup, what matters is the expressiveness of your language, and the ability to do a lot of things very, very quickly. Type safety, performance, readability, those things don't matter. You're just a bunch of engineers who know the whole codebase inside out; you're pretty certain of what you're doing. In short, you're wizarding.

If you grow big enough, this approach slows you down greatly, and you need to switch to engineering. You sacrifice some speed for making the codebase more understandable to a larger group of people, you can no longer assume everyone knows all the code, you write unit tests, need types, and dislike metaprogramming because of the confusion it creates.

This is why languages like Python, Ruby, Lisp or Smalltalk are amazing for small startups, but Java is what enterprises use. They're different ends of the wizarding/engineering spectrum. I wish there was a language that let you move gradually from one end to the other, exactly when you need to.
On that note, I'd include Erlang. It's not gradually typed, per se, but you can have a fully dynamic language (no type specs, no Dialyzer), a completely optimistic static analyzer for inferring types and warning where it's inconsistent (Dialyzer runs), and then you can add specs where needed to tighten up and improve what Dialyzer can catch, to basically be a fully static language.
It's relatively easy, but not free, to do this. I find that the Erlang (and Elixir) guides seem to be a bit scant on best practices to achieve this level of discipline. For example, wrapping all gen_server calls in module functions and presenting a well-defined API for the gen_server module (and possibly even linting for no naked gen_server calls) is not really explained in this light. Similar guidance is not provided for wrapping Enum module calls (since that similarly destroys typing information).
Yeah; it requires some rigor to do. My point was simply that it _can_ be done, and while the effort is high, it does allow you to move from pure dynamic language, to highly defined type checking.
If you fully spec out your code, it's actually quite close. In a project we did that in, the only type errors we encountered were ones that a static system would not have caught either (due to their being caused by incoming data that did not conform to our type expectations; for instance, deserializing JSON to a specific type).
Without specs, it will assume every type is 'any()', unless it has information to infer something more stringent. For instance, if it sees you add 5 to it somewhere, it will instead assume it is a number. Etc. Even if in practice it actually is a list of some kind (and so that addition of 5 will fail). Which, yes, ain't great. Hence why I said it was a gradual transition; it will catch provable errors (i.e., if you call append on that same variable as above, it will note that there is no type that allows both append, and + an integer, and error), but leave plenty of things uncaught that could have been caught had it known the type in question (via a type spec).
TypeScript is perfectly this. (And other gradually typed solutions; TS is simply the most popular one.)
You have the madness of thousands of developers flinging code at the universe due to the ease of browsers, JS, and npm.
This results in great speed, but not great quality.
When your project/company now wants quality, you keep your code but transition to types. (In OSS space, Angular and Yarn projects have both done JS => TS migrations of some form.)
Afaik, TypeScript is pretty bad in terms of catching some basic errors, because types are not enforced at runtime. A caller can change a sync function to async, breaking the functionality downstream.
It's not just about static typing, though. Macros, metaprogramming, being able to reach as deep as you want to, ugly code full of side effects, global state, etc. All of those might actually benefit you when your project is small, and they make development way faster (see Rails). Later, however, they're a definite impediment.
> When you're a small startup, what matters is the expressiveness of your language, and the ability to do a lot of things very, very quickly. Type safety, performance, readability, those things don't matter.
I’ve never worked on a program so small that readability didn’t matter. I consider it a crucial ingredient of expressiveness and development speed.
Though your perspective could explain a few of the more atrocious code bases I’ve seen.
During exploratory programming I don't care at all about readability, just about finding a path - any path - to something that works. As soon as I have that, readability starts to matter, and the first order of business is then to refactor out all the dead ends and to make the whole thing look good. That's because the project now has a long-term perspective.
And the worst thing is using languages/libraries/frameworks that presume everyone needs engineering when you need to wizard.
There are too many people who have swallowed SOLID whole and can no longer see good engineering as a trade-off against other factors.
For example, being strict about having the smallest possible public API and making most methods private protects me from future breakage that might never be an issue (I might never upgrade), but forces me to copy/paste vast globs of your code into my own if I need access to something you didn't anticipate (and that's assuming I have access to your source; worst case is that I have to reimplement things that already exist in the code I'm interfacing with).
Python got this right. Private methods are a weak or strong hint that you might want to think twice before calling them. But you're the boss at the end of the day.
>And the worst thing is using languages/libraries/frameworks that presume everyone needs engineering when you need to wizard.
I think this is why it's easy to point to a thousand things built in Python which people use every day (like Instagram), while in, say, Haskell, there are barely a handful (pandoc, Facebook's spam filter, etc.).
I like the way you characterize this, but what are your thoughts on why you can't be a wizard with a typed language? Seems to me that if you start with something like Go or TypeScript you cover a decent middle ground, foregoing a lot of boilerplate while having code that you don't need to be a wizard to understand.
> You're just a bunch of engineers who know the whole codebase inside out, you're pretty certain of what you're doing. In short, you're wizarding. If you grow big enough, this approach slows you down greatly, and you need to switch to engineering.
I've never heard of this before... I love it. Thanks for bringing this up.
> When you're a small startup, what matters is the expressiveness of your language, and the ability to do a lot of things very, very quickly. Type safety, performance, readability, those things don't matter. You're just a bunch of engineers who know the whole codebase inside out; you're pretty certain of what you're doing.
I'm not familiar with this use of the term "expressiveness".
My understanding is that expressiveness (as per "On the expressive power of programming languages", Felleisen 1991 [0]) has to do with capabilities that a language has that separate it from another language. C is more expressive than Python in that it gives you direct access to memory management, whereas Python is more expressive than C in that it provides inheritance/OO. (These are just examples.)
Type safety, performance, and readability are all wholly separate from expressiveness, I think. A language's type system and performance benchmarks have nothing to do with the expressive power of a language outright, and "readability" is entirely subjective to begin with.
So: would you mind elaborating on what you mean, exactly, by "expressiveness of [a] language" here?
---
In fact, most of what you (and the linked article) are talking about has to do with the dynamic/static spectrum, not this "wizarding/engineering" spectrum you've coined (though I do kind of like the idea of that for discussing development methodologies).
The article is all about how the dynamically-typed nature of Python allowed for rapid iteration at the beginning of the Instagram project, but has since hindered further progress as they've grown larger. But now they feel they can't just rewrite it all in a statically-typed language because of the engineering overhead involved.
On this note, I want to go to your last point:
> I wish there was a language that let you move gradually from one end to the other, exactly when you need to.
With regard to the dynamic/static distinction, there are languages that allow you to move "gradually from one end to the other", and they are (aptly) called gradually-typed languages.
Gradual typing was introduced by Jeremy Siek and Walid Taha back in the mid-2000s [1]. In this discipline, you can have a statically-typed codebase with local dynamically-typed regions. You get all of the static guarantees everywhere they can be made, and dynamic regions impose runtime checks to ensure consistency. (This connects closely to contracts, which are primarily worked on by Robby Findler at Northwestern, I think.)
Unfortunately (to me), it seems like a lot of these languages are implemented in terms of existing dynamically-typed languages. For example, Sam Tobin-Hochstadt (Indiana) created Typed Racket, which is (of course) built upon Racket but provides a gradual typing discipline. Wherever possible, static types are checked, and everywhere else utilizes contracts to guarantee runtime consistency.
Anyway, all this is to say: the technology exists, technically, but is in its infancy. There's no doubt it'll be some time before it sees widespread use throughout industry. Sam wrote up a brief overview for the SIGPLAN Perspectives blog recently, if you're interested [2].
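Worth noting that Python's own optional hints already give you a limited, static-only version of this: you can mix checked and unchecked regions in one codebase, though unlike Typed Racket no runtime contracts get inserted at the boundary. A minimal sketch (function names hypothetical, checked with something like mypy):

    from typing import Any

    def total(prices: list[float]) -> float:
        # Statically typed region: the checker verifies every caller.
        return sum(prices)

    def scrape() -> Any:
        # Dynamically typed region: Any opts out of checking entirely.
        return [19.99, 5.00]  # could be anything; the checker won't complain

    items = scrape()     # unchecked
    print(total(items))  # the boundary: no static (or runtime) guarantee here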
I expect in this context, expressiveness means something like "the ability to describe the relevant stuff in the code with minimal noise", which might map to having good abstraction.
I find that Python's OOP + functional aspects, combined with a good understanding of the language, hit a sweet spot here. One that simply can't be reached in C/C++/Go/Java/Haskell, and which is much easier to reach than in JS/Rust/other languages where I think it is possible.
My definition of expressiveness is basically similar to what Paul Graham (@pg) says. He also calls this "powerfulness" in his famous "Beating the Averages" essay [1]. In short, a more expressive language is a language that lets you express more with less. Rails, with its `has_many :books`, is very expressive; Assembly is the other end.
The wizarding/engineering spectrum was coined by the article I've linked to [2]. I think the post is exactly about that: first Instagram was wizarding and they had a suitable language for wizarding; now they're engineering, but their language is still only good for wizarding.
As I've said in a sister comment, it's not just about static typing, but metaprogramming/macros/side effects everywhere etc. There's more to the expressiveness/powerfulness than just types. While gradual typing is certainly an improvement, I think we need more research in this direction.
Countless companies, huge and small -- from Apple and Amazon, to Google and your friendly local startup, plus all the enterprise world that's not a .NET shop...
In what parallel universe is Java not immensely popular, or not used for greenfield projects?
As far as the JVM is concerned, Kotlin hype will be over in a couple of years, and it will get as much use as Scala, Clojure, BeanShell, and Groovy enjoy nowadays.
Guest languages never get to own a platform, and with time all platform languages end up getting enough features that the large majority of developers never bother with extra tooling, debugging layer and idiomatic wrapper libraries of the guest languages.
>As far as the JVM is concerned, Kotlin hype will be over in a couple of years, and it will get as much use as Scala, Clojure, BeanShell, and Groovy enjoy nowadays.
And we know that because?
>Guest languages never get to own a platform
That depends on the platform, who is running it, and how. You couldn't ask for worse stewardship than Oracle's.
And most "guest languages" are smaller affairs, they don't have companies the size of Google chosing them for Android app development (a huge niche in itself). Or have first class support from the most popular IDE of the host environment.
Plus, everything is anecdotal: we have so few cases of major parent/host language rivalries, and even fewer with similar dynamics, that there's no real basis for prediction.
Scala was too complex for most Java-ers, too slow to compile, and didn't have a Google pushing it to its platform devs, just an insignificant company, etc. Clojure was a Lisp (= doomed); BeanShell and Groovy were from small, insignificant origins, and not pushed by anyone really mainstream the size of Google/FB/etc.
UNIX and C, Web and JavaScript, Windows and .NET/C++, macOS and Objective-C/Swift, Android and J̶a̶v̶a̶/Kotlin/C++....
Google only cares to push Kotlin on Android, and it only matters because Google visibly doesn't want to move Java beyond the Java 8 subset that Android currently supports, so the choice is between a handicapped Java support or Kotlin.
Until there is a JVM written in Kotlin, and Kotlin gets first-class support in all Java IDEs instead of being a tool to sell IntelliJ licenses, it is just yet another language that happens to target the JVM.
This is ignoring that Kotlin already has a couple of impedance mismatches with the JVM: sequences vs streams, lambdas vs SAMs, coroutines vs fibers, inline classes vs data classes.
Elixir is doing well because many developers seem wary of learning Prolog/Erlang syntax.
> Guest languages never get to own a platform, and with time all platform languages end up getting enough features
They do, up to the point where they differ philosophically. Java is never going to turn into a Clojure, nor is it going to adopt the type of dynamic scripting features Groovy offers.
Can't say about new ones, but after a couple of years of working on a huge Python project, I would accept rewriting it in Java without a second thought. I have equal experience in Python and Java by now. Funny thing: I returned from my vacation a few months ago and started writing my first code after it. Of course, I was more concerned with what I was writing than how. Then I looked up and noticed I had started typing Java instead of Python. And that's three years after the last time I'd written any Java code.
Oh, just Apple, Amazon, Google, Netflix and likely every hospital, utility company, police force, military, or bank you depend on. And the people programming the robot swarms that pack your groceries (https://www.infoq.com/presentations/java-robot-swarms/).
For large scale projects, Java and C++ remain the go-to languages. I've seen a little bit of Go start to show up but no others. Other languages are used for libraries (Rust, C), only at certain employers (OCaml, Erlang), or for small-scale projects (nearly everything else).
On C++, inertia is a wonderful thing. Java has the benefits of extensive dependency injection and JVM/ecosystem tools that lower the risk of deployment of code. .NET also provides the controlled "managed code" environment of CLR.
Why any enterprise would use C++ for standard "business" or "large scale" programming makes no sense to me.
Enterprises want stability, not speed to market. Most of their infrastructure changes slowly (as in features deployed once or twice a year maybe). They have stable support mechanisms for this, including long and complex processes of approval.
As an ex-C++ dev who has lived in the Java/.NET worlds since 2005: because they still don't cover all the use cases where C++ might be needed.
So while you might not write it as full-stack C++, a couple of native libraries might be required as dependencies: to access OS features, give some help to the AOT/JIT compilers, or, in Java's case, to implement more machine-friendly data structures.
One of my clients handles 90% of all the PBM routing in the US, which is millions of transactions per second. They started to completely modernize the application on Java, primarily Spring Boot and Apache Geode. After some optimizations they are very happy with the performance and expressiveness of modern Java.
Exactly right. Java has lots to improve on, but there seems to be unsubstantiated hate towards Java on HN, which I find contrary to what happens in the real world. In the real world, companies find it relatively easy to hire Java programmers (maybe not the best and brightest) who can get a project off the ground easily.
C++ is mostly in a different but overlapping use case these days, used for system software or where performance is critical. Go is still a niche language most developers haven't heard of, with no readily available talent pool for hire.
I'm not a fan of the language, but Java has a huge number of developers and a rich, mature ecosystem of software, and is quite productive (enterprise patterns aside); it's a good sweet spot for most companies.
C++ is a hydra of complexity; sure, it has its place, but it's not nearly as productive as Java for your typical web application.
Go is almost the opposite, so simple it lacks features like generics.
The last time I used Go it had fundamental usability issues around dependency management (although I think recent versions have improved on vendoring a little).
> C++ is a hydra of complexity; sure, it has its place, but it's not nearly as productive as Java for your typical web application.
Modern C++ is as productive as Java (probably even more so). The main issue with C++ is recruitment: C++ engineers are rare because C++ is barely taught.
C++ is barely taught in ProgrammerGenerationFactories because "modern" C++ still allows "old" C++ and makes it difficult to stop developers from doing that.
Just like MISRA tries to constrain C programmers from doing dumb things in the embedded world, "modern" C++ tries to do the same in the business world. But there isn't an easy way to enforce it, especially when you're outsourcing to some code sweatshop.
Every mature enough language has a subset that you need to avoid. Including Java. It is precisely because of this kind of thing that every company needs to have coding guidelines and proper static analysis tools.
> especially when you're outsourcing to some code sweatshop.
If you outsource your dev work to cheap, other-side-of-the-world, low-quality engineers, then you deserve your problems, in any language.
I worked in the past for a company (embedded programming) that had an entire team of expensive engineers in Luxembourg just to fix the stupidities of another team of outsourced engineers in India.
And yet you only offered a misinformed platitude in the form of a question.
Google does all kinds of new Java work (Golang, contrary to myth, is just one of the languages Google uses for internal stuff, and a niche one at that), Amazon of course, most of Apple's backend services are Java, and Twitter, Airbnb, Uber, LinkedIn, TripAdvisor, and tons of others use Java and write green stuff in it all the time...
Twitter, and they write some really good open source stuff too. If you're writing RPC services in Java I'd almost argue you should default to considering Finagle:
What you call Wizarding I call "ordinary Software Development". A software developer spends ~70% of their time writing features and the rest mixed between organization/planning/roadmapping etc. A software engineer spends ~30% of their time writing features and the rest of it managing technical debt and making long-term investments towards better features and processes.
Too many companies need devs but have engineers, or they need engineers but only have devs :/
Another thing that I would like to see in some kind of strict mode is the ability to mark explicit exports like in JavaScript modules. I often want to import multiple things globally at the top of a module because they are shared by multiple class or function definitions that I am writing. However, such imports end up being exposed to and usable by the consumers of my module, even though the consumers should really have imported those things at their source instead of via my module.
There are currently maybe two ways to tackle this “problem”, without a strict mode:
1. Don’t import at the global module scope; but that’s a bit tedious.
2. Import with rename, like `import os as _os`, and then leave it to the principle of “we’re all consenting adults”. I.e. if anybody imports and uses things that start with an underscore, it’s clearly their fault, not mine.
3. Import as normal, and leave it to the principle of "we're all consenting adults"; unless something is explicitly called out as being part of the public API I consider Law of Demeter[1] "violation" the same as accessing _var.
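To make these workarounds concrete (the module and helper names here are hypothetical):

    # mylib/paths.py
    import os as _os         # option 2: `from mylib.paths import _os`
                             # now clearly screams "your fault"

    __all__ = ["join_home"]  # only affects star-imports, but it documents
                             # the intended public surface

    def join_home(name: str) -> str:
        return _os.path.join(_os.path.expanduser("~"), name)

Note that `__all__` only governs `from mylib.paths import *`; nothing stops a determined consumer from importing `_os` directly, which is exactly the consenting-adults principle again.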
I think this is an interesting idea, which appears to embed a stricter subset of Python within Python itself. Have the Instagram engineers tried floating this with the wider community via established channels like Python-Ideas or discuss.python.org?
I like the idea, but it feels a bit heavy handed outside of a very large team.
I think the first step here is to get away from the assumption that importing a module will have "interesting" side effects. This is not only a problem with Python...
I tend to create mini "dependency injection" frameworks that create a pattern for loading module code at some point well after import. This pattern tends to reduce to wrapping whatever code you have in the module in a function/closure instead of just running it at import time.
Again, I like the idea of enforcing constraints with code, but I don't think it's a substitute for educating developers to avoid certain patterns and giving them infrastructure that makes the alternative easy.
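A minimal sketch of that pattern (module and function names hypothetical): keep the module body declarative and run the interesting code from an explicit entry point.

    # worker.py -- nothing "interesting" happens at import time
    _pool = None

    def init(pool_size=4):
        # All side effects live here, called explicitly after import
        # (from main() or a DI container), never at module load.
        global _pool
        from multiprocessing import Pool  # even the import can be deferred
        _pool = Pool(pool_size)

    def submit(fn, *args):
        if _pool is None:
            raise RuntimeError("call worker.init() first")
        return _pool.apply_async(fn, args)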
I like that idea, it's just not that easy. How do you define module versions and inheritance when you are not allowed to do global assignments in the module? Declarations only, with no IO or global side effects, is fine, but declaring versions and inheritance needs to be allowed in global scope.
Well, if you ask me to write language X, I will definitely make mistakes for the first couple of weeks/months/years; that is why you need code review, mentoring, and education plans for your hires.
> Here’s another thing we often find developers doing at import time: fetching configuration from a network configuration source.
MY_CONFIG = get_config_from_network_service()
I am pretty sure this is an anti-pattern; if this code passed code review, you should make your review process more strict.
Well, yes, why would you do this? Why would this pass code review? Why do we have linters and other checks for dynamic languages?
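The usual fix is to defer the fetch until first use; one hedged sketch, stubbing out the network call from the quoted snippet:

    import functools

    def get_config_from_network_service():
        # stand-in for the real network fetch from the article's snippet
        return {"timeout": 30}

    @functools.lru_cache(maxsize=1)
    def get_config():
        # The fetch now happens on first call, not at import time,
        # and the result is cached for every later caller.
        return get_config_from_network_service()

Callers then write `get_config()["timeout"]` instead of reading a module-level `MY_CONFIG`.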
> It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
It seems we are blaming Python here for the shortcomings of a monolith, instead of chunking out specific business modules into separate services/microservices.
To be honest, the strict mode seems interesting, but I believe the problems they're facing can be solved by a couple of changes to their process and code:
- everyone gets a mentor if they are not experienced in Python or Django
- code review by at least two experienced Python developers (it does not count if you have coded in Java for 20 years)
- teams should try to move their logic outside the monolith (it sounds like they have a monolith)
- write CI tests to measure how long it takes to import a file; if it takes more than T(line count * LINE_PROCESSING_THRESHOLD) you have to fix your code (see the sketch below)
- prepare config and load it before running the actual server; no network calls for fetching config
All in all, Python is suitable for big companies too. The thing is, if you don't care about best practices, you will have problems as a small startup as well, but in a big company it makes it impossible to move forward; the trick is, independent of company size, to follow best practices and do code review.
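For the CI point, a rough sketch of what such a check could look like (the helper name and budget number are mine, not from the thread):

    import importlib
    import sys
    import time

    def assert_import_budget(module_name, budget_seconds):
        # Fail the build if importing the module exceeds its budget.
        # Caveat: transitive imports that are already cached stay fast;
        # a real harness would run each check in a fresh interpreter.
        sys.modules.pop(module_name, None)
        start = time.perf_counter()
        importlib.import_module(module_name)
        elapsed = time.perf_counter() - start
        assert elapsed <= budget_seconds, (
            f"{module_name} took {elapsed:.3f}s to import "
            f"(budget {budget_seconds:.3f}s)")

    # e.g. in CI: assert_import_budget("myapp.views", 0.1)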
That's a long post to say "do more code review instead of investing into technical solutions to technical problems".
Clearly, Instagram's solution saves them time. That means faster code reviews which incidentally makes them more accurate. Your post doesn't really make sense.
> My current understanding is that the log_calls method would NOT get executed during module load time!?!
That's incorrect. log_calls gets executed on import because it's a decorator, so it's equivalent to `hello_world = log_calls(hello_world)` at the top level (which also gets executed).
log_to_network inside the _wrapped() definition doesn't get executed until hello_world gets called; but any log_to_network call outside the definition of _wrapped does get executed at import time.
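A minimal demonstration of that ordering, with hypothetical stand-ins for the article's names:

    import functools

    def log_calls(fn):
        print("decorating", fn.__name__)      # runs at import time
        @functools.wraps(fn)
        def _wrapped(*args, **kwargs):
            print("calling", fn.__name__)     # runs only on each call
            return fn(*args, **kwargs)
        return _wrapped

    @log_calls                 # equivalent to hello_world = log_calls(hello_world),
    def hello_world():         # executed when the module is imported
        print("hello world")

    # importing this module prints "decorating hello_world";
    # "calling hello_world" appears only once hello_world() is invoked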
Not really. Mutation in general, and in modules in particular, inhibits a lot of reasoning about the code, and thus stops a whole lot of optimizations from being possible. Guile (a Scheme dialect) recently got declarative modules for that reason, where a top-level binding cannot change (i.e. you cannot set! a binding, but you can wrap it in a mutable container and change the contents of that container). This makes procedure calls and variable lookups a lot faster. Andy Wingo wrote about it here: https://wingolog.org/archives/2019/06/26/fibs-lies-and-bench...
Those optimizations won't mean much for CPython, since CPython doesn't try to run things fast, but for something like PyPy this could be a big deal.
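Python has the same property, which is part of why those lookups stay slow: any top-level binding can be reassigned from outside the module. A contrived sketch (mathy.py is hypothetical):

    # mathy.py
    def square(x):
        return x * x

    # elsewhere
    import mathy
    mathy.square = lambda x: -1  # perfectly legal monkey-patching; every
                                 # later mathy.square(...) sees the new binding
    print(mathy.square(3))       # -1 -- so the lookup can't be constant-folded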
You have no idea about their codebase, the implementation details of their features nor how they counted the lines (comments included?). So stating that it’s dumb is beyond ridiculous.
You are right in that it’s certainly a high LoC count for Python, but still...
And yes, knowing nothing else about their code base than A) It's in Python, and B) it's several million lines of code, I feel very confident that there is at least an order of magnitude too much of it. Instagram is just not doing anything that complicated.
(I should mention I specialize in maintaining and refactoring legacy Python code. I know what I'm talking about here.)
Features that are "not complicated" can actually very easily be "very complicated" at scale. Which Instagram does have. 500 million users, every single day.
If you need several millions of lines of Python to do what Instagram server does, the code is bloated.
My bet is that they let too many Java devs loose on the code base, without experienced Python devs reviewing the commits and managing the deluge of unnecessary classes. I've seen it happen before.
>If you need several millions of lines of Python to do what Instagram server does
I have this feeling that you're probably not all that aware of 95% of what their code actually does, and thus probably not in a position to make judgements as to whether their code base is truly bloated relative to what it does.
From a user's perspective, Instagram has:
a) a way to post pictures/videos/sound recordings to a public feed. The pictures can include overlays of links to other users, to other posts, to song lyrics that play in sync with the music, etc etc. Users viewing their posts get the ability to comment/like/link, with automatic language detection and translation on demand.
b) a way to see how other users interact with their posts, allowing comments, seeing views and other analytics, monetizing etc etc
c) provides advertisers with the ability to place stories (stories are a stream of short-lived (24-hour) video/audio posts that users see) or posts (that can be static/video/audio), with links to external sites, direct purchase links ("Shop now"/"Buy this"), etc.
Instagram is much more than a stream of user images.
That doesn't include all the "back office" stuff like spam/reporting/censorship/language translation etc etc.
As orf said you have no idea about their codebase. And you have no idea what's included in that statement -- given that they talk about startup time, they most likely are taking into account the whole framework, a plethora of admin and analytics tools, lots of debugging / debug-only infrastructure, migrations, lots of tooling whose sole purpose is making it easier to work in large teams, etc…
(And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
There's nothing absurd about one of the most visited websites on earth being a couple million LOC.
> As orf said you have no idea about their codebase.
I do too: It's Python and it's several million lines.
Metaphor: you've got three pallets of goods and have hired three trucks to move them. I don't have to know how you wrapped the pallets to know that you brought two too many trucks.
I don't have to know the details of what's included in "Instagram Server" et al. to make this call (obviously), based on my experience and first-hand knowledge of similar codebases. Frankly, I am kind of disappointed in the pushback I'm getting on this. The only reason to have a multi-million-line Python project is for the entertainment of devs, or, worse yet, job security.
Let me put it this way, if the CTO of Instagram showed up here I would be willing to bet US$100,000 that I could reduce the Instagram code by 90% in six months. (Do you think the devs there would appreciate that? Even the one that got laid off as a result?)
If I sound cynical it's only because I've seen this sort of thing for myself. I'm not trying to say that the Instagram devs are dumb or nefarious, this kind of code happens organically and often despite our best efforts. But that code needs a diet. I'm sure of that.
- - - -
edit: In re:
> (And for the record, Linux is ~37 million lines of actual code, Postgres ~2 million, and gcc ~8 million)
So, call it ~50M LoC. What's your ratio for Python/C? Meaning, how many lines of C code are replaced, on average, by one line of Python?
And how feature-complete are we talking? POSIX? GCC targets a lot of languages and platforms, eh?
If you were going for an integrated system, like Oberon OS or a Smalltalk IDE, I think my claim is still plausible, eh?
> Let me put it this way, if the CTO of Instagram showed up here I would be willing to bet US$100,000 that I could reduce the Instagram code by 90% in six months.
And from Instagram's POV, the ROI on that would be much less than putting in the sort of belts and braces that the article talked about.
They don't have the time or space to engage in a massive technical debt reduction program, they're too busy destroying Snapchat and other competitors, reacting to TikTok, implementing an entirely new IGTV video service that provides their customers (ie advertisers and marketers) the equivalent of youtube within the Instagram universe, etc.
I'm sure that every large internet service's codebase out there could be made much leaner and smaller. The question is whether that is worth their while.
Dude, sincerely, thank you. I feel like this is the sane answer I was waiting for. Cheers! (and for your other comment in re: what all Instagram does. I appreciate it.)
> So that's a third pain point for us. Mutable global state is not merely available in Python, it's underfoot everywhere you look: every module, every class, every list or dictionary or set attached to a module or class, every singleton object created at module level. It requires discipline and some Python expertise to avoid accidentally polluting global state at runtime of your program.
> One reasonable take might be that we’re stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.
> But we’re past the point of codebase size where a rewrite is even feasible. And more importantly, despite these pain points, there’s a lot more that we like about Python, and overall our developers enjoy working in Python. So it’s up to us to figure out how we can make Python work at this scale, and continue to work as we grow.
Those are literal quotes from the article. That is quite damning. How did they get to this point? By starting when Python was appropriate, and taking it day by day.
It still blows my mind that people don't use strongly typed languages in the first place and spare themselves from all this future pain.
My guess (based on my experiences) is that companies wind up in this position from having inexperienced people building early versions of products instead of hiring experienced engineers (who are usually more expensive).
Dynamic typing vs. static typing is on a different axis than strong vs. weak typing. Python is a strong dynamically typed language, with some "static lite" features introduced in Python 3.
Dynamic typing means that the type of a name can change arbitrarily at runtime, compared to statically typed languages, which pin down all types at compile time.
Strong/weak means that type coercions rarely/never happen automatically. For instance, JS has some interesting behavior enabled by weak typing: `[] + [] -> ""`. Whereas Python rarely coerces things for you. The division operator in Python 2 was strongly typed, while they changed it to weak typing in Python 3 (in line with the practicality vs. purity convention).
"strong typing": everything has a type and cannot be accessed at some other type; "static typing": everything's type can be determined statically (according to one definition)
In Python everything has a type, and you can't use a float as a list, for instance. It's correct to call it both strongly typed and dynamic, those are not antonyms.
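Concretely, both properties are easy to see side by side:

    x = "1"
    try:
        x + 1             # strong typing: no silent str/int coercion
    except TypeError as e:
        print(e)          # can only concatenate str (not "int") to str

    x = 1                 # dynamic typing: the same name can be rebound
    x = [1, 2]            # to a different type at runtime
    print(1 / 2)          # 0.5 -- the Python 3 "/" coercing ints to float,
                          # the (mild) weakening mentioned above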
It's a constant struggle against the current. Dynamically-typed languages are often “good enough for the time being”. I have the same issue explaining to our C/C++/Obj-C team why they should use static (Clang-Tidy, Infer, PVS-Studio) and dynamic (ASan, MSan, UBSan) analysis tools. They just keep giving me basically the same response of “I am a good programmer, and my code is good, and shame on you for even daring to think that a mere machine could find bugs in my code!”. I don't know what kind of status anxiety causes it. It also makes me wonder what other things I am missing because of the way I keep thinking that I do that thing well enough myself.
I'm confused. It should be easy to demonstrate the benefit, if there is one. Just show them the bugs!
For me, it's not "status anxiety". It's simply not worth the effort.
The last couple static analysis tools I ran on my programs, I spent a while getting the tool to not-crash (because even though the authors obviously had a static analysis tool themselves, they either didn't bother to run it on their own code, or it wasn't good enough to find actual issues). These tools flagged only a couple issues, and almost all of them were places where it couldn't really cause any problems, but the type system was not strong enough for me to prove why it couldn't go bad. So I spent a while sorting through false-positives.
I'm not going to spend hours with a tool to find only a couple (real) bugs, which no user has ever reported seeing, and which I've gotten no automated crash reports about. I have much better uses for my time.
See, that's another thing that a lot of people don't understand about static analysis. It's not just there to find bugs in existing code, it's there to find bugs as you write or edit the code! Of course it won't find a lot in a tested code base. It's tested after all. But it immensely shortens debug time as you develop, and thus reduces testing time as well.