Julia is compiled, though, unlike Python (at least by default): the moment you call Zygote it has to run through the entire program you want to differentiate, and it will fail immediately if any type does not match (without needing any type annotations besides the method arguments used for multiple dispatch and the struct fields, since Julia has type inference). As some people say, Julia fails "just ahead of time", and if you're experimenting with live code in the REPL or Jupyter (the recommended workflow) that can be just as good.
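A rough sketch of what "just ahead of time" looks like in practice (assuming Zygote is installed; the loss function here is made up purely for illustration):

    using Zygote

    # Nothing complains at definition time: the function is only lowered, not type checked.
    loss(x) = sum(x) + "oops"   # adding a Float64 and a String has no method

    # The mismatch surfaces the moment the call is compiled and run, i.e. when gradient() is invoked:
    # Zygote.gradient(loss, [1.0, 2.0])   # MethodError: no method matching +(::Float64, ::String)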
That said, for machine learning a static type system (as they exist today) is definitely not as much of a boon as in most other areas. Most of the work is tensor operations, and embedding things like a tensor's shape in the generics will often lead to an exponential explosion of possible monomorphizations that the compiler is forced to create. Even a JIT language like Julia has trouble, even though it only needs to compile what is actually used (for example, StaticArrays reaches a point where the compiler takes so much time that it's no longer worth it). And even then, I feel that most of the issues that actually take time run deeper than shapes and unnamed dimensions: numerical instabilities, hyperparameter tuning, architecture search, and other things that benefit more from a language that allows quicker exploration.
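To make the monomorphization point concrete, here is a small StaticArrays sketch: every size is baked into the type, so every distinct size is a separate specialization the compiler has to produce.

    using StaticArrays

    a = @SMatrix rand(3, 3)    # SMatrix{3, 3, Float64, 9}
    b = @SMatrix rand(4, 4)    # SMatrix{4, 4, Float64, 16} -- a completely different type
    typeof(a) == typeof(b)     # false

    # f is recompiled for every distinct size it ever sees; with the sizes that show up
    # in tensor-heavy ML code, compile times quickly stop being worth it.
    f(m) = m * m'
    f(a); f(b)                 # two separate specializations of f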
Python is also compiled to bytecode in the mainstream implementation. But that is an implementation detail and has nothing to do with the (lack of a) type system.
Julia is also "compiled" down to its (untyped) IR before anything runs (this is when all macros are expanded), but since it's a dynamic language like Python, it can't know types at that stage, so this step can only catch parser errors. It could only type check at this point if types were part of the definitions rather than of the values flowing through them (i.e. if it were static). And both Julia and Python are also strongly typed (there isn't really a language that lacks a type system).
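You can see both stages with the standard reflection macros (nothing extra to install):

    f(x) = x + 1

    # Lowering (macro expansion and desugaring) happens without any type information;
    # the result is the same regardless of what you will later pass in:
    @code_lowered f(1)

    # Types only appear once the compiler specializes on a concrete call signature:
    @code_typed f(1)      # everything is Int64
    @code_typed f(1.0)    # everything is Float64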
The difference is that whenever a function is called in Julia, the compiler can, from the types of the arguments, infer the type of every variable and of the arguments of every subsequent call, immediately compiling all of it down to machine code (unlike Python, there is no such thing as a Julia virtual machine, or even a Julia interpreter outside of the debugger). Once you enter a function in Julia it becomes a static program, with all the properties of a static language: you can't redefine types, you can define functions but the running code can't see them since they are not part of what was compiled, you can't import libraries, and all types were already checked before running. That's why Julia is fast, and the language was designed entirely around working this way (there are lots of things you can do in Python that you can't do in Julia precisely so this works).
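A minimal sketch of the "you can define functions but the running code can't see them" part (this is Julia's world-age mechanism):

    function demo()
        @eval g() = 42    # defines g, but in a newer "world" than the one demo() was compiled in
        return g()        # MethodError: g is too new to be called from here
    end

    demo()                # fails as described above
    g()                   # works afterwards, called from the REPL's newer world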
> And both Julia and Python are also strongly typed (there isn't really a language that lacks a type system).
No they are not. At least not by my definition of the term "strongly typed". Of course you can use a different definition, but if, as you say, "there isn't really a language" that fails to meet your definition, you might want to reconsider its usefulness. The point of classifying languages becomes moot when every fan comes along and says "my favorite language also has that, if you tweak your understanding just a bit".
I used the more common definition (strong vs. weak being orthogonal to static vs. dynamic). Both Julia and Python are strongly and dynamically typed (and duck typed).
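In that sense "strong" just means values keep their types and nothing is coerced implicitly between unrelated types, even though all the checking happens at run time; a quick Julia illustration (Python behaves the same way, raising a TypeError):

    # "1" + 1             # MethodError: no method matching +(::String, ::Int64)
    parse(Int, "1") + 1   # you have to convert explicitly => 2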