> The syntax looks much the same, but Amazon's awscli Python installer has loads of dependencies. I'll have to see whether it's worth switching.
Why are the dependencies a problem? By combining a handful of smaller, focused modules that each do something well, you can end up with something better than if you were to re-invent the wheel for every need.
AWS and the Python dev team are doing a heck of a job on botocore, and have cranked up the pace of improvement in the last 6 months. This CLI reaching "official" status guarantees (at least until further notice) that it will keep getting updates and fixes. It's also likely to get support for new AWS services early, or at least earlier.
`pip install awscli` just installed 26 other modules besides awscli. Now I feel a little obliged to go check out those 26, as well, to see what they are.
I agree about not re-inventing the wheel. But the amount of stuff installed is definitely a factor to consider when choosing between two seemingly identical scripts.
> `pip install awscli` just installed 26 other modules besides awscli. Now I feel a little obliged to go check out those 26, as well, to see what they are.
So? Use a virtualenv and stop worrying. These 26 dependencies will be updated and maintained separately; who knows what warts are sitting in the monolithic Perl scripts.
With boto, I battled for years trying to avoid any dependencies. But that has a lot of negative side effects, too. One of the great things about Python is the amazing variety and quality of libraries available. We decided to embrace that with AWS CLI. We have 10 direct dependencies. Four of those are our own packages that we decided to split to allow maximum reuse. Then there are fundamental things like requests, six, docutils. The rest are things that, we think, improve the experience. Virtualenv is an awesome way to manage this. I highly recommend it.
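For anyone taking the virtualenv advice, here is a small sketch for checking from inside Python whether the running interpreter belongs to one. It relies on the documented behaviour of the stdlib `venv` module (`sys.prefix` points at the environment while `sys.base_prefix` keeps pointing at the base install); the `sys.real_prefix` check covers older virtualenv releases. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter lives in a venv/virtualenv."""
    # Legacy virtualenv set sys.real_prefix; venv and modern virtualenv
    # make sys.prefix differ from sys.base_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("isolated env:", in_virtualenv(), "| prefix:", sys.prefix)
```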
I feel like it's a valid concern for someone to want to know what's on their system. Sure virtualenv is a great solution, and I'm sure the person you're responding to knows about it. But there is a place for skepticism in required dependencies, and perhaps the OP parses Perl better than Python.
In a pre-pip/RubyGems auto-installer world, this would have been one package with the 26 dependencies embedded as modules inside it. You wouldn't have known they were there unless you looked at the source code. You also couldn't update them independently; you'd get updates only when a new release of the original package shipped with updated embedded copies.
Would this make you feel more like you knew what was installed on your system? Would you have felt the need to look at the source code, including the source code for any embedded dependencies, in, say, a 'vendor' or 'lib/dist' directory or something?
What are the pluses and minuses of each approach? There are some either way. After considering them, do you still have a problem with the 'new way' of doing things, where a program is installed via pip (or RubyGems in Ruby) along with explicitly declared dependencies that live in a separate place on the file system, rather than with embedded/bundled dependencies?
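To make the "embedded in a 'vendor' directory" alternative concrete: at import time it mostly comes down to putting the bundled copies ahead of site-packages on `sys.path`. pip itself ships its own dependencies under `pip/_vendor`, though it imports them through the `pip._vendor` package rather than via a path hack. The sketch below is a simplified illustration; `use_vendored` and the directory layout are hypothetical.

```python
import sys
from pathlib import Path


def use_vendored(vendor_dir: Path) -> None:
    """Make modules bundled in `vendor_dir` win over site-packages."""
    path = str(vendor_dir.resolve())
    if path not in sys.path:
        # Position 0 means a bundled module shadows any system-wide copy
        # with the same name, which is the whole point (and the risk).
        sys.path.insert(0, path)
```

The trade-off discussed above shows up directly: the bundled copy is pinned and private to the app, but it only changes when the app author re-vendors it.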
The issue is that they can't release them all compiled into one thing. Instead the dependencies have to pollute the rest of the operating system. This is one of those things that Java got right and virtually everything else except npm (local node_modules) got wrong out of the gate.
I'd love to have a simple way to package python apps that depend on other python and native libraries without having to install things separately.
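For pure-Python code, the stdlib `zipapp` module gets fairly close to the single-file jar idea: it zips a directory into one runnable `.pyz` archive. A minimal sketch, assuming dependencies were first copied into the app directory with `pip install --target app_dir <package>`; `build_pyz` is a hypothetical helper. Compiled extension modules cannot be imported from inside a zip by the standard import machinery, so this does not solve the native-library part of the wish.

```python
import zipapp
from pathlib import Path


def build_pyz(app_dir: Path, target: Path, entry_point: str) -> Path:
    """Bundle every file under app_dir into one runnable .pyz archive.

    entry_point uses zipapp's "module:function" syntax, e.g. "cli:main".
    """
    zipapp.create_archive(
        app_dir,
        target=target,
        interpreter="/usr/bin/env python3",  # shebang so Unix can exec it
        main=entry_point,                    # generates __main__.py for us
    )
    return target
```

The resulting file runs with `python app.pyz` (or directly, once marked executable), with no separate install step.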
> The issue is that they can't release them all compiled into one thing. Instead the dependencies have to pollute the rest of the operating system.
Bullshit. Python supports having modules installed into local locations (see virtualenv).
Just `virtualenv ~/.local/lib/aws; ~/.local/lib/aws/bin/pip install awscli; ln -s ~/.local/lib/aws/bin/aws ~/.local/bin/aws` and put `~/.local/bin` into your PATH.
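The same recipe can be scripted with the stdlib `venv` module instead of the external `virtualenv` tool. This is a sketch under a few assumptions: a POSIX-style layout where scripts land in `bin/` (Windows uses `Scripts/`), a platform where symlinks work, and network access for the pip step. `make_tool_env` and `link_into_path` are hypothetical names.

```python
import os
import venv
from pathlib import Path


def make_tool_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a dedicated environment for one tool; return its scripts dir."""
    # Symlink the interpreter on POSIX and copy it on Windows, matching
    # what `python -m venv` does by default.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_dir / ("Scripts" if os.name == "nt" else "bin")


def link_into_path(script: Path, bin_dir: Path) -> Path:
    """Symlink `script` into `bin_dir` (a directory already on PATH)."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    link = bin_dir / script.name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier install
    link.symlink_to(script)
    return link


# Usage mirroring the shell one-liner (the pip step needs network access):
#   env_bin = make_tool_env(Path.home() / ".local/lib/aws")
#   subprocess.run([str(env_bin / "pip"), "install", "awscli"], check=True)
#   link_into_path(env_bin / "aws", Path.home() / ".local/bin")
```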
If you don't want the pip dependencies to "pollute the rest of the system," you can just use `pip install` with the `--root` option.
Java's CLASSPATH causes enormous pain for end-users. Just read the Hadoop mailing list. The fact that Java doesn't have a sane default for where to put anything or how to manage dependencies is a huge flaw.
Back when I was doing AWS, I just used the C binaries (I forget what they were called) to transfer things to or from S3. I just wanted to avoid installing hundreds of megs of dependencies. We paid money to transfer our AMIs around, after all! Still, a more full-featured tool will no doubt come in handy in some scenarios.
If you are using CLASSPATH for Java you are doing it wrong. Full dependency jars with every class needed is really the only way to do this reliably. That requires only a single file to be passed around. You can even include native libraries in it. Hadoop is a nightmare, I agree. They are doing it wrong.
Hadoop is a framework, not a library, so user applications need to link against the jars they need. That implies using CLASSPATH to locate them. Whether or not there is 1 jar or 100, the fact that there's no standard place to install jars in Java is a problem.
Hadoop jars are hundreds of megabytes, and we have multiple daemons. Duplicating all those jars in each daemon would multiply the size of the installation many times over. That's also a nontrivial amount of memory to be giving up because jars can no longer be shared in the page cache.
Some of these problems could be mitigated by making Hadoop a library rather than a framework (as Google's MR is), or by pruning unnecessary dependencies.
Most of these issues could be addressed by actually modularizing the core of Hadoop, some of which has been done in the latest code. Also, many things could be provided at runtime by the system, with only the required interfaces in the jars that customers depend on, making their jars backwards compatible and more robust. BTW, say you didn't want to put everything into one jar but also didn't want a CLASSPATH: you can list the other jars in the `Class-Path` attribute of the jar's META-INF/MANIFEST.MF, and they are picked up automatically as long as they sit in a well-defined place relative to the host jar. Redesign with the requirement that end users don't have to worry about CLASSPATH and you will find that there are solutions.
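To show what the manifest mechanism amounts to, here is a rough Python sketch that reads a jar's `META-INF/MANIFEST.MF` and resolves its `Class-Path` entries relative to the jar, roughly as the JVM does. It is a simplification under stated assumptions: it handles the manifest's line-continuation rule (a line starting with a single space continues the previous one) and case-insensitive attribute names, but treats entries as plain relative paths rather than decoding them as URLs. `manifest_class_path` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def manifest_class_path(jar: Path) -> list:
    """List the jars that a jar's manifest Class-Path attribute points at."""
    with zipfile.ZipFile(jar) as zf:
        text = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
    # Unfold continuation lines: manifests wrap long headers, and a line
    # starting with one space continues the previous header.
    logical = []
    for line in text.splitlines():
        if line.startswith(" ") and logical:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    for line in logical:
        if not line:
            break  # a blank line ends the main section
        name, _, value = line.partition(":")
        if name.strip().lower() == "class-path":
            # Space-separated entries, relative to the jar's own directory.
            return [jar.parent / entry for entry in value.split()]
    return []
```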
I do sympathize that something akin to the maven repository and dependency mechanism hasn't been integrated into the JDK. I was on the module JSR and continually pushed them to do something like that but it turns out IBM would rather have OSGI standardized and so it deadlocked. Maybe something will come in JDK 9.
Well, I work on Hadoop. I don't know what you mean by "modularizing the core." There was an abortive attempt a few years ago to split HDFS, MapReduce, and common off into separate source code repositories. At some point it became clear that this was not going to work (I wasn't contributing to the project at the time, so I don't have more perspective than that).
Right now, we have several Maven subprojects. Maven does seem to enforce dependency ordering: you cannot depend on HDFS code in common, for example. So it's "modular" in that sense. But you certainly never could run HDFS without the code in hadoop-common.
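The "dependency ordering" point is essentially a topological sort that rejects cycles. The sketch below illustrates the idea with Python's `graphlib` (3.9+); it is not Maven's implementation, and the module graph is a hypothetical, simplified stand-in for the real Hadoop subprojects.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError: for callers
from typing import Dict, List, Set


def build_order(modules: Dict[str, Set[str]]) -> List[str]:
    """Return modules ordered so each comes after everything it depends on.

    `modules` maps a module name to the set of modules it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(modules).static_order())


# Hypothetical, simplified graph: hdfs and mapreduce both need common,
# while common may not depend on either of them.
HADOOP_LIKE = {
    "hadoop-common": set(),
    "hadoop-hdfs": {"hadoop-common"},
    "hadoop-mapreduce": {"hadoop-common"},
}
```

Adding `"hadoop-hdfs"` to the dependencies of `"hadoop-common"` creates a cycle, and `build_order` then raises `CycleError` instead of producing an order.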
None of this really has much to do with CLASSPATH. Well, I guess it means that the common jars are shared between potentially many daemons. Dependencies get a lot more complicated than that, but that's just one example.
Really, the bottom line here is that there should be reasonable, sane conventions for where things are installed on the system. This is a lesson that old UNIX people knew well. There are even conventions for how to install multiple different versions of C/C++ shared libraries at the same time, and a tool for finding out what depends on what (ldd). Java's CLASSPATH mechanism itself is just a version of LD_LIBRARY_PATH, which also has a very well-justified bad reputation.
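As an illustration of the "what depends on what" side, here is a sketch that turns glibc-style `ldd` output into a map from soname to resolved path. The trailing number in a soname such as `libz.so.1` is the ABI major version, which is what lets several versions coexist. `parse_ldd` is a hypothetical helper, and the regex assumes the usual `name => path (0xaddress)` line format; lines without `=>` (the vDSO and the dynamic loader) are skipped.

```python
import re
from typing import Dict, Optional

# "libz.so.1 => /lib/.../libz.so.1 (0x00007f...)" or "libfoo.so.2 => not found"
_MAPPING = re.compile(r"^\s*(\S+)\s*=>\s*(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each soname to the file the loader picked, or None if missing."""
    deps: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        match = _MAPPING.match(line)
        if not match:
            continue  # no "=>" mapping on this line
        soname, target = match.groups()
        deps[soname] = None if target in ("", "not found") else target
    return deps
```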
I don't know of anyone who actually uses OSGI. I think it might be one of those technologies that just kind of passed some kind of complexity singularity and imploded on itself, like CORBA. But I have no direct experience with it, so maybe that is unfair.
I like what Golang is doing with build systems and dependency management. They still lack the equivalent of shared libraries, though. Hopefully, when they do implement that feature, they'll learn from the lessons of the past.