https://github.com/PredictionIO/PredictionIO/blob/develop/process/engines/itemsi...

mcphilip · on Oct 19, 2013

Is the inconveniently deep tree data structure housing an OO framework really so important that it should be discussed front and center on HN? Isn't the function of the framework more interesting? Honest question.

vidarh · on Oct 19, 2013

People who might want to understand the code and make changes in order to take advantage of it will care about form. It matters. I often pick tools based not only on function, but on how easy it will be for me to make changes or maintain the code base if necessary, and I doubt I'm the only one.

tsenkov · on Oct 19, 2013

It matters, but not as much as this: https://news.ycombinator.com/item?id=6574237

mbell · on Oct 18, 2013

I can't explain the first 80% of it but the last bit:

    io/prediction/evaluations/itemsim/topkitems/

Is the folder structure required by the package definition.

EDIT: mostly it seems to be the result of the project being constructed as dozens of separate modules, each with its own build process...yikes...

yeukhon · on Oct 18, 2013

Well, after all, Java and Scala are like father and son ....

sigh

lespea · on Oct 18, 2013

While I also find the incredibly deep nesting on the excessive side, I can explain it.

* process/engines/itemsim/evaluations/scala/topkitems

They're using sbt's awesome multi-project feature here (http://www.scala-sbt.org/release/docs/Getting-Started/Multi-...) so basically every "sub-project" that makes up the whole can have its own dependencies, options, versions, etc while also maintaining which projects depend on each other. This really helps keep all of the logic separated and sbt deals with all the compilation-order madness that ensues when you have a tangled nest of inter-dependencies.

Note: this isn't necessarily reflected in what is published, as most projects that do this will still publish a single jar file or project; it just helps with development and really helps with compilation speed (in my experience).

Again not really condoning what they're doing here as they're really taking it to an extreme; for my "big" project I basically just have a top-level "modules" folder and each sub project is one below that. You can see how the hierarchy is defined here: https://github.com/PredictionIO/PredictionIO/blob/develop/bu... which I find to be quite human-readable but I've been using sbt for years so ymmv. The customized settings for that particular project are here: https://github.com/PredictionIO/PredictionIO/blob/develop/pr...
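
A rough sketch of the flatter "modules" layout described above, written as a multi-project `build.sbt`. The project names, directories, and the dependency are hypothetical and are not taken from the PredictionIO build:

```scala
// build.sbt (sketch; all names and paths below are hypothetical)

// A sub-project with its own settings and dependencies.
lazy val commons = (project in file("modules/commons"))
  .settings(
    name := "commons",
    libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.5"
  )

// dependsOn puts commons on engine's classpath and makes sbt compile it first.
lazy val engine = (project in file("modules/engine"))
  .settings(name := "engine")
  .dependsOn(commons)

// The root project aggregates the others, so running `compile` or `test`
// at the top level runs the task in every sub-project.
lazy val root = (project in file("."))
  .aggregate(commons, engine)
```

Each sub-project directory then carries its own `src/main/scala` tree, which is where much of the nesting comes from.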

* src/main/scala

This is the basic structure of an sbt project. By default, you put all of your code/resources in the `src` directory. Below that you have two directories, main and test (optional), which is how you separate the code/resources that belong in the final project from those used only for testing. At the last level there are three (default) directories that are processed: java, scala, and resources. The first two should be pretty self-explanatory, and the last is where you put any files that need to be packaged with/available to your project. So if you have main/resources/aDir/logback.xml then you can reference that (via class resources, which is a Java thing) with "aDir/logback.xml" (I didn't include a leading slash because it's ~complicated).

Example layout:

    src
     - main
       - java
       - resources
       - scala
     - test
       - java
       - resources
       - scala
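
The "aDir/logback.xml" lookup mentioned above can be tried without a real project. This is a minimal sketch that fakes a resources root with a temporary directory; the `ResourceLookup` object and its method names are made up for illustration, and only standard JDK calls are used:

```scala
import java.net.{URL, URLClassLoader}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object ResourceLookup {
  // Create a throwaway directory standing in for src/main/resources,
  // containing aDir/logback.xml.
  def fakeResourcesRoot(): Path = {
    val root = Files.createTempDirectory("resources")
    val dir = Files.createDirectory(root.resolve("aDir"))
    Files.write(dir.resolve("logback.xml"),
      "<configuration/>".getBytes(StandardCharsets.UTF_8))
    root
  }

  // Class loader whose classpath is just that directory (no parent loader).
  def loaderFor(root: Path): ClassLoader =
    new URLClassLoader(Array[URL](root.toUri.toURL), null)

  def main(args: Array[String]): Unit = {
    val loader = loaderFor(fakeResourcesRoot())
    // ClassLoader.getResource takes a name relative to the classpath root.
    println(loader.getResource("aDir/logback.xml"))
    // With a leading slash the same lookup is expected to find nothing.
    println(loader.getResource("/aDir/logback.xml"))
  }
}
```

`Class.getResource` treats names differently: without a leading slash it resolves them relative to the class's own package, which is part of why the slash question gets complicated.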

* io/prediction/evaluations/itemsim/topkitems/TopKItems.scala

In Java it is mandatory that your package name be reflected in your directory structure. So here we can see that the TopKItems class is in the package "io.prediction.evaluations.itemsim.topkitems" if they followed that convention. As hinted at, Scala does not mandate this silly requirement, but it's considered best practice to follow along as it keeps things separated and easy to follow. Scala projects mostly use short package names, so the nesting usually isn't as deep as this.
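
The package-to-directory convention amounts to replacing dots with slashes. A small sketch, using a hypothetical `PackagePath` helper and assuming the default `src/main/scala` source root:

```scala
object PackagePath {
  // Relative path where a source file is expected by convention:
  // each dot in the package name becomes a directory separator.
  def sourcePath(pkg: String, fileName: String,
                 sourceRoot: String = "src/main/scala"): String =
    s"$sourceRoot/${pkg.replace('.', '/')}/$fileName"

  def main(args: Array[String]): Unit = {
    // Prints the conventional location of TopKItems.scala inside its module.
    println(sourcePath("io.prediction.evaluations.itemsim.topkitems", "TopKItems.scala"))
  }
}
```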

This might all make it seem that development would be a nightmare trying to manage everything, but all of this integrates beautifully with a good IDE such as IntelliJ (which is the recommended one for Scala -- Eclipse is just way too slow and freezes constantly, even on beefy machines). You just run a quick gen-idea command and the entire thing is recognized by IntelliJ, sub-projects and all. You never even see the crazy nesting of folders!

P.S. I mostly just lurk here so I'm sure I butchered the markdown. Sorry.

dmazin · on Oct 18, 2013

I'm not evaluating or judging, but this is quite a path:

  /develop/process/engines/itemrec/algorithms/hadoop/...

  cascading/popularrank/src/main/java/io/prediction/algorithms/...

  cascading/itemrec/popularrank/PopularRankAlgo.java

Most of these contain a single folder.