Monday, September 26, 2011

Teach your build tool jars, not classes

As far as I can tell, the single compelling feature of ant as a build system is that it makes it easy to compile Java. I just encountered a wiki page where SCons developers are discussing some of the problems they run into when building Java.

One of the problems is this:
Currently the DAG built by SCons is supposed to have knowledge about every single generated file so that the engine can work out which files need transformation. In a world where there is 1 -> 1 relationship between source file and generated file (as with C, C++, Fortran, etc.) or where there is 1 -> n relationship where the various n files can be named without actually undertaking the compilation, things are fine. For Java, and even more extremely for languages like Groovy, it is nigh on impossible to discover the names of the n files without running the compiler -- either in reality or in emulation.

It's actually worse. It's not really a 1->n compile. The 1 file on the left can only be compiled by consulting other input files, and if any of those files change, you also need to recompile. Determining the exact dependency graph is a rather complicated problem.

I believe such a graph is unavoidable and indispensable if you want to have a decent build tool. "Decent" is subjective, but surely anyone would say that rebuilds should be reliable. You don't generally get that with ant. If you are building with ant, you have probably gotten very familiar with "ant clean".

To address Java's build problems without having to use ant, what I do is set up my build files in terms of jar files rather than directories of class files. If you do that, then even though the Java or Scala compiler produces loads of class files, the build tool doesn't actually see them. They are created in a temporary directory, combined into a jar, and then the temporary directory is deleted. It's true that you don't get the optimal rebuild this way if you change just one file, but I'd usually use an IDE if I am repeatedly editing the files of a single Java or Scala module. If I change just one file at random, I'd rather have a safe rebuild than the absolute fastest one.
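As a rough illustration of the jar-at-a-time idea, here is a minimal Python sketch. It is not taken from any particular build tool. The names needs_rebuild, run_javac, build_jar and compile_fn are hypothetical, and it assumes javac is on the PATH and that file modification times are a good enough staleness signal. The only things the rebuild check ever looks at are source files and jars, so the class files stay invisible to it.

```python
import os
import shutil
import subprocess
import tempfile
import zipfile


def needs_rebuild(out_jar, sources, dep_jars):
    """True if out_jar is missing or older than any source file or dependency jar."""
    if not os.path.exists(out_jar):
        return True
    jar_time = os.path.getmtime(out_jar)
    inputs = list(sources) + list(dep_jars)
    return any(os.path.getmtime(p) > jar_time for p in inputs)


def run_javac(sources, dep_jars, class_dir):
    """Compile sources into class_dir (assumption: javac is on the PATH)."""
    cmd = ["javac", "-d", class_dir]
    if dep_jars:
        cmd += ["-cp", os.pathsep.join(dep_jars)]
    subprocess.run(cmd + list(sources), check=True)


def build_jar(out_jar, sources, dep_jars=(), compile_fn=run_javac):
    """Rebuild out_jar as a unit if it is stale; return True if a rebuild ran."""
    if not needs_rebuild(out_jar, sources, dep_jars):
        return False
    # The class files only ever live in this temporary directory.
    class_dir = tempfile.mkdtemp(prefix="classes-")
    try:
        compile_fn(list(sources), list(dep_jars), class_dir)
        # A jar is a zip archive. Write under a temporary name first so the
        # old jar is only replaced once the new one is complete.
        tmp_jar = out_jar + ".tmp"
        with zipfile.ZipFile(tmp_jar, "w", zipfile.ZIP_DEFLATED) as jar:
            for root, _dirs, files in os.walk(class_dir):
                for name in sorted(files):
                    path = os.path.join(root, name)
                    jar.write(path, os.path.relpath(path, class_dir))
        os.replace(tmp_jar, out_jar)
    finally:
        shutil.rmtree(class_dir, ignore_errors=True)
    return True
```

Something like build_jar("app.jar", ["src/Main.java"], ["lib/util.jar"]) would then recompile the whole module whenever Main.java or util.jar is newer than app.jar, and do nothing otherwise. The sketch writes no manifest, so the result is only meant for use on a classpath.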

In principle you can update the build tool to accurately model all the class files. Ant's depend task does so, and the Simple Build Tool uses a scalac plugin to track dependencies. While I don't have experience with the SBT version, I have found the ant version to significantly slow down compiles and yet still not be completely reliable. I prefer to stick with jar files and have the build tool be reliable. Besides, you shouldn't have to use a particular build tool just because your project includes some code in some particular language. That approach breaks down as soon as you add a second language to the project.

1 comment:

Unknown said...

This is pretty much how Maven works. You create projects. Projects produce JAR artifacts and depend on other JAR artifacts (some local, some from the entire Java world). You change a file in a project, and you need to rebuild that project.

If the rebuilds are a problem, just refactor the project into smaller sub projects.

Being a declarative set of dependencies (as opposed to an imperative sequence of instructions to create them) has further benefits: Maven project information lets IDEs import project dependency structures, it supports dependency analysis tools, and it makes search engines like http://mvnrepository.net possible.