Monday, February 2, 2009

Private dependencies in Java

In any system of components, the more contexts you want a component to work in, the fewer dependencies you want it to have. Using a Java jar as a component format works pretty well, but it does have a weird, non-obvious dependency. Java jars rely heavily on the global, classloader-wide name space of package and top-level class names. Thus, whenever a jar includes a class name at the top level, that jar can only be used in tandem with other jars that do not have a same-named class at the top scope. This is a dependency, but it looks a little backwards at first. Most dependencies require that the context provide a service. This one requires that the context not provide something: a class with a conflicting name. This weirdness has repercussions that are analogous to the problems of programming with lots of global variables.
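To make the backwards dependency concrete, here is a toy model of a flat classpath in Java. It does not load any classes: each hypothetical jar is just a list of fully qualified class names, and jars are searched in classpath order so that the first definition of a name wins, roughly as the default application class loader behaves. The jar and class names are made up for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model of a flat classpath (no real class loading): each hypothetical jar is a list of
 * fully qualified class names, and jars are searched in classpath order.
 */
public class FlatClasspath {
    /** For each class name, the jar that supplies it; later definitions are shadowed. */
    public static Map<String, String> resolve(Map<String, List<String>> jarsInOrder) {
        Map<String, String> owner = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> jar : jarsInOrder.entrySet()) {
            for (String className : jar.getValue()) {
                owner.putIfAbsent(className, jar.getKey()); // first jar on the path wins
            }
        }
        return owner;
    }

    /** Class names defined by more than one jar: the "context must not provide" rule is broken. */
    public static Set<String> clashes(Map<String, List<String>> jarsInOrder) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new TreeSet<>();
        for (List<String> names : jarsInOrder.values()) {
            for (String name : names) {
                if (!seen.add(name)) {
                    duplicated.add(name);
                }
            }
        }
        return duplicated;
    }

    public static void main(String[] args) {
        Map<String, List<String>> jars = new LinkedHashMap<>();
        jars.put("a.jar", List.of("com.example.util.Strings", "com.example.a.Main"));
        jars.put("b.jar", List.of("com.example.util.Strings", "com.example.b.Main"));
        System.out.println(resolve(jars).get("com.example.util.Strings")); // a.jar
        System.out.println(clashes(jars)); // [com.example.util.Strings]
    }
}
```

Nothing fails when b.jar is added; its copy of `com.example.util.Strings` simply becomes unreachable, which is part of what makes this kind of dependency easy to miss.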

The most common cases of this issue are addressed by using DNS-based package names. This approach doesn't work, though, for diamond-shaped dependencies. Imagine that component Lib is used by both A and B, and that both A and B are used by an application App. App then depends on Lib through two different paths: App->A->Lib and App->B->Lib. Both paths refer to the same package names, so DNS-based naming cannot keep them apart. There are several ways to cope with this multiple dependency, but the problem is not solvable in general. A good component distribution system can help you manage the difficulties, but it won't eliminate them outright.
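A toy sketch of the diamond in Java, assuming (hypothetically) that A was built against Lib 1.0 and B against Lib 2.0. Since a flat classpath can hold only one class per name, any component demanded at more than one version is a conflict. The `Dep` record and the version strings are illustrative only, not part of any real tool.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a diamond dependency; component names and versions are hypothetical. */
public class Diamond {
    /** One edge of the graph: component `from` depends on component `to` at `version`. */
    public record Dep(String from, String to, String version) {}

    /** Collects every version of each component that is demanded somewhere in the graph. */
    public static Map<String, Set<String>> demanded(List<Dep> deps) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Dep d : deps) {
            out.computeIfAbsent(d.to(), k -> new TreeSet<>()).add(d.version());
        }
        return out;
    }

    /** A flat classpath holds one version per name, so 2+ demanded versions means a conflict. */
    public static Set<String> conflicts(List<Dep> deps) {
        Set<String> out = new TreeSet<>();
        demanded(deps).forEach((name, versions) -> {
            if (versions.size() > 1) {
                out.add(name);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        List<Dep> deps = List.of(
            new Dep("App", "A", "1.0"),
            new Dep("App", "B", "1.0"),
            new Dep("A", "Lib", "1.0"),   // App -> A -> Lib
            new Dep("B", "Lib", "2.0"));  // App -> B -> Lib
        System.out.println(conflicts(deps)); // [Lib]
    }
}
```

If A and B happened to want the same version of Lib, `conflicts` would come back empty; the trouble starts only when the two paths disagree.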

Within this set of problematic cases, there is a large subset where it would actually work fine for A and B to get their own separate copies of Lib. Having two copies around has its downsides, such as a larger memory footprint and a larger program size for the resulting bundle. However, the overall application is frequently more robust, because both A and B are using the version of Lib that they were developed against. For those cases where the trade-off looks good, the question is: how can it be implemented in Java?

I know of two general approaches, and would love to know what else is out there.

One approach is to rename A's preferred version of Lib to use different global names than it originally did. All packages in Lib are renamed to somewhere else, typically underneath A's chunk of the global DNS-based name space. Simultaneously, all references in A to something in Lib are updated to point to the new location. This approach is supported by a tool called Jarjar. To make a long story short, we've tried this approach with some of GWT's third-party dependencies, and we have roughly a 50% success rate so far. For some libraries, such a renaming Just Works.
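Jarjar is driven by rules; a typical rule line has the form `rule org.example.lib.** com.example.a.internal.lib.@1`, where `@1` stands for whatever `**` matched. The Java sketch below only simulates the effect of such a rule on fully qualified class names, whereas the real tool rewrites references inside compiled class files. `RenameRule` and the package names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Simulates the effect of a Jarjar-style rule "org.example.lib.** -> com.example.a.internal.lib.@1"
 * on fully qualified class names. Hypothetical helper: the real tool rewrites class files.
 */
public class RenameRule {
    private final String fromPrefix;
    private final String toPrefix;

    public RenameRule(String fromPackage, String toPackage) {
        this.fromPrefix = fromPackage + ".";   // trailing dot: "org.example.library" must not match
        this.toPrefix = toPackage + ".";
    }

    /** Renames a class in the source package or any subpackage; other names pass through. */
    public String apply(String className) {
        return className.startsWith(fromPrefix)
                ? toPrefix + className.substring(fromPrefix.length())
                : className;
    }

    public static void main(String[] args) {
        RenameRule rule = new RenameRule("org.example.lib", "com.example.a.internal.lib");
        // The rule is applied both to Lib's own classes and to every reference to them inside A.
        List<String> referencesInA = List.of("org.example.lib.io.Parser", "java.util.List");
        System.out.println(referencesInA.stream().map(rule::apply).collect(Collectors.toList()));
        // [com.example.a.internal.lib.io.Parser, java.util.List]
    }
}
```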

Other libraries run into trouble because somewhere there is a class reference that Jarjar doesn't know how to rename. For example, Jetty includes XML configuration files that have class names embedded in them, and Jarjar doesn't rename those by default. Arguably it is questionable style to program this way, but for one reason or another some libraries do.
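The failure mode with configuration files can be sketched with a similar toy model. Here the rename pass updates the names of the classes that actually exist, but a class name stored as plain text (as in an XML file read at run time) keeps its old value, so a reflective lookup in the style of `Class.forName(nameFromConfig)` no longer finds it. All names are hypothetical, and `canLoad` only mimics reflective loading.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy illustration with hypothetical names: a rename pass changes the names of the classes
 * that exist, but a class name stored as text in a configuration file keeps its old value.
 */
public class StaleConfigName {
    static final String FROM = "org.example.lib.";
    static final String TO = "com.example.a.internal.lib.";

    /** Simulated rename rule, applied to class names only (never to configuration text). */
    public static String rename(String className) {
        return className.startsWith(FROM) ? TO + className.substring(FROM.length()) : className;
    }

    /** Mimics reflective loading such as Class.forName(nameFromConfig). */
    public static boolean canLoad(Set<String> definedClasses, String nameFromConfig) {
        return definedClasses.contains(nameFromConfig);
    }

    /** The library's classes after the rename pass. */
    public static Set<String> renamedLibrary() {
        Set<String> defined = new HashSet<>();
        for (String name : List.of("org.example.lib.Server", "org.example.lib.Handler")) {
            defined.add(rename(name));
        }
        return defined;
    }

    public static void main(String[] args) {
        Set<String> defined = renamedLibrary();
        String fromXml = "org.example.lib.Handler"; // e.g. read from an XML file at run time
        System.out.println(canLoad(defined, fromXml));         // false: the text was not renamed
        System.out.println(canLoad(defined, rename(fromXml))); // true once the text is renamed too
    }
}
```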

This leads to a second approach: load the library in a separate class loader from its client. That is, instead of A accessing Lib directly, it creates a class loader and loads Lib in that separate class loader. This approach has the beauty of reliably putting the implementation of Lib in a separate name space from the rest of the application. However, for it to work, an interface to the library must still be included in the main application, because A can only talk to Lib's classes through types that both class loaders share. This leads to two significant limitations of the approach. First, the library must have a small enough interface that it is practical to maintain all of these interfaces. Second, the interfaces still have the dependency hell problem, because they live in the main application's name space and can conflict with each other.
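Here is a minimal, self-contained sketch of the class-loader approach. To stay runnable without a separate jar, it assumes the "library" is the nested class `LibImpl`, and a small child-first loader defines its own copy of that one class from the same class file bytes, while delegating everything else, including the shared `Greeter` interface, to the parent. In a real setup you would more likely point a `java.net.URLClassLoader` at Lib's jar; `IsolatedLib`, `LibImpl`, and `Greeter` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch: a private copy of a "library" class, reachable only through a shared interface. */
public class IsolatedLib {
    /** The interface that stays in the main application and is shared with every copy. */
    public interface Greeter {
        String greet(String name);
    }

    /** Stand-in for Lib's implementation; the static counter shows each copy has its own state. */
    public static class LibImpl implements Greeter {
        static int calls = 0;

        @Override
        public String greet(String name) {
            calls++;
            return "hello " + name + " #" + calls;
        }
    }

    /** Child-first for one class name: defines its own copy of that class, delegates the rest. */
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // Greeter, java.*, etc. come from the parent
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Creates a fresh loader holding a private copy of LibImpl, seen only through Greeter. */
    public static Greeter newIsolatedGreeter() {
        try {
            String implName = LibImpl.class.getName();
            ClassLoader loader = new IsolatingLoader(IsolatedLib.class.getClassLoader(), implName);
            Class<?> impl = loader.loadClass(implName);
            return (Greeter) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Greeter a = newIsolatedGreeter();
        Greeter b = newIsolatedGreeter();
        System.out.println(a.greet("A")); // hello A #1
        System.out.println(b.greet("B")); // hello B #1: separate copy, separate static state
        System.out.println(a instanceof LibImpl); // false: same name, different class
    }
}
```

Each call to `newIsolatedGreeter` yields a separate copy of `LibImpl` with its own static state, and the application can only use it through `Greeter`. Casting the result to the application's own `LibImpl` would throw a `ClassCastException`, which is why the interface has to live in the main application.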

It's a nasty problem in general, one that it would be great for a language to address more thoroughly. I know that the OSGi system used by Eclipse has some support for this approach, with its heavy use of interfaces and subsidiary class loaders. I don't know the details of how it works, though, because it comes in the context of a bunch of other component-related features that I have never had time to study.

1 comment:

Neil Bartlett said...

I think you should take a closer look at OSGi. Many of these problems are exactly what OSGi is designed to solve. The "component-related" features exist at a higher level in OSGi called the Service Layer, so it's possible to learn about and use the Module Layer on its own if that's your preference.

As you implied, OSGi creates a separate classloader for each module (or "bundle", to use the OSGi terminology) and strictly controls the visibility of classes between bundles using a system of exports and imports. The old Java flat classpath is not used.

So the "diamond" scenario you described is solved this way in OSGi. A and B both import Lib, and they may choose to import different versions of Lib (all imports and exports are versioned). Both versions reside in memory at once.
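The versioned imports described above can be illustrated with a toy resolver in Java. It is not the OSGi framework or its API: each importing bundle is simply wired, independently, to the highest exported version of Lib that falls in its half-open range [floor, ceiling), similar in spirit to an OSGi range such as `[1.0,2.0)`. That is enough to show A and B ending up on different copies. The bundle names, package name, and versions are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of versioned imports (hypothetical names; not the OSGi framework API).
 * All exported versions below are assumed to be versions of the same package.
 */
public class VersionedWiring {
    public record Version(int major, int minor) implements Comparable<Version> {
        @Override
        public int compareTo(Version o) {
            return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /** A bundle's import of a package, accepting versions in the half-open range [floor, ceiling). */
    public record Import(String bundle, String pkg, Version floor, Version ceiling) {
        boolean accepts(Version v) {
            return v.compareTo(floor) >= 0 && v.compareTo(ceiling) < 0;
        }
    }

    /** Wires each importing bundle, independently, to the highest exported version it accepts. */
    public static Map<String, Version> wire(List<Version> exported, List<Import> imports) {
        Map<String, Version> wiring = new TreeMap<>();
        for (Import imp : imports) {
            exported.stream()
                    .filter(imp::accepts)
                    .max(Comparator.naturalOrder())
                    .ifPresent(v -> wiring.put(imp.bundle(), v));
        }
        return wiring;
    }

    public static void main(String[] args) {
        List<Version> libExports = List.of(new Version(1, 4), new Version(2, 1)); // both installed
        List<Import> imports = List.of(
                new Import("A", "org.example.lib", new Version(1, 0), new Version(2, 0)),
                new Import("B", "org.example.lib", new Version(2, 0), new Version(3, 0)));
        System.out.println(wire(libExports, imports)); // {A=1.4, B=2.1}
    }
}
```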

This works very well as long as A's and B's use of Lib is "internal". If instances of classes from Lib need to be passed out of or between A and B, or up to App, then you will get a ClassCastException. But this is as it should be: if A and B need to communicate in terms of artefacts from Lib then they should have a common understanding of what those artefacts are.