Friday, February 6, 2009

Why not inject dependencies "manually"?

I just read the first half of the Guice User's Guide.  I stopped there because the motivation already left me hanging, and the very first feature I saw seems at odds with dependency injection.

I agree with the general sentiment of the user's guide: don't use new so much, and don't even use static factories.  Instead, "inject" a class's dependencies via constructor parameters.  That way, the class is abstracted over both which service implementation it uses and where that service comes from.  This style of programming is already emphasized in at least the Joe-E and Scala communities.  I like it.  OO designers like it.  PL developers like it.

However, I don't understand the difficulty in doing this "manually".  The guide gives this lovely example of substituting a mock service in a test case:

public void testClient() {
  MockService mock = new MockService();
  Client client = new Client(mock);
  client.go();
  assertTrue(mock.isGone());
}

So far so good.  Here we see the payoff of moving the new ServiceImpl() out of Client: whoever constructs a Client can instantiate the service in an unusual way.  Where I get lost is in the instantiation of the normal production version of the client.  The manual gives this code sample:

public static class ClientFactory {
  private ClientFactory() {}
  public static Client getInstance() {
    Service service = ServiceFactory.getInstance();
    return new Client(service);
  }
}


Where did these two factories come from?  I find them odd, because one of the beauties to me of dependency injection is that it composes well.  In most cases, the production version would know exactly which concrete client and service to instantiate.  In the remaining cases, the code assembling them should itself have parameters influencing how to construct things.  Percolating this idea to the top level of the application, the parameters to the top-level application factory would be precisely those things configurable via configuration files and command-line arguments.
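A minimal sketch of that kind of top-level assembly might look like the following. Everything in it is hypothetical and for illustration only: the Service interface with a single fetch method, the RemoteService and InMemoryService implementations, the "service.url" configuration key, and a go() that returns a String are all made up here and do not come from the Guice guide.

```java
import java.util.Map;

// Hypothetical service interface; the guide does not show Service's methods.
interface Service {
    String fetch(String key);
}

// Hypothetical production implementation, parameterized by a URL.
final class RemoteService implements Service {
    private final String url;
    RemoteService(String url) { this.url = url; }
    public String fetch(String key) { return url + "/" + key; }
}

// Hypothetical fallback implementation used when no URL is configured.
final class InMemoryService implements Service {
    public String fetch(String key) { return "memory:" + key; }
}

// The client only sees the interface; its dependency arrives via the constructor.
final class Client {
    private final Service service;
    Client(Service service) { this.service = service; }
    String go() { return service.fetch("greeting"); }
}

final class App {
    // The one place that names concrete classes. Its parameter is exactly
    // what a configuration file or the command line would supply.
    static Client assemble(Map<String, String> config) {
        String url = config.get("service.url");
        Service service = (url == null) ? new InMemoryService() : new RemoteService(url);
        return new Client(service);
    }

    public static void main(String[] args) {
        Map<String, String> config =
            args.length > 0 ? Map.of("service.url", args[0]) : Map.<String, String>of();
        System.out.println(assemble(config).go());
    }
}
```

No factory classes are involved: assemble is an ordinary method, and a test can bypass it entirely by calling new Client(mock) as in the guide's test case.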

So far my experience matches this intuition.  The Scala compiler is implemented in a dependency-injection style.  Almost all modules find out about their dependencies by having a reference to the dependency passed in at construction time.  The Scala group has not taken advantage of this with testing mocks, even though that would seem straightforward.  However, I can attest that it was straightforward to reassemble the components to make an X10 variant of the compiler.

Overall, I believe that dependency injection is a good style.  However, I have not yet seen an example where it leads to extraneous code.  At least, the example in the guide isn't very good.  Further, the one tool feature I managed to read about before the motivation section turned me away was the ability to designate a "default" implementation of an interface.  Isn't this a temptation, though, not to use a dependency-injection style?  I would think that if you are injecting dependencies, then any code constructing an implementation would already know enough to choose which one to construct.

Probably there is a breakdown somewhere in this argument.  Tools get popular for a reason.  At the least, though, the counter-argument is not documented in any easily findable place.

Monday, February 2, 2009

Private dependencies in Java

In any system of components, the more contexts you want a component to work in, the fewer dependencies you want it to have. Using a Java jar as a component format works pretty well, but it does have a weird non-obvious dependency. Java jars heavily use the global, classloader-wide name space of package and top-level class names. Thus, whenever a jar includes a class name at the top level, that jar can only be used in tandem with other jars that do not have a same-named class at the top scope. This is a dependency, but it looks a little backwards at first. Most dependencies require that the context provide something. This one requires that the context not provide something. This weirdness has repercussions that are analogous to the problems of programming with lots of global variables.

The most common cases of this issue are addressed by using DNS-based package names. This approach doesn't work, though, for diamond-shaped dependencies. Imagine that component Lib is used by both A and B, and that both A and B are used by an application App. App then depends on Lib through two different paths: App->A->Lib and App->B->Lib. If A and B were built against different versions of Lib, both versions claim the same class names, and a single class loader can only hold one of them. There are several ways to cope with this multiple dependency, but the problem is not solvable in general. A good component distribution system can help you manage the difficulties, but it won't solve them outright.

Within this set of problematic cases, there is a large subset where it would actually work fine for A and B to get their own separate copies of Lib. Having two copies around has its downsides, such as the larger memory footprint and program size of the resulting bundle. However, the robustness of the overall application is frequently higher because both A and B are using the version of Lib that they were developed against. For those cases where the trade-off looks good, one question is, how can it be implemented in Java?

I know of two general approaches, and would love to know what else is out there.

One approach is to rename A's preferred version of Lib to use different global names than it originally did. All packages in Lib are renamed to somewhere else, typically underneath A's chunk of the global DNS-based name space. Simultaneously, all references in A to something in Lib are updated to point to the new location. This approach is supported by a tool called Jarjar. To make a long story short, we've tried this approach with some of GWT's third-party dependencies, and we have roughly a 50% success rate so far. For some libraries, such a renaming Just Works.

Other libraries have trouble, because somewhere there is some class reference that Jarjar doesn't know how to rename. For example, Jetty includes XML configuration files that have class names embedded in them, and Jarjar doesn't rename those by default. Arguably it is questionable style to program this way, but for one reason or another some libraries do.
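To illustrate why such embedded names are a problem, here is a small sketch of a library that looks up a class by a name stored in a text resource. The ReflectiveConfig class, the "handler.class" key, and the org.example.lib.Handler name are all hypothetical; this is not Jetty's actual configuration mechanism. The point is only that the name lives in plain text rather than in the compiled code, so a tool that rewrites class files has no reason to touch it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ReflectiveConfig {
    // Reads the hypothetical "handler.class" key from properties text and
    // reports whether the class it names can be found by the current loader.
    static boolean handlerResolves(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String className = props.getProperty("handler.class");
        if (className == null) {
            return false;
        }
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // This is what happens when the class was moved to a new package
            // but the text resource still carries the old name.
            return false;
        }
    }
}
```

If the classes of a library are moved under a new package prefix while its configuration text still says org.example.lib.Handler, the lookup fails at run time even though everything compiled and packaged cleanly.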

This leads to a second approach: load the library in a separate class loader from its client. That is, instead of A accessing Lib directly, it creates a class loader and loads Lib in that separate class loader. This approach has the beauty of reliably putting the implementation of Lib in a separate name space from the rest of the application. However, for it to work, there must be an interface to the library still included in the main application. This leads to two significant limitations of the approach. First, the library must have a small enough interface that it is practical to maintain all of these interfaces. Second, the interfaces still have the dependency hell problem, because they will be in the main package and can conflict with each other.
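A rough sketch of the class loader approach, using only the standard java.net.URLClassLoader, could look like this. The IsolatedLib helper and its load method are hypothetical names. The sketch assumes the implementation class has a public no-argument constructor and that the jar URLs passed in point at a private copy of Lib.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

final class IsolatedLib {
    // Loads implName from the given jars in its own class loader and returns
    // an instance typed as the shared interface.
    static <T> T load(URL[] libJars, String implName, Class<T> api) {
        // The parent is the loader that defines the interface, so the client
        // and the implementation agree on what "api" means.
        // The loader is deliberately left open: closing it would prevent the
        // implementation from loading further classes later.
        URLClassLoader loader = new URLClassLoader(libJars, api.getClassLoader());
        try {
            Class<? extends T> impl = Class.forName(implName, true, loader).asSubclass(api);
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + implName, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in demo with no external jar: List plays the role of the
        // shared interface and ArrayList the role of the implementation.
        List<?> list = load(new URL[0], "java.util.ArrayList", List.class);
        System.out.println(list.isEmpty());
    }
}
```

In real use the URL array would name the jar holding the private copy of Lib. Because URLClassLoader asks its parent first, the isolation only holds when Lib's implementation classes are not also visible to the parent loader. The interface, by contrast, has to be visible there, which is exactly the limitation described above.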

It's a nasty problem in general, one that it would be great for a language to address more thoroughly. I know that the OSGi system used by Eclipse has some support for this approach, with its heavy use of interfaces and subsidiary class loaders. I don't know the details of how it works, though, because it comes in the context of a bunch of other component-related features that I have never had time to study.