Monday, September 21, 2009

Exclusively live code

Dead-for-now code splitting has the compiler divide the program into smaller fragments of code that can be downloaded individually. How should the compiler take advantage of this ability? How should it define the chunks it divides the program into?

Mainly what GWT does is carve out code that is exclusively live once a particular split point in the program has been activated. Imagine a program with three split points A, B, and C. Some of the code in the program is needed initially, some is needed when only A has been activated, some is needed when only B has been activated, some is needed once both A and B are activated, and so on. Formally, these sets are written L({}), L({A}), L({B}), and L({A,B}). The "L" stands for "live", the opposite of "dead". The entire program is equivalent to L({A,B,C}), because GWT strips out any code that is dead no matter which split points are activated.
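To make the notation concrete, here is a toy sketch in Java. It is not how the GWT compiler is implemented: the call graph, the unit names (`main`, `Util`, `onA`, `Chart`, and so on), and the class `LiveCode` are all hypothetical. It also assumes every split point's call site is reachable from `main`, so activating a split point simply adds its callback as an extra root, and L(S) becomes plain reachability.

```java
import java.util.*;

/** Hypothetical toy model of liveness under activated split points (not GWT's real compiler). */
public class LiveCode {
    // Made-up call graph: each code unit lists the units it calls. Leaf units are omitted.
    static final Map<String, List<String>> CALLS = Map.of(
        "main", List.of("Util"),
        "onA", List.of("Util", "Chart"),
        "onB", List.of("Table"),
        "onC", List.of("Table", "Export"));

    // Hypothetical callback that runs when each split point is activated.
    static final Map<String, String> SPLIT_POINTS = Map.of("A", "onA", "B", "onB", "C", "onC");

    /** L(S): units reachable from main or from the callbacks of the split points in S. */
    static Set<String> live(Set<String> activated) {
        Deque<String> work = new ArrayDeque<>(List.of("main"));
        for (String sp : activated) {
            work.push(SPLIT_POINTS.get(sp)); // assumption: activation adds the callback as a root
        }
        Set<String> seen = new TreeSet<>(); // sorted, for readable output
        while (!work.isEmpty()) {
            String unit = work.pop();
            if (seen.add(unit)) {
                work.addAll(CALLS.getOrDefault(unit, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("L({})      = " + live(Set.of()));               // [Util, main]
        System.out.println("L({A})     = " + live(Set.of("A")));            // [Chart, Util, main, onA]
        System.out.println("L({A,B,C}) = " + live(Set.of("A", "B", "C")));  // [Chart, Export, Table, Util, main, onA, onB, onC]
    }
}
```

In this model L({A,B,C}) is the whole program, matching the observation above that anything outside it would have been stripped as dead.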

The code exclusively live to A would be L({A,B,C})-L({B,C}). This is the code that is only needed once A has been activated. Such code is not live if B has been activated. It's not live when C has been activated. It's not live when both B and C together are activated. Because such code is not needed until A is activated, it's perfectly safe to delay loading this code until A is reached. That's just how the GWT code splitter works: it finds the code exclusive to each split point and only loads that code once that split point is activated.
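The set difference above can be sketched on a toy program. As before, the call graph, unit names, and class name are hypothetical, and liveness is modeled as reachability from `main` plus the callbacks of activated split points; this illustrates the formula, not GWT's actual implementation.

```java
import java.util.*;

/** Hypothetical toy program with split points A, B, C (not GWT's real compiler). */
public class ExclusiveLive {
    // Made-up call graph; leaf units (Util, Chart, Table, Export) call nothing.
    static final Map<String, List<String>> CALLS = Map.of(
        "main", List.of("Util"),
        "onA", List.of("Util", "Chart"),
        "onB", List.of("Table"),
        "onC", List.of("Table", "Export"));
    static final Map<String, String> SPLIT_POINTS = Map.of("A", "onA", "B", "onB", "C", "onC");

    /** L(S): units reachable from main or from the callbacks of the activated split points. */
    static Set<String> live(Set<String> activated) {
        Deque<String> work = new ArrayDeque<>(List.of("main"));
        for (String sp : activated) work.push(SPLIT_POINTS.get(sp));
        Set<String> seen = new TreeSet<>();
        while (!work.isEmpty()) {
            String unit = work.pop();
            if (seen.add(unit)) work.addAll(CALLS.getOrDefault(unit, List.of()));
        }
        return seen;
    }

    /** Code exclusive to x: L(all split points) minus L(all split points except x). */
    static Set<String> exclusive(String x) {
        Set<String> all = SPLIT_POINTS.keySet();
        Set<String> others = new HashSet<>(all);
        others.remove(x);
        Set<String> result = live(all);
        result.removeAll(live(others));
        return result;
    }

    public static void main(String[] args) {
        for (String sp : new TreeSet<>(SPLIT_POINTS.keySet())) {
            System.out.println("exclusive to " + sp + ": " + exclusive(sp));
        }
        // prints:
        // exclusive to A: [Chart, onA]
        // exclusive to B: [onB]
        // exclusive to C: [Export, onC]
    }
}
```

`Util` is not exclusive to A even though `onA` calls it, because it is already live initially, and `Table` is exclusive to neither B nor C because both use it.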

That's not the full story, though. Some code isn't exclusive to any split point. Such code is all put into a leftovers fragment that must be loaded before any of the exclusive fragments. Dealing with the leftovers fragment is a significant issue for achieving good performance with a code-split program.
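In the same kind of toy model (hypothetical graph and names, not GWT's internals), the leftovers are whatever is live somewhere but is neither in the initial download L({}) nor exclusive to a single split point. Here `Table` is used by both B and C, so it is exclusive to neither and lands in the leftovers.

```java
import java.util.*;

/** Hypothetical toy program with split points A, B, C; B and C share the unit Table. */
public class Leftovers {
    static final Map<String, List<String>> CALLS = Map.of(
        "main", List.of("Util"),
        "onA", List.of("Util", "Chart"),
        "onB", List.of("Table"),
        "onC", List.of("Table", "Export"));
    static final Map<String, String> SPLIT_POINTS = Map.of("A", "onA", "B", "onB", "C", "onC");

    /** L(S): units reachable from main or from the callbacks of the activated split points. */
    static Set<String> live(Set<String> activated) {
        Deque<String> work = new ArrayDeque<>(List.of("main"));
        for (String sp : activated) work.push(SPLIT_POINTS.get(sp));
        Set<String> seen = new TreeSet<>();
        while (!work.isEmpty()) {
            String unit = work.pop();
            if (seen.add(unit)) work.addAll(CALLS.getOrDefault(unit, List.of()));
        }
        return seen;
    }

    /** Code exclusive to x: L(all) minus L(all without x). */
    static Set<String> exclusive(String x) {
        Set<String> others = new HashSet<>(SPLIT_POINTS.keySet());
        others.remove(x);
        Set<String> result = live(SPLIT_POINTS.keySet());
        result.removeAll(live(others));
        return result;
    }

    /** Leftovers: live somewhere, but neither initial nor exclusive to any one split point. */
    static Set<String> leftovers() {
        Set<String> result = live(SPLIT_POINTS.keySet());
        result.removeAll(live(Set.of()));        // drop the initial download, L({})
        for (String sp : SPLIT_POINTS.keySet()) {
            result.removeAll(exclusive(sp));      // drop each exclusive fragment
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("leftovers: " + leftovers()); // prints: leftovers: [Table]
    }
}
```

In this model, reaching A would still require loading the leftovers, including `Table`, even though A never uses it.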

That's the gist of the approach: exclusively live fragments as the core, plus a leftovers fragment to hold odds and ends. It's natural to ask why, and indeed I'm far from positive it's ideal. It would be more intuitive, at least to me, to focus on positively live code such as L({A}) and L({B,C}). The challenge so far has been to come up with such a scheme that generalizes to lots of split points. It's easy to dream up strategies that cause horrible compile times, bad web-browser caching behavior, or both, problems that the current scheme doesn't have. The splitting strategy in GWT is viable, but there might well be better ones.

3 comments:

Sami Jaber said...

Lex,

The fact that the leftovers fragment is loaded before any exclusive ones is a bit annoying.
Let's say I have 3 split points, A, B, C.
The initial fragment is loaded, and when I click on a button, A is loaded. There is a shared type that is used in B and C but not in A. Because it is shared, it is placed into the leftovers.

But even if I never call B or C, the leftovers will still be loaded. Correct me if I'm wrong (I have tested it).

Sami

Lex Spoon said...

That's right, Sami. There's just one leftovers produced.

As a practical matter, one option you have in such a case is to put a split point around the shared type. Then you'll get a fragment corresponding to that type, albeit at the expense of an extra round trip.

As another option, combine B and C so that they are behind a common split point.

As for the system in general, it makes sense in principle to have more than one leftovers fragment. It's simply devilish to come up with a precise splitting algorithm that does so.
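The second suggestion can be checked in a toy model. The sketch below is hypothetical throughout: the call graph, the names, and the `onBC` callback, which stands for a single split point whose callback runs both the B and C code. With separate split points, the shared `Table` ends up in the leftovers; behind one merged split point, it becomes exclusive to that split point and the leftovers are empty.

```java
import java.util.*;

/** Hypothetical toy model comparing separate split points B and C with one merged split point. */
public class MergeSplitPoints {
    // Made-up call graph; onBC is a hypothetical callback that runs both onB and onC.
    static final Map<String, List<String>> CALLS = Map.of(
        "main", List.of("Util"),
        "onA", List.of("Util", "Chart"),
        "onB", List.of("Table"),
        "onC", List.of("Table", "Export"),
        "onBC", List.of("onB", "onC"));

    /** L(S) for a given mapping from split point to callback. */
    static Set<String> live(Map<String, String> splitPoints, Set<String> activated) {
        Deque<String> work = new ArrayDeque<>(List.of("main"));
        for (String sp : activated) work.push(splitPoints.get(sp));
        Set<String> seen = new TreeSet<>();
        while (!work.isEmpty()) {
            String unit = work.pop();
            if (seen.add(unit)) work.addAll(CALLS.getOrDefault(unit, List.of()));
        }
        return seen;
    }

    /** Leftovers: L(all) minus L({}) minus every split point's exclusive code. */
    static Set<String> leftovers(Map<String, String> splitPoints) {
        Set<String> all = splitPoints.keySet();
        Set<String> result = live(splitPoints, all);
        result.removeAll(live(splitPoints, Set.of()));
        for (String sp : all) {
            Set<String> others = new HashSet<>(all);
            others.remove(sp);
            Set<String> exclusive = live(splitPoints, all);
            exclusive.removeAll(live(splitPoints, others));
            result.removeAll(exclusive);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> separate = Map.of("A", "onA", "B", "onB", "C", "onC");
        Map<String, String> merged = Map.of("A", "onA", "BC", "onBC");
        System.out.println("separate B and C: leftovers = " + leftovers(separate)); // [Table]
        System.out.println("merged BC:        leftovers = " + leftovers(merged));   // []
    }
}
```

The cost is coarser granularity: in this model, activating the merged split point also downloads `Export`, which only C needs.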

John Zabroski said...

Lex,

I would think that if you load code by user role, then two users might load two different sets of code. Thus, we would have significant server-side caching memory pressure, but each individual client could cache the modules specific to it. In the current version of our architecture, the way we've done things is piss-poor... we use virtual machines to maneuver around the fact that we can't scale beyond 300 processes per app instance. I see this as a problem the client and server collaborate to solve.

This is why I said Ben Livshits has an essential question to answer: if clustering analysis shows that split points correspond to program features (and features we know are user-specific), then why don't we just code that way from the start?

Finally, "leftovers" is very application-specific. What if the leftovers are a service-layer library that is used across many applications? If you start dividing it differently for each program, then you lose client-side caching that is consistent across all apps; in short, you lose your CDN.