I recently explored using JavaScript source maps with a language very different from JavaScript. Source maps let developers debug in a web browser while still looking at original source code, even if that source code is not JavaScript. A lot of programming languages support them nowadays, including Dart, Haxe, and CoffeeScript.
In my case, I found it helpful to use "source" code that is different from what the human programmers typed into a text editor and fed to the compiler. This post explains why, and it gives a few tricks I learned along the way.
Why virtual source?
It might seem obvious that the source map should point back to
original source code. That's what the Closure Tools team designed it
for, and for goodness' sake, it's called a source map. That's
the approach I started with, but I ran into some difficulties that
eventually led me to a different approach.
One difficulty is a technical one. When you place a breakpoint in
Chrome on a file mapped via a source map, it places one and only one
breakpoint in the emitted JavaScript code. That works fine for a
JavaScript-to-JavaScript compiler, but I was compiling from Datalog.
In Datalog, there are cases where the same line of source code
is used in multiple places in the
output code. For example, Datalog rules are run in two different
modes: once during the initial bootstrapping of a database instance, and
later during an Orwellian "truth maintenance" phase. With a conventional
source map, it is only possible to set a breakpoint on one of the instances,
and the developer doesn't even know which one they are getting.
That problem could be fixed by changes to WebKit, but there is a
larger problem: the behavior of the code is different in each
of its variants. For example, the truth maintenance code for a Datalog
rule has some variants that add facts and some that remove them. A
programmer trying to make sense of a single-stepping session needs to
know not just which rule they have stopped on, but which mode of
evaluation that rule is currently being used in. There's nothing in
the original source code that can indicate this difference; in the
source code, there's just one rule.
As a final cherry on top of the excrement pie, there is a
significant amount of code in a Datalog runtime that doesn't have any
source representation at all. For example, data input and data output
do not have an equivalent in source code, but they are reasonable
places to want to place a breakpoint. For a source map pointing to
original source code, I don't see a good way to handle such loose
code.
A virtual source file solves all of the above problems. The way it
works is as follows. The compiler emits a virtual source file in
addition to the generated JavaScript code. The virtual source file is
higher-level than the emitted JavaScript code, enough to be human
readable. However, it is still low-level enough to be helpful for
single-step debugging.
The source map links the two forms of output together. It maps each
character of emitted JavaScript code to a line
in the virtual source file. Under normal execution, web browsers use
the generated JavaScript file and ignore the virtual source file. If
the browser drops into a debugger--via a breakpoint, for example--then
it will show the developer the virtual source file rather than the
generated JavaScript code. Thus, the developer has the illusion that
the browser is directly running the code in the virtual source file.
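To make the linkage concrete, here is a minimal TypeScript sketch of how such a map could be written out in the version 3 source map format. The field names and the Base64 VLQ encoding come from that format; the file names program.js and program.virtual are made up for illustration, and the two-line mapping is only a toy.

```typescript
// Base64 alphabet used by the "mappings" field of a version 3 source map.
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as a Base64 VLQ string.
function encodeVlq(value: number): string {
  // The sign lives in the lowest bit; the magnitude is shifted left by one.
  let vlq = value < 0 ? ((-value) << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = vlq & 31; // low 5 bits
    vlq >>>= 5;
    if (vlq > 0) {
      digit |= 32; // continuation bit: more digits follow
    }
    out += BASE64_CHARS.charAt(digit);
  } while (vlq > 0);
  return out;
}

// One segment holds [generated column, source index, source line, source column].
// Each field is a delta from the previous segment; the generated column
// restarts from 0 on every generated line.
function encodeSegment(fields: number[]): string {
  return fields.map((field) => encodeVlq(field)).join("");
}

// Toy map: generated lines 0 and 1 point at virtual lines 0 and 1.
const toySourceMap = {
  version: 3,
  file: "program.js",           // hypothetical generated file
  sources: ["program.virtual"], // hypothetical virtual source file
  names: [] as string[],
  // ";" separates generated lines, "," separates segments within a line.
  mappings: [
    encodeSegment([0, 0, 0, 0]),
    encodeSegment([0, 0, 1, 0]),
  ].join(";"),
};
```

The second segment encodes a source line delta of +1, which is why only its third character differs from the first segment.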
Tips and tricks
Here are a few tips and tricks I ran into that were not obvious at
first.
Put a pointer to the original source file for any code where such a
pointer makes sense. That way, developers can easily go find the
original source file if they want to know more context about where the
code in question came from. Here's the kind of thing I've been using:
/* browser.logic, line 28 */
Also, for the sake of your developers' sanity, each character of
generated JavaScript code should map to some part of the source
code. Any code you don't explicitly map will end up implicitly
pointing to the previous line of virtual source that does have a
map. If you can't think of anything to put in the virtual source file,
then try a blank line. The developer will be able to breakpoint and
single-step that blank line, which might initially seem weird. It's
less weird, though, than giving the developer incorrect information.
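The blank-line fallback can be sketched as follows. This is only an illustration under simplifying assumptions: the Chunk type and assignVirtualLines function are hypothetical names, and each chunk is assumed to contribute exactly one line of virtual source.

```typescript
// A piece of generated JavaScript plus the virtual text that describes it.
// (Hypothetical type, not taken from any real compiler.)
interface Chunk {
  virtualText: string | null; // null when there is nothing sensible to show
  js: string;                 // generated JavaScript for this chunk
}

// Give every chunk its own virtual line, so that no generated code silently
// inherits the mapping of whatever chunk came before it.
function assignVirtualLines(
  chunks: Chunk[]
): { virtualLines: string[]; lineOfChunk: number[] } {
  const virtualLines: string[] = [];
  const lineOfChunk: number[] = [];
  chunks.forEach((chunk) => {
    lineOfChunk.push(virtualLines.length);
    // Fall back to a blank line rather than leaving the chunk unmapped.
    virtualLines.push(chunk.virtualText !== null ? chunk.virtualText : "");
  });
  return { virtualLines: virtualLines, lineOfChunk: lineOfChunk };
}

// The second chunk is runtime housekeeping with no source-level equivalent;
// flushBuffers() is a made-up runtime call.
const demoLayout = assignVirtualLines([
  { virtualText: "INPUT new_input(value0, value1)", js: "readPair();" },
  { virtualText: null, js: "flushBuffers();" },
]);
```

Here the housekeeping chunk lands on its own blank virtual line 1 instead of appearing to be part of the INPUT line.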
Choose your JavaScript variable names carefully. I switched generated
temporaries to start with "z$" instead of "t$" so that they sort down
at the bottom of the variables list in the Chrome debugger. That way,
when an app developer looks at the list of variables in a debugger,
the first things their eyes encounter are their own variables.
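A small TypeScript sketch shows the effect of the prefix. It assumes the debugger lists variables in plain string order, which the default Array sort approximates; makeTempNamer and the user variable names are invented for the example.

```typescript
// Hypothetical generator for compiler temporaries with a configurable prefix.
function makeTempNamer(prefix: string): () => string {
  let counter = 0;
  return () => prefix + String(counter++);
}

// Variables a developer might have written themselves (made-up names).
const userVariables = ["balance", "total", "user_id"];

// Mix two temporaries into the scope and sort the way a variables pane might.
function sortedScope(prefix: string): string[] {
  const nextTemp = makeTempNamer(prefix);
  const scope = userVariables.concat([nextTemp(), nextTemp()]);
  // The default sort compares strings by UTF-16 code units.
  return scope.slice().sort();
}

const withT = sortedScope("t$"); // temporaries land in the middle
const withZ = sortedScope("z$"); // temporaries land at the bottom
```

With "t$" the temporaries sort between "balance" and "total"; with "z$" they follow all three user variables. A user variable that itself starts with "z" followed by a letter would still sort after the temporaries, so the trick is a heuristic rather than a guarantee.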
Emit variable names into the virtual source file, even when they
seem redundant. They provide an extra cue for developers as they
mentally map between what they see in the JavaScript stack trace and what
they see in the virtual source file. For example, here is a line of virtual
source code for inputting a pair of values to the "new_input" Datalog
predicate; the "value0" and "value1" variables are the generated
variable names for the pair of values in question.
INPUT new_input(value0, value1)
Implementation approach
Implementing a virtual source file initially struck me as a
cross-cutting concern that was likely to turn the compiler code into a
complete mess. However, here is an approach that makes it not so bad.
The compiler already has an "output" stream threaded through all
classes that do any code generation. The trick is to augment the class
used to implement that stream with a couple of new methods:
- emitVirtual(String): emit text to the virtual source file
- startVirtualChunk(): mark the beginning of a new chunk of output
With this extended API, working with a virtual source file is
straightforward and non-intrusive. Most compiler code remains
unchanged; it just writes to the output stream as normal. Around each
human-comprehensible chunk of output, there is a call to
startVirtualChunk() followed by a few calls to emitVirtual(). For
example, whenever the compiler is about to emit a Datalog rule, it
first calls startVirtualChunk() and then pretty prints the code to the
emitVirtual() stream. After that, it emits the output JavaScript.
With this approach, the extended output stream becomes a single
point where the source map can be accumulated. Since this class
intercepts writes to both the virtual file and the final generated
JavaScript file, it is in a position to maintain a mapping between the
two.
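The shape of such a stream can be sketched in TypeScript as below. Only the method names emitVirtual and startVirtualChunk come from the description above; the class name, the write method, the ChunkMapping record, and the readField calls in the usage are assumptions made for illustration. The sketch maps all JavaScript in a chunk to the first virtual line of that chunk, and leaves anything written before the first chunk unmapped.

```typescript
// One link from a position in the generated JavaScript to a virtual line.
interface ChunkMapping {
  generatedLine: number;   // 0-based line in the generated JavaScript
  generatedColumn: number; // 0-based column where the mapped text starts
  virtualLine: number;     // 0-based line in the virtual source file
}

class VirtualOutputStream {
  private js = "";
  private virtualText = "";
  private jsLine = 0;
  private jsColumn = 0;
  private virtualLine = 0;
  private chunkVirtualLine = -1; // -1 until the first chunk starts
  private needsMapping = false;
  readonly mappings: ChunkMapping[] = [];

  // Mark the beginning of a new chunk of output.
  startVirtualChunk(): void {
    // Start each chunk on a fresh virtual line.
    const len = this.virtualText.length;
    if (len > 0 && this.virtualText.charAt(len - 1) !== "\n") {
      this.emitVirtual("\n");
    }
    this.chunkVirtualLine = this.virtualLine;
    this.needsMapping = true;
  }

  // Emit text to the virtual source file.
  emitVirtual(text: string): void {
    this.virtualText += text;
    for (let i = 0; i < text.length; i++) {
      if (text.charAt(i) === "\n") this.virtualLine++;
    }
  }

  // Emit generated JavaScript, recording a mapping at the start of the
  // chunk and again at the start of every later generated line in it.
  write(text: string): void {
    for (let i = 0; i < text.length; i++) {
      const ch = text.charAt(i);
      if (this.needsMapping && this.chunkVirtualLine >= 0 && ch !== "\n") {
        this.mappings.push({
          generatedLine: this.jsLine,
          generatedColumn: this.jsColumn,
          virtualLine: this.chunkVirtualLine,
        });
        this.needsMapping = false;
      }
      if (ch === "\n") {
        this.jsLine++;
        this.jsColumn = 0;
        this.needsMapping = true;
      } else {
        this.jsColumn++;
      }
    }
    this.js += text;
  }

  generatedJs(): string { return this.js; }
  virtualSource(): string { return this.virtualText; }
}

// Usage: two chunks, the second with a made-up predicate name.
const out = new VirtualOutputStream();
out.startVirtualChunk();
out.emitVirtual("INPUT new_input(value0, value1)");
out.write("var value0 = readField(0);\nvar value1 = readField(1);\n");
out.startVirtualChunk();
out.emitVirtual("INPUT other_input(value2)");
out.write("var value2 = readField(2);\n");
```

After this runs, the first two generated lines map to virtual line 0 and the third maps to virtual line 1. Turning the recorded mappings into a real source map would still require encoding them in the format the browser expects.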
The main downside to this approach is that the generated file and
the virtual source file must put everything in the same order. In my
case, the compiler is emitting code in a reasonable order, so it isn't
a big deal.
If your compiler rearranges its output in some wild and crazy
order, then you might need to do something different. One approach
that looks reasonable is to build a virtual AST while emitting the
main source code, and then only convert the virtual AST to text once
it is all accumulated. The startVirtualChunk() method would take a
virtual AST node as an argument, thus allowing the extended output
stream to associate each virtual AST node with one or more ranges of
generated JavaScript code.
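A rough TypeScript sketch of that alternative follows. Everything here is hypothetical: the VirtualNode shape, the assumption that each node renders as exactly one line, and the PROGRAM and RULE texts are invented to show the idea, not taken from a working compiler.

```typescript
// A node of the virtual AST; each node renders as one line of virtual source.
interface VirtualNode {
  text: string;
  children: VirtualNode[];
}

// A range of generated JavaScript, as character offsets, tied to one node.
interface GeneratedRange {
  node: VirtualNode;
  start: number; // inclusive
  end: number;   // exclusive
}

class AstOutputStream {
  private js = "";
  private current: GeneratedRange | null = null;
  readonly ranges: GeneratedRange[] = [];

  // Start a chunk by naming the virtual AST node it belongs to.
  startVirtualChunk(node: VirtualNode): void {
    this.current = { node: node, start: this.js.length, end: this.js.length };
    this.ranges.push(this.current);
  }

  // Emit generated JavaScript and extend the current node's range.
  write(text: string): void {
    this.js += text;
    if (this.current !== null) {
      this.current.end = this.js.length;
    }
  }
}

// Flatten the tree depth-first; a node's index is its virtual line number.
function flattenVirtualAst(root: VirtualNode): VirtualNode[] {
  const ordered: VirtualNode[] = [];
  const visit = (node: VirtualNode): void => {
    ordered.push(node);
    node.children.forEach((child) => visit(child));
  };
  visit(root);
  return ordered;
}

const inputNode: VirtualNode = { text: "INPUT new_input(value0, value1)", children: [] };
const ruleNode: VirtualNode = { text: "RULE derived_fact", children: [] }; // made-up text
const rootNode: VirtualNode = { text: "PROGRAM", children: [inputNode, ruleNode] };

// The generated code arrives in a different order than the virtual file uses.
const astOut = new AstOutputStream();
astOut.startVirtualChunk(ruleNode);
astOut.write("runRule();\n");
astOut.startVirtualChunk(inputNode);
astOut.write("readInput();\n");

// Only now, with everything accumulated, are line numbers resolved.
const orderedNodes = flattenVirtualAst(rootNode);
const resolvedRanges = astOut.ranges.map((range) => ({
  start: range.start,
  end: range.end,
  virtualLine: orderedNodes.indexOf(range.node),
}));
```

Because line numbers are resolved only after the whole tree exists, the order in which JavaScript is written no longer has to match the order of the virtual source file.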