Monday, July 2, 2012

Saving a file in a web application

I recently did an exploration of how files can be saved in a web application. My specific use case is to save a table of numbers to an Excel-friendly CSV file. The problem applies any time you want to save a file to the user's local machine, however.

There are several Stack Overflow entries on this question, for example Question 2897619. However, none of them have the information organized in a careful, readable way, and I spent more than a day scouting out the tradeoffs of the different available options. Here is what I found.

Data URLs and the download attribute

Data URLs are nowadays supported by every major browser. The first strategy I tried is to stuff the file's data into a data URL, put that URL as the href of an anchor tag, and set the download attribute on the anchor.
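
As a concrete illustration, here is roughly what that strategy looks like; the function name and the csvText variable are placeholders of my own:

function saveViaDataUrl(csvText) {
  var anchor = document.createElement('a');
  // Stuff the file contents into a data URL.
  anchor.href = 'data:text/csv;charset=utf-8,' + encodeURIComponent(csvText);
  // Ask for a download rather than navigation; this attribute is the
  // part with spotty support.
  anchor.download = 'export.csv';
  document.body.appendChild(anchor);
  anchor.click();
  document.body.removeChild(anchor);
}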

Unfortunately, multiple problems ensue. The worst of these is that Firefox simply doesn't support the download attribute; see Issue 676619 for a depressingly sluggish discussion of what strikes me as a simple feature to implement. Exacerbating the problem is Firefox Issue 475008. It would be tolerable to use a randomly generated filename if at least the extension were correct. However, Firefox always chooses .part at the time of this writing.

Overall, this technique is Chrome-specific at the time of writing.

File Writer API

The File Writer API is a carefully designed API put together under the W3C processes. It takes account of the browser security model, for example by disallowing file access except to files the user has explicitly approved through a native file picker dialog.

This API is too good to be true. Some web searching suggests that only Chrome supports or even intends to support it; not even Safari is marked as planning to support it, despite the API being implemented in WebKit rather than in Chrome-specific code. I verified that the API is not present in whatever random version of Firefox is currently distributed with Ubuntu.

The one thing I will say in its favor is that if you are going to be Chrome-specific anyway, this is a clean way to do it.
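
For reference, here is roughly what the Chrome-specific path looks like. Chrome exposes FileWriter through its prefixed File System API, which writes into a per-origin sandbox rather than to an arbitrary user-chosen location; csvText is again a placeholder:

// Request 1 MB of sandboxed temporary storage, then write the file.
window.webkitRequestFileSystem(window.TEMPORARY, 1024 * 1024, function(fs) {
  fs.root.getFile('export.csv', {create: true}, function(fileEntry) {
    fileEntry.createWriter(function(writer) {
      writer.onwriteend = function() { /* the data is on disk */ };
      writer.onerror = function(e) { /* handle the failure */ };
      writer.write(new Blob([csvText], {type: 'text/csv'}));
    });
  });
}, function(e) { /* the filesystem request itself failed */ });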

ExecCommand

For completeness, let me mention that Internet Explorer also has an API that can be used to save files. You can use ExecCommand with SaveAs as an argument. I don't know much about this solution and did not explore it very far, because LogicBlox web applications have always, so far, needed to be able to run in non-Microsoft browsers.
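
From what I can tell, the usual pattern is to write the data into a hidden iframe and then invoke the command on that document. I have not tested this myself, so treat it as an untested sketch:

// Untested sketch of the IE-only SaveAs command; csvText is a placeholder.
var frame = document.createElement('iframe');
frame.style.display = 'none';
document.body.appendChild(frame);
var doc = frame.contentWindow.document;
doc.open();
doc.write(csvText);
doc.close();
doc.execCommand('SaveAs', true, 'export.csv');
document.body.removeChild(frame);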

For possible amusement, I found that this approach doesn't even reliably work on IE. According to a Stack Overflow post I found, on certain common versions of Windows, you can only use this approach if the file you are saving is a text file.

Flash

Often when you can't solve a problem with pure HTML and JavaScript, you can solve it with Flash. Saving files is no exception. Witness the Downloadify Flash application, which is apparently portable to all major web browsers. Essentially, you embed a small Flash application in an otherwise HTML+JavaScript page, and you use the Flash application to do the file save.

I experimented with Downloadify's approach with some custom ActionScript, and there is an unfortunate limitation to the whole approach: there is a widely implemented web browser security restriction that a file save can only be initiated in response to a click. That is not a problem by itself in my context, but there's a compounding problem: web browsers do not reliably keep track of whether they are in a mouse-click context once you cross the JavaScript-Flash membrane.

Given these restrictions, the only way I see to make it work is to make the button the user clicks on a Flash button rather than an HTML button, which is what Downloadify does. That's fine for many applications, but it opens up severe styling issues. The normal way to embed a Flash object in a web page uses a fixed pixel size for the width and height of the button, which in practice means the button's face will be a PNG image rather than nicely formatted text in the user's preferred font. That seems like too high a price to pay for any team trying to write a clean HTML+JavaScript web application.

Use an echo server

The most portable solution I am aware of is to set up an echo server and use a form submission against that server. It is the only non-Flash solution I found for Firefox.

In more detail, the approach is to set up an HTML form, stuff the data to be saved into a hidden field of the form, and submit the form. Have your echo server respond with whatever data the client passed to it, and have it set the Content-Disposition HTTP header to indicate that the data should be saved to a file. Here is a typical HTTP header that can be used:

Content-Disposition: attachment; filename=export.csv
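
On the client side, the form can be built and submitted from JavaScript. Here is a minimal sketch, assuming a hypothetical server-side endpoint at /echo that echoes the content field back with the header above:

function saveViaEchoServer(csvText) {
  var form = document.createElement('form');
  form.method = 'POST';
  form.action = '/echo'; // hypothetical echo endpoint
  var content = document.createElement('input');
  content.type = 'hidden';
  content.name = 'content';
  content.value = csvText;
  form.appendChild(content);
  document.body.appendChild(form);
  form.submit(); // the Content-Disposition header makes the browser offer a file save
  document.body.removeChild(form);
}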

This technique is very portable; later versions of Netscape would probably be new enough. On the down side, it incurs significant latency, because the content must travel up to the server and then back down again.

Thursday, October 6, 2011

Throttling via the event queue

Here's a solution to a common problem that has some interesting advantages.

The problem is as follows. In a reactive system, such as a user interface, the incoming stream of events can sometimes be overwhelming. The most common example is mouse-move events. If the OS sends an application a hundred mouse-move events per second, and if the processing of each event takes more than ten milliseconds, then the application will drift further and further behind. To avoid this, the application should discard enough events that it stays caught up. That is, it should throttle the event stream. How should it do so?

The solutions I have run into do one of two things. They either delay the processing of events based on wall-clock time, or they require some sophisticated support from the event queue such as the ability to look ahead in the queue. The solutions that use time have the problem that they often introduce a delay that isn't necessary; the user will stop moving the mouse, but the application won't know it, so it will add in a delay anyway. The solutions using fancy event queues are not always possible, depending on the event queue, and anyway they make the application behavior more difficult to understand and test.

An alternative solution is as follows. Give the application a notion of being paused, and have the application queue Unpause events to itself to get out of the paused state. The first time an event arrives, process it as normal, but also pause the application and queue an Unpause event. If any other events arrive while the application is paused, simply queue them on the side. Once the Unpause event arrives, if there are any events on the side queue, drain the side queue, process the last event, and queue another Unpause event. If an Unpause event arrives before any other events are queued, then simply mark the application unpaused.

This approach has many of the advantages of looking ahead in the event queue, but it doesn't require any direct support for doing so. Its responsiveness under system load is also as good as appears possible to achieve. If the system is lightly loaded, then every event is processed, and the system is just as responsive as it would be without the throttling. If the system is loaded, then enough events are skipped that the application avoids falling further and further behind. If the load is temporarily high, and then stops, then the last event is still processed promptly.

The one tricky part of implementing this kind of solution is posting the Unpause event from the application back to the application itself. That event needs to be on the same queue that the other work is queuing up on, or the approach will not work. How to do this depends on the particular event queue in question. For the case of a web browser, the best technique I know is to use setTimeout with a timeout of one millisecond.
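
Here is a minimal sketch of the scheme in browser JavaScript. Since only the most recent deferred event matters for something like mouse moves, the side queue collapses to a single slot; the names are my own:

function makeThrottled(process) {
  var paused = false;
  var pending = null; // the side queue, collapsed to the most recent event

  function unpause() {
    if (pending !== null) {
      var event = pending;
      pending = null;
      process(event);
      setTimeout(unpause, 1); // queue another Unpause to ourselves
    } else {
      paused = false;
    }
  }

  return function(event) {
    if (paused) {
      pending = event; // queue it on the side
    } else {
      process(event); // the first event is processed as normal
      paused = true;
      setTimeout(unpause, 1); // queue an Unpause to ourselves
    }
  };
}

// Usage:
document.onmousemove = makeThrottled(function(event) { /* expensive work */ });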

Tuesday, June 29, 2010

Wrapping code is slow on Firefox

UPDATE: Filed as bug 576630 with Mozilla. It would be great if this slowdown could be removed, because wrapping chunks of code in a function is a widely useful tool to have available.

I just learned, to my dismay, that adding a single layer of wrapping around a body of JavaScript code can cause Firefox to really slow down. That is, there are cases where the following code takes a second to load:

statement1
statement2
...
statement1000

Yet, the following equivalent code takes 30+ seconds to load:

(function() {
statement1
statement2
...
statement1000
})()


This is disappointing, because wrapping code inside a function is a straightforward way to control name visibility. If this code defines a bunch of new functions and vars, you might not want them all to be globally visible throughout a browser window. Yet, because of this parsing problem on Firefox, simply adding a wrapper function might not be a good idea.

After some investigation, I found that the problem only arises when there are a lot of functions defined directly inside a single other function. Adding another layer of wrapping gets rid of the parse-time problem. That is, the following parses very quickly:


(function() {
(function() {
statement1
..
statement10
})()
...
(function() {
statement991
...
statement1000
})()
})()


Of course, to use this approach, you have to make sure that the cross-references between the statements still work. In general this requires modifying the statements to install and read properties on some object that is shared among all the chunks.
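
For example, two chunks can communicate through properties on a shared object instead of through local variables; the ns name here is my own:

var ns = {}; // shared among all the chunks
(function() {
  (function() {
    ns.helper = function() { alert('hi'); };
  })();
  (function() {
    ns.main = function() { ns.helper(); }; // cross-chunk reference via ns
  })();
})();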

Example Code and Timings

I wrote a Scala script named genslow.scala that generates two files: test.html and module.html. Load the first page in Firefox, and it will load the second file into an iframe. Once all the code is loaded, an alert pops up reporting how long the load took.

There are three variables at the top of the script that can be used to modify module.html. On my machine, I get the following timings:

default: 1,015 ms
jslink: 1,135 ms
wrapper: 34,288 ms
wrapper+jslink: 52,078 ms
wrapper+jslink+chunk: 1,188 ms

The timings were on Firefox 3.6.3 on Linux. I only report the first trial in the above table, but the pattern is robust across hitting reload.

Saturday, June 5, 2010

Evidence from successful usage

One way to test an engineering technique is to see how projects that tried it have gone. If the project fails, you suspect the technique is bad. If the project succeeds, you suspect the technique is good. It's harder than it sounds to make use of such information, though. There are too few projects, and each one has many different peculiarities. It's unclear which peculiarities led to the success or the failure. In a word, these experiments are natural rather than controlled.

One kind of information does shine through from such experiments, however. While they are poor at comparing or quantifying the value of different techniques, they at least let us see which techniques are viable. A successful project requires that all of the techniques used are at least tolerable, because otherwise the project would have fallen apart. Therefore, whenever a project succeeds, all the techniques it used must at least be viable. Those techniques might not be good, but they must at least not be fatally bad.

This kind of claim is weak, but the evidence for it is very strong. Thus I'm surprised how often I run into knowledgeable people saying that this or that technique is so bad that it would ruin any project it was used on. The most common example is that people love to say dynamically typed languages are useless. In my mind, there are too many successful sites written in PHP or Ruby to believe such a claim.

Even one successful project tells us a technique is viable. What if there are none? This question doesn't come up very often. If a few people try a technique and it's a complete stinker, they tend to stop trying, and they tend to stop pushing it. Once in a while, though....

Once in a while there's something like synchronous RPC in a web browser. The technique certainly gets talked about. However, I've been asking around for a year or two now, and I have not yet found even one good web site that uses it. Unless and until that changes, I have to believe that synchronous RPC in the browser isn't even viable. It's beyond awkward. If you try it, you won't end up with a site you feel is launchable.

Tuesday, December 29, 2009

Browsers giving up on repeatedly failing script tags

Today's browser oddity is that Chrome instantly fails a script tag download if the requested URL has already failed a couple of times. It stops issuing network requests and fails the download before even trying. Probably other WebKit-based browsers do the same; I didn't check.

I can see why this behavior would make sense if you think of script tags as part of the static content of a page. If there are ten script tags requesting the same resource, you don't want to issue ten failing requests. However, it caught me by surprise, because I'm trying to use script tags as a way of downloading content over the web.

Browsers sure are a messy platform.

Tuesday, December 15, 2009

Detecting download failures with script tags

Matt Mastracci has done some experimentation and found that most browsers provide one callback or another for indicating that a script tag has failed to download. This is very interesting, because script tags don't have to follow the Same Origin Policy. Here is my replication, for the possible aid of anyone else wandering in these murky woods.

Available callbacks

The simplest callback is the onerror attribute. It can be attached to a script tag like this:

script.onerror = function() {
  /* code here is called if the download fails */
}


For completeness, there is also an onload attribute. It's analogous to onerror, except that it indicates success rather than failure. It can be attached to a script tag like this:

script.onload = function() {
  /* code here is called if the download succeeded */
}


Finally, IE supports onreadystatechange, similarly to the XHR attribute of the same name. The supplied callback will be invoked as the download progresses. The state of the download can be queried via the readyState attribute, which will reach state 'loaded' and/or 'complete'.

script.onreadystatechange = function () {
  if (script.readyState == 'loaded') {
    script.onreadystatechange = function () { } // prevent duplicate calls
    /* error handling code goes here */
  }
}


Results

I used the test page below to see which of the three events fire on several browsers.

Loading a bad page:
Firefox 3.5: onerror
Safari 4: onerror
Chrome 4: onerror
IE 7: onreadystatechange
IE 8: onreadystatechange

Loading a good page:
Firefox 3.5: onload
Safari 4: onload
Chrome 4: onload
IE 7: onreadystatechange (if not cached)
IE 8: onreadystatechange (if not cached)


Analysis

The onerror attribute works on all browsers but IE. For IE, onreadystatechange is available. Conveniently, no browser supports both of them, so a handler hooked up to both of them will fire exactly once.

A complication on IE is that onreadystatechange doesn't differentiate whether the download succeeded or not. Downloading a non-cached version looks the same as a download failure. Thus, any code using onreadystatechange needs to check whether the download succeeded or not.

Followup: Order of evaluation versus onreadystatechange

On IE, once onreadystatechange indicates the download is complete, how can the code tell whether the download actually succeeded?

I did a followup test where the loaded code (exists.js) does a window.alert. That way, I can see which happens first: the alert, or the onreadystatechange callback. On both IE7 and IE8, the alert happens first. That means if the script sets a global flag to true once it loads, the onreadystatechange callback can check it and reliably determine whether the download has succeeded.
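
Putting the pieces together, a combined handler might look like the following sketch; window.scriptDidLoad stands for whatever flag the downloaded script sets for itself, and is my own invention:

function loadScript(url, onSuccess, onFailure) {
  var script = document.createElement('script');
  script.src = url;
  script.onload = onSuccess;  // Firefox, Safari, Chrome
  script.onerror = onFailure; // Firefox, Safari, Chrome
  script.onreadystatechange = function () { // IE only
    if (script.readyState == 'loaded' || script.readyState == 'complete') {
      script.onreadystatechange = function () { } // prevent duplicate calls
      // IE fires this for failures too, so consult the flag that the
      // downloaded script is expected to set.
      if (window.scriptDidLoad) { onSuccess(); } else { onFailure(); }
    }
  }
  document.getElementsByTagName('head').item(0).appendChild(script);
}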


Test script

<head>
<title>Test page</title>

<script>
function loadFile(url) {
  var head = document.getElementsByTagName('head').item(0);
  var script = document.createElement('script');
  script.src = url;
  script.onload = function() {
    window.alert("onload called");
  }
  script.onerror = function() {
    window.alert("onerror called");
  }
  script.onreadystatechange = function () {
    if (script.readyState == 'loaded') {
      script.onreadystatechange = function () { }
      window.alert("onreadystatechange (" + script.readyState + ")");
    }
  }
  head.appendChild(script);
}

function good() {
  loadFile("exists.js");
}
function bad() {
  loadFile("bad.js");
}
</script>
</head>

<body>
<input type="button" value="good" onclick="good()">
<input type="button" value="bad" onclick="bad()">
</body>

Wednesday, March 4, 2009

Installing top-level code with JavaScript's eval

JavaScript is wonderfully dynamic, so it is odd that its eval function is so unportable.  I already knew that it was tricky if not impossible to use eval to install code in an arbitrary nested scope.  Today I learned that even the simple case of installing code into the global scope is different on each browser. Here's what I found after some digging around on the web and some experimentation.

First, there are a lot of web pages discussing this topic.  Here's one of the first ones I read, that tipped me off that there is a well-known problem:

http://piecesofrakesh.blogspot.com/2008/10/understanding-eval-scope-spoiler-its.html

The following page also discusses the problem, but has a really good collection of comments:
UPDATE: Prototype has gone through the same issue and come to conclusions similar to mine. Here is a page with all the bike shedding:



Based on reading these and on tinkering on different web browsers, here are some techniques that look interesting:
  1. window.eval, what I tried to begin with
  2. window.eval, but with a with() clause around it.  Some people report better luck this way.
  3. window.execScript, a variant of window.eval
  4. window.setTimeout
  5. adding a script tag to the document

What I did in each case was try to use the technique to define a function foo() at the global scope, and then try to call it.  I tested these browsers, which I happen to have handy:
  1. Safari/Mac 3.1.1
  2. Firefox/Mac 3.0.6
  3. Firefox/Linux 2.0.0.20
  4. Firefox/Windows 3.0.3
  5. IE 6.0.2900.xpsp_sp3_gdr.080814-1236 updated to SP3
  6. Chrome 1.0.154.48

Here are the browsers where each technique works.  I lump together the Firefoxes because they turn out to behave the same on all platforms:
  1. window.eval: FF
  2. window.eval with with: FF
  3. window.execScript: IE, Chrome
  4. window.setTimeout: Chrome, FF, Safari
  5. script tag: IE, Chrome, FF, Safari

Conclusions

  1. The window.execScript function is available on IE and Chrome, and when present it does the right thing.
  2. The window.eval function only works as desired on Firefox.
  3. Adding a with(window) around the window.eval does make a difference, but I couldn't get it to do precisely what is needed for GWT.  In particular, GWT does not have a bunch of "var func1,func2, func3" declarations up front, but such vars are assumed in some of the other web pages I read.
  4. I could not find a synchronous solution for Safari.  Instead, setTimeout and script tags work, but they won't load the code until a few milliseconds have gone by.
  5. Script tags work on all browsers.
  6. Surprisingly, I couldn't get setTimeout to work on IE.  From some web browsing, it looks like the setTimeout callback might run in the wrong scope, but I didn't investigate far.  On IE, execScript is a better solution for the present problem.

Based on these, the following chunk of code is one portable way to install code on any of the major browsers. It uses execScript if it's available, and otherwise it adds a script tag.
if (window.execScript) {
  window.execScript(script)
} else {
  var tag = document.createElement("script")
  tag.type = "text/javascript"
  tag.text = script
  document.getElementsByTagName("head").item(0).appendChild(tag)
}

The Code
Here is the code for the above examples, for anyone who wants to know the details and/or to try it for themselves.

The wrapper script is as follows:
function installFoo() {
  var script = "function foo() { alert('hi') }"
  // varying part
}
installFoo()
window.foo()

For the versions that install the code asynchronously (setTimeout or script tags), I changed the window.foo() line to be:
window.setTimeout(function() { window.foo() }, 100)


The "varying part" is as follows for each way to load the code.  Note that some of them include a gratuitous reassignment of window to $w; that's how I first ran the test and I don't want to go back and redo all of those.

// window.eval
window.eval(script)

// window.execScript
window.execScript(script)

// window.eval with a with
var $w = window
with($w) { $w.eval(script) }

// setTimeout
window.setTimeout(script, 0)

// script tag
var tag = document.createElement("script")
tag.type = "text/javascript"
tag.text = script
document.getElementsByTagName("head").item(0).appendChild(tag)

Tuesday, January 6, 2009

Just how evil is synchronous XHR?

The web is based on de facto specs, requiring a lot of investigation to find out what exactly the platform does. One question about the de facto behavior is being reraised by code-splitting efforts: just how bad is synchronous XHR, if used in a place that the application may as well pause anyway? This question comes up because you can't use GWT's code-splitting approach without good static analysis. You can use dynamic analysis, but whenever the dynamic analyzer guesses wrong, the system must fall back on synchronous XHR.
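
For concreteness, the fallback in question looks something like this sketch (IE 6 would need the ActiveXObject variant instead of XMLHttpRequest):

function loadFragmentSync(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, false); // the third argument false makes the request synchronous
  xhr.send(null);              // blocks the UI thread until the response arrives
  if (xhr.status != 200) {
    throw new Error('failed to load code fragment: ' + url);
  }
  return xhr.responseText; // the caller then installs this code
}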

Mark S. Miller sent me a link to Mark Pruett, who did some actual experiments to see what happens in practice. Pruett concludes that all browsers but Firefox 2 are fine.

Kelly Norton is less sanguine. While he likes Opera's behavior reasonably well, he's unsure about IE 6, and he thinks the Safari discussion is inaccurate. It's not clear to what extent browsers are going to mimic Opera.

Overall, I come away thinking that synchronous XHR is reasonable for code splitting so long as the dynamic analyzer is only very infrequently wrong. The app will freeze when it happens, which is bad, but it should be infrequent. Further, the team will need to invest time in setting up a suite of interaction cases to feed to the dynamic analyzer. I guess extra time is expected, though, if you want better results.

It makes me glad not to be working with raw JavaScript, though. To the extent code-splitting is important, I really think new web applications should not use raw JavaScript. They should use some analyzable subset of JavaScript that has not yet been defined, or they should use a suitable existing language such as Java.