Tuesday, August 23, 2011

A Scientist's Manifesto?

I was disheartened today to read so many glowing reviews of Robert Winston's "Scientist's Manifesto".

I think of science as a way to search for knowledge. It involves forming explanations, making predictions based on those explanations, and then testing whether those predictions hold true. Scientists make claims, and they fight for those claims using objective evidence from repeatable experiments.

Winston promotes an alternative view of science, in which scientists are members of a special inner circle. They've gone to the right schools, they've gone through the right processes, and they've had their work reviewed by the right senior scientists. Essentially, they are priests of a Church of Science. His concern is then with the way in which members of this church communicate with the outside.

If that sounds overblown, take a look at item one in the 14-item manifesto. It even uses the term "layperson":
We should try to communicate our work as effectively as possible, because ultimately it is done on behalf of society and because its adverse consequences may affect members of the society in which we all live. We need to strive for clarity not only when we make statements or publish work for scientific colleagues, but also in making our work intelligible to the average layperson. We may also reflect that learning to communicate more effectively may improve the quality of the science we do and make it more relevant to the problems we are attempting to solve.

Aside from its general thrust, I disagree with many of its individual items. For example, I think of scientists interested in a topic as conferring with each other through relatively specialized channels. Thus item three is odd to me:
The media, whether written, broadcast or web-based, play a key role in how the public learn about science. We need to share our work more effectively by being as clear, honest and intelligible as possible in our dealings with journalists. We also need to recognize that misusing the media by exaggerating the potential of what we are studying, or belittling the work of other scientists working in the field, can be detrimental to science.

Of course, it makes perfect sense if you think of science as received wisdom that is then propagated to the masses.

I also think of science as seeking objective truth. I can't really agree with the claim that it is relative:
We should reflect that science is not simply ‘the truth’ but merely a version of it. A scientific experiment may well ‘prove’ something, but a ‘proof’ may change with the passage of time as we gain better understanding.

I don't even think peer review is particularly scientific. The main purpose of peer review is to provide a mechanism for measuring the performance of academics. In some sense it measures how much other academics like you. Yet, item 8 in the manifesto claims that peer review is some sacred process that turns ordinary words into something special, much like the process behind a fatwa:
Scientists are regularly called upon to assess the work of other scientists or review their reports before publication. While such peer review is usually the best process for assessing the quality of scientific work, it can be abused....

I have an alternative suggestion to people who want the public to treat scientists with esteem. Stop thinking of yourself as a priest, evangelist, or lobbyist trying to propagate your ideas. Instead, remember what it is that's special about your scientific endeavors. Explain your evidence, and invite listeners to repeat crucial parts of the experiment themselves. Don't just tell people you are a scientist. Show them.

Friday, August 19, 2011

Why LaTeX?

Lance Fortnow laments that no matter how crotchety he gets he can't seem to give up LaTeX:
LaTeX is a great system for mathematical documents...for the 1980s. But the computing world changed dramatically and LaTeX didn't keep up. Unlike Unix I can't give it up. I write papers with other computer scientists and mathematicians and since they use LaTeX so do I. LaTeX still has the best mathematics formulas but in almost every other aspect it lags behind modern document systems.

I think LaTeX is better than he gives it credit for. I also think it could use a reboot. It really is a system from the 80s, and it's... interesting how many systems from the 70s and 80s are still the best of breed, still in wide use, but still not really getting any new development.

Here's my hate list for LaTeX:
  • The grammar is idiosyncratic, poorly documented, and context-dependent. There's no need for any of that. There are really good techniques nowadays for letting a very extensible language nonetheless have a base grammar that is consistent in every file and supports self-documentation.
  • You can't draw outside the lines. For all the flexibility the system ought to have due to its macro system, I find the many layers of implementation to be practically impenetrable. Well written software can be picked up by anyone, explored, and modified. Not so with LaTeX--you have to do things exactly the way the implementers imagined, or you are in for great pain and terrible-looking output.
  • The error messages are often inscrutable. They may as well drop all the spew and just say, "your document started sucking somewhere around line 1308".
  • The documentation is terrible. The built-in documentation is hard to find and often stripped out anyway. The Internet is filled with cheesy "how to get started" guides that drop off right before they answer whatever question you have.
  • Installing fonts is a nightmare. Standalone TrueType fonts are ubiquitous nowadays. You should be able to drop in a font and configure LaTeX to use it. That this is not possible suggests that the maintainers are as afraid of the implementation as I am.
  • Files are non-portable and hard to extract. This problem is tied up in the implementation technology. Layered macros in a global namespace are not conducive to careful management of dependencies, so everything tends to depend on everything.
However, as bad as that list is, the pros make it worth it:
  • Excellent looking output, especially if you use any math. If you care enough to use something other than ASCII, I would think that the document appearance trumps just about any other concern.
  • Excellent collaborative editing. You can save files in version control and use file-level merge algorithms effectively. With most document systems, people end up mailing each other the drafts, which is just a miserable way to collaborate.
  • Scripting and macros. While you can't reasonably change LaTeX itself, what you can easily do is add extra features to the front end by way of scripts and macros.
  • It uses instant preview instead of WYSIWYG. WYSIWYG editors lead to quirky problems that are easy to miss in proofreading, such as headers being at the wrong level of emphasis. While I certainly want to see what the output will look like all the time, I don't want to edit that version. I want to edit the code. When you develop something you really want to be good, you want very tight control.
  • Scaling. Many document systems develop problems when a document is more than 10-20 pages long. LaTeX keeps on chugging for at least up to 1000-page documents.
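The scripting-and-macros point is easy to illustrate. Here is a minimal sketch of the kind of front-end extension I mean; the macro name \todo and its behavior are my own invention for illustration, not something from Fortnow's post or the LaTeX core:

```latex
% A small front-end extension: a \todo macro that flags draft notes
% in the margin, without touching LaTeX's internals.
\documentclass{article}

% One argument: the note text. Printed bold and small in the margin.
\newcommand{\todo}[1]{%
  \marginpar{\footnotesize\textbf{TODO:} #1}%
}

\begin{document}
Some draft text that still needs work.\todo{check this citation}
\end{document}
```

This is the sweet spot: you stay on the supported surface of the macro system, adding vocabulary on top rather than reaching into the impenetrable layers underneath.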
I would love to see a LaTeX reboot. The most promising contender I know of is the Lout document formatting system, but it appears to not be actively maintained.

Monday, August 15, 2011

Paul Chiusano on software patents

Paul Chiusano reminds us why we would conceivably want software patents:
What I find irritating about all the software patent discussion is that patents are intended to benefit society - that is their purpose, "To promote the Progress of Science and useful Arts". But no one seems to want to reason about whether that is actually happening - that would mean doing things like thinking about how likely the invention was to be independently discovered soon anyway, estimating the multiplier of having the invention be in the public domain, etc. Instead we get regurgitation of this meme about making sure the little guy working in his basement gets compensated for his invention.

It's a good reminder. The point of patents is to make society better off.

The standard argument for patents requires, among other assumptions, that the patented inventions require a significant level of investment that would not otherwise occur. As Paul points out, that is not the case for software:
Software patents rarely make sense because software development requires almost no capital investment, and as a result, it is almost impossible for an individual to develop some software invention that would not be discovered by multiple other people soon in the future. Do you know of any individual or organization that is even capable of creating some software "invention" that would not be rediscovered independently anyway in the next five or ten years? I don't. No one is that far ahead of everyone else in software, precisely because there is no capital investment required and no real barriers to entry.

I agree.

I have read many posts where people try to fine tune software patents to make them less awful. I wish we could instead start by considering the more fundamental issue. Do we want software patents at all?

Wednesday, August 10, 2011

Schrag on updating the IRBs

Zachary Schrag has a PDF up on recent efforts to update IRBs. Count me in as vehemently in favor of two of the proposals that are apparently up for discussion.

First, there is the coverage of fields that simply don't have the human risks that medical research does:
Define some forms of scholarship as non-generalizable and therefore not subject to regulation. As noted above, the current regulations define research as work “designed to develop or contribute to generalizable knowledge.” Since the 1990s, some federal officials and universities have held that journalism, biography, and oral history do not meet this criterion and are therefore not subject to regulation. However, the boundaries of generalizability have proven hard to define, and historians have felt uncomfortable describing their work as something other than research.

I would add computer science to the list. A cleaner solution is as follows:
Accept that the Common Rule covers a broad range of scholarship, but carve exceptions for particular methods. Redefining “research” is not the only path to deregulation contemplated by the ANPRM, so a third possibility would be to accept Common Rule jurisdiction but limit its impact on particular methods.

Schrag's PDF gives limited attention to this option, but it seems the most straightforward to me. If a research project involves interviews, studies, or workplace observations, then it just shouldn't need ethics review. The potential harms are so minor that it should be fine to follow up on reports rather than to require ahead-of-time review.

Schrag also takes aim at exempt determinations:
Since the mid-1990s, the federal recommendation that investigators not be permitted to make the exemption determination, combined with the threat of federal sanction for incorrect determinations, has led institutions to insist that only IRB members or staff can determine a project to be exempt. Thus, “exempt” no longer means exempt, leaving researchers unhappy and IRBs overwhelmed with work.

Yes! What kind of absurd system declares a project exempt from review but then requires a review anyway?

Monday, August 8, 2011

TechDirt on the latest draft of PROTECT IP

Tech Dirt has an analysis of the latest available version of PROTECT IP.
Yesterday, we got our hands on a leaked copy of the "summary" document put together by those writing the new version of COICA, now renamed the much more media friendly PROTECT IP Act. It looked bad, but some people complained that we were jumping ahead without the actual text of the bill, even if the summary document was pretty straightforward and was put together by the same people creating the bill. Thankfully, the folks over at Don't Censor the Internet have the full text of the PROTECT IP Act, which I've embedded below as well. Let's break it down into the good, the bad and the horribly ugly.

I find it hard to care about the nitty gritty details of the approach. The bill is still fundamentally about taking down DNS names on the mere allegation of infringement, and that seems like a very bad idea to me.

Sunday, August 7, 2011

Brad on foreign CEOs

Brad Templeton describes a good way to explain the current distribution of nationalities in the tech field:
I gave him one suggestion, inspired by something I saw long ago at a high-roller CEO conference for the PC industry. In a room of 400 top executives and founders of PC and internet startups, it was asked that all who were born outside the USA stand up. A considerable majority (including myself) stood. I wished every opponent of broader immigration could see that.

I agree with Brad that, at least in the software field, we benefit tremendously from foreign workers.

I suspect most observers would agree if they thought about it. You don't have to look just at executives. Walk into any software shop and you will see that a large fraction of the workers were born abroad. Furthermore, talk to any software developer about the job market, and it's not like they are hurting for work. If we sent all the foreign workers home, it's not like we'd have more American programmers at work. We'd simply have less total computer work being done.

It seems that software is getting swept up in laws and regulation that were developed with other fields in mind. If you follow the political discussions on the topic, it is always about lower-skilled jobs in fields where it is tough to start a new company. This depiction simply does not match computer science.

It's the same sort of thing that happens with research oversight. Research oversight is driven by the needs of medical research, and it just doesn't match the ethical issues that computer researchers face.

Inducing infringement alive and well

Mitch Golden writes, in a good analysis of the legal state of LimeWire's file-sharing software, that inducing infringement was a key part of the October 2010 court case against them:
Interestingly, the court largely sidestepped the technical issues as to whether Gnutella itself had non-infringing uses or not, or whether a Gnutella client can be legally distributed. The court's decision instead turned on evidence submitted by the plaintiffs that LimeWire intended to facilitate filesharing.

I continue to feel that we are much better off leaving content carriers alone. Trying to make content carriers into IP policemen is not going to work out well.