John Resig - Pure JavaScript HTML Parser

Bookmark History

Saved by 9 people (1 private), first by an anonymous user on 2008-05-05


Public Sticky notes

I've been toying with porting env.js to other platforms (SpiderMonkey derivatives and the ECMAScript 4 Reference Implementation), and to do so I would need an HTML parser. Given that, the easiest path was to just write an HTML parser in pure JavaScript.

I did some digging to see what people had previously built, but the landscape was pretty bleak. The only one that I could find was one made by Erik Arvidsson: a simple SAX-style HTML parser. Since it contained only the most basic parsing, and none of the actual, complicated HTML logic, there was still a lot of work left to be done.
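
For readers unfamiliar with the term, "SAX-style" means the parser builds no tree; it walks the markup and fires a callback for each start tag, end tag, and text run. The sketch below is only an illustration of that idea, not Erik Arvidsson's code or the parser described here. The name `parseSax` and the handler methods `start`, `end`, and `chars` are assumptions made for this example, and it deliberately omits the "complicated HTML logic" (implied end tags, comments, script contents, entities) mentioned above.

```javascript
// Hypothetical minimal SAX-style tokenizer (illustrative only).
// It reports events to a handler object and never builds a DOM tree.
function parseSax(html, handler) {
  // One token is a closing tag, an opening tag with raw attribute text,
  // or a run of text. A stray "<" that fits none of these is skipped.
  const token = /<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)([^>]*?)(\/?)>|([^<]+)/g;
  // name, optionally followed by ="value", ='value', or =value
  const attr = /([\w-]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;

  let m;
  while ((m = token.exec(html)) !== null) {
    if (m[1] !== undefined) {
      // Closing tag such as </p>
      if (handler.end) handler.end(m[1].toLowerCase());
    } else if (m[2] !== undefined) {
      // Opening tag: collect its attributes from the raw text in m[3]
      const attrs = [];
      attr.lastIndex = 0;
      let a;
      while ((a = attr.exec(m[3])) !== null) {
        const value = a[2] !== undefined ? a[2]
                    : a[3] !== undefined ? a[3]
                    : a[4] !== undefined ? a[4]
                    : "";
        attrs.push({ name: a[1].toLowerCase(), value: value });
      }
      // The third argument is true for self-closed tags such as <br/>
      if (handler.start) handler.start(m[2].toLowerCase(), attrs, m[4] === "/");
    } else if (handler.chars) {
      // Plain text between tags
      handler.chars(m[5]);
    }
  }
}

// Usage: record each event as a short string.
const events = [];
parseSax('<p class="x">Hi <b>there</b><br/></p>', {
  start: function (tag, attrs, unary) {
    events.push("start:" + tag + (unary ? "/" : ""));
  },
  end: function (tag) { events.push("end:" + tag); },
  chars: function (text) { events.push("chars:" + text); }
});
```

A tokenizer like this reports what is literally in the markup and nothing more. It would not, for example, close an open `<p>` when a second `<p>` begins, which is the kind of work the passage above says was still left to be done.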

(I also contemplated porting the HTML 5 parser, wholesale, but that seemed like a herculean effort.)

However, the result is one that I'm quite pleased with. It won't match the compliance of html5lib, nor the speed of a pure XML parser, but it's able to get the job done with little fuss - while still being highly portable.

Highlighted by smoody