Hubbub is an HTML5-compliant parsing library written in C. It was developed as part of the NetSurf project and is available for use by other software under the MIT Licence.
The HTML5 specification defines a parsing algorithm, based on the behaviour of mainstream browsers, that specifies how to parse all markup, both valid and invalid. Because Hubbub implements this algorithm, it handles real-world web content, including malformed markup, predictably.
If you are looking for an HTML5 parser in Python or Ruby, consider html5lib instead.
You can browse the source code via the online interface. Alternatively, you can check it out with Git:
$ git clone git://git.netsurf-browser.org/libhubbub.git
Hubbub is licensed under the MIT Licence.
If you would like to help develop Hubbub, or have questions about the library, please join the NetSurf developer mailing list.