Python: html5lib parser example

Really easy to use:

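import html5lib

# Open in binary mode so html5lib can detect the character encoding itself;
# the with-block closes the file once parsing is done.
with open("web.html", "rb") as f:
    parser = html5lib.HTMLParser()
    doc = parser.parse(f)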
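
To give a feel for what the parsed document looks like, here is a minimal sketch that parses a deliberately sloppy string and reads a few values back out. The sample markup and the variable names are made up for illustration; it relies on html5lib's default ElementTree tree builder and on the `namespaceHTMLElements` option to keep tag names short.

```python
import html5lib

# Deliberately sloppy markup: no <html>/<head>/<body>, unclosed <p> tags.
SAMPLE = "<title>Demo</title><p>one<p>two"

# namespaceHTMLElements=False keeps tag names plain ("p" rather than
# "{http://www.w3.org/1999/xhtml}p"), so ElementTree paths stay short.
doc = html5lib.parse(SAMPLE, namespaceHTMLElements=False)

# The parser has filled in the missing html/head/body structure,
# so ordinary ElementTree lookups work on the result.
title = doc.find("head/title").text
paragraphs = [p.text for p in doc.findall("body/p")]

print(title)
print(paragraphs)
```

The same lookups should work on the `doc` returned by `parser.parse(...)`, except that tag names are namespaced there unless the parser is also created with `namespaceHTMLElements=False`.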

Clean up your Web pages with HTML TIDY


When editing HTML it’s easy to make mistakes. Wouldn’t it be nice if there were a simple way to fix these mistakes automatically and tidy up sloppy editing into nicely laid out markup? Well now there is! Dave Raggett’s HTML TIDY is a free utility for doing just that. It also works great on the atrociously hard-to-read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.

Tidy is able to fix up a wide range of problems and to bring to your attention things that you need to work on yourself. Each item found is listed with the line number and column so that you can see where the problem lies in your markup. Tidy won’t generate a cleaned-up version when there are problems that it can’t be sure how to handle. These are logged as “errors” rather than “warnings”.
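
Since each item comes with a line, a column and a severity, the report is easy to post-process from Python. The following is only a sketch: it assumes messages in the classic `line N column M - Warning: ...` form, and the sample log, the `TidyMessage` type and the `parse_message` helper are all invented here for illustration rather than taken from Tidy itself.

```python
import re
from typing import NamedTuple, Optional


class TidyMessage(NamedTuple):
    line: int
    column: int
    severity: str  # "warning" or "error"
    text: str


# Assumed message layout: "line 12 column 5 - Warning: some text"
_PATTERN = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")


def parse_message(raw: str) -> Optional[TidyMessage]:
    """Return a TidyMessage for a report line, or None if it does not match."""
    m = _PATTERN.match(raw.strip())
    if m is None:
        return None
    return TidyMessage(int(m.group(1)), int(m.group(2)), m.group(3).lower(), m.group(4))


# Made-up sample output, not a real Tidy run.
SAMPLE_LOG = """\
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 4 column 3 - Error: <blink> is not recognized!
1 warning, 1 error were found!
"""

# Keep only the lines that look like positioned messages.
messages = [m for m in map(parse_message, SAMPLE_LOG.splitlines()) if m is not None]
errors = [m for m in messages if m.severity == "error"]

for m in errors:
    print(f"{m.line}:{m.column}: {m.text}")
```

Filtering on `severity == "error"` picks out the problems that stop Tidy from writing a cleaned-up file, which are the ones to fix by hand first.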

Dave Raggett has now passed the baton for maintaining Tidy to a group of volunteers working together as part of the open source community at SourceForge. The source code continues to be available under an open source license, and you are encouraged to submit bug reports and enhancement requests at http://tidy.sourceforge.net.

Source: http://www.w3.org/People/Raggett/tidy/