[PLUG] Parsing HTML with Perl

Paul Heinlein heinlein at madboa.com
Wed Jun 30 15:20:03 UTC 2004


On Wed, 30 Jun 2004, Shahms King wrote:

> Have you tried using HTML::Parser? It should be included with 
> Fedora/RedHat and will probably work a lot better than just using 
> regular expressions.

Agreed: HTML::Parser is the best easy solution.
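
For what it's worth, here is a minimal sketch of the HTML::Parser route,
assuming the goal is to pull link targets and link text out of a page.
The extract_links name and the sample markup are made up for
illustration; adjust the handlers to whatever tags you actually need.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return a list of { href => ..., text => ... }
# hashes, one per <a href="..."> element found in the HTML string.
sub extract_links {
    my ($html) = @_;
    my (@links, $current);

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Tag and attribute names arrive lowercased.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $tag eq 'a' && defined $attr->{href};
            $current = { href => $attr->{href}, text => '' };
        }, 'tagname, attr' ],
        # 'dtext' is the text with entities already decoded. Text may
        # arrive in several chunks, so append rather than assign.
        text_h => [ sub {
            my ($text) = @_;
            $current->{text} .= $text if $current;
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'a' && $current;
            push @links, $current;
            undef $current;
        }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Deliberately sloppy sample input: mixed case, unquoted attribute,
# unclosed <p>.
my $sample = '<p>See <A HREF="/docs/">Docs &amp; more</a> or <a href=/news>News</A>';
printf "%s => %s\n", $_->{href}, $_->{text} for extract_links($sample);
```

The handlers only see one event at a time, so any state you need across
events (here, the link currently being collected) has to live in
variables the closures share.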

An alternative would be to run the HTML through tidy[1] to convert it 
to well-formed XHTML and then use a full-fledged XML parser to grab 
your data. That would probably allow you a bit more flexibility than 
HTML::Parser.
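
A rough sketch of the tidy route, under a few assumptions: tidy is on
your PATH, XML::LibXML is the XML parser (my pick for illustration;
any XML parser would do), and the tidy_to_xhtml and links_from_xhtml
names are hypothetical. The -asxml flag asks tidy for well-formed
XHTML and -numeric makes it emit numeric entities, so the XML parser
does not need the XHTML DTD.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: run tidy on a file and return the XHTML it
# writes to stdout.
sub tidy_to_xhtml {
    my ($file) = @_;
    open(my $tidy, '-|', 'tidy', '-quiet', '-asxml', '-numeric', $file)
        or die "cannot run tidy: $!";
    my $xhtml = do { local $/; <$tidy> };
    # tidy exits with status 1 on mere warnings, so the result of
    # close is deliberately not treated as fatal here.
    close $tidy;
    die "tidy produced no output for $file\n"
        unless defined $xhtml && length $xhtml;
    return $xhtml;
}

# Hypothetical helper: return { href => ..., text => ... } hashes for
# every <a href="..."> in an XHTML string.
sub links_from_xhtml {
    my ($xhtml) = @_;
    my $doc = XML::LibXML->load_xml(
        string       => $xhtml,
        load_ext_dtd => 0,    # do not fetch the XHTML DTD
        no_network   => 1,
    );

    # tidy puts elements in the XHTML namespace, so XPath needs a
    # prefix bound to it.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

    return map {
        { href => $_->getAttribute('href'), text => $_->textContent }
    } $xpc->findnodes('//x:a[@href]');
}

# Usage: perl links.pl page.html
if (@ARGV) {
    printf "%s => %s\n", $_->{href}, $_->{text}
        for links_from_xhtml(tidy_to_xhtml($ARGV[0]));
}
```

The extra flexibility comes from XPath: changing the query string is
enough to grab table cells, headings, or anything else by position or
attribute, without writing new event handlers.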

--Paul Heinlein <heinlein at madboa.com>

[1] http://tidy.sourceforge.net/



