Perl & LWP

Perl & LWP by Sean M. Burke

Binding: Paperback
Number of Pages: 264
ISBN: 0596001789
Product Group: book
Publisher: O'Reilly Media
Publication Date: June 20, 2002
BooksForGeeks.com ID: 1379

Perl and LWP explains how to write programs that browse the Web, using the excellent Library for the World Wide Web, or LWP. It is aimed at developers who already know both Perl and HTML, although you don't need to be an expert in either.

The fascination of this topic is that it makes you see the Web in a different way, not as a set of pages for users to browse, but as a huge database for your programs to explore. The most robust technique for querying Web sites programmatically is through XML Web Services, but this approach is in its infancy. LWP takes a different route, called screen-scraping. In essence, your Perl code pretends to be a browser and grabs HTML for processing. Using LWP you could write a command-line program to search your favourite auction site, fetch news headlines, or check multiple retail sites for the best prices. As the author acknowledges, the problem with screen-scraping is its brittleness: if the target Web site adopts a new look, it breaks your code. There are also interesting fair usage issues. Even so, it's a powerful technique with many possible applications.

This clear and concise guide comes complete with typically terse Perl code examples. Topics include LWP basics, posting form data, processing results with regular expressions, using trees to process HTML, imitating different browser types, and supporting cookies programmatically. An appendix offers handy information like HTTP status codes, character tables, and MIME types. LWP is large, but while this title does not attempt to cover all the modules, it does provide all you need to start coding your own Web-mining programs.--Tim Anderson
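To make the screen-scraping idea in the review concrete, here is a minimal sketch of the "fetch HTML, then process it with a regular expression" pattern. It is not taken from the book. The agent string `ExampleScraper/0.1`, the helper names `fetch_html` and `extract_headlines`, and the assumption that headlines sit in `<h2>` elements are all illustrative; a real site will need its own pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fetch a page much as a browser would. The agent string is a made-up
# placeholder; LWP::UserAgent is loaded on demand so the parsing helper
# below can be used without touching the network.
sub fetch_html {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(agent => 'ExampleScraper/0.1', timeout => 15);
    my $response = $ua->get($url);
    die "Cannot fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Pull the text of every <h2> element out of an HTML string.
# Assumption: the target page marks its headlines with <h2>.
sub extract_headlines {
    my ($html) = @_;
    my @headlines;
    while ($html =~ m{<h2[^>]*>(.*?)</h2>}gis) {
        my $text = $1;
        $text =~ s/<[^>]+>//g;      # drop any nested tags
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @headlines, $text if length $text;
    }
    return @headlines;
}

# Only fetch when run as a script with a URL argument.
if (!caller() && @ARGV) {
    print "$_\n" for extract_headlines(fetch_html($ARGV[0]));
}
```

The brittleness the review mentions shows up directly in the regular expression: if the site stops using `<h2>` for headlines, `extract_headlines` returns nothing and the pattern has to be rewritten.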

Reviews for Perl & LWP

  1. Fabulous book!

    Rated 5 out of 5 stars, August 12th, 2002

    This book is a comprehensive and authoritative guide to web automation. It reads as both a gentle tutorial and a well-organized reference. Basic HTTP operation, regexp HTML parsing, tokenizing, cookie authentication, form handling, and robot spidering are covered extensively in numerous case studies and practical examples.

    Naturally, I was impressed by the simple, consistent treatment of examples: inspect source and find the interesting bits, code things up and then enhance to suit. :-)

    What I find particularly satisfying is the sane way of working that the author assumes. So many people seem to just bungle their way through web programming while ignoring basics like the robots.txt file. This book helps to prevent this.

    One would think that only a thick tome would be sufficient to cover such vast territory, but the author (who is an active LWP module developer) does a fabulous job covering this extensive subject matter.

    I recommend this book both to anyone starting out on their way to working with the underside of the web and to accomplished professionals in need of a full reference manual.

  2. Does exactly what it says on the tin!

    Rated 5 out of 5 stars, August 12th, 2002

    A satisfyingly short (242p) book that covers its subject perfectly. The examples are well written and explained and are ideal for using as a starting point for your own work. Within 15 minutes I had written a script to fetch pages of football results from a web site, process the data and produce files for uploading to my database. Previously I did this by downloading the HTML and editing it by hand; automating it will save me about 30 minutes a week.

    Of course it's an O'Reilly title, so the attractive layout, typography and attention to detail go without saying. I would heartily recommend this book to anyone who wants to automate the extraction of data from the web: follow Sean's guidance and you'll be productive sooner than you thought possible.
