Data Munging with Perl: Techniques for Data Recognition, Parsing, Transformation and Filtering

Data Munging with Perl: Techniques for Data Recognition, Parsing, Transformation and Filtering by David Cross

Binding: Paperback
Number of Pages: 300
ISBN: 1930110006
Product Group: book
Publisher: Manning Publications
Publication Date: Feb. 14, 2001
BooksForGeeks.com ID: 1387

Reviews for Data Munging with Perl: Techniques for Data Recognition, Parsing, Transformation and Filtering

  1. No-nonsense resource for meat and potatoes Perl scripting

    Rated 4 out of 5 stars, July 12th, 2007

    The quintessential Perl activity is data processing, particularly in a Unix environment, where output is piped into a script from some other program, transformed, and spat out again. Many people's first encounter with Perl will probably be in this task. David Cross's book shows how to do this with the minimum of fuss and the maximum of flexibility. It's not a Perl tutorial, however, so you will need some basic knowledge of Perl; having read The Llama is enough. There is an appendix of 'essential Perl' to refresh your memory if you're a bit rusty.

    The book begins by revising some of those basic Perl practices that come in handy for scripting, e.g. command line options, regular expressions and sorting. The second part of the book deals with parsing fairly simple data: traditional fixed-width record data (e.g. the column-based stuff that you often find as the output of old Fortran and C programs), unstructured data (e.g. doing word counts on text files), and formats such as CSV, PNG and MP3. This is the strongest section of the book, and contains lots of useful hands-on information.

    The third part of the book deals with more modern forms of data files, in the shape of XML. Parsing HTML also gets a chapter to itself, after the author usefully demonstrates the limitations of any simple solution (e.g. using regexes), which provides pretty strong evidence in favour of the standard 'don't try it yourself, use a CPAN module' argument. The XML chapter itself covers the XML::Parser module in reasonable detail. However, there are now many more XML parsers in Perl out there, and XML::Parser is probably no longer the best solution (Grant McLean's Perl XML FAQ on the net has a good overview of the options). Excluding the seemingly obligatory 'here's a bunch of books and websites to learn more' chapter, the last proper chapter is on parsing and the Parse::RecDescent module, and it's a very good, gentle introduction.

    If you're not working in a command line environment, there's not a whole lot here you're going to need. Equally, if you've been doing this sort of thing for a while, there's not much here that will be new to you, and not all the subjects are explored in any great depth. Some of it (particularly the XML chapter) is a bit outdated and superficial, so I would knock a star off my rating if you're more interested in the XML/HTML chapters.

    But for the simpler tasks, e.g. parsing column-based data, this is recommended. You're shown all the handy tricks you need, such as piping, taking input from standard input as well as files, slurping paragraphs, etc. My 4-star rating applies if this sounds like what you need: it's a clear, short and to-the-point book, which is definitely worth taking with you on your first journey into data munging.

  2. Very good 2-3 months after "Learning Perl"

    Rated 4 out of 5 stars, February 12th, 2002

    This is an excellent book if you have recently started Perl, have got beyond "Learning Perl" but have not got into some of the more advanced books. However, if you have been studying Perl for longer and have read around the subject, then much of this book will be old news, and some of the topics are very much "tasters", not imparting sufficient information to use in anger.

  3. Solid Gold - Easy Parsing

    Rated 5 out of 5 stars, March 12th, 2001

    This is a juicy, if slim, book (283 pages incl. index). It is worth its weight in gold if, like me, you are not a Perl porter but inhabit the fat end of the Perl expertise pyramid. The author munges for a living, and it shows in a non-academic, on-the-money, practical book on converting, filtering and parsing data. This book adds value, even if, like me, you have all the O'Reilly Perl books, because (1) it is really easy to understand, (2) it gives a valuable conceptual overview, (3) it gives trench-proven tips, and (4) best of all, it shows you how to do it.

    I too found the chapter on Parse::RecDescent easy to understand, having struggled with Damian's TPJ article. Buy this book to get on your way to being an expert munger.

  4. High quality slim volume

    Rated 4 out of 5 stars, March 12th, 2001

    This book is well written, and quite informative. But it should not be read in isolation, as it does gloss over quite a lot so that it remains focused on the key topic "data munging".

    If I have any criticism, it's with the physical properties of the book: I'm so used to ORA books for Perl that I find the different paper quality and fonts jarring.

  5. Worth every cent!

    Rated 5 out of 5 stars, February 12th, 2001

    I bought this book solely for the XML parsing section; it explained in four lines what other documentation couldn't in 100 lines. It turns out the other chapters are just as brilliant!

    If you do _any_ data manipulation with Perl then GET this book. It explains what Perl data structures (or modules) to use with what data, and then how to use that data structure...

    The author writes and thinks like a programmer, so programmers can ultimately understand the topics/concepts and not just cut-paste someone else's code.

    I also recommend 'OO Perl' by Damian Conway (same publisher).

    These two Manning books are the only non-O'Reilly books I own.
