Data Analysis with Open Source Tools

Data Analysis with Open Source Tools by Philipp K. Janert is a superb, solid and exhaustive synthesis of instruction, workshops and hands-on exercises designed for those serious about conducting professional data analysis. This is not a lightweight undertaking. It is a serious, get-down-to-it and do-it-right kind of manual. The author (as has been mentioned elsewhere) is passionate about his subject, and it shows. He knows how to convey the most complex concepts in an approachable and effective way.

The title of the book suggests a very hands-on approach, and possibly that it is more about applying known techniques with specific tools. The book delivers on this through a series of ‘workshops’ attached to each chapter. These sections walk you through particular data analysis tasks using particular packages (primarily Python and R) in a very useful manner. But don’t be deceived: the book is far more than these workshops, and to its credit it provides an extensive grounding in the theory and principles of data analysis itself, so that you understand what you are applying.

Whether you are approaching your first attempts at data analysis and feel unsure about the practice, have been using simple tools such as Excel for your analysis, or are looking to hone your techniques by exploring the power of R or moving on to presenting your analytical findings, you will find this volume a superb choice. Working through the workshops builds a firm foundation of practical capability on top of the theory, and also provides a superb grounding for courses that deal more directly with the tools themselves, such as the volumes on R that I have explored previously.

Throughout the book, Janert follows a progression: presenting the data (to find the patterns), modelling the data (to explore it), mining the data computationally (to understand it) and, finally, applying the data (to actually use it in real-world settings). He uses examples liberally to maintain engagement and, stylistically, constantly asks the reader questions and offers reminders to keep you moving through what, I must stress, is pretty heavy material. This is not a book to attempt in one sitting, nor, however, is it a reference manual. It is really a course that should be approached over a suitable length of time.

One of the questions the author poses early on is ‘what’s with the math?’, and he assures you that if you find it intimidating, it’s worth the time to familiarise yourself and gain some comfort with it, because it is necessary if you really want to carry out effective data analysis. He’s right, and given the way the book is structured you do need to take this advice on board. Among readers uncomfortable with the concepts, this will limit the book to those willing to commit. But as I mentioned above, this is a serious book, and the author seems to have made a commitment himself to deliver the material and asks for a bit of reciprocation. I (possibly less than fondly) recall much of this from the distant haze of undergraduate statistics and mathematical economics, but the appendices contain a collection of great aids to help you out, and these are well presented. I really could have used them 25 years ago when I was struggling through those courses.

All in all, this is a very good book. It actually does more than it promises and delivers a comprehensive and effective course in data analysis with superb hands-on exercises to drive home the learning.

I review for the O'Reilly Blogger Review Program
