randomosity

strikingly random thoughts and 'maximum data existentialisation'

Data Analysis with Open Source Tools

Posted by shawnday on 8 January 2011
Posted in: Technology, Visualization. Tagged: O'Reilly, Review, Statistics.

Data Analysis with Open Source Tools by Philipp K. Janert is a superb, solid and exhaustive synthesis of instruction, workshops and hands-on exercises designed for those serious about conducting professional data analysis. This is not a lightweight undertaking. This is a serious get-down-to-it and do-it-right kind of manual. The author (as has been mentioned elsewhere) is passionate about his subject and it shows. He knows how to convey the most complex concepts in an approachable and effective way.

The title of the book suggests a very hands-on approach, and possibly that it is more about applying known techniques using specific tools. The book delivers on this through a series of ‘workshops’ attached to each chapter. These sections lead you through specific data analysis tasks using particular packages (primarily Python and R) in a very useful manner. But don’t be deceived: the book is far more than these workshops and, to its credit, provides extensive grounding in the theory and principles of data analysis itself, so that you understand what you are applying.

Whether you are approaching your first attempts at data analysis and feel unsure about the practise, have been using simple tools such as Excel to carry out your analysis, or are looking to hone your techniques by exploring the power of R and moving on to presenting your analytical findings, you will find this volume a superb choice. Spending time working through the workshops will build a firm foundation of practical capability to extend the theory, and additionally provides a superb grounding for approaching courses dealing more directly with the tools themselves, such as the volumes on R that I have explored previously.

Throughout the book, Janert follows a progression: presenting the data (to find the patterns) -> modelling the data (to explore) -> mining the data computationally (to understand the data) -> and finally applying the data (to actually use it in real-world instances). He uses examples liberally to maintain engagement and, stylistically, asks the reader questions and offers constant reminders to keep you moving through what is, I remind you, pretty heavy material. This is not a book to try in one sitting, nor, however, is it a reference manual. It is really a course that should be approached over a suitable length of time.

One of the questions the author poses early on is ‘what’s with the math?’, and he assures you that if you do find the math intimidating, it’s worth the time to familiarise yourself and gain some comfort with it, as it is necessary should you really want to carry out effective data analysis. He’s right, and the way the book is structured you do need to take this advice on board. This will limit the book to readers who, if uncomfortable with the concepts, are willing to commit. But as I mentioned above, this is a serious book, and the author seems to have made a commitment himself to deliver the material and asks for a bit of reciprocation. Now, I (possibly less than fondly) recall much of this from the distant haze of undergraduate statistics or mathematical economics, but there is a collection of great aids in the appendices to the volume to help you out, and these are well presented. I really could have used these 25 years ago when I was struggling through these courses.

All in all, this is a very good book. It actually does more than it promises and delivers a comprehensive and effective course in data analysis with superb hands-on exercises to drive home the learning.

I review for the O'Reilly Blogger Review Program