Data Source Handbook: A Guide to Public Data

The Data Source Handbook by Pete Warden provides a concise and handy guide to some of the main sources of public data accessible on the web today. It’s a very short book of 40 pages, but that in itself does not count against it. These sources are changing rapidly, and committing an exhaustive survey to a printed volume would condemn it to almost instant obsolescence. An exhaustive survey would also prevent any treatment of individual data sources in useful detail.


As it is, Warden is able to pick a select few and identify their strengths and available APIs in a useful fashion. He organises the sources into logical categories and identifies some key sources for each:

  • Websites
  • People
  • Search terms
  • Locations
  • Companies
  • IP addresses
  • Books, films, music and products

He selects the key open providers of data in these areas and systematically shows how to access the information, along with simple programmatic instructions. In a volume of such limited length you would not expect to find extensive instructions or discussion, and you won’t. What you have is a very concise survey identifying the key players and giving a nutshell indication of what you can use the data sources for.

This is a useful and quick reference for anyone routinely accessing, compiling, aggregating or augmenting their own datasets. Although very few of the sources identified would be new to most people in the data analysis space, the book does provide a useful compilation and a handy, concise reminder of how one might quickly augment a limited dataset in an automated fashion.

This is an easily accessible, well-organised volume whose only major failing is that it will become dated in published form. As an eBook, however, it is ideal, and I would recommend it to anyone new to the area of data visualisation looking for some great sample data to access, or to the more seasoned data traveller looking to stay current with the wide variety of available data.

