Data Source Handbook: A Guide to Public Data

The Data Source Handbook by Pete Warden provides a concise and handy guide to some of the main sources of public data accessible on the web today. It’s a very short book of 40 pages. This in itself does not count against the book. These sources change rapidly, so committing an exhaustive survey to a printed volume would damn it to almost instant obsolescence. It would also prevent any treatment of individual data sources in any useful detail.


As it is, Warden is able to pick a select few and identify their strengths and available APIs in a useful fashion. He organises the sources into logical categories and identifies some key sources for each:

  • Websites
  • People
  • Search terms
  • Locations
  • Companies
  • IP Addresses
  • Books, films, music and products

He selects the key open providers of data in these areas and systematically shows how to access the information, along with simple programmatic instructions. In a volume of such limited length you would not expect to find extensive instructions or discussion – and you won’t. What you have is a very concise survey identifying the key players and giving a nutshell indication of what you can use the data sources for.
This is a useful and quick reference for anyone routinely accessing, compiling, aggregating or augmenting their own datasets. Although very few of the sources identified would be new to most people in the data analysis space, the book does provide a useful compilation and a handy, concise reminder of how one might augment a limited dataset quickly in an automated fashion.
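To give a flavour of what that kind of automated augmentation can look like, here is a minimal Python sketch of my own. It is not taken from the book. The `augment` helper, the `fake_ip_lookup` function and the sample data are all hypothetical; in practice the lookup would wrap a call to whichever API you choose, and I have stubbed it out so the example runs without a network connection.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def augment(
    records: List[Record],
    key: str,
    lookup: Callable[[str], Optional[Record]],
) -> List[Record]:
    """Return copies of records with extra fields from lookup(record[key])."""
    cache: Dict[str, Record] = {}
    augmented: List[Record] = []
    for record in records:
        value = record.get(key)
        # Look each distinct value up only once, to be kind to rate limits.
        if value is not None and value not in cache:
            cache[value] = lookup(value) or {}
        extra = cache.get(value, {}) if value is not None else {}
        # Fields already in the record take priority over looked-up ones.
        augmented.append({**extra, **record})
    return augmented


# Hypothetical stand-in for a real IP-to-location service (made-up data).
FAKE_GEO = {"192.0.2.1": {"country": "Exampleland"}}


def fake_ip_lookup(ip: str) -> Optional[Record]:
    return FAKE_GEO.get(ip)


rows = [
    {"user": "alice", "ip": "192.0.2.1"},
    {"user": "bob", "ip": "198.51.100.7"},
]
result = augment(rows, "ip", fake_ip_lookup)
```

Passing the lookup in as a function keeps the merging logic separate from any particular provider, so the same loop could in principle be pointed at a different source for each category of data.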
This is an easily accessible, well-organized volume whose only major failing is that it will become dated in published form. However, as an eBook it is ideal, and I would recommend it to anyone new to the area of data visualisation looking for some great sample data to access, or to the more seasoned data traveller looking to keep their familiarity with the wide variety of available data current.

I review for the O'Reilly Blogger Review Program
