Big Data Glossary by Pete Warden

The *Big* Data Glossary is actually a relatively *short* book, best enjoyed as an eBook in my estimation. It is similar to a number of recent O’Reilly releases that have moved away from being deep and comprehensive toward offering a higher-level, taste-test overview from a more conceptual standpoint. In this instance, the Big Data Glossary by Pete Warden could also be described as an annotated bibliography of the many tools and platforms that have recently emerged for working with linked data or large, rich datasets.

This glossary moves through the basic services and components that could be used to build a comprehensive research environment for data mining or for creating deep visualisations for analysis. The concise volume is designed to provide context for further exploration of the tools and services it defines, and it offers useful links for that exploration. The anticipated audience might be an academic researcher new to these areas or a developer transitioning from a more traditional data background. Although brief, the volume does much to draw together a qualified list of services, and it accomplishes a great deal by identifying the stronger current players and summarizing the strengths and weaknesses of each. In this regard, you might consider it more of a technical industry survey. It is a valuable wee tome for getting up to speed quickly with the players and for learning how to judge services within categories as diverse as on-demand storage, data visualisation and natural language processing. Much like Designing Data Visualizations, which I previously reviewed, this volume could fit very nicely into an introductory syllabus and serve as an excellent guide in an introduction to data processing or digital research methodologies.

I have no real criticisms of this book. It’s short and concise, and although you’d certainly like more information, it does what it bills itself as doing, and it does it well. Again, it is the sort of book that lends itself to an electronic format, since its content is by definition constantly changing and evolving. If anything, the textual descriptions of the various services might work better in a tabular format, which would make it easier to compare features, strengths and weaknesses across services, but that’s what Wikipedia is for. The descriptions are brief enough that you will read at least a whole chapter (if not the entire volume) in one sitting and come away with an informed understanding of a particular space.

I would recommend this book to anyone who needs to bring themselves up to speed quickly on the available services in a specific area of data processing, to those wishing to keep current with emerging players, and to those facing the task of developing requirements documents that may need to cite specific technologies (or, for that matter, who want to speak in real-world terms about conceptual solutions).
