Big Data Glossary by Pete Warden

The *Big* Data Glossary is actually a relatively *short* book, best enjoyed as an eBook in my estimation. This volume is similar to a number of recent releases from O’Reilly that have moved from being deep and comprehensive to providing a higher-level taste-test overview from a more conceptual standpoint. In this instance, the Big Data Glossary by Pete Warden could also be described as an annotated bibliography of the variety of tools and platforms that have recently emerged to work with linked data or large and rich datasets.

This glossary moves through the basic services and components that could be employed to create a comprehensive research environment for data mining or to build a deep visualisation for analysis. The concise volume is designed to provide a context for further exploration of the various tools and services it defines, and it offers useful links for such exploration. The anticipated audience might be an academic researcher new to the areas mentioned or a developer transitioning from a more traditional data background. Although brief, the volume does much to draw together a qualified list of services, and it accomplishes much by identifying the stronger current players and summarizing the strengths and weaknesses of each. In this regard you might consider this book more of a technical industry survey. It is a valuable wee tome for getting up to speed quickly with the players and knowing how you might judge services within a particular category, in categories as diverse as on-demand storage, data visualisation or natural language processing. Much like Designing Data Visualisations, which I previously reviewed, this volume too could fit very nicely into an introductory syllabus and provide an excellent guide for an introduction to data processing or digital research methodologies.

I have no criticisms of this book. It’s short and concise, and although you’d certainly like more info, it does what it bills itself to do. And it does it well. Again, it is the sort of book that lends itself to an electronic format, as the content by definition is constantly changing and evolving. If anything, the textual descriptions of the various services could probably be presented in a tabular format, which would facilitate better cross-service evaluation of features, strengths and weaknesses, but that’s what Wikipedia is for. The descriptions here are brief enough that you will read through at least a chapter as a whole (if not the entire volume) and come away with an informed understanding of a particular space.

I would recommend this book to anyone needing to quickly bring themselves up to speed on the available services in a specific area of data processing, those wishing to keep current with emerging players, or those facing the task of developing requirements documents that may need to provide definite technological references (or, for that matter, those who want to speak in real-world terms about conceptual solutions).

