Archives for category: Text Analysis

The Data Source Handbook by Pete Warden provides a concise and handy guide to some of the main sources of public data accessible on the web today. It’s a very short book of 40 pages, but that in itself does not count against it. These sources are changing rapidly, and committing an exhaustive survey to a printed volume would damn it to almost instant obsolescence. It would also prevent any treatment of individual data sources in useful detail.

 


In Mining the Social Web, Matthew A. Russell offers to instruct in identifying social connections, trends in discussion and locations by tapping into social media data. He succeeds in spades. This fast-paced and rich handbook jumps right into the fray and provides an immediate and useful exercise in accessing the Twitter API using Python and doing a very quick visualisation of trending subjects. I was hooked, and immediately and greedily consumed a few more of his lessons. His approach is to go direct to real-world applications: why you’d want to mine data from social media such as Twitter, Buzz and Facebook, and how to utilise other freely available tools such as Google Maps to look for patterns and present solid research findings.


The folks at Many Eyes recently introduced their new comparison cloud tool. Basically, it lets you visualise two fragments of text, displaying word frequency for each in the same cloud. It’s an interesting addition to the more familiar word cloud. Using a standard word cloud you get a matrix of words, with relative size, weight or colour highlighting frequency in a selected text. This quickly allows you to visually perceive an author’s or speaker’s emphasis on a particular theme or style of writing or speaking. With the Many Eyes hybrid tool, words which occur in both texts are abutted. You can now visually compare two texts from the same author for similar emphasis, or quickly determine a difference between texts. In the example presented at Many Eyes, they compare the US presidential State of the Union addresses from 2002 and 2003. In this example they note the less frequent mention of Afghanistan and the increase in mentions of Saddam. Whether or not this allows one to conclude a change in policy, it does demonstrate the use of the tool for provoking questions for further exploration.
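For anyone curious how the numbers behind a comparison cloud might be put together, here is a minimal sketch in Python. It is not the Many Eyes implementation, just one way to count word frequencies in two texts and line up the words side by side so that shared words sit on the same row. The function names and the two sample strings are made up for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count occurrences of each lowercase word in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def compare_texts(text_a, text_b):
    """Return (word, count_in_a, count_in_b) rows, most frequent overall first."""
    a, b = word_counts(text_a), word_counts(text_b)
    rows = [(word, a[word], b[word]) for word in set(a) | set(b)]
    # Sort by combined frequency (descending), then alphabetically for stable output.
    return sorted(rows, key=lambda row: (-(row[1] + row[2]), row[0]))


if __name__ == "__main__":
    # Made-up sample texts, not real speech transcripts.
    first = "the network is the value and the value is shared"
    second = "the value of a tool is in the questions it provokes"
    for word, n_first, n_second in compare_texts(first, second):
        print(f"{word:12} {n_first:3} {n_second:3}")
```

A word with a non-zero count in both columns is one the tool would abut; a zero in either column marks a word unique to one text.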

On Saturday, the Ontario government officially announced how much funding each university in Ontario is to receive for maintenance and renewal of facilities. I just happened to see announcements from a few institutions appear simultaneously in my RSS reader and was struck by the rather different ways in which they presented this news.


A year ago I wrote a recipe for the TAPoR project to demonstrate a way for historians to utilize text analysis tools to plumb historical data from Google. In the recipe, a user aggregated search results from Google and used the TAPoR DateFinder tool to rapidly construct a chronology. This rather basic operation has now been automated by the folks at Google Labs. Now, with the simple addition of two words to your search request, you can choose to view the familiar text search results in two exciting additional contexts, temporal and spatial. The new Google Timeline and Map views are a powerful but simple tool for historians and others as well.
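The recipe idea, pulling dates out of a pile of search snippets and sorting them into a chronology, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it is not how the TAPoR DateFinder or Google's timeline view works, it only recognises four-digit years from 1000 to 2099, and the function name and sample snippets are hypothetical.

```python
import re

# Assumption: a "date" is any standalone four-digit year between 1000 and 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def build_chronology(snippets):
    """Return (year, snippet) pairs sorted by year, one pair per distinct year mentioned."""
    events = []
    for snippet in snippets:
        for year in sorted(set(YEAR.findall(snippet))):
            events.append((int(year), snippet))
    # sorted() is stable, so snippets sharing a year keep their input order.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    # Placeholder snippets standing in for aggregated search results.
    results = [
        "The charter was granted in 1867.",
        "Fighting began in 1812 and ended with a treaty in 1814.",
        "An undated description of the site.",
    ]
    for year, text in build_chronology(results):
        print(year, "|", text)
```

Snippets with no recognisable year simply drop out, which is a reminder that a chronology built this way is only as complete as the dates the pattern can find.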

I admit it…I had no idea what a contronym was until I saw this fascinating list of examples. A contronym is a word which is its own antonym. Get it? Maybe not. The word custom, for example, means both usual and special. I like that one! Check out this web page, where there is a wonderful smattering of words that meet this criterion: ordinary everyday words which we use and possibly have never thought twice about. English can be such a funny language.

This is just really cool. At Knutz.net, Thomas Broome has a series of pictures drawn entirely with words…really. Interior landscapes composed of chandeliers drawn using only the word chandelier. Wonderful little details to appreciate.

Clever lads have run the CES address of Bill Gates and the Macworld keynote by Steve Jobs through a variety of text analysis tools to get an idea of why one has greater impact than the other. The article demonstrates that there is a huge difference in the complexity of the message. Jobs delivers short, easily comprehended sentences, where Gates tends to use longer sentences with more complex language. The word clouds generated from the speeches are not that different in terms of focus. In both, the most frequent references were to the products being featured. Interestingly, this contrasted with Michael Dell’s CES presentation, which seemingly used much more ambiguous language with less direct reference to particular products. There’s also a slider-based version linked to the article that offers an alternative way to view the clouds. Unfortunately, unless you use the arrow keys (i.e. read the small print), it seems next to impossible to click on the magic spot to get Gates’s cloud displayed.
This exercise raises the question of magic, however, and whether it is merely the message and not the actual technology being presented that enthralls the audience. One would expect that the concept of the iPhone itself may actually be more appreciable than Windows Vista, and that Michael Dell simply didn’t talk as much about products because he didn’t have any exciting new product to introduce. Nonetheless, a fun little intellectual exercise.
Gates in fact doesn’t seem to have always had the product focus that he does now. There is a word cloud timeline of his communications, and it is only recently that products have begun to be referenced with high frequency.
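The sort of complexity comparison described in the keynote analysis can be roughed out with two very simple measures: average sentence length and average word length. The sketch below is my own illustration in Python, not the tools used in the article. The sentence splitting is deliberately naive (it breaks on full stops, question marks and exclamation marks), and the function name is hypothetical.

```python
import re


def readability_stats(text):
    """Return simple complexity measures for a piece of text."""
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "letters_per_word": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample text, not a transcript of either keynote.
    sample = "It works. It is fast! You will like it."
    for name, value in readability_stats(sample).items():
        print(f"{name}: {value}")
```

Run over two transcripts, a lower words-per-sentence figure for one speaker would be consistent with the "short, easily comprehended sentences" observation, though published readability scores use more refined formulas than this.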

Everyone knows that the value is in the network. Social network analysis (SNA) is a wonderful tool for academics, and I am using it to map my local webs of commerce. The folks at Spoke, however, are doing this one step better. They have the typically enormous and touted list of key decision makers and influencers at companies around the world. Nothing short of a big spam list there. However, on joining the network, you contribute your own contact list. Again, nothing revolutionary in that…but here’s where it gets interesting. The little client that harvests your contacts for Spoke also measures how connected you are (betweenness, in SNA-speak), based on the frequency and nature of contact in your email history. Sure, it’s not flawless, but when you overlay this with all the other participants, they are building one mega web and creating a potentially rich map of influence flows. It raises some serious privacy and trust issues, but it is clearly pushing the envelope one step beyond. Many CRM apps are out there trying to build similar webs in an automated fashion, but they generally require huge rejigging and manual creation of hierarchical relationships by the user. Few actually automate the process, let alone start to weight the results through contactedness (not connectedness) mining. Intriguing.
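To make the idea of weighting ties by frequency of contact a little more concrete, here is a small Python sketch. It is not Spoke's algorithm, which I have not seen, and it is not a full betweenness calculation. It assumes a hypothetical email log of (sender, recipient) pairs, counts how often each pair of people is in contact, and sums those counts into a rough per-person "contactedness" score.

```python
from collections import Counter


def tie_strengths(messages):
    """Count messages exchanged between each unordered pair of people."""
    ties = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # ignore notes to self
            ties[frozenset((sender, recipient))] += 1
    return ties


def contact_scores(messages):
    """Sum each person's tie strengths across all of their contacts."""
    scores = Counter()
    for pair, weight in tie_strengths(messages).items():
        for person in pair:
            scores[person] += weight
    return scores


if __name__ == "__main__":
    # Hypothetical email log: (sender, recipient) pairs with placeholder names.
    log = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("cat", "dan")]
    for person, score in contact_scores(log).most_common():
        print(person, score)
```

Overlaying the logs of many participants would just mean concatenating their message lists before scoring. A proper betweenness measure would then look at shortest paths through the resulting weighted graph, which a graph library can compute.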