Comparing Word Clouds

The folks at Many Eyes recently introduced their new comparison cloud tool. Basically, it lets you visualise two fragments of text, displaying word frequency for each in the same cloud. It’s an interesting addition to the more familiar word cloud.

cloud3.jpg

Using a standard word cloud you get a matrix of words with relative size, weight or colour highlighting frequency in a selected text. This quickly allows you to visually perceive an author or speaker’s emphasis on a particular theme or style of writing or speaking. With Many Eyes’ hybrid tool, words which occur in both texts are abutted. You can now visually compare two texts from the same author for similar emphasis or quickly determine a difference between texts. In the example presented at Many Eyes, they compare the US presidential State of the Union addresses from 2002 and 2003. In this example they note the less frequent mention of Afghanistan and the increase in mentions of Saddam. Whether or not this allows one to conclude a change in policy, it does demonstrate the use of the tool for provoking questions for further exploration.

On Saturday, the Ontario government officially announced how much funding each university in Ontario is to receive for maintenance and renewal of facilities. I just happened to see announcements from a few institutions appear simultaneously in my RSS reader and was struck by the rather different ways in which they presented this news.

At McMaster, there was a relatively terse announcement that provided very little detail on how the money would be spent. Western, on the other hand, had a pretty picture and a complete list of how much was being distributed to each institution. The University of Guelph was more detailed than McMaster and provided very precise details of what the money would be spent on. I was struck by the differences, so I thought I’d see how I might quickly use text analysis tools to compare the announcements.

I rely on two sources for tools such as these: TAPoRWare and ManyEyes. For industrial strength analysis and fast results, I use TAPoRWare tools. By simply choosing the URLs of the announcements from two universities I receive a wealth of information about the announcements. I am particularly interested in extremes in this case. What makes each announcement similar, and what apparent differences are there? Taking a look at a chart of common words and their frequency of use is a first attempt at this.

tapor.jpg
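For anyone who wants to try this kind of common-word comparison by hand, here is a minimal Python sketch. It is not how TAPoRWare works internally; the function names and the two sample sentences are placeholders made up for illustration, not the actual announcements.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def common_words(text_a, text_b):
    """List (word, count_in_a, count_in_b) for words found in both texts.

    Rows are sorted by combined frequency, highest first, then alphabetically.
    """
    a, b = word_counts(text_a), word_counts(text_b)
    shared = a.keys() & b.keys()
    rows = [(w, a[w], b[w]) for w in shared]
    return sorted(rows, key=lambda r: (-(r[1] + r[2]), r[0]))


# Placeholder sentences (assumed for illustration only)
guelph = "Campus renewal funding will shape the future of the campus."
mcmaster = "Renewal funding supports facilities renewal."

for word, n_a, n_b in common_words(guelph, mcmaster):
    print(f"{word:10} {n_a:3} {n_b:3}")
```

A real comparison would probably also drop very common words such as "the" and "of" before counting, which this sketch does not do.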

A similar chart was created showing me words that appeared only in one or the other, and I was immediately struck by the fact that campus didn’t occur at all in the McMaster announcement, whereas it was the most frequent word at Guelph. On the other hand, McMaster emphasized engineering and psychology. Yet neither word occurred in the text of the announcements. The reason for this was my use of the web addresses of the announcements, as opposed to the text of the announcements themselves. The TAPoRWare tool analysed all the text on the page, and McMaster’s announcement page contained summaries of a variety of other announcements, thus ‘polluting’ my analysis. Thankfully there is an easy way to fix this. By choosing to upload only the text of the announcements themselves (and thus helping the tool know just what is important to me) I can get the results I want to consider.

tapor2.jpg
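The "only in one or the other" view can be sketched in much the same way. Again, this is an assumption-laden illustration rather than TAPoRWare's actual method: the function names are hypothetical and the sample strings stand in for announcement text that has already been separated from the rest of the page.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def unique_words(text_a, text_b):
    """Return two Counters: words found only in text_a, and only in text_b."""
    a, b = word_counts(text_a), word_counts(text_b)
    only_a = Counter({w: n for w, n in a.items() if w not in b})
    only_b = Counter({w: n for w, n in b.items() if w not in a})
    return only_a, only_b


# Placeholder sentences (assumed for illustration only)
guelph = "Campus renewal funding will shape the future of the campus."
mcmaster = "Renewal funding supports facilities renewal."

only_guelph, only_mcmaster = unique_words(guelph, mcmaster)
print(only_guelph.most_common(3))
print(only_mcmaster.most_common(3))
```

Feeding in the whole web page instead of just the announcement would inflate both lists with navigation and sidebar text, which is the pollution problem described above.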

Voila! Now I can see that Guelph emphasizes the future and campus, whereas McMaster emphasizes renewal. Interesting. I want to consider this further, but I am far more a visual thinker. While these bar charts are pleasing, and take a wealth of data and distill it into a very nice summary, I want to take it one step further. Word clouds are a way of accomplishing this, as I mentioned above.

ManyEyes’ new tool gives me a way to quickly accomplish this comparative analysis. Unfortunately, I can’t just point Many Eyes at the web pages and have it capture the text. I had grabbed the text files above to better focus TAPoRWare, and so it was a matter of copying and pasting the text from each of the university web pages and inserting a short comment line between the fragments. Then you simply upload it to Many Eyes by pasting it into a text box, applying some meta information (a title, source and description) and clicking the upload button. Once uploaded, you can choose from a variety of available visualization tools. Choosing the word cloud tool immediately presents you with a default cloud display. In this case, Many Eyes noted the fragment dividers and automatically selected the comparison cloud type. I could have overridden this option if the fragment dividers were actually part of the text I was analysing.

The texts that I was analysing are somewhat shorter than the examples that the Many Eyes blog featured, and one thing that became apparent was that shorter texts may yield far fewer comparable words. Nonetheless, for the ones that are identified, one might be inspired to consider whether lack of emphasis is reflective of institutional priorities. In the word cloud in this post, I compare the announcements from York and Western.

wordcloud1.jpg

York seems to be emphasizing campus renewal, whereas Western seems to focus on funding as a concept.

To further refine the analysis, you can also choose word pairs on the display and change the cloud to show the most frequent pairs of words. Unfortunately, in my samples, campus renewal and facilities renewal are the only two repeated pairs, with York favouring both.

wordCloud2.jpg
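Counting word pairs is a small step up from counting single words. The sketch below is one simple way to do it in Python and makes no claim about how Many Eyes builds its pair clouds; the function names, the `minimum` threshold and the sample text are all assumptions for illustration.

```python
import re
from collections import Counter


def word_pairs(text):
    """Count each pair of adjacent words in the text.

    Punctuation is ignored, so pairs can span sentence boundaries.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


def repeated_pairs(text, minimum=2):
    """Keep only the pairs that occur at least `minimum` times."""
    return {pair: n for pair, n in word_pairs(text).items() if n >= minimum}


# Placeholder text (assumed for illustration only)
sample = ("Campus renewal and facilities renewal. "
          "Campus renewal continues; facilities renewal too.")

for (first, second), n in repeated_pairs(sample).items():
    print(f"{first} {second}: {n}")
```

With short texts like the announcements, very few pairs clear even a threshold of two, which matches what the pair cloud showed.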

If we consider the announcement from Guelph versus the one from York, the word future looms very large in the Guelph announcement. Does this mean they have a vision for the future, or that they fear the future? The word York is the single most frequent word in York’s own announcement, whereas references to Guelph in Guelph’s announcement are rare. Does this suggest that York University is far more interested in self-promotion? These are the sort of questions that call for further investigation, and they underline the danger of trying to use word clouds on their own. Word clouds are all the rage and can be very powerful, but as with any visualization tool, they call for consideration of shortcomings as well as strengths.

By the way, Western’s comprehensive list of where all the funding went begged for a bar chart of the distribution amongst institutions. Click on the chart below to go to ManyEyes to see it.

funding.jpg
