Reuters released the API for their Calais web service last week. I dabbled with it briefly at the time, and then was reminded of it earlier today. I took a closer look and came away very impressed and thoughtful about the application of this technology. Calais accepts text and quickly extracts a variety of metadata about your content or, as they phrase it, “automatically annotates your content with rich semantic metadata.” Currently it attempts to determine references to:

  • Entities: city, company, continent, country, industryTerm, MoneyAmount, Organization, Person, ProvinceOrState, Region and URL;
  • Events/Facts: acquisition, alliance, bankruptcy, businessRelation, buybacks, companyEarningsAnnouncement, companyEarningsGuidance, companyInvestments, companyLegalIssues, jointVenture, ManagementChange, merger, personPolitical, personPoliticalPast, PersonProfession, PersonProfessionalPast, stockSplit

This is a rather rich collection of metadata, and they aim to expand from here. They have released the API, which is rich and simple, along with a couple of demo applications to experiment with. A web-based application allows for an immediate glimpse of the potential. As an article today on Calais at ReadWriteWeb explains, there’s as much in this for potential users as for Reuters itself. Refining and evolving the semantic tagging capability requires constant application of the technology, and offering this open API to Calais is the route to getting it.

As a test, I grabbed the article from RWW and had results in about one second.

[Screenshot: calaisScreen.jpg, Calais demo output for the RWW article]

The screen also includes the embedded RDF below this demonstration output.

As a rather unusual business proposition, they are also offering bounties to private developers to accomplish certain tasks. The first is a 5K bounty to develop a WordPress plug-in that takes advantage of the service.

For historical research, a service such as Calais offers the ability to quickly reverse the normal search process. Rather than searching for words of interest and dispatching a robot to wander about and return matches to the parameters you define, you could simply browse through a list of identified terms. Because the terms have been semantically identified, this allows for serendipitous discovery rather than targeted searching. Combining the meta terms with a frequency count, you could allow the most common terms to rise to the top and potentially gain an additional browsing reference.
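
To make the frequency idea concrete, here is a rough Python sketch of how such a browsing list might be built. It assumes the identified terms have already been pulled out of the service's output into simple (type, term) pairs; that shape, the function name `build_browse_index`, and the sample data are all made up for illustration and are not the actual Calais format.

```python
from collections import Counter, defaultdict


def build_browse_index(annotations):
    """Group identified terms by type and rank them by frequency.

    `annotations` is an iterable of (term_type, term) pairs. This is a
    hypothetical, simplified shape, not the real Calais RDF output.
    """
    counts = defaultdict(Counter)
    for term_type, term in annotations:
        counts[term_type][term] += 1
    # most_common() orders terms by descending count, so frequent terms
    # rise to the top of each type's list.
    return {t: c.most_common() for t, c in counts.items()}


# Made-up sample annotations, as if collected from several documents.
sample = [
    ("Company", "Reuters"),
    ("Company", "Reuters"),
    ("City", "London"),
    ("Company", "ReadWriteWeb"),
    ("Person", "Jane Doe"),
    ("City", "London"),
    ("Company", "Reuters"),
]

index = build_browse_index(sample)
for term_type, ranked in index.items():
    print(term_type)
    for term, n in ranked:
        print(f"  {term} ({n})")
```

Browsing the resulting lists by type, rather than typing a query, is the reversed search process described above; a real version would first need to parse whatever the service actually returns.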