Semantic Tuesdays

Reuters released the API for their Calais web service last week. I dabbled with it briefly at the time, and was reminded of it earlier today. I took a closer look and came away very impressed and thoughtful about the application of this technology. Calais accepts text and quickly extracts a variety of metadata about your content, or, as they phrase it, “automatically annotates your content with rich semantic metadata.” Currently it attempts to identify references to:

  • Entities: city, company, continent, country, industryTerm, MoneyAmount, Organization, Person, ProvinceOrState, Region and URL;
  • Events/Facts: acquisition, alliance, bankruptcy, businessRelation, buybacks, companyEarningsAnnouncement, companyEarningsGuidance, companyInvestments, companyLegalIssues, jointVenture, ManagementChange, merger, personPolitical, personPoliticalPast, PersonProfession, PersonProfessionalPast, stockSplit.
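To make the two categories concrete, here is a minimal Python sketch that sorts annotations into entities and events/facts using the type names listed above. It assumes, hypothetically, that annotations have already been pulled out of the service's response as (type, text) pairs; the helper name `group_annotations` and that input format are illustrative, not part of the Calais API. Matching ignores case because the names above mix capitalization.

```python
from collections import defaultdict

# Type names from the lists above (two misspellings normalized). Stored in
# lower case so matching ignores capitalization.
ENTITY_TYPES = {t.lower() for t in (
    "city", "company", "continent", "country", "industryTerm", "MoneyAmount",
    "Organization", "Person", "ProvinceOrState", "Region", "URL",
)}
EVENT_TYPES = {t.lower() for t in (
    "acquisition", "alliance", "bankruptcy", "businessRelation", "buybacks",
    "companyEarningsAnnouncement", "companyEarningsGuidance",
    "companyInvestments", "companyLegalIssues", "jointVenture",
    "ManagementChange", "merger", "personPolitical", "personPoliticalPast",
    "PersonProfession", "PersonProfessionalPast", "stockSplit",
)}


def group_annotations(annotations):
    """Split hypothetical (type, text) pairs into entities, events/facts,
    and anything whose type appears in neither list."""
    groups = defaultdict(list)
    for ann_type, text in annotations:
        key = ann_type.lower()
        if key in ENTITY_TYPES:
            groups["entities"].append((ann_type, text))
        elif key in EVENT_TYPES:
            groups["events"].append((ann_type, text))
        else:
            groups["other"].append((ann_type, text))
    return dict(groups)


# Placeholder input, not real Calais output.
print(group_annotations([("Person", "Jane Doe"), ("merger", "Acme/Globex")]))
# {'entities': [('Person', 'Jane Doe')], 'events': [('merger', 'Acme/Globex')]}
```

Keeping an "other" bucket means any type the service adds later is still collected rather than silently dropped.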

This is already a rich collection of metadata, and they intend to expand it. Along with the API, which is both rich and simple, they have released a couple of demo applications to experiment with. A web-based application offers an immediate glimpse of the potential. As an article on Calais at ReadWriteWeb explains today, there’s as much in this for potential users as for Reuters itself: refining and evolving the semantic tagging capability requires constant application of the technology, and offering Calais through an open API is one way to get it.

As a test, I grabbed the article from RWW and had results in about 1 second.

[Screenshot: Calais demo results for the RWW article]

The page also shows the embedded RDF below the demonstration output.

In a rather unusual business move, they are also offering bounties to private developers for specific tasks. The first is a 5K bounty to develop a WordPress plug-in that takes advantage of the service.

For historical research, a service such as Calais offers a way to quickly reverse the normal search process. Rather than searching for words of interest and dispatching a robot to wander about and return matches to the parameters you define, you could simply browse through a list of identified terms. Because those terms have been semantically identified, browsing them allows for serendipitous discovery rather than targeted searching. Combining the metadata terms with a frequency count would let the most common terms rise to the top, potentially providing an additional browsing reference.
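The frequency idea could be sketched roughly as follows. This assumes, hypothetically, that each document's Calais output (the RDF mentioned above) has already been parsed into a list of (type, term) pairs; the function name `build_browse_index` and that input shape are illustrative only. It groups terms by type and orders each group by how often the term appears across the whole collection, giving a browsable list with the most common terms on top.

```python
from collections import Counter, defaultdict


def build_browse_index(documents):
    """Map each annotation type to its terms, most frequent first.

    `documents` is an iterable of per-document lists of (type, term) pairs
    (a hypothetical format; real service output would need parsing first).
    """
    per_type = defaultdict(Counter)
    for annotations in documents:
        for ann_type, term in annotations:
            per_type[ann_type][term] += 1
    # most_common() with no argument returns every term, highest count first;
    # terms with equal counts keep the order in which they were first seen.
    return {ann_type: counts.most_common() for ann_type, counts in per_type.items()}
```

This counts raw occurrences, so a long document that repeats a name many times will push it up the list; counting each term at most once per document would be a simple variation if that bias matters.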

One comment

  1. I’m glad the little demo app I wrote was useful for you, Shawn. Thanks for the link. I’ll be adding some more functionality over the weekend so that the data from Calais is actually used.
