IBM

01.12.09

Entity extraction is the process of automatically extracting document metadata from unstructured text. Extracting key entities such as person names, locations, dates, specialized terms and product terminology from free-form text lets organizations not only improve keyword search but also open the door to semantic search, faceted search and document repurposing. This article defines the field of entity extraction, outlines some of the technical challenges involved, and shows how RDF can be used to store document annotations. It then explains how new tools such as Apache UIMA are poised to make entity extraction much more cost-effective for organizations.
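To make the idea concrete, here is a minimal sketch of what extracting entities and storing them as RDF annotations might look like. It is not the approach from the article itself and it does not use Apache UIMA: it assumes a toy gazetteer plus one regular expression for dates, and it writes the results as N-Triples lines. The `http://example.org/` namespace, the predicate names (`hasAnnotation`, `entityType`, `text`, `begin`, `end`) and the function names are all hypothetical, picked only for illustration.

```python
import re

# Hypothetical namespace and predicate names, chosen only for illustration.
EX = "http://example.org/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# A tiny gazetteer standing in for a real dictionary or trained model.
GAZETTEER = {
    "IBM": "Organization",
    "Apache UIMA": "Product",
}

# Matches dates written as two-digit groups separated by dots, e.g. 01.12.09.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{2}\b")


def extract_entities(text):
    """Return (start, end, surface_text, entity_type) tuples sorted by offset."""
    found = []
    for term, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(term), text):
            found.append((match.start(), match.end(), match.group(), entity_type))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.end(), match.group(), "Date"))
    return sorted(found)


def to_ntriples(doc_id, entities):
    """Serialize each annotation as five N-Triples lines about one annotation node."""
    doc = f"<{EX}doc/{doc_id}>"
    lines = []
    for i, (start, end, surface, entity_type) in enumerate(entities):
        ann = f"<{EX}doc/{doc_id}/annotation/{i}>"
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = surface.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{doc} <{EX}hasAnnotation> {ann} .")
        lines.append(f"{ann} <{EX}entityType> <{EX}{entity_type}> .")
        lines.append(f'{ann} <{EX}text> "{escaped}" .')
        lines.append(f'{ann} <{EX}begin> "{start}"^^<{XSD_INT}> .')
        lines.append(f'{ann} <{EX}end> "{end}"^^<{XSD_INT}> .')
    return lines


if __name__ == "__main__":
    sample = "Posted 01.12.09 under IBM: a note on Apache UIMA."
    for line in to_ntriples("1", extract_entities(sample)):
        print(line)
```

Keeping the character offsets on each annotation node is one possible design choice: it lets the same stored triples drive both faceted search (filter by `entityType`) and highlighting in the original text. A real system would replace the gazetteer and regex with a proper annotator pipeline.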

01.16.08

Back when I was an industry analyst (VP, E-Business Strategies at the META Group, since acquired by Gartner), I often had to critique emerging markets. Unlike venture capitalists, industry analysts are privy to product roadmaps from publicly traded companies, including the industry giants (Oracle, SAP, Microsoft, IBM). And unlike i-bankers, they are privy to product roadmaps from start-ups. And as a kicker, some analysts (really only those at the largest firms, which back then meant primarily Gartner, Forrester, META and Giga) get a lot of great feedback from CIOs and other end users.