High Precision Entity Extraction: A U.S. State Department Case Study – SemTech 2009 Audio


Joseph C. Wicentowski, U.S. Department of State
Dan McCreary, Dan McCreary and Associates

The U.S. State Department’s Office of the Historian has embarked on an ambitious effort to migrate its diplomatic history document archive from paper to an enriched electronic medium for online consumption. Because of Congressional mandates, we hold extremely high standards for semantic precision and accuracy, which make this unique resource useful to a broad audience that includes scholars, government officials, and the general public. Furthermore, the new format allows us to repurpose our content and integrate it with “mashup” applications such as timelines and geographical map views.

This case study reviews the U.S. State Department’s requirements and the decision process that led us to adopt high-precision semantic markup standards supported by both our tools and our vendors. We will show concrete examples of how precise identifiers for people, locations, and events allow us to enrich the display of our documents online.

We will also review the full document lifecycle and the need for automated but high-quality entity extraction tools to minimize document conversion costs. Finally, we will discuss some of the tradeoffs others may face when advanced technology decisions carry both risks and rewards for the digital historian.

In this presentation we will:

  • Review business requirements for a high precision entity extraction application
  • Describe our semantic approach
  • Demonstrate entity extraction
  • Demonstrate timeline and other mashups
  • Summarize project benefits

Audio: High Precision Entity Extraction – A US State Department Case Study.mp3 (54.54 MB)

Speaker Profiles:

After completing a Fulbright grant in Asia for his doctoral research and receiving his Ph.D. in History from Harvard University, Joseph C. Wicentowski joined the U.S. Department of State’s Office of the Historian. As a digital historian, he has taken a leadership role in digital history management, developing new digital formats for the Department’s archive of U.S. diplomatic and foreign affairs documents, which reach back to the founding of the historian’s office in 1861. He has led the development of a new website for these documents, based on a native XML database, and is working to bring the benefits of data visualization, metadata management, and other digital history applications to the federal government and the public. He has particular interests in XML, XQuery, and U.S. and Chinese history.

Dan McCreary is an enterprise data architect and strategist living in Minneapolis. He has worked for organizations such as Bell Labs and Steve Jobs’s NeXT Computer, and he founded his own consulting firm of over 75 people. He has a background in object-oriented programming and declarative XML languages (XSLT, XML Schema design, XForms, XQuery, RDF, and OWL). He has published articles on various technology topics, including the Semantic Web, metadata registries, enterprise integration strategies, XForms, and XQuery. He is the author of the XForms Tutorial and Cookbook.