Wikipedia's Structured Data Challenge

Wikipedia contains more than 15 million articles in 265 languages. These articles are flat text; structured data is presented in standardized layouts using community-developed templates. Templates offer a potential path to extracting structure from the article source text, and indeed, projects like DBpedia have done so. Other projects, like Semantic MediaWiki, aim to enrich wiki syntax and the template system with relation types, backend storage, queries, and data export functionality.
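To make the template-based extraction idea concrete, here is a minimal sketch of pulling key/value pairs out of a template call in article source text. It is an illustration only, not how DBpedia or any other project implements extraction: the function name `extract_template` and the sample wikitext are invented for this example, and real wikitext has many cases (comments, nowiki sections, positional parameters) that this sketch ignores.

```python
def extract_template(wikitext, name):
    """Return the first template call whose name starts with `name`.

    Hypothetical helper for illustration. The result is a dict with the
    template name and its named parameters, or None if no balanced
    template call is found.
    """
    start = wikitext.find("{{" + name)
    if start == -1:
        return None

    depth = 0      # nesting level of {{...}} and [[...]] combined
    parts = []     # top-level segments separated by "|"
    current = []   # characters of the segment being collected
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:          # keep nested markup verbatim
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:         # matching close of the outer template
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            # A pipe at the top level separates parameters
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    else:
        return None  # ran off the end: braces were unbalanced

    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # positional (unnamed) parameters are skipped in this sketch
            params[key.strip()] = value.strip()
    return {"template": parts[0].strip(), "params": params}


# Invented sample article source, for illustration only
sample = """{{Infobox settlement
| name = Exampleville
| country = [[Examplestan]]
| area = {{convert|12.5|km2}}
}}
'''Exampleville''' is a fictional town."""

print(extract_template(sample, "Infobox"))
```

The extracted values are still raw wikitext (links and nested templates are kept verbatim), which hints at why turning templates into clean, typed data takes considerably more work than locating them.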

The Wikimedia Foundation, which operates Wikipedia, is not blind to the potential of these technologies. Because of limited resources, we are focusing our attention on the most pressing user experience issues first. However, we have begun exploring smarter ways to manage templates inside Wikipedia, which could become a foundation for further improvements to structured data in Wikipedia. In this presentation, we will discuss our thinking to date, as well as some of the longer-term challenges in managing structured data inside a massively collaborative, multilingual community. As an open source project, we will appeal to SemTech attendees to help us address these challenges.