Have Semantic Technologies Crossed the Chasm Yet?


This article kicks off a series of interviews on Semantic Technologies in the MIT Entrepreneurship Review with industry thought leaders including Thomas Tague (Thomson Reuters), Chris Messina (Google), David Recordon (Facebook), Will Hunsinger (Evri) and Jamie Taylor (Metaweb).

At first sight, the answer is yes. I recently attended the Semantic Technology Conference in San Francisco. What began in 2005 as a 300-person conference has grown into a five-day event with over 1,300 participants this year and an impressive depth of workshops and panels. The conference is organized by Semantic Universe, an online platform with the goal of “educating the world about semantic technologies and applications”.

I have had the opportunity to talk to some of the key actors and innovators who have pushed semantic technologies and linked data forward in the years since the term “Semantic Web” was coined by Sir Tim Berners-Lee of the World Wide Web Consortium (W3C). The term takes on different meanings in different contexts: to some, it is about representing information in well-defined formats that make it machine-readable and easy to interpret; to others, it is about web services and the aggregation of information into valuable applications for users; still others highlight the artificial intelligence aspect and its use in tackling complex problems.

I have personally been drawn to the field of semantic technologies for some time, realizing the impact these technologies will have on the way we consume information online as well as the possibilities from an enterprise perspective. One thing I realized at the conference was that a lot of things we take for granted today, like online recommendations, are already powered by semantic technologies. In fact, many of the conversations happening in the hallways between sessions were not just about technical topics, like how best to construct OWL ontologies or structure SPARQL queries, but about business issues: designing the right monetization models, improving e-commerce with semantic technologies, and gauging the potential business impact of Facebook’s Open Graph, Twitter Annotations, or Google’s Rich Snippets. The New York Times, the BBC, Newsweek, Tesco, and Best Buy are some examples of companies that have been building and relying on semantic technologies. To me, these are all strong indicators that semantic technologies have reached the tipping point.
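For readers unfamiliar with SPARQL, the query language those hallway debates revolve around, here is a minimal sketch of what one looks like. The dataset and the `ex:` vocabulary are made up purely for illustration; they do not come from any of the companies mentioned above:

```sparql
# Hypothetical query: list film titles and their release dates,
# most recent first. The "ex:" vocabulary is illustrative only.
PREFIX ex: <http://example.org/schema#>

SELECT ?title ?date
WHERE {
  ?film a ex:Film ;
        ex:title ?title ;
        ex:released ?date .
}
ORDER BY DESC(?date)
LIMIT 10
```

The pattern-matching style is the point: instead of joining named tables, the query matches graph patterns, so the same query can run against any dataset that uses the same vocabulary.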

Jamie Taylor, Minister of Information at Metaweb, the company behind Freebase, sees clear indications that semantic technologies have become more mainstream:  “Just the sheer size of the conference has increased pretty dramatically, as well as the diversity of people who actually have commercial offerings in terms of tools that matter to your typical webmaster, your typical content manager.” While there is still a strong academic track to semantic technologies, Taylor says, “it’s very interesting that sometimes semantic technologies have met the Web 2.0 lightweight user contribution-type model and as you add semantics into these types of systems – fairly lightweight semantics – all of a sudden they start getting much greater benefit.”

Managing one of the best-known semantic technology start-ups, Will Hunsinger, CEO of Evri, tells me that he has “seen a lot more activity in the last 12 months”. Citing Microsoft’s acquisition of Powerset and Apple’s acquisition of Siri as examples, he also points out that these “transactions have given validation that the technology is here and ready, but also that there is a path to liquidity.” One piece of advice he offers to startups and companies in the semantic technologies sector is to focus less on the technology itself and spend more time understanding consumers’ needs by asking themselves: “What does this technology do better than what’s out there such that you are going to solve a real problem?” At Evri, he adds, “we create a better experience for the consumer by applying the technology where it actually has a distinct advantage over keyword search, e.g. delivering precise results around general topics like ‘movies’ or ‘reality tv’, understanding meaning and context (e.g. why is a particular entity popular right now) or even enabling consumers to follow topics over time”.

From a technological perspective, the recent developments around RDFa, a set of HTML attribute extensions that let publishers embed RDF metadata directly in their web pages, will further accelerate the growth of the Semantic Web. Drupal 7, one of the biggest open source content management systems, used on hundreds of thousands of websites, ships with major RDFa functionality. The latest HTML5 draft includes RDFa support. Facebook’s Open Graph protocol is based on RDFa. Google Rich Snippets support RDFa. According to a recent GigaOM report, Twitter Annotations are looking to use it.
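To make that concrete, here is a minimal sketch of RDFa in practice. The names and FOAF properties are illustrative: the visible text is ordinary HTML, while attributes like `about`, `typeof`, and `property` layer machine-readable statements (RDF triples) over it:

```html
<!-- Illustrative snippet: human-readable text, machine-readable facts. -->
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="#author" typeof="foaf:Person">
  <span property="foaf:name">Jane Doe</span> is interested in
  <span property="foaf:topic_interest">Linked Data</span>.
</div>
```

Facebook’s Open Graph tags follow the same pattern, e.g. `<meta property="og:title" content="..." />` in a page’s head, which is why a webmaster who has marked up one can adopt the others with little extra effort.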

The benefits of semantic technologies are most obvious in online search, and to some extent already observable today. David Recordon, Senior Open Programs Manager at Facebook, sees some powerful applications in search, essentially “giving you a filter into the world based on your friends”. Thanks to semantic technologies built into the Facebook platform, “developers [can] build on top of information which people have trusted Facebook with, whether that’s status updates or things they like, people they are connected to […]”. Google’s Open Web Advocate, Chris Messina, told me he agrees that social search will play a key role in the future: “we are starting to see Google integrating Twitter streams in search experience, hopefully providing users with more actionable information, providing a number of different opinions, more contextual data. It is certainly something Google is paying a lot of attention to – information that is contextual to the user, not just generic to the world.”

But what about exploiting the power of the semantic web by pulling in data from different sources, the premise of linked data? Thomas Tague, VP Platform Strategy at Thomson Reuters and in charge of the OpenCalais project, a free service that analyzes and extracts concepts from user-submitted texts or web sources, told me about the exciting opportunities he sees at the intersection of highly trusted monetized content and free web content. He says that “people are not going to make $100 million bets based on blog postings. But that blog posting may be an outlier, may be an initial indicator, maybe about a layoff at a factory or something like that, that the user can now immediately link back to Thomson Reuters data and gain insight and take action.” While Tague certainly shares the enthusiasm for the growth of semantic technologies and the adoption of standards by industry participants, utilization of linked data remains low in his view. His short-term outlook on utilization of the linked data cloud therefore remains rather cautious: “There is a lot of talk about it, but with respect to our linked-data company information, people aren’t picking it up yet very much.”

So what can we expect in the near future? Jamie Taylor tells me that he thinks “the idea that you can aggregate is something very novel: all of a sudden my data is not limited to my data silo.” He distinguishes two types of data: core data, which must be managed by the organization to drive the core business, and context data, such as geo data. He believes that what “semantic technologies allow is in some sense to outsource [context data] to the community for maintenance.”

Overall, there seems to be consensus that as semantic technologies move out of the purely technical corner and beyond the innovators and early adopters in academia and government, content-heavy organizations and users such as publishers and e-commerce sites will help these technologies cross the chasm, as they stand to gain the most from applying them. As pointed out earlier, companies like The New York Times and Best Buy have already begun to build on and rely on semantic technologies. As more companies adopt linked data standards and share data in the linked data cloud, we will see more businesses created to deliver value to their users by aggregating data across different datasets.

If this article has sparked your interest in semantic technologies, I can recommend a documentary by Kate Ray, a recent graduate of NYU with a major in Journalism/Psychology, who has contributed to the demystification of the Semantic Web through interviews with thought leaders, including Tim Berners-Lee, Clay Shirky, Chris Dixon, David Weinberger, Nova Spivack, Jason Shellen, Lee Feigenbaum, John Hebeler, Alon Halevy, David Karger and Abraham Bernstein. The clip has been viewed by more than 120,000 people so far. I asked Kate what motivated her to make the documentary: “My dad has been doing semantic web stuff for years, and my entire family never really knew what he was doing, so partly I was trying to make something that all these people here could show to their friends and family. I also had an academic interest in it.” Kate is now working on a company called Kommons, which she describes as a “Q&A forum built on top of Twitter; to let people ask questions to public figures – or anyone – and backing questions you agree with”.

MIT is at the forefront of exploring applications to commercialize linked data and semantic technologies, adding a new seminar, Linked Data Ventures, to the fall curriculum. The class will be taught by an all-star team consisting of Sir Tim Berners-Lee, Dr. Lalana Kagal, K. Krasnow Waterman, as well as Reed Sturtevant and Katie Rae. Computer science and business students will work in small teams to develop prototypes based on Semantic Web technologies.

About The Author

Rene Reinsberg is currently a member of the Entrepreneurship & Innovation program at MIT. His interests span Linked Data, Big Data, Open Data, and social graph analytics.