Six Weeks to the Semantic Web

We set out on a six-week quest to harness the Semantic Web to find insight by compiling and combining our enterprise data from disparate systems. We aspired to convert single points of data with limited individual value into a collective database where semantically linked information could provide more power.

The catch: we hadn’t a clue where to begin, we had only six weeks to complete the mission, and we had no budget. This is the story of our journey, one we believed at the outset was unlikely to succeed but pursued anyway, first to learn, and then to apply what we learned in other ways.

Ask and Ye Shall Retrieve

The two of us had worked on an enterprise search project earlier, so we confined our mission to the area of search to keep the goal manageable. It also helped that the user interface and other elements were already in place, eliminating any need to build a delivery method for our semantic system.

Quite naturally, we thought search would be the main enabler of retrieving this new enterprise knowledge. One would have to ask a question, after all, to retrieve the appropriate answer from this knowledge store. Or so we thought.
We had access to many tools for locating subject matter expertise and capturing lessons learned, systems that we are constantly improving. Just last year we piloted a subject matter expertise locator with good usability, a fine user interface, and superb notification mechanisms. From an IT standpoint, the pilot was executed flawlessly: we invested heavily in change management, communicated everything well, and people were excited.

But, in the beginning, few people used it. It wasn’t until we talked to Andy Schain at NASA that we understood why, and the conclusion of the pilot confirmed his answer. The problem, it turns out, is not with the technology but with basic human nature: people in many cases do not like to ask questions in a public forum.

A Who-You-Know World

Andy’s approach to getting people to ask more questions was to map who knew whom, so that a person could approach someone they knew and were comfortable with, who in turn could ask the expert, instead of asking the expert directly. So that’s what we decided to do: we set out to connect the dots between people in our business, building a database of who knew whom. It is important to start small and very, very narrow. Take baby steps to avoid mountainous problems arising from multiple dimensions of information teeming with complexity, diversity, trust, size, and rate-of-growth issues. Any one of these can be your downfall when you are a beginner. If you have to tackle one of these issues, pick a project that is small and quick.

We chose data that was readily available on screen and keyed everything to the lowest common denominator; in our case that was the employee ID, which we encrypted for testing purposes. For our data sources, we used our LDAP systems for basic directory services information and a subject matter expertise locator, our best-practices repository of experts. We thought we could associate those experts with our Six Sigma repository of projects. In other words, we picked systems we knew had a lot of trust but not much usage, with a slow and stable growth rate, whose data was not particularly diverse or complex.

From there, we assigned simple values for commonalities among people. One value was assigned for working under the same supervisor five years ago, another for working under the same supervisor now. A value was similarly assigned for attending the same school, obtaining the same degree, working on the same project, and so on. The higher the total, the more closely related any given pair of people in a functional unit. In effect, we could now know who knew whom, and how.
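The weighting scheme is simple enough to sketch in a few lines. The weights and field names below are illustrative, not the values we actually used; assume each person is a flat record of attributes:

```python
# Hypothetical commonality weights; the real values were chosen by hand.
WEIGHTS = {
    "supervisor_now": 3,   # same supervisor today
    "supervisor_5yr": 2,   # same supervisor five years ago
    "school": 1,           # attended the same school
    "degree": 1,           # obtained the same degree
    "project": 2,          # worked on the same project
}

def relatedness(a, b):
    """Sum the weights of every attribute the two records share."""
    return sum(w for field, w in WEIGHTS.items()
               if a.get(field) and a.get(field) == b.get(field))

alice = {"supervisor_now": "J. Smith", "project": "Kaizen-12"}
bob = {"supervisor_now": "J. Smith", "project": "Kaizen-12"}
print(relatedness(alice, bob))  # prints 5
```

Summing per-attribute weights like this keeps the model transparent: anyone can see exactly why two people scored as related.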

Non-technically, Technically Speaking
At first we spent our time reading and exploring the many tools out there, most of which are available as open source or free trials. The various tutorials were very helpful in increasing our understanding. Although we learned a great deal, we were unable to put everything together and make it work, especially when it came to screen scrapers, since none of our data was embedded with RDFa, eRDF, or microformats.
Then we learned that MIT’s Simile Piggy Bank tool can do faceted browsing and that Solvent, a screen scraper, runs on top of Piggy Bank. That was the glue we needed, and it facilitated merging the different data sources.
Lastly, we learned of RDF-izers, which convert information in many different types of files into RDF you can drop into Piggy Bank. This is a very useful way of integrating your data and making it available. Several pages list RDF-izers, including the Simile web site, MindSwap, and W3C’s Semantic Web wiki.
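At its core, an RDF-izer just maps each record’s fields onto statements about a common subject. A minimal sketch, with a made-up base URI and plain N-Triples strings standing in for a real RDF library:

```python
# Toy RDF-izer: flatten a directory record into N-Triples lines.
# The base URI and property scheme are hypothetical; real RDF-izers
# (see the Simile site) handle many input formats.
BASE = "http://example.com/employee/"

def to_ntriples(record):
    subject = f"<{BASE}{record['id']}>"
    return [f'{subject} <{BASE}prop/{field}> "{value}" .'
            for field, value in record.items() if field != "id"]

for triple in to_ntriples({"id": "E123", "name": "A. Jones", "dept": "Quality"}):
    print(triple)
```

Every source, whatever its original format, ends up as the same subject-predicate-object statements, which is what makes the later merging step trivial.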

Solvent forces you to think about the unique identifier you are going to use to integrate the data. Once you select an identifier and give it a unique name in the system, anything you pile into the system with your screen scraper falls neatly into place. The Simile project also offers a Semantic Bank, which lets you publish your local Piggy Bank data into a shared repository for a broader view of the integrated data. This type of bank would sit behind the firewall, of course. But there are also several public Semantic Banks, most notably the original at MIT.
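The payoff of a shared identifier is that triples from independent sources merge without any join logic: matching subjects simply accumulate. A sketch with invented data, using the employee ID as the common key:

```python
# Triples scraped from two separate systems; subjects share a purely
# illustrative (and here unencrypted) employee-ID key "emp:...".
ldap_triples = {
    ("emp:E123", "name", "A. Jones"),
    ("emp:E123", "dept", "Quality"),
}
expertise_triples = {
    ("emp:E123", "expertise", "Six Sigma"),
    ("emp:E456", "expertise", "Lean"),
}
# "Merging" is just set union once the identifiers line up.
graph = ldap_triples | expertise_triples

def facts_about(subject):
    """Everything the combined graph knows about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

print(facts_about("emp:E123"))
# prints [('dept', 'Quality'), ('expertise', 'Six Sigma'), ('name', 'A. Jones')]
```

Pick the identifier badly and nothing lines up; pick it well and the combined view emerges for free.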

Money Talk
Our start-up costs were zero. The two of us worked mostly after hours and on weekends on this six-week project and used our laptops and a couple of spare PCs lying around the office.

Other pressing considerations include security and privacy issues. As much as we wanted to provide all the info we found to enterprise users, it simply wasn’t prudent to do so. Even though the same data could be found in various existing silos, putting the information together in one place reveals more than any portion of the data alone. Sometimes the compilation and analysis reveals too much about the company or invades an individual’s space beyond legal limits or societal acceptance. We highly recommend that you plan to run into these issues from the outset and bring the right people to the table early to curtail any potential liability problems.

We arrived at the end of our quest both elated and exhausted, convinced the Semantic Web could move our enterprise ahead in new, meaningful, and efficient ways. The many frustrations and obstacles we faced along the way were nothing when weighed against this new and intriguing competitive edge.