The Pfizer IDEA project: An Interview with Franz’s Jans Aasman and IO Informatics’ Robert Stanley

Semantic Universe editor Tony Shaw recently spoke with Jans Aasman, CEO of Franz Inc., and Robert Stanley, President & CEO of IO Informatics, about the announcement of their new strategic partnership to deliver ‘fit for purpose’ applications created by an innovative Semantic application framework. Their partnership has already seen success with the Pfizer IDEA pilot, which serves as a real-world example of using a semantic application in the pharma industry. This pilot was used to integrate data for compound purity verification and drug product stability analysis. The IDEA project was originally expected to take four to six months to produce results, but by using the AllegroGraph-Sentient framework, it was completed in only six weeks.

Tony Shaw:  Can you describe what your company does in terms of semantics?

Jans Aasman:  Franz is the leading supplier of commercial, persistent, and scalable RDF graph database products. The company provides solutions for combining unstructured and structured data using the W3C-standard RDF, both for creating new Web 3.0 applications and for identifying new Business Intelligence opportunities in the enterprise. AllegroGraph, Franz’s flagship product, is a high-performance graph database capable of storing and querying billions of RDF statements. AllegroGraph is unique in that it provides robust transactional support and, unlike other RDF stores, allows many clients to perform inserts, deletes, and queries concurrently, even on databases with billions of triples/quads.

Robert Stanley:  IO Informatics’ software tools apply semantic technology to integrate heterogeneous data more efficiently and dynamically than previously possible. This technology solves information integration, knowledge management, and project management problems that were impractical to address in the past. The Sentient software suite integrates data, applications, databases, and instruments into one secure, interoperable environment. Users can access, query, and apply data from multiple internal and external sources regardless of original format and location. Administrators and end-users can create effectively targeted, data-driven informatics applications without programming, making it possible to create and apply knowledge far more efficiently and effectively.

Tony Shaw:  Why is there such interest in the Pharma world in Semantic technologies? What can Semantics do that could not be done (or done cost effectively) before?

Robert Stanley:  Pharma and life sciences have an unusually high barrier to entry for effective and timely data integration due to the complex, changing and interrelated nature of scientific data and applications. Semantic technologies remove cost, time, flexibility and extensibility barriers for enterprise data integration and application delivery in environments characterized by large, complex, changing datasets and integration requirements.

Semantic technologies make it possible to create the required data model “on the fly”, to integrate multiple data sources, and to modify and add new relationships and data sources as needed. This enables mission-critical Pharma applications to be delivered faster and more efficiently than before, including applications that were previously ruled out by the time and cost of data integration. Semantic technologies excel at delivering high-value integration applications that simply were not practical to create, maintain, or extend using traditional object-oriented and relational technologies.

The Sentient Suite allows for quick, simple integration and knowledge exploration at a competitive price. Technological advances, including cost-effective triple stores like Franz’s AllegroGraph, have brought performance and pricing in line with non-semantic methods, with results that are far more flexible and dynamic than those from traditional integration methods.

Tony Shaw:  Can you please explain the AllegroGraph-Sentient framework that you developed?

Jans Aasman:  The AllegroGraph-Sentient framework allows users to integrate, analyze, and use complex data at a fraction of the cost and time typically required, greatly reducing the expense and lag time associated with traditional data warehouses. AllegroGraph provides the scalable storage component for RDF semantic triples, while Sentient lets you visually explore semantic networks of information and even compose queries with simple mouse clicks instead of writing complicated query code. Sentient integrates with AllegroGraph to capitalize on the scalability and unique features AllegroGraph provides.

Tony Shaw:  What problems did it solve?  Why was it successful?  What is it about the AllegroGraph-Sentient framework that yielded such productivity gains?

Robert Stanley:  This has been a very successful integration, and Franz is a great company to work with. For example, working together we have been able to deliver data integration far more rapidly than anticipated. In one customer case, we connected data that lacked precise and consistent cross-source links and delivered an innovative application in less than six weeks, where traditional technologies projected six months for the same job. Thanks to the efficiency that semantic technologies make possible, we were even able to add another dataset and application within the original project time frame! You can read about this in several articles, press releases, and publications about our joint relationship.

The combination of IO Informatics’ Sentient software and Franz’s AllegroGraph creates a strong life science integration framework. Sentient connects to complex life science data sources such as Laboratory Information Management Systems (LIMS), Chromatography Data Systems (CDS), and others too numerous to list, to create targeted semantic integration of this data. AllegroGraph v4 adds complementary data access and transformation capabilities along with high-performance storage, processing, reasoning, and retrieval of industry-scale semantic data. Together these tools create an unmatched ability to integrate and deliver useful, production-ready applications that start from complex life science data and end in elegant, high-performance applications tightly targeted to our customers’ needs.

Tony Shaw:  How could this framework and/or these technologies apply to other industries?

Jans Aasman:  The same approach could be used for Department of Defense applications, financial applications, telecom applications, etc. We are currently exploring user interest that has arisen since the partnership announcement with IO Informatics and our recent webinar.

Robert Stanley:  These products are completely horizontally applicable.  Semantic technologies are particularly suitable wherever there are growing integration and application needs based on complex, changing, interrelated data sources.

Tony Shaw:  What kinds of problems don’t work well for these kinds of systems?

Robert Stanley:  Large isolated datasets intended for heavy processing don’t really require semantic technologies. The storage and processing of massive gene-sequencing data, for example, is well suited to more traditional relational database storage. Relational databases and file storage will remain the right choice for storing and processing such data, particularly data with a static schema or data that requires little or no connection to other data.

Semantic technologies are better suited to creating rich interconnections between multiple data sources. We have good examples where we take the distilled output from gene-sequencing databases (for example, variation between genes related to a disease) and connect it to experimental results from other data sources, such as protein and metabolic data sources. Semantic databases are great at storing high-value data together with its interconnections, creating rich “knowledge bases” that also provide a useful backup of mission-critical information. Semantic technologies can also be good for connecting “raw” data to “distilled” information – for example, with Pfizer we connected experimental report information back to the original experimental data.

Tony Shaw:  How can the results of the IDEA pilot be shared with other parts of the organization?

Robert Stanley:  This is a great strength of semantic technologies. Once data has been accessed, mapped to the semantic data model using Sentient, and stored as semantic “triples” within AllegroGraph, it becomes uniquely well suited for integration with new data sources and for extension to new applications. For example, we are reviewing the possibility of extending the three data sources integrated within the original pilot – for Compound Report Verification and Stability Analysis – to new manufacturing and modeling data and applications, to create a Compound Manufacturing Modeling and Prediction application. Semantic technologies make this sort of integration and extension easy and make the sharing of results practically useful!

Tony Shaw:  What skills are required to develop and use technology like this? What backgrounds are useful for developing semantic architects and programmers? Where do you find those people?

Jans Aasman:  Experienced database people will feel at home in the world of semantic modeling once they make the mental switch from structured relational databases to more fluid semantic databases. So will developers brought up in the NoSQL or “Big Data” tradition who are learning to support massive web-scale databases.

People with a background in rules and logic also do well in this space; they have the most to offer on the reasoning and query side of the semantic web. The hardest thing for us (in the semantic web community) is to make this technology so easy that the typical web developer and JavaScript programmer can use it effortlessly in the applications they are building.

Robert Stanley:  In recent years, ontology expertise and SPARQL (the semantic query language) expertise have been seen as required skills for developing and using technology like this. This is an area where IO Informatics has been putting in a lot of work – lowering the barrier to entry. We recently demonstrated our Visual SPARQL technology in a webinar with Franz. It makes it easy for a non-expert to create very powerful semantic queries, using a visual format to generate complex SPARQL query code. Similarly, our Knowledge Explorer and Web Query components provide UIs and workflows that reduce the need for advanced expertise to create and, most importantly, to use semantic applications.
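To make the idea of generating SPARQL from a visual interface concrete, here is a toy Python sketch of assembling a SELECT query from point-and-click-style selections. The function, field names, and prefix URI are all invented for illustration; Sentient’s actual generated queries will differ:

```python
def build_sparql(prefix_uri, subject_type, fields):
    """Assemble a SPARQL SELECT query from a chosen type and a list of
    properties, as a visual query builder might do behind the scenes."""
    select_vars = " ".join(f"?{f}" for f in fields)
    patterns = " .\n    ".join(f"?s ex:{f} ?{f}" for f in fields)
    return (
        f"PREFIX ex: <{prefix_uri}>\n"
        f"SELECT ?s {select_vars} WHERE {{\n"
        f"    ?s a ex:{subject_type} .\n"
        f"    {patterns} .\n"
        f"}}"
    )

# Hypothetical selections a user might make by clicking in a UI.
query = build_sparql("http://example.org/lims/", "Sample",
                     ["purity", "batchDate"])
print(query)
```

The point is that the user only picks a type and some properties; the tool emits well-formed query code, so no hand-written SPARQL is required.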

That said, knowledge of data mapping, ontologies and queries – ideally with semantic experience – is useful.  You can find experts like these at Franz and IO Informatics, and with our tools we can help you become an expert quickly!  
We have had biology researchers who aren’t experts become thrilled with their ability to use our tools, even for some relatively expert integration tasks. Informatics and computer science departments are also turning out more and more leaders in these areas. Stanford and W3C are notable resources.

About the Author(s)