This case study describes the development and deployment of the Immunization Explorer, a business application within UCB Group built to exploit the semantic services provided by the Metatomix Semantic Platform.
The application brings data from a wide range of systems together into a consolidated view as new antibodies are registered, allowing scientists to start answering critical questions such as:
“What immunization regime produced this antibody?”
Through the deployment of this application, UCB have been able to rapidly integrate data from many different sources ranging from spreadsheets to large Oracle databases to help their business users address dual objectives:
What started as a tentative step to explore the capabilities of semantic technology has now blossomed into a giant leap by helping users make informed decisions.
UCB is a global biopharmaceutical company based in Belgium, with operations in more than 40 countries and revenues of €3.6 billion in 2007. The company is a recognized leader in treatments for allergy and epilepsy, and in the rapidly emerging field of antibody research, particularly in conjunction with proprietary chemistry.
Within all biopharmaceutical companies, the cost and effort invested in discovering new entities, both chemical and biological, is immense. Targeting research into the most productive areas and effectively utilizing available resources are two key objectives for any company in the biopharmaceutical market. The problem isn’t a lack of data, but rather an overload of raw data, spread across entirely different IT systems, with no easy way of understanding it as a whole.
Through the deployment of the Metatomix Semantic Platform, UCB is able to rapidly integrate data from a rich discovery process. This is achieved through a combination of semantic modelling, non-invasive data gathering from existing data sources and rule-driven, business process-led behavior. A central capability of the Metatomix platform is the enactment of policy-based behavior that responds to what is known at any point throughout the query. The policy engine is configured to know which data sources are available. It triggers the appropriate query, receives data from that data source, transforms it into Resource Description Framework (RDF) and makes it available to the case for assessment.
This architecture is illustrated in the diagram below:
This process enables many different sources to provide a consolidated view of all relevant information, enabling UCB to address dual objectives:
The Project
The starting point for the use case was the registration of a new antibody following its sequencing. At this point, the scientists want to be able to view all related information and have many different questions. One of the most important is:
“What immunization regime produced this antibody?”
However, at the point of registering a new antibody, very little is known. It isn’t possible to raise queries against multiple data sources about an antibody, as not enough information is known to be able to furnish the queries. A knowledgeable user could traverse the different systems by connecting the dots, but this is hugely time-consuming, even if the user has been given comprehensive data access.
The Immunization Explorer
The business use case developed within UCB takes advantage of the enrichment framework, and defines what has been called the “Immunization Explorer” application. This application is the first entry point along the entire antibody research life-cycle, and the intention is to extend it to embrace many other similar entry points where scientists can look across all the relevant data.
The Immunization Explorer starts with the registration of a new antibody. The application creates a case and proceeds to collect all the relevant information associated with this case by enacting a number of iterative queries against the different data sources. The collected information makes it possible to trace back through secondary testing and primary testing to the immunization regime that initiated the project. This is known as the antibody enrichment cycle.
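The iterative nature of this cycle can be illustrated with a minimal sketch. It is not the Metatomix implementation: the four dictionaries standing in for source systems, the identifiers and the property names are all hypothetical, chosen only to show how each answer supplies the key for the next query until nothing new is learned.

```python
from collections import deque

# Hypothetical stand-ins for separate source systems. Each maps an
# identifier to what that system knows about it. None of these names
# or values come from UCB's real systems.
REGISTRY = {"AB-001": {"secondary_test": "ST-17"}}
SECONDARY_TESTS = {"ST-17": {"primary_test": "PT-04"}}
PRIMARY_TESTS = {"PT-04": {"sample": "S-9"}}
SAMPLES = {"S-9": {"immunization_regime": "REG-2"}}

SOURCES = [REGISTRY, SECONDARY_TESTS, PRIMARY_TESTS, SAMPLES]


def enrich(antibody_id):
    """Collect (subject, predicate, object) facts reachable from an antibody."""
    case = set()                  # the consolidated view for this case
    seen = {antibody_id}          # identifiers already queued for querying
    pending = deque([antibody_id])
    while pending:
        key = pending.popleft()
        for source in SOURCES:
            for predicate, value in source.get(key, {}).items():
                case.add((key, predicate, value))
                # A newly learned identifier becomes the key for further queries.
                if value not in seen:
                    seen.add(value)
                    pending.append(value)
    return case


def regimes_in(case):
    """Return the immunization regimes recorded in a case, sorted."""
    return sorted(o for (_, p, o) in case if p == "immunization_regime")
```

Calling `enrich("AB-001")` walks from the registry entry through the secondary test, the primary test and the sample, and `regimes_in` then reads the regime out of the assembled case. An identifier that no source recognises simply yields an empty case.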
This cycle constructs a consolidated view of all relevant information associated with any newly registered antibody through the different phases of the project. In this way, scientists are able to evaluate what immunization regimes are leading to the production of antibodies, with and without the right properties. Scientists are able to track which sample is the source of the new antibody, and identify the culture plate and associated assay plates which contain samples from the same source.
This is an example of semantic model-based integration working in conjunction with a process-centric rules engine to create an application that can respond to what is known at any point and drive enrichments based on that knowledge. This enrichment cycle navigates through and collects data from a wide range of systems, transforming the data into RDF and assembling it into a single data model within the Metatomix Semantic Platform. The resulting model is then available to be analyzed in many different ways by the user. This is illustrated below:
Triggering the Antibody Enrichment Cycle
The antibody enrichment cycle is triggered in one of two ways. The first method is an automated back-end process that passes the list of candidate antibodies through to the Metatomix Semantic Platform. The platform responds by preparing the information in advance, creating an individual case for each antibody. The information is also enriched for each user, so it is ready to be queried through the User Interface.
The second method allows a user to enter queries directly through the User Interface. In this scenario, the user input triggers the antibody enrichment cycle: a case is created, which in turn triggers the call-out to the different data sources, the conversion of the returned data into RDF and the presentation of the consolidated information back to the scientist in the User Interface.
Creating the Ontological Model
A range of ontologies has been developed to support the data integration requirements within the antibody research area. Together they provide a common model across both new biological and chemical entities.
These ontologies define the concepts relating to the data surrounding experiments, tests, test results, and the immunization regime. They also define the project life-cycle concepts relating to the stages of the project and to people aspects, such as who is working on the project and their reporting structure.
Collectively, these ontologies create a single conceptual model within which all the disparate data can be understood in a common framework, both to allow scientists to look across all relevant data for each experiment and to allow project managers to stay current on each project's progress.
A subset of these concepts and their relationships are illustrated below:
Bringing the Data Together
Following the creation of the ontological model, each data source is mapped to the ontologies so that data can be collected, transformed and inserted as instance data that is understood within the common model. In this way, it becomes immaterial which data source supplies any particular piece of information, as all data can be seen, accessed and interpreted in a single consolidated view.
For each data source, a process chain is constructed using a library of pre-built utilities, supplied as part of the Metatomix Semantic Platform, that provide connectivity and data transformation methods. This significantly increases the speed with which data integration can be achieved.
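As a rough illustration of what mapping a data source to an ontology involves, the sketch below turns one record into RDF-style triples using a declarative column-to-property mapping. It does not use the Metatomix utilities, whose interfaces are not described here. The namespace, the class and property names and the column names are assumptions made for the example; only the `rdf:type` IRI is the standard one.

```python
# Hypothetical namespace and mapping: these names are illustrative and
# are not taken from UCB's ontologies or source systems.
NS = "http://example.org/antibody#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

ASSAY_PLATE_MAPPING = {
    "class": NS + "AssayPlate",       # ontology class for each row
    "id_column": "PLATE_ID",          # column that identifies the row
    "properties": {                   # column name -> ontology property
        "SAMPLE_REF": NS + "containsSample",
        "READOUT": NS + "hasResult",
    },
}


def row_to_triples(row, mapping):
    """Convert one source record into (subject, predicate, object) triples."""
    subject = NS + str(row[mapping["id_column"]])
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for column, predicate in mapping["properties"].items():
        value = row.get(column)
        if value is not None:         # absent or null columns add no triple
            triples.append((subject, predicate, value))
    return triples
```

Because the mapping is data rather than code, a second source (a spreadsheet export, say) could reuse `row_to_triples` with its own mapping dictionary, and the resulting triples would land in the same common model regardless of origin.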
Acting on the Data
The Policy Engine, provided with the Metatomix Semantic Platform, provides a wizard-based method for constructing rules that can assess a level of knowledge at any point and can configure necessary actions to be taken based on this knowledge.
Policies are constructed in order to control a set of service requests that invoke specific data queries, the collection of data from different data sources and the transformation of data into RDF within the common model.
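The idea of a policy that assesses what is known and selects a service request can be sketched as follows. This is an assumption-laden illustration, not the Policy Engine's rule format: the `Policy` class, the fact names and the service request names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    name: str
    applies: Callable[[Dict[str, str]], bool]  # condition on the known facts
    service_request: str                       # hypothetical request name


# Each policy fires only when its input is known and its output is not.
POLICIES = [
    Policy("find secondary test",
           lambda known: "antibody" in known and "secondary_test" not in known,
           "query_secondary_tests"),
    Policy("find primary test",
           lambda known: "secondary_test" in known and "primary_test" not in known,
           "query_primary_tests"),
    Policy("find immunization regime",
           lambda known: "primary_test" in known and "immunization_regime" not in known,
           "query_immunization_regimes"),
]


def requests_to_fire(known: Dict[str, str]) -> List[str]:
    """Return the service requests whose policies match the current knowledge."""
    return [p.service_request for p in POLICIES if p.applies(known)]
```

With only the antibody known, `requests_to_fire({"antibody": "AB-001"})` selects the secondary test query alone. Once that query has added a secondary test to the case, re-evaluating the policies selects the primary test query instead, which is how rules of this kind can drive the enrichment forward one step at a time.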
A Single Application with Different Uses
As described above, the Immunization Explorer provides a consolidated view of all information relating to an antibody, collected as a result of its registration.
At the same time as this information is being assembled, further data queries are made into a range of other systems to establish which users are working on a project and the project's status. The status is often determined by interpolating across data sources and inferring the stage a project has reached. For example, detecting that a proposed project does not yet have a start date can be interpreted as the status “awaiting ordering of animals.”
This information is collected, interpreted and presented in the Immunization Scheduler Interface, which is used by project managers, rather than scientists.
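The status inference described above can be sketched as a simple rule over fields gathered from different systems. Only the single rule given in the text is encoded; the field names `proposed` and `start_date` and the fallback value are assumptions for the example.

```python
def infer_status(project):
    """Infer a project status from fields collected across source systems.

    The field names and the fallback value are hypothetical; only the
    "awaiting ordering of animals" rule comes from the case study.
    """
    proposed = project.get("proposed", False)
    start_date = project.get("start_date")
    if proposed and start_date is None:
        return "awaiting ordering of animals"
    return "unknown"  # no further rules are described in the text
```

A real deployment would presumably carry one such rule per project stage, each reading whichever fields the relevant systems can supply.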
Conclusion
UCB began with the idea that semantic technology could help solve the problem of bringing together large amounts of raw data efficiently and cost-effectively. With the help of Metatomix Semantic Technology, the Immunization Explorer project was completed within two months and is going into production.
The project has generated great enthusiasm within the business community for extending semantic technology, similar to that used in the UCB/Metatomix project, across other enterprises. Semantic technology has proven effective at bringing disparate data together within an enterprise, and continued success stories like the UCB/Metatomix project further show the strong potential this technology possesses.