We have witnessed over the years the progression from basic machine languages, to higher-level procedural languages, and then to object-oriented languages. Each advance introduced dramatic improvements in software capabilities that resulted in major leaps forward in fulfilling information technology requirements.
We are now on the verge of another major advance in the evolution of software technology, one that may bring great value to organizations and other beneficiaries of information technology.
Semantic technology gives us the ability to define the meaning of data by providing language constructs that closely reflect how we think and reason. It does this in a way that a machine can understand, by utilizing a formal, descriptive language that is based upon predicate logic. Predicate logic is a form of deductive logic used in computer science that expresses all arguments in machine-friendly, unambiguous mathematical terms. We use these semantic language constructs to build an ontology. An ontology is a sophisticated model that defines how data inter-relates with other data; it is also a classification system for data. Defining an ontology is a fundamental feature of semantic technology.
With the advent of semantic technology we are able to effectively transform information, as a raw material, into knowledge. Whereas information is discrete data about a specific subject, knowledge is information that is linked or synthesized together in a valuable way. The distinction to be made here is that information may reside in disparate and heterogeneous formats, such as unstructured text, diverse file formats, executable software models, and web pages. Using conventional technologies we often face great challenges in integrating and tapping the value of the information contained in these data structures. Semantic technology attempts to overcome these limitations by providing mechanisms to map data to a common domain model in order to support greater interoperability and integration of data, resulting in knowledge.
Semantic technology also provides mechanisms to load disparate data content into semantic data stores that enable immediate data integration and allow advanced query capabilities. A semantic data store is a data repository that understands how data is inter-related by applying the data classification and relationship rules defined in the ‘ontology model’.
Perhaps the most powerful aspect of semantic technology is its ability to apply rules and inferencing logic to the data in a semantic data store in order to discover data relationships that would otherwise not be as easily detected when using other conventional technologies. Inferencing is the ability to arrive at a conclusion based upon a set of rules applied to data. Inferencing, for example, can follow a set of dependency relationships defined across a set of data elements to show how all of the data elements are truly tied together. An ideal use case for inferencing is impact analysis. Another use case might be in identifying suspicious associations of data and transactional patterns that could result in improved fraud detection.
Semantic technology has migrated beyond the domain of academia and research institutions and is now penetrating organizations for commercial use. This article will explore some of the innovative language constructs that are foundational to semantic technology in order to demonstrate the power that semantic technology offers.
1. Semantic Languages Stack
There are three major semantic languages that have been recommended by the World Wide Web Consortium (W3C), each of which can be serialized in XML. The Resource Description Framework (RDF) is the primary language that all other semantic languages are based upon. RDF Schema (RDFS), which is based upon RDF, offers greater specialization in describing data relationships than does RDF. Web Ontology Language (OWL), which is based upon RDFS, provides even greater expressivity than RDFS. There is also SPARQL, which is the W3C recommended semantic query language. The challenge is in knowing and understanding how to use a particular language, or a combination of languages, for a specific purpose.
Figure 1 Semantic Language Stack
1.1 Triples
A core feature of semantic technology is that it is based upon the notion of a ‘subject’, a ‘predicate’, and an ‘object’. This ‘semantic’ structure closely resembles how we think and talk. Just as in basic grammar, the subject identifies the entity of interest, the object identifies the thing or value that the subject is related to, and the predicate identifies how the subject is associated with the object in a meaningful way. These three values form the basis of the ‘triple’. In Figure 2, “David Newman architects a system” is an example of a triple. The triple is the functional equivalent of a row in a relational database. Semantic queries are performed against a set of triples in a semantic data store. More about this later.
Figure 2 Triples
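To make the triple structure concrete, the following is a minimal Python sketch, not a real semantic data store. The names Triple, store, and match are hypothetical, and the object "Article" for the ‘author’ property is an assumption, since only the property name is given in the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# A tiny in-memory "triple store". The second triple's object is assumed.
store = {
    Triple("DavidNewman", "architects", "System"),
    Triple("DavidNewman", "author", "Article"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# Ask: what does David Newman architect?
for t in match(store, subject="DavidNewman", predicate="architects"):
    print(t.subject, t.predicate, t.obj)
```

Querying by leaving one or more positions open is the same idea that semantic query languages build upon, although real stores index the triples rather than scanning them.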
1.2 Resource Description Framework (RDF)
RDF, the foundational semantic language, is based upon the notion of resources. Resources, quite simply, are the things being described, and typically reference real world things. Each resource is specified by a Web addressable Uniform Resource Identifier (URI), allowing resources to be uniquely named and distributed over the Web.
Properties provide a meaningful context by which we can understand relevant information about the resource. Properties can be thought of as verbs that express actions performed by a resource or nouns that reflect attributes of a resource. A statement is a way of specifying how a property describes a resource. Figure 2 is an example of RDF statements, showing properties (architects, author) which are displayed in a ‘triple format’. RDF can also be specified in other formats, such as XML (Figure 3).
Figure 3 RDF XML Format
1.3 RDF Schema (RDFS)
RDF by itself is fairly limited in utility, although it can be used to populate a triple store with basic statements. However, the real power of semantic technology begins to come into play with the use of RDFS. Perhaps the best way to introduce the capabilities of RDFS is to show where Object Technology and RDFS intersect and then show where RDFS takes off at a 90 degree angle.
The main similarity is that both languages support the concepts of classification and inheritance. Classes within Object Technology contain methods which perform specific behaviors. In contrast, classes within RDFS do not respond to requests, but rather are associated with other classes or values by virtue of properties. Properties in RDFS are first-class resources in their own right, and they are defined globally and independently of the classes that they provide associations for. A property such as ‘owns’ may be associated with many unrelated and different classes. For example, ‘owns’ may be independently associated with the class Company and with the class Person. In RDFS it is possible to alter a property without needing to alter the class that it describes. This is in contrast to Object Technology, where a property may take the form of an attribute that is tightly bound to and encapsulated within a particular class.
Inheritance in RDFS affects classes as well as properties. RDFS provides language constructs such as rdfs:Class and rdfs:subClassOf, as well as rdf:Property and rdfs:subPropertyOf.
Another major distinction between RDFS and Object Technology is that inheritance in Object Technology is all about propagating behaviors defined for a superclass down to all classes within the inheritance hierarchy, while inheritance in RDFS means that a subclass is a member of the same group as the superclass for purposes of inferencing. In RDFS a subclass cannot override a superclass as it can in Object Technology.
1.4 Inferencing
Now that we have defined a number of foundational concepts, we are better positioned to understand the crowning capability of semantic technology: inferencing. As discussed earlier, inferencing is a logical process that arrives at a conclusion. Another way of stating this is that inferencing generates a superset of data that is logically derived from a subset of data. Let’s consider how inferencing plays a role in inheritance. In Figure 4 below we see that BusinessUnit is defined as an rdfs:subClassOf Organization, and WidgetMarketingGroup is of rdf:type BusinessUnit. Without the benefit of inferencing, the only asserted triple about WidgetMarketingGroup within the triple store would be the following statement.
WidgetMarketingGroup rdf:type BusinessUnit
If we ran a query asking for instances of Organization, we would receive a reply that no instances were found. However, if we invoked inferencing, an inference engine that accompanies the semantic system would examine the ontology and apply the superclass rule, which climbs the inheritance tree, in order to generate a new inferred triple that would look like the following statement.
WidgetMarketingGroup rdf:type Organization
Thus, after inferencing, the query would reveal that WidgetMarketingGroup is an instance of Organization, because the inferred triple has been inserted into the triple store as the missing piece of the puzzle.
Figure 4 rdfs:subClassOf
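The superclass rule described in this section can be sketched in a few lines of Python. This is an illustrative toy, assuming triples are held as plain tuples; the function names infer_types and instances_of are hypothetical, and a real inference engine applies many more rules than this one.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Asserted triples: one from the ontology, one instance statement.
asserted = {
    ("BusinessUnit", SUBCLASS_OF, "Organization"),
    ("WidgetMarketingGroup", RDF_TYPE, "BusinessUnit"),
}


def infer_types(triples):
    """Apply the superclass rule until no new triples appear:
    if X rdf:type C and C rdfs:subClassOf D, then X rdf:type D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, p2, sup in list(inferred):
                if p2 == SUBCLASS_OF and sub == o:
                    new_triple = (s, RDF_TYPE, sup)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred


def instances_of(triples, cls):
    """Return the subjects that are typed as cls."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


print(instances_of(asserted, "Organization"))               # before inferencing
print(instances_of(infer_types(asserted), "Organization"))  # after inferencing
```

The first query finds nothing because only the asserted type is present; the second finds WidgetMarketingGroup because the inferred triple has been added.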
1.5 Properties
RDFS also adds extra expressivity to the definition of properties. Properties, as stated above, stand in their own right and can support inheritance using rdfs:subPropertyOf. A property can express a relationship between classes as well as describe behaviors and attributes of classes. It does this by defining an rdfs:domain, which is typically an existing class in the ontology, as well as an rdfs:range, which can also be a class or a value. The domain of the property can be visualized as the subject of a statement, and the range of the property can be visualized as the object of the statement.
In the example in Figure 5 below, the property ‘providesService’ has a domain of ‘SystemUnit’ and a range of ‘BusinessUnit’. This can be restated as: a SystemUnit providesService to a BusinessUnit. This allows us to ask the question, which system units provide services to which business units? We also define the converse property ‘servicedBy’, which has a domain of ‘BusinessUnit’ and a range of ‘SystemUnit’. This allows us to ask the converse question: which business units are serviced by which system units? In our discussion of OWL, the next language in the stack, we will see how these properties can be further leveraged by inferencing.
Figure 5 RDFS Properties
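One consequence of declaring a domain and a range is that a reasoner can infer the class of a subject or object simply from the property used in a statement. The sketch below illustrates that idea in Python under stated assumptions: the DOMAIN and RANGE dictionaries stand in for the ontology, and the instance name PaymentsSystemUnit is hypothetical.

```python
# Domain and range declarations taken from the text (Figure 5).
DOMAIN = {"providesService": "SystemUnit", "servicedBy": "BusinessUnit"}
RANGE = {"providesService": "BusinessUnit", "servicedBy": "SystemUnit"}

# One asserted statement; the subject name is a hypothetical instance.
statements = {
    ("PaymentsSystemUnit", "providesService", "WidgetMarketingGroup"),
}


def infer_types_from_properties(triples):
    """Derive rdf:type triples from rdfs:domain and rdfs:range:
    the subject belongs to the domain class, the object to the range class."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            types.add((o, "rdf:type", RANGE[p]))
    return types


for triple in sorted(infer_types_from_properties(statements)):
    print(triple)
```

Note that domain and range act here as inference rules rather than as validation constraints, which is another point where RDFS departs from typed attributes in Object Technology.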
1.6 Web Ontology Language (OWL)
We have now graduated to OWL, the most expressive of the semantic languages. As we move up the language stack, the ability to take advantage of inferencing grows dramatically. There are many advanced OWL capabilities, such as ‘restrictions’, which are powerful but complex language constructs that are beyond the scope of this article. We will introduce several fundamental capabilities of OWL to demonstrate its special value.
1.6.1 owl:inverseOf
The owl:inverseOf is a language construct that allows us to specify the property that is the opposite or reverse of the property being defined. In Figure 5 above, ‘providesService’ and ‘servicedBy’ are properties that are the inverse of each other. This feature allows us to define properties that will automatically reveal the reverse direction in a set of relationships.
When populating the semantic data store with statements, the system only needs to specify a triple for one of the properties, let’s say ‘providesService’. When inferencing is run, the ‘reasoner’ will automatically insert a second triple that will provide a statement for the ‘servicedBy’ property. Once inferencing is completed, we can then ask questions about either side of the inverse relationship, which the system will understand and respond to.
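The behavior of the reasoner for inverse properties can be approximated with the following Python sketch. It assumes triples are plain tuples and that the inverse declaration is held in a dictionary; apply_inverses and the instance name PaymentsSystemUnit are hypothetical.

```python
# owl:inverseOf declaration from the text: providesService <-> servicedBy.
INVERSE_OF = {"providesService": "servicedBy"}

# Only one direction is asserted; the subject is a hypothetical instance.
asserted = {
    ("PaymentsSystemUnit", "providesService", "WidgetMarketingGroup"),
}


def apply_inverses(triples, inverse_of):
    """For every triple whose property has a declared inverse,
    add the reversed triple using the inverse property."""
    # The inverse relationship is symmetric, so index both directions.
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})

    inferred = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            inferred.add((o, both_ways[p], s))
    return inferred


for triple in sorted(apply_inverses(asserted, INVERSE_OF)):
    print(triple)
```

After this step a query phrased in terms of ‘servicedBy’ succeeds even though only ‘providesService’ was ever asserted.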
1.6.2 owl:equivalentClass
The owl:equivalentClass construct comes in handy when merging ontology models, or when using the power of semantic technology to perform data mapping between different sets of disparate data elements in order to translate them into a common domain model. For example, declaring Message as an owl:equivalentClass of ServiceEvent allows us to equate all instances of Message with all instances of ServiceEvent, which was defined in some other ontology file that was merged with ours. If Message is defined as a subclass of Activity, we can then infer that ServiceEvent is also a subclass of Activity.
Another use case for owl:equivalentClass can be described by a Web Service application that allows clients to submit messages containing data elements that do not conform to the standard message interface. The Web Service application would use the ontology to map the non-standard data elements to the standard domain model. The use of owl:equivalentClass provides the mapping capability that is needed to correlate the non-standard data element name to the standard data element name so that the application can understand and process the non-standard data element.
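A rough way to see how class equivalence feeds inferencing is to treat owl:equivalentClass as a subclass relationship in both directions and then walk the class hierarchy. The Python sketch below does exactly that; it assumes that Message is asserted as a subclass of Activity, and the function names are hypothetical.

```python
SUBCLASS_OF = "rdfs:subClassOf"
EQUIVALENT_CLASS = "owl:equivalentClass"

# Assumed ontology statements after merging the two models.
ontology = {
    ("Message", SUBCLASS_OF, "Activity"),
    ("Message", EQUIVALENT_CLASS, "ServiceEvent"),
}


def expand_equivalence(triples):
    """Rewrite each equivalence as two subclass statements, one per direction."""
    expanded = set(triples)
    for s, p, o in triples:
        if p == EQUIVALENT_CLASS:
            expanded.add((s, SUBCLASS_OF, o))
            expanded.add((o, SUBCLASS_OF, s))
    return expanded


def superclasses(triples, cls):
    """Collect every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and s == current and o != cls and o not in found:
                found.add(o)
                stack.append(o)
    return found


print(sorted(superclasses(expand_equivalence(ontology), "ServiceEvent")))
```

The same lookup is what lets an application accept a non-standard element name and resolve it to the standard class in the domain model.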
1.6.3 owl:TransitiveProperty
The owl:TransitiveProperty is perhaps one of the more powerful language constructs in OWL. It is conceptually equivalent to transitivity in mathematics. In essence, if A has a relation to B, and B has the same relation to C, then A has that relation to C. Transitivity therefore allows us to traverse a chain of relationships from one resource to another. Some examples of utilizing transitivity include being able to travel up an ancestry tree, as well as being able to perform impact analysis where there are complex dependency relationships across a system.
Consider a transitive property ‘invokes’ that associates one activity with another. By applying a transitivity rule to statements that use this property, we are able to identify how one activity invokes another activity up and down (invokedBy) a chain of dependencies.
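The transitivity rule amounts to computing a transitive closure over the ‘invokes’ statements. The following Python sketch shows one simple way to do this; the activity names are hypothetical, and the fixed-point loop is chosen for clarity rather than efficiency.

```python
# Asserted 'invokes' statements as (caller, callee) pairs; names are hypothetical.
invokes = {
    ("SubmitOrder", "ValidateOrder"),
    ("ValidateOrder", "CheckCredit"),
    ("CheckCredit", "NotifyCustomer"),
}


def transitive_closure(pairs):
    """Repeatedly apply: if (a, b) and (b, c) hold, then (a, c) holds."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d)
            for a, b in closure
            for c, d in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def downstream(pairs, activity):
    """Everything the activity invokes, directly or indirectly."""
    return sorted(callee for caller, callee in transitive_closure(pairs) if caller == activity)


def upstream(pairs, activity):
    """Everything that invokes the activity (the invokedBy direction)."""
    return sorted(caller for caller, callee in transitive_closure(pairs) if callee == activity)


print(downstream(invokes, "SubmitOrder"))
print(upstream(invokes, "NotifyCustomer"))
```

For impact analysis, the upstream view answers the question of which activities are affected if a given activity changes.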
1.7 Semantic Information Grid
As statements begin to populate the semantic data store, more and more relationships among data can be mined and discovered by using the power of properties and inferencing. In Figure 6 below, each of the classes and properties can be further queried to reveal additional relationships and associations to other resources, forming an ever expanding grid of information and value.
Figure 6 Semantic Grid
2. SPARQL Query Language
No overview of semantic technology languages would be complete without mention of the W3C recommended semantic query language called SPARQL, which stands for SPARQL Protocol and RDF Query Language. SPARQL is used to define queries that can use vocabulary from an ontology in order to query the contents of a triple store. In SPARQL the query parameters are defined as sets of RDF graph patterns. SPARQL has a remote resemblance to SQL, but is actually quite different. A basic example is described below.
The SPARQL query first obtains the set of all Application Systems that use HTTP. It then obtains all of the organizational SystemUnits that own each of the selected Application Systems. Because SystemUnits are hierarchical, inferencing will be used to display the parent organizations as well.
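The idea of matching graph patterns can be illustrated without a SPARQL engine. The Python sketch below evaluates a list of triple patterns against an in-memory set of triples, where terms beginning with "?" are variables. It is a simplified model of how a basic graph pattern is matched, not SPARQL syntax; the property names usesProtocol and owns and all instance names are assumptions, and the hierarchical inferencing step is left out.

```python
# Hypothetical data: two application systems and the units that own them.
triples = {
    ("OrderPortal", "rdf:type", "ApplicationSystem"),
    ("OrderPortal", "usesProtocol", "HTTP"),
    ("BatchLoader", "rdf:type", "ApplicationSystem"),
    ("BatchLoader", "usesProtocol", "FTP"),
    ("WebSystemsUnit", "owns", "OrderPortal"),
    ("DataSystemsUnit", "owns", "BatchLoader"),
}


def match_pattern(pattern, triple, binding):
    """Try to extend a variable binding so that pattern matches triple.
    Returns the extended binding, or None if the match fails."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(data, patterns):
    """Evaluate the patterns in order, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Which units own application systems that use HTTP?
patterns = [
    ("?app", "rdf:type", "ApplicationSystem"),
    ("?app", "usesProtocol", "HTTP"),
    ("?unit", "owns", "?app"),
]
for row in query(triples, patterns):
    print(row["?unit"], "owns", row["?app"])
```

Each pattern narrows the set of candidate bindings, which is why shared variables such as ?app behave like join keys across the patterns.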
3. Business Adoption Opportunities
There are many business use cases for which semantic technology may be well suited. Some areas of potential opportunity are listed below for further investigation.
1. Fraud Detection and Risk Management
2. Knowledge Management
3. Asset Management (particularly Configuration Management Database)
4. Impact Analysis
5. Customer Integration
6. Advanced Search Capabilities
7. Records Management Indexing
8. Business Intelligence
9. Service Oriented Architecture
10. Product Rules Catalog
4. Recommendations
It is evident from industry analysis that semantic technology is no longer an embryonic technology. It is evolving rapidly, with new vendors as well as some mature ones entering the space. The sooner we understand and evaluate it, the sooner we can begin to leverage its strengths for competitive advantage.
Note: The diagrams displayed in this article were generated using TopBraid Maestro from TopQuadrant, Inc.