My first encounter with semantic web technologies grew out of my desire for a persistence mechanism that modeled graphs with more ease and freedom than relational models. I was specifically interested in a persistent store that would let me create new structures without fiddling with the intricacies of a rigid schema. I found my solution in Jena, a Java framework for building Semantic Web applications.
Jena provided just the type of storage mechanism I was looking for. During my first attempt at writing an application, though, I found Jena's RDF-focused Java API flexible yet distant from the OOP (object-oriented programming) model I was familiar with. Instead of simply setting properties, I had to first look up the property node, create a node representing the data value, and then apply that property node and value to a particular individual in the model. My code tended to be composed of Java objects plus many lines of boilerplate that either wrote to or read from a Jena model. My instincts told me that much of this binding code could be minimized with the right tool, if it existed. And thus began my research into the various tools and techniques for binding Java objects to RDF.
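To give a feel for that boilerplate, here is a minimal sketch of adding a single property value through the Jena API (the resource and property names are made up for illustration, and Jena is assumed to be on the classpath):

```java
import com.hp.hpl.jena.rdf.model.*;

public class JenaBoilerplate {
    static Model buildModel() {
        Model model = ModelFactory.createDefaultModel();
        // 1. First, create (or look up) the property node...
        Property name = model.createProperty(
                "http://xmlns.com/foaf/0.1/", "name");
        // 2. ...then the node for the individual...
        Resource person = model.createResource(
                "http://example.org/people/alice");
        // 3. ...then a node for the data value, and tie all three together.
        person.addProperty(name, model.createLiteral("Alice"));
        return model;
    }
}
```

Three API calls and two intermediate node objects just to say "Alice's name is Alice" — this is the kind of repetition a binding tool can fold away.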
At first glance, OOP developers will find OWL (the Web Ontology Language) familiar. It has classes that inherit from other classes. It has properties that can be related to particular classes using range and domain. It uses familiar datatypes from XML Schema (integers, strings, dates), which OOP developers might liken to primitives. And finally, it has individuals that are declared as instances of classes. But beware: the similarities between OWL and OOP are only skin deep.
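As a sketch of those familiar-looking pieces, here is a fragment of Turtle (the vehicle vocabulary is invented for illustration):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/vehicles#> .

ex:Car a owl:Class ;
    rdfs:subClassOf ex:Vehicle .   # a class inheriting from a class

ex:topSpeed a owl:DatatypeProperty ;
    rdfs:domain ex:Vehicle ;       # a property related to a class
    rdfs:range  xsd:integer .      # a familiar datatype

ex:myCar a ex:Car .                # an individual, an instance of a class
```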
Along with the similarities come subtle differences that might go unnoticed by the eager OOP aficionado. Notably, multiple inheritance is allowed in OWL/RDF. This obviously poses a problem if your intention is to mirror an ontology with a one-to-one OWL-class-to-Java-class binding. An even more interesting feature OWL has over OOP (the Java flavor) is that OWL allows class-based restrictions to be declared on properties. For example, a Java class representing a car can throw an exception if you add too many passengers; an OWL class, however, can declare in a well-defined way, to all interested parties, that it accepts at most 4 passengers. But by far the most striking difference between the two models is that OWL is property focused while Java is class focused. OWL properties can inherit from other properties; Java's cannot. OWL properties can be applied to disjoint classes, while a Java property is attached to only one class and its descendants. When I introduce Java developers to OWL, I usually tell them that where Java is object focused, OWL is property focused. Any tool that attempts to bind Java objects to RDF must take these differences into account. In ORM (object-relational mapping), developers have recognized that there is an impedance mismatch between objects and relational data models. Similarly, OOP and OWL have their own impedance mismatch. Tools can ease the pain, but it's still important for developers to be aware of these differences and code accordingly.
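The car example can be made concrete with a minimal sketch (the Car class and its 4-passenger limit are my own illustration). In Java the constraint is buried in imperative code, discoverable only by running it:

```java
import java.util.ArrayList;
import java.util.List;

public class Car {
    private final List<String> passengers = new ArrayList<String>();

    public void addPassenger(String name) {
        // The limit lives here, in code: invisible to anyone who
        // only sees the class declaration.
        if (passengers.size() >= 4) {
            throw new IllegalStateException("a Car seats at most 4 passengers");
        }
        passengers.add(name);
    }

    public int passengerCount() {
        return passengers.size();
    }
}
```

In OWL, the same constraint could instead be published declaratively, for example as an owl:maxCardinality restriction of 4 on a hypothetical hasPassenger property, so any consumer of the ontology can read the limit without ever running our code.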
If you are intrigued and would like to investigate the possibilities of expanding your OOP world with semantic technologies, there are several open source tools available today. I've divided them into two major categories: annotation-based tools, which take advantage of Java's built-in annotation feature to bind to RDF, and code generators, which generate Java code from a given OWL or RDF schema document.
So(m)mer (https://sommer.dev.java.net/) is a very easy to use library for mapping Plain Old Java Objects (POJOs) to RDF graphs and back. What I like about So(m)mer is that its bindings to RDF are declared as Java annotations. Here's a simple example of binding the Java class Agent to the FOAF class of the same name:
@rdf(foaf + "Agent")
public class Agent {
    public static final String foaf = "http://xmlns.com/foaf/0.1/";

    @rdf(foaf + "mbox")
    private Collection<URI> mboxes;
    …
}
Notice that the @rdf annotation is reused both at the class level and on the attributes, keeping things simple. No verbose XML configuration files or complicated mapping languages, just annotations on your Java classes. It's so utterly simple you have to wonder how it all works. The magic is made possible via aspect-oriented programming techniques, specifically the Javassist bytecode library. Under the hood, So(m)mer recognizes that you've annotated a class and adds program logic that binds it to a Sesame RDF model. At the time of writing, So(m)mer supported only Sesame (another Java platform for working with RDF).
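Conceptually, the first step of any annotation-based binder is plain reflection: find the @rdf annotations and read out the URIs. Here is a self-contained sketch of that discovery step (the @rdf annotation below is a stand-in I declare myself, not So(m)mer's actual class):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

public class BindingSketch {
    // Hypothetical stand-in for a binder's @rdf annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface rdf { String value(); }

    @rdf("http://xmlns.com/foaf/0.1/Agent")
    static class Agent {
        @rdf("http://xmlns.com/foaf/0.1/mbox")
        String mbox;
    }

    // Discover a class's RDF type the way a binder would.
    static String rdfTypeOf(Class<?> c) {
        return c.getAnnotation(rdf.class).value();
    }

    // Discover the RDF property URI bound to a Java field.
    static String rdfPropertyOf(Class<?> c, String field) {
        try {
            Field f = c.getDeclaredField(field);
            return f.getAnnotation(rdf.class).value();
        } catch (NoSuchFieldException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The real library goes further, weaving in persistence logic with Javassist, but the mapping information it consumes is exactly what this reflection pass exposes.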
Another tool specifically meant for Sesame is Elmo. Elmo goes beyond simply binding objects to RDF and claims to be a role-based Java persistent bean pool. But at its core it's a binding tool between Java and RDF, and like So(m)mer it's fairly simple from a programmer's point of view. It uses the same technique, Java annotations, and even the same annotation name, @rdf, at the class and method level. Additionally, Elmo supports a richer declaration model, with annotations such as @inverseOf and @disjointWith that apply OWL entailments from the Java object model to an RDF graph.
For those of you who enjoy HP's Jena API, there is Jenabean (http://code.google.com/p/jenabean/). Jenabean is annotation based as well. The project team is currently working on a JPA (Java Persistence API) styled programming model to further reduce the gap between RDF and the Java community at large. Jenabean uses Java introspection techniques and dynamic proxies to bind Java classes to the Jena RDF API, and it also assists SPARQL query authors with parameterized queries (similar to Java's PreparedStatement API). Jenabean depends only on Jena and the JPA API; in other words, it does not use any bytecode tool sets. To get a feel for how programming with Jenabean might look, here's a simple example:
EntityManagerFactory factory =
    Persistence.createEntityManagerFactory("tws:blank");
EntityManager em = factory.createEntityManager();

MusicGenre jazz = new MusicGenre();
jazz.id = URI.create("http://example.org/genre/jazz");
jazz.description = "Jazz Music";
em.persist(jazz);
And the declaration of the MusicGenre class:
package test.jpa;

@Entity
public class MusicGenre {
    @Id URI id;
    String description;
}
Notice that, as annotated, it looks nearly identical to any JPA-bound Java class. The only obvious difference is that our id is a URI, which is rare in normal JPA situations. The resulting RDF looks like this:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://thewebsemantic.com/javaclass> a owl:AnnotationProperty .

<http://test.jpa/MusicGenre> a rdfs:Class ;
    <http://thewebsemantic.com/javaclass> "test.jpa.MusicGenre" .

<http://example.org/genre/jazz> a <http://test.jpa/MusicGenre> ;
    <http://test.jpa/description> "Jazz Music"^^xsd:string .
There are several well-known tools that fit this category: RDFReactor, Kazuki, and Owl2Java; I believe Elmo has a code generation feature as well. Assuming you are starting with a well-written schema, code generators offer the advantage of convenience. The time required to write Java classes by hand and annotate them to bind to a very sophisticated OWL ontology could be considerable enough to warrant code generation.

I would like to point out some of the difficulties regarding code generators and RDF in the wild. Many of the common vocabularies well known in semantic web circles have vague property specifications; FOAF and Dublin Core are two common examples. If you delve into either of these, you'll find that many properties have a range of rdfs:Literal. Furthermore, given just the machine-readable portion of the spec, it's not always clear whether a relationship is singular or plural. To auto-generate a Java class, it's very important to know which type to use when binding. For example, in the Dublin Core terms schema, <http://purl.org/dc/terms/created> is declared as having range rdfs:Literal. The best a code generator can do is assume the value is a string; any human, however, will realize its mapping to Java would be better represented by java.util.Date, since it represents the date/time when something was created. In cases like that, it's better to hand craft the class and its mappings. If your schema is detailed and specific about property ranges, a code generator might be tremendously useful and time saving.
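For instance, a hand-crafted mapping for dcterms:created might parse the literal into a java.util.Date itself, a decision no generator could safely make from rdfs:Literal alone (the class name and the xsd:date lexical format here are illustrative assumptions):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class CreatedMapping {
    // A human decision, not a generated one: treat the dcterms:created
    // literal as an xsd:date and surface it in Java as java.util.Date.
    static Date parseCreated(String lexicalValue) {
        try {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
            f.setLenient(false);
            return f.parse(lexicalValue);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "not an xsd:date literal: " + lexicalValue, e);
        }
    }
}
```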
Since we're on the topic of Java and the semantic web, I'd like to mention another exciting project, RDF2Go (http://rdf2go.semweb4j.org). This project aims to create a common abstraction API over triple stores in general, allowing developers to write an application against Jena and then easily swap the underlying RDF engine for Sesame, or vice versa. Hopefully it will gain momentum and perhaps become a JSR specification some day.
Java is a great language for coding the semantic web. Considering the vast amount of information stored in and accessible via Java app servers, it's clear that the semantic web will benefit from making it as easy as possible for Java developers to provide and consume RDF. To their credit, Java developers have answered the call with a plethora of open source tools. One of the most advanced, bleeding-edge SPARQL engines, Jena's ARQ, is written in Java, as are other mainstays like Sesame, Mulgara, and the Pellet reasoner. If you're a Java developer, don't hesitate to get started coding the semantic web.