SPARQL by Example – Part I • Q & A with Lee Feigenbaum

Thanks to everyone who attended the SPARQL By Example Web cast or who has watched the archived recording of it. There was a tremendous level of enthusiasm during the one-hour presentation, and as a result we did not have the chance to answer all of the excellent questions that participants submitted. Below, I’ve tried to summarize most of the unanswered (and some of the answered) questions and to provide some explanations and pointers to further information. Also, please note that due to the popularity of the first session, we’ll be holding a continuation Web cast on Thursday, January 22, at 1:00 PM EST / 10:00 AM PST. During that Web cast, we’ll continue our example-driven look at some of the more advanced features of SPARQL. I hope you can join us then!

Lee Feigenbaum

About SPARQL Endpoints

We had several questions about SPARQL endpoints. A SPARQL endpoint is any URL on the Web that implements the SPARQL protocol. Generally speaking, this means that if the URL is a SPARQL endpoint, then we can send queries to it by issuing requests to that URL with the query attached in a parameter named query. Note that the query itself is passed to the endpoint as a URL-encoded string.
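As a minimal sketch of what such a request looks like over HTTP GET, the Python below builds the request URL for a small query. The endpoint address http://example.org/sparql and the helper name build_query_url are placeholders of my own, not a real service; only the parameter name query comes from the SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?person foaf:name ?name }
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query as a URL-encoded 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with any HTTP client (or pasting it into a browser) is all it takes to run the query against an endpoint that supports GET.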

The SPARQL protocol is defined as an abstract interface that can be implemented over HTTP GET, HTTP POST, or SOAP. (The above example would work for a SPARQL endpoint that implements the protocol over HTTP GET.) An endpoint will normally return the results of a SPARQL query using the SPARQL Query Results XML Format, a simple XML format for returning a table of variables and their values that satisfy a query. Many SPARQL endpoints also support other return formats via content negotiation, such as a JSON result format or various RDF serializations.
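To give a feel for the XML results format, here is a rough sketch that reads a response with Python's standard library. The sample document is hand-written and its values are invented; parse_results is a hypothetical helper name, and a real client would likely lean on an existing SPARQL library instead.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response; the names and addresses are made up.
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="email"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="email"><uri>mailto:alice@example.org</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return (variable names, list of {variable: value} rows)."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # first child: <uri>, <literal>, or <bnode>
            row[binding.get("name")] = value.text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(SAMPLE)
print(variables)
print(rows)
```

Note that a variable with no value in a given row (email for Bob here) simply has no binding element, so it is absent from that row.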

In the tutorial, we ran our queries by going to a Web page and pasting the queries into a form. Those Web forms are not themselves SPARQL endpoints, but submitting a form sends the query on to a SPARQL endpoint. Many public SPARQL endpoints provide this type of human-friendly form for designing, developing, and debugging SPARQL queries.

In the tutorial, we also saw two types of SPARQL endpoints in action. When we ran queries against Tim Berners-Lee’s FOAF file, we used a generic SPARQL endpoint. This type of endpoint sits somewhere on the Web and goes out to retrieve RDF data from elsewhere on the Web to run a query. Because a generic SPARQL endpoint will query against arbitrary RDF data, we must specify the URL of the graph (or graphs) to run the query against. We do this either using the input box provided on the human-friendly forms, or using the SPARQL FROM clause. We also saw specific SPARQL endpoints such as DBpedia and DBTune. These endpoints are hardwired to query against a fixed dataset. Because a specific SPARQL endpoint will always query against the same data, we do not need to use the FROM clause when writing queries for these endpoints.
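The difference between the two kinds of endpoint can be sketched as follows. All three addresses are placeholders I have made up, not the services or files used in the tutorial. The default-graph-uri parameter is defined by the SPARQL protocol and plays the same role as the graph input box on a form.

```python
from urllib.parse import urlencode

# Placeholder addresses, not real services or files.
GENERIC_ENDPOINT = "http://example.org/generic-sparql"
SPECIFIC_ENDPOINT = "http://example.org/music-sparql"
GRAPH_URL = "http://example.org/people/foaf.rdf"

# Generic endpoint: the FROM clause names the graph to fetch and query.
generic_query = f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <{GRAPH_URL}>
WHERE {{ ?person foaf:name ?name }}
"""

# Same query without FROM; the graph is named at the protocol level instead.
query_without_from = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?person foaf:name ?name }
"""

# Specific endpoint: the dataset is fixed, so no graph needs to be named.
specific_query = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

generic_url = GENERIC_ENDPOINT + "?" + urlencode({"query": generic_query})
alt_url = GENERIC_ENDPOINT + "?" + urlencode(
    {"query": query_without_from, "default-graph-uri": GRAPH_URL}
)
specific_url = SPECIFIC_ENDPOINT + "?" + urlencode({"query": specific_query})

for u in (generic_url, alt_url, specific_url):
    print(u)
```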

SPARQL and Reasoning

A few participants asked questions about the interaction between SPARQL and reasoning. For example, when I write a SPARQL query to search for all mammals, will I receive results for human beings that are not explicitly typed as mammals? The short answer is that while some SPARQL implementations do inform their results via RDFS or OWL reasoning, many do not. The SPARQL standard does not require that query results take any reasoning into account.
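To make the mammal example concrete, here is a toy illustration in Python rather than a real SPARQL engine. It imitates the query SELECT ?m WHERE { ?m a ex:Mammal } over a three-triple dataset, once with plain matching and once following rdfs:subClassOf links, which is just one small piece of RDFS reasoning. Every ex: name is invented for the example.

```python
# Tiny in-memory dataset of (subject, predicate, object) triples.
# All ex: names are made up for illustration.
TRIPLES = {
    ("ex:Human", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:rex", "rdf:type", "ex:Mammal"),
    ("ex:alice", "rdf:type", "ex:Human"),
}


def instances_plain(triples, cls):
    """Match only explicit rdf:type statements, as an endpoint without reasoning would."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == cls}


def instances_with_subclass_reasoning(triples, cls):
    """Also count instances of any (transitive) subclass of cls."""
    classes = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if p == "rdfs:subClassOf" and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for (s, p, o) in triples if p == "rdf:type" and o in classes}


print(sorted(instances_plain(TRIPLES, "ex:Mammal")))
# prints ['ex:rex']
print(sorted(instances_with_subclass_reasoning(TRIPLES, "ex:Mammal")))
# prints ['ex:alice', 'ex:rex']
```

An endpoint without reasoning behaves like the first function: ex:alice is missing because nothing states directly that she is a mammal.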

For a more detailed answer, please see these two answers in the SPARQL FAQ.

Learning About an RDF Dataset

An insightful question cropped up a few times during the Web cast: How do we know what type of data lurks behind a SPARQL endpoint? How do we know what predicates (relationships) exist to be queried for? How do we know what types (classes, the objects of an rdf:type predicate) exist?

In many cases, we know via an out-of-band source. Perhaps a SPARQL endpoint also publishes documentation of its dataset, along the lines of the music ontology used by the dataset we looked at. Other datasets build on well-known vocabularies, such as the core RDF and RDFS terms, or the common FOAF and Dublin Core vocabularies. And still other times we find ourselves writing SPARQL queries to access datasets that we (or our software applications) have created ourselves, and therefore we simply know what we want to query for with SPARQL. These out-of-band scenarios are really no different from how we know what databases, tables, and columns to query for when constructing an SQL query.

On the other hand, a significant part of the appeal of the Semantic Web in general, and of SPARQL in particular, is the ability to start with nothing but a SPARQL endpoint and to dive in and learn about the data lurking behind the endpoint. The basic mechanism is to write queries that use variables to find all of the predicates and all of the types that exist in a dataset, and then to pick out interesting predicates and types and use open-ended queries to explore the structure of the data. Dean Allemang has written a blog post on this exact subject, so I’ll gladly reference his writing on using SPARQL to explore an unknown dataset.
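The exploratory queries described above might look like the following. This is one possible set of starting points rather than the approach from Dean's post. The queries are kept in Python strings so they can be sent with any HTTP client; the class URI and the helper name explore_type_query are placeholders of my own, and the LIMIT of 25 is an arbitrary choice to keep results manageable.

```python
# Query 1: every distinct predicate used anywhere in the dataset.
LIST_PREDICATES = """
SELECT DISTINCT ?predicate
WHERE { ?subject ?predicate ?object }
"""

# Query 2: every distinct type (class) that some resource is declared to have.
LIST_TYPES = """
SELECT DISTINCT ?type
WHERE { ?resource a ?type }
"""

# Query 3: an open-ended look at resources of one interesting type.
# Doubled braces are literal braces once .format() has run.
EXPLORE_TYPE_TEMPLATE = """
SELECT ?resource ?predicate ?object
WHERE {{
  ?resource a <{type_uri}> .
  ?resource ?predicate ?object
}}
LIMIT 25
"""


def explore_type_query(type_uri: str) -> str:
    """Fill the template with a class URI found by Query 2."""
    return EXPLORE_TYPE_TEMPLATE.format(type_uri=type_uri)


# Placeholder class URI; in practice it would come from the results of Query 2.
print(explore_type_query("http://example.org/vocab/Band"))
```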

SPARQL Language / Features

A few quick hits here to address some lingering questions:

  • SPARQL FROM clauses do not have a JOIN construct the way SQL queries do. This is because the graph model over which SPARQL queries operate naturally joins data together. That is, what would be a SQL inner join is expressed implicitly in SPARQL simply by including two triple patterns that reference a common variable (such as ?known in one of our early examples). In fact, the ease with which joins are written in SPARQL is one reason that SPARQL is particularly well-suited to writing queries that bring together data from multiple sources.
  • SPARQL contains the UNION keyword for “OR”ing together multiple triple patterns. The presentation includes an example of this in action.
  • The SPARQL OPTIONAL keyword is roughly the equivalent of a SQL left outer join: rows are kept even when the optional pattern finds no match.
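  • One of the built-in SPARQL filter functions performs regular expression matching. We could use that to limit results to just those with w3.org email addresses by adding: FILTER(regex(?email, "@w3\\.org")) to our query. (The backslash is doubled because a single backslash before the dot is not a valid escape inside a SPARQL string.)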
  • The a keyword in SPARQL is an abbreviation for the common predicate rdf:type that relates a resource to its semantic type/class.
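Putting the quick hits together, here is one illustrative query that uses an implicit join, UNION, OPTIONAL, FILTER with regex, and the a keyword. It is a sketch of my own, not a query from the presentation. The str() call is an assumption about the data: it guards against ?email being a URI such as mailto:..., since regex expects a string. The small Python check at the end only approximates how the pattern behaves, because SPARQL and Python regular expressions are not identical.

```python
import re

# Raw string so the doubled backslash reaches SPARQL unchanged.
FEATURES_QUERY = r"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?email ?homepage
WHERE {
  # 'a' abbreviates rdf:type
  ?person a foaf:Person .
  # Implicit join: this pattern and the ones below share ?known
  ?person foaf:knows ?known .
  # UNION: accept a name from either property
  { ?known foaf:name ?name } UNION { ?known rdfs:label ?name }
  ?known foaf:mbox ?email .
  # OPTIONAL: keep the row even when there is no homepage
  OPTIONAL { ?known foaf:homepage ?homepage }
  FILTER(regex(str(?email), "@w3\\.org"))
}
"""

# The pattern as the regex engine sees it once SPARQL has unescaped the string.
EMAIL_PATTERN = r"@w3\.org"


def matches(email: str) -> bool:
    """Approximate the FILTER locally with Python's re module."""
    return re.search(EMAIL_PATTERN, email) is not None


print(matches("mailto:someone@w3.org"))
print(matches("mailto:someone@w3xorg.example"))
```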

I’m sure there are other questions that I have not managed to address here. Please drop me a line with any other questions. You can also check out the SPARQL FAQ that I maintain. Thanks!