HTTP PATCH and Tracking RDF Changes

Last week’s announcement that HTTP PATCH has been adopted as an official verb via RFC 5789 has generated a lot of excitement (and questions). In summary, the intent of each verb is:

  • POST to create a new resource when the client cannot predict the identity on the origin server (think a new order)
  • PUT to replace the definition of a specified resource with what is passed in from the client
  • PATCH to modify a portion of a specified resource in a predictable and effectively transactional way (if the entire patch cannot be applied, the server should not apply any part of it)

The goal is to convey the intent of the patch more clearly than is possible with the more generic POSTing of modifications. Specific patch diff formats will emerge for modifying plain text, HTML, XML, etc.
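
The all-or-nothing behaviour described for PATCH can be sketched in a few lines of Python. This is only an illustration, not anything defined by RFC 5789: the `Triple` type, the `PatchConflict` exception and the `apply_patch` function are hypothetical names, and the "resource" is just an in-memory set of triples.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # hypothetical (subject, predicate, object) form


class PatchConflict(Exception):
    """Raised when a patch cannot be applied in full."""


def apply_patch(resource: Set[Triple],
                removals: Iterable[Triple],
                additions: Iterable[Triple]) -> Set[Triple]:
    """Return a patched copy of `resource`, or raise without changing it."""
    removals = list(removals)
    missing = [t for t in removals if t not in resource]
    if missing:
        # Reject the whole patch; the caller's resource is left untouched.
        raise PatchConflict(f"cannot remove absent triples: {missing}")
    patched = set(resource)              # work on a copy
    patched.difference_update(removals)  # apply every removal
    patched.update(additions)            # then every addition
    return patched
```

Because the function validates first and only ever mutates a copy, a failed patch leaves the original untouched, which is the property the PATCH description above asks of a server.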

The Semantic Web community has been interested in this capability for quite some time, particularly with respect to modification of RDF graphs. SPARQL/Update is a proposed (currently just a W3C Member Submission) language for remotely updating RDF Graphs. It does not directly specify how the requests will be transferred, but the updates themselves look something like this (from the spec):

Add a title and creator relationship to a book: 

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{ 
   <http://example/book3> dc:title "A new book" ;
                 dc:creator  "A.N.Other" .
}

Remove any records associated with older books:

PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE
{ ?book ?p ?v }
WHERE
  { ?book dc:date ?date .
    FILTER ( ?date < "2000-01-01T00:00:00"^^xsd:dateTime )
    ?book ?p ?v
  }

The SPARQL 1.1 Protocol and Uniform HTTP Protocol for Managing RDF Graphs (currently W3C Working Drafts) will map these updates into HTTP verbs to create or modify named graphs. The graphs are named either through their own URI or a graph query parameter.

As an example, an HTTP GET would act as a query into a named graph. An HTTP PUT of an enclosed body of RDF would translate to the equivalent SPARQL/Update (from the spec):

DROP SILENT GRAPH <graph_uri>
CREATE SILENT GRAPH <graph_uri>
INSERT DATA [ INTO <graph_uri> ] 
{ .. RDF payload .. }

As a PUT represents an idempotent overwrite action, it must first drop the existing graph should it exist, create a new one and then dump the specified contents into the new graph.

An HTTP POST should be used to additively extend the RDF graph with the specified body and a DELETE would remove the graph. There is no support yet for a PATCH-level view of RDF graphs.
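
To make the verb-to-graph mapping concrete, here is a toy in-memory model in Python. It is a sketch under assumptions, not the protocol itself: `GraphStore` and its method names are invented for illustration, triples are plain tuples, and silently ignoring a DELETE on a missing graph is a simplification.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class GraphStore:
    """Toy in-memory store keyed by graph URI (illustrative only)."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}

    def get(self, uri: str) -> Set[Triple]:
        # GET: read the named graph (a copy, so callers cannot mutate the store).
        return set(self.graphs.get(uri, set()))

    def put(self, uri: str, payload: Set[Triple]) -> None:
        # PUT: drop, create, insert collapses to wholesale replacement.
        self.graphs[uri] = set(payload)

    def post(self, uri: str, payload: Set[Triple]) -> None:
        # POST: additively merge the payload into the graph.
        self.graphs.setdefault(uri, set()).update(payload)

    def delete(self, uri: str) -> None:
        # DELETE: remove the graph; a missing graph is ignored (assumption).
        self.graphs.pop(uri, None)
```

Repeating the same `put` leaves the store in the same state, which is the idempotence that PUT requires; repeating a `post` is also harmless here only because a graph is a set of triples.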

Barring standardization on this PATCH front, developers have had to create their own implementations.

The Talis platform supports ChangeSets which allow the identification of collections of modifications to apply to an existing store. Here we see the ordered removal of an old title and the addition of a new one:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:cs="http://purl.org/vocab/changeset/schema#">
  <cs:ChangeSet rdf:about="http://example.com/changesets#change">
    <cs:subjectOfChange rdf:resource="http://example.com/res#thing"/>
    <cs:createdDate>2006-01-01T00:00:00Z</cs:createdDate>
    <cs:creatorName>Anne Onymous</cs:creatorName>
    <cs:changeReason>Change of title</cs:changeReason>
    <cs:removal>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://example.com/res#thing"/>
        <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/title"/>
        <rdf:object>Original Title</rdf:object>
      </rdf:Statement>
    </cs:removal>
    <cs:addition>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://example.com/res#thing"/>
        <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/title"/>
        <rdf:object>New Title</rdf:object>
      </rdf:Statement>
    </cs:addition>
  </cs:ChangeSet>
</rdf:RDF>
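
As a rough sketch of how a client might read such a ChangeSet, the following Python uses only the standard library's `xml.etree.ElementTree`. It assumes the exact RDF/XML shape shown above (reified `rdf:Statement` elements nested under `cs:removal` and `cs:addition`); it is not a general RDF parser, and `read_changeset` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CS = "http://purl.org/vocab/changeset/schema#"

Triple = Tuple[str, str, str]


def _statement(stmt: ET.Element) -> Triple:
    """Turn one reified rdf:Statement element into a (s, p, o) tuple."""
    subject = stmt.find(f"{{{RDF}}}subject").get(f"{{{RDF}}}resource")
    predicate = stmt.find(f"{{{RDF}}}predicate").get(f"{{{RDF}}}resource")
    obj = stmt.find(f"{{{RDF}}}object")
    # The object is either a resource reference or a plain literal.
    value = obj.get(f"{{{RDF}}}resource") or (obj.text or "")
    return (subject, predicate, value)


def read_changeset(xml_text: str) -> Tuple[List[Triple], List[Triple]]:
    """Return (removals, additions) from a ChangeSet document."""
    root = ET.fromstring(xml_text)
    removals = [_statement(s) for s in
                root.iterfind(f".//{{{CS}}}removal/{{{RDF}}}Statement")]
    additions = [_statement(s) for s in
                 root.iterfind(f".//{{{CS}}}addition/{{{RDF}}}Statement")]
    return removals, additions
```

Fed the document above, this should yield one removal (the "Original Title" triple) and one addition (the "New Title" triple), which can then be applied to a store in that order.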
ChangeSets themselves can be named and linked to establish a formal ordering and set of dependencies to apply. You will notice that ChangeSets use reification to identify the individual statements to add or remove.

Nathan has developed a PATCH-friendly Graph Update Ontology (GUO) that does not require reification and is intended to be used with the SPARQL/Update language. While he did not start from Tim Berners-Lee and Dan Connolly’s DELTA work, he appears to have come across it along the way.
 
From the spec, we see:
 
@prefix guo: <http://webr3.org/owl/guo#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
_:u1 a guo:UpdateInstruction ;
     guo:target_graph <http://webr3.org> ;
     guo:target_subject <http://webr3.org/r/Special_Document> ;
     guo:delete _:d1 ;
     guo:insert _:i2 .
_:d1 dcterms:title "Draft Special Document"@en .
_:i2 dcterms:title "Special Document"@en ;
     dcterms:published "2010-03-18T15:26:13Z" .
     
to identify the deletion of one triple and the addition of two other triples. This could easily be transformed into two SPARQL/Update statements:
 
DELETE DATA FROM <http://webr3.org> {
  <http://webr3.org/r/Special_Document> <http://purl.org/dc/terms/title> "Draft Special Document"@en .
}
INSERT DATA INTO <http://webr3.org> {
  <http://webr3.org/r/Special_Document> <http://purl.org/dc/terms/title> "Special Document"@en ;
  <http://purl.org/dc/terms/published> "2010-03-18T15:26:13Z" .
}
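
The transformation from a GUO instruction to SPARQL/Update text is mechanical enough to sketch in Python. The function below is a hypothetical helper, not part of GUO or any library: it assumes the instruction has already been parsed into a target graph, a target subject, and lists of (predicate, object) pairs whose objects are already serialized as RDF terms, and it emits the Member Submission syntax used in this post.

```python
from typing import List, Tuple

# (predicate IRI, object already written as an RDF term, e.g. '"Title"@en')
PredicateObject = Tuple[str, str]


def _block(keyword: str, graph: str, subject: str,
           pairs: List[PredicateObject]) -> str:
    """Render one DATA block with a fully written-out triple per line."""
    lines = [f"  <{subject}> <{p}> {o} ." for p, o in pairs]
    return f"{keyword} <{graph}> {{\n" + "\n".join(lines) + "\n}"


def guo_to_update(graph: str, subject: str,
                  deletes: List[PredicateObject],
                  inserts: List[PredicateObject]) -> str:
    """Emit the deletions first, then the insertions, as in the GUO example."""
    parts = []
    if deletes:
        parts.append(_block("DELETE DATA FROM", graph, subject, deletes))
    if inserts:
        parts.append(_block("INSERT DATA INTO", graph, subject, inserts))
    return "\n".join(parts)
```

Called with the draft title as the only delete and the new title plus the published date as inserts, it produces one DELETE DATA block followed by one INSERT DATA block for `<http://webr3.org>`.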
In addition to these simple graph manipulations, Nathan describes more complicated scenarios, such as merging multiple resources into a single resource:
@prefix guo: <http://webr3.org/owl/guo#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
_:u1 a guo:UpdateInstruction ;
      guo:target_subject <http://example.com/resource/London> ;
      guo:insert <http://dbpedia.org/resource/London>,
                 <http://mpii.de/yago/resource/London>,
                 <http://statistics.data.gov.uk/id/eer?name=London> .
merging multiple named graphs:
@prefix guo: <http://webr3.org/owl/guo#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
_:u1 a guo:UpdateInstruction ;
      guo:merge <http://dbpedia.org/data/Linked_Data>,
                <http://dbpedia.org/data/Semantic_Web> .
and inserting the resulting graph from a SPARQL CONSTRUCT query:
<rdf:RDF xmlns="http://webr3.org/owl/guo#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <UpdateInstruction>
        <insert rdf:resource="http://dbpedia.org/sparql?query=construct+{+%3Fs+%3Fp+%3Fo+}+where+{..."/>
        <target_subject rdf:resource="http://webr3.org/r/linked_to_haxe"/>
    </UpdateInstruction>
</rdf:RDF>
Nathan has also started creating scripts to generate diffs of RDF resources. These connect back to the newly endorsed HTTP PATCH verb: once changes are detected, they can be re-applied elsewhere incrementally and in order. This is particularly useful in the Linked Data space for tracking newly acquired, modified, and accumulated knowledge from distributed sources. His current implementation is in PHP and ARC2. He has made progress in supporting the PATCH verb and indicates that he plans to open source his work. A series of demos is currently available here.
 
For example, generating diffs between two related graphs such as this and this yields this diff result which could be used to bring the original sources in alignment.
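
The core of such a diff can be sketched with plain set arithmetic in Python. This is not Nathan's implementation, only a minimal illustration: `diff_graphs` and `apply_diff` are hypothetical names, and the approach assumes ground triples only, since blank nodes cannot be compared by simple equality and need graph-matching logic that is out of scope here.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def diff_graphs(old: Set[Triple],
                new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removals, additions) that turn `old` into `new`."""
    removals = old - new    # triples only the old graph has
    additions = new - old   # triples only the new graph has
    return removals, additions


def apply_diff(graph: Set[Triple],
               removals: Set[Triple],
               additions: Set[Triple]) -> Set[Triple]:
    """Replay a diff against a copy of `graph`."""
    return (graph - removals) | additions
```

Replaying the output of `diff_graphs(old, new)` against `old` reproduces `new`, which is the property that makes the diff usable as the body of a PATCH.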
 
 
Another fun demo leverages the Memento project, which adds the concept of time to the state associated with Web resources. For example, a Memento for the DBpedia resource on Tim Berners-Lee and the current version differ as such.
 
While these are not standards yet, they easily interact with the current SPARQL/Update language drafts which soon will be. We can probably expect similar concepts to be adopted by standards bodies in the future, but they also represent an exciting vision into what is possible now.

 

 

 

Comments


Brian,

I was rather surprised to see that you are comparing two RDF documents which are XML and generating a text file which needs to be parsed – why not represent the diff in XML?

Perhaps you would find either XSLT or XQuery Update suitable formats for patching XML.

For interest, I put your sample through our diff engine, which generates an XML delta; see http://www.deltaxml.com/free/compare/. The diff process was not aware of the semantics of RDF, treating it just as XML. This generic delta could be converted into XSLT or XQuery Update.