Put Your Desktop in the Cloud to Support the Open Government Directive and Data.gov/semantic

Disclaimer:  This article does not reflect the views of the U.S. Environmental Protection Agency and does not constitute endorsement by the EPA of the standards or products mentioned.

A Semantic Cloud Computing Desktop/Mobile Apps environment with Linked Open Data consists of the following:

  1. A database of “things” referenced by URLs (e.g., Twitter);
  2. A free wiki (Deki Express) that was forked from MediaWiki, evolved into a platform (web services with a wiki interface), and further evolved into a Cloud Computing Internet Operating System Desktop; and
  3. A semantic publishing environment that supports use on Mobile Apps (e.g., iPhone, iPad) and Linked Open Data through MindTouch Extensions (e.g., App Catalog and Deki Mobile), conversion of the MySQL database to an RDF triple-store (e.g., DBpedia), and use with spreadsheet tools (e.g., Cambridge Semantics, Extentech Sheetster). 

Now that Google and other search engines are reorienting their rankings to favor semantics and RDFa, the case for Linked Open Data in government becomes much stronger.

This paper describes the overall use case submitted to the Federal Cloud Computing Advisory Committee and three progressive use cases for developing applications. It recommends continued work on actionable data publishing (e.g., data catalogs using RDF) of EPA and U.S. federal government data with context, provenance, and quality information. This paper is part of the author’s Open Government Directive Plan (see http://semanticommunity.net).

Introduction

The UK Government recently unveiled its open data portal, http://data.gov.uk, containing hundreds of datasets from across all areas of government. Several of these were selected for conversion to RDF using a combination of manual analysis and automated extraction. The resulting triples are published using Linked Data standards and are queryable through SPARQL services provided by the Talis Platform (1).
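
To make “queryable using SPARQL” concrete, here is a minimal Python sketch of querying such an endpoint with the SPARQLWrapper library. The endpoint URL and the vocabulary in the query are illustrative placeholders, not the actual Talis Platform services behind data.gov.uk.

    # A minimal sketch of querying a Linked Data SPARQL endpoint from Python.
    # NOTE: the endpoint URL and query vocabulary are placeholders.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/datagovuk/sparql")  # placeholder
    sparql.setQuery("""
        SELECT ?dataset ?title
        WHERE {
            ?dataset a <http://rdfs.org/ns/void#Dataset> ;
                     <http://purl.org/dc/terms/title> ?title .
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["dataset"]["value"], binding["title"]["value"])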

The Norwegian SERES (Semantic Back Office Solution) supports Norwegian agencies’ wish for a common approach to metadata modeling, on the way to a national metadata registry spanning agencies. Several agencies have shown interest in using SERES to model their internal vocabularies. SERES is based on a MOF metamodel and can be exported in various formats, including RDF/OWL (2).

RPI reports on bringing together Web 2.0 and Linked Data to create “social data networks” in which communities of citizens can interact to jointly solve societal problems; on the use of Semantic Web technologies (RDF, RDFS, SPARQL, and RDFa) to facilitate the development of open government applications; and on a semantic wiki that supports government “subject matter experts” in helping Web developers, and eventually the public, better understand the meaning of the data (3).

Model Driven Solutions suggests that applying linked open data (LOD) to architectural information provides a mechanism to support open government while improving inter- and intra- government collaboration and data sharing (4).

This discussion reports on efforts that parallel all of the above: a Data.gov/semantic that provides an ontology that can be exported to RDF/OWL in multiple ways; a metadata registry across a U.S. government agency and the U.S. government as a whole; a bringing together of Web 2.0 and Web 3.0; and a use case that applies linked open data to architectural information.

Concept and Context

Tim Berners-Lee has outlined four principles of Linked Open Data (5), paraphrased as follows (a code sketch follows the list):

      •  Use URIs (like URLs) to identify things.

      •  Use HTTP URIs so that these things can be referred to and looked up (“dereference”) by people and user agents.

      •  Provide useful information (i.e., a structured description — metadata) about the thing when its URI is dereferenced.

      •  Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
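
These principles are compact enough to demonstrate in a few lines of code. Here is a minimal sketch using the Python rdflib library; the agency namespace and facility URI are hypothetical placeholders, not actual government identifiers.

    # A minimal sketch of the four Linked Data principles with rdflib.
    # The namespace and URIs below are hypothetical placeholders.
    from rdflib import Graph, URIRef, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://data.example.gov/id/")  # placeholder HTTP namespace

    g = Graph()
    g.bind("ex", EX)

    facility = EX["facility/1234"]            # principles 1 & 2: an HTTP URI names the thing

    g.add((facility, RDF.type, EX.Facility))  # principle 3: useful metadata about the thing
    g.add((facility, RDFS.label, Literal("Example Treatment Plant")))
    g.add((facility, EX.locatedIn,            # principle 4: links to other related URIs
           URIRef("http://dbpedia.org/resource/Washington,_D.C.")))

    print(g.serialize(format="turtle"))

Dereferencing (the second principle) is then a matter of serving this description, as RDF or HTML, whenever the facility URI is requested over HTTP.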

The purpose of Data.gov (6) is to increase public access to high-value, machine-readable datasets generated by the executive branch of the federal government. The purpose of our Data.gov/semantic (7) is to apply the principles of Linked Open Data to Data.gov. U.S. Federal CIO Vivek Kundra has launched a major Cloud Computing Initiative (8), addressed the economic benefits of cloud computing at a recent forum (9), and encouraged submission of use cases to the Federal Cloud Computing Advisory Committee, which the author has done (10).

The highlights of this use case are as follows:

Business Need and Cloud Services Delivery and Deployment Model: This solution meets all five essential characteristics, three service models, and four deployment models in the NIST definition (11).

Description: A web-services platform with a wiki interface using a Web-Oriented Architecture (WOA), implemented in open source software provided by MindTouch using the Amazon Cloud, supporting statelessness, low coupling, modularity, and semantic interoperability (12).

Life Cycle Phase: All phases handled by MindTouch/Amazon with users providing suggestions to the open source development community.

Cost Savings/Avoidance: This was done at no cost because the piloting was started two years ago when the Deki Wiki Cloud Platform was free. See current MindTouch Cloud offering (13).

Qualitative Benefits: This allowed completion of the Open Government Directive requirements well before the deadlines and the production of a Data.gov/semantic example.

Lessons Learned: I would like to see us pilot having government employees “put their desktop in the cloud,” not only as a way to save infrastructure costs and increase collaboration, but also as a way to preserve the artifacts of their careers so that when they retire, the public has a record (14)!

Use Cases

The author has previously (15) outlined three principal use cases as follows:

1. Start with the Data

2. Continue with Data Architecture, Modeling, and Flows

3. Finish with Ontology-driven Systems Engineering

1. Start with the Data:

      •  Expose the data and the metadata.

      •  Re-purpose with well-defined URLs and structure.

      •  Convert to RDF and publish to the Linked Open Data Cloud (in process).

      •  Example: EPA Rulemaking Gateway (16).

Convert to RDF and publish to the Linked Open Data Cloud (in process) in three ways (a conversion sketch follows this list):

1.  MindTouch Extensions (RDFa).

          •  See MindTouch Application Architecture (12).

2.  Conversion of the MySQL database to an RDF triple-store (e.g., DBpedia) (17).

3.  Use with spreadsheet tools (e.g., Cambridge Semantics (18) and Extentech Sheetster (19)).
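
As an illustration of the second way, here is a minimal sketch of converting relational rows to RDF triples with Python and rdflib. The table, columns, and namespace are hypothetical; the standard-library sqlite3 module stands in for MySQL so the sketch runs standalone (for a real MySQL database, the same loop works over a mysql.connector connection).

    # A minimal sketch of relational-to-RDF conversion (way 2 above).
    # sqlite3 stands in for MySQL so this runs standalone; the table,
    # columns, and namespace are hypothetical placeholders.
    import sqlite3

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://data.example.gov/id/")  # placeholder namespace

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rules (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO rules VALUES (1, 'Example Clean Water Rule')")

    g = Graph()
    for row_id, title in conn.execute("SELECT id, title FROM rules"):
        subject = EX[f"rule/{row_id}"]  # mint one URI per row
        g.add((subject, RDF.type, EX.Rule))
        g.add((subject, RDFS.label, Literal(title)))

    print(g.serialize(format="turtle"))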

2. Continue with Data Architecture, Modeling, and Flows:

      •  Do a SCOPE-type assessment (20).

      •  Use Visio to capture the metadata and relationships.

      •  Convert to RDF and publish to the Linked Open Data Cloud (in process).

      •  Example: EPA Climate Change Architecture Workshop, February 24, 2010 (21).

Do a SCOPE-type assessment:

      •  Interviews (42 so far, potentially 120) of Subject Matter Experts and Program Leaders.

      •  Mapped Activities on EPA Organizational Chart.

      •  Summarized in Visio Diagrams.

      •  Outputs in Excel for All Objects in Diagrams.

Pilot the Open Data Registry (22) to publish to RDF and the Linked Open Data Cloud (a client-side dereferencing sketch follows this list):

      •  Open Data Registry will play a role similar to that of VeriSign, launching a new top-level domain, .data, in 2011 and providing a DNS-like service for authoritative lookup of URIs from across the Semantic Web. Access to data will always be free, whether sought by consumers, enterprises, or software applications.

      •  In mid-2010, Open Data Registry will also be launching a website dedicated to fostering and supporting an innovation commons for developing open standards for the Green Economy. Government agencies, international standards bodies and industry associations are invited to publish their standards in open data formats (e.g. RDF, OWL) building upon the Internet Product Code’s extensible schema and upper-level ontology, which will be open source licensed (GNU GPL v3).

      •  The innovation commons will also offer community members a large pool of open source Web 2.0 collaboration tools, software applications, online training, and resources for remixing standards metamodels, unit process models, life cycle assessment methodologies and other green accounting methods.
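
Whatever form the registry’s DNS-like service ultimately takes, on the client side an authoritative lookup reduces to HTTP dereferencing with content negotiation. Here is a minimal Python sketch; the URI is a hypothetical placeholder, not a real registry entry.

    # A minimal sketch of dereferencing a Linked Data URI with content
    # negotiation. The URI below is a hypothetical placeholder.
    import urllib.request

    uri = "http://data.example.gov/id/facility/1234"  # placeholder
    request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})

    with urllib.request.urlopen(request) as response:
        print(response.headers.get("Content-Type"))  # ideally application/rdf+xml
        print(response.read(500))                    # first bytes of the description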

See Convergence of Semantic Naming and Identification Technologies, April 27, 2006 (23).

3. Finish with Ontology-driven Systems Engineering:

      •  SIRI (24) points the way to the next level of semantic web apps, in which the semantic web becomes more than just a “web of data”: it becomes a “web of services” with intelligent UIs. Today, people are discovering the value of linked (open) data; connecting information across the web creates new value and is more than simple aggregation. What SIRI demonstrates is that we can also talk about the value of linked (open) services that put this information to work for people. You are no longer merely searching or browsing, no longer just retrieving information; rather, you are solving human problems. That is using the web in a whole new way. Now, you might ask, are semantic web standards quite ready for this new breed of applications? If not, then it is probably a good time to update them so we can have “semantic APIs” to make building such applications easier (Mills Davis).

      •  The SIRI team (25) architected and implemented this immensely complex mash-up of practically all the state-of-the-art technologies associated with human-computer interaction, ontology engineering, semantics, speech recognition, service-oriented computing, and cloud infrastructure, and it is easily one of the most compelling examples of the possibilities and promise that ontology-driven systems engineering will bring us in the future (Peter Yim).

Next Steps

The US Office of Management and Budget wants agencies to move toward “self-actuating data sets” (e.g., RDF) (26).

The W3C eGov Government Linked Data (GLD) Demo project (27) has asked us to:

•  Suggest catalogs and datasets that are candidates for DCAT (28), OPM (29), and other vocabularies, along with desired linkages and ways to leverage VoID (30) (see the sketch after this list).

•  Propose existing published open gov data for interlinking and any required means of achieving that.

•  Identify work that demonstrates principles or best practices relevant to open government linked data, and its authors/owners.

•  For each, describe a “user story” exercising these that can serve as input for communicating the work of this project.
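
To make the DCAT and VoID suggestion concrete, here is a minimal rdflib sketch describing a hypothetical catalog entry; the catalog, dataset, and endpoint URIs are placeholders, not actual Data.gov identifiers.

    # A minimal sketch of a catalog entry described with the DCAT and VoID
    # vocabularies. All URIs below are hypothetical placeholders.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    DCT = Namespace("http://purl.org/dc/terms/")
    VOID = Namespace("http://rdfs.org/ns/void#")
    EX = Namespace("http://data.example.gov/")

    g = Graph()
    g.bind("dcat", DCAT); g.bind("dct", DCT); g.bind("void", VOID)

    catalog = EX["catalog"]
    dataset = EX["dataset/rulemaking"]

    g.add((catalog, RDF.type, DCAT.Catalog))
    g.add((catalog, DCT.title, Literal("Example Agency Data Catalog")))
    g.add((catalog, DCAT.dataset, dataset))  # the catalog lists the dataset
    g.add((dataset, RDF.type, DCAT.Dataset))
    g.add((dataset, DCT.title, Literal("Rulemaking Gateway (example)")))
    g.add((dataset, VOID.sparqlEndpoint,     # VoID: where the dataset can be queried
           URIRef("http://data.example.gov/sparql")))

    print(g.serialize(format="turtle"))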

In response, I have suggested the following demos:

•  Google for: epaontology wetlands

•  Search within http://epaontology.wik.is/ for: wetlands

•  Search at http://www.sdi.gov for: wetlands

I am reporting user stories at:

•  Linked Data on the Web (LDOW2010), Raleigh, NC, April 27, 2010 (31).

•  The Third International Provenance and Annotation Workshop, Troy, NY, June 15-16, 2010 (32).

and I am continuing to do actionable data publishing of EPA and U.S. federal government data with context, provenance, and quality information in MRSs: microsites, resource directories, and searchable collections (33).

Acknowledgements

The author gratefully acknowledges Dean Allemang, Cory Casanave, Mills Davis, Li Ding, David Eng, Lee Feigenbaum, Aaron Fulkerson, Jim Hendler, Ralph Hodgson, Kevin Kirby, Kevin Jackson, Bob Marcus, John McMahon, Richard Murphy, Brand Niemann, Jr., Barry Nussbaum, Tony Shaw, Jeff Stein, George Strawn, George Thomas, and Pete Tseronis.

References

1. http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=2923

2. http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=2984

3. http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=2950

4. http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=2785

5. http://en.wikipedia.org/wiki/Linking_Open_Data

6. http://data.gov

7. http://epaontology.wik.is and http://federaldata.wik.is. EPA’s 2008 Report on the Environment as a Data.gov/semantic product: the best EPA content in the best cloud platform for semantic data publishing, providing the context, provenance, and quality for each environmental indicator and its data set(s). Census Bureau’s Annual Statistical Abstract as a Data.gov/semantic product: the best federal government content in the best cloud platform for semantic data publishing, providing the context, provenance, and quality for each statistical indicator and its data set(s).

8. http://www.whitehouse.gov/blog/streaming-at-100-in-the-cloud/

9. http://www.brookings.edu/events/2010/0407_cloud_computing.aspx and http://www.slideshare.net/kvjacksn/the-economic-gains-of-cloud-computing

10. http://federalcloudcomputing.wik.is/@api/deki/files/181/=Federal_Cloud_Computing_Use_Case_-_BrandNiemann04122010.doc

11. http://csrc.nist.gov/groups/SNS/cloud-computing/index.html

12. http://www.mindtouch.com/index.php?title=Technology&highlight=technology

13. http://cloud.mindtouch.com/

14. http://blogs.archives.gov/online-public-access/?p=1039&cpage=1#comment-906

15. http://federalcloudcomputing.wik.is/@api/deki/files/176/=BrandNiemann03032010.ppt

16. http://yosemite.epa.gov/opei/RuleGate.nsf/ and http://epaerulemaking.wik.is/Rulemaking_Gateway

17. http://dbpedia.org/About

18. http://www.cambridgesemantics.com/

19. http://extentech.com/estore/product_detail.jsp?product_group_id=230

20. http://networkcentricity.wik.is/NCOIC_SCOPE_Version_1.0

21. http://epaenterprisearchitecture.wik.is/Climate_Change_Architecture_Workshop%2c_February_24%2c_2010

22. Open Data Registry: http://federaldata.wik.is/@api/deki/files/169/=ODR_Overview_Feb-10.pdf

23. http://semanticommunity.wik.is/Best_Practices/Convergence_of_Semantic_Naming_and_Identification_Technologies

24. http://siri.com/

25. http://siri.com/about/team

26. http://labs.systemone.net/wikipedia3

27. http://www.w3.org/egov/wiki/Projects/GLD_Demo/Meetings/2010-04-09

28. http://vocab.deri.ie/dcat-overview

29. http://www.slideshare.net/SteveHitchcock/keepit-course-3-provenance-and-opm-based-on-slides-by-luc-moreau

30. http://rdfs.org/ns/void/html

31. http://events.linkeddata.org/ldow2010/

32. http://tw.rpi.edu/portal/IPAW2010

33. Jeffrey Levy, Director Web Communications, Office of Public Affairs, Keynote on The Big Picture, EPA Web Working Group Conference, April 13, 2010.

Comments

Update on Data Quality for Linked Open Data

I am schooled as a scientist and statistician in the basic scientific method and the Data Quality Objectives (DQO) process, which is the formal mechanism for implementing the scientific method and identifying the important information that must be known in order to make decisions based on the outcome of the data collection itself. For example: was the data collected and handled in such a way as to produce the information you need to make a decision, and, in this case, were multiple data sets, collected by multiple processes, not all of which you control, handled in such a way as to make linking (mashups) meaningful for decision making? That is a tall order.

My approach, feeling my way forward, has been to start with the high-quality environmental data that EPA controls, has had peer-reviewed, and has created metadata for (the Report on the Environment); to use statistical visualization tools (e.g., S-PLUS for Spotfire) to do those controlled mashups for our Statistics Users Group to examine (e.g., see http://epadata.wik.is/@api/deki/files/184/=BrandNiemann04282010.ppt; my presentation was very well received and will be given at our national meeting next month); and to see how they suggest we proceed. I think this needs the support and input of the statistics community of experts to ultimately succeed with decision makers, or it will be dismissed as just a (really neat) semantic technology thing.