One Part Code, One Part XML, Two Parts Knowledge

Last Modified On :   September 17, 2008 4:44 PM PDT

Introduction

In previous articles I have applauded XML's ability to provide for diverse data needs. We've seen how to use different XML technologies for application processing of the XML data. There has been a lot of interest recently in systems that can provide access to the many varieties of an organization's data in an integrated manner. Such tools have many names, "information portals," "content management suites," and so forth, but by far the most vague and most intriguing buzzword is "knowledge management."

The mechanics of such systems are designed to provide unified access to the menagerie of data we manage: schedules, user profiles, reports, documents, memos, messages, operational records, financial records, and so forth. The theory is that if we have all this data at hand, cross-referenced, with sophisticated search and retrieval tools, we should be able to capture a bit of our brain function in our information systems. Regardless of your enthusiasm or skepticism for the idea, we probably all recognize the value of having tools to bind various information resources together.

This article discusses a key companion technology to XML toward the goal of knowledge management: the Resource Description Framework (RDF). One of the most successful applications of RDF is RDF Site Summary (RSS), which we'll use as an example.

RDF 101

One of the goals of knowledge management is to organize and represent the concepts people hold in their minds. I would argue that differences in these concepts from person to person are a major cause of the divergence in applications, and a large part of the reason why software integration is such a difficult challenge. One of the most straightforward things that can be done to address this problem is to find universal identifiers for concepts, so that we can explicitly and unambiguously express relationships between them.

So for instance, Mark in marketing might think of a customer as any person or organization who has purchased or is soon likely to purchase a product from the company. Julie in customer relations, however, expects that someone must have purchased a product before being considered a customer. If asked to provide requirements for a customer profile application, they inevitably have fundamental differences in their approaches. These divergent concepts, interacting in the matrix of other divergent issues, cause all sorts of confusion in software and elsewhere.

If we can identify these separate concepts, we have a better chance of avoiding ambiguity than if we just use the term "customer." We could then make a statement that Mark's "customer" equals Julie's "customer" plus a separate concept of "prospect." Creating universal identifiers can be a surprisingly difficult endeavor, but luckily the Web has introduced a universal identifier with which we've all become familiar: the URL. Recognizing the potential, the IETF, the organization behind HTTP and URLs, created a superset of URLs known as Uniform Resource Identifiers (URIs). The "U" is also considered to mean "Universal" in many quarters (including key World Wide Web Consortium (W3C) documents) because of its typical use. URIs provide an extensible means of identifying resources. Resources can be any concept we might want to identify. One could even identify Mark and Julie's concepts of "customer" using URIs.

With this building block in place, the next step is to express relationships between URIs, as well as other properties we might want to ascribe to these URIs. The W3C has developed just such a facility: RDF (Resource Description Framework). RDF is a system for expressing statements about resources identified by URIs. For instance, to go back to our memo example from a previous article, we might want to state that Uche Ogbuji is the author of the memo. If the company owns the domain UsuraHouse.com, it might reserve the URL base http://usurahouse.com/internal/memos/ for identifying memos. This has the advantage that the company controls this network location. Let us say the memo itself is stored as 761523.xml. Its URL (and thus URI) would then be http://usurahouse.com/internal/memos/761523.xml. RDF would then allow us to make the statement as follows:

(http://usurahouse.com/internal/memos/761523.xml , http://usurahouse.com/kmprops/author , mailto:uogbuji@usurahouse.com )

The memo is the subject of the statement, and the predicate of the statement is a property specially designed for Usura House memos. Properties in RDF are also URIs and not merely names (as in object-oriented systems) to avoid the confusing terminology that RDF is designed to address. The object of the statement is another URI: an e-mail box in URL form, which is a common way to give people identifiers (in reality, however, they tend to identify whoever can gain access to the mailbox, not a person).
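To make the three roles concrete, here is a minimal sketch of the same statement as a Python tuple. The `Statement` class and its field names are assumptions made for illustration; they are not part of RDF or of any RDF library.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement (hypothetical helper type, for illustration only)."""
    subject: str    # the resource being described
    predicate: str  # the property, itself a URI
    obj: str        # the value of the property


author_stmt = Statement(
    subject="http://usurahouse.com/internal/memos/761523.xml",
    predicate="http://usurahouse.com/kmprops/author",
    obj="mailto:uogbuji@usurahouse.com",
)

# Fields can be read by name or unpacked like any three-element tuple
subject, predicate, obj = author_stmt
print(author_stmt.predicate)
```

A plain tuple would serve just as well; the named fields only make the subject, predicate, and object positions easier to read.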

And that is RDF in a nutshell. Everything else is a matter of detail.

Elements of RDF Statements

RDF statement subjects and predicates are always URIs, but objects can be strings or other literals, which is how we can express properties on resources as well as relationships between resources. For instance, we might specify the title of the memo in RDF as follows:

(http://usurahouse.com/internal/memos/761523.xml , http://usurahouse.com/kmprops/title , "With Usura Hath No Man a House of Good Stone" )
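The distinction between URI objects and literal objects might be sketched as follows. The `URIRef` and `Literal` types and the `make_statement` function are hypothetical names chosen for this example, and the title property URI is the one that the `u:title` element in Listing 1 expands to.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    """A resource identified by a URI (hypothetical helper type)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain string value (hypothetical helper type)."""
    value: str


Node = Union[URIRef, Literal]


def make_statement(subject: URIRef, predicate: URIRef, obj: Node) -> tuple:
    """Build a triple, rejecting literals in the subject or predicate position."""
    if not isinstance(subject, URIRef) or not isinstance(predicate, URIRef):
        raise TypeError("subject and predicate must be URIs")
    return (subject, predicate, obj)


memo = URIRef("http://usurahouse.com/internal/memos/761523.xml")

# A property on a resource: the object is a literal
title_stmt = make_statement(
    memo,
    URIRef("http://usurahouse.com/kmprops/title"),
    Literal("With Usura Hath No Man a House of Good Stone"),
)

# A relationship between resources: the object is another URI
author_stmt = make_statement(
    memo,
    URIRef("http://usurahouse.com/kmprops/author"),
    URIRef("mailto:uogbuji@usurahouse.com"),
)
```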

The basis of RDF is this chained set of statements, but as a companion technology to XML, it has an XML representation. Listing 1 shows how information about the memo might look in this representation.

Listing 1

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:u="http://usurahouse.com/kmprops/">
  <u:Memo rdf:about="http://usurahouse.com/internal/memos/761523.xml">
    <u:title>With Usura Hath No Man a House of Good Stone</u:title>
    <u:author>
      <rdf:Description rdf:about="mailto:uogbuji@usurahouse.com">
        <u:fullname>Uche Ogbuji</u:fullname>
      </rdf:Description>
    </u:author>
  </u:Memo>
</rdf:RDF>

 

The top-level element (rdf:RDF) helps set RDF statements apart from regular XML to aid the RDF parser. It also defines key namespaces. RDF uses XML namespaces as much as XSLT does, but in slightly different ways. The u:Memo element declares a resource, giving its URI with the rdf:about attribute, and giving its type as the concatenation of the namespace URI attached to the prefix u and the local name Memo, that is, http://usurahouse.com/kmprops/Memo.

The elements within define properties for the resource. The first one makes the title statement exactly as shown in the previous example. The second element makes the author statement, and uses another resource declaration as the object. Again the resource is given by the rdf:about attribute, but rdf:Description is a special RDF element for declaring resources without necessarily considering them to be of a particular type. Finally, we use a property element, u:fullname, to attach a property to this resource. In this simple way, we've chained two statements: "The author of the memo is identified by this e-mail address, and his full name is Uche Ogbuji".
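As a rough illustration of how the XML form maps back to statements, the sketch below walks Listing 1 with Python's standard `xml.etree.ElementTree` module. It is not a general RDF parser: the `expand` and `statements` helpers are assumptions written for this example, and they handle only the constructs that appear in Listing 1 (typed resource elements, rdf:Description, rdf:about, and property elements).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = "{" + RDF_NS + "}about"
DESCRIPTION = "{" + RDF_NS + "}Description"
TYPE = RDF_NS + "type"

LISTING_1 = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:u="http://usurahouse.com/kmprops/">
  <u:Memo rdf:about="http://usurahouse.com/internal/memos/761523.xml">
    <u:title>With Usura Hath No Man a House of Good Stone</u:title>
    <u:author>
      <rdf:Description rdf:about="mailto:uogbuji@usurahouse.com">
        <u:fullname>Uche Ogbuji</u:fullname>
      </rdf:Description>
    </u:author>
  </u:Memo>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag into namespace + local."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def statements(node):
    """Yield (subject, predicate, object) triples for one resource element."""
    subject = node.get(ABOUT)
    if node.tag != DESCRIPTION:
        # A typed element such as u:Memo also states the resource's type
        yield (subject, TYPE, expand(node.tag))
    for prop in node:
        predicate = expand(prop.tag)
        if len(prop):
            # A nested resource element is the object of the statement
            child = prop[0]
            yield (subject, predicate, child.get(ABOUT))
            yield from statements(child)
        else:
            # Text content is a literal object; collapse stray whitespace
            yield (subject, predicate, " ".join(prop.text.split()))


root = ET.fromstring(LISTING_1)
triples = [t for resource in root for t in statements(resource)]
for triple in triples:
    print(triple)
```

Under these assumptions the walk produces four statements: the memo's type, its title, its author, and the author's full name.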

The basic data model for RDF is a graph, and Figure 1 illustrates the graph equivalent to Listing 1.

 

Figure 1. Graph Equivalent of Listing 1
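The graph model can also be sketched in code. In the example below, each subject is a node whose outgoing arcs are labeled by property URIs; the `follow` helper is a hypothetical name, and the memo's type statement is left out for brevity.

```python
from collections import defaultdict

KM = "http://usurahouse.com/kmprops/"
MEMO = "http://usurahouse.com/internal/memos/761523.xml"
AUTHOR = "mailto:uogbuji@usurahouse.com"

triples = [
    (MEMO, KM + "title", "With Usura Hath No Man a House of Good Stone"),
    (MEMO, KM + "author", AUTHOR),
    (AUTHOR, KM + "fullname", "Uche Ogbuji"),
]

# Each subject node maps to its outgoing arcs: {predicate: [objects]}
graph = defaultdict(lambda: defaultdict(list))
for subject, predicate, obj in triples:
    graph[subject][predicate].append(obj)


def follow(start, *predicates):
    """Walk a chain of arcs from a starting node, returning the nodes reached."""
    nodes = [start]
    for predicate in predicates:
        nodes = [obj for node in nodes for obj in graph[node][predicate]]
    return nodes


# Chain two arcs: memo -> author -> fullname
author_names = follow(MEMO, KM + "author", KM + "fullname")
print(author_names)
```

Following the author arc and then the fullname arc is the programmatic counterpart of reading the two chained statements in Listing 1.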

RDF in Action: RSS

RSS is a popular usage of RDF. It is a lightweight content syndication format, meaning a way to communicate news flashes, Web site updates, event calendars, software updates, featured content collections, items on Web-based auctions, and so forth, to others who might be interested in the content on a given Web site.

RSS is often used in portals to receive summaries of content items for linking. It was originally created by Netscape to power the now-defunct Netcenter* portal. There is a version of the technology, RSS 0.91, that does not use RDF. However, RSS 1.0 is the more successful variant, and is used by a huge number of sites, including Slashdot.org*, Freshmeat.net*, XML.com*, the Motley Fool*, O'Reillynet.com* (Meerkat), Wired News*, and Linux Today*. Listing 2 is an example of an RSS document.

Listing 2

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
    <description>
      XML.com features a rich mix of information and services
      for the XML community.
    </description>
    <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif"/>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.xml.com/pub/a/2000/10/11/rdf/"/>
      </rdf:Seq>
    </items>
  </channel>
  <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
    <title>XML.com</title>
    <link>http://www.xml.com</link>
    <url>http://xml.com/universal/images/xml_tiny.gif</url>
  </image>
  <item rdf:about="http://www.xml.com/pub/a/2000/10/11/rdf/">
    <title>4RDF: A Library for Web Metadata</title>
    <link>http://www.xml.com/pub/a/2000/10/11/rdf/</link>
    <description>
      One of the jewels in the crown of Python's XML support is the
      4Suite collection of libraries, the most recent addition to which
      is 4RDF, a library for the parsing, querying, and storage of RDF.
    </description>
  </item>
</rdf:RDF>


RSS organizes content into channels. For example, all content updates from a given site, such as 4Suite.org*, might be a channel. Some of the things that can be associated with a channel include:

  • title: the identifying phrase for the content source, for instance "4Suite.org: the community site for open-source XML tools"
  • image: perhaps you're running a portal and you want to display the Slashdot logo beside all the headlines from that site
  • description: a friendly description of the overall channel
  • link: a URL to the actual source providing the RSS feed
  • items: individual descriptions of separate content of interest from the channel, such as a story, an event, a news clip, a software release, and so forth.

The items are listed using a special RDF container, rdf:Seq (for "sequence") that can be used to declare an ordered list of resources.
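A portal that cares about item order could read the sequence as in the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The channel fragment is modeled on Listing 2, and the second rdf:li entry is a made-up URI added only so that the ordering is visible.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Channel fragment modeled on Listing 2; the second rdf:li is hypothetical
CHANNEL = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.xml.com/pub/a/2000/10/11/rdf/"/>
        <rdf:li resource="http://example.org/hypothetical-second-item"/>
      </rdf:Seq>
    </items>
  </channel>
</rdf:RDF>"""

root = ET.fromstring(CHANNEL)

# findall returns matches in document order, which preserves the sequence
entries = root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)
item_uris = [li.get("resource") for li in entries]
print(item_uris)
```

Each URI in the resulting list matches the rdf:about value of an item element elsewhere in the feed, which is how the channel and its items are tied together.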

The item is the core unit of content in RSS. Items usually provide a summary of some wider Web-accessible content of interest. Items are defined by properties such as:
  • title: an identifying phrase for the content item, for instance "4Suite 0.11.1 released"
  • description: a friendly summary/description of the content referred to
  • link: a URL to the original content item referred to. This link is usually also used as the identifier for the content item.
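The item properties listed above could be read from a feed along the following lines. This is only a sketch using Python's standard `xml.etree.ElementTree` module; the `read_items` function and the dictionary keys are assumptions for this example, and the feed is a shortened copy of Listing 2 containing just the item.

```python
import xml.etree.ElementTree as ET

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
NS = {"rss": "http://purl.org/rss/1.0/"}

# Shortened copy of Listing 2 containing only the item element
FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://www.xml.com/pub/a/2000/10/11/rdf/">
    <title>4RDF: A Library for Web Metadata</title>
    <link>http://www.xml.com/pub/a/2000/10/11/rdf/</link>
    <description>
      One of the jewels in the crown of Python's XML support is the
      4Suite collection of libraries, the most recent addition to which
      is 4RDF, a library for the parsing, querying, and storage of RDF.
    </description>
  </item>
</rdf:RDF>"""


def clean(text):
    """Collapse line breaks and indentation inside element text."""
    return " ".join(text.split())


def read_items(xml_text):
    """Return one dict per RSS item with its identifier and core properties."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "id": item.get(RDF_ABOUT),
            "title": clean(item.findtext("rss:title", default="", namespaces=NS)),
            "link": clean(item.findtext("rss:link", default="", namespaces=NS)),
            "description": clean(
                item.findtext("rss:description", default="", namespaces=NS)
            ),
        })
    return items


items = read_items(FEED)
for entry in items:
    print(entry["title"], entry["link"])
```

In this feed the link and the rdf:about identifier are the same URL, which reflects the usual practice noted above.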

Conclusion

RDF is an important tool for expressing properties and relationships among all sorts of data, including data expressed in XML. There are many advanced features of RDF that enable an impressive variety of uses. The major benefit of RDF is that it is very extensible, as is XML. For instance, you can define a customized set of properties for RSS to express exactly what you want, as we saw with the Usura House example. XML and RDF are an excellent pairing for building applications that work toward knowledge management.

About the Author

Uche Ogbuji is a Computer Engineer, co-founder and CEO of Fourthought, Inc.*, a software vendor and consultancy specializing in open, standards-based XML solutions, especially as applicable to problems of knowledge management. He has worked with XML for several years, co-developing 4Suite*, an open-source platform for XML processing. He writes many articles and speaks at many conferences on the practical use of XML. A Nigerian immigrant, Mr. Ogbuji currently resides and works in lovely Boulder, Colorado.


Read other articles in this series: Part 1, Part 2, Part 3, Part 4