Introduction
XML was a development of the document management community: a way to put all their hard-won architectures on the wide, enticing Web. But when it burst onto the scene, the headlines proclaimed a new king of general-purpose data formats. Programmers pounced on XML to replace their frequent and expensive invention of one-off data formats, and of the specialized parsers that handled them. Web architects seized on XML as a way to define content so that presentation changes could be made cleanly and easily. Database management experts adopted XML as a way to exchange data sets between systems as part of integration tasks. Supply-chain and business interchange professionals embraced XML as an inexpensive and flexible way to encode electronic transactions. When so many disparate disciplines find themselves around the same table, something special is bound to happen.
XML itself is not very special. It represents some refinement, some insight, and many important tradeoffs, but precious little innovation. It has nevertheless become one of the most important new developments in information systems, partly because so many groups have come to work with it, but also because it has focused people's attention on important new ways of thinking about application development.
In this five-part series I shall look at some of this new thinking. Primarily, I shall discuss where and how application programmers can best use XML to solve implementation problems, but I shall also discuss how XML can fundamentally change the way applications are developed, and change it for the better.
Before reading this article, you should be familiar with the basics of XML and of application development in any language.
In the Beginning
Early electronic computers such as ENIAC were designed to compute projectile ballistics for military applications. Rapid computation of the best way to direct ordnance certainly made possible many defense applications that had not been feasible before. Similarly, computers have enabled novel capabilities in medicine, transportation, mineral prospecting, large-scale finance, and complex manufacturing, to name a few areas. This is all well and good, but with the advent of the personal computer, the Internet, and the spread of communications capabilities, the overwhelming majority of common computer applications have become far more mundane. Rather than beating at complex equations until results fall out, most of today's everyday computation involves the expression, exchange, constraint, and classification of descriptions and messages.
This is not a problem in itself; the real problem is that most of these newer applications are still designed and developed as if they were beating on numbers and equations. Developers squint mightily at every detail of programming logic, seeking as much correctness as possible, and then casually toss together the shape of the data that the programs will manipulate. Then the whimsy of human beings steps in during implementation and maintenance, and all the requirements on which the careful modeling was based turn out to be inaccurate or obsolete. Because so much effort has been invested in sophisticated program logic design, the result is the notorious expense of developing software, which has only increased as the power of computers has increased.
Even worse, as real-world changes require applications to be scrapped and new ones built in their place (a very common occurrence, as we all know), the fact that the data was an afterthought becomes a burning liability, because the data is, in fact, the only thing we need to keep from the obsolete applications. The so-called legacy problem is almost always a matter of needing access to data that is tightly coupled to an earlier generation of applications. The demand for COBOL programmers remains strong decades after the language's heyday because so much critical data sits in COBOL applications that are themselves far past their best-before date.
New Solutions for Age-Old Issues
The two pillars of the modern N-tier application are object-oriented development and relational database technology. Both have proven aptitude for dealing with well-structured data: data that can be effectively analyzed to determine its long-term shape and role. Unfortunately, the vast majority of real-world data does not fit into such neat, predictable packages. Messages, reports, memos, profiles, catalogs, and other data designed primarily for human consumption tend to contain most of the important information that applications require, but they are not amenable to relational or object-oriented modeling. The promise and initial success of XML center on its ability to handle such data, known as semi-structured data. And because so many disciplines have converged around XML's table, it has gained an impressive array of tools for integration with more traditional, structured data processing.
In the modern N-tier system, there is a data store; a set of middle tiers, which marshal data and apply business logic; and a presentation layer, which puts up the user interface. The presentation layer has taken on additional sophistication with the idea of re-purposing content for, say, wireless interfaces or business-to-business transactions. Figure 1 illustrates this system.
In the data store, XML takes on its most basic role: representing semi-structured data in a natural way. Traditionally, people have simply tossed anything they could not easily structure into a character large object (CLOB). This is unfortunate, because much of the context and metadata needed to make sense of the data is lost, unless it is painstakingly reconstructed in the middle tier. By using native XML data repositories alongside relational database management systems (RDBMSs), or simply using the XML extensions of an RDBMS, one can keep the well-structured data (financials, operational tables, and so on) in its natural habitat, and the semi-structured data in its own.
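To make the CLOB point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The memo, its element names (`customer`, `product`, `person`), and the helper functions are hypothetical illustrations, not part of any real schema. Flattening the memo to plain text, roughly what a CLOB column would hold, keeps the prose but drops the markup recording which customer the prose is about; keeping it as XML lets a program query that context directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo: mostly free prose, with inline markup recording
# which customer, product, and person the prose is about.
MEMO_XML = """\
<memo date="2001-03-14">
  <from>Sales</from>
  <to>Engineering</to>
  <body>
    <para>The <customer id="c42">Acme</customer> order slipped because
    <product sku="X-100">the X-100 widget</product> failed testing.</para>
    <para>Please call <person>Jane Doe</person> before Friday.</para>
  </body>
</memo>
"""


def as_clob(xml_text):
    """Flatten the memo to plain text, as if it were stuffed into a CLOB."""
    return "".join(ET.fromstring(xml_text).itertext())


def mentioned(xml_text, tag, attr):
    """Return the given attribute of every <tag> element, wherever it appears."""
    root = ET.fromstring(xml_text)
    return [el.get(attr) for el in root.iter(tag)]


if __name__ == "__main__":
    print(mentioned(MEMO_XML, "customer", "id"))  # ['c42']
    print(mentioned(MEMO_XML, "product", "sku"))  # ['X-100']
    # The flattened text still says "Acme", but the customer id is gone.
    print("Acme" in as_clob(MEMO_XML), "c42" in as_clob(MEMO_XML))  # True False
```

The same query works no matter how deeply the `customer` element is buried in the prose, which is exactly the context a CLOB would force the middle tier to reconstruct.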
Meanwhile, the relationships and key metadata in the XML data can be managed using advanced XML technologies such as the Resource Description Framework (RDF) and XLink, off-loading a great deal of responsibility from the overworked middle tier.
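As a small illustration of relationship metadata living in the data itself, the following Python sketch harvests XLink simple links with `xml.etree.ElementTree`. The namespace URI `http://www.w3.org/1999/xlink` and the `xlink:type`/`xlink:href` attributes come from the XLink specification; the catalog document and the `outgoing_links` helper are hypothetical. The sketch covers only XLink, not RDF, and its point is that a generic tool can discover links between documents without knowing anything else about the application.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XLink specification.
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical catalog entry that points at related documents via XLink.
CATALOG_XML = f"""\
<catalog xmlns:xlink="{XLINK}">
  <item sku="X-100">
    <spec xlink:type="simple" xlink:href="specs/x100.xml"/>
    <review xlink:type="simple" xlink:href="reviews/x100-2001.xml"
            xlink:title="Field test report"/>
  </item>
</catalog>
"""


def outgoing_links(xml_text):
    """Return (element tag, href) for every XLink simple link in the document."""
    root = ET.fromstring(xml_text)
    # ElementTree names namespaced attributes as "{uri}localname".
    type_attr = f"{{{XLINK}}}type"
    href_attr = f"{{{XLINK}}}href"
    return [(el.tag, el.get(href_attr))
            for el in root.iter()
            if el.get(type_attr) == "simple"]


if __name__ == "__main__":
    for tag, href in outgoing_links(CATALOG_XML):
        print(tag, "->", href)
```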
The middle tier will probably have to do some manipulation of the XML stored in the back end, so technologies such as the Document Object Model (DOM) and the Simple API for XML (SAX) have emerged to fit XML processing naturally into traditional programming languages. Although XSLT, a specialized language for processing XML, has so far been used most often in the presentation layer to convert XML to HTML, it can also handle a wide variety of processing tasks, often far more conveniently than SAX or DOM code written in a traditional language. The middle tier also often needs to distribute processing, and this is one place where SOAP, an XML format for exchanging messages over Internet protocols, would be a great fit, for both structured and semi-structured data.
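The difference between the two APIs is easiest to see side by side. This Python sketch uses the standard library's `xml.dom.minidom` (a DOM implementation) and `xml.sax` to sum the same order totals both ways: the DOM version builds the whole tree in memory and then walks it, while the SAX version reacts to parse events as they stream past and keeps only what it needs. The order document and all function and class names are illustrative assumptions.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical order data shared by both approaches.
ORDERS_XML = b"""\
<orders>
  <order id="1"><total currency="USD">19.95</total></order>
  <order id="2"><total currency="USD">5.00</total></order>
</orders>
"""


def total_with_dom(data):
    """DOM style: parse the whole document into a tree, then query it."""
    doc = xml.dom.minidom.parseString(data)
    return sum(float(node.firstChild.data)
               for node in doc.getElementsByTagName("total"))


class TotalHandler(xml.sax.ContentHandler):
    """SAX style: react to parse events; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.in_total = False
        self.buffer = []
        self.sum = 0.0

    def startElement(self, name, attrs):
        if name == "total":
            self.in_total = True
            self.buffer = []

    def characters(self, content):
        if self.in_total:
            # Text may be delivered in several chunks, so accumulate it.
            self.buffer.append(content)

    def endElement(self, name):
        if name == "total":
            self.sum += float("".join(self.buffer))
            self.in_total = False


def total_with_sax(data):
    handler = TotalHandler()
    xml.sax.parseString(data, handler)
    return handler.sum


if __name__ == "__main__":
    print(f"{total_with_dom(ORDERS_XML):.2f}")  # 24.95
    print(f"{total_with_sax(ORDERS_XML):.2f}")  # 24.95
```

For small documents the DOM style is usually simpler to write; SAX tends to pay off when documents are too large to hold comfortably in memory.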
The presentation tier combines the hard-core number and pattern crunching that traditional tools are so well suited to with the descriptions and messages that are part of any system where human processing still matters. Again, a technology such as XSLT allows us to generate HTML, which is the king of current user interfaces thanks to the success of the Web. XSLT can also transform XML into other XML formats for which the next generation of browsers is emerging, and even into print-ready forms such as PDF and TeX.
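Python's standard library does not include an XSLT processor, so this sketch performs a small XML-to-HTML transformation by hand with `xml.etree.ElementTree`. It is a stand-in, not XSLT: it shows the kind of rule a stylesheet would express declaratively (roughly, one template matching `product` that emits a table row). The product catalog and `catalog_to_html` are hypothetical; in practice the equivalent stylesheet would be handed to an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalog to be rendered for the Web.
PRODUCTS_XML = """\
<catalog>
  <product sku="X-100"><name>Widget</name><price>19.95</price></product>
  <product sku="X-200"><name>Gadget</name><price>5.00</price></product>
</catalog>
"""


def catalog_to_html(xml_text):
    """Render each <product> as one HTML table row (sku, name, price)."""
    catalog = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table")
    # In XSLT, this loop body would be a template matching "product".
    for product in catalog.iter("product"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = product.get("sku")
        ET.SubElement(row, "td").text = product.findtext("name")
        ET.SubElement(row, "td").text = product.findtext("price")
    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    print(catalog_to_html(PRODUCTS_XML))
```

Changing the presentation means changing only this rendering step; the catalog data itself is untouched, which is the separation the XSLT approach is prized for.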
The cost of sprinkling XML technologies throughout the tiers in this way is the training and integration work required. The benefit is the tremendous productivity gain that naturally comes from using the right tools for the right job. In fact, in a good number of cases, it would be quite sufficient to use only XML technologies across the tiers, without using traditional application development tools at all.
Interestingly, thanks to good old Gordon Moore's law, most desktop computers based on Intel architecture have far more computing power than the EDSACs and ENIACs that made life so miserable for enemy soldiers on the front. Just ask anyone who has ever fumed, playing the classic game Scorched Earth, over the computer player's ability to factor in wind and terrain and land the nuke right on their tank. Intel computers also come with powerful memory (and hence string) processing primitives that make description and messaging applications zip. There is a lot of power within our grasp to lower the cost of software development and deployment, once we change the way we think about computer application development. In this matter, XML is proving itself the most powerful change agent to have come along.
Conclusion
It must be said that much of this is still quite controversial. Object-oriented experts have long insisted that their methodology itself provides a great deal of flexibility and extensibility, of both code and data. But this claim does not seem to bear practical scrutiny. Code reuse, one of the most important fruits of the purported flexibility of object-oriented systems, has been notoriously hard to achieve even in organizations that have emphatically adopted the best accepted practices of object-oriented development. Certainly, developers have come to understand that object-oriented systems require great effort and care before even the original system requirements can be met, never mind change orders.
And there is also no end of relational purists, Fabian Pascal being one of the more vocal in this crowd, who feel that the problem with modern software development is not that we have failed to transcend the limitations of the relational model in search of something more flexible, but that we have failed to apply the relational calculus itself in all its purity (they claim, for instance, that SQL is a pale shadow of the relational model). The problem is that the astonishing hacks relational folks advocate for shoe-horning semi-structured data into tidy entity-relationship diagrams are just too much to contemplate.
XML technologies are still taking shape, but they have already had a profound effect on the software development environment. Developers are discovering bit by bit that separating the management of descriptions from the processing of more complex computations can speed development, just as they discovered earlier that separating presentation logic from core application logic improves flexibility. As the body of XML tools and techniques continues to grow, this trend will continue. Some systems will be little more than well-designed XML data with some XSLT to provide a Web interface. Others will require a strong partnership between XML technologies, relational systems, and object-oriented programming languages. In all cases, adopting XML technologies should mitigate some of the difficulty of maintaining and adapting applications.
In the remainder of this series, I shall examine some of these XML technologies in detail, illustrating exactly where and how they fit into modern application development practice.
About the Author

Uche Ogbuji is a computer engineer, co-founder, and CEO of Fourthought, Inc., a software vendor and consultancy specializing in open, standards-based XML solutions, especially as applied to problems of knowledge management. He has worked with XML for several years, co-developing 4Suite, an open-source platform for XML processing. He writes articles and speaks at conferences frequently on the practical use of XML. A Nigerian immigrant, Mr. Ogbuji currently lives and works in lovely Boulder, Colorado.
Read other articles in this series: Part 2, Part 3, Part 4, Part 5.