A new feature to be considered for XML 2.0

Alain Couthures's picture

While preparing to fly to XML Prague 2010, I was thinking about my objectives there. Of course, in Prague, I will talk about XForms and XSLTForms, but XML deserves more!

Ah! ça ira, ça ira, ça ira… (1790, French Revolution song, title and theme of the refrain inspired by Benjamin Franklin!!). Now, we know that XML 1.0 is a Révolution. And we’re very busy implementing products. Maybe we should keep an eye on Ancien Régime people. They have the money, they have the power. They try to limit the growth of XML by arguing for the divine superiority of their older technologies. Starting to think about XML 2.0 would prove to others the vitality of XML.

First of all, I think we should stop talking about "XML documents" in the way we talk about "HTML pages" for the Web. "XML documents" sounds like word-processing-only. An XML so-called "document" is nothing but a small serialized database (relational databases are not good for storing word-processing documents, but that isn’t a constraint for XML). XML is a multipurpose tree structure that holds a consistent set of data: yes, it’s a small database by itself, with its own query languages (XPath, extended with XQuery in more and more contexts).

OK, let’s try to enrich the potential of XML. Relational databases aren’t just 2D data tables: there are relations expressed as constraints such as keys, resulting in extra columns or extra tables. In XML, there are axes: the parent-child axis and the sibling axis. The sibling axis naturally provides an ordered list of elements. The parent-child axis is a kind of non-symmetric relation, but a child can only have one parent, and a parent cannot distinguish its children except by their names or their properties. Whether an element should be the parent of another one or its child might depend on what is to be done with them: sometimes, we have to choose the most optimized approach.

As I already suggested two years ago (http://lists.xml.org/archives/xml-dev/200802/msg00419.html), allowing multiple named parent-child axes would be a simple but major improvement. As Michael Kay answered, it sounds like "reinventing the network (Codasyl) model", adding that "… the strict hierarchical approach of XML is fine for messages and for documents, but it really isn't ideal for persistent data. The poor handling of relationships outside the hierarchy has always been a weakness" (http://lists.xml.org/archives/xml-dev/200802/msg00421.html).
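
The idea of multiple named parent-child axes can be sketched as a small data structure. The Python below is only an illustration under stated assumptions, not a proposal for an actual XML 2.0 API: the `Node` class, its `link`, `children` and `parent` methods, and the axis names `staff` and `members` are all hypothetical. It also assumes that a node has at most one parent per named axis, mirroring today's single parent-child axis.

```python
from collections import defaultdict


class Node:
    """An element that can take part in several named parent-child axes (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # axis name -> ordered list of children on that axis
        self._children = defaultdict(list)
        # axis name -> the single parent on that axis (assumption: one per axis)
        self._parent = {}

    def link(self, axis, child):
        """Attach `child` under this node along the named axis."""
        if axis in child._parent:
            raise ValueError(f"{child.name} already has a parent on axis {axis!r}")
        self._children[axis].append(child)
        child._parent[axis] = self

    def children(self, axis):
        """Children on one named axis, in insertion order."""
        return list(self._children.get(axis, []))

    def parent(self, axis):
        """Parent on one named axis, or None if there is none."""
        return self._parent.get(axis)


# Illustrative data: one employee reachable through two different axes.
dept = Node("department")
project = Node("project")
alice = Node("employee")

dept.link("staff", alice)
project.link("members", alice)

staff_names = [n.name for n in dept.children("staff")]
```

Here the same `employee` node is a child of the department on one axis and of the project on another, which a single XML hierarchy cannot express without some form of reference.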

So it’s one of my contributions to the XML community to say it again.

-Alain

Re: A new feature to be considered for XML 2.0

Kurt Cagle's picture

I'd avoid this as a discussion about XML 2.0 - as has been proved more than once, XML has been so heavily integrated at this stage into the world's "operating system" that any discussion concerning a "new" XML will only cause massive outbreaks of hives among IT managers worldwide.

However, I would argue that this is fundamentally a schematic issue, and one that does need to be resolved - quickly. Schema as it exists right now has at best an imperfect notion about external pointers within the scope of ID and IDREF, and in practice there's none. The closest mechanism to providing something similar is Schematron, which would effectively define a relational entity between disparate nodes, and even then only assuming very specific Schematron processors (e.g., XSLT 2.0-based ones).

This becomes manifest in such facets as dynamic enumerations and code lists, in which a given node has a binding that is tied directly to terms that are defined within a dynamically retrieved data feed. Arguably this may have been the foundation of RDF and from there to OWL, but XML and OWL have diverged far enough from one another that in practice there's very little effective overlap, even when they use the same syntax (even that's not a given anymore).

I don't know if this will be discussed in Prague, but I think it's a worthwhile paper to bring up for Balisage, especially the pre-conference Symposium. We're reaching a stage (have passed it some time ago, for that matter) where this issue needs to be resolved, as I anticipate that future XML development will increasingly be along a distributed axis rather than a centralized one.

Re: A new feature to be considered for XML 2.0

Dominique Rabeuf's picture

Within an XML document, the key and keyref identity constraints provide non-hierarchical relationships, but to be valid a keyref cannot refer to elements outside the document.
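
To make that limitation concrete, here is a rough Python emulation of what a key/keyref pair checks. It is only a sketch and does not run a real XML Schema validator; the element names (`library`, `author`, `book`), the attribute names (`id`, `author`) and the `dangling_keyrefs` helper are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the second book points at an author that is not here.
DOC = """
<library>
  <author id="a1">Ann</author>
  <author id="a2">Bob</author>
  <book author="a1">First title</book>
  <book author="a9">Second title</book>
</library>
"""


def dangling_keyrefs(root):
    """Return keyref values that match no key inside this same document."""
    # Plays the role of the key: collect author/@id values
    # (uniqueness of the keys is not checked in this sketch).
    keys = {a.get("id") for a in root.iter("author")}
    # Plays the role of the keyref: every book/@author must be one of those keys.
    return [b.get("author") for b in root.iter("book") if b.get("author") not in keys]


root = ET.fromstring(DOC)
missing = dangling_keyrefs(root)
```

The set of keys is built from a single parsed tree, which is the point: a reference to an author stored in another document of the collection has nothing to resolve against.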

Furthermore, within a database, it is preferable to split large XML documents into a set of smaller documents if you have to make frequent updates.

Something like an Extended Schema for multiple documents within a collection seems to be needed.

Note that with Mark Logic a database is a set of forests holding the documents, while with eXist a database is seen as a set of collections.

So a schema extension for handling multiple documents is a real need.

In the absence of recommendations, each supplier will build its own solution, and we will have to deal with various implementations, as with the many DDL dialects of relational models.

But these databases handle both XML and non-XML documents. In quite natural situations, XML elements may refer to non-XML resources (e.g. image files).

The key and keyref features seem not to be well known, perhaps because they cannot be expressed in RelaxNG.

Re: A new feature to be considered for XML 2.0

Alain Couthures's picture

I agree that different enhancements can be included in XML 2.0.

This blog entry focuses on relationships. My point of view is that keys and keyrefs are just workarounds, while multiple named parent-child axes would be meaningful.

Re: A new feature to be considered for XML 2.0

Dominique Rabeuf's picture

You are landing in general graph theory, where formal computing theory is not yet able to cope. That is the major mathematical enigma of the P vs NP problem. OWL 2 is trying to classify computability: http://www.w3.org/TR/owl2-profiles/ (an open problem for decades)

http://www.claymath.org/millennium/P_vs_NP/ (Millennium Main Logic Problem)

But, in the meantime, if you can tell XML DB designers (Mark Logic, eXist) that XQuery should be enhanced to take document collections into account in a standardized way, it would be great. In other words, schemas that take into account relationships outside a document, without using cumbersome things such as the redefine directive.

Re: A new feature to be considered for XML 2.0

Alain Couthures's picture

Adding the possibility to name and define different parent-child relations is not difficult to implement.

It could be a notation such as "child('myrelation')::" in XPath!
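
As a rough illustration of how such a step might be evaluated, the Python sketch below parses the notation with a regular expression and looks children up in a per-relation table. Everything here is hypothetical: `child('...')::` is not part of any XPath version, and the `RELATIONS` table, the node ids and the `eval_step` function are made up for the example.

```python
import re

# Hypothetical store: relation name -> parent id -> ordered (child id, element name) pairs
RELATIONS = {
    "staff": {"dept1": [("e1", "employee"), ("e2", "employee")]},
    "members": {"proj1": [("e2", "employee"), ("t1", "task")]},
}

# Matches child('relation')::name or child('relation')::*
STEP = re.compile(r"child\('([^']+)'\)::(\*|[A-Za-z_][\w.-]*)$")


def eval_step(context_id, step):
    """Evaluate one hypothetical named-axis step from a context node id."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a named child step: {step!r}")
    relation, name_test = match.groups()
    # Unknown relations or parents simply yield no children.
    candidates = RELATIONS.get(relation, {}).get(context_id, [])
    return [cid for cid, name in candidates if name_test in ("*", name)]


result = eval_step("proj1", "child('members')::employee")
```

The only change compared with an ordinary child step is the extra lookup by relation name before the usual name test is applied.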

Theory is nice only when it proves that what you know to be good is valid.

Re: A new feature to be considered for XML 2.0

Dominique Rabeuf's picture

You are right about the definition, but implementing navigation in this mode is quite difficult. We (humans) have a father and a mother, and this fact is not taken into account in XML and, more generally, in tree models.