
Aug
25

Common XQuery mistakes

Michael Kay recently covered common mistakes made when using XSLT, which made me think it might be useful to give XQuery the same treatment. Many of the points he raised in his article also apply directly to XQuery:

  • new users struggle to work with trees
  • new users typically misunderstand the impact of side effects when using external functions
  • new users consistently have trouble matching elements that are in a namespace

In no particular order of importance, here are the most common errors I see users make when learning XQuery.

FLWOR must be obeyed

There is a whole class of mistakes caused by developers insisting on using the FLWOR structure in a free-form way, that is, writing for, let, where, order by and return clauses in any order they like. In XQuery 1.0 the order is fixed: one or more for or let clauses, then an optional where clause, then an optional order by clause, and finally return.

I think another contributing factor to this class of problems is that a FLWOR expression can easily span several pages of code, making it hard to track things down.

When confronted with these kinds of errors, it's clear to me that the developer needs more grounding in the basics of the language itself, so I tend to refer developers to a book on XQuery or to Michael Kay's 'An Introduction to XQuery FLWOR expression'.
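As a quick reference, here is a minimal sketch of a FLWOR expression with its clauses in the order XQuery 1.0 requires. The book data is made up purely for illustration. Later versions of the language relax some of these ordering rules, so it is worth checking which version your processor implements.

```xquery
(: Hypothetical sample data :)
let $books := (
  <book year="2005">XQuery basics</book>,
  <book year="1999">XPath primer</book>,
  <book year="2007">Advanced FLWOR</book>
)
for $b in $books                     (: for/let clauses come first :)
let $year := xs:integer($b/@year)
where $year gt 2000                  (: then an optional where :)
order by $year descending            (: then an optional order by :)
return fn:string($b)                 (: and return always comes last :)

(: Returns ("Advanced FLWOR", "XQuery basics").
   Writing "order by ... where ..." is a syntax error in XQuery 1.0. :)
```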

You always need to do something else

let $n := 1
return
if ($n = 1) then
  "one"

Newcomers to XQuery are always puzzled when the above code throws an error complaining about the missing 'else' branch. Unlike many other programming languages, XQuery's conditional if expression always requires an 'else' branch, as the following example illustrates.

let $n := 1
return
if ($n = 1) then
  "one"
else
  ()

I initially viewed this requirement as a parochial bit of syntactic convention, but I have come around to it; if anything, the enforced presence of 'else' makes refactoring slightly easier and serves as a constant reminder to think about how to handle the alternate condition.
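Because if is an expression rather than a statement, XQuery has no separate elseif keyword. A minimal sketch of handling several cases is simply to chain else if branches, still finishing with a final else:

```xquery
let $n := 2
return
  if ($n = 1) then
    "one"
  else if ($n = 2) then
    "two"
  else
    (: the final else is still mandatory; () returns nothing :)
    ()

(: Returns "two" :)
```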

Dynamic evaluation is desired

let $xml := <foo><bar>text</bar><baz>text</baz></foo>
for $el in ('bar', 'baz')
return
$xml/$el/text()

Michael Kay pointed out in his article that people often expect to be able to perform this kind of textual substitution (e.g. 'bar' or 'baz') within XPath. It doesn't work that way: $el holds a string value, not a name test, so $xml/$el does not select the bar or baz children. Most XQuery implementations provide some non-standard evaluate-type function for dynamic evaluation, but it's probably best to resist the lure of eval functions until you have no choice.

Update: it was rightly pointed out to me that I should illustrate how to do this without eval:

let $xml := <foo><bar>text</bar><baz>text</baz></foo>
for $el in ('bar', 'baz')
return
$xml/*[fn:local-name() eq $el]/text()
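As a variation, you can drop the for loop altogether. The = operator is a sequence comparison, so a single predicate can test each child's local name against several candidate names at once. A minimal sketch, reusing the same sample document:

```xquery
let $xml := <foo><bar>text</bar><baz>text</baz></foo>
return
  (: = compares against every item in the sequence, so this matches
     any child whose local name is 'bar' or 'baz', with no eval needed :)
  $xml/*[fn:local-name() = ('bar', 'baz')]/text()

(: Returns the two text nodes of bar and baz :)
```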

Curly, curly, curly braces

let $a := 'test'
return
<test>{$a}</test>

In XQuery, curly braces enclose an expression to be evaluated and replaced by its result. In the above example it's clear where the curly braces need to go, but in heavily nested FLWOR expressions it is easy to lose track of them.

Such 'dangling' or 'orphaned' syntax is common to many programming languages (be it braces, parentheses, etc.), but I find the problem is compounded in XQuery because the language's mixture of syntax styles doesn't readily lend itself to a nice, compact visual layout.
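One habit that helps is to give each pair of braces its own lines and indent the expression between them. The sketch below (the item names are made up) nests a FLWOR inside an element constructor, which in turn contains another constructor with its own braces:

```xquery
let $items := ("apple", "pear")   (: hypothetical sample data :)
return
  <ul>{
    (: everything between these braces is an XQuery expression :)
    for $i in $items
    return <li>{ $i }</li>   (: inner braces evaluate $i inside each li :)
  }</ul>

(: Returns <ul><li>apple</li><li>pear</li></ul> :)
```

If you need a literal brace in element content, double it: <p>{{</p> produces a p element containing a single {.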

The desire to return multiple elements is strong

let $a := 'test'
return
<el1>{$a}</el1><el2>{$a}</el2>

When eXist XML Database processes the above XQuery, it throws the following error:

Cannot compile xquery: org.exist.xquery.XPathException: err:XPST0003 in line 4, column 5: expecting EOF, found '>'

The Zorba XQuery processor throws the same error:

Query: <>, line 4, column 5: [XPST0003] syntax error, unexpected "'>'", expecting end of file

These are perfectly valid error messages. XPST0003 is simply the code for a static syntax error, but it holds little meaning even for experienced XQuery users, and I know that when new users of XQuery encounter this message it consistently confuses them.

The trick, of course, is to realize that XQuery (and XPath) work with sequences. Two adjacent element constructors are not joined into a single expression, so the parser gets confused by whatever follows the first one. Returning several elements is perfectly legal; you just have to build a sequence of them with the comma operator, enclosed in parentheses.

let $a := 'test'
return
(<el1>{$a}</el1>,<el2>{$a}</el2>)

Reformulating the output you want to return as a sequence solves the problem. As with the curly braces mentioned earlier, it can get tedious keeping track of where parentheses and commas are needed in heavily nested code.
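If whatever consumes the result expects a single well-formed XML document, another option is to wrap the elements in one root element. A minimal sketch, where the wrapper name result is an arbitrary, hypothetical choice:

```xquery
let $a := 'test'
return
  (: one element constructor is one expression, so no comma is needed :)
  <result>
    <el1>{$a}</el1>
    <el2>{$a}</el2>
  </result>

(: Returns <result><el1>test</el1><el2>test</el2></result>
   (boundary whitespace is stripped by default) :)
```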

Attributes when you mean strings

let $node := <meta name="author" content="James Fuller"/>
return
element {$node/@name}{
   $node/@content
}

This, somewhat confusingly, copies the attribute onto the new element, when many users expect text:

<author content="James Fuller"/>

XSLT users get tripped up by this all the time. What we actually want is:

let $node := <meta name="author" content="James Fuller"/>
return
element {$node/@name}{
   fn:string($node/@content)
}

This does the right thing, putting the string value of @content into the element as text. But why did the element name (in the constructor) work in both examples?

This is because XQuery atomizes the name expression of a computed element constructor: the attribute node is reduced to its value, which is then converted to an element name. In the content expression, by contrast, an attribute node is copied onto the new element as an attribute. The rules by which XQuery makes such decisions are somewhat unintuitive, and I have found the only way to learn them is by experience.

I advise my students to always use fn:string to signal that they want an attribute's text value.
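A related rule worth knowing: inside an element constructor, attribute nodes must come before any other content, otherwise the processor should raise a type error (err:XQTY0024). A minimal sketch that copies one attribute and turns another into text; the lang attribute is added to the sample purely for illustration:

```xquery
let $node := <meta name="author" content="James Fuller" lang="en"/>
return
  element { fn:string($node/@name) } {
    $node/@lang,                (: attribute node first: copied as an attribute :)
    fn:string($node/@content)   (: string second: becomes the element's text :)
  }

(: Returns <author lang="en">James Fuller</author>.
   Swapping the two content lines raises err:XQTY0024. :)
```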

Lack of template matching makes one think

XQuery lacks XSLT's powerful template matching, which forces XQuery developers to learn algorithms they may be unfamiliar with. For example, to emulate template matching in XQuery you need to write recursive functions that 'walk the tree'.

xquery version "1.0";

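declare function local:treewalker($nodes as node()*) as node()* {
  for $n in $nodes
  return
    typeswitch ($n)
      (: elements are rebuilt with the same name and attributes,
         then we recurse into their child nodes in document order :)
      case element() return
        element {fn:node-name($n)}{
          $n/@*,
          local:treewalker($n/node())
        }
      (: any other node (text, comment, ...) is returned as-is :)
      default return $n
};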

let $xml := <html info="test">
<body>
<a/>
<b>
<c info="test1">test</c>
</b>
<p>teststs</p>
</body>
</html>
return
local:treewalker($xml)

The local:treewalker($xml) function should return a copy of the $xml. The quicker you pick up these algorithms the more efficient your programming will become, so don't delay learning them.
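To see how this emulates template matching, here is a sketch in which each typeswitch case plays the role of an XSLT template rule: one rule renames b to strong, another drops info attributes, and the element() case acts as the identity rule. The function name local:transform and the particular rules are purely illustrative:

```xquery
xquery version "1.0";

declare function local:transform($nodes as node()*) as node()* {
  for $n in $nodes
  return
    typeswitch ($n)
      (: "template" for b: rename it to strong, keep its content :)
      case element(b) return
        <strong>{ $n/@*, local:transform($n/node()) }</strong>
      (: "template" for @info: output nothing, i.e. drop it :)
      case attribute(info) return ()
      (: identity "template" for every other element :)
      case element() return
        element { fn:node-name($n) } {
          local:transform(($n/@*, $n/node()))
        }
      (: text and other nodes are copied unchanged :)
      default return $n
};

local:transform(<html info="test"><body><b>bold</b><p>text</p></body></html>)

(: Returns <html><body><strong>bold</strong><p>text</p></body></html> :)
```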

Here is an article I wrote some time ago which may help you pick up some more advanced algorithms for use with XQuery.

Comparing things properly

let $a := 'test'
let $b := 'test'
return
fn:compare($a,$b)

 

The fn:compare function returns 0 if the strings $a and $b are equal, -1 if $a sorts before $b, and 1 if $a sorts after $b. If you are wondering how one string can be 'less' or 'greater' than another, here are a few examples that illustrate it.

let $a := 'abc'
let $b := 'ab'
return
fn:compare($a,$b)

returns 1, and

let $a := 'ab'
let $b := 'abc'
return
fn:compare($a,$b)

returns -1.

Comparing strings, numbers or other datatypes in XQuery can get confusing for new users. Instead of fn:compare, it's usually better to use either the normal comparison operators (=, !=, <, <=, >, >=; the specification calls these general comparisons) or the value comparison operators (eq, ne, lt, le, gt and ge), but you need to understand exactly what is being compared with each of these operators.

The first thing to remember is that normal comparison operators work on sequences.

let $a := (1,2,3)
let $b := (3)
return
$a = $b

This expression returns true because $a = $b is a sequence comparison, which is true if any item in $a is equal to any item in $b (here both sequences contain the value 3).

The right way to compare single values is to use the value comparison operators (eq, ne, lt, le, gt and ge), which are purpose-built to compare single values rather than sequences.

let $a := "a"
let $b := "a"
return
if( $a eq $b) then
  'string values matched'
else
  'string values do not match'
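To make the difference concrete, here is a minimal sketch (values made up) of the classic trap with sequences: != is not simply the opposite of =.

```xquery
let $a := (1, 2, 3)
let $b := 3
return (
  $a = $b,          (: true: some item of $a equals 3 :)
  $a != $b,         (: also true: some item of $a (1 or 2) differs from 3 :)
  fn:not($a = $b)   (: false: usually what "not equal" was meant to say :)
)

(: Returns (true, true, false).
   By contrast, $a eq $b raises a type error (err:XPTY0004),
   because eq accepts at most one item on each side. :)
```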

There are more dark corners lurking within the comparison operators, and you may also find that your specific XQuery processor has different opinions about what it should do. For example, the MarkLogic XQuery processor also compares sequences using the value comparison operators, which can trip up users who expect the same behavior across all XQuery processors.

Comparison is a lengthy topic, and there is a more in-depth treatment of the subject at XQuery.com.

XPath true and false question

The words true and false are not reserved in XQuery (or XPath, for that matter). Written bare, true is a path expression that selects child elements named true, and the string 'false' is just a non-empty string. Boolean values are obtained with the functions fn:true() and fn:false(), or by casting, e.g. xs:boolean('true') and xs:boolean('false').

My users always expect these words to be reserved for some reason.
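A minimal sketch of why this matters: in a boolean context the string 'false' counts as true, because any non-empty string has an effective boolean value of true. Every expression here is standard XQuery 1.0:

```xquery
(
  if ('false') then 'yes' else 'no',  (: 'yes': any non-empty string is true :)
  xs:boolean('false'),                (: false: the string is cast to xs:boolean :)
  fn:false(),                         (: false :)
  fn:boolean(())                      (: false: the empty sequence is false :)
)

(: Returns ("yes", false, false, false) :)
```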

Empty namespaces conundrum

declare default element namespace "http://www.w3.org/1999/xhtml";

<html>
<body>
test
</body>
</html>

In XQuery, you can define a default element namespace, so that all unprefixed element names, including those of the elements you construct, are in that namespace. This is achieved with the 'declare default element namespace' declaration in the example.

There is a catch: once you have declared a default element namespace, every unprefixed element constructor picks it up, so it is not obvious how to construct an element in no namespace. Binding a prefix to " " (a single space) does not reliably help: depending on the processor, that either puts the element in a namespace whose URI is a space character, which is not the same as no namespace, or is rejected outright. What does work is a namespace undeclaration, xmlns="", on the element itself.

declare default element namespace "http://www.w3.org/1999/xhtml";

<html>
<body>
<element xmlns="">This element has no namespace</element>
</body>
</html>

Likewise, you can always select elements that are in no namespace, because fn:namespace-uri() returns the zero-length string for them.

declare default element namespace "http://www.w3.org/1999/xhtml";

let $xml := <html>
<body>
<element xmlns="">This element has no namespace</element>
</body>
</html>
return
$xml//*[namespace-uri() eq ""]

Due to its more convoluted nature, this kind of problem is usually encountered by an XQuery developer only after they have used the language for some time.
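If you would rather not rely on namespace declarations at all, a computed element constructor also works: fn:QName with a zero-length namespace URI yields a name in no namespace, regardless of the default element namespace. A minimal sketch:

```xquery
declare default element namespace "http://www.w3.org/1999/xhtml";

<html>
<body>{
  (: an empty URI in fn:QName means "no namespace",
     so this element does not inherit the XHTML default :)
  element { fn:QName("", "element") } { "This element has no namespace" }
}</body>
</html>
```

The html and body elements stay in the XHTML namespace; a serializer will typically write the inner element as <element xmlns="">.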

Do you have any common xquery mistakes or gotchas?

I know a few more issues, but they are not as common... does anyone else want to contribute to this list?

12 comments

  1. Vyacheslav Zholudev says:

    1) Concerning dynamic evaluation section, probably it’s worth mentioning in this article that you can do what was intended by:
    let $xml := <foo><bar>text</bar><baz>text</baz></foo>
    for $el in ('bar', 'baz')
    return
    $xml/*[fn:local-name() eq $el]/text()

    i.e. not using non-standard means for dynamic evaluation


    1. James Fuller says:

      thx, I updated article as per your suggestion

  2. Vyacheslav Zholudev says:

    oops, I can’t edit my comment, reposting 2)…
    2) Sometimes users execute queries like:

    let $x := “text”
    return
    <foo>
    {$x}
    </foo>

    and expect result to be like:
    <foo>
    text
    </foo>

    but get:
    <foo>text</foo>

    Here “declare boundary-space preserve;” may help them if users care about whitespaces

  3. Kurt Cagle says:

    Jim,

    Superb article. Thanks for posting it.

    1. James Fuller says:

      thx Kurt and thank you for getting xmltoday back up and running

  4. Dominique Rabeuf says:

    XQuery
    Powerful but not yet a full functional language
    XQuery and Functional Programming

    The XQuery specification does not yet allow a function to take a function as an argument or to return a function.

    1. Kurt Cagle says:

      XQuery 1.1 will. Functional programming is one of the major additions to the language.

      1. Dominique Rabeuf says:

        I had not yet taken a careful look at XQuery 1.1
        I hope this version will soon be implemented by eXist and MarkLogic.

  5. Andrew Welch says:

    Hi James,

    For “Comparing Things Properly” I think calling them “value comparison” and “normal/sequence” comparison operators doesn’t help the newbie…

    The terms “set operators” and “atomic operators” are more descriptive – the dev can say to themselves “am I comparing sets or atomics here” and use the appropriate one… which you can’t do with normal/value.

    For example, the common gotcha is using $a != $b when it should be not($a = $b) for set comparisons. If the variables are atomics, then it should be “$a eq $b” or “$a ne $b”. The atomic comparisons are preferred because they’ll catch a mistake if either operand is a sequence of more than 1 item, and will be more efficient when comparing single items… so you could say they are the “normal” operators :)

    Sure they are less friendly terms, but once you learn set comparison and what an atomic is, you understand the language much better.

    cheers
    andrew

    1. James Fuller says:

      I (amicably) disagree with you … newbies often come to a new language because they have a problem they want to solve using the language, and whilst it’s imperative to give correct advice, overloading them with terms (however valid) upfront distracts them from their original goal.

      Of course there must be some ‘upfront’ investment in learning a language, no language is just immediately usable, getting the balance right as an instructor can be difficult.

      The cognitive ‘work’ for an XQuery newbie is to understand that there are things called sequences and they contain values … I have found introducing secondary terms like ‘atomics’ or ‘set’ generally can be a source of more confusion and I try to introduce these finer concepts later on.

      Note I wrote this article for newbies having common problems with XQuery and in hope (via search engine) they would find this article, not for experienced XQuery users.

  6. Martin Probst says:

    I agree with Andrew, the not($x = $y) vs. $x != $y thing is a major problem for many users. Another nice one is //foo[bar < 5 and bar > 1] versus //foo[bar[. < 5 and . > 1]].

    Another major problem is that many people try to solve something somehow in XQuery without really knowing the language, and then they get bitten by XQuery’s permissive syntax. For example I recently saw this:

    //foo[bar is null]

    Where somebody with an SQL background meant to say //foo[empty(bar)]. It is surprisingly hard to write an XQuery that gives a syntax error, even if you try. This lack of errors/strict syntax means that novice users will end up with queries that silently mask errors.

    Kurt: it would be really nice if this form would escape <’s.
