Introduction to XPath

An XPath expression consists of two parts: a context node and a selection pattern. The context node is the node from which the selection pattern begins. Referring to books.xml from the previous section, consider this XPath expression:

book/author

If this expression were executed at the root level (its context), every <author/> node would be returned, because each <book/> element is a child of the document element and contains <author/> elements. The expression is not very specific, so it matches all of them.
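To try the expression outside a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which implements a subset of XPath. The XML below is a hypothetical stand-in for books.xml (the element layout, titles, and author names are assumptions made for illustration), and the document element serves as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; the real file from the previous
# section is not reproduced here, so this structure is an assumption.
BOOKS_XML = """
<books>
  <book isbn="041777781">
    <title>Sample Title One</title>
    <author>First Author</author>
    <author>Second McPeak</author>
  </book>
  <book isbn="000000000">
    <title>Sample Title Two</title>
    <author>Third Author</author>
  </book>
</books>
"""

# The document element (<books/>) acts as the context node.
root = ET.fromstring(BOOKS_XML.strip())

# book/author: every <author/> child of every <book/> child of the context.
authors = root.findall("book/author")
author_names = [author.text for author in authors]
print(author_names)
```

Because the pattern carries no conditions, every author in the sample comes back, in document order.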

What if you want to retrieve only the <book/> element that has a specific ISBN? The XPath expression would look like this:

book[@isbn='041777781']

The book part of the expression describes which element to retrieve. Inside of the square brackets is a condition that this element must match. The @isbn part represents the isbn attribute (@ being short for attribute). So, this expression reads "find the book elements that have an isbn attribute of '041777781'."

XPath expressions can also be very complex. Consider the following expression:

book[author[contains(text(),'McPeak')]]

This expression reads, "find the book elements that have author elements whose text contains the string 'McPeak'." Since this is a more complicated expression, it helps to break it down, working from the outside towards the inside. Removing all conditions, you have this expression:

book[...]

First, you know that a <book/> element will be returned since it is the outermost element; next come the conditions. Inside the first set of brackets, you notice the <author/> element:

author[...]

You now know you are looking for a book element with a child <author/> element. However, the children of the <author/> element need to be checked as well because the expression doesn't end there:

contains(text(),'McPeak')

The contains() function takes two arguments and returns true if the first string argument contains the second string argument. The text() node test selects the text nodes in the given context, so the text content of the <author/> element is passed as the first argument to contains(). The second argument passed to contains() is the search text, in this case 'McPeak'.

Important: The contains() function, like all XPath functions, is case-sensitive.

The resulting node set is one <book/> element, because there is only one book with an author (or coauthor) whose name is McPeak.
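The two predicate expressions can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the browser implementation. The XML is a hypothetical stand-in for books.xml. ElementTree's XPath subset handles the attribute predicate directly but has no contains() function, so the helper books_by_author_text (a name invented here) emulates the nested predicate with an ordinary substring check.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; structure and values are assumptions.
BOOKS_XML = """
<books>
  <book isbn="041777781">
    <title>Sample Title One</title>
    <author>First Author</author>
    <author>Second McPeak</author>
  </book>
  <book isbn="000000000">
    <title>Sample Title Two</title>
    <author>Third Author</author>
  </book>
</books>
"""

root = ET.fromstring(BOOKS_XML.strip())  # document element as context node

# book[@isbn='041777781']: attribute predicates are supported natively.
by_isbn = root.findall("book[@isbn='041777781']")


def books_by_author_text(context, needle):
    """Emulate book[author[contains(text(),'needle')]].

    ElementTree lacks contains(), so the inner condition is checked in
    Python: keep a <book/> if any child <author/> has leading text that
    contains the search string. The check is case-sensitive.
    """
    return [
        book
        for book in context.findall("book")
        if any(needle in (author.text or "") for author in book.findall("author"))
    ]


by_author = books_by_author_text(root, "McPeak")
wrong_case = books_by_author_text(root, "mcpeak")  # different case, no match

print([book.get("isbn") for book in by_isbn])
print([book.findtext("title") for book in by_author])
print(len(wrong_case))
```

In the sample data only one book has an author whose text contains 'McPeak', so the emulated predicate yields a single <book/> element, and the lowercase search finds nothing because the comparison is case-sensitive.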

As you can see, XPath is a useful language that makes finding specific nodes in XML data rather simple. It is no wonder Microsoft and Mozilla implemented XPath in their browsers for client-side use.