Prefer XPathNavigator

I spoke last night at the Central Pennsylvania Users Group in Harrisburg, PA. They have a great group, and about 35 folks turned out despite the snow and nasty weather. Judy Calla, the group lead, gave a nice beginner's talk on debugging and error handling in .NET applications, and then I jumped in with a talk on querying XML data in .NET.

One of the key points I always try to draw out in that talk is getting people familiar with the XPathNavigator model and with the differences between the XML document classes in .NET (XmlDocument, XmlDataDocument, and XPathDocument).

People who have worked with MSXML before naturally settle in with the XmlDocument class (the W3C DOM implementation in .NET) and never go beyond the methods and properties exposed by XmlNode and its derived classes.

However, I try to make people aware that the preferred model for working with in-memory XML data in .NET is the XPathNavigator. It works across all three document types, and it will take on an even more significant role in .NET 2.0 with the introduction of the modifiable XPathDocument (more on that in a minute). I also point out that you can layer your own XPathNavigator implementation on top of any hierarchical data you control, giving it an XML-like working model even if that data has nothing to do with XML itself.
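To make the "layer a navigator over your own data" idea concrete, here is a minimal sketch. It is written in Java rather than C#, and TreeCursor, Folder, and FolderCursor are hypothetical names invented for illustration. It is a heavily simplified stand-in for the MoveXXX cursor style of XPathNavigator, not the .NET API. The point is that childNames is written only against the cursor interface and never touches the folder data directly.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Generic code written only against the cursor interface, not against Folder.
    static List<String> childNames(TreeCursor cursor) {
        List<String> names = new ArrayList<>();
        if (cursor.moveToFirstChild()) {
            do {
                names.add(cursor.name());
            } while (cursor.moveToNextSibling());
            cursor.moveToParent(); // put the cursor back where it started
        }
        return names;
    }

    public static void main(String[] args) {
        Folder root = new Folder("root", null);
        new Folder("docs", root);
        new Folder("src", root);
        System.out.println(childNames(new FolderCursor(root))); // [docs, src]
    }
}

// Hypothetical, heavily simplified cursor in the spirit of XPathNavigator's MoveXXX methods.
interface TreeCursor {
    String name();
    boolean moveToFirstChild();
    boolean moveToNextSibling();
    boolean moveToParent();
}

// Hypothetical non-XML hierarchical data: a simple folder tree.
class Folder {
    final String name;
    final Folder parent;
    final List<Folder> children = new ArrayList<>();

    Folder(String name, Folder parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // register with the parent
        }
    }
}

// Cursor layered over Folder; a failed move returns false and leaves the position unchanged.
class FolderCursor implements TreeCursor {
    private Folder current;

    FolderCursor(Folder start) {
        current = start;
    }

    @Override
    public String name() {
        return current.name;
    }

    @Override
    public boolean moveToFirstChild() {
        if (current.children.isEmpty()) return false;
        current = current.children.get(0);
        return true;
    }

    @Override
    public boolean moveToNextSibling() {
        if (current.parent == null) return false;
        List<Folder> siblings = current.parent.children;
        int index = siblings.indexOf(current);
        if (index + 1 >= siblings.size()) return false;
        current = siblings.get(index + 1);
        return true;
    }

    @Override
    public boolean moveToParent() {
        if (current.parent == null) return false;
        current = current.parent;
        return true;
    }
}
```

The same childNames routine would work unchanged over any other data source that implemented the cursor, which is the portability argument made above.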

The XPathNavigator base class (and the concrete implementation provided by each of the document types) gives you a consistent, clean model for navigating and querying XML data. To get one, you call CreateNavigator on the document instance. What you get back is effectively a cursor over the document's nodes: you can move it around with the various MoveXXX methods (MoveToFirstChild, MoveToNext, MoveToParent, and so on), or you can query the current node and its sub-nodes with XPath expressions. When you query, you can pass an XPath expression as a string to the Select or Evaluate method. Select returns a node-set result (an XPathNodeIterator), while Evaluate returns a value result (a bool, number, or string) if that is what the expression is expected to evaluate to. If the query will be made more than once, you can instead pre-compile it with Compile, which returns an XPathExpression that you pass to Select or Evaluate for faster execution.
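The node-set versus value split and the compile-once pattern are not unique to .NET. As a rough analogue, the sketch below uses Java's standard javax.xml.xpath API, not the .NET classes. An XPathConstants.NODESET result plays roughly the role of Select, an XPathConstants.NUMBER result plays roughly the role of Evaluate, and XPath.compile plays the role of pre-compiling an expression. The sample XML and the helper names (parse, selectNodes, evaluateNumber) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Main {
    // Made-up sample data for illustration.
    static final String XML =
        "<orders><order total='10'/><order total='25'/><order total='40'/></orders>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Node-set result: roughly the role Select plays in .NET.
    static NodeList selectNodes(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
    }

    // Value result (a number here): roughly the role Evaluate plays in .NET.
    static double evaluateNumber(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(selectNodes(doc, "/orders/order[@total > 20]").getLength()); // 2
        System.out.println(evaluateNumber(doc, "sum(/orders/order/@total)")); // 75.0

        // Compile once, then reuse the compiled expression for repeated queries.
        XPathExpression count = XPathFactory.newInstance().newXPath().compile("count(/orders/order)");
        for (int i = 0; i < 3; i++) {
            System.out.println(count.evaluate(doc, XPathConstants.NUMBER)); // 3.0
        }
    }
}
```

Compiling pays the parsing cost once, so it only helps when the same expression runs repeatedly, which matches the advice above.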

The XPathDocument class in .NET 1.x has a lighter-weight object model than the XmlDocument class and, in most cases, a smaller memory footprint for the same XML document. The big deciding factor between those two document types today is whether you need write access to the nodes you are dealing with. If you do, you currently have only one choice: the XmlDocument class. You can, and still should, work with it through an XPathNavigator, but the underlying nodes are XmlNodes rather than read-only XPathNodes, so they are writable (the navigator exposes them through the IHasXmlNode interface). If you only need to query and navigate the data to perform processing, XPathDocument is the better choice because of its lighter-weight object model.

In .NET 2.0, as currently planned, the big thing to be aware of is that the XPathDocument class becomes read/write. More important still, any changes you make to the document (modifying, inserting, or deleting nodes) are tracked by the document, much as the DataSet tracks changes. That means you can use the XPathDocument to push updates back to the underlying data store it came from. That is huge.
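Conceptually, DataSet-style change tracking means each item carries a state (unchanged, added, modified, or deleted) that an update routine can read back later. The Java sketch below illustrates only that idea on a hypothetical key/value store. TrackedStore and RowState are invented names, and this is not a description of how XPathDocument tracks its changes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        TrackedStore store = new TrackedStore();
        store.load("a", "1");      // original data: unchanged
        store.load("b", "2");
        store.set("a", "10");      // modify
        store.insert("c", "3");    // insert
        store.delete("b");         // delete
        System.out.println(store.pendingChanges()); // [MODIFIED a, DELETED b, ADDED c]
        store.acceptChanges();     // pretend the updates were written back
        System.out.println(store.pendingChanges()); // []
    }
}

// Per-entry state, loosely modeled on the DataSet idea of row states.
enum RowState { UNCHANGED, ADDED, MODIFIED, DELETED }

// Hypothetical key/value store that remembers what changed since it was loaded.
class TrackedStore {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final Map<String, RowState> states = new LinkedHashMap<>();

    void load(String key, String value) {
        values.put(key, value);
        states.put(key, RowState.UNCHANGED);
    }

    void insert(String key, String value) {
        values.put(key, value);
        states.put(key, RowState.ADDED);
    }

    void set(String key, String value) {
        values.put(key, value);
        // A newly added entry stays ADDED; anything else becomes MODIFIED.
        if (states.get(key) != RowState.ADDED) {
            states.put(key, RowState.MODIFIED);
        }
    }

    void delete(String key) {
        if (states.get(key) == RowState.ADDED) {
            // Never reached the data store, so just forget it.
            values.remove(key);
            states.remove(key);
        } else {
            states.put(key, RowState.DELETED);
        }
    }

    String get(String key) {
        return states.get(key) == RowState.DELETED ? null : values.get(key);
    }

    // The list an update routine would walk to push changes back to the source.
    List<String> pendingChanges() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, RowState> e : states.entrySet()) {
            if (e.getValue() != RowState.UNCHANGED) {
                out.add(e.getValue() + " " + e.getKey());
            }
        }
        return out;
    }

    // Called after a successful update: drop deleted entries and reset the rest.
    void acceptChanges() {
        Iterator<Map.Entry<String, RowState>> it = states.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, RowState> e = it.next();
            if (e.getValue() == RowState.DELETED) {
                values.remove(e.getKey());
                it.remove();
            } else {
                e.setValue(RowState.UNCHANGED);
            }
        }
    }
}
```

Keeping deleted entries around until acceptChanges is what lets an update routine know which rows to delete at the source.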

So, bottom line: if you are not using XPathNavigator today to work with your XML documents in .NET, you should be. Look into the model and get used to it. It will give you better consistency, plus a migration path for moving your XML document processing code from the DOM today to the XPathDocument in .NET 2.0 with minimal changes.