I have a file that is essentially a list of XPaths, like so:
/Options/File[1]/Settings[1]/Type[1]
/Options/File[1]/Settings[1]/Path[1]
/Options/File[1]/Settings[2]/Type[1]
/Options/File[1]/Settings[2]/Path[1]
/Options/File[2]/Settings[1]/Type[1]
/Options/File[2]/Settings[1]/Path[1]
I need to grab the values from the elements these XPaths point to in a moderately sized XML file (~3-5 MB). Using XPathSelectElement works well, but is extremely slow. Is there a quicker way to do the same with LINQ to XML, or even by manually traversing the XML?
In a related question, are the index value in the XPath and the order of elements returned from an XElement guaranteed to be the same? For instance, will these return the same:
xdoc.XPathSelectElement("/Options/File[1]/Settings[2]");
xdoc.Root.Elements("File").ElementAt(0).Elements("Settings").ElementAt(1);
It did seem to match up, at least for my data. But even with a large dataset I would not say that it is guaranteed that the indexes are always the same between the methods.
If this method holds up, it seems to be at least an order of magnitude faster than using XPathSelectElement - K J 2012-04-03 22:12
I think this is what I am going to go with. I am certain there could be more performance improvements, such as Alexei's suggestion, but this is already at least 10 times faster in my limited tests.
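// Requires: using System; using System.Linq; using System.Xml.Linq;
private XElement GetElementFromXPath(XDocument xDoc, string xPath)
{
    // Split "/Options/File[1]/Settings[2]" into steps; nodes[0] is the root element name.
    string[] nodes = xPath.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
    XElement xe = xDoc.Root;
    // Start at 1: the root step is already covered by xDoc.Root.
    for (int i = 1; i < nodes.Length; i++)
    {
        // "Settings[2]" -> { "Settings", "2", "" }
        string[] chunks = nodes[i].Split(new char[] { '[', ']' });
        int index;
        if (chunks.Length < 2 || !Int32.TryParse(chunks[1], out index))
            throw new FormatException("Step has no numeric index: " + nodes[i]);
        // XPath positions are 1-based, ElementAt is 0-based.
        xe = xe.Elements(chunks[0]).ElementAt(index - 1);
    }
    return xe;
}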
This assumes that all elements other than the root are listed along with their index number in the XPath (which is true for my scenarios).
Indexed XPath (n-th child) is normally slow because it has to traverse all children up to the one you need. To check, take a relatively large file, select the first child and the last child, and compare the timings (repeat ~1000 times for each and use Stopwatch to measure).
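A minimal sketch of that measurement, assuming a synthetic document with many File children under Options. The class name IndexedXPathTiming, the helper names and the document shape are made up for illustration; real numbers will depend on your data and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class IndexedXPathTiming
{
    // Builds a synthetic document: <Options> with `count` <File> children.
    public static XDocument BuildDocument(int count)
    {
        return new XDocument(
            new XElement("Options",
                Enumerable.Range(1, count).Select(i => new XElement("File", i))));
    }

    // Runs the same XPath query `iterations` times and returns the elapsed time.
    public static TimeSpan Time(XDocument doc, string xPath, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            doc.XPathSelectElement(xPath);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

To compare, call Time(doc, "/Options/File[1]", 1000) and Time(doc, "/Options/File[" + count + "]", 1000) on the same document and print both results.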
If your XPaths all look like the ones you've shown, you may be able to do the selection manually by caching the child nodes as you iterate.
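One way the caching idea could look, as a rough sketch rather than a tuned implementation. CachedPathResolver is a hypothetical name. The sketch assumes simple paths of the form /Root/Name[i]/Name[j], no namespaces, and a document that is not modified after the first lookup (the cache would go stale otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public sealed class CachedPathResolver
{
    // Per parent element and child name: the matching children in document order.
    private readonly Dictionary<XElement, Dictionary<string, List<XElement>>> _cache =
        new Dictionary<XElement, Dictionary<string, List<XElement>>>();

    private readonly XElement _root;

    public CachedPathResolver(XDocument doc)
    {
        _root = doc.Root;
    }

    // Resolves a path such as /Options/File[1]/Settings[2]; returns null if a step has no match.
    public XElement Resolve(string xPath)
    {
        string[] steps = xPath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        XElement current = _root;
        for (int i = 1; i < steps.Length; i++) // steps[0] is assumed to name the root
        {
            string[] chunks = steps[i].Split(new[] { '[', ']' }, StringSplitOptions.RemoveEmptyEntries);
            int index = chunks.Length > 1 ? int.Parse(chunks[1]) : 1; // no index: take the first match
            List<XElement> children = ChildrenOf(current, chunks[0]);
            if (index < 1 || index > children.Count) return null;
            current = children[index - 1]; // XPath positions are 1-based
        }
        return current;
    }

    // Materializes the named children of a parent once, then serves them from the cache.
    private List<XElement> ChildrenOf(XElement parent, string name)
    {
        Dictionary<string, List<XElement>> byName;
        if (!_cache.TryGetValue(parent, out byName))
        {
            byName = new Dictionary<string, List<XElement>>();
            _cache[parent] = byName;
        }
        List<XElement> list;
        if (!byName.TryGetValue(name, out list))
        {
            list = parent.Elements(name).ToList();
            byName[name] = list;
        }
        return list;
    }
}
```

Since the paths in the list share long prefixes, each child list is built once and later lookups become list indexing instead of a fresh walk over the siblings.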
Order of elements in XML is significant, so a normal XML API will always preserve the order of elements. Note that the order of attributes is not significant in XML, so attribute order may not be the same across queries (unlikely, but theoretically possible) and across different APIs.
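If you want to convince yourself for the two expressions from the question, a small check along these lines should do. OrderCheck is a hypothetical helper; it assumes the document actually contains /Options/File[1]/Settings[2], otherwise ElementAt throws.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class OrderCheck
{
    // True when the XPath lookup and the LINQ lookup land on the same XElement instance.
    public static bool SameElement(XDocument doc)
    {
        XElement viaXPath = doc.XPathSelectElement("/Options/File[1]/Settings[2]");
        XElement viaLinq = doc.Root.Elements("File").ElementAt(0)
                              .Elements("Settings").ElementAt(1);
        return object.ReferenceEquals(viaXPath, viaLinq);
    }
}
```

The positional predicate counts only siblings with the same name, which is also what Elements("Settings") filters on, so the 1-based XPath position lines up with the 0-based ElementAt index plus one.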
I just had a similar problem: I had horrible performance selecting some nodes in a medium-sized XML file (3 MB) using a bunch of indexed XPath expressions.
But contrary to your solution, I didn't have an index in every part of the XPath expression. So I ditched the LINQ to XML XPath method (XElement.XPathSelectElement) and instead used an XPathNavigator, obtained by creating an XPathDocument and calling CreateNavigator(). On the navigator I used SelectSingleNode.
Using XElement.XPathSelectElement, it took 137.3 seconds to do all the selects (the rest of the program only used about 3 seconds, by the way).
Using XPathNavigator.SelectSingleNode, the selects now need 1.2 seconds in total. That's a factor of almost 115.
So if anyone needs faster XPath queries and doesn't want to parse the queries themselves: don't use LINQ to XML for this if possible; its XPath support seems to be implemented horribly performance-wise.
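For reference, a minimal sketch of the XPathDocument / XPathNavigator approach described above. NavigatorLookup and its method names are made up for illustration; the constructor and the CreateNavigator and SelectSingleNode calls are the standard System.Xml.XPath ones.

```csharp
using System.IO;
using System.Xml.XPath;

public static class NavigatorLookup
{
    // Loads the XML into a read-only XPathDocument and returns a navigator on it.
    public static XPathNavigator Load(TextReader reader)
    {
        return new XPathDocument(reader).CreateNavigator();
    }

    // Returns the string value of the first node matching xPath, or null if nothing matches.
    public static string ValueAt(XPathNavigator navigator, string xPath)
    {
        XPathNavigator node = navigator.SelectSingleNode(xPath);
        return node == null ? null : node.Value;
    }
}
```

Create the navigator once, for example with NavigatorLookup.Load(new StringReader(xml)), and reuse it for every path in the list. Note that XPathDocument is read-only, so this only fits if you just need to read values.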
"will these return the same"? Why don't you try? - L.B 2012-04-03 21:38