Why is XmlTextReader skipping nodes when WhitespaceHandling is set to None or Significant? - c# | xml-serialization | .net-2.0

I'm pulling my hair out over this. I do some manual deserialization using XmlReader - nothing serious, I've done that a zillion times. But this is something I can't figure out.

This is a sample XML file:

<?xml version="1.0" encoding="utf-8"?>
<Theme name="something" version="1.0.0.0">
  <Thumbnail length="1102">[some base64 encoded data]
</Thumbnail>
  <Backgrounds>
    <string>Themes\something\Backgrounds\file1</string>
    <string>Themes\something\Backgrounds\file2</string>
    <string>Themes\something\Backgrounds\file3</string>
  </Backgrounds>
  <Stickers>
    <string>Themes\something\Stickers\stick1</string>
    <string>Themes\something\Stickers\stick1</string>
    <string>Themes\something\Stickers\stick1</string>
  </Stickers>
  <PreviewImages>
    <string>Themes\something\Preview\rh_01.jpg</string>
    <string>Themes\something\Preview\rh_02.jpg</string>
    <string>Themes\something\Preview\rh_03.jpg</string>
  </PreviewImages>
</Theme>

This is the deserialization code (slightly simplified):

public void ReadXml(System.Xml.XmlReader reader)
{       
    /* Read attributes - not important here */

    while (reader.Read())
    {
        Console.WriteLine("Main: {0} {1}", reader.NodeType, reader.Name);
        switch (reader.Name)
        {
            case Xml.Elements.Thumbnail:
                this._thumbnail = Xml.DeserializeBitmap(reader);
                Console.WriteLine("Inner: {0} {1}", reader.NodeType, reader.Name);
                break;
            case Xml.Elements.Backgrounds:
                this._backgrounds = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.Stickers:
                this._stickers = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.PreviewImages:
                this._previewImages = Xml.DeserializeListOfStrings(reader);
                break;
        }

        if (reader.NodeType == System.Xml.XmlNodeType.EndElement
                && reader.Name == Xml.Root)
            break;
    }
}

The problem:

After this._thumbnail is deserialized, the reader is positioned on the closing element of the Thumbnail node. Then reader.Read() at the beginning of the while loop is called... and the reader ends up on the start element of a string node. The Backgrounds element is skipped! Why?

This happens when the reader is an XmlTextReader and its WhitespaceHandling property is set to WhitespaceHandling.None or WhitespaceHandling.Significant.

If it is set to WhitespaceHandling.All, everything works as expected: after calling reader.Read(), the reader is positioned on the start element of the Backgrounds node.

[EDIT] I've added two debug lines to the example code.

With WhitespaceHandling.All I get this:

Main: Whitespace 
Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element Backgrounds
Main: Whitespace 
Main: Element Stickers
Main: Whitespace 
Main: Element PreviewImages
Main: Whitespace 
Main: EndElement Theme

With WhitespaceHandling.Significant I get this:

Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element string
Main: Text 
Main: EndElement string
Main: Element string
Main: Text 
Main: EndElement string
Main: Element string
Main: Text 
Main: EndElement string
Main: EndElement Backgrounds

[EDIT 2] Adjusted the debug output a bit to make it more readable.

As you can see, the debug output for WhitespaceHandling.Significant ends on </Backgrounds>. That's because my Xml.DeserializeListOfStrings does not yet check whether it's positioned correctly and "accidentally" reads the document to the end. But that's outside the scope of this question.
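The question leaves Xml.DeserializeListOfStrings out of scope, so the following is only a hypothetical sketch of a version that does check its position. The name mirrors the question's helper, but the body is invented here, and the `string` child name is hard-coded instead of coming from the question's Xml constants. It assumes the reader starts on the list's start element and stops on the matching end element, the same convention the Thumbnail case follows in the debug output (Inner: EndElement Thumbnail). With that convention the main loop's reader.Read() moves on to the next sibling instead of the helper running to the end of the document.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ListSketch
{
    // Hypothetical stand-in for the question's Xml.DeserializeListOfStrings.
    // Precondition: reader is on the list's start element, e.g. <Backgrounds>.
    // Postcondition: reader is on the matching end element (</Backgrounds>),
    // or still on the element itself if it was empty (<Backgrounds/>).
    public static List<string> DeserializeListOfStrings(XmlReader reader)
    {
        List<string> items = new List<string>();
        if (reader.IsEmptyElement)
            return items;

        int depth = reader.Depth;   // an end tag reports the same depth as its start tag
        reader.Read();              // step inside the list element

        while (!reader.EOF &&
               !(reader.NodeType == XmlNodeType.EndElement && reader.Depth == depth))
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "string")
                items.Add(reader.ReadElementContentAsString()); // ends up after </string>
            else
                reader.Read();      // whitespace, comments, unexpected nodes
        }
        return items;
    }
}
```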

2012-04-05 18:41
by SiliconMind

It's weird that for WhitespaceHandling.All, there's no Main: Whitespace line between Inner: EndElement Thumbnail and Main: Element Backgrounds, given that there's a line break in your XML - Michael Liu 2012-04-05 21:48

The cause of my headache was the XmlReader.ReadElementContentAsBase64 method, which I use to deserialize the <Thumbnail> node. I was experimenting with it in a loop:

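private static byte[] ReadBytes(System.Xml.XmlReader reader)
{
    byte[] buffer = new byte[128];
    // Expected number of decoded bytes, taken from the length="..." attribute
    int length = XmlConvert.ToInt32(reader[Xml.Attributes.Length]);

    using (MemoryStream ms = new MemoryStream(length))
    {
        int count = 0;

        // Stops as soon as 'length' bytes have been decoded, i.e. WITHOUT
        // making the final call that returns 0 (this turns out to be the problem)
        do
        {
            count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);
            ms.Write(buffer, 0, count);

        } while (ms.Length < length);

        // ToArray() returns exactly ms.Length bytes; GetBuffer() returns the
        // whole internal buffer, which can be larger than the data written
        return ms.ToArray();
    }
}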

However, MSDN says:

If the count value is higher than the number of bytes in the document, or if it is equal to the number of bytes in the document, the XmlNodeReader reads all the remaining bytes in the document and returns the number of bytes read. The next ReadElementContentAsBase64 method call returns a zero and moves the reader to the node following the EndElement node.

If you call Read before all of the element content is consumed, the reader may behave as if the first content was consumed and then the Read method was called. This means that the reader will read all the text until the end element is encountered. It will then read the end tag node, read the next node, and then position itself on the next subsequent node.

It seems that although I read to the end of the element's content (I know the data length, so in theory I can do that), the XmlReader did not consider the element's content "consumed", because I never made the final ReadElementContentAsBase64 call that returns zero. That caused the unexpected behaviour described in MSDN.

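The XmlReader behaved the same way with WhitespaceHandling.All and WhitespaceHandling.Significant: in both cases the Read() that followed my last ReadElementContentAsBase64 call finished the element, moved past </Thumbnail>, and then advanced one more node. My code only appeared to work with WhitespaceHandling.All because the node skipped in that case was the insignificant whitespace after </Thumbnail> (which is also why no "Main: Whitespace" line appears before Backgrounds in the debug output). If the source XML file contained no newlines or tabs, my code would fail with WhitespaceHandling.All too.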
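A minimal repro of this mechanism, as a sketch: it mirrors the question's pattern (stop calling ReadElementContentAsBase64 once the expected bytes have arrived, then call Read()) against a small inline document. The class name, method name and the cut-down document are made up for illustration; the expected results are the ones implied by the debug output in the question.

```csharp
using System.IO;
using System.Xml;

public static class SkipRepro
{
    // Cut-down version of the question's file; "AQID" is base64 for the bytes 1, 2, 3.
    private const string Doc =
        "<Theme>\n" +
        "  <Thumbnail length=\"3\">AQID</Thumbnail>\n" +
        "  <Backgrounds><string>a</string></Backgrounds>\n" +
        "</Theme>";

    // Returns "NodeType Name" of the node that the first Read() lands on after
    // a ReadElementContentAsBase64 sequence that never saw the final 0 return.
    public static string NodeAfterPartialRead(WhitespaceHandling handling)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(Doc)))
        {
            reader.WhitespaceHandling = handling;
            reader.ReadToFollowing("Thumbnail");

            // One call decodes all 3 bytes, like the last iteration of the
            // question's ReadBytes loop; that loop stops here because
            // ms.Length == length, so the call that returns 0 is never made.
            byte[] buffer = new byte[128];
            reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);

            // Per the MSDN remark quoted above, this Read() finishes the
            // element, moves past </Thumbnail>, and then advances once more.
            reader.Read();
            return reader.NodeType + " " + reader.Name;
        }
    }
}
```

With WhitespaceHandling.None or Significant this should report Element string (Backgrounds was the node swallowed), and with WhitespaceHandling.All it should report Element Backgrounds (only the whitespace was swallowed), matching the two debug traces in the question.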

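The solution is to change the loop so that it makes one more call to XmlReader.ReadElementContentAsBase64 after all the bytes have been read, i.e. to keep calling it until it returns 0. The downside of this approach is that after that additional call the reader has already moved to the node following the EndElement node, so a loop that calls reader.Read() unconditionally, like the main loop in the question, will still skip that node.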

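do
{
    // Keep reading until 0 is returned; the call that returns 0 also moves
    // the reader to the node following </Thumbnail>.
    count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);
    if (count > 0)
        ms.Write(buffer, 0, count);

} while (count > 0);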
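Putting the fix together with a caller that copes with the downside, a minimal sketch under these assumptions: ThemeSketch and ReadTheme are hypothetical names, element names are hard-coded literals instead of the question's Xml.Elements constants, and reader.Skip() stands in for the list deserializers. The length attribute is no longer needed to decide when to stop, since the loop runs until ReadElementContentAsBase64 returns 0.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ThemeSketch
{
    // Decodes the base64 content of the current element (e.g. <Thumbnail>).
    // Keeps calling ReadElementContentAsBase64 until it returns 0; that final
    // call moves the reader to the node after the end tag, so callers must
    // NOT call Read() again before looking at the current node.
    public static byte[] ReadBytes(XmlReader reader)
    {
        byte[] buffer = new byte[128];
        using (MemoryStream ms = new MemoryStream())
        {
            int count;
            while ((count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, count);
            }
            return ms.ToArray();
        }
    }

    // Hypothetical driver loop for <Theme>: Read() is only called on nodes that
    // no helper consumed, so a helper that already advanced cannot cause a skip.
    // Returns the names of the non-Thumbnail child elements, in document order.
    public static List<string> ReadTheme(XmlReader reader, out byte[] thumbnail)
    {
        thumbnail = null;
        List<string> others = new List<string>();

        reader.MoveToContent();                // on <Theme>
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return others;
        }
        reader.Read();                         // step inside <Theme>

        while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType != XmlNodeType.Element)
            {
                reader.Read();                 // whitespace, comments, ...
            }
            else if (reader.Name == "Thumbnail")
            {
                thumbnail = ReadBytes(reader); // already past </Thumbnail>
            }
            else
            {
                others.Add(reader.Name);       // stand-in for the list deserializers
                reader.Skip();                 // moves past the whole element
            }
        }
        return others;                         // reader is on </Theme>
    }
}
```

Because ReadTheme never calls Read() after a helper has already advanced the reader, it should see the same elements whether WhitespaceHandling is None, Significant or All.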

One could also use the XmlTextReader.ReadBase64 method to read the whole element content at once, but my class implements IXmlSerializable, so I only get the base XmlReader and that method is not available to me.
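If reading the whole value at once is the goal, the base XmlReader that IXmlSerializable.ReadXml receives offers a different route, one the answer itself does not use: read the element text with ReadElementContentAsString and decode it with Convert.FromBase64String. A sketch with a made-up class and method name; it buffers the whole base64 string in memory, which is fine for a roughly 1 KB thumbnail, while the incremental loop above is the better fit for large payloads.

```csharp
using System;
using System.Xml;

public static class Base64TextSketch
{
    // Alternative approach (not the answer's): read the element's whole text
    // through the base XmlReader API, then decode it in one go.
    // ReadElementContentAsString moves the reader past </Thumbnail>, so, as with
    // the final ReadElementContentAsBase64 call, the caller must not Read() again.
    public static byte[] ReadThumbnail(XmlReader reader)
    {
        string text = reader.ReadElementContentAsString();
        // Convert.FromBase64String ignores spaces, tabs, CR and LF, so the
        // newline before </Thumbnail> in the sample file is not a problem.
        return Convert.FromBase64String(text);
    }
}
```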

2012-04-05 23:05
by SiliconMind