I'm pulling my hair out over this. I'm doing some manual deserialization using XmlReader, nothing serious, and I've done it a zillion times. But this is something I can't figure out.
This is a sample XML file:
<?xml version="1.0" encoding="utf-8"?>
<Theme name="something" version="1.0.0.0">
<Thumbnail length="1102">[some base64 encoded data]
</Thumbnail>
<Backgrounds>
<string>Themes\something\Backgrounds\file1</string>
<string>Themes\something\Backgrounds\file2</string>
<string>Themes\something\Backgrounds\file3</string>
</Backgrounds>
<Stickers>
<string>Themes\something\Stickers\stick1</string>
<string>Themes\something\Stickers\stick1</string>
<string>Themes\something\Stickers\stick1</string>
</Stickers>
<PreviewImages>
<string>Themes\something\Preview\rh_01.jpg</string>
<string>Themes\something\Preview\rh_02.jpg</string>
<string>Themes\something\Preview\rh_03.jpg</string>
</PreviewImages>
</Theme>
This is deserialization code (a bit simplified):
public void ReadXml(System.Xml.XmlReader reader)
{
    /* Read attributes - not important here */
    while (reader.Read())
    {
        Console.WriteLine("Main: {0} {1}", reader.NodeType, reader.Name);
        switch (reader.Name)
        {
            case Xml.Elements.Thumbnail:
                this._thumbnail = Xml.DeserializeBitmap(reader);
                Console.WriteLine("Inner: {0} {1}", reader.NodeType, reader.Name);
                break;
            case Xml.Elements.Backgrounds:
                this._backgrounds = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.Stickers:
                this._stickers = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.PreviewImages:
                this._previewImages = Xml.DeserializeListOfStrings(reader);
                break;
        }
        if (reader.NodeType == System.Xml.XmlNodeType.EndElement
            && reader.Name == Xml.Root)
            break;
    }
}
The problem:
After this._thumbnail is deserialized, the reader is positioned on the closing element of the Thumbnail node. Then reader.Read() at the beginning of the while loop is called... and the reader ends up positioned on the starting element of a string node. The Backgrounds element is skipped! Why?
This happens when the reader is an XmlTextReader and its WhitespaceHandling property is set to WhitespaceHandling.None or WhitespaceHandling.Significant.
If it is set to WhitespaceHandling.All, everything works as expected: after calling reader.Read(), the reader is positioned on the starting element of the Backgrounds node.
[EDIT] I've added two debug lines to the example code.
With WhitespaceHandling.All I get this:
Main: Whitespace
Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element Backgrounds
Main: Whitespace
Main: Element Stickers
Main: Whitespace
Main: Element PreviewImages
Main: Whitespace
Main: EndElement Theme
With WhitespaceHandling.Significant I get this:
Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element string
Main: Text
Main: EndElement string
Main: Element string
Main: Text
Main: EndElement string
Main: Element string
Main: Text
Main: EndElement string
Main: EndElement Backgrounds
[EDIT 2] Adjusted the debug output a bit to be more readable.
As you can see, the debug output for WhitespaceHandling.Significant ends on </Backgrounds>. That's because my Xml.DeserializeListOfStrings does not yet check whether it's positioned correctly and "accidentally" reads the document to the end. But that's outside the scope of this question.
The cause of my headache is the XmlReader.ReadElementContentAsBase64 method that I use to deserialize the <Thumbnail> node. I was experimenting with it in a loop:
private static byte[] ReadBytes(System.Xml.XmlReader reader)
{
    byte[] buffer = new byte[128];
    int length = XmlConvert.ToInt32(reader[Xml.Attributes.Length]);
    using (MemoryStream ms = new MemoryStream(length))
    {
        int count = 0;
        do
        {
            count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);
            ms.Write(buffer, 0, count);
        } while (ms.Length < length);
        return ms.GetBuffer();
    }
}
However MSDN says that:
If the count value is higher than the number of bytes in the document, or if it is equal to the number of bytes in the document, the XmlNodeReader reads all the remaining bytes in the document and returns the number of bytes read. The next ReadElementContentAsBase64 method call returns a zero and moves the reader to the node following the EndElement node.
If you call Read before all of the element content is consumed, the reader may behave as if the first content was consumed and then the Read method was called. This means that the reader will read all the text until the end element is encountered. It will then read the end tag node, read the next node, and then position itself on the next subsequent node.
It seems that despite reading to the end of the element's content (I know the data length, so theoretically I can do that), the XmlReader did not consider that I had "consumed" all of the element's content. That caused the unexpected behaviour described in MSDN.
The XmlReader behaved the same with WhitespaceHandling.All and WhitespaceHandling.Significant. My code worked with WhitespaceHandling.All only because, after the last call to XmlReader.ReadElementContentAsBase64, what the reader skipped was insignificant whitespace. If the source XML file contained no newlines or tabs, my code would fail with WhitespaceHandling.All too.
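To make the failure mode concrete, here is a minimal repro sketch. It assumes a reader from XmlReader.Create rather than the XmlTextReader used in the question, and the class name, method name and the tiny inline document (with "AQID" standing in for the thumbnail data) are made up for illustration. The payload is read in a single call and Read() is then called without the extra zero-returning call; the method returns the name of whichever node the reader lands on, so you can check whether Backgrounds was skipped.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SkippedNodeRepro
{
    // Reads the whole base64 payload in ONE call (the buffer is larger than
    // the data) and then calls Read() straight away, without the additional
    // ReadElementContentAsBase64 call that would return 0.
    // Returns the name of the node the reader ends up on.
    public static string NodeAfterPrematureRead(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Position the reader on the <Thumbnail> start tag.
            reader.ReadToFollowing("Thumbnail");

            byte[] buffer = new byte[128];
            reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);

            // The reader still treats the element as "in progress" here.
            reader.Read();
            return reader.Name;
        }
    }
}
```

Calling SkippedNodeRepro.NodeAfterPrematureRead with a document that has no whitespace between </Thumbnail> and <Backgrounds> should show the same skip the debug output above shows for WhitespaceHandling.Significant.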
The solution is to modify the do-while loop to make one additional call to XmlReader.ReadElementContentAsBase64 after all bytes are read. The downside of this approach is that after that additional call the reader is moved to the node following the EndElement node.
do
{
    count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length);
    if (count > 0)
        ms.Write(buffer, 0, count);
} while (count > 0);
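Because the final zero-returning call already advances the reader, the calling loop must not call Read() again right after the thumbnail, or it will skip a node of its own. The following is only a sketch of one way to handle that, not the code from the question: the class and method names are hypothetical, the element name is hard-coded, and the reader comes from XmlReader.Create. It also returns ms.ToArray() instead of GetBuffer() so the result never contains unused buffer capacity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ThemeReaderSketch
{
    // Reads base64 content until ReadElementContentAsBase64 returns 0.
    // That last call moves the reader to the node after the end tag.
    public static byte[] ReadBytes(XmlReader reader)
    {
        byte[] buffer = new byte[128];
        using (MemoryStream ms = new MemoryStream())
        {
            int count;
            while ((count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, count);
            }
            return ms.ToArray();
        }
    }

    // Walks the document and records every element name it visits.
    public static List<string> ElementsSeen(string xml)
    {
        List<string> seen = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Thumbnail")
                {
                    seen.Add(reader.Name);
                    ReadBytes(reader);
                    // The reader is already on the next node: do not Read() here.
                    continue;
                }
                if (reader.NodeType == XmlNodeType.Element)
                {
                    seen.Add(reader.Name);
                }
                reader.Read();
            }
        }
        return seen;
    }
}
```

The design choice here is to advance the reader explicitly at the bottom of the loop instead of in the loop condition, so that a helper which has already moved the reader can simply continue.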
One could also use the XmlTextReader.ReadBase64 method to read the whole element content at once, but I'm restricted to the XmlReader base class because my class implements IXmlSerializable, so this method is not available to me.
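For completeness, a one-shot read is still possible with only the XmlReader base class if you are willing to go through a string. This is an alternative I'm suggesting, not something the original code used, and the class name is hypothetical: XmlReader.ReadElementContentAsString returns the element's text and Convert.FromBase64String decodes it. It assumes the element contains only text, and it holds the whole payload in memory as a string, which may not suit large thumbnails.

```csharp
using System;
using System.Xml;

public static class Base64StringFallback
{
    // Reads the current element's text content in one call and decodes it.
    // ReadElementContentAsString moves the reader past the end tag, so the
    // caller must not call Read() again before inspecting the next node.
    public static byte[] ReadBytes(XmlReader reader)
    {
        string text = reader.ReadElementContentAsString();
        // FromBase64String ignores white-space characters such as line breaks.
        return Convert.FromBase64String(text);
    }
}
```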
WhitespaceHandling.All, there's no "Main: Whitespace" line between "Inner: EndElement Thumbnail" and "Main: Element Backgrounds", given that there's a line break in your XML - Michael Liu 2012-04-05 21:48