I'm processing XML in Java, and I have the following code:
dbf.setValidating(false);
dbf.setIgnoringComments(false);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new NullResolver());
_logger.error("Before processing the input stream");
processXml(db.parse(is));
Where (is) is an InputStream.
This is resulting in the error:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8
This sounds like an error caused by reading the data with the wrong encoding. I would like to set the encoding on the InputStream, but I am not sure how. I found ways to set the encoding on an InputSource or an InputStreamReader, but as far as I can tell db.parse does not take a Reader or an InputSource.
What is the best way to fix this?
Thanks!
DocumentBuilder.parse can take an InputSource. See the javadocs.
So you should try wrapping your InputStream in an InputStreamReader (where you can specify the character set) and then create an InputSource based on that.
It's a bit convoluted, but these things happen in Java.
Something along the lines of
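Here is a minimal sketch of that wrapping. It assumes the bytes are really ISO-8859-1, which is only a guess for illustration; substitute whatever charset the data is actually in. The class name EncodingAwareParse and the helper parseWithCharset are hypothetical names, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingAwareParse {

    // Hypothetical helper: decodes the stream with an explicit charset
    // before handing it to the parser.
    static Document parseWithCharset(InputStream is, Charset charset) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        // The Reader does the byte-to-character decoding, so the parser
        // no longer has to guess the encoding from the raw bytes.
        Reader reader = new InputStreamReader(is, charset);
        InputSource source = new InputSource(reader);
        return db.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample input: ISO-8859-1 bytes that are not valid UTF-8.
        byte[] bytes = "<root>caf\u00e9</root>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseWithCharset(
                new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Note that this only helps if the charset you pass matches the real encoding of the bytes. If the document declares UTF-8 but was written in something else, fixing the producer is the cleaner long-term solution.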