Setting the encoding on an InputStream

1

I'm processing XML in Java and I have the following code:

  dbf.setValidating(false);
  dbf.setIgnoringComments(false);
  dbf.setIgnoringElementContentWhitespace(true);
  dbf.setNamespaceAware(true);

  DocumentBuilder db = dbf.newDocumentBuilder();
  db.setEntityResolver(new NullResolver());
  _logger.error("Before processing the input stream");
  processXml(db.parse(is));

Where (is) is an InputStream.

This is resulting in the error:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8

This sounds like an error caused by reading the stream with the wrong encoding. I would like to set the encoding on the InputStream, but I am not sure how. I found ways to set the encoding on an InputSource or an InputStreamReader, but db.parse does not seem to take a Reader or an InputSource.

What is the best way to fix this?

Thanks!

2012-04-05 23:19
by user220755
This should only happen if your XML is malformed (missing encoding information). Things tend to break when you specify the encoding instead of letting the parser determine it from the document via well-defined rules. Of course, if the document is corrupt, an XML parser isn't going to read it. GIGO - erickson 2012-04-06 01:30
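
To illustrate the comment above, here is a minimal sketch, assuming the incoming bytes are ISO-8859-1 (the thread never says what the real encoding is). The class and method names are hypothetical. It feeds the same bytes to db.parse(InputStream) twice: once with an encoding declaration in the XML prolog, once without, where the parser falls back to UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class; ISO-8859-1 is an assumed encoding for illustration.
class DeclaredEncodingDemo {
    // Parses raw bytes the same way the question does: db.parse(InputStream).
    static String parseText(byte[] xmlBytes) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream is = new ByteArrayInputStream(xmlBytes);
        return db.parse(is).getDocumentElement().getTextContent();
    }

    // Returns true if the parser rejects the bytes.
    static boolean failsToParse(byte[] xmlBytes) {
        try {
            parseText(xmlBytes);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // With a declaration, the parser decodes the bytes as ISO-8859-1.
        byte[] declared =
                "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>"
                        .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(declared).equals("caf\u00e9"));

        // Without one, the parser assumes UTF-8 and the lone 0xE9 byte is invalid.
        byte[] undeclared = "<root>caf\u00e9</root>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(failsToParse(undeclared));
    }
}
```

If the producer of the XML can be changed, adding the correct declaration (or emitting real UTF-8) is usually more robust than overriding the encoding on the consumer side.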


2

DocumentBuilder.parse can take an InputSource. See the javadocs.

So you should try wrapping your InputStream in an InputStreamReader (where you can specify the character set) and then create an InputSource based on that.

It's a bit convoluted, but these things happen in Java.

Something along the lines of
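
A minimal sketch of that wrapping, under stated assumptions: the charset (ISO-8859-1 here) is a guess at what the stream really contains, and parseWithCharset is a hypothetical helper name, not something from the question. Because the parser is handed a character stream, it uses the Reader directly and disregards any encoding declaration inside the document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class illustrating InputStream -> Reader -> InputSource.
class ReaderEncodingDemo {
    // Decodes the byte stream with the given charset before the parser sees it.
    static Document parseWithCharset(InputStream is, Charset charset) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        Reader reader = new InputStreamReader(is, charset);
        InputSource source = new InputSource(reader);
        return db.parse(source); // DocumentBuilder.parse(InputSource)
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: Latin-1 bytes with no encoding declaration.
        byte[] bytes = "<root>caf\u00e9</root>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseWithCharset(
                new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

In the question's code this would replace db.parse(is) with db.parse(new InputSource(new InputStreamReader(is, charset))), keeping the rest of the DocumentBuilder setup unchanged.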

2012-04-05 23:47
by Don Roby