How to replace \\u by \u in Java String

I have a string of the format:

"aaa\\u2022bbb\\u2014ccc"

I'd like to display the two special characters, but to be able to do that, I have to first convert the string to this format:

"aaa\u2022bbb\u2014ccc"

I've tried writing this, but it gives a compilation error:

String encodedInput = input.replace("\\u", "\u");

This has got to be something straightforward, but I just cannot get it. Any ideas?

2012-04-05 20:55
by OceanBlue
Have you tried replace("\\\\u", "\\u")? - Amir Pashazadeh 2012-04-05 20:58
possible duplicate of Howto unescape a Java string literal in Java - James Montagne 2012-04-05 21:02
@AmirPashazadeh : That was one of the things I tried. Seems to return an identical String - OceanBlue 2012-04-05 21:02
“of the format ...\uxxxx...” is underdefined. There are lots of different string literal formats that use \u escapes, all with slightly different rules. You must know which of them you have, in order to decode them correctly. Is it a literal from Java source code? Is it JSON? Is it something else? - bobince 2012-04-07 10:07
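To illustrate bobince's point that the exact format matters: if the data happens to follow .properties rules, the JDK's `java.util.Properties` already decodes \uXXXX escapes when it loads text. A minimal sketch, assuming the input is a single .properties-style value (the class and method names are made up for this example):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesUnescape {
    // Decodes backslash-u escapes by letting java.util.Properties parse the text
    // as the value of a single "value=..." entry.
    // Caveat: the .properties rules also turn \t, \n, \r and \f into control characters,
    // silently drop any other lone backslash, strip leading whitespace from the value
    // and treat a trailing backslash as a line continuation.
    static String decodeAsPropertiesValue(String escaped) {
        Properties props = new Properties();
        try {
            props.load(new StringReader("value=" + escaped));
        } catch (IOException e) {
            // A StringReader does not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("value");
    }

    public static void main(String[] args) {
        String input = "aaa\\u2022bbb\\u2014ccc";
        String decoded = decodeAsPropertiesValue(input);
        System.out.println(decoded.equals("aaa\u2022bbb\u2014ccc")); // true
    }
}
```

A malformed escape such as \u12 makes `Properties.load` throw an `IllegalArgumentException`, and the rules listed in the comment mean other inputs can come back altered, so this is a shortcut for that one format rather than a general decoder.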


Unfortunately, I don't know of anything like an eval in Java, so the escapes have to be decoded by hand:

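    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "aaa\\u2022bbb\\u2014ccc";
    StringBuffer buf = new StringBuffer();
    // A literal backslash, 'u', then exactly four hex digits
    Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
    while (m.find()) {
        // The pattern guarantees four hex digits, so parseInt cannot throw here
        int cp = Integer.parseInt(m.group(1), 16);
        m.appendReplacement(buf, ""); // copy the text before the match, drop the escape itself
        buf.appendCodePoint(cp);      // append the decoded character instead
    }
    m.appendTail(buf);                // copy whatever follows the last match
    s = buf.toString();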
2012-04-05 21:23
by Joop Eggen
+1 Thank you! Replaces \\uXXXX by \uXXXX - OceanBlue 2012-04-06 15:27


In addition to escaping your escapes -- as other people (e.g. barsju) have pointed out -- you must also consider that the usual conversion of the \uNNNN notation to an actual Unicode character is done by the Java compiler at compile time.

So even once you sort out the backslash escaping issue, you may very well have further trouble getting the actual Unicode character to display, because you appear to be manipulating the string at run time, not at compile time.
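A small sketch of that compile-time vs. run-time distinction (the class name is hypothetical): the same-looking escape yields one character when the compiler sees it, but six ordinary characters when it arrives as data.

```java
class CompileTimeEscapes {
    // The compiler replaces the escape in this literal with the bullet character
    // (U+2022) before the code is parsed, so at run time it is a single char.
    static final String COMPILED = "\u2022";

    // The doubled backslash in the source leaves six ordinary chars at run time:
    // a backslash, 'u', '2', '0', '2', '2'. Nothing decodes them automatically.
    static final String RUNTIME = "\\u2022";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());         // 1
        System.out.println(RUNTIME.length());          // 6
        System.out.println(RUNTIME.charAt(0) == '\\'); // true
        System.out.println(COMPILED.equals(RUNTIME));  // false
    }
}
```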

This answer provides a method to replace \uNNNN escape sequences in a run-time string with the actual corresponding Unicode characters. Note that the method has some TODOs left with regard to error handling, bounds checking, and unexpected input.

(Edit: I think the regex-based solutions provided here by e.g. dash1e would be better than the method I linked, as they are more polished with regard to handling unexpected input data.)
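The linked method is not reproduced here, so the following is a separate, hypothetical sketch (not that answer's code) of a hand-written decoder that does the bounds checking and input validation mentioned above. It assumes the simple rule "a backslash, `u`, then four hex digits" and, unlike Java source rules, does not treat a doubled backslash as an escaped backslash:

```java
class UnicodeUnescaper {
    // Replaces every backslash + 'u' + four hex digits with the UTF-16 char it encodes.
    // Incomplete or malformed escapes (too few chars left, non-hex digits, a lone
    // trailing backslash) are copied through unchanged instead of throwing.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // Bounds check: a full escape needs 6 chars starting at i.
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                int value = parseHex4(s, i + 2);
                if (value >= 0) {
                    out.append((char) value);
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Returns the value of the four hex digits starting at 'from', or -1 if any is invalid.
    private static int parseHex4(String s, int from) {
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            char ch = s.charAt(k);
            int digit;
            if (ch >= '0' && ch <= '9') {
                digit = ch - '0';
            } else if (ch >= 'a' && ch <= 'f') {
                digit = ch - 'a' + 10;
            } else if (ch >= 'A' && ch <= 'F') {
                digit = ch - 'A' + 10;
            } else {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        String decoded = unescape("aaa\\u2022bbb\\u2014ccc");
        System.out.println(decoded.equals("aaa\u2022bbb\u2014ccc")); // true
    }
}
```

Each escape becomes one UTF-16 `char`, so a surrogate pair written as two consecutive escapes ends up as a valid pair in the output.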

2012-04-05 21:03
by Mike Clark


Try

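import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match a literal backslash, 'u', then exactly four hex digits
Pattern unicode = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
Matcher matcher = unicode.matcher("aaa\\u2022bbb\\u2014ccc");
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    int code = Integer.parseInt(matcher.group(1), 16);
    // quoteReplacement stops a decoded '$' or backslash from being read as a group reference
    matcher.appendReplacement(sb, Matcher.quoteReplacement(new String(Character.toChars(code))));
}
matcher.appendTail(sb);
System.out.println(sb.toString());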
2012-04-05 20:58
by dash1e
Nope, doesn't work :-( - OceanBlue 2012-04-05 21:01
Maybe this works now - dash1e 2012-04-05 21:19
@dash1e Perhaps you want Pattern unicode = Pattern.compile("\\\\\\\\u(.{4})"); and Matcher matcher = unicode.matcher("aaa\\\\u2022bbb\\\\u2014ccc");, then I think you can handle the double backslashes in his example: aaa\\u2022bbb\\u2014ccc. (I think he literally has double backslashes in the string as it exists in memory. The fact that he puts quotes around his example string is confusing, of course.) - Mike Clark 2012-04-05 22:14
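Whether the pattern should match one backslash or two depends on how many backslashes the string actually holds in memory, which the quoted example does not settle. A hypothetical helper for checking that at run time:

```java
import java.util.ArrayList;
import java.util.List;

class BackslashCheck {
    // For every 'u' that directly follows one or more backslashes, records how many
    // backslash chars precede it. This shows what the string holds at run time,
    // independent of how it was quoted in a post or in source code.
    static List<Integer> backslashesBeforeU(String s) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != 'u') {
                continue;
            }
            int n = 0;
            for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
                n++;
            }
            if (n > 0) {
                counts.add(n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(backslashesBeforeU("aaa\\u2022bbb\\u2014ccc"));     // [1, 1]
        System.out.println(backslashesBeforeU("aaa\\\\u2022bbb\\\\u2014ccc")); // [2, 2]
    }
}
```

If it reports 1, a single-backslash pattern like dash1e's is the right one; a 2 would call for the doubled pattern suggested in the comment.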


You need to escape your escapes:

System.out.println("aaa\\u2022bbb\\u2014ccc".replace("\\\\u", "\\u"));
2012-04-05 21:00
by barsju


String input = "aaa\\u2022bbb\\u2014ccc";
String korv = input.replace("\\\\u", "\\u");
System.out.println(korv);

=>

aaa\u2022bbb\u2014ccc

This is because "\" is a special character in a Java string literal, so you need to escape it as well: the literal "\\" stands for a single "\".

2012-04-05 21:00
by bos
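A quick check of the two replace-based answers above, consistent with OceanBlue's comment that this returns an identical String: the literal "aaa\\u2022bbb\\u2014ccc" holds single backslashes at run time, so a target of two backslashes plus `u` never matches. The class name is hypothetical:

```java
class ReplaceCheck {
    // The replace() call suggested in the answers above, unchanged.
    static String replaceEscapes(String input) {
        return input.replace("\\\\u", "\\u");
    }

    public static void main(String[] args) {
        // At run time this holds single backslashes: aaa, backslash, u2022bbb, ...
        String input = "aaa\\u2022bbb\\u2014ccc";
        // The target is two backslashes followed by 'u', which never occurs in input.
        System.out.println(replaceEscapes(input).equals(input)); // true
        // The source literal "\\" is one backslash character at run time.
        System.out.println("\\".length()); // 1
        // No bullet character appears until something decodes the escape.
        System.out.println(input.indexOf('\u2022')); // -1
    }
}
```

Even where such a replace does match, it only rearranges backslash characters; turning the six-character escape into the actual character still takes a decoding step like the regex answers.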