This is a complete noob question, but why do some strings in Python appear as:
{u'foobar': u'bar'}
while others appear as:
{foobar: bar}
Are they equivalent? How do you convert between the two?
The u prefix means the string is Unicode.
http://docs.python.org/reference/lexical_analysis.html
Refer to section 2.4.1:
A prefix of 'u' or 'U' makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
As you can see, Python 2 compares the two types transparently as long as the byte string contains only ASCII; the str is implicitly decoded as ASCII before the comparison:
>>> a = u'Hello'
>>> b = 'Hello'
>>> c = ur'Hello'
>>> a == b
True
>>> b == c
True
You can learn more about Unicode strings in Python (as well as how to convert or encode strings) by referring to the documentation.
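To answer the "how do you convert between the two" part directly, here is a minimal sketch using encode and decode. It assumes UTF-8 is the encoding you want, and the variable names (text, raw, round_trip) are just placeholders for illustration. It should run on both Python 2.7 and Python 3.3+.

```python
from __future__ import print_function

# A Unicode string of four characters; \xe9 is U+00E9 (e with acute accent).
text = u'caf\xe9'

# Unicode -> bytes: pick an encoding explicitly (UTF-8 is an assumption here).
raw = text.encode('utf-8')

# bytes -> Unicode: decode with the same encoding that produced the bytes.
round_trip = raw.decode('utf-8')

print(len(text))           # 4 characters
print(len(raw))            # 5 bytes, since U+00E9 takes two bytes in UTF-8
print(round_trip == text)  # True
```

In Python 2 the result of encode is a plain str and the result of decode is a unicode object, which is exactly the difference you see in the u'...' output.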
No, they are not equivalent.
The "u" that prefixes the string means it's Unicode. Unicode was designed to be an extended character set to accommodate languages that aren't English. You can read this entertaining and non-technical history of Unicode.
http://www.reigndesign.com/blog/love-hotels-and-unicode/
As Lattyware mentions, in Python 3.x, all strings are Unicode.
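A quick way to see this for yourself, assuming you run it under Python 3.3 or later (where the u prefix is accepted again but has no effect). The names a, b and data are only for illustration.

```python
# Assumes Python 3.3+; on Python 2 the two literals would have different types.
a = u'hi'
b = 'hi'

print(type(a) is type(b))  # True: both are str
print(a == b)              # True

# Raw bytes are a separate type in Python 3 and must be decoded to become text.
data = b'hi'
print(data.decode('ascii') == a)  # True
```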
If you're working with Python 2.x, especially for the web, it's worth making sure that your program handles Unicode properly. Lots of people like to gripe about websites that don't support Unicode.
Using u'string' defines that the string is of unicode type.
>>> type('hi')
<type 'str'>
>>> type(u'hi')
<type 'unicode'>
You can read all about it in the Unicode documentation page.