regexp: tw- prefixed words with optional quotation marks


0

I am struggling to build a regexp to catch words starting with tw (or Tw, or TW), whether or not they are enclosed in quotes (single or double). So far '\b[tT][wW][a-zA-Z0-9]*' catches all the words starting with tw, Tw, and TW but misses the ones between single or double quotes. It finds both tweeple and TWEEPLE but neither 'tweeple' nor "TWEEPLE".

Help much appreciated.

2012-04-05 16:19
by jrichalot


2

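In an ordinary Python string literal, \b is interpreted as a backspace character, so the regex engine never sees the two-character sequence \b that it would treat as a word boundary. Quotes are non-word characters, so a real word boundary already matches between a quote and the first letter of the word. Change your string to a raw string literal or escape the backslash and it should work: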

>>> re.findall(r'\b[tT][wW][a-zA-Z0-9]*', ' "TWEEPLE" tweeple ')
['TWEEPLE', 'tweeple']

Here is an example of the difference:

>>> 'abc\b'
'abc\x08'
>>> print('abc\b')
abc
>>> r'abc\b'
'abc\\b'
>>> print(r'abc\b')
abc\b
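
To see the two fixes side by side, here is a minimal sketch. The sample text and the variable names are made up for illustration, and the re.IGNORECASE variant is only an optional shortcut for the [tT][wW] classes, not something the original pattern needs.

```python
import re

# Hypothetical sample text: quoted and unquoted words, plus "between",
# which contains "tw" but does not start with it.
text = ' "TWEEPLE" tweeple \'Tweet\' between '

# No r prefix: Python turns \b into a backspace (\x08) before re sees it,
# so the pattern looks for a literal backspace and finds nothing here.
broken = re.findall('\b[tT][wW][a-zA-Z0-9]*', text)

# Fix 1: raw string literal, so the backslash reaches the regex engine.
raw = re.findall(r'\b[tT][wW][a-zA-Z0-9]*', text)

# Fix 2: escape the backslash in an ordinary string literal.
escaped = re.findall('\\b[tT][wW][a-zA-Z0-9]*', text)

# Optional: let the flag handle case instead of spelling out [tT][wW].
ignorecase = re.findall(r'\btw[a-zA-Z0-9]*', text, re.IGNORECASE)

print(broken)
print(raw)
```

The raw and escaped forms compile to the same pattern, so they return the same matches, and the quoted words are found because the word boundary sits between the quote and the first letter.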
2012-04-05 16:25
by Andrew Clark
Good catch on the \b not being a literal. And I didn't realize it includes quotes - makes sense though. Deleting my answer - Dan Breen 2012-04-05 16:44