How to use re.compile to search for strings with common areas in the middle

I am searching for the following lines in a messy HTML file:

    <span id="fooPack1_xpl01_name11">150.00 FTL</span>
    <span id="fooPack1_xpl02_name11">350.00 FTL</span>
    <span id="fooPack1_xpl03_name11">250.00 FTL</span>
    <span id="fooPack1_xpl04_name11">230.00 FTL</span>

I use BeautifulSoup and re to search and find the strings:

     tags = soup.find_all('span', id=re.compile(r'[fooPack1_xpl04_name11]\d+'))

But the common parts of the id are at the beginning and at the end, and the part that changes is always in the middle. How can I restructure my re search pattern so that it matches "fooPack1_xpl" + (a varying string) + "_name11"?

Thanks.

// EDIT //

When I query the following:

    <span id="FullView1_spl02_Stack_4">03/04/12</span>
    <span id="FullView1_spl03_Stack_4">01/03/11</span>
    <span id="FullView1_spl04_Stack_4">02/25/02</span>
    <span id="FullView1_spl05_Stack_4">07/16/04</span>
    <span id="FullView1_spl01_Stack32">999.00 SPL</span>
    <span id="FullView1_spl02_Stack82">150.00 XPP</span>
    <span id="FullView1_spl03_Stack82">350.00 XPP</span>
    <span id="FullView1_spl04_Stack82">450.00 XPP</span>
    <span id="FullView1_spl05_Stack82">550.00 XPP</span>
    <span id="FullView1_spl06_Stack82">650.00 XPP</span>
    <span id="FullView1_spl07_Stack22">888.00 SPL</span>
    <span id="FullView1_spl202_stckFriendName">Red Car</span>
    <span id="FullView1_spl203_stckFriendName">Green Car</span>
    <span id="FullView1_spl204_stckFriendName">Blue Car</span>

with:

     foo=soup.findAll('span', id=re.compile(r'FullView1_spl\d+_stack82'))

I get the following result:

    <span id="FullView1_spl204_stckFriendName">Blue Car</span>
    <span id="FullView1_spl02_Stack82">150.00 XPP</span>
    <span id="FullView1_spl03_Stack82">350.00 XPP</span>
    <span id="FullView1_spl04_Stack82">450.00 XPP</span>
    <span id="FullView1_spl05_Stack82">550.00 XPP</span>
    <span id="FullView1_spl06_Stack82">650.00 XPP</span>

I do not want the first element (the stckFriendName span) to be matched. That is the only problem.

2012-04-04 23:35
by mbilyanov


You're almost there. You want to search for fooPack1_xpl followed by digits followed by _name11, so how about:

    re.compile(r'fooPack1_xpl\d+_name11')

Note that I just put \d+ where you expect the digits, and kept the rest of the string you were searching for as literal text.
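
As a rough illustration, the sketch below compares this pattern with the bracketed one from the question, using only the standard re module. The id strings are the ones from the question, plus one hypothetical id (fooPack1_xpl204_otherName) added only to show a non-match. The variable names are made up for the example.

```python
import re

# Ids copied from the question, plus one hypothetical id that should NOT match.
ids = [
    "fooPack1_xpl01_name11",
    "fooPack1_xpl02_name11",
    "fooPack1_xpl03_name11",
    "fooPack1_xpl04_name11",
    "fooPack1_xpl204_otherName",  # hypothetical, not from the question
]

# Literal prefix, one or more digits, literal suffix.
pattern = re.compile(r'fooPack1_xpl\d+_name11')

# The pattern from the question: square brackets form a character class,
# so this means "any ONE of the listed characters, then one or more digits".
char_class = re.compile(r'[fooPack1_xpl04_name11]\d+')

# search() looks for the pattern anywhere in the string.
matched = [i for i in ids if pattern.search(i)]
loose = [i for i in ids if char_class.search(i)]

print(matched)  # only the four ..._name11 ids
print(loose)    # all five ids, because "k1" alone satisfies the character class
```

The compiled pattern can then be passed as the id argument exactly as in the question, for example soup.find_all('span', id=pattern). If the id must match from start to end, pattern.fullmatch(i) is a stricter alternative to search when checking strings yourself.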

2012-04-04 23:39
by mathematical.coffee
Hello, I am trying to first find the lines that have those strings in the HTML file, and then extract only the "230.00 FTL" part. The original question is here: http://stackoverflow.com/questions/10019954/extracting-a-specific-string-out-an-html-document But after changing the re.compile section, I do not seem to be getting the right lines. - mbilyanov 2012-04-05 00:00
What is the point of asking this question when you have already asked a duplicate question back there? I have answered the specific question you asked, that is, how to match 'fooPack1_xpl' + digits + '_name11'. Details on how to extract the text are already given in the answer to your previous question. - mathematical.coffee 2012-04-05 00:01
Sorry. I just did not want to make that other topic too busy. Meanwhile I had a chance to get more information about the regex library - mbilyanov 2012-04-05 15:08
Actually, ignore me, I think it works. I am getting what I need with that pattern. Thanks for the help - mbilyanov 2012-04-05 17:08