Parsing text in python issue

I am trying to figure out how to grab the number of clients for each of the AP lines in this text file. This works fine for the APs that have a 'Num of clients' line associated with them (on the next line), but if no clients are associated with an AP, no 'Num of clients' line is printed for it at all.

I can't figure out the logic to check the next line to see if 'Num of clients' is present, and then go back to the current line. If it's present, it should proceed to the next line and grab the client number. If there is no 'Num of clients' line, I'm trying to set the clients to 0.

I have a file that consists of the following:

wireless-detail.txt

RUDY>show wireless ap detail on AP1 | include clients
RUDY>show wireless ap detail on AP2 | include clients
 Num of clients       : 8  
RUDY>show wireless ap detail on AP3 | include clients
 Num of clients       : 21
RUDY>show wireless ap detail on AP4 | include clients
RUDY>show wireless ap detail on AP5 | include clients
 Num of clients       : 2

Right now I have the following:

for line in file:   
    if "AP" in line:
        ap = re.search('AP[0-99]+', line)
        print ap.group(0),
    elif "Num of clients" in line:  
        clients = re.search('[0-99]+', line)
        print '- ' + clients.group(0)

It currently prints the following:

AP1 AP2 - 8
AP3 - 21
AP4 AP5 - 2
AP6 - 5
AP7 - 2
AP8 - 5
AP9 - 5

What is the best method to have it check the next line to see if the AP should be set to 0 clients or not?

Edit: FWIW - I was trying file.next() to read the next line, which seems to work, but I couldn't go back to the previous line :/

Edit #2: I wish I could upvote you all. Thanks everyone for this! Incredible how there are so many ways to do it and I couldn't figure out 1 of them!!!!

2012-04-03 21:19
by jgilmour

I'd suggest you query your wireless controller platform using one of the programmatic interfaces it probably has, e.g. SNMP or XML-RPC/SOAP. Then you could graph the association counts - MattH 2012-04-03 21:22

Hi Matt, I was originally trying to do it via SNMP, but the queries for association counts are listed via the controller, and I don't have IPs assigned to the APs. - jgilmour 2012-04-03 23:01

You're overthinking it. Don't use regular expressions. They're slow and buggy, and your pattern is very well defined. Do something like...

for line in file:
    if "AP" in line:
        i = line.find('AP')
        splitLine = line[i+2:].split('|')
        val = splitLine[0]
        print val,
    elif "Num of clients" in line:
        splitLine = line.split(':')
        num = splitLine[1]
        print '- ' + num

2012-04-03 21:28
by noa

This solution produces notably unreadable output, uses a built-in name (file) as a variable name, and offers poor reusability (extracted data can't be used after print). - hexparrot 2012-04-03 23:09

Here's a short method using regular expressions. Note the re.MULTILINE flag.

s='''RUDY>show wireless ap detail on AP1 | include clients
RUDY>show wireless ap detail on AP2 | include clients
 Num of clients       : 8  
RUDY>show wireless ap detail on AP3 | include clients
 Num of clients       : 21
RUDY>show wireless ap detail on AP4 | include clients
RUDY>show wireless ap detail on AP5 | include clients
 Num of clients       : 2'''
import re
print(re.findall(r'(AP\d+) \| include clients(?:$\n Num of clients {7}: (\d+))?', s, flags=re.M))

The (?:$\n Num of clients {7}: (\d+))? part is a non-capturing group, and the ? at the end makes it optional. If it doesn't match, the second capturing group will be empty, as it is for AP1 and AP4. The \d+ matches one or more digits, so multi-digit values such as 21 are captured whole.

The " {7}" means 7 spaces.

It prints this:

  [('AP1', ''), ('AP2', '8'), ('AP3', '21'), ('AP4', ''), ('AP5', '2')]
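If integers are more useful than strings, the tuples from findall can be folded into a dictionary. The following is only a sketch in Python 3: the names SAMPLE, PATTERN and client_counts are made up for illustration, and the pattern uses \s* in place of the fixed " {7}" on the assumption that the column padding might vary.

```python
import re

SAMPLE = """RUDY>show wireless ap detail on AP1 | include clients
RUDY>show wireless ap detail on AP2 | include clients
 Num of clients       : 8
RUDY>show wireless ap detail on AP3 | include clients
 Num of clients       : 21
RUDY>show wireless ap detail on AP4 | include clients
RUDY>show wireless ap detail on AP5 | include clients
 Num of clients       : 2"""

# Hypothetical pattern: the optional group only matches when the very next
# line is a "Num of clients" line; \s* tolerates any amount of padding.
PATTERN = re.compile(
    r"(AP\d+) \| include clients(?:\n\s*Num of clients\s*: (\d+))?"
)


def client_counts(text):
    """Map each AP name to its client count, using 0 when no count line follows."""
    # findall returns '' for a group that did not take part in the match,
    # so "count or 0" turns a missing count into 0 before int() is applied.
    return {ap: int(count or 0) for ap, count in PATTERN.findall(text)}


counts = client_counts(SAMPLE)
print(counts)
```

The "count or 0" step is what gives APs without a count line a value of 0 instead of an empty string.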

2012-04-03 22:29
by apple16

Awesome! Thank you so much - jgilmour 2012-04-03 23:01

If you want your data in a more structured format such as a dictionary, you can try this out:

with open('wireless-detail.txt', 'r') as fp:
    access_points = {}
    ap = None
    for line in fp:
        if 'AP' in line:
            ap = line[line.find('AP'):line.find('|')].strip()
            access_points[ap] = 0
        elif "Num of clients" in line:
            access_points[ap] = int(line.split(':')[1].strip())

print access_points

Returns:

{'AP2': 8, 'AP3': 21, 'AP1': 0, 'AP4': 0, 'AP5': 2}

I agree with previous solutions that re is an unnecessary complication for this sort of task, as your file's format is dependable. One benefit of this approach is that you also get information on known access points that have no connected users, e.g., AP1 = 0 (as an int, too!)
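To illustrate that benefit, here is a rough Python 3 sketch of the same idea wrapped in a function, so the result can be reused instead of only printed. The function name parse_ap_clients and the sample list are assumptions for the example; the extra "ap is not None" check is an addition that guards against a count line appearing before any AP line.

```python
def parse_ap_clients(lines):
    """Return {ap_name: client_count}; APs without a count line stay at 0."""
    access_points = {}
    ap = None
    for line in lines:
        if "AP" in line:
            # Slice from "AP" up to the pipe, e.g. "AP1 " -> "AP1".
            ap = line[line.find("AP"):line.find("|")].strip()
            access_points[ap] = 0
        elif "Num of clients" in line and ap is not None:
            access_points[ap] = int(line.split(":")[1].strip())
    return access_points


sample = [
    "RUDY>show wireless ap detail on AP1 | include clients\n",
    "RUDY>show wireless ap detail on AP2 | include clients\n",
    " Num of clients       : 8  \n",
    "RUDY>show wireless ap detail on AP3 | include clients\n",
    " Num of clients       : 21\n",
    "RUDY>show wireless ap detail on AP4 | include clients\n",
    "RUDY>show wireless ap detail on AP5 | include clients\n",
    " Num of clients       : 2\n",
]

counts = parse_ap_clients(sample)
# Because zero-client APs are kept, they can be listed directly.
idle = sorted(ap for ap, n in counts.items() if n == 0)
print(idle)
```

Passing an open file object instead of the list should work the same way, since the function only iterates over lines.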

2012-04-03 22:39
by hexparrot

If you are going to use re, you could do something like this. A flag tracks whether the current AP is still waiting for its count line. If the count line appears, it is used; otherwise '- 0' is printed for that AP.

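import re

need_count = False
for line in file:
    if "AP" in line:
        if need_count:
            print '- 0'  # the previous AP had no count line
        ap = re.search(r'AP\d+', line)
        print ap.group(0),  # trailing comma keeps the count on the same line
        need_count = True
    elif "Num of clients" in line:
        clients = re.search(r'\d+', line)
        print '- ' + clients.group(0)
        need_count = False
if need_count:
    print '- 0'  # the last AP in the file had no count line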

2012-04-03 21:31
by Furbeenator

This is probably overkill, but here goes...

I made a library with a few functions like this which come in handy often:

def fileLineGroups (f, count=2, truncate=True):
    lines = []
    for line in f:
        lines.append(line)
        if len(lines) == count:
            yield lines
            lines = lines[1:]
    if not truncate:
        if lines:
            yield lines

This will iterate over the lines of an open file handle, yielding groups of lines. It defaults to groups of 2, so it will return [line1, line2], then [line2, line3], etc. If you leave truncate on, it won't return the final group if it doesn't have count lines in it. This lets you do things like for a, b in fileLineGroups(f) without erroring out on the last one if it's an odd count.
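To make the sliding-window behaviour concrete, here is a small Python 3 sketch of the same generator run over a plain list instead of a file. The snake_case name file_line_groups and the sample letters are only for illustration.

```python
def file_line_groups(lines, count=2, truncate=True):
    """Yield overlapping groups of `count` consecutive items from `lines`."""
    group = []
    for line in lines:
        group.append(line)
        if len(group) == count:
            yield group
            # Drop the oldest item; slicing builds a new list, so the
            # group that was just yielded is not modified afterwards.
            group = group[1:]
    if not truncate and group:
        yield group  # leftover group shorter than `count`


pairs = list(file_line_groups(["a", "b", "c"]))
with_tail = list(file_line_groups(["a", "b", "c"], truncate=False))
print(pairs)
print(with_tail)
```

With truncate=False the final item shows up once more in a short group of its own, so unpacking with "for a, b in ..." would fail on that last group.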

Now you can do something like this:

import re
def getAPClientCounts (filepath):
    with open(filepath) as f:
        # fileLineGroups is the helper defined above
        for line1, line2 in fileLineGroups(f):
            match1 = re.search(r'AP\d+', line1)
            if match1:
                match2 = re.search(r'Num of clients.*: (\d+)', line2)
                if match2:
                    yield match1.group(), match2.groups()[0]

for ap, count in getAPClientCounts('wireless-detail.txt'):
    print 'AP with name %s has %s clients' % (ap, count)

AP with name AP2 has 8 clients
AP with name AP3 has 21 clients
AP with name AP5 has 2 clients
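The output above only lists APs that have a count line. If the APs without one should be reported as 0, as the question asks, one possible variant is to pair every line with the line after it and pad the end with an empty string. This is a Python 3 sketch; ap_client_counts and SAMPLE are hypothetical names, and it uses itertools.tee and zip_longest rather than the helper above.

```python
import re
from itertools import tee, zip_longest

SAMPLE = """RUDY>show wireless ap detail on AP1 | include clients
RUDY>show wireless ap detail on AP2 | include clients
 Num of clients       : 8
RUDY>show wireless ap detail on AP3 | include clients
 Num of clients       : 21
RUDY>show wireless ap detail on AP4 | include clients
RUDY>show wireless ap detail on AP5 | include clients
 Num of clients       : 2"""


def ap_client_counts(lines):
    """Yield (ap_name, count) for every AP line; count is 0 if no count line follows."""
    current, following = tee(lines)
    next(following, None)  # shift the second iterator one line ahead
    # fillvalue="" pairs the last line with an empty "next line".
    for line, next_line in zip_longest(current, following, fillvalue=""):
        ap = re.search(r"AP\d+", line)
        if not ap:
            continue
        count = re.search(r"Num of clients\s*:\s*(\d+)", next_line)
        yield ap.group(), (int(count.group(1)) if count else 0)


results = list(ap_client_counts(SAMPLE.splitlines()))
for ap, count in results:
    print("AP with name %s has %d clients" % (ap, count))
```

Because the lookahead happens on a second iterator, there is no need to "go back" a line in the file.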

2012-04-03 22:42
by Gary Fixler