I have a text file, which looks like this:
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
I want to grab nodes only if they are free. For that I have to make a regexp which will match node(..) only if the following line has state = free. Can you help me with this?
Edit:
Nothing works so far. Maybe that is because I'm not reading from a file, but from the output of a command:
proc = subprocess.Popen("pbsnodes", stdout=subprocess.PIPE)
listOfFreeNodes = proc.stdout.read()
Could that break the solutions somehow? Here's the full pbsnodes output:
node01
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node01 2.6.27.19-5-01,nusers=0,idletime=861913,totmem=16432576kb,availmem=16=free,jobs=,varattr=,rectime=1333641123
node02
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node02 2.6.27.19-5-nusers=2,idletime=5357510,totmem=16432576kb,availmem=1617ree,jobs=,varattr=,rectime=1333641107
node03
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node03 2.6.27.19-5-s=1,idletime=8564681,totmem=16432576kb,availmem=16029924kobs=60966.hpchead.linux,varattr=,rectime=1333641119
node04
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node04 2.6.27.19-5-01,nusers=0,idletime=8564678,totmem=16432576kb,availmem=1e=free,jobs=,varattr=,rectime=1333641124
node05
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node05 2.6.27.19-5-01,nusers=0,idletime=2072593,totmem=16432652kb,availmem=1=free,jobs=,varattr=,rectime=1333641091
node06
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node06 2.6.27.19-5-s=1,idletime=9038,totmem=16432576kb,availmem=16200960kb,p,varattr=,rectime=1333641096
node07
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node07 2.6.27.19-5-s=1,idletime=8564671,totmem=16432576kb,availmem=16173848kobs=,varattr=,rectime=1333641134
node08
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node08 2.6.27.19-5- 21356,nsessions=5,nusers=1,idletime=8564604,totmem=1643219260329746,state=free,jobs=,varattr=,rectime=1333641095
node09
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node09 2.6.27.19-5-01,nusers=0,idletime=8564648,totmem=16432552kb,availmem=1e=free,jobs=,varattr=,rectime=1333641126
node10
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node10 2.6.27.19-5-2,nsessions=5,nusers=1,idletime=6821493,totmem=16432552kb036941,state=free,jobs=,varattr=,rectime=1333641133
node11
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node11 2.6.27.19-5-01,nusers=0,idletime=8564599,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641120
node12
state = free
np = 8
properties = alpha,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node12 2.6.27.19-5-01,nusers=0,idletime=8564627,totmem=16432556kb,availmem=1e=free,jobs=,varattr=,rectime=1333641121
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-01,nusers=0,idletime=6839072,totmem=20506268kb,availmem=2e=free,jobs=,varattr=,rectime=1333641130
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
status = opsys=linux,uname=Linux node14 2.6.27.19-5-,nusers=1,idletime=8568052,totmem=24635060kb,availmem=206free,jobs=66481.hpchead.linux,varattr=,rectime=1333641132
node15
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
status = opsys=linux,uname=Linux node15 2.6.27.19-5-,nusers=1,idletime=8567636,totmem=24635012kb,availmem=212free,jobs=66482.hpchead.linux,varattr=,rectime=1333641092
node16
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66481.hpchead.linux, 1/66481.hpchead.linux,chead.linux, 6/66481.hpchead.linux, 7/66481.hpchead.linux
status = opsys=linux,uname=Linux node16 2.6.27.19-5-=1,idletime=8564418,totmem=24634928kb,availmem=20700104kbbs=66481.hpchead.linux,varattr=,rectime=1333641117
node17
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
jobs = 0/66482.hpchead.linux, 1/66482.hpchead.linux,chead.linux, 6/66482.hpchead.linux, 7/66482.hpchead.linux
status = opsys=linux,uname=Linux node17 2.6.27.19-5-s=1,idletime=6824915,totmem=24634928kb,availmem=20598068kbs=66482.hpchead.linux,varattr=,rectime=1333641113
node21
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66483.hpchead.linux, 1/66483.hpchead.linux,chead.linux, 6/66483.hpchead.linux, 7/66483.hpchead.linux.hpchead.linux
status = opsys=linux,uname=Linux node21 2.6.27.19-5-,nusers=1,idletime=8569176,totmem=26790348kb,availmem=203e=free,jobs=66483.hpchead.linux,varattr=,rectime=13336411
node22
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66475.hpchead.linux, 1/66475.hpchead.linux,chead.linux, 6/66475.hpchead.linux, 7/66475.hpchead.linux.hpchead.linux
status = opsys=linux,uname=Linux node22 2.6.27.19-5-users=1,idletime=8569178,totmem=26790348kb,availmem=21384free,jobs=66475.hpchead.linux,varattr=,rectime=1333641118
node23
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66484.hpchead.linux, 1/66484.hpchead.linux, 2/66484.hpchead.linux, 3/66484.hpchead.linux, 4/66484.hpchead.linux, 5/66484.hpchead.linux, 6/66484.hpchead.linux, 7/66484.hpchead.linux, 8/66484.hpchead.linux, 9/66484.hpchead.linux, 10/66484.hpchead.linux, 11/66484.hpchead.linux
status = opsys=linux,uname=Linux node23 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=10309 10370,nsessions=2,nusers=1,idletime=8569255,totmem=26790348kb,availmem=20165484kb,physmem=24685876kb,ncpus=12,loadave=12.01,gres=,netload=21742922098,state=free,jobs=66484.hpchead.linux,varattr=,rectime=1333641120
node24
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66485.hpchead.linux, 1/66485.hpchead.linux, 2/66485.hpchead.linux, 3/66485.hpchead.linux, 4/66485.hpchead.linux, 5/66485.hpchead.linux, 6/66485.hpchead.linux, 7/66485.hpchead.linux, 8/66485.hpchead.linux, 9/66485.hpchead.linux, 10/66485.hpchead.linux, 11/66485.hpchead.linux
status = opsys=linux,uname=Linux node24 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=11157 11218,nsessions=2,nusers=1,idletime=8569254,totmem=26790348kb,availmem=21489804kb,physmem=24685876kb,ncpus=12,loadave=12.05,gres=,netload=18486923435,state=free,jobs=66485.hpchead.linux,varattr=,rectime=1333641114
node25
state = job-exclusive
np = 12
properties = blade
ntype = cluster
jobs = 0/66469.hpchead.linux, 1/66469.hpchead.linux, 2/66469.hpchead.linux, 3/66469.hpchead.linux, 4/66469.hpchead.linux, 5/66469.hpchead.linux, 6/66469.hpchead.linux, 7/66469.hpchead.linux, 8/66469.hpchead.linux, 9/66469.hpchead.linux, 10/66469.hpchead.linux, 11/66469.hpchead.linux
status = opsys=linux,uname=Linux node25 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=6711 6772,nsessions=2,nusers=1,idletime=8569282,totmem=26790348kb,availmem=21082316kb,physmem=24685876kb,ncpus=12,loadave=12.00,gres=,netload=15199518313,state=free,jobs=66469.hpchead.linux,varattr=,rectime=1333641095
Edit:
Thanks to all those who have answered.
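Regarding the first edit above: reading the text from a pipe rather than a file should not, by itself, change how the approaches behave, since either way the result is one block of text. One thing that may differ is the type: on Python 3, proc.stdout.read() returns bytes, and a str regex pattern will not accept bytes. Below is a minimal sketch of decoding the output first. The name run_and_decode is a hypothetical helper, and the child process is only a stand-in for pbsnodes, which is assumed to be unavailable outside the cluster.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run cmd (a list of arguments) and return its stdout as text.

    Hypothetical helper, not part of the original post.
    """
    raw = subprocess.check_output(cmd)      # bytes on Python 3
    return raw.decode("utf-8", "replace")   # text from here on


# Stand-in for ["pbsnodes"]: a child Python process printing two lines.
stand_in = [sys.executable, "-c", "print('node01'); print('     state = free')"]
text = run_and_decode(stand_in)
lines = text.splitlines()
print(lines)
```

On the cluster, the stand-in list would presumably be replaced by ["pbsnodes"], and text can then be handed to any of the approaches in the answers.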
This should return the correct node value(s):
    r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)'
This uses a positive lookahead to peek past the end of the line without capturing anything it finds. It only matches the node value.
    import re
    # s is assumed to hold the pbsnodes text shown in the question
    l = re.findall(r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)', s)
    print(l)
    # prints ['node13'] for the two-node sample
Edit: Inspired by a comment from @hexparrot, I realized there is a simpler way. The regex r'node\d+(?=\s*state\s*=\s*free)' is simpler and also works, even though it does not explicitly search for a newline (since \s includes EOL characters). However, it does not guarantee that state=free will be found on the following line, as stated in the OP's requirements: it would also match node99 state=free on a single line. So explicitly searching for the \n better meets the OP's requirements.
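To make the difference between the two patterns concrete, here is a small sketch that runs both on a made-up sample. The node99 line, with the name and the state on the same line, is an assumption added only to show the edge case described above; it does not appear in the real pbsnodes output.

```python
import re

sample = (
    "node13\n"
    "     state = free\n"
    "     np = 8\n"
    "node14\n"
    "     state = job-exclusive\n"
    "node99 state = free\n"   # made-up line: name and state on one line
)

# Requires a newline between the node name and the state line.
strict = re.findall(r'node\d+(?=[^\n]*\n\s*state\s*=\s*free)', sample)

# Lets \s* absorb either a newline or plain spaces.
loose = re.findall(r'node\d+(?=\s*state\s*=\s*free)', sample)

print(strict)  # ['node13']
print(loose)   # ['node13', 'node99']
```

Only the pattern with the explicit \n rejects the same-line case.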
['node0'] - Adobe 2012-04-05 15:34
[^\n]+ since there won't necessarily be any non-newline characters before the newline - machine yearning 2012-04-05 15:40
['node0', 'node0', 'node0', 'node0', 'node0', 'node0', 'node0', 'node0', 'node0', 'node1', 'node1', 'node1', 'node1'] - so it can't see the second figure - Adobe 2012-04-05 15:43
r at r'regexp' before that. Probably that was the mistake (of mine) - Adobe 2012-04-05 15:58
^(node[\d]+).*?(\bstate\b) ?= ?free - hexparrot 2012-04-05 17:18
\n in the regex is because the requirement called for finding a state==foo on the following line. Unless I'm mistaken (always possible), the only way to fulfill that requirement is to look for the \n. Now the horse is officially dead! Cheer - alan 2012-04-08 00:03
Regex is sometimes heavier than necessary if you can depend on your generated file being consistently constructed (that is, following the same format you've shown).
Thus, here's an approach that uses simple iteration:
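    with open('yourfile.txt', 'r') as fp:
        node_dict = {}
        node = None
        for line in fp:
            if line[0:4] == 'node':
                # A line starting with "node" opens a new record.
                node = line.strip()
                node_dict[node] = 0
            elif line.strip().startswith('state'):
                # Only the "state = ..." line; the status line also contains "state=".
                node_dict[node] = line.split('=')[1].strip()
    print(node_dict)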
Returns
{'node13': 'free', 'node14': 'job-exclusive'}
It's then very easy to get just the 'free' nodes:
>>> print([k for k, v in node_dict.items() if v == 'free'])
['node13']
node_dict["free"][0] gets you the first free node - kindall 2012-04-05 18:26
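One way to read the comment above (an assumption, since only its tail survives here) is that the dictionary could be keyed by state rather than by node, so that looking up "free" gives the list of free nodes directly. A minimal sketch of that idea follows; the name nodes_by_state and the three-node sample text are illustrative, not from the original posts.

```python
from collections import defaultdict

# Shortened sample in the same layout as the pbsnodes output.
text = """node13
     state = free
     np = 8
node14
     state = job-exclusive
     np = 8
node15
     state = free
"""

nodes_by_state = defaultdict(list)   # state -> list of node names
node = None
for line in text.splitlines():
    stripped = line.strip()
    if stripped.startswith("node"):
        node = stripped
    elif node and stripped.startswith("state"):
        # Take everything after the first "=" as the state value.
        state = stripped.split("=", 1)[1].strip()
        nodes_by_state[state].append(node)

print(nodes_by_state["free"])      # ['node13', 'node15']
print(nodes_by_state["free"][0])   # node13
```

Indexing with [0] raises IndexError when no node is free, so a check for an empty list is probably worth adding in real use.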
I'd suggest parsing the text into a Python structure first and then manipulating that structure. Regular expressions are too complicated and too fragile for this job. Consider:
    doc = """
    node13
    state = free
    np = 8
    properties = beta,eightcores
    ntype = cluster
    status = opsys=linux,uname=Linux node13 2.6.27.19-5-default etc
    node14
    state = job-exclusive
    np = 8
    properties = beta,eightcores
    ntype = cluster
    """
    data = {}
    lastkey = None
    for line in map(str.strip, doc.splitlines()):
        if ' = ' in line and lastkey:
            # "key = value" line: attach it to the current node.
            k, v = line.split(' = ', 1)
            data[lastkey][k] = v
        elif len(line):
            # Any other non-empty line starts a new node record.
            lastkey = line
            data[lastkey] = {}
This creates a dictionary like this:
{'node13': {'np': '8',
'ntype': 'cluster',
'properties': 'beta,eightcores',
'state': 'free',
'status': 'opsys=linux,uname=Linux node13 2.6.27.19-5-default etc'},
'node14': {'np': '8',
'ntype': 'cluster',
'properties': 'beta,eightcores',
'state': 'job-exclusive'}}
which you can manipulate in the normal Python way, for example to collect the names of the free nodes:
    free_nodes = [k for k, v in data.items() if v.get('state') == 'free']
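Once the text is in a nested dictionary, other questions about the cluster become ordinary dictionary queries. The sketch below assumes a dictionary shaped like the one shown above (trimmed to a few keys, with a node01 entry taken from the full output) and uses a hypothetical helper, free_nodes_with, to pick free nodes that carry a given property.

```python
# Same shape as the parsed structure, trimmed to the keys used here.
data = {
    'node13': {'np': '8', 'properties': 'beta,eightcores', 'state': 'free'},
    'node14': {'np': '8', 'properties': 'beta,eightcores', 'state': 'job-exclusive'},
    'node01': {'np': '8', 'properties': 'alpha,eightcores', 'state': 'free'},
}


def free_nodes_with(data, prop):
    """Return sorted names of free nodes whose properties include prop.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name for name, attrs in data.items()
        if attrs.get('state') == 'free'
        and prop in attrs.get('properties', '').split(',')
    )


beta_free = free_nodes_with(data, 'beta')
free_cores = sum(int(a['np']) for a in data.values() if a.get('state') == 'free')

print(beta_free)    # ['node13']
print(free_cores)   # 16
```

Using .get() keeps the query from failing on a record that happens to lack a state or properties line.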
You can use the re.DOTALL flag so that . matches everything, including newlines. Here is a sample:
>>> st="""
node13
state = free
np = 8
properties = beta,eightcores
ntype = cluster
status = opsys=linux,uname=Linux node13 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64,sessions=? 15201,nsessions=? 01,nusers=0,idletime=6837317,totmem=20506268kb,availmem=20259728kb,physmem=20506268kb,ncpus=8,loadave=0.00,gres=,netload=17130666575,se=free,jobs=,varattr=,rectime=1333639375
node14
state = job-exclusive
np = 8
properties = beta,eightcores
ntype = cluster
"""
>>> re.findall(r"(node\d+).*?state.*?free", st, re.DOTALL)
['node13']
Please note, this can also be done without regex
>>> stlines=st.splitlines()
>>> [stlines[i] for i in range(0, len(stlines) - 1) if stlines[i+1].partition("=")[-1].strip() == 'free']
['node13']
>>>
Note: if you need a more robust regex, as Francis has shown in his example, you can use the one below:
>>> re.findall(r"(node\d+).*?state[ ]*=[ ]*free", st, re.DOTALL)
['node13']
>>>
state.*?free match the string "statedsjc3(*@N(*RNWNWNSD*S*(Y#N(F#*(DFM#(#N(#$($#(#$N(#(*free" - machine yearning 2012-04-05 15:38
.* matches his string with DOTALL on. Your regex is incorrect because it matches a superset of the desired strings - machine yearning 2012-04-05 15:42
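The superset concern in the comments can be reproduced on input shaped like the full pbsnodes output, where the status line of a busy node also contains state=free. The sketch below uses a shortened, made-up status line for brevity and compares both DOTALL patterns with the next-line lookahead pattern from another answer.

```python
import re

# Two records; each status line carries "state=free", as in the full output.
st = (
    "node13\n"
    "     state = free\n"
    "     status = opsys=linux,state=free,jobs=\n"
    "node14\n"
    "     state = job-exclusive\n"
    "     status = opsys=linux,state=free,jobs=66481\n"
)

lazy = re.findall(r"(node\d+).*?state.*?free", st, re.DOTALL)
stricter = re.findall(r"(node\d+).*?state[ ]*=[ ]*free", st, re.DOTALL)
next_line = re.findall(r"node\d+(?=[^\n]*\n\s*state\s*=\s*free)", st)

print(lazy)       # ['node13', 'node14']
print(stricter)   # ['node13', 'node14']
print(next_line)  # ['node13']
```

With DOTALL, .*? is free to run past the job-exclusive line until it reaches the state=free inside the status line, so node14 is reported as free by both DOTALL patterns.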
I agree with @thg435 that regex is too powerful for this job. I'd prefer a really simple solution:
    lines = data.split('\n')
    num_lines = len(lines)
    [lines[i] for i in range(num_lines - 1) if 'state = free' in lines[i+1]]
This really captures the essence of what you want to do: if the next line (lines[i+1]) contains the desired text, the current line (presumably the name of the node) goes into the list.
free ... what about the left-hand side - machine yearning 2012-04-05 16:24
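If the comment above is read as asking for the key to be checked as well as the value (an assumption, since the comment is incomplete here), the same next-line idea can be written by pairing each line with its successor and splitting on the equals sign. This is a sketch on a shortened sample, not the answerer's code.

```python
data = """node13
     state = free
     np = 8
node14
     state = job-exclusive
     np = 8
"""

lines = data.split('\n')
free = []
# zip(lines, lines[1:]) yields (current line, following line) pairs.
for current, following in zip(lines, lines[1:]):
    key, _, value = following.partition('=')
    # Check both sides of the "=" so that only a "state" key can match.
    if key.strip() == 'state' and value.strip() == 'free':
        free.append(current.strip())

print(free)  # ['node13']
```

Comparing the stripped key and value also tolerates different amounts of whitespace around the equals sign.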
It's often easier to look backward than to look ahead. So don't think about getting the current line when the next line contains something; you want to get the previous line when the current line contains something. Framed in these terms, it is easy to conceive and implement:
    def find_free_node(doc):
        prevline = ""
        for line in doc.splitlines():
            # Look back: report the previous line when this one says "state = free".
            if line.strip() == "state = free" and prevline.startswith("node"):
                return prevline.strip()
            prevline = line
Another way is to keep track of what node you're in rather than what the previous line was. This will work even if the state = free line doesn't immediately follow the node name line.
    def find_free_node(doc):
        node = ""
        for line in doc.splitlines():
            if line.startswith("node"):
                # Remember which node record we are currently inside.
                node = line.strip()
            elif line.strip() == "state = free" and node:
                return node
To me, these are a lot clearer than multiline-regex-based solutions.
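Both functions above return only the first free node. Since the question asks for all free nodes, a generator variant of the second function may be closer to what is needed. The sketch below keeps the same logic and yields instead of returning; iter_free_nodes is a hypothetical name, and the sample text is shortened and partly made up to show a state line that does not directly follow the node name.

```python
def iter_free_nodes(doc):
    """Yield the name of every node whose record has a 'state = free' line."""
    node = ""
    for line in doc.splitlines():
        stripped = line.strip()
        if line.startswith("node"):
            node = stripped            # entering a new node record
        elif stripped == "state = free" and node:
            yield node


doc = """node13
     state = free
     np = 8
node14
     state = job-exclusive
node15
     np = 8
     state = free
"""

free = list(iter_free_nodes(doc))
print(free)  # ['node13', 'node15']
```

Because the comparison is against the whole stripped line, the state=free fragment inside a status line does not trigger a match.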
state=free - machine yearning 2012-04-05 15:30