I'm parsing the source of a website and I'm using this regex:
/page\.php\?id\=([0-9]*)\"\>(.*)\<\/a\>\<\/span\>/.match(self.agent.page.content)
self.agent.page.content contains the source of the page fetched by Mechanize. The regex basically works, but on the second match it captures more than it should. There is more than one </a></span> in the source, and the regex runs on to the last one, so I get a bunch of unwanted HTML. How can I tell the regex to use the first </a></span> as the "end marker"?
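.* is greedy: it consumes as much as it can and then backtracks only far enough to match the rest, so the capture ends at the last </a></span> on the line. .*? is non-greedy (lazy): it consumes as little as possible, so it stops at the first </a></span>. Try: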
/page\.php\?id\=([0-9]*)\"\>(.*?)\<\/a\>\<\/span\>/.match(self.agent.page.content)
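A minimal sketch of the difference, using the two patterns from this thread on a hypothetical HTML string that stands in for self.agent.page.content (the sample markup is an assumption, not the real page). The greedy capture swallows everything up to the last </a></span>; the lazy one stops at the first. String#scan is shown as an extra, in case you want every link rather than only the first match.

```ruby
# Hypothetical sample markup standing in for self.agent.page.content
html = '<span><a href="page.php?id=1">First</a></span> <span><a href="page.php?id=2">Second</a></span>'

greedy = /page\.php\?id\=([0-9]*)\"\>(.*)\<\/a\>\<\/span\>/   # original pattern
lazy   = /page\.php\?id\=([0-9]*)\"\>(.*?)\<\/a\>\<\/span\>/  # non-greedy fix

greedy_match = greedy.match(html)
lazy_match   = lazy.match(html)

# Greedy: the capture runs on to the LAST </a></span>
puts greedy_match[2]  # prints First</a></span> <span><a href="page.php?id=2">Second

# Lazy: the capture stops at the FIRST </a></span>
puts lazy_match[2]    # prints First

# scan returns the capture groups of every non-overlapping match
p html.scan(lazy)     # prints [["1", "First"], ["2", "Second"]]
```

Note that . does not match a newline unless you add the /m flag, so a greedy .* can only overrun as far as the end of the current line.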