How to check for end of tag in Python using minidom?


0

I'm trying to build an expression from an XML document. Reading from the top node, I want to push the nodes onto a stack one by one, and once I hit a closing tag I want to pop all the elements off the stack. How do I detect the end of a tag?

TIA,

John

Answer:

OK, I think I have a solution, using a recursive function like this:

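def findTextNodes(nodeList):
    """Walk a minidom node list depth-first, printing elements and text."""
    for subnode in nodeList:
        if subnode.nodeType == subnode.ELEMENT_NODE:
            # Reached the element: equivalent to seeing its start tag
            print("element node:", subnode.tagName)
            # Recurse to process the children first
            findTextNodes(subnode.childNodes)
            # All children are done: equivalent to seeing the end tag
            print("subnode return:", subnode.tagName)
        elif subnode.nodeType == subnode.TEXT_NODE:
            print("text node:", subnode.data)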

When 'subnode return' is printed, the recursive call for that element's children has finished, which is the point where the closing tag would be.
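The same idea can be combined with an explicit stack, as the question asked. This is only a minimal sketch: the `walk` function, the `events` list, and the sample XML string are made up for illustration and are not part of the original project.

```python
from xml.dom import minidom

def walk(node, stack, events):
    """Depth-first walk that records when each element opens and closes."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            stack.append(child.tagName)      # entering the element: push
            events.append(("start", child.tagName, list(stack)))
            walk(child, stack, events)       # visit all descendants
            stack.pop()                      # children done: pop ("end tag")
            events.append(("end", child.tagName, list(stack)))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            # Skip whitespace-only text nodes
            events.append(("text", child.data.strip(), list(stack)))

# Hypothetical sample document
doc = minidom.parseString("<a><b>hi</b><c/></a>")
events = []
walk(doc, [], events)
for event in events:
    print(event)
```

Each "end" entry is recorded right after the recursive call returns, so the stack snapshot it stores shows what remains open once that element has been closed.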

Thanks, everybody!

2012-04-03 19:59
by JohnX


1

minidom builds the whole DOM in memory, so it will not notify you when an end tag is encountered.

1) You could consider switching to http://docs.python.org/library/pyexpat.html and using xmlparser.EndElementHandler to watch for the end tag. You will also need StartElementHandler to build your stack.

2) Take advantage of the DOM tree that minidom produces: just select the nodes you need from it, without using a stack at all.
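Option 1 might look roughly like the following. This is a sketch only: the handler function names, the `closed` list, and the sample XML are hypothetical, while `ParserCreate`, `StartElementHandler`, `EndElementHandler`, and `Parse` come from the standard `xml.parsers.expat` module.

```python
import xml.parsers.expat

stack = []    # names of the elements that are currently open
closed = []   # (name, stack snapshot) recorded at each end tag

def start_element(name, attrs):
    # Called by expat for every start tag
    stack.append(name)

def end_element(name):
    # Called by expat for every end tag
    closed.append((name, list(stack)))
    stack.pop()

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.Parse("<a><b>hi</b><c/></a>", True)   # True marks the final chunk

print(closed)
```

Unlike minidom, expat reports events while it parses, so the end-tag callback fires at the exact point the question asks about.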

2012-04-03 20:09
by Anthony Kong
Hey Anthony, 1) unfortunately this is an inherited project, so I can't switch to another module for now. 2) The XML doesn't have a fixed structure; elements can repeat in any arrangement. - JohnX 2012-04-03 20:12
@JohnX If that is the case, you might want to check this one out: http://stackoverflow.com/questions/1596829/xml-parsing-with-python-and-minido - Anthony Kong 2012-04-03 20:38
Thanks, Anthony! I've edited my post to include a solution. - JohnX 2012-04-03 21:09


1

minidom builds a DOM. There aren't tags in a DOM, as the XML has been fully parsed into nodes. A node in the DOM represents the entire XML element.

It sounds like what you want is simply the node's children (or perhaps only the children of type ELEMENT_NODE).

Since you're talking about pushing them onto and popping them off of a stack, it sounds like you want them in the reverse of the order in which they appear in the document. In which case you probably want something like reversed([child for child in node.childNodes if child.nodeType == child.ELEMENT_NODE]).
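As a quick illustration of that expression, here is a small sketch; the sample document and the variable names are made up for the example.

```python
from xml.dom import minidom

# Hypothetical document: three child elements with a text node in between
doc = minidom.parseString("<root><x/>text<y/><z/></root>")
node = doc.documentElement

# Keep only the element children, in reverse document order
stack = list(reversed([child for child in node.childNodes
                       if child.nodeType == child.ELEMENT_NODE]))

names = [child.tagName for child in stack]
print(names)  # ['z', 'y', 'x']
```

The text node between `x` and `y` is filtered out by the `nodeType` check, so only elements end up in the list.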

If you want all children (including the node's children's children and so on) then a recursive solution is simplest.

2012-04-03 20:35
by kindall
Yeah, I was thinking it had something like libXml XMLELEMENTDECL, but it doesn't. Anyway, I think I've found a solution. Thanks, kindall! - JohnX 2012-04-03 21:09