Programmatically clean/ignore namespaces in XML - python
I'm trying to write a simple program to read financial XML files from GnuCash, and to learn Python in the process.
The XML looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<gnc-v2 xmlns:gnc="http://www.gnucash.org/xml/gnc"
        xmlns:act="http://www.gnucash.org/xml/act"
        xmlns:book="http://www.gnucash.org/xml/book"
        {...}
        xmlns:vendor="http://www.gnucash.org/xml/vendor">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
    <gnc:count-data cd:type="account">80</gnc:count-data>
    <gnc:count-data cd:type="transaction">826</gnc:count-data>
    <gnc:count-data cd:type="budget">1</gnc:count-data>
    <gnc:commodity version="2.0.0">
      <cmdty:space>iso4217</cmdty:space>
      <cmdty:id>brl</cmdty:id>
      <cmdty:get_quotes/>
      <cmdty:quote_source>currency</cmdty:quote_source>
      <cmdty:quote_tz/>
    </gnc:commodity>
Right now, I'm able to iterate and get results using
import xml.etree.ElementTree as ET
r = ET.parse("file.xml").findall('.//')
after manually cleaning the namespaces. I'm looking for a solution that either reads the entries regardless of their namespaces or removes the namespaces before parsing.
Note that I'm a complete noob in Python, and I've read "Python and GnuCash: Extract data from GnuCash files", "Cleaning an XML file in Python before parsing" and "python: xml.etree.ElementTree, removing "namespaces"", along with the ElementTree docs, and I'm still lost...
I've come up with this solution:
def strip_namespaces(self, tree):
    nspopen = re.compile("<\w*:", re.IGNORECASE)
    nspclose = re.compile("<\/\w*:", re.IGNORECASE)
    for i in tree:
        start = re.sub(nspopen, '<', tree.tag)
        end = re.sub(nspopen, '<\/', tree.tag)
    # pprint(finaltree)
    return
But I'm failing to apply it. I can't seem to be able to retrieve the tag names as they appear in the file.
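For what it's worth, one likely reason the regex approach finds nothing is that ElementTree does not keep the `prefix:` form after parsing. It stores each namespaced tag as `{namespace-uri}localname`, so `elem.tag` never contains `<gnc:`. Below is a minimal sketch of stripping that `{uri}` part after parsing. It is one possible approach, not necessarily the best one. `SAMPLE` is a hypothetical, cut-down version of the file above, made well-formed for the example. The `cd` namespace URI in it is an assumption, since that declaration is elided in the snippet.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the GnuCash snippet above. The "cd" URI is
# an assumption: its declaration is hidden behind {...} in the original.
SAMPLE = """
<gnc-v2 xmlns:gnc="http://www.gnucash.org/xml/gnc"
        xmlns:book="http://www.gnucash.org/xml/book"
        xmlns:cd="http://www.gnucash.org/xml/cd">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
    <gnc:count-data cd:type="account">80</gnc:count-data>
  </gnc:book>
</gnc-v2>
"""


def strip_namespaces(root):
    """Remove the '{uri}' part from every tag and attribute name, in place."""
    for elem in root.iter():
        # After parsing, a namespaced tag looks like '{uri}localname'.
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
        # Attribute names are stored the same way, e.g. '{uri}type'.
        # Iterate over a copy of the keys because the dict is modified.
        for name in list(elem.attrib):
            if "}" in name:
                elem.attrib[name.split("}", 1)[1]] = elem.attrib.pop(name)
    return root


root = strip_namespaces(ET.fromstring(SAMPLE))
for elem in root.iter():
    print(elem.tag, elem.attrib, (elem.text or "").strip())
```

For a real file, the root would come from `ET.parse("file.xml").getroot()` instead of `ET.fromstring`. Two caveats: stripping can merge names that were only distinct because of their namespace (for example a plain `type` attribute and a `cd:type` attribute on the same element), and the modified tree no longer matches the GnuCash schema if it is written back out. If the goal is only to search, Python 3.8 and later also accept a `{*}` wildcard in paths, such as `root.findall('.//{*}count-data')`, which leaves the tree untouched.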