Programmatically clean/ignore namespaces in XML - Python


I'm trying to write a simple program to read financial XML files from GnuCash, and to learn Python in the process.

The XML looks like this:

    <?xml version="1.0" encoding="utf-8" ?>
    <gnc-v2
         xmlns:gnc="http://www.gnucash.org/xml/gnc"
         xmlns:act="http://www.gnucash.org/xml/act"
         xmlns:book="http://www.gnucash.org/xml/book"
         {...}
         xmlns:vendor="http://www.gnucash.org/xml/vendor">
    <gnc:count-data cd:type="book">1</gnc:count-data>
    <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
    <gnc:count-data cd:type="account">80</gnc:count-data>
    <gnc:count-data cd:type="transaction">826</gnc:count-data>
    <gnc:count-data cd:type="budget">1</gnc:count-data>
    <gnc:commodity version="2.0.0">
      <cmdty:space>iso4217</cmdty:space>
      <cmdty:id>brl</cmdty:id>
      <cmdty:get_quotes/>
      <cmdty:quote_source>currency</cmdty:quote_source>
      <cmdty:quote_tz/>
    </gnc:commodity>
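For context, ElementTree does not keep prefixes such as `gnc:` in element tags: it expands each prefixed name into `{namespace-URI}local-name`, so the prefixes never show up in `.tag`. The prefixes as written are still reported by `ET.iterparse` through its `start-ns` event, so they can be mapped back if needed. A minimal sketch on a hand-trimmed, closed-off fragment of the file above (only the `gnc` and `book` namespaces, whose URIs are shown). `prefixed_name` is a hypothetical helper, and it assumes each URI is bound to a single prefix, as when every namespace is declared on the root element:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed fragment of the GnuCash file, closed so that it is well-formed XML.
SAMPLE = b"""<gnc-v2 xmlns:gnc="http://www.gnucash.org/xml/gnc"
        xmlns:book="http://www.gnucash.org/xml/book">
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
  </gnc:book>
</gnc-v2>"""


def prefixed_name(tag, uri_to_prefix):
    """Hypothetical helper: map '{uri}local' back to 'prefix:local'."""
    if not tag.startswith("{"):
        return tag  # element in no namespace, e.g. the gnc-v2 root
    uri, local = tag[1:].split("}", 1)
    prefix = uri_to_prefix.get(uri, "")
    return f"{prefix}:{local}" if prefix else local


uri_to_prefix = {}
expanded = []    # tags as ElementTree stores them
as_written = []  # tags as they appear in the file

# iterparse takes a filename or file object; BytesIO stands in for "file.xml".
for event, item in ET.iterparse(io.BytesIO(SAMPLE), events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = item  # e.g. ("gnc", "http://www.gnucash.org/xml/gnc")
        uri_to_prefix[uri] = prefix
    else:
        expanded.append(item.tag)
        as_written.append(prefixed_name(item.tag, uri_to_prefix))

print(expanded)
print(as_written)
```

The `start-ns` events for an element arrive before its `start` event, which is why the mapping is already filled in when the element itself is seen.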

Right now, I'm able to iterate over the entries and get results with the following, but only after manually cleaning the namespaces out of the file:

    import xml.etree.ElementTree as ET

    # every element below the root
    r = ET.parse("file.xml").findall('.//')

I'm looking for a solution that either reads the entries regardless of namespace or removes the namespaces before parsing.
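For reading the entries regardless of namespace, ElementTree's `find`/`findall` accept a prefix-to-URI dictionary as a second argument, and since Python 3.8 they also accept `{*}` as a namespace wildcard. A minimal sketch on a trimmed fragment of the file; the `ns` dictionary only lists the two URIs visible in the question, so the other GnuCash prefixes (declared in the elided `{...}` part of the header) would need to be added the same way. Note that `{*}id` matches any element whose local name is `id`, so in the full file it would also match `cmdty:id`:

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the GnuCash file, closed so that it is well-formed XML.
SAMPLE = """<gnc-v2 xmlns:gnc="http://www.gnucash.org/xml/gnc"
        xmlns:book="http://www.gnucash.org/xml/book">
  <gnc:book version="2.0.0">
    <book:id type="guid">91314601aa6afd17727c44657419974a</book:id>
  </gnc:book>
</gnc-v2>"""

root = ET.fromstring(SAMPLE)  # for the real file: ET.parse("file.xml").getroot()

# Option 1: tell find() which URI each prefix in the path stands for.
ns = {
    "gnc": "http://www.gnucash.org/xml/gnc",
    "book": "http://www.gnucash.org/xml/book",
}
book_id = root.find(".//book:id", ns).text

# Option 2 (Python 3.8+): match the local name in any namespace.
book_id_any = root.find(".//{*}id").text

print(book_id, book_id_any)
```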

Note that I'm a complete noob in Python. I've read "Python and GnuCash: Extract data from GnuCash files", "Cleaning an XML file in Python before parsing" and "python: xml.etree.ElementTree, removing 'namespaces'", along with the ElementTree docs, but I'm still lost...

I've come up with this solution:

    import re

    def strip_namespaces(self, tree):
        # match an opening or closing tag prefix such as "<gnc:" or "</gnc:"
        nspopen = re.compile(r"<\w*:", re.IGNORECASE)
        nspclose = re.compile(r"</\w*:", re.IGNORECASE)
        for elem in tree:
            start = re.sub(nspopen, '<', elem.tag)
            end = re.sub(nspclose, '</', elem.tag)
        # pprint(finaltree)
        return

But I'm failing to apply it: I can't seem to retrieve the tag names as they appear in the file.
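The patterns above look for `<gnc:`-style prefixes, but once a file has been parsed, ElementTree stores each tag as `{namespace-URI}local-name` (for example `{http://www.gnucash.org/xml/gnc}book`), so there is no prefix left to match. Cutting everything up to the closing `}` after parsing achieves the intended cleanup. A minimal sketch: `strip_ns` is a hypothetical helper, the `cd` namespace URI in the sample is an assumption (its real declaration is in the elided part of the header), and the approach assumes you don't need to distinguish elements that share a local name across namespaces, since that information is dropped:

```python
import xml.etree.ElementTree as ET


def strip_ns(root):
    """Hypothetical helper: drop '{uri}' from every tag and attribute name, in place."""
    for elem in root.iter():
        # Comments/processing instructions (if kept by the parser) have a non-string tag.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        # Prefixed attributes such as cd:type are expanded the same way.
        for name in list(elem.attrib):
            if name.startswith("{"):
                elem.attrib[name.split("}", 1)[1]] = elem.attrib.pop(name)
    return root


# Trimmed fragment; the cd namespace URI here is assumed for illustration.
SAMPLE = """<gnc-v2 xmlns:gnc="http://www.gnucash.org/xml/gnc"
        xmlns:cd="http://www.gnucash.org/xml/cd">
  <gnc:count-data cd:type="book">1</gnc:count-data>
</gnc-v2>"""

# For the real file: root = strip_ns(ET.parse("file.xml").getroot())
root = strip_ns(ET.fromstring(SAMPLE))
count = root.find("count-data")
print(count.get("type"), count.text)  # book 1
```

After `strip_ns`, plain paths such as `root.findall(".//count-data")` work without a namespace map.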

