Python - `pyparsing`: iterating over `ParseResults`
I started using pyparsing this evening, and I've built a complex grammar that describes the sources I'm working with very effectively. It was easy and powerful. However, I'm having trouble working with `ParseResults`. I need to be able to iterate over the nested tokens in the order they're found, and I'm finding it a little frustrating. I've abstracted my problem to a simple case:
```python
# -*- coding: utf-8 -*-
import pyparsing as pp

# The trailing '*' in a results name keeps every match under that name (listAllMatches=True).
word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')

test_string = 'lorem ipsum “dolor sit” amet, consectetur.'

r = sentence.parseString(test_string)
print r.asXML('div')
print ''
for name, item in r.sentence.items():
    print name, item
print ''
for item in r.sentence:
    print item.getName(), item.asList()
```

As far as I can see, this ought to work? Here is the output:
```
<div>
  <sentence>
    <word>lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>

word ['lorem', 'ipsum', 'amet,', 'consectetur.']
direct_speech [['dolor', 'sit']]

Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    print item.getName(), item.asList()
AttributeError: 'str' object has no attribute 'getName'
```

The XML output seems to indicate that the string has been parsed exactly as I wish, but I can't iterate over the sentence, for example, to reconstruct it.
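A minimal sketch of how the in-order iteration can work, assuming Python 3 and a pyparsing version that still provides the camelCase `parseString`. Plain words come back as `str`, while the grouped direct speech comes back as a `pp.ParseResults`, so an `isinstance` check (rather than `getName()`, which only `ParseResults` has) is enough to rebuild the sentence in order. The `reconstruct` helper is hypothetical, written only for this illustration, and the results names are dropped because the walk does not need them.

```python
import pyparsing as pp

# Same grammar as the question, minus the results names (the walk below does not use them).
word = pp.Word(pp.alphas + ',.')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word)) + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))


def reconstruct(tokens):
    """Hypothetical helper: rebuild the text from the parsed tokens, in order."""
    parts = []
    for item in tokens:
        if isinstance(item, pp.ParseResults):
            # A Group: this is the direct speech, so put the quotes back.
            parts.append('“' + ' '.join(item) + '”')
        else:
            # A plain str token: a single word.
            parts.append(item)
    return ' '.join(parts)


r = sentence.parseString('lorem ipsum “dolor sit” amet, consectetur.')
print(reconstruct(r[0]))  # lorem ipsum “dolor sit” amet, consectetur.
```

`r[0]` is the outer `Group`, so iterating over it visits the tokens in the order they appeared in the input.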
Is there a way to do what I need to?
Thanks!
Edit:

I've been using this:
```python
for item in r.sentence:
    if isinstance(item, basestring):
        print item
    else:
        print item.getName(), item
```

But that doesn't help me much, because I can't distinguish between different types of string. Here is a slightly expanded example:
```python
# -*- coding: utf-8 -*-
import pyparsing as pp

word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')

test_string = 'lorem 14 ipsum “dolor 22 sit” amet, consectetur.'

r = sentence.parseString(test_string)
for i, item in enumerate(r.sentence):
    if isinstance(item, basestring):
        print i, item
    else:
        print i, item.getName(), item
```

The output is:
```
0 lorem
1 14
2 ipsum
3 word ['dolor', '22', 'sit']
4 amet,
5 consectetur.
```

Not very helpful. I can't distinguish between `word` and `number`, and why is the `direct_speech` element labelled `word`?!
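One way to tell words and numbers apart is to make them different Python types. The sketch below assumes Python 3 and assumes numbers are plain integers, so it uses `pp.Word(pp.nums)` instead of the question's `pp.Word(pp.nums + ',.')` and attaches a parse action that converts each match with `int()`. Every item in the sentence is then a `str`, an `int`, or a `pp.ParseResults` (the grouped direct speech), and the loop can branch on `isinstance` without relying on `getName()`. This is an illustrative alternative, not the approach taken in the thread.

```python
import pyparsing as pp

# Assumption for this sketch: numbers are plain integers, so int() can convert them.
word = pp.Word(pp.alphas + ',.')
number = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number)) + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))

r = sentence.parseString('lorem 14 ipsum “dolor 22 sit” amet, consectetur.')

for i, item in enumerate(r[0]):
    if isinstance(item, int):
        print(i, 'number', item)
    elif isinstance(item, str):
        print(i, 'word', item)
    else:  # a ParseResults produced by Group: the direct speech
        print(i, 'direct_speech', item.asList())
```

Converting in a parse action keeps the grammar as the single place that decides what a token is; the loop only inspects the resulting types.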
I'm missing something. What I want to do is:
```
for item in r.sentence:
    if (item is a number):
        ...
    elif (item is a word):
        ...
    else:
        etc. ...
```

Should I be approaching this differently?
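A sketch of one way to get exactly that `if`/`elif` dispatch, assuming Python 3.7+ for `dataclasses`: give each rule a parse action that wraps its match in a small tagged object, so every item carries the name of the rule that produced it. `Token` and its `kind`/`value` fields are hypothetical names invented for this example, not part of pyparsing; `setParseAction` is pyparsing's hook for replacing matched tokens with whatever the callback returns.

```python
from dataclasses import dataclass
from typing import Any

import pyparsing as pp


@dataclass
class Token:
    """Hypothetical tagged token: `kind` records which grammar rule matched."""
    kind: str
    value: Any


# Each parse action replaces the matched text with a single Token object.
word = pp.Word(pp.alphas + ',.').setParseAction(lambda t: Token('word', t[0]))
number = pp.Word(pp.nums + ',.').setParseAction(lambda t: Token('number', t[0]))

# The quotes are suppressed, so t holds only the inner Token objects.
direct_speech = (
    pp.Suppress('“') + pp.OneOrMore(word | number) + pp.Suppress('”')
).setParseAction(lambda t: Token('direct_speech', list(t)))

sentence = pp.OneOrMore(word | number | direct_speech)

tokens = sentence.parseString('lorem 14 ipsum “dolor 22 sit” amet, consectetur.')

for item in tokens:
    if item.kind == 'number':
        print('number:', item.value)
    elif item.kind == 'word':
        print('word:', item.value)
    else:  # 'direct_speech': value is a list of inner Token objects
        print('direct speech:', [inner.value for inner in item.value])
```

Because the tag is attached when the rule matches, it does not depend on results names at all, which sidesteps the confusing `getName()` output above.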
`r.sentence` contains a mix of strings and `ParseResults`, and only `ParseResults` support `getName()`. Have you tried just iterating over `r.sentence`? If I print it out using `asList()`, I get:
```
['lorem', 'ipsum', ['dolor', 'sit'], 'amet,', 'consectetur.']
```

Or this snippet:
```python
for item in r.sentence:
    print type(item), item.asList() if isinstance(item, pp.ParseResults) else item
```

gives:
```
<type 'str'> lorem
<type 'str'> ipsum
<class 'pyparsing.ParseResults'> ['dolor', 'sit']
<type 'str'> amet,
<type 'str'> consectetur.
```

I'm not sure I've answered your question, but does that shed some light on where to go next?
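The same type check extends to deeper nesting with a small recursive generator that walks the results in document order and reports how deep each token sits. This is a sketch assuming Python 3; `walk` is a hypothetical helper that relies only on `ParseResults` being iterable and on `isinstance` checks.

```python
import pyparsing as pp

word = pp.Word(pp.alphas + ',.')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word)) + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))


def walk(results, depth=0):
    """Hypothetical helper: yield (depth, token) pairs in the order they were parsed."""
    for item in results:
        if isinstance(item, pp.ParseResults):
            # Descend into a Group one level deeper.
            yield from walk(item, depth + 1)
        else:
            yield depth, item


r = sentence.parseString('lorem ipsum “dolor sit” amet, consectetur.')
for depth, token in walk(r[0]):
    # Indent each token by its nesting depth.
    print('  ' * depth + token)
```

Since the generator never calls `getName()`, it works the same whether or not results names are attached to the grammar.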
(Welcome to pyparsing!)