Pyparsing: multiple lines, optional data missing in result set
I am quite a new pyparsing user, and I have a missing match that I don't understand.
Here is the text I would like to parse:
    polraw = """
    set policy id 800 "untrust" "trust" "ip_10.124.10.6" "mip(10.0.2.175)" "tcp_1002" permit
    set policy id 800
    set dst-address "mip(10.0.2.188)"
    set service "tcp_1002-1005"
    set log session-init
    exit
    set policy id 724 "trust" "untrust" "ip_10.16.14.28" "ip_10.24.10.6" "tcp_1002" permit
    set policy id 724
    set src-address "ip_10.162.14.38"
    set dst-address "ip_10.3.28.38"
    set service "tcp_1002-1005"
    set log session-init
    exit
    set policy id 233 name "the name 527 ;" "untrust" "trust" "ip_10.24.108.6" "mip(10.0.2.149)" "tcp_1002" permit
    set policy id 233
    set service "tcp_1002-1005"
    set service "tcp_1006-1008"
    set service "tcp_1786"
    set log session-init
    exit
    """
I set up the grammar this way:

    from pyparsing import (Suppress, Keyword, Regex, dblQuotedString,
                           LineEnd, Optional, ZeroOrMore)

    kpol = Suppress(Keyword('set policy id'))
    num = Regex(r'\d+')
    ksvc = Suppress(Keyword('set service'))
    ksrc = Suppress(Keyword('set src-address'))
    kdst = Suppress(Keyword('set dst-address'))
    svc = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    addr = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    exit = Suppress(Keyword('exit'))
    eol = LineEnd().suppress()

    # one expression per statement type
    p_svc = ksvc + svc + eol
    p_src = ksrc + addr + eol
    p_dst = kdst + addr + eol

    x = kpol + num('pid') + eol + Optional(ZeroOrMore(p_svc)) + Optional(ZeroOrMore(p_src)) + Optional(ZeroOrMore(p_dst))

    for z in x.searchString(polraw):
        print(z)

The result set looks like this:

    ['800', 'mip(10.0.2.188)']
    ['724', 'ip_10.162.14.38', 'ip_10.3.28.38']
    ['233', 'tcp_1002-1005', 'tcp_1006-1008', 'tcp_1786']

Policy 800 is missing its service tag???
What's wrong here?
Thanks in advance,
Laurent
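The problem you are seeing is that, in your expression, dst's are only looked for after having skipped over the optional svc's and src's. In policy 800 the dst-address line comes before the service line, so by the time the dst is matched the parser is already past the point where it would accept a svc. You have a couple of options; I'll go through each so you can get a sense of what is going on here.
(But first, there is no point in writing "Optional(ZeroOrMore(anything))": ZeroOrMore already implies Optional, so I'm going to drop the Optional part in all of these choices.)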
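To see the ordering effect in isolation, here is a minimal sketch. The one-letter tokens `a` and `b` are made up for illustration and stand in for the svc and dst statements; they are not part of the grammar in the question.

```python
from pyparsing import Literal, ZeroOrMore

# Hypothetical stand-ins for two statement types.
a = Literal("a")
b = Literal("b")

# Same shape as the grammar in the question: all a's first, then all b's.
sequential = ZeroOrMore(a) + ZeroOrMore(b)

# When the input follows the expected order, everything is collected.
in_order = sequential.parseString("a a b").asList()

# When a "b" comes first, the a-phase matches nothing, the b-phase takes
# the "b", and the trailing "a" is left unparsed.
out_of_order = sequential.parseString("b a").asList()

print(in_order)
print(out_of_order)
```

The second parse silently drops the trailing token, which is the same thing that happens to the service line of policy 800.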
If you are going to get svc's, src's, and dst's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:

    x = kpol + num('pid') + eol + ZeroOrMore(p_svc|p_src|p_dst)

This will allow you to intermix the different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
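As a sketch of how this plays out on the data from the question, the block below applies the single ZeroOrMore to the first two policies. It assumes a shortened copy of the sample text, uses `removeQuotes` on a copy of `dblQuotedString` instead of the lambda, and the name `quoted` is introduced here only for illustration.

```python
from pyparsing import (Suppress, Keyword, Regex, dblQuotedString, LineEnd,
                       ZeroOrMore, removeQuotes)

# Shortened copy of the sample configuration (policies 800 and 724 only).
polraw = """\
set policy id 800 "untrust" "trust" "ip_10.124.10.6" "mip(10.0.2.175)" "tcp_1002" permit
set policy id 800
set dst-address "mip(10.0.2.188)"
set service "tcp_1002-1005"
set log session-init
exit
set policy id 724 "trust" "untrust" "ip_10.16.14.28" "ip_10.24.10.6" "tcp_1002" permit
set policy id 724
set src-address "ip_10.162.14.38"
set dst-address "ip_10.3.28.38"
set service "tcp_1002-1005"
set log session-init
exit
"""

kpol = Suppress(Keyword('set policy id'))
num = Regex(r'\d+')
ksvc = Suppress(Keyword('set service'))
ksrc = Suppress(Keyword('set src-address'))
kdst = Suppress(Keyword('set dst-address'))
# copy() so the shared dblQuotedString object is left untouched
quoted = dblQuotedString.copy().setParseAction(removeQuotes)
eol = LineEnd().suppress()

p_svc = ksvc + quoted + eol
p_src = ksrc + quoted + eol
p_dst = kdst + quoted + eol

# One repetition that accepts any statement type, in any order.
x = kpol + num('pid') + eol + ZeroOrMore(p_svc | p_src | p_dst)

results = [z.asList() for z in x.searchString(polraw)]
for row in results:
    print(row)
```

With the alternation in place, the service line that follows the dst-address line is picked up for both policies.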
If you want to keep these different types of statements in groups, then you can add a results name to each:

    x = kpol + num('pid') + eol + ZeroOrMore(p_svc("svc*") | p_src("src*") | p_dst("dst*"))

Note the trailing '*' on each name: this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types get collected into the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens and the results names and their values, so you can see how this works.
Parsing this input:

    set policy id 233
    set service "tcp_1002-1005"
    set dst-address "ip_10.3.28.38"
    set service "tcp_1006-1008"
    set service "tcp_1786"
    set log session-init
    exit

shows this for z.dump():

    ['233', 'tcp_1002-1005', 'ip_10.3.28.38', 'tcp_1006-1008', 'tcp_1786']
    - pid: 233
    - dst: [['ip_10.3.28.38']]
    - svc: [['tcp_1002-1005'], ['tcp_1006-1008'], ['tcp_1786']]

If you wrap ungroup on the p_xxx expressions, maybe like this:

    p_svc, p_src, p_dst = (ungroup(expr) for expr in (p_svc, p_src, p_dst))

then the output is even cleaner-looking:

    ['233', 'tcp_1002-1005', 'ip_10.3.28.38', 'tcp_1006-1008', 'tcp_1786']
    - pid: 233
    - dst: ['ip_10.3.28.38']
    - svc: ['tcp_1002-1005', 'tcp_1006-1008', 'tcp_1786']

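Putting the named, ungrouped version together, a sketch along these lines shows how the collected values can be read back by name. It assumes the intermixed policy 233 input shown above; the variable names `sample`, `quoted`, `services`, `destinations`, and `has_sources` are introduced here for illustration.

```python
from pyparsing import (Suppress, Keyword, Regex, dblQuotedString, LineEnd,
                       ZeroOrMore, removeQuotes, ungroup)

# Intermixed statements for a single policy.
sample = """\
set policy id 233
set service "tcp_1002-1005"
set dst-address "ip_10.3.28.38"
set service "tcp_1006-1008"
set service "tcp_1786"
set log session-init
exit
"""

kpol = Suppress(Keyword('set policy id'))
num = Regex(r'\d+')
ksvc = Suppress(Keyword('set service'))
ksrc = Suppress(Keyword('set src-address'))
kdst = Suppress(Keyword('set dst-address'))
quoted = dblQuotedString.copy().setParseAction(removeQuotes)
eol = LineEnd().suppress()

p_svc = ksvc + quoted + eol
p_src = ksrc + quoted + eol
p_dst = kdst + quoted + eol

# Undo the implicit grouping so each named result is a flat list of strings.
p_svc, p_src, p_dst = (ungroup(expr) for expr in (p_svc, p_src, p_dst))

# The trailing '*' keeps every match under the name, not just the last one.
x = kpol + num('pid') + eol + ZeroOrMore(p_svc("svc*") | p_src("src*") | p_dst("dst*"))

z = x.parseString(sample)
print(z.dump())

services = z['svc'].asList()
destinations = z['dst'].asList()
has_sources = 'src' in z   # no src-address lines in this sample
```

Checking `'src' in z` before indexing avoids a KeyError for statement types that never appeared in a given policy.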
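This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are A, B, C, and D. To accept these in any order, you could write something like OneOrMore(A|B|C|D), but this would also accept multiple A's, or A, B, and C, but not D. The exhaustive/exhausting combinatorial explosion of (A+B+C+D) | (A+B+D+C) | etc. could be written, or you could maybe automate it with something like

    from itertools import permutations
    from pyparsing import MatchFirst, And

    # one And per ordering of the four expressions
    mixNmatch = MatchFirst(And(p) for p in permutations((A,B,C,D),4))
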
But there is a class in pyparsing called Each that allows you to write the same kind of thing:

    Each([A,B,C,D])

meaning "must have one each of A, B, C, and D, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:

    A & B & C & D

which means the same thing.
If you want "must have A, B, and C, and optionally D", then write:

    A & B & C & Optional(D)

and this will parse with the same kind of behavior, looking for A, B, C, and D, regardless of the incoming order, and whether D is last or mixed in with A, B, and C. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
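Here is a small sketch of that behavior. The keywords alpha, beta, gamma, and delta are hypothetical stand-ins for A, B, C, and D, chosen only for illustration.

```python
from pyparsing import Keyword, Optional, ParseException

# Hypothetical keywords standing in for A, B, C and D.
A, B, C, D = map(Keyword, "alpha beta gamma delta".split())

# A, B and C are required; D may appear anywhere or not at all.
any_order = A & B & C & Optional(D)

# D mixed in among the required expressions.
with_d = any_order.parseString("gamma alpha delta beta").asList()

# D left out entirely.
without_d = any_order.parseString("beta gamma alpha").asList()

# A required expression (C) is missing, so the parse is rejected.
try:
    any_order.parseString("alpha beta")
    missing_required_fails = False
except ParseException:
    missing_required_fails = True

print(with_d, without_d, missing_required_fails)
```

Unlike OneOrMore(A|B|C|D), this rejects input that leaves out one of the required expressions.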
So you could write your expression as:

    x = kpol + num('pid') + eol + (ZeroOrMore(p_svc) & ZeroOrMore(p_src) & ZeroOrMore(p_dst))

I looked at using results names with this expression, and the ZeroOrMore's seem to be confusing things; maybe there is still a bug in how this is done. So you may have to reserve using Each for more basic cases like my A,B,C,D example. But I wanted to make you aware of it.
Some other notes on your parser:

    dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))

is probably better written as dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:

    from pyparsing import dblQuotedString, removeQuotes

    dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \ , ending quote \'
    # removes the leading and trailing "s, but also the internal ones,
    # which are part of the quoted string

    dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \" , ending quote \'
    # removes the leading and trailing "s and leaves the internal one,
    # but also strips off the escaped ending quote

    dblQuotedString.setParseAction(removeQuotes)
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \" , ending quote \"'
    # removes the leading and trailing " characters, leaves escaped "s in place

    kpol = Suppress(Keyword('set policy id'))

is a bit fragile, and will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kinds of expressions by first defining all the keywords individually:

    SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword, "set policy id service src-address dst-address exit".split())

and then define the separate expressions using:

    ksvc = Suppress(SET + SERVICE)
    ksrc = Suppress(SET + SRC_ADDRESS)
    kdst = Suppress(SET + DST_ADDRESS)

Now your parser will cleanly handle extra whitespace (or even comments!) between the individual keywords in your expressions.
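A quick sketch of the difference, using a made-up input line with extra spaces between the two keywords. The names `fragile`, `robust`, `quoted`, and `line` are introduced here for illustration only.

```python
from pyparsing import (Keyword, Suppress, dblQuotedString, removeQuotes,
                       ParseException)

SET, SERVICE = map(Keyword, "set service".split())

fragile = Suppress(Keyword("set service"))   # one keyword containing a space
robust = Suppress(SET + SERVICE)             # two keywords in sequence
quoted = dblQuotedString.copy().setParseAction(removeQuotes)

# Hypothetical input with several spaces between 'set' and 'service'.
line = 'set    service "tcp_1002"'

robust_result = (robust + quoted).parseString(line).asList()

try:
    (fragile + quoted).parseString(line)
    fragile_matched = True
except ParseException:
    fragile_matched = False

print(robust_result, fragile_matched)
```

The two-keyword form matches because whitespace is skipped between the individual expressions, while the single Keyword requires the exact literal text, including its single space.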