pyparsing multiple lines optional missing data in result set

I am quite a new pyparsing user, and I have a missing match that I don't understand.

Here is the text to parse:

    polraw="""
    set policy id 800 "untrust" "trust"  "ip_10.124.10.6" "mip(10.0.2.175)" "tcp_1002" permit
    set policy id 800
    set dst-address "mip(10.0.2.188)"
    set service "tcp_1002-1005"
    set log session-init
    exit
    set policy id 724 "trust" "untrust"  "ip_10.16.14.28" "ip_10.24.10.6" "tcp_1002" permit
    set policy id 724
    set src-address "ip_10.162.14.38"
    set dst-address "ip_10.3.28.38"
    set service "tcp_1002-1005"
    set log session-init
    exit
    set policy id 233 name "the name 527 ;" "untrust" "trust"  "ip_10.24.108.6" "mip(10.0.2.149)" "tcp_1002" permit
    set policy id 233
    set service "tcp_1002-1005"
    set service "tcp_1006-1008"
    set service "tcp_1786"
    set log session-init
    exit

    """

I set up the grammar this way:

    from pyparsing import (Suppress, Keyword, Regex, dblQuotedString,
                           LineEnd, Optional, ZeroOrMore)

    kpol  = Suppress(Keyword('set policy id'))
    num   = Regex(r'\d+')
    ksvc  = Suppress(Keyword('set service'))
    ksrc  = Suppress(Keyword('set src-address'))
    kdst  = Suppress(Keyword('set dst-address'))
    svc   = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    addr  = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    exit  = Suppress(Keyword('exit'))
    eol   = LineEnd().suppress()

    p_svc = ksvc + svc + eol
    p_src = ksrc + addr + eol
    p_dst = kdst + addr + eol

    x = kpol + num('pid') + eol + Optional(ZeroOrMore(p_svc)) + Optional(ZeroOrMore(p_src)) + Optional(ZeroOrMore(p_dst))

    for z in x.searchString(polraw):
        print(z)

The result set looks like this:

    ['800', 'mip(10.0.2.188)']
    ['724', 'ip_10.162.14.38', 'ip_10.3.28.38']
    ['233', 'tcp_1002-1005', 'tcp_1006-1008', 'tcp_1786']

Policy 800 is missing its service tag, and policy 724 is missing its service as well.

What's wrong here?

Thanks in advance,
Laurent

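The problem you are seeing is that, in your expression, dst's are only looked for after having skipped over the optional svc's and src's. In policy 800 the dst-address line comes before the service line, so once the dst has been matched the expression is complete and the service is never looked at. You have a couple of options, and I'll go through each so you can get a sense of what is going on here.

(But first, there is no point in writing "Optional(ZeroOrMore(anything))" - ZeroOrMore already implies Optional, so I'm going to drop the Optional part in all of these choices.)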
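A toy illustration of both points, as a minimal sketch. The single-letter literals `s`, `r`, and `d` are hypothetical stand-ins for the svc, src, and dst statements; they are not part of the question's grammar.

```python
from pyparsing import Literal, Optional, ZeroOrMore

# Hypothetical stand-ins for the svc, src and dst statements.
s, r, d = Literal('s'), Literal('r'), Literal('d')

wrapped = Optional(ZeroOrMore(s)) + Optional(ZeroOrMore(r)) + Optional(ZeroOrMore(d))
plain = ZeroOrMore(s) + ZeroOrMore(r) + ZeroOrMore(d)

# Input in the expected order: both forms consume everything.
in_order_wrapped = wrapped.parseString('s r d d').asList()
in_order_plain = plain.parseString('s r d d').asList()

# A 'd' ahead of an 's': the s and r slots match nothing, the d slot takes
# 'd', and the expression is finished before the trailing 's' is looked at.
out_of_order = plain.parseString('d s').asList()

print(in_order_wrapped, in_order_plain, out_of_order)
```

The last call mirrors what happens to policy 800: the statement that arrives "too late" for its slot is simply left unparsed.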

If you are going to get svc's, src's, and dst's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:

    x = kpol + num('pid') + eol + ZeroOrMore(p_svc|p_src|p_dst)

This will allow you to intermix the different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
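Here is a self-contained sketch of that alternation applied to policy 800 from the question. The `quoted` helper (a copy of dblQuotedString with removeQuotes attached) and the `sample` string are my own shorthand for this example, not names from the original grammar.

```python
from pyparsing import (Suppress, Keyword, Regex, dblQuotedString, LineEnd,
                       ZeroOrMore, removeQuotes)

sample = '''set policy id 800
set dst-address "mip(10.0.2.188)"
set service "tcp_1002-1005"
set log session-init
exit
'''

eol = LineEnd().suppress()
num = Regex(r'\d+')
# copy() keeps the module-level dblQuotedString object unmodified
quoted = dblQuotedString.copy().setParseAction(removeQuotes)

p_svc = Suppress(Keyword('set service')) + quoted + eol
p_src = Suppress(Keyword('set src-address')) + quoted + eol
p_dst = Suppress(Keyword('set dst-address')) + quoted + eol

# Any mix of the three statement types, in any order
x = (Suppress(Keyword('set policy id')) + num('pid') + eol
     + ZeroOrMore(p_svc | p_src | p_dst))

results = [z.asList() for z in x.searchString(sample)]
print(results)
```

With the alternation in place, the service line that follows the dst-address line is picked up as well.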

If you want to keep these different types of statements in groups, then you can add a results name to each:

    x = kpol + num('pid') + eol + ZeroOrMore(p_svc("svc*")|
                                             p_src("src*")|
                                             p_dst("dst*"))

Note the trailing '*' on each name - this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types get collected under the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens along with the results names and their values, so you can see how this works. For example, this input:

    set policy id 233
    set service "tcp_1002-1005"
    set dst-address "ip_10.3.28.38"
    set service "tcp_1006-1008"
    set service "tcp_1786"
    set log session-init
    exit

shows this for z.dump():

    ['233', 'tcp_1002-1005', 'ip_10.3.28.38', 'tcp_1006-1008', 'tcp_1786']
    - pid: 233
    - dst: [['ip_10.3.28.38']]
    - svc: [['tcp_1002-1005'], ['tcp_1006-1008'], ['tcp_1786']]

If you wrap ungroup around the p_xxx expressions, maybe like this:

    p_svc,p_src,p_dst = (ungroup(expr) for expr in (p_svc,p_src,p_dst))

then the output is even cleaner-looking:

    ['233', 'tcp_1002-1005', 'ip_10.3.28.38', 'tcp_1006-1008', 'tcp_1786']
    - pid: 233
    - dst: ['ip_10.3.28.38']
    - svc: ['tcp_1002-1005', 'tcp_1006-1008', 'tcp_1786']
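For reading the grouped values back out in code, here is a rough sketch. Note that it is a variation on the above, not the same thing: it attaches the '*' names to the quoted-string token inside each statement rather than to the whole p_xxx expression, which is my own assumption about a simple way to get flat lists. The `quoted` helper and `sample` string are likewise made up for this example.

```python
from pyparsing import (Suppress, Keyword, Regex, dblQuotedString, LineEnd,
                       ZeroOrMore, removeQuotes)

sample = '''set policy id 233
set service "tcp_1002-1005"
set dst-address "ip_10.3.28.38"
set service "tcp_1006-1008"
set service "tcp_1786"
set log session-init
exit
'''

eol = LineEnd().suppress()
num = Regex(r'\d+')
quoted = dblQuotedString.copy().setParseAction(removeQuotes)

# The trailing '*' turns on listAllMatches, so repeated matches accumulate.
p_svc = Suppress(Keyword('set service')) + quoted('svc*') + eol
p_src = Suppress(Keyword('set src-address')) + quoted('src*') + eol
p_dst = Suppress(Keyword('set dst-address')) + quoted('dst*') + eol

x = (Suppress(Keyword('set policy id')) + num('pid') + eol
     + ZeroOrMore(p_svc | p_src | p_dst))

z = x.searchString(sample)[0]
pid = z['pid']
services = list(z['svc'])
destinations = list(z['dst'])
has_sources = 'src' in z   # no src-address line in this policy

print(pid, services, destinations, has_sources)
```

Checking `'src' in z` before indexing avoids a KeyError for policies that have no statement of that type.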

This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are a, b, c, and d. To accept these in any order, you could write something like OneOrMore(a|b|c|d), but this would also accept multiple a's, or a, b, and c, but no d. The exhaustive/exhausting combinatorial explosion of (a+b+c+d) | (a+b+d+c) | etc. could be written out, or you could maybe automate it with something like

    from itertools import permutations
    mixNmatch = MatchFirst(And(p) for p in permutations((a,b,c,d),4))

But there is a class in pyparsing called Each that allows you to write the same kind of thing:

    Each([a,b,c,d])

meaning "must have one each of a, b, c, and d, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:

    a & b & c & d

which means same thing.

If you want "must have a, b, and c, and optionally d", then write:

    a & b & c & Optional(d)

and this will parse with the same kind of behavior, looking for a, b, c, and d, regardless of the incoming order, and whether d is last or mixed in with a, b, and c. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
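A small sketch of that behavior, assuming a, b, c, and d are plain one-letter keywords (hypothetical placeholders, as in the discussion above):

```python
from pyparsing import Keyword, Optional, ParseException

# Hypothetical one-letter keywords standing in for real sub-expressions.
a, b, c, d = map(Keyword, "a b c d".split())
any_order = a & b & c & Optional(d)

# Required pieces in a shuffled order, without and with the optional d.
first = any_order.parseString("c a b").asList()
second = any_order.parseString("b d a c").asList()

# Leaving out a required piece (c) is rejected.
try:
    any_order.parseString("a b")
    accepted_without_c = True
except ParseException:
    accepted_without_c = False

print(first, second, accepted_without_c)
```

The tokens come back in the order they appeared in the input, not in the order the sub-expressions were listed.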

So you could write your expression as:

    x = kpol + num('pid') + eol + (ZeroOrMore(p_svc) &
                                    ZeroOrMore(p_src) &
                                    ZeroOrMore(p_dst))

I looked at using results names in this expression, and the ZeroOrMore's seem to be confusing things, so there may still be a bug in how this is done. You may have to reserve using Each for more basic cases like my a,b,c,d example. But I wanted to make you aware of it.

Some other notes on your parser:

dblQuotedString.setParseAction(lambda t: t[0].replace('"','')) is probably better written dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:

    from pyparsing import dblQuotedString, removeQuotes

    dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \ , ending quote \'
    # removed the leading and trailing "s, but the internal ones too,
    # which are really part of the quoted string

    dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \" , ending quote \'
    # removed the leading and trailing "s, and leaves the one internal one, but
    # strips off the escaped ending quote

    dblQuotedString.setParseAction(removeQuotes)
    print(dblQuotedString.parseString(r'"this embedded quote \" , ending quote \""')[0])
    # prints 'this embedded quote \" , ending quote \"'
    # removes just the leading and trailing " characters, leaves the escaped "s in place

kpol = Suppress(Keyword('set policy id')) is a bit fragile, and will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kinds of expressions by first defining all the keywords individually (upper-case names avoid shadowing the Python builtins set, id, and exit):

    SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword,
        "set policy id service src-address dst-address exit".split())

and then define the separate expressions using:

    kpol  = Suppress(SET + POLICY + ID)
    ksvc  = Suppress(SET + SERVICE)
    ksrc  = Suppress(SET + SRC_ADDRESS)
    kdst  = Suppress(SET + DST_ADDRESS)

Now your parser will cleanly handle extra whitespace (or even comments!) between the individual keywords in your expressions.
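A quick comparison of the two styles on a line with extra spaces between the keywords. This is only a sketch; the `fragile`, `robust`, and `quoted` names are made up for this example.

```python
from pyparsing import (Keyword, Suppress, dblQuotedString, removeQuotes,
                       ParseException)

SET, SERVICE = map(Keyword, "set service".split())
quoted = dblQuotedString.copy().setParseAction(removeQuotes)

fragile = Suppress(Keyword('set service')) + quoted   # one multi-word keyword
robust = Suppress(SET + SERVICE) + quoted             # separate keywords

line = 'set    service "tcp_1002-1005"'   # extra spaces between the keywords

try:
    fragile.parseString(line)
    fragile_ok = True
except ParseException:
    fragile_ok = False

robust_tokens = robust.parseString(line).asList()
print(fragile_ok, robust_tokens)
```

The multi-word keyword only matches the exact text 'set service', while the separate keywords let pyparsing skip whatever whitespace sits between them.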
