How to remove overlap in numeric ranges (AWK)

July 15, 2012

I'm trying to remove overlap within a file.

There's a bunch of records that start with 'a' and have a 'start-value' and an 'end-value'.
There's also a bunch of records that start with 'b'; each has a range that shows a possible overlap with the records starting with 'a'. The idea is to remove the overlapping range from the 'a' records so that only non-overlapping ranges exist.

Some of the records in b have the same 'start-value' as a record in a, while others have the same 'end-value'. So, if a has a range of 0 - 100 and b has a range of 0 - 32, the expected output is: a 33 - 100, b 0 - 32.

Although I have a lot of files that need to undergo this operation, the individual files are small.
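Since there are many small files, one way to handle them is a plain shell loop that runs the answer's awk program once per file. This is a sketch under stated assumptions: the scratch directory, the *.txt glob, the .trimmed output suffix and the two sample files are hypothetical, and the program is the answer's script with NR, OFS and END written in upper case (awk is case-sensitive).

```shell
# Work in a scratch directory so the glob only sees the sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two hypothetical sample inputs, one per file.
printf 'a 0 100\nb 0 32\n' > sample1.txt
printf 'a 1200 1300\nb 1250 1300\n' > sample2.txt

# Apply the same per-file program to every *.txt file;
# sample1.txt -> sample1.trimmed, and so on. NR restarts for each file
# because every file gets its own awk process.
for f in *.txt; do
    awk -v OFS="\t" '
        /^a/ { s[NR] = $2; e[NR] = $3; l = NR }
        /^b/ {
            for (i = 1; i <= l; i++) {
                if (s[i] == $2)      { s[i] = $3 + 1; break }
                else if (e[i] == $3) { e[i] = $2 - 1; break }
            }
            s[NR] = $2; e[NR] = $3
        }
        END { for (i = 1; i <= NR; i++) print ((i <= l) ? "a" : "b"), s[i], e[i] }
    ' "$f" > "${f%.txt}.trimmed"
done
```

Writing each result to a separate file keeps the originals untouched, which makes it easy to diff a few outputs before replacing anything.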

This is an example file:

    a   0       100
    a   101     160
    a   200     300
    a   500     1100
    a   1200    1300
    a   1301    1340
    a   1810    2000
    b   0       32
    b   500     540
    b   1250    1300
    b   1319    1340
    b   1920    2000

Expected output for this sample:

    a   33      100
    a   101     160
    a   200     300
    a   541     1100
    a   1200    1249
    a   1301    1318
    a   1810    1919
    b   0       32
    b   500     540
    b   1250    1300
    b   1319    1340
    b   1920    2000

Thanks for the help!

OK, since the OP confirmed that "b 501 540" was a typo, I'll post my answer :)

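    awk -v OFS="\t" '
    /^a/ { s[NR] = $2; e[NR] = $3; l = NR }   # remember each a range; l = last a line
    /^b/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2) {                   # same start: move the a start past the b end
                s[i] = $3 + 1
                break
            } else if (e[i] == $3) {            # same end: move the a end before the b start
                e[i] = $2 - 1
                break
            }
        }
        s[NR] = $2; e[NR] = $3                  # keep the b range unchanged
    }
    END { for (i = 1; i <= NR; i++) print ((i <= l) ? "a" : "b"), s[i], e[i] }
    ' file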
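The script above relies on every a record appearing before the first b record: l is the line number of the last a record, and only lines 1..l are searched. If that ordering is not guaranteed, a common awk idiom is to read the file twice and use NR == FNR to tell the passes apart. This is a variant, not the answer's code: it collects the b ranges first, then trims each a range in place. Unlike the original loop it has no break, so one b range could trim more than one a range. The file name mixed.txt and its contents are hypothetical.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical input where a b record comes before the a record it trims.
cat > mixed.txt <<'EOF'
b 0 32
a 0 100
a 1200 1300
b 1250 1300
EOF

# The file is named twice: pass 1 (NR == FNR) only collects the b ranges,
# pass 2 trims each a range against them and prints every line in order.
awk -v OFS="\t" '
    NR == FNR { if ($1 == "b") { bs[++n] = $2; be[n] = $3 }; next }
    $1 == "a" {
        for (j = 1; j <= n; j++) {
            if (bs[j] == $2)      $2 = be[j] + 1   # same start
            else if (be[j] == $3) $3 = bs[j] - 1   # same end
        }
    }
    { print $1, $2, $3 }
' mixed.txt mixed.txt > mixed.trimmed
```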

Test with the file (typo fixed):

    kent$  awk -v OFS="\t" '/^a/{s[NR]=$2;e[NR]=$3;l=NR} /^b/{for(i=1;i<=l;i++){if(s[i]==$2){s[i]=$3+1;break}else if(e[i]==$3){e[i]=$2-1;break}} s[NR]=$2;e[NR]=$3} END{for(i=1;i<=NR;i++)print ((i<=l)?"a":"b"),s[i],e[i]}' file
    a       33      100
    a       101     160
    a       200     300
    a       541     1100
    a       1200    1249
    a       1301    1318
    a       1810    1919
    b       0       32
    b       500     540
    b       1250    1300
    b       1319    1340
    b       1920    2000
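A typo like "b 501 540" is silently ignored by the script, because that b range shares no endpoint with any a range, so the a range it was meant to trim stays unchanged. A small pre-check can list such records before trimming. This sketch compares against the original, untrimmed a ranges; the message text, check.txt and unmatched.txt are made up for illustration.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# "b 501 540" mimics the typo: it matches neither the start nor the end of any a range.
cat > check.txt <<'EOF'
a 0 100
a 500 1100
b 0 32
b 501 540
EOF

# Report each b record whose start does not equal the start of some a record
# and whose end does not equal the end of some a record.
awk '
    /^a/ { s[NR] = $2; e[NR] = $3; l = NR }
    /^b/ {
        hit = 0
        for (i = 1; i <= l; i++)
            if (s[i] == $2 || e[i] == $3) { hit = 1; break }
        if (!hit) print "no matching a range: line " NR ": " $0
    }
' check.txt > unmatched.txt
```

An empty unmatched.txt means every b record will trim something.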

EDIT for 6 columns:

Quick and dirty; please check the example below:

file:

    kent$  cat file
    a   0       100  1 2 3
    a   101     160  4 5 6
    a   200     300  7 8 9
    a   500     1100 10 11 12
    a   1200    1300 13 14 15
    a   1301    1340 16 17 18
    a   1810    2000 19 20 21
    b   0       32   22 23 24
    b   500     540  22 23 24
    b   1250    1300 22 23 24
    b   1319    1340 22 23 24
    b   1920    2000 22 23 24

awk:

    kent$  awk -v OFS="\t" '{s[NR]=$2;e[NR]=$3} /^a/{l=NR} /^b/{for(i=1;i<=l;i++){if(s[i]==$2){s[i]=$3+1;break}else if(e[i]==$3){e[i]=$2-1;break}}} {r[NR]=$4 OFS $5 OFS $6} END{for(i=1;i<=NR;i++)print ((i<=l)?"a":"b"),s[i],e[i],r[i]}' file
    a       33      100     1       2       3
    a       101     160     4       5       6
    a       200     300     7       8       9
    a       541     1100    10      11      12
    a       1200    1249    13      14      15
    a       1301    1318    16      17      18
    a       1810    1919    19      20      21
    b       0       32      22      23      24
    b       500     540     22      23      24
    b       1250    1300    22      23      24
    b       1319    1340    22      23      24
    b       1920    2000    22      23      24
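The 6-column version hard-codes fields 4 to 6 in r[NR]. If the number of trailing columns varies, one option is to store each whole line and rewrite only fields 2 and 3 when printing; assigning to a field makes awk rebuild $0 with OFS. This is a sketch with a hypothetical input file, wide.txt, whose first line deliberately carries an extra seventh column.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > wide.txt <<'EOF'
a 0 100 1 2 3 x
a 1200 1300 13 14 15
b 0 32 22 23 24
b 1250 1300 22 23 24
EOF

# Keep each whole line, trim the a ranges as before, then edit fields 2 and 3 in place.
awk -v OFS="\t" '
    { line[NR] = $0 }
    /^a/ { s[NR] = $2; e[NR] = $3; l = NR }
    /^b/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2)      { s[i] = $3 + 1; break }
            else if (e[i] == $3) { e[i] = $2 - 1; break }
        }
    }
    END {
        for (i = 1; i <= NR; i++) {
            $0 = line[i]                        # re-split the stored line into fields
            if (i <= l) { $2 = s[i]; $3 = e[i] }
            $1 = $1                             # rebuild with OFS (needed for unchanged b lines)
            print
        }
    }
' wide.txt > wide.trimmed
```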
