How to remove overlap in numeric ranges (AWK)
I'm trying to remove overlaps within a file.
- There's a bunch of records that start with 'a' and have a 'start-value' and an 'end-value'.
- There's a bunch of records that start with 'b', also have a range, and show a possible overlap with the records starting with 'a'. The idea is to remove the overlapping range from the 'a' records so that only non-overlapping ranges exist.
Some of the records in b have an identical 'start-value' to a record in a, while others have an identical 'end-value'. So, if a has a range of 0 - 100 and b has a range of 0 - 32, the expected output is: a 33 - 100 and b 0 - 32.
Although I have a lot of files that need to undergo this operation, the individual files are small.
This is an example file:
    a 0 100
    a 101 160
    a 200 300
    a 500 1100
    a 1200 1300
    a 1301 1340
    a 1810 2000
    b 0 32
    b 500 540
    b 1250 1300
    b 1319 1340
    b 1920 2000
Expected sample output:
    a 33 100
    a 101 160
    a 200 300
    a 541 1100
    a 1200 1249
    a 1301 1318
    a 1810 1919
    b 0 32
    b 500 540
    b 1250 1300
    b 1319 1340
    b 1920 2000
Thanks for the help!
OK, since the OP confirmed that "b 501 540" was a typo, I'll post my answer :)
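    awk -v OFS="\t" '
    # a records: remember start, end, and the line number of the last a record
    /^a/ { s[NR] = $2; e[NR] = $3; l = NR }
    # b records: trim the a record that shares a start or an end value
    /^b/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2) {
                s[i] = $3 + 1
                break
            } else if (e[i] == $3) {
                e[i] = $2 - 1
                break
            }
        }
        s[NR] = $2; e[NR] = $3
    }
    END { for (i = 1; i <= NR; i++) print ((i <= l) ? "a" : "b"), s[i], e[i] }
    ' file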
Test with your file (with the typo fixed):
    kent$ awk -v OFS="\t" '/^a/{s[NR]=$2;e[NR]=$3;l=NR}
        /^b/{
            for(i=1;i<=l;i++){
                if(s[i]==$2){
                    s[i]=$3+1
                    break
                }else if(e[i]==$3){
                    e[i]=$2-1
                    break
                }
            }
            s[NR] = $2; e[NR]=$3
        }
        END{for(i=1;i<=NR;i++)print ((i<=l)?"a":"b"),s[i],e[i]}
        ' file
    a   33    100
    a   101   160
    a   200   300
    a   541   1100
    a   1200  1249
    a   1301  1318
    a   1810  1919
    b   0     32
    b   500   540
    b   1250  1300
    b   1319  1340
    b   1920  2000
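For anyone who wants to try this without creating a file first, here is a minimal self-contained sketch of the same logic, assuming a POSIX shell and any standard awk. The function name `trim_ranges` and the variables `sample` and `actual` are just illustrative names, not part of the original answer. Note that awk's built-ins are case-sensitive, so it has to be `OFS`, `NR` and `END`.

```shell
#!/bin/sh
# Sample data from the question (typo fixed).
sample='a 0 100
a 101 160
a 200 300
a 500 1100
a 1200 1300
a 1301 1340
a 1810 2000
b 0 32
b 500 540
b 1250 1300
b 1319 1340
b 1920 2000'

# Hypothetical helper: reads records on stdin, prints the trimmed ranges.
trim_ranges() {
    awk -v OFS="\t" '
    # a records: store start and end, remember the last a line number
    /^a/ { s[NR] = $2; e[NR] = $3; l = NR }
    # b records: shrink the a record with the same start or the same end
    /^b/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2) { s[i] = $3 + 1; break }
            else if (e[i] == $3) { e[i] = $2 - 1; break }
        }
        s[NR] = $2; e[NR] = $3
    }
    END { for (i = 1; i <= NR; i++) print ((i <= l) ? "a" : "b"), s[i], e[i] }
    '
}

actual=$(printf '%s\n' "$sample" | trim_ranges)
printf '%s\n' "$actual"
```

This relies on the same assumptions as the answer: all a records come before the b records, and every b range shares either its start or its end with exactly one a range. A b range sitting strictly inside an a range is left untouched.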
EDIT for 6 columns:
Quick and dirty; please check the example below:
file:
    kent$ cat file
    a 0 100 1 2 3
    a 101 160 4 5 6
    a 200 300 7 8 9
    a 500 1100 10 11 12
    a 1200 1300 13 14 15
    a 1301 1340 16 17 18
    a 1810 2000 19 20 21
    b 0 32 22 23 24
    b 500 540 22 23 24
    b 1250 1300 22 23 24
    b 1319 1340 22 23 24
    b 1920 2000 22 23 24
awk:
    kent$ awk -v OFS="\t" '{s[NR]=$2;e[NR]=$3}
        /^a/{l=NR}
        /^b/{
            for(i=1;i<=l;i++){
                if(s[i]==$2){
                    s[i]=$3+1
                    break
                }else if(e[i]==$3){
                    e[i]=$2-1
                    break
                }
            }
        }
        {r[NR]=$4 OFS $5 OFS $6}
        END{for(i=1;i<=NR;i++)print ((i<=l)?"a":"b"),s[i],e[i],r[i]}
        ' file
    a   33    100   1   2   3
    a   101   160   4   5   6
    a   200   300   7   8   9
    a   541   1100  10  11  12
    a   1200  1249  13  14  15
    a   1301  1318  16  17  18
    a   1810  1919  19  20  21
    b   0     32    22  23  24
    b   500   540   22  23  24
    b   1250  1300  22  23  24
    b   1319  1340  22  23  24
    b   1920  2000  22  23  24
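If the number of extra columns is not always three, one possible variation (a sketch only, not something the OP asked for) is to collect everything from column 4 onward in a loop instead of hard-coding `$4`, `$5` and `$6`. The names `trim_keep_columns`, `sample6` and `actual6` are made up for this example, and it assumes the extra columns should simply be passed through unchanged.

```shell
#!/bin/sh
# Two records from the 6-column example above, enough to show the idea.
sample6='a 0 100 1 2 3
a 101 160 4 5 6
b 0 32 22 23 24'

# Hypothetical helper: same trimming rule, any number of trailing columns.
trim_keep_columns() {
    awk -v OFS="\t" '
    {
        s[NR] = $2; e[NR] = $3
        # Join columns 4..NF, each one preceded by OFS
        rest = ""
        for (k = 4; k <= NF; k++) rest = rest OFS $k
        r[NR] = rest
    }
    /^a/ { l = NR }
    /^b/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2) { s[i] = $3 + 1; break }
            else if (e[i] == $3) { e[i] = $2 - 1; break }
        }
    }
    # r[i] already starts with OFS, so it is concatenated, not comma-separated
    END { for (i = 1; i <= NR; i++) print ((i <= l) ? "a" : "b"), s[i], e[i] r[i] }
    '
}

actual6=$(printf '%s\n' "$sample6" | trim_keep_columns)
printf '%s\n' "$actual6"
```

With only three columns in the input, `rest` stays empty and this behaves like the first version.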