analysis - How do I pair up rows of network traffic data in R or SPSS?

August 15, 2012

I have lots of SiLK flow data that I am doing data mining on. It looks like the destination IP column matches the source IP column of a row of data further down. The rows (with many more columns) look like this:

uid           sip         dip         protocol  sport  dport
720107626538  1207697420  3232248333  17        53     7722
720108826800  3232248333  1207697420  17        47904  53

I have never programmed in R or SPSS, and I am having trouble figuring out how to turn 2 rows of 27 columns of data into 1 row of 54 columns of data.

You can get the corresponding sip and dip records on the same line through a merge:

# Example rows from the question
df <- data.frame(
  "uid" = c(720107626538, 720108826800),
  "sip" = c(1207697420, 3232248333),
  "dip" = c(3232248333, 1207697420),
  "protocol" = c(17, 17),
  "sport" = c(53, 47904),
  "dport" = c(7722, 53),
  stringsAsFactors = FALSE)

# Join each row's sip to the row whose dip has the same value
df_merged <- merge(
  df[, setdiff(colnames(df), "dip")],
  df[, setdiff(colnames(df), "sip")],
  by.x = "sip",
  by.y = "dip",
  all = FALSE,
  suffixes = c("_sip", "_dip"))

After that, you can use the uid fields to remove the duplicates:

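# Each pair shows up twice after the merge, once with the uids swapped
for (i in 2:nrow(df_merged)) {
  ind <- df_merged$uid_dip
  # Put row i's uid_sip in place of its uid_dip so a mirrored row collides with an earlier one
  ind[i] <- df_merged$uid_sip[i]
  df_merged <- df_merged[!duplicated(ind), ]
}

df_merged

         sip      uid_sip protocol_sip sport_sip dport_sip      uid_dip protocol_dip sport_dip dport_dip
1 1207697420 720107626538           17        53      7722 720108826800           17     47904        53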

Because the de-duping relies on a loop, the whole thing may be time-consuming if the dataset is large.
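If the loop turns out to be too slow, one possible alternative is to de-duplicate in a single vectorized pass. The sketch below is not part of the original answer: it assumes that a row and its mirror image (the same two uids with the roles swapped) describe the same pair, and it keeps the first row of each such pair. The names `pair_lo`, `pair_hi`, and `df_pairs` are made up for illustration.

```r
# Example rows copied from the question
df <- data.frame(
  uid = c(720107626538, 720108826800),
  sip = c(1207697420, 3232248333),
  dip = c(3232248333, 1207697420),
  protocol = c(17, 17),
  sport = c(53, 47904),
  dport = c(7722, 53),
  stringsAsFactors = FALSE
)

# Same merge as above: match each row's sip against another row's dip
df_merged <- merge(
  df[, setdiff(colnames(df), "dip")],
  df[, setdiff(colnames(df), "sip")],
  by.x = "sip",
  by.y = "dip",
  suffixes = c("_sip", "_dip")
)

# Order-independent key: the smaller and the larger uid of each row
# (assumption: swapped uids mean the same pair of records)
pair_lo <- pmin(df_merged$uid_sip, df_merged$uid_dip)
pair_hi <- pmax(df_merged$uid_sip, df_merged$uid_dip)

# Keep only the first row for each unordered uid pair, with no explicit loop
df_pairs <- df_merged[!duplicated(data.frame(pair_lo, pair_hi)), ]
```

On the two example rows this leaves one row in `df_pairs`, the one whose sip is 1207697420. Note that it is not a drop-in replacement for the loop in every case, since it only collapses rows whose uid pairs mirror each other.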
