How do I pair up rows of network traffic data in R or SPSS?
I have lots of SiLK flow data that I'm doing data mining on. It looks like the destination IP column of one row matches the source IP column of a row of data further down. The rows (which have many more columns) look like this:
uid           sip         dip         protocol  sport  dport
720107626538  1207697420  3232248333  17        53     7722
720108826800  3232248333  1207697420  17        47904  53
I have never programmed in R or SPSS, and I'm having trouble figuring out how to turn 2 rows of 27 columns of data into 1 row of 54 columns of data.
You can get the corresponding sip and dip records onto the same line through merge:
df <- data.frame(
  "uid"      = c(720107626538, 720108826800),
  "sip"      = c(1207697420, 3232248333),
  "dip"      = c(3232248333, 1207697420),
  "protocol" = c(17, 17),
  "sport"    = c(53, 47904),
  "dport"    = c(7722, 53),
  stringsAsFactors = FALSE)

# join each row's sip against every other row's dip;
# the sip/dip columns themselves are dropped from the opposite side
df_merged <- merge(
  df[, setdiff(colnames(df), "dip")],
  df[, setdiff(colnames(df), "sip")],
  by.x = "sip",
  by.y = "dip",
  all = FALSE,
  suffixes = c("_sip", "_dip"))
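The same pattern scales to your full 27-column data: only the two join columns are excluded from each side of the merge, so every other column is carried along automatically. A minimal sketch, assuming the flows have been exported to a CSV file (the file name silk_flows.csv and the read.csv call are placeholders for however you actually load your data):

# Hypothetical export of the full flow table; adjust the path/format to your data.
flows <- read.csv("silk_flows.csv", stringsAsFactors = FALSE)

flows_merged <- merge(
  flows[, setdiff(colnames(flows), "dip")],  # keep every column except dip
  flows[, setdiff(colnames(flows), "sip")],  # keep every column except sip
  by.x = "sip",
  by.y = "dip",
  all = FALSE,                               # keep only rows that pair up
  suffixes = c("_sip", "_dip"))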
After that, you can use the uid fields to remove the duplicates:
for (i in 2:nrow(df_merged)) {
  # swap in the i-th row's sip uid, then drop rows whose pairing
  # has already appeared with the sip/dip roles reversed
  ind <- df_merged$uid_dip
  ind[i] <- df_merged$uid_sip[i]
  df_merged <- df_merged[!duplicated(ind), ]
}

df_merged
         sip      uid_sip protocol_sip sport_sip dport_sip      uid_dip protocol_dip sport_dip dport_dip
1 1207697420 720107626538           17        53      7722 720108826800           17     47904        53
Because the de-duping relies on a loop, the whole thing can be time-consuming if the dataset is large.
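If that becomes a problem, one vectorized alternative (not part of the answer above, just a sketch) is to build an order-independent key from each merged row's pair of uids and keep the first occurrence of every key, which avoids the explicit loop:

# Run this on the freshly merged data instead of the for-loop above.
# Each pair of flows appears twice (with the _sip and _dip roles swapped),
# so an unordered key made from the two uids identifies the duplicates.
key <- paste(pmin(df_merged$uid_sip, df_merged$uid_dip),
             pmax(df_merged$uid_sip, df_merged$uid_dip))
df_merged <- df_merged[!duplicated(key), ]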