java - Hadoop Hive count concurrency

July 15, 2014

How can I implement this in Hadoop?

In Hive, I have a table with lots of columns; two of them are begin_time and end_time.

I need to count the number of visitors present at each point in time.

A piece of the table looks like this:

begin_time                  end_time
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41

The result I want is how many people are present at a specific time.

For example, at 2011.04.26 10:19:08 there are 3 visitors, because 1 arrived at 10:19:06 and 2 arrived at 10:19:08.

At 2011.04.26 10:19:18 there are 9, not 10, because 1 visitor left at 2011.04.26 10:19:17.

A piece of the desired result is:

2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9

Any help is appreciated, and suggestions are welcome.
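To make the expected counts concrete, here is a minimal Python sketch (not part of the original question) that applies the counting rule directly to the sample rows. It assumes both ends of each interval are inclusive, which is consistent with the sample output (the visitor who leaves at 10:19:17 is still counted at 10:19:17). The names `visitors_at` and `counts_at_begin_times` are hypothetical helpers chosen for illustration.

```python
from datetime import datetime

FMT = "%Y.%m.%d %H:%M:%S"

# Sample rows copied from the question: (begin_time, end_time).
ROWS = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]


def parse(ts):
    """Parse a timestamp in the question's 'YYYY.MM.DD HH:MM:SS' format."""
    return datetime.strptime(ts, FMT)


def visitors_at(rows, ts):
    """Count rows whose [begin_time, end_time] interval contains ts.

    Assumption: both interval ends are inclusive.
    """
    t = parse(ts)
    return sum(1 for b, e in rows if parse(b) <= t <= parse(e))


def counts_at_begin_times(rows):
    """Return (time, count) pairs for every distinct begin_time, in order."""
    # The zero-padded format sorts chronologically as plain strings.
    times = sorted({b for b, _ in rows})
    return [(t, visitors_at(rows, t)) for t in times]


if __name__ == "__main__":
    for t, n in counts_at_begin_times(ROWS):
        print(t, n)
```

On the ten sample rows this reproduces the counts listed in the desired result for each begin time.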

You can try this in Hive (assuming the table name is test_log):

select /*+ mapjoin(driven) */ driven.time, count(*)
from (select time
      from (select begin_time as time from test_log
            union all
            select end_time as time from test_log) u
      group by time) driven
join test_log l on true
where driven.time between l.begin_time and l.end_time
group by driven.time

It is probably not the best solution, but at least it works. You can add a filter on the driven subquery to reduce the data set.
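Since the join compares every distinct timestamp with every row, the work grows with the product of the two. One alternative, which is not part of the answer above and is only sketched here in Python rather than Hive, is an event sweep: sort the distinct timestamps once, add arrivals before recording the count, and subtract departures afterwards so that the end time stays inclusive. The function name `concurrency_sweep` is hypothetical, and the three rows are a subset of the question's sample data.

```python
from collections import Counter


def concurrency_sweep(rows):
    """Return (time, active_count) for every distinct begin or end time.

    rows must be a list of (begin, end) pairs whose values sort
    chronologically (true for zero-padded 'YYYY.MM.DD HH:MM:SS' strings).
    Assumption: an interval still counts at its own end time.
    """
    starts = Counter(b for b, _ in rows)
    ends = Counter(e for _, e in rows)
    active = 0
    result = []
    for t in sorted(set(starts) | set(ends)):
        active += starts[t]        # arrivals at t are present at t
        result.append((t, active))
        active -= ends[t]          # departures at t drop out after t
    return result


if __name__ == "__main__":
    sample = [
        ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
        ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
        ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ]
    for t, n in concurrency_sweep(sample):
        print(t, n)
```

After the sort, the sweep touches each timestamp once, at the price of holding the per-timestamp counters in memory. Whether that trade-off suits a given Hive table depends on the data volume.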
