java - Hadoop Hive count concurrency


How can I implement this in Hadoop?

In Hive, I have a table with lots of columns; two of them are begin_time and end_time.

I need to count the number of visitors present at each time.

A piece of the table looks like this:

begin_time                  end_time
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41

The result I want is how many people are in at a specific time.

E.g. at 2011.04.26 10:19:08 there are 3 visitors, because 1 came in at 10:19:06 and 2 at 10:19:08.

And at 2011.04.26 10:19:18 there are 9: ten have come in by then, but 1 left at 2011.04.26 10:19:17.

The desired result for this piece is:

2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9
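
To pin down the counting rule, here is a minimal sketch in plain Python (not Hive) that recomputes those numbers from the sample rows. It assumes a visitor counts as present from begin_time through end_time inclusive, which is what the 10:19:17 row implies; the names `rows` and `concurrent_at` are made up for illustration.

```python
# Sample rows from the table above as (begin_time, end_time) strings.
# The timestamps are fixed-width and zero-padded, so comparing them
# as plain strings gives chronological order.
DAY = "2011.04.26 "
rows = [(DAY + b, DAY + e) for b, e in [
    ("10:19:06", "10:20:22"),
    ("10:19:08", "10:21:49"),
    ("10:19:08", "11:18:46"),
    ("10:19:09", "12:08:36"),
    ("10:19:09", "11:00:16"),
    ("10:19:11", "10:19:17"),
    ("10:19:12", "10:46:21"),
    ("10:19:13", "10:55:43"),
    ("10:19:17", "10:19:41"),
    ("10:19:18", "10:34:41"),
]]


def concurrent_at(t, rows):
    """Count rows whose [begin, end] interval contains t (both ends inclusive)."""
    return sum(1 for begin, end in rows if begin <= t <= end)


# One count per distinct begin_time, in chronological order.
counts = {t: concurrent_at(t, rows) for t in sorted({b for b, _ in rows})}
for t, n in counts.items():
    print(t, n)
```

With the inclusive rule, the visitor who leaves at 10:19:17 is still counted at 10:19:17 and no longer counted at 10:19:18.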

Any help is appreciated and welcome.

You can try this in Hive (assuming the table name is test_log):

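    -- driven: every distinct timestamp that appears as a begin or end time
    select /*+ mapjoin(driven) */ driven.time, count(*)
    from (select time
          from (select begin_time as time from test_log
                union all
                select end_time as time from test_log) u
          group by time) driven
    join test_log l on true
    -- keep only the rows whose interval covers the timestamp (inclusive)
    where driven.time between l.begin_time and l.end_time
    group by driven.time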

It's probably not the best solution, but at least it works. You can add a filter on the driven subquery to reduce the data set.

