java - Hadoop Hive count concurrency
How can I implement this in Hadoop?
In Hive, I have a table with lots of columns; two of them are begin_time and end_time.
I need to count the number of visitors present at each time.
A piece of the table looks like this:
begin_time end_time
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41
The result I want is how many people are in at a specific time.
For example, at 2011.04.26 10:19:08 there are 3 visitors, because 1 came in at 10:19:06 and 2 came in at 10:19:08.
And at 2011.04.26 10:19:18 the count is 9, not ten of course, because 1 visitor left at 2011.04.26 10:19:17.
A piece of the desired result is:
2011.04.26 10:19:06 1
2011.04.26 10:19:08 3
2011.04.26 10:19:09 5
2011.04.26 10:19:11 6
2011.04.26 10:19:12 7
2011.04.26 10:19:13 8
2011.04.26 10:19:17 9
2011.04.26 10:19:18 9
Any help is appreciated, and suggestions are welcome.
You can try this on Hive (assuming the table name is test_log):
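    select /*+ mapjoin(driven) */ driven.time, count(*)
    from (
        -- every distinct begin or end time is a point to count at
        select time
        from (
            select begin_time as time from test_log
            union all
            select end_time as time from test_log
        ) u
        group by time
    ) driven
    join test_log l on true
    -- a row is counted when its interval covers the time (inclusive)
    where driven.time between l.begin_time and l.end_time
    group by driven.time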
It's probably not the best solution, but at least it works. You can add a filter on the driven subquery to reduce the data set.
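If you want to check the counting rule without a cluster, here is a minimal Python sketch of the same logic applied to the sample rows from the question. The names `ROWS` and `concurrency_counts` are made up for illustration, and it assumes the timestamps are fixed-width strings, so plain string comparison orders them correctly. It is only meant to check the expected numbers, not to replace the Hive query.

```python
# Sample rows copied from the question: (begin_time, end_time).
ROWS = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]


def concurrency_counts(rows):
    """For every distinct begin or end time, count the rows whose
    [begin_time, end_time] interval covers it (inclusive on both ends)."""
    # Distinct candidate times, like the "driven" subquery.
    times = sorted({t for row in rows for t in row})
    # Compare every candidate time against every row, like the join + between.
    return {t: sum(1 for b, e in rows if b <= t <= e) for t in times}


counts = concurrency_counts(ROWS)

# Print only the begin times, which is what the desired result lists.
for t in sorted({b for b, _ in ROWS}):
    print(t, counts[t])
```

Like the query, this compares every candidate time with every row, so it also produces counts for the end times, and the cost grows with (distinct times) x (rows).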