sql - how do I mark one row and store its particular value in Hive using standard Query or using UDF? -
i need write query in hive or define function needs followings:
the dataset:
student || time || comuputerpool ------------------------------------- || 9:15am || pool1.machine2 ------------------------------------- || 9:45am || pool1.machine7 ------------------------------------- || 10:15am|| pool1.machine9 ------------------------------------- || 11:00am|| pool2.machine2 ------------------------------------- || 12:05 || pool2.machine3 ------------------------------------- || 12:40 || pool3.machine5 ------------------------------------- || 13:10 || pool1.machine3 ------------------------------------- || 13:50 || pool1.machine10 ------------------------------------- b ..........................
so query should find out how long particular student has spent in particular computer pool calculating difference of when first used machine in pool , when first start using machine in pool. example time spent difference of : 11:00am - 9:15am = 1hour45mins
my question here how going mark first use in 1 store time value , use later when find next pool data.
hive supports row_number()
, can use trick grouping sequential values. difference between 2 row numbers defines group, can used aggregation. resulting query looks like:
select student, grp, min(time), max(time) (select t.*, (row_number() on (partition student order time) - row_number() on (partition student, computerpool order time) ) grp dataset t ) t group student, grp, computerpool;
edit:
your question pools not machines. so, care part before period, presumably. (by way, should store information in separate columns).
select student, substr(computerpool, 1, 6), min(time), max(time) (select t.*, (row_number() on (partition student order time) - row_number() on (partition student, substr(computerpool, 1, 6) order time) ) grp dataset t ) t group student, grp, substr(computerpool, 1, 6);
Comments
Post a Comment