sql - how do I mark one row and store its particular value in Hive using standard Query or using UDF? -


i need write query in hive or define function needs followings:

the dataset:

student || time    || comuputerpool -------------------------------------       ||  9:15am ||  pool1.machine2 -------------------------------------       ||  9:45am ||  pool1.machine7 -------------------------------------       ||  10:15am||  pool1.machine9 -------------------------------------       ||  11:00am||  pool2.machine2 -------------------------------------       ||  12:05  ||  pool2.machine3 -------------------------------------       ||  12:40  ||  pool3.machine5 -------------------------------------        ||  13:10  ||  pool1.machine3 -------------------------------------       ||  13:50  ||  pool1.machine10 -------------------------------------   b   .......................... 

so query should find out how long particular student has spent in particular computer pool calculating difference of when first used machine in pool , when first start using machine in pool. example time spent difference of : 11:00am - 9:15am = 1hour45mins

my question here how going mark first use in 1 store time value , use later when find next pool data.

hive supports row_number(), can use trick grouping sequential values. difference between 2 row numbers defines group, can used aggregation. resulting query looks like:

select student, grp, min(time), max(time) (select t.*,              (row_number() on (partition student order time) -               row_number() on (partition student, computerpool order time)              ) grp       dataset t      ) t group student, grp, computerpool; 

edit:

your question pools not machines. so, care part before period, presumably. (by way, should store information in separate columns).

select student, substr(computerpool, 1, 6), min(time), max(time) (select t.*,              (row_number() on (partition student order time) -               row_number() on (partition student, substr(computerpool, 1, 6) order time)              ) grp       dataset t      ) t group student, grp, substr(computerpool, 1, 6); 

Comments

Popular posts from this blog

Java 3D LWJGL collision -

spring - SubProtocolWebSocketHandler - No handlers -

methods - python can't use function in submodule -