dataframe - R table manipulation -
i have data.frame below
product=c(rep("a",4),rep("b",2)) ww1=c(201438,201440,201444,201446,201411,201412) ww2=ww1-6 diff=rep(6,6) demand=rep(100,6) df=data.frame(product,ww1,ww2,diff,demand) df<- df[with(df,order(product, ww1)),] df product ww1 ww2 diff demand 1 201438 201432 6 100 2 201440 201434 6 100 3 201444 201438 6 100 4 201446 201440 6 100 5 b 201411 201405 6 100 6 b 201412 201406 6 100
i want add rows based upon conditions below.
for row in data, if product on earlier row same product on current row, ww1 on earlier row not same ww1-1 on current row (basically ww1 difference 1), add new row.
for newly added row:
product same product on earlier row. ww1 ww1 on earlier row + 1 ww2 ww2 on earlier row + 1 ww_diff 6 demand 0
the final output need below:
product ww1 ww2 ww_diff demand 201438 201432 6 100 201439 201433 6 0 201440 201434 6 100 201441 201435 6 0 201442 201436 6 100 201443 201437 6 0 201444 201438 6 100 201445 201439 6 0 201446 201440 6 100 b 201411 201405 6 100 b 201412 201406 6 100
as of thinking of writing macro in excel, slow , therefore prefer r solution
update1===============================
how add column seq? column 1 earliest entry of ww1 of every product , increments 1.
product ww1 ww2 ww_diff demand seq 201438 201432 6 100 1 201439 201433 6 0 2 201440 201434 6 100 3 201441 201435 6 0 4 201442 201436 6 100 5 201443 201437 6 0 6 201444 201438 6 100 7 201445 201439 6 0 8 201446 201440 6 100 9 b 201411 201405 6 100 1 b 201412 201406 6 100 2
update2=======================================================
i posting questions again (i unchecked accepted answer of alistaire answer not working on original data, works on small sample of data :(
in below solution user alistaire, df3 <- right_join(df, data.frame(ww1=ww1big))
seem causing issue.
in final solution, prefer if columns specified names. way won't forced arrange columns in particular order.
here's similar data.table
solution assume should more efficient i'm minimizing calculations per group , using binary join instead.
library(data.table) setkey(setdt(df), product, ww1) ## sorting `product` , `ww1` indx <- setkey(df[, list(ww1 = seq.int(ww1[1l], ww1[.n], = 1l)), = product]) ## running `seq.int` on `ww1` per group res <- df[indx][is.na(ww2), `:=`(ww2 = ww1 - 6l, diff = 6l, demand = 0l)] ## filling missing values res[, seq := seq_len(.n), = product] # adding index res # product ww1 ww2 diff demand seq # 1: 201438 201432 6 100 1 # 2: 201439 201433 6 0 2 # 3: 201440 201434 6 100 3 # 4: 201441 201435 6 0 4 # 5: 201442 201436 6 0 5 # 6: 201443 201437 6 0 6 # 7: 201444 201438 6 100 7 # 8: 201445 201439 6 0 8 # 9: 201446 201440 6 100 9 # 10: b 201411 201405 6 100 1 # 11: b 201412 201406 6 100 2
Comments
Post a Comment