Impute missing observations in R

I have a data frame of calendar days and hourly observations with associated values, like so:
    starttime  hour  delay
    04-22      0     10
    04-22      1     10
    04-22      3     10
    04-22      4     10
What's needed is for every hour of the day to be present, with a 0 associated with the missing hours instead of no value or NA. How is this best achieved? I have attempted a full_join in dplyr with a dummy data frame, but that method seems clunky and inefficient.

In short, I need this:
    starttime  hour  delay
    04-22      0     10
    04-22      1     10
    04-22      2     0
    04-22      3     10
    04-22      4     10
You can use data.table to join the dataset efficiently. Convert the 'data.frame' to a 'data.table' (setDT(df1)), set the key columns (setkey(.., starttime, hour)), join with the combination of unique values of 'starttime' and the range of 'hour' (CJ), and replace the NA values in 'delay' with 0.
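    library(data.table)
    # Key df1 by starttime/hour, join against every starttime/hour combination,
    # then set delay to 0 for the rows the join filled with NA.
    setkey(setDT(df1), starttime, hour)[CJ(starttime = unique(starttime), hour = min(hour):max(hour))][is.na(delay), delay := 0L]
    #    starttime hour delay
    # 1:     04-22    0    10
    # 2:     04-22    1    10
    # 3:     04-22    2     0
    # 4:     04-22    3    10
    # 5:     04-22    4    10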
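Or, using merge/expand.grid from base R, we can get a similar result. Note that the hours missing from df1 come back as NA in 'delay', so they still need to be replaced with 0 afterwards:

    res <- merge(expand.grid(starttime = unique(df1$starttime), hour = min(df1$hour):max(df1$hour)), df1, all.x = TRUE)
    res$delay[is.na(res$delay)] <- 0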
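
The post never shows how df1 is built, so here is a minimal self-contained sketch of the base R approach for anyone who wants to try it. The sample data frame is an assumption reconstructed from the table in the question (integer 'hour' and 'delay', character 'starttime'), and the names grid and res are only illustrative.

```r
# Hypothetical sample data matching the question (hour 2 is missing).
df1 <- data.frame(
  starttime = rep("04-22", 4),
  hour      = c(0L, 1L, 3L, 4L),
  delay     = c(10L, 10L, 10L, 10L),
  stringsAsFactors = FALSE
)

# Every starttime/hour combination across the observed hour range.
# Assumption: swap in 0:23 if all 24 hours of each day are wanted.
grid <- expand.grid(
  starttime = unique(df1$starttime),
  hour      = min(df1$hour):max(df1$hour),
  stringsAsFactors = FALSE
)

# Left join on the shared columns (starttime, hour);
# hours absent from df1 get NA in delay.
res <- merge(grid, df1, all.x = TRUE)

# Replace the NAs introduced by the join with 0.
res$delay[is.na(res$delay)] <- 0L

# Order by day and hour for display.
res <- res[order(res$starttime, res$hour), ]
print(res)
```

The grid here spans only min(hour) to max(hour), as in the answer above, so hours before the first or after the last observation are not added unless the range is widened.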