Python pandas time series interpolation and regularization
I am using Python pandas for the first time. I have 5-minute-lag traffic data in CSV format:
...
2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08:39:05,-1
2015-01-04 08:44:05,260260
2015-01-04 08:49:05,263711
...
There are several issues:
- for some timestamps there's missing data (-1)
- some entries are missing entirely (sometimes for 2 or 3 consecutive hours)
- the frequency of the observations is not exactly 5 minutes; it loses a few seconds once in a while
I would like to obtain a regular time series, with entries exactly every 5 minutes (and no missing values). I have interpolated the time series with the following code to approximate the -1 values:
ts = pd.Series(values, index=timestamps)
ts.interpolate(method='cubic', downcast='infer')
How can I both interpolate and regularize the frequency of the observations? Thank you for the help.
Change the -1s to NaNs:
ts[ts==-1] = np.nan
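If you are starting from the CSV file itself, the loading and the sentinel replacement can be sketched as below. This is only a minimal sketch: the column names `timestamp` and `value` are my own (the question does not give a header), and the sample text stands in for your real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real file path would replace StringIO.
csv_text = """2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08:39:05,-1
2015-01-04 08:44:05,260260
2015-01-04 08:49:05,263711
"""

# "timestamp" and "value" are hypothetical column names.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["timestamp", "value"],
    parse_dates=["timestamp"],
    index_col="timestamp",
)
raw = df["value"]

# mask() returns a new float Series with NaN wherever the condition is True.
clean = raw.mask(raw == -1)
print(clean)
```

Using `mask` instead of in-place assignment avoids writing NaN into an integer Series, which some pandas versions warn about.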
Then resample the data to have a 5-minute frequency.
ts = ts.resample('5min').mean()
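Note that if two measurements fall within the same 5-minute period, taking the mean averages their values together. (Older pandas versions did this by default when calling resample; newer versions require calling .mean() on the resampler explicitly.)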
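To see the binning behaviour on a small case, here is a toy sketch with made-up values (not your traffic data): two readings land in the same 5-minute bin and one bin receives no reading at all.

```python
import pandas as pd

# Made-up readings with irregular timestamps.
idx = pd.to_datetime([
    "2015-01-04 08:30:10",
    "2015-01-04 08:34:50",  # falls in the same 08:30 bin as the first reading
    "2015-01-04 08:41:00",  # falls in the 08:40 bin; the 08:35 bin stays empty
])
s = pd.Series([100.0, 200.0, 400.0], index=idx)

# Bins are labelled by their left edge: 08:30, 08:35, 08:40.
regular = s.resample("5min").mean()
print(regular)
```

The first bin holds the average of the two readings, and the empty bin comes out as NaN, which is what the interpolation step then fills.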
Finally, linearly interpolate the time series according to time:
ts = ts.interpolate(method='time')
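As a small illustration of what method='time' does, the sketch below uses made-up values on an unevenly spaced index and compares it with method='linear', which treats the points as equally spaced. On an already regular index the two give the same result.

```python
import numpy as np
import pandas as pd

# Made-up values: the missing point is 1 minute after the first reading
# and 9 minutes before the last one.
idx = pd.to_datetime([
    "2015-01-04 08:00:00",
    "2015-01-04 08:01:00",
    "2015-01-04 08:10:00",
])
s = pd.Series([0.0, np.nan, 100.0], index=idx)

by_time = s.interpolate(method="time")        # weights by the actual time gaps
by_position = s.interpolate(method="linear")  # ignores the index spacing

print(by_time.iloc[1], by_position.iloc[1])
```

With method='time' the filled value sits one tenth of the way between the neighbours; with method='linear' it sits halfway.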
Since it looks like your data already has roughly a 5-minute frequency, you might need to resample at a shorter frequency so that cubic or spline interpolation can smooth out the curve:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, -1, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])

# Use a float dtype so that NaN can be stored
ts = pd.Series(values, index=timestamps, dtype=float)
ts[ts == -1] = np.nan
# Resample to 1-minute bins; the bins without data become NaN
ts = ts.resample('min').mean()

# Spline interpolation requires SciPy
ts.interpolate(method='spline', order=3).plot()
ts.interpolate(method='time').plot()
lines, labels = plt.gca().get_legend_handles_labels()
labels = ['spline', 'time']
plt.legend(lines, labels, loc='best')
plt.show()
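Since you mention gaps of 2 or 3 consecutive hours, you may not want to interpolate straight across those. One possible approach, sketched here with made-up values, is to interpolate everything and then blank out the runs of missing values that are longer than some threshold. The threshold `max_gap` is a hypothetical parameter of mine, and this is just one way to do it, not a built-in pandas feature.

```python
import numpy as np
import pandas as pd

# Made-up regular 5-minute series with one short gap and one long gap.
idx = pd.date_range("2015-01-04 08:00", periods=10, freq="5min")
s = pd.Series(
    [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, np.nan, 9.0, 10.0],
    index=idx,
)

max_gap = 2  # hypothetical: longest run of missing bins we are willing to fill

is_na = s.isna()
# Give every run of consecutive equal flags (NaN / not NaN) its own id.
run_id = is_na.astype(int).diff().ne(0).cumsum()
# Length of the NaN run each point belongs to (0 for observed points).
run_len = is_na.groupby(run_id).transform("sum")

filled = s.interpolate(method="time")
# Undo the interpolation inside runs that are too long to trust.
filled[is_na & (run_len > max_gap)] = np.nan
print(filled)
```

The single missing bin is filled, while the five-bin gap is left as NaN so you can decide separately how to handle it.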