Upsampling

Upsampling increases the time frequency of the data. It is a data disaggregation procedure in which observations recorded at a coarse time interval are broken down to a finer one, for example from months to days, from days to hours, or from hours to seconds. Because every original interval is split into several smaller ones, upsampling increases the number of rows, by an amount that depends on the target frequency. If D is the size of the original data and D’ is the size of the upsampled data, then D’ > D.
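To make the size relation concrete, here is a minimal sketch on a made-up series (the dates and values are hypothetical and are not the detergent dataset). It only counts rows before and after upsampling from month-end observations to days.

```python
import pandas as pd

# Hypothetical monthly series: six month-end observations (made-up values)
dates = pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31",
                        "2021-04-30", "2021-05-31", "2021-06-30"])
monthly = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=dates)

# Upsample to daily frequency; asfreq() only inserts the new rows
# and leaves them empty (NaN)
daily = monthly.resample("D").asfreq()

d_original = len(monthly)   # D
d_upsampled = len(daily)    # D'
print(d_original, d_upsampled)   # 6 151
```

The six monthly rows become one row per calendar day between the first and the last observation, so D’ > D holds here.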

Now, let’s look at an example using Python to perform resampling in time-series data.

The example uses the practice dataset Detergent sales data.csv.

Example:

Python3




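# import the python pandas library
import pandas as pd

# read data using read_csv; the first column holds the dates
# and becomes the index
data = pd.read_csv("Detergent sales data.csv", header=0,
                   index_col=0, parse_dates=True)

# squeeze=True was removed from read_csv in pandas 2.0, so
# convert the single-column DataFrame to a Series explicitly
data = data.squeeze("columns")
print(data)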


Output:

The detergent sales data shows the sales value for each of the first 6 months. Assume the task is to estimate daily sales. Since only monthly data is available, moving to a daily frequency calls for upsampling.

Python3




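# Use the resample function to upsample months to days.
# Each daily bin holds at most one monthly value, so mean()
# keeps that value and leaves every other day as NaN
upsampled = data.resample('D').mean()
print(upsampled.head(32))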


Output:

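The output shows a few samples of the dataset after upsampling from months to days. Each daily bin contains at most one of the original monthly values, so mean() simply carries that value over to its own day. median() behaves the same way, while sum() returns 0 instead of nan for the days without an observation, so choose the aggregation that best suits the problem.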

The upsampled dataset contains nan values for every day except the days that were present in the original dataset (the total sales figure for each month).
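The pattern of missing values can be checked on a small made-up series. The sketch below assumes three hypothetical month-end totals (not the real dataset) and compares what mean() and sum() put on the days that had no observation.

```python
import pandas as pd

# Hypothetical data: three month-end totals (made-up values)
idx = pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"])
sales = pd.Series([100.0, 128.0, 159.0], index=idx)

by_mean = sales.resample("D").mean()   # NaN on days without an observation
by_sum = sales.resample("D").sum()     # 0.0 on days without an observation

missing_days = int(by_mean.isna().sum())
print(len(by_mean), missing_days)                 # 60 57
print(by_mean.loc[pd.Timestamp("2021-02-28")])    # 128.0
print(by_sum.loc[pd.Timestamp("2021-02-01")])     # 0.0
```

Only the three original dates keep a value under mean(); the remaining days are the ones that still need to be filled.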

Now we can fill these nan values using a technique called interpolation. Pandas provides the methods Series.interpolate() and DataFrame.interpolate() for this purpose. Interpolation fills the nan values using one of several techniques, such as ‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’ and ‘polynomial’. We will choose ‘linear’ interpolation. This draws a straight line between the available data points, in this case the values on the last day of each month, and fills in values at the chosen frequency from this line.

Python3




# use the interpolate function with method linear
# to fill the nan values of the upsampled days
# linearly
interpolated = upsampled.interpolate(method='linear')

# Printing the linear interpolated values for month 2
print(interpolated.loc['2021-02'])
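As a worked check of the straight-line idea, the following sketch uses two hypothetical month-end values (made up for illustration, not taken from the dataset) that are 28 days apart and differ by 28, so linear interpolation should add exactly 1 per day.

```python
import pandas as pd

# Hypothetical data: two month-end values, 28 days apart (made-up values)
idx = pd.to_datetime(["2021-01-31", "2021-02-28"])
sales = pd.Series([100.0, 128.0], index=idx)

# Upsample to days (NaN in between), then fill along a straight line
daily = sales.resample("D").mean()
filled = daily.interpolate(method="linear")

# 14 February lies halfway between the two known points
mid_value = round(filled.loc[pd.Timestamp("2021-02-14")], 6)
print(mid_value)                    # 114.0

# Partial-string selection returns every day of February
february = filled.loc["2021-02"]
print(len(february))                # 28
```

Note that method='linear' treats the rows as equally spaced, which is the case on a regular daily grid like this one.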



How to Resample Time Series Data in Python?

In time series analysis, data consistency is of prime importance, and resampling ensures that the data is distributed with a consistent frequency. Resampling can also offer a different perspective on the data: depending on the resampling frequency, it can reveal additional insights.
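A small sketch of the consistency point, assuming a hypothetical series with gaps between its timestamps (the dates and values are made up): after resampling, the index has one row per day and a single detectable frequency.

```python
import pandas as pd

# Hypothetical irregular observations: 3 and 4 January are missing
stamps = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"])
irregular = pd.Series([5.0, 7.0, 9.0], index=stamps)

# Resampling to daily frequency produces one row per calendar day
regular = irregular.resample("D").mean()

freq_before = pd.infer_freq(irregular.index)   # None: no single frequency fits
freq_after = pd.infer_freq(regular.index)      # 'D'
print(freq_before, freq_after, len(regular))
```

The two inserted days hold NaN until they are filled, for example by interpolation.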

resample() function: It is a convenience method for frequency conversion and resampling, primarily used for time series data.

Syntax:

# import the python pandas library
import pandas as pd

# syntax for the resample function (main parameters only;
# the exact signature varies between pandas versions)
pd.Series.resample(rule, axis=0, closed=None,
 convention='start', kind=None, offset=None,
 origin='start_day')
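To illustrate how rule, closed and label interact, here is a minimal sketch on a hypothetical minute-level series with the values 0 to 8. The series is made up for illustration; only the resample() parameters come from the syntax above.

```python
import pandas as pd

# Hypothetical minute-level series with values 0..8
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# rule="3min": for this frequency each bin is closed and
# labelled on its left edge by default
left = series.resample("3min").sum()

# closed="right" includes the right edge of each bin instead,
# and label="right" names each bin by its right edge
right = series.resample("3min", closed="right", label="right").sum()

print(left.tolist())    # [3, 12, 21]
print(right.tolist())   # [0, 6, 15, 15]
```

With the right edge closed, the very first observation falls into a bin of its own, which is why the second result has four rows.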

Resampling primarily involves changing the time frequency of the original observations. The two popular methods of resampling in time series are as follows:

  • Upsampling
  • Downsampling
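The two directions can be contrasted with a short sketch on a hypothetical daily series of 14 values (made up for illustration): downsampling aggregates rows into fewer, coarser bins, while upsampling creates more, finer rows.

```python
import pandas as pd

# Hypothetical daily series: 14 consecutive days with values 1..14
days = pd.date_range("2021-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=days)

# Downsampling: aggregate every 7 days into one row (fewer rows)
weekly = daily.resample("7D").sum()

# Upsampling: expand the 7-day series back to a daily grid (more rows);
# asfreq() leaves the newly created days as NaN
expanded = weekly.resample("D").asfreq()

print(len(daily), len(weekly), len(expanded))   # 14 2 8
print(weekly.tolist())                          # [28, 77]
```

Downsampling needs an aggregation such as sum() or mean(), whereas upsampling needs a rule for filling the new rows.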
