Dynamic Time Warping (DTW)

Dynamic Time Warping (DTW) is a widely used similarity measure in time-series analysis, particularly when the series have different lengths or are shifted, stretched, or compressed in time relative to each other. Euclidean distance compares two series point by point at the same time index. DTW, by contrast, allows non-linear warping of the time axis, so similar patterns can be aligned even when they occur at different times or speeds. DTW is commonly used in speech recognition, signal processing, and finance.
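A small synthetic example makes the contrast with Euclidean distance concrete. The sketch below assumes NumPy and uses illustrative names (`dtw`, `a`, `b`) and made-up data: the same bump appears in both series, but two time steps later in `b`. The `dtw` helper is a plain implementation of the DTW recurrence with absolute-difference cost and infinite boundary values.

```python
import numpy as np

def dtw(a, b):
    """Classic DTW with absolute-difference cost; O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # infinite boundary row and column
    D[0, 0] = 0.0                         # every warping path starts here
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same bump, with the second copy delayed by two time steps (illustrative data)
a = np.array([0, 0, 1, 3, 1, 0, 0, 0, 0], dtype=float)
b = np.array([0, 0, 0, 0, 1, 3, 1, 0, 0], dtype=float)

euclidean = np.linalg.norm(a - b)   # compares a[k] with b[k] only
warped = dtw(a, b)                  # may match a[k] with a different b[l]

print("Euclidean distance:", euclidean)
print("DTW distance:", warped)
```

Euclidean distance comes out as √20 ≈ 4.47 because the bumps sit at different indices, while the DTW distance is 0.0: the warping path lines the two bumps up and absorbs the delay in the flat stretches of zeros.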

DTW finds the optimal alignment between two time series by computing the distance between every pair of points (one from each series), accumulating these distances in a cumulative distance matrix, and finding the path through that matrix with the smallest total distance. This least-cost path, called the warping path, represents the optimal alignment.

Dynamic Time Warping (DTW) can be defined mathematically as follows.

Let ‘time_series_A’ and ‘time_series_B’ be two time series of lengths ‘n’ and ‘m’, respectively. Let D[i,j] (stored as ‘dtw_matrix[i, j]’ in the code below) denote the DTW distance between the first ‘i’ elements of ‘time_series_A’ and the first ‘j’ elements of ‘time_series_B’.

D is computed with the following recurrence relation:

D[0,0] = 0
D[i,0] = ∞ for 1 ≤ i ≤ n
D[0,j] = ∞ for 1 ≤ j ≤ m
D[i,j] = cost(i,j) + min(D[i-1,j], D[i,j-1], D[i-1,j-1]) for 1 ≤ i ≤ n and 1 ≤ j ≤ m

where ‘cost(i,j)’ is the cost of aligning the i-th element of ‘time_series_A’ with the j-th element of ‘time_series_B’. A common choice is the absolute difference between the two values, i.e., ‘cost(i,j) = abs(time_series_A[i-1] - time_series_B[j-1])’, where the indices are shifted by one because Python arrays are 0-indexed. The infinite boundary values ensure that every warping path starts by aligning the first elements of both series, and the DTW distance between the complete series is D[n,m].
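To see the recurrence at work, the sketch below builds the whole matrix of local costs cost(i,j) at once with NumPy broadcasting and then fills the cumulative matrix D cell by cell. It assumes the absolute-difference cost and the boundary values D[0,0] = 0 and D[i,0] = D[0,j] = ∞; the helper names `local_costs` and `accumulated_costs` are illustrative. The sample values are the same as in the Python example that follows.

```python
import numpy as np

def local_costs(a, b):
    """cost[i-1, j-1] = |a[i-1] - b[j-1]|, i.e. cost(i, j) for every pair at once."""
    return np.abs(a[:, None] - b[None, :])   # broadcasting builds an (n, m) matrix

def accumulated_costs(cost):
    """Fill the (n+1) x (m+1) cumulative matrix D from the DTW recurrence."""
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)      # D[i, 0] = D[0, j] = infinity ...
    D[0, 0] = 0.0                            # ... except the starting cell D[0, 0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D

A = np.array([7, 8, 9, 15], dtype=float)
B = np.array([4, 6, 7, 3], dtype=float)

cost = local_costs(A, B)
D = accumulated_costs(cost)
print(cost)
print(D[1:, 1:])    # drop the boundary row and column
print(D[-1, -1])    # the DTW distance D[n, m]
```

With this boundary initialization, the bottom-right cell D[4,4] for the sample series is 19.0.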

The following Python example computes the Dynamic Time Warping (DTW) distance between two time-series data sets:

import numpy as np

# Define two time-series data sets
time_series_A = np.array([7, 8, 9, 15])
time_series_B = np.array([4, 6, 7, 3])

# Compute the DTW distance between two time-series data sets
def dtw_distance(time_series_A, time_series_B):
    n = len(time_series_A)
    m = len(time_series_B)
    # Boundary conditions: D[0, 0] = 0, while the rest of row 0 and column 0
    # is infinite, so every warping path starts by aligning the first elements
    dtw_matrix = np.full((n + 1, m + 1), np.inf)
    dtw_matrix[0, 0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference between the two values
            cost = abs(time_series_A[i - 1] - time_series_B[j - 1])
            # Extend the cheapest of the three neighboring partial paths
            dtw_matrix[i, j] = cost + min(dtw_matrix[i - 1, j],
                                          dtw_matrix[i, j - 1],
                                          dtw_matrix[i - 1, j - 1])
    return dtw_matrix[n, m]

# Store the result under a new name so the function is not overwritten
distance = dtw_distance(time_series_A, time_series_B)
print("Dynamic Time Warping (DTW) distance :", distance)

Output:

Dynamic Time Warping (DTW) distance : 19.0

The Dynamic Time Warping (DTW) distance is computed with dynamic programming. Instead of enumerating every possible alignment path, the algorithm fills a cumulative cost matrix in which each cell D[i,j] holds the cost of the cheapest path that aligns the first i elements of one series with the first j elements of the other. The bottom-right cell, D[n,m], therefore holds the total cost of the lowest-cost path, which is the DTW distance between the two time-series data sets.
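The lowest-cost path itself can be recovered by walking backwards through the cumulative matrix from D[n,m] to D[1,1], each time stepping to the cheapest of the three predecessor cells. The sketch below is one way to do this, assuming NumPy and infinite boundary values; `dtw_with_path` is an illustrative name, and when several predecessors tie it simply prefers the diagonal step, so it returns one optimal path among possibly several.

```python
import numpy as np

def dtw_with_path(a, b):
    """Return the DTW distance and one optimal warping path as 0-based index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    # Backtrack from D[n, m] to D[1, 1], each time moving to the cheapest predecessor.
    # On ties, min() keeps the first candidate, so the diagonal step is preferred.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda cell: D[cell])
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n, m], path

A = np.array([7, 8, 9, 15])
B = np.array([4, 6, 7, 3])

distance, path = dtw_with_path(A, B)
path_cost = sum(abs(A[i] - B[j]) for i, j in path)   # re-add the local costs
print("DTW distance:", distance)
print("Warping path:", path)
print("Sum of local costs along the path:", path_cost)
```

For the sample series, the recovered path is [(0, 0), (1, 1), (2, 2), (3, 3)], and the local costs along it, 3 + 2 + 2 + 12, sum to the returned distance of 19.0.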

The advantages of Dynamic Time Warping (DTW) are as follows:

Robust to shifts, stretching, and compression along the time axis (but not to differences in amplitude or offset);
can compare time series of different lengths;
widely used in speech and gesture recognition.
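As an illustration of comparing series of different lengths, the sketch below (assuming NumPy; the `dtw` helper and the data are illustrative) compares a short pattern with a copy played at half speed, created by repeating every sample twice.

```python
import numpy as np

def dtw(a, b):
    """Classic DTW with absolute-difference cost and infinite boundary values."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

fast = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
slow = np.repeat(fast, 2)   # the same shape at half speed: length 10 instead of 5

print("DTW distance:", dtw(fast, slow))

try:
    np.linalg.norm(fast - slow)   # element-wise comparison needs equal lengths
except ValueError as err:
    print("Euclidean distance is undefined here:", err)
```

The DTW distance is 0.0, since each sample of `fast` can be matched with both of its copies in `slow`, whereas the element-wise subtraction needed for Euclidean distance fails because the arrays have shapes (5,) and (10,).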

The limitations of Dynamic Time Warping (DTW) are as follows:

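It is computationally expensive for long time series, since filling the full n × m matrix takes O(n·m) time and memory;
it is sensitive to noise and outliers, because every point must be aligned with at least one point of the other series, so an outlier cannot simply be skipped.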
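A common way to reduce the cost of DTW on long series, not covered in the text above, is to restrict the warping path to a band around the diagonal, known as the Sakoe-Chiba band. The sketch below assumes NumPy and an illustrative `window` parameter: only cells with |i - j| ≤ window are computed, which cuts the work from O(n·m) to roughly O(n·window). Because fewer paths are allowed, the banded result is never smaller than the unconstrained DTW distance and may be larger.

```python
import numpy as np

def dtw_band(a, b, window):
    """DTW restricted to a Sakoe-Chiba band: only cells with |i - j| <= window are computed."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))   # the band must be wide enough to reach cell (n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # Cells outside the band stay at infinity, so no path can pass through them
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Random series, used only to compare the two settings
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
b = rng.standard_normal(200)

full = dtw_band(a, b, window=len(a))   # band covers the whole matrix: ordinary DTW
narrow = dtw_band(a, b, window=10)     # at most 21 cells per row instead of 200

print("Unconstrained DTW:", full)
print("Banded DTW (window=10):", narrow)
```

The band trades exactness for speed; it does nothing about the second limitation, sensitivity to noise and outliers.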

Similarity Search for Time-Series Data

Time-series analysis is a statistical approach for analyzing data points recorded in time order. It involves examining past data to detect patterns, trends, and anomalies and then applying this knowledge to forecast future values. Time-series analysis is used in finance, economics, engineering, and healthcare, among other fields.

Time-series datasets are collections of data points that are recorded over time, such as stock prices, weather patterns, or sensor readings. In many real-world applications, it is often necessary to compare multiple time-series datasets to find similarities or differences between them.

Similarity search, which measures how similar two or more time-series data sets are, is a fundamental task in time-series analysis and an essential step in applications such as anomaly detection, clustering, and forecasting. In anomaly detection, for example, we may want to find data points that differ considerably from the predicted trend. In clustering, we may want to group time-series data sets that have similar patterns, and in forecasting, we may want to find the most similar historical data to help anticipate future trends.
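The simplest form of similarity search is a nearest-neighbor query: compare a query series with every series in a collection and return the closest one. The sketch below assumes NumPy, uses DTW as the distance, and performs a plain linear scan; the function name `nearest_neighbor` and the small collection are illustrative, and practical systems usually add indexing or pruning that is not shown here.

```python
import numpy as np

def dtw(a, b):
    """Classic DTW with absolute-difference cost and infinite boundary values."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nearest_neighbor(query, collection):
    """Linear scan: index of the series in `collection` closest to `query`, and its distance."""
    distances = [dtw(query, series) for series in collection]
    best = int(np.argmin(distances))
    return best, distances[best]

collection = [
    np.array([3.0, 2.0, 1.0, 2.0, 3.0]),   # a dip
    np.array([1.0, 2.0, 3.0, 2.0, 1.0]),   # a peak
    np.array([2.0, 2.0, 2.0, 2.0, 2.0]),   # a flat line
]
query = np.array([1.0, 1.0, 2.0, 3.0, 2.0, 1.0])   # a delayed peak, one sample longer

index, distance = nearest_neighbor(query, collection)
print("Most similar series:", index, "with DTW distance", distance)
```

The delayed peak is matched to the peak (index 1) with a DTW distance of 0.0, even though the two series have different lengths.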

Time-series analysis offers numerous approaches to similarity search, including Euclidean distance, dynamic time warping (DTW), and representation-based methods that compare compressed versions of the series, such as the discrete Fourier transform (DFT) and Symbolic Aggregate approXimation (SAX). The right choice depends on the purpose of the analysis, the size and complexity of the data set, and the amount of noise and outliers in the data.
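To illustrate a representation-based method, the sketch below follows the usual SAX recipe: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA) to a few segment means, and map each mean to a letter using breakpoints that divide the standard normal distribution into equally likely regions. It assumes NumPy, a 4-letter alphabet, and a non-constant input; the function name `sax` is illustrative, and full SAX implementations also define a distance between words, which is omitted here.

```python
import numpy as np

# Standard normal quantiles at 25%, 50% and 75%: they split N(0, 1) into
# 4 equally likely regions, one per letter of a 4-symbol alphabet.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax(series, n_segments):
    """Turn a series into a SAX word of n_segments letters (4-letter alphabet)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()   # z-normalize; assumes the series is not constant
    # Piecewise Aggregate Approximation (PAA): the mean of each of n_segments chunks
    paa = np.array([chunk.mean() for chunk in np.array_split(x, n_segments)])
    # Index of the region each PAA value falls into, from 0 ('a') to 3 ('d')
    symbols = np.searchsorted(BREAKPOINTS, paa)
    return "".join(ALPHABET[k] for k in symbols)

rising = np.arange(1, 9)         # 1, 2, ..., 8
print(sax(rising, 4))
print(sax(10 * rising + 5, 4))   # same shape after z-normalization, so same word
print(sax(rising[::-1], 4))
```

A steadily rising series becomes "abcd", a shifted and rescaled copy maps to the same word because z-normalization removes offset and scale, and the reversed series becomes "dcba". Comparing such short words is much cheaper than comparing the raw values.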

Although time-series analysis and similarity search are powerful tools, they have drawbacks. Handling missing data, working with large and complex data sets, and selecting an appropriate similarity measure can all be challenging. Many of these obstacles can be addressed with careful data preparation and the selection of suitable methods.
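A typical data-preparation step before computing similarities is to fill gaps and normalize each series. The sketch below assumes that missing values are stored as NaN and that linear interpolation is acceptable for the gaps; other imputation methods may suit a given data set better. The helper names `fill_missing` and `z_normalize` are illustrative.

```python
import numpy as np

def fill_missing(series):
    """Linearly interpolate NaN gaps (assumes at least one non-NaN value)."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(len(x))
    # Estimate each missing position from its nearest known neighbors
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

def z_normalize(series):
    """Rescale to mean 0 and standard deviation 1 so offset and amplitude do not dominate."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    return x - x.mean() if std == 0 else (x - x.mean()) / std

raw = np.array([10.0, np.nan, 14.0, 16.0, np.nan, np.nan, 22.0])
filled = fill_missing(raw)
clean = z_normalize(filled)

print("Filled:", filled)
print("Normalized:", clean)
```

Here the gaps are filled with 12, 18, and 20, and the normalized series has mean 0 and standard deviation 1 (up to floating-point rounding).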

Types of Similarity Measures

Time-series analysis is the process of reviewing past data to detect patterns, trends, and anomalies and then using this knowledge to forecast future trends. Similarity search, which measures how similar two or more time-series data sets are, is an essential problem in time-series analysis.

Similarity measures, which quantify how similar or dissimilar two time-series data sets are, are central to this task. This article covers several types of similarity measures commonly used in time-series analysis.