{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multidimensional Motif Discovery\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/TDAmeritrade/stumpy/main?filepath=notebooks/Tutorial_Multidimensional_Motif_Discovery.ipynb)\n", "\n", "## Finding a Motif in Multidimensional Time Series Data with MSTUMP\n", "\n", "This tutorial builds on the main takeaways from the [Matrix Profile VI](https://www.cs.ucr.edu/~eamonn/Motif_Discovery_ICDM.pdf) research paper and requires STUMPY v1.6.1 or newer. Also, the word “dimensionality” is overloaded for multi-dimensional time series since it is often used to refer both to the number of time series and to the number of data points in a subsequence. For clarity, we restrict our use of \"dimensions\" to refer only to the number of time series and not to the number of data points.\n", "\n", "Previously, we introduced a concept called [time series motifs](https://stumpy.readthedocs.io/en/latest/Tutorial_STUMPY_Basics.html), which are conserved patterns found within a 1-dimensional time series, $T$, that can be discovered by computing its [matrix profile](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html) using STUMPY. This process of computing a matrix profile with one time series is commonly known as a \"self-join\" since the subsequences within time series $T$ are only compared with other subsequences from $T$ itself. Since the first 1-dimensional motif discovery algorithm was introduced in 2002, a lot of effort has been made to generalize motif-finding to the multi-dimensional case. However, producing multi-dimensional matrix profiles is computationally expensive, so extra care must be taken to minimize the added time complexity. 
Also, while it may be tempting to find motifs in all available dimensions (i.e., a motif must exist in all dimensions and occur simultaneously), it has been shown that this rarely produces meaningful motifs except in the most contrived situations. Instead, given a set of time series dimensions, we should filter them down to a subset of \"useful\" dimensions before assigning a subsequence as a motif. For example, take a look at this motion capture of a boxer throwing some punches:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import IFrame\n", "\n", "IFrame(width=\"560\", height=\"315\", src=\"https://www.youtube.com/embed/2CQttFf2OhU\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we strictly focus on the boxer’s right arm in both cases, the two punches are almost identical. The positions of the right shoulder, right elbow, and right hand (a three-dimensional time series) are virtually indistinguishable. So, identifying this punching motif is relatively straightforward when we limit ourselves to only a subset of all of the available body movement dimensions. However, if we incorporate the full set of motion capture markers from all of the limbs (i.e., increasing the number of dimensions beyond three), the differences captured by the left arm and the subtle noise in the footwork actually drown out the similarity of the right-arm motions, making the previously identified punching motif impossible to find. This example demonstrates how classic multidimensional motif discovery algorithms are likely to fail since they try to use all of the available dimensions. 
So, not only do we need an efficient algorithm for computing the multi-dimensional matrix profile, but we also need an informed approach for selecting the relevant subset of dimensions to use when identifying multi-dimensional motifs.\n", "\n", "In this tutorial, we will explain precisely what a multi-dimensional matrix profile is, and then we'll learn how to compute it using the mstump function (i.e., \"multi-dimensional STUMP\") by exploring a simple toy dataset. To conclude, we'll see if we can identify a meaningful sub-dimensional motif (i.e., one that only uses a subset of dimensions) in this multi-dimensional time series data.\n", "\n", "## Getting Started\n", "\n", "Let's import the packages that we'll need to load, analyze, and plot the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import stumpy\n", "import matplotlib.pyplot as plt\n", "\n", "plt.style.use('https://raw.githubusercontent.com/TDAmeritrade/stumpy/main/docs/stumpy.mplstyle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading and Visualizing the Toy Data\n", "\n", "In this example data, we have a 3-dimensional time series labeled T1, T2, and T3. Can you spot where the motif is? Does that motif exist in one, two, or all three dimensions?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
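\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>T1</th>\n", "      <th>T2</th>\n", "      <th>T3</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0.565117</td>\n", "      <td>0.637180</td>\n", "      <td>0.741822</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>0.493513</td>\n", "      <td>0.629415</td>\n", "      <td>0.739731</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>0.469350</td>\n", "      <td>0.539220</td>\n", "      <td>0.718757</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>0.444100</td>\n", "      <td>0.577670</td>\n", "      <td>0.730169</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>0.373008</td>\n", "      <td>0.570180</td>\n", "      <td>0.752406</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "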