{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fast Pattern Matching\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/TDAmeritrade/stumpy/main?filepath=notebooks/Tutorial_Pattern_Matching.ipynb)\n", "\n", "## Beyond Matrix Profiles\n", "\n", "At the core of STUMPY, one can take any time series data and efficiently compute something called a [matrix profile](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html), which essentially scans along your entire time series with a fixed window size, `m`, and finds the exact nearest neighbor for every subsequence within your time series. A matrix profile allows you to determine if there are any conserved behaviors (i.e., conserved subsequences/patterns) within your data and, if so, it can tell you exactly where they are located within your time series. In a [previous tutorial](https://stumpy.readthedocs.io/en/latest/Tutorial_STUMPY_Basics.html), we demonstrated how to use STUMPY to easily obtain a matrix profile, learned how to interpret the results, and discovered meaningful motifs and discords. While this brute-force approach may be very useful when you don't know what pattern or conserved behavior you are looking for, it can become quite expensive to perform this exhaustive pairwise search on sufficiently large datasets.\n", "\n", "However, if you already have a specific user-defined pattern in mind, then you don't actually need to compute the full matrix profile! For example, maybe you've identified an interesting trading strategy based on historical stock market data and you'd like to see if that specific pattern may have been observed in the past within one or more stock ticker symbols. 
In that case, searching for a known pattern or \"query\" is actually quite straightforward and can be accomplished quickly by using the wonderful `stumpy.mass` function in STUMPY.\n", "\n", "In this short tutorial, we'll take a simple known pattern of interest (i.e., a query subsequence) and we'll search for this pattern in a separate, independent time series. Let's get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started\n", "\n", "Let's import the packages that we'll need to load, analyze, and plot the data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "import stumpy\n", "import numpy as np\n", "import numpy.testing as npt\n", "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Rectangle\n", "\n", "plt.style.use('https://raw.githubusercontent.com/TDAmeritrade/stumpy/main/docs/stumpy.mplstyle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the Sony AIBO Robot Dog Dataset\n", "\n", "The time series data (below), `T_df`, has `n = 13000` data points and was collected from an accelerometer inside a [Sony AIBO robot dog](https://en.wikipedia.org/wiki/AIBO), which tracked the robot dog as it walked from a cement surface onto a carpeted surface and, finally, back to the cement surface:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Acceleration | \n", "
---|---|
0 | \n", "0.89969 | \n", "
1 | \n", "0.89969 | \n", "
2 | \n", "0.89969 | \n", "
3 | \n", "0.89969 | \n", "
4 | \n", "0.89969 | \n", "