STUMPY API

Overview

stumpy.stump Compute the z-normalized matrix profile
stumpy.stumped Compute the z-normalized matrix profile with a distributed dask cluster
stumpy.gpu_stump Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump Compute an approximate z-normalized matrix profile
stumpy.stumpi Compute an incremental z-normalized matrix profile for streaming data
stumpy.mstump Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped Compute the multi-dimensional z-normalized matrix profile with a distributed dask cluster
stumpy.aamp Compute the non-normalized (i.e., without z-normalization) matrix profile
stumpy.aamped Compute the non-normalized (i.e., without z-normalization) matrix profile with a distributed dask cluster
stumpy.gpu_aamp Compute the non-normalized (i.e., without z-normalization) matrix profile with one or more GPU devices
stumpy.aampi Compute an incremental non-normalized (i.e., without z-normalization) matrix profile for streaming data
stumpy.atsc Compute the anchored time series chain (ATSC)
stumpy.allc Compute the all-chain set (ALLC)
stumpy.fluss Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
stumpy.floss Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data

stump

stumpy.stump(T_A, m, T_B=None, ignore_trivial=True)

Compute the z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.

Parameters:
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.

Return type:

ndarray

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
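The link between the Pearson correlation and the z-normalized Euclidean distance can be checked numerically. The sketch below is plain Python and does not use the library; the helper names and the sample values are made up for illustration. It relies on the identity d = sqrt(2 * m * (1 - rho)), which holds when the standard deviation is computed over the m points of each subsequence.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    za, zb = znorm(a), znorm(b)
    return sum(p * q for p, q in zip(za, zb)) / len(a)

# Two illustrative subsequences of length m = 4 (made-up values).
a = [1.0, 3.0, 2.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0]
m = len(a)

# Distance computed directly on the z-normalized subsequences ...
d_direct = math.dist(znorm(a), znorm(b))
# ... and the same distance recovered from the correlation alone.
d_from_rho = math.sqrt(2 * m * (1 - pearson(a, b)))
```

Because the two quantities agree, an algorithm can track correlations along a diagonal and convert to distances only at the end.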

DOI: 10.1109/ICDM.2016.0085

See Table II

The time series, T_A, will be annotated with the distance and location (or index) of the nearest neighbor of each of its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.

Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).

For self-joins, set ignore_trivial = True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.
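What a self-join computes can be shown with a brute-force sketch. It is plain Python, far slower than the library, and not the library's implementation. The function name naive_self_join and the sample series are hypothetical. It assumes an exclusion zone of ceil(m / 4) on each side of a subsequence, and that no subsequence is constant.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def naive_self_join(T, m):
    """Brute-force z-normalized matrix profile P and indices I (self-join)."""
    n_sub = len(T) - m + 1            # number of length-m subsequences
    excl = math.ceil(m / 4)           # assumed exclusion zone half-width
    subs = [znorm(T[i:i + m]) for i in range(n_sub)]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:    # skip the trivial matches around i
                continue
            d = math.dist(subs[i], subs[j])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I

# Made-up series in which the pattern [1, 3, 2] occurs at positions 1 and 5.
T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 4.0]
P, I = naive_self_join(T, m=3)
```

Here P and I have length len(T) - m + 1, and the two occurrences of the repeated pattern point at each other with a distance of zero.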

stumped

stumpy.stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True)

Compute the z-normalized matrix profile with a distributed dask cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.

Parameters:
  • dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.

Return type:

ndarray

Notes

DOI: 10.1007/s10115-017-1138-x

See Section 4.5

The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.

DOI: 10.1145/3357223.3362721

See Section 3.1 and Section 3.3

The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.

DOI: 10.1109/ICDM.2016.0085

See Table II

This is a Dask distributed implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized stump._stump function.

The time series, T_A, will be annotated with the distance and location (or index) of the nearest neighbor of each of its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.

Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).

For self-joins, set ignore_trivial = True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.
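The AB-join case with series of different lengths can be sketched in the same brute-force style. This is plain Python with no Dask involved, and it only illustrates the result, not how the work is distributed. The name naive_ab_join and the data are hypothetical. No exclusion zone is applied because the two series are distinct.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def naive_ab_join(T_A, T_B, m):
    """For every length-m subsequence of T_A, find its nearest neighbor in T_B."""
    subs_A = [znorm(T_A[i:i + m]) for i in range(len(T_A) - m + 1)]
    subs_B = [znorm(T_B[j:j + m]) for j in range(len(T_B) - m + 1)]
    P, I = [], []
    for a in subs_A:
        dists = [math.dist(a, b) for b in subs_B]
        j = min(range(len(dists)), key=dists.__getitem__)  # first minimum
        P.append(dists[j])
        I.append(j)
    return P, I

# Made-up series of different lengths; [1, 3, 2] appears in both.
T_A = [1.0, 3.0, 2.0, 5.0, 4.0]
T_B = [0.0, 9.0, 1.0, 3.0, 2.0, 8.0, 6.0, 7.0]
P_ab, I_ab = naive_ab_join(T_A, T_B, m=3)
```

The output length depends only on T_A, while the stored indices refer to positions in T_B.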

gpu_stump

stumpy.gpu_stump(*args, **kwargs)

Compute the z-normalized matrix profile with one or more GPU devices

This is a convenience wrapper around the Numba cuda.jit _gpu_stump function which computes the matrix profile according to GPU-STOMP.

Parameters:
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B ((optional) ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
  • device_id (int or list) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in cuda.list_devices()].
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.

Return type:

ndarray

Notes

DOI: 10.1109/ICDM.2016.0085

See Table II, Figure 5, and Figure 6

The time series, T_A, will be annotated with the distance and location (or index) of the nearest neighbor of each of its subsequences in another time series, T_B.

Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.

Note: Unlike in Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.

Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).

For self-joins, set ignore_trivial = True in order to avoid the trivial match.

Note that left and right matrix profiles are only available for self-joins.

scrump

stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None)

Compute an approximate z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to SCRIMP.

Parameters:
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded.
  • m (int) – Window size
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
  • percentage (float) – Approximate percentage completed. The value is between 0.0 and 1.0.
  • pre_scrump (bool) – A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++ and may lead to faster convergence
  • s (int) – The size of the PreSCRIMP fixed interval. If pre_scrump=True and s=None, then s will automatically be set to s=int(np.ceil(m/4)), the size of the exclusion zone.
stumpy.P_

The updated matrix profile

Type: ndarray

stumpy.I_

The updated matrix profile indices

Type: ndarray

stumpy.update()

Update the matrix profile and the matrix profile indices by computing additional new distances (limited by percentage) that make up the full distance matrix.

Notes

DOI: 10.1109/ICDM.2018.00099

See Algorithm 1 and Algorithm 2
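An approximate profile can be built by visiting only a fraction of the distance matrix, as the following sketch shows. It is plain Python and does not reproduce the library's code or PreSCRIMP. It samples a random subset of diagonals for a self-join. The name approx_self_join, the seed argument, and the data are hypothetical.

```python
import math
import random

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def approx_self_join(T, m, percentage=0.01, seed=0):
    """Approximate matrix profile from a random sample of diagonals."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)                    # assumed exclusion zone
    subs = [znorm(T[i:i + m]) for i in range(n_sub)]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    diags = list(range(excl + 1, n_sub))       # diagonal k holds pairs (i, i + k)
    random.Random(seed).shuffle(diags)
    n_keep = math.ceil(percentage * len(diags))
    for k in diags[:n_keep]:
        for i in range(n_sub - k):
            j = i + k
            d = math.dist(subs[i], subs[j])
            if d < P[i]:                       # each distance updates both ends
                P[i], I[i] = d, j
            if d < P[j]:
                P[j], I[j] = d, i
    return P, I

T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 4.0]
full_P, full_I = approx_self_join(T, 3, percentage=1.0)   # every diagonal
part_P, part_I = approx_self_join(T, 3, percentage=0.3)   # a sample only
```

Every approximate value is an upper bound on the exact one. Entries whose diagonals were never sampled stay at infinity, and further sampling can only lower them.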

stumpi

stumpy.stumpi(T, m, excl_zone=None, egress=True)

Compute an incremental z-normalized matrix profile for streaming data. This is based on the on-line STOMPI and STAMPI algorithms.

Parameters:
  • T (ndarray) – The time series or sequence for which the matrix profile and matrix profile indices will be returned
  • m (int) – Window size
  • excl_zone (int) – The half width for the exclusion zone relative to the current sliding window
stumpy.P_

The updated matrix profile for T

Type: ndarray

stumpy.I_

The updated matrix profile indices for T

Type: ndarray

stumpy.left_P_

The updated left matrix profile for T

Type: ndarray

stumpy.left_I_

The updated left matrix profile indices for T

Type: ndarray

stumpy.T_

The updated time series or sequence for which the matrix profile and matrix profile indices are computed

Type: ndarray

stumpy.update(t)

Append a single new data point, t, to the time series, T, and update the matrix profile

Notes

DOI: 10.1007/s10618-017-0519-9

See Table V

Note that line 11 is missing an important sqrt operation!
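The incremental update can be sketched as follows. Each new point creates one new subsequence, which is compared against all earlier ones, and both ends of every comparison are updated. This is an append-only illustration in plain Python, with no egress of old points, and it is not the library's algorithm. The class name NaiveStream and the data are hypothetical. Note the explicit square root, which math.dist applies.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

class NaiveStream:
    """Hypothetical helper that keeps P and I current as points arrive."""

    def __init__(self, T, m):
        self.T, self.m = list(T), m
        self.excl = math.ceil(m / 4)          # assumed exclusion zone
        self.P, self.I = [], []
        # Seed the profile by replaying the initial subsequences in order.
        for i in range(len(self.T) - m + 1):
            self._add_subsequence(i)

    def _add_subsequence(self, i):
        """Compare new subsequence i with all earlier, non-trivial ones."""
        new = znorm(self.T[i:i + self.m])
        self.P.append(math.inf)
        self.I.append(-1)
        for j in range(i - self.excl):        # only j with i - j > excl
            d = math.dist(new, znorm(self.T[j:j + self.m]))
            if d < self.P[i]:
                self.P[i], self.I[i] = d, j
            if d < self.P[j]:
                self.P[j], self.I[j] = d, i

    def update(self, t):
        """Append one data point and refresh the profile."""
        self.T.append(t)
        self._add_subsequence(len(self.T) - self.m)

T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 4.0]
stream = NaiveStream(T[:6], m=3)
for t in T[6:]:
    stream.update(t)
```

Each call to update costs one pass over the existing subsequences instead of a full recomputation.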

mstump

stumpy.mstump(T, m, include=None, discords=False)

Compute the multi-dimensional z-normalized matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile and multi-dimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.

Parameters:
  • T (ndarray) – The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
  • m (int) – Window size
  • include (list, ndarray) –

    A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

    DOI: 10.1109/ICDM.2017.66

  • discords (bool) – When set to True, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
Returns:

  • P (ndarray) – The multi-dimensional matrix profile. Each column of the array corresponds to each matrix profile for a given dimension (i.e., the first column is the 1-D matrix profile and the second column is the 2-D matrix profile).
  • I (ndarray) – The multi-dimensional matrix profile index where each column of the array corresponds to each matrix profile index for a given dimension.

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm
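The idea behind a multi-dimensional profile can be sketched in plain Python. For each pair of positions, the per-dimension distances are sorted and the k smallest are averaged, which gives the k-dimensional value for that pair. This is a reading of the mSTAMP approach written for illustration, not the library's code, and it ignores include and discords. The name naive_mstump and the data are hypothetical. The sketch returns one list per number of dimensions, which may differ from the library's array layout.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def naive_mstump(T, m):
    """T is a list of equal-length rows, one row per dimension."""
    n_dim = len(T)
    n_sub = len(T[0]) - m + 1
    excl = math.ceil(m / 4)                   # assumed exclusion zone
    subs = [[znorm(row[i:i + m]) for i in range(n_sub)] for row in T]
    P = [[math.inf] * n_sub for _ in range(n_dim)]
    I = [[-1] * n_sub for _ in range(n_dim)]
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            # Per-dimension distances for the pair (i, j), smallest first.
            d = sorted(math.dist(subs[k][i], subs[k][j]) for k in range(n_dim))
            total = 0.0
            for k in range(n_dim):
                total += d[k]
                avg = total / (k + 1)         # mean over the k + 1 best dimensions
                if avg < P[k][i]:
                    P[k][i], I[k][i] = avg, j
    return P, I

T_multi = [
    [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 4.0],
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 8.0, 1.0, 9.0, 2.0],
]
P_multi, I_multi = naive_mstump(T_multi, m=3)
```

P_multi[0] is the best single-dimension profile and P_multi[1] uses both dimensions, so the values can only grow as more dimensions are required to match.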

mstumped

stumpy.mstumped(dask_client, T, m, include=None, discords=False)

Compute the multi-dimensional z-normalized matrix profile with a distributed dask cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile according to STOMP. Note that only self-joins are supported.

Parameters:
  • dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
  • T (ndarray) – The time series or sequence for which to compute the multi-dimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
  • m (int) – Window size
  • include (list, ndarray) –

    A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:

    DOI: 10.1109/ICDM.2017.66

  • discords (bool) – When set to True, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
Returns:

  • P (ndarray) – The multi-dimensional matrix profile. Each column of the array corresponds to each matrix profile for a given dimension (i.e., the first column is the 1-D matrix profile and the second column is the 2-D matrix profile).
  • I (ndarray) – The multi-dimensional matrix profile index where each column of the array corresponds to each matrix profile index for a given dimension.

Notes

DOI: 10.1109/ICDM.2017.66

See mSTAMP Algorithm

aamp

stumpy.aamp(T_A, m, T_B=None, ignore_trivial=True)

Compute the non-normalized (i.e., without z-normalization) matrix profile

This is a convenience wrapper around the Numba JIT-compiled parallelized _aamp function which computes the matrix profile according to AAMP.

Parameters:
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices.

Return type:

ndarray

Notes

arXiv:1901.05708

See Algorithm 1

Note that we have extended this algorithm for AB-joins as well.
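The effect of skipping z-normalization can be seen in a brute-force sketch that compares raw values directly. It is plain Python and not the AAMP algorithm itself. The name naive_aamp and the data are hypothetical, and an exclusion zone of ceil(m / 4) is assumed.

```python
import math

def naive_aamp(T, m):
    """Brute-force non-normalized matrix profile P and indices I (self-join)."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)                   # assumed exclusion zone
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            # Raw Euclidean distance: offset and scale are NOT removed.
            d = math.dist(T[i:i + m], T[j:j + m])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I

# [1, 2, 1] occurs at positions 0 and 8; [11, 12, 11] at position 4 has the
# same shape but is shifted up by 10.
T_raw = [1.0, 2.0, 1.0, 9.0, 11.0, 12.0, 11.0, 4.0, 1.0, 2.0, 1.0]
P_raw, I_raw = naive_aamp(T_raw, m=3)
```

The exact repeats at positions 0 and 8 match with distance zero, while the shifted copy at position 4 does not match them. Under z-normalization all three would be identical.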

aamped

stumpy.aamped(dask_client, T_A, m, T_B=None, ignore_trivial=True)

Compute the non-normalized (i.e., without z-normalization) matrix profile with a distributed dask cluster

This is a highly distributed implementation around the Numba JIT-compiled parallelized _aamp function which computes the non-normalized matrix profile according to AAMP.

Parameters:
  • dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices.

Return type:

ndarray

Notes

arXiv:1901.05708

See Algorithm 1

Note that we have extended this algorithm for AB-joins as well.

gpu_aamp

stumpy.gpu_aamp(*args, **kwargs)

Compute the non-normalized (i.e., without z-normalization) matrix profile with one or more GPU devices

This is a convenience wrapper around the Numba cuda.jit _gpu_aamp function which computes the non-normalized matrix profile according to a modified version of GPU-STOMP.

Parameters:
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile
  • m (int) – Window size
  • T_B ((optional) ndarray) – The time series or sequence that contain your query subsequences of interest. Default is None which corresponds to a self-join.
  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
  • device_id (int or list) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in cuda.list_devices()].
Returns:

out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.

Return type:

ndarray

Notes

arXiv:1901.05708

See Algorithm 1

Note that we have extended this algorithm for AB-joins as well.

DOI: 10.1109/ICDM.2016.0085

See Table II, Figure 5, and Figure 6

aampi

stumpy.aampi(T, m, excl_zone=None, egress=True)

Compute an incremental non-normalized (i.e., without z-normalization) matrix profile for streaming data

Parameters:
  • T (ndarray) – The time series or sequence for which the non-normalized matrix profile and matrix profile indices will be returned
  • m (int) – Window size
  • excl_zone (int) – The half width for the exclusion zone relative to the current sliding window
stumpy.P_

The updated matrix profile for T

Type: ndarray

stumpy.I_

The updated matrix profile indices for T

Type: ndarray

stumpy.left_P_

The updated left matrix profile for T

Type: ndarray

stumpy.left_I_

The updated left matrix profile indices for T

Type: ndarray

stumpy.T_

The updated time series or sequence for which the matrix profile and matrix profile indices are computed

Type: ndarray

stumpy.update(t)

Append a single new data point, t, to the time series, T, and update the matrix profile

Notes

arXiv:1901.05708

See Algorithm 1

Note that we have extended this algorithm for AB-joins as well.

atsc

stumpy.atsc(IL, IR, j)

Compute the anchored time series chain (ATSC)

Parameters:
  • IL (ndarray) – Left matrix profile indices
  • IR (ndarray) – Right matrix profile indices
  • j (int) – The index value for which to compute the ATSC
Returns:

output – Anchored time series chain for index, j

Return type:

ndarray

Notes

DOI: 10.1109/ICDM.2017.79

See Table I

This is the implementation for the anchored time series chains (ATSC).

Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
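The chain-following logic with a bounded for-loop can be sketched as below. The sketch uses plain Python lists and a hypothetical function name, anchored_chain. It assumes that -1 marks a missing left or right neighbor. A link from j to IR[j] is followed only when it is mutual, that is, when IL[IR[j]] == j.

```python
def anchored_chain(IL, IR, j):
    """Follow right-neighbor links from j for as long as each link is mutual."""
    chain = [j]
    for _ in range(len(IL)):          # bounded: a chain cannot exceed len(IL) links
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break                     # no right neighbor, or the link is one-way
        j = nxt
        chain.append(j)
    return chain

# Made-up left/right index arrays for five subsequences.
IL = [-1, 0, 1, 1, 3]
IR = [1, 3, 4, 4, -1]

chain_from_0 = anchored_chain(IL, IR, 0)
chain_from_2 = anchored_chain(IL, IR, 2)
```

Starting at 0 the chain runs 0, 1, 3, 4. Starting at 2 it stops immediately, because the right neighbor of 2 is 4 but the left neighbor of 4 is 3.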

allc

stumpy.allc(IL, IR)

Compute the all-chain set (ALLC)

Parameters:
  • IL (ndarray) – Left matrix profile indices
  • IR (ndarray) – Right matrix profile indices
Returns:

  • S (list(ndarray)) – All-chain set
  • C (ndarray) – Anchored time series chain for the longest chain (also known as the unanchored chain)

Notes

DOI: 10.1109/ICDM.2017.79

See Table II

Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.

This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.

The all-chain set, S, is returned as a list of unique numpy arrays.
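The all-chain set can be sketched by growing a chain from every position that has not already been absorbed into an earlier chain, then picking the longest. This is a plain-Python illustration with a hypothetical function name, all_chains. It assumes that -1 marks a missing neighbor, and it returns lists rather than arrays.

```python
def all_chains(IL, IR):
    """Return every chain plus the longest one (the unanchored chain)."""
    n = len(IL)
    length = [1] * n                  # -1 marks positions absorbed by a chain
    chains = []
    for i in range(n):
        if length[i] != 1:            # already part of an earlier chain
            continue
        j, chain = i, [i]
        for _ in range(n):            # bounded loop instead of a while-loop
            nxt = IR[j]
            if nxt == -1 or IL[nxt] != j:
                break
            j = nxt
            length[j] = -1            # j must not start a chain of its own
            length[i] += 1
            chain.append(j)
        chains.append(chain)
    longest = max(chains, key=len)    # first one wins on ties
    return chains, longest

# Same made-up index arrays: one chain of four and one singleton.
IL = [-1, 0, 1, 1, 3]
IR = [1, 3, 4, 4, -1]
S, C = all_chains(IL, IR)
```

Each chain starts at a distinct position, so the chains in S are unique without any extra de-duplication step.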

fluss

stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)

Essentially, this is a wrapper to compute the corrected arc curve and regime locations.

Parameters:
  • I (ndarray) – The matrix profile indices for the time series of interest
  • L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
  • n_regimes (int) – The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.
  • m (int) – The subsequence length. This is expected to be the same value as the window size used to compute the matrix profile and matrix profile index.
  • excl_factor (int) – The multiplying factor for the regime exclusion zone
  • custom_iac (np.array) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve
Returns:

  • cac (ndarray) – A corrected arc curve (CAC)
  • regime_locs (ndarray) – The locations of the regimes

Notes

DOI: 10.1109/ICDM.2017.21

See Section A

This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).
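A corrected arc curve can be sketched in plain Python as follows. The sketch counts, for each position, how many nearest-neighbor arcs pass over it, divides by an idealized curve, and pins the edges to 1. It assumes a simple parabolic IAC and that every entry of I is a valid index. The library's own IAC may be computed differently. The function name and the data are hypothetical.

```python
def corrected_arc_curve(I, L, excl_factor=5):
    """Arc curve divided by a parabolic idealized arc curve, clipped to 1."""
    n = len(I)
    marks = [0] * (n + 1)
    for i, j in enumerate(I):
        lo, hi = min(i, j), max(i, j)
        marks[lo] += 1                # an arc opens at its left end
        marks[hi] -= 1                # and closes at its right end
    ac, running = [], 0
    for k in range(n):
        running += marks[k]
        ac.append(running)            # number of arcs crossing position k
    # Assumed idealized arc curve: a parabola that peaks in the middle.
    iac = [2.0 * k * (n - k) / n for k in range(n)]
    cac = [min(ac[k] / iac[k], 1.0) if iac[k] > 0 else 1.0 for k in range(n)]
    zone = L * excl_factor            # edges have too few arcs to be trusted
    for k in range(n):
        if k < zone or k >= n - zone:
            cac[k] = 1.0
    return cac

# Made-up indices: positions 0-9 only point inside 0-9, and 10-19 inside 10-19.
I_seg = ([i + 5 for i in range(5)] + [i - 5 for i in range(5, 10)]
         + [i + 5 for i in range(10, 15)] + [i - 5 for i in range(15, 20)])
cac = corrected_arc_curve(I_seg, L=1, excl_factor=2)
boundary = cac.index(min(cac))
```

No arc crosses position 9, so the curve reaches zero there, and the minimum marks the change between the two regimes.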

floss

stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None)

Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data

Parameters:
  • mp (ndarray) – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
  • T (ndarray) – A 1-D time series used to generate the matrix profile and matrix profile indices found in mp. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from T instead of using the bidirectional matrix profile.
  • m (int) – The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the L parameter.
  • L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
  • excl_factor (int) – The multiplying factor for the regime exclusion zone. Note that this is unrelated to the excl_zone used to compute the matrix profile.
  • n_iter (int) – Number of iterations to average over when determining the parameters for the IAC beta distribution
  • n_samples (int) – Number of distribution samples to draw during each iteration when computing the IAC
  • custom_iac (np.array) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve
stumpy.cac_1d_

A 1-dimensional corrected arc curve (CAC) updated as a result of ingressing a single new data point and egressing a single old data point.

Type: ndarray

stumpy.P_

The matrix profile updated as a result of ingressing a single new data point and egressing a single old data point.

Type: ndarray

stumpy.I_

The (right) matrix profile indices updated as a result of ingressing a single new data point and egressing a single old data point.

Type: ndarray

stumpy.T_

The updated time series, T

Type: ndarray

stumpy.update(t)

Ingress a new data point, t, onto the time series, T, followed by egressing the oldest single data point from T. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.

Notes

DOI: 10.1109/ICDM.2017.21 <https://www.cs.ucr.edu/~eamonn/Segmentation_ICDM.pdf>

See Section C

This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).
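The ingress and egress step keeps the series at a fixed length, which can be sketched with a bounded deque from the standard library. This shows only the window bookkeeping, not the update of the corrected arc curve or the matrix profile. The class name SlidingSeries is hypothetical.

```python
from collections import deque

class SlidingSeries:
    """Hypothetical fixed-length buffer for one ingress/egress step."""

    def __init__(self, T):
        # A full deque with maxlen drops its oldest item on every append.
        self.T = deque(T, maxlen=len(T))

    def update(self, t):
        """Ingress t at the right end; the oldest point is egressed."""
        self.T.append(t)

window = SlidingSeries([1.0, 2.0, 3.0, 4.0])
window.update(5.0)
current = list(window.T)
```

After one update the window holds the four most recent points, so any per-position result computed on it keeps a constant length as the stream advances.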