STUMPY API
Overview
stumpy.stump
Compute the z-normalized matrix profile
stumpy.stumped
Compute the z-normalized matrix profile with a distributed dask cluster
stumpy.gpu_stump
Compute the z-normalized matrix profile with one or more GPU devices
stumpy.scrump
Compute an approximate z-normalized matrix profile
stumpy.stumpi
Compute an incremental z-normalized matrix profile for streaming data
stumpy.mstump
Compute the multidimensional z-normalized matrix profile
stumpy.mstumped
Compute the multidimensional z-normalized matrix profile with a distributed dask cluster
stumpy.aamp
Compute the non-normalized (i.e., without z-normalization) matrix profile
stumpy.aamped
Compute the non-normalized (i.e., without z-normalization) matrix profile
stumpy.gpu_aamp
Compute the non-normalized (i.e., without z-normalization) matrix profile with one or more GPU devices
stumpy.aampi
Compute an incremental non-normalized (i.e., without z-normalization) matrix profile for streaming data
stumpy.atsc
Compute the anchored time series chain (ATSC)
stumpy.allc
Compute the all-chain set (ALLC)
stumpy.fluss
Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
stumpy.floss
Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
stump

stumpy.stump(T_A, m, T_B=None, ignore_trivial=True)
Compute the z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.
Parameters:  T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
Return type: ndarray
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
The time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. The left and right matrix profiles are also returned.
Note: Unlike Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. When T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
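The notes above (output length, exclusion zone, trivial matches) can be illustrated with a brute-force self-join. This is a minimal pure-Python sketch, not the STUMPY implementation: the helper names znorm and naive_matrix_profile are hypothetical, and it assumes the exclusion zone is ceil(m/4) indices on either side of each subsequence.

```python
import math


def znorm(x):
    """Return a z-normalized copy of x (a constant input maps to zeros)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in x]


def naive_matrix_profile(T, m):
    """Brute-force self-join in O(n^2 * m) time, for illustration only."""
    n_subs = len(T) - m + 1          # output length: len(T) - m + 1
    excl = math.ceil(m / 4)          # assumed exclusion zone of m/4
    subs = [znorm(T[i:i + m]) for i in range(n_subs)]
    P = [math.inf] * n_subs          # matrix profile (distances)
    I = [-1] * n_subs                # matrix profile indices
    for i in range(n_subs):
        for j in range(n_subs):
            if abs(i - j) <= excl:   # skip trivial matches
                continue
            d = math.dist(subs[i], subs[j])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I


T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 0.0, 5.0, 4.0]
P, I = naive_matrix_profile(T, m=3)
```

In this toy series the pattern 1, 3, 2 occurs at indices 1 and 5, so those two subsequences are each other's nearest neighbors.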
stumped

stumpy.stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True)
Compute the z-normalized matrix profile with a distributed dask cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.
Parameters:  dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
 T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
Return type: ndarray
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
This is a Dask distributed implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized stump._stump function
The time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. The left and right matrix profiles are also returned.
Note: Unlike Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. When T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
gpu_stump

stumpy.gpu_stump(*args, **kwargs)
Compute the z-normalized matrix profile with one or more GPU devices
This is a convenience wrapper around the Numba cuda.jit _gpu_stump function which computes the matrix profile according to GPU-STOMP.
Parameters:  T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B ((optional) ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
 device_id (int or list) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in cuda.list_devices()].
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
Return type: ndarray
Notes
See Table II, Figure 5, and Figure 6
The time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. The left and right matrix profiles are also returned.
Note: Unlike Table II, where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. When T_A.shape == T_B.shape, our algorithm reduces to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
scrump

stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None)
Compute an approximate z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to SCRIMP.
Parameters:  T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded.
 m (int) – Window size
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
 percentage (float) – Approximate percentage completed. The value is between 0.0 and 1.0.
 pre_scrump (bool) – A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++ and may lead to faster convergence.
 s (int) – The size of the PreSCRIMP fixed interval. If pre_scrump=True and s=None, then s will automatically be set to s=int(np.ceil(m/4)), the size of the exclusion zone.

stumpy.P_
The updated matrix profile
Type: ndarray

stumpy.I_
The updated matrix profile indices
Type: ndarray

stumpy.update()
Update the matrix profile and the matrix profile indices by computing additional new distances (limited by percentage) that make up the full distance matrix.
Notes
See Algorithm 1 and Algorithm 2
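The idea of an approximate profile controlled by percentage can be sketched by evaluating only a random fraction of the diagonals of the distance matrix. This is an illustrative pure-Python sketch under stated assumptions, not the SCRIMP algorithm as implemented in STUMPY: approx_matrix_profile and its seed argument are hypothetical, and the exclusion zone is assumed to be ceil(m/4).

```python
import math
import random


def znorm(x):
    """Return a z-normalized copy of x (a constant input maps to zeros)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in x]


def approx_matrix_profile(T, m, percentage=0.5, seed=0):
    """Self-join that visits only a random fraction of the diagonals."""
    n_subs = len(T) - m + 1
    excl = math.ceil(m / 4)
    subs = [znorm(T[i:i + m]) for i in range(n_subs)]
    P = [math.inf] * n_subs
    I = [-1] * n_subs
    diags = list(range(excl + 1, n_subs))    # diagonals outside the exclusion zone
    random.Random(seed).shuffle(diags)
    n_keep = math.ceil(percentage * len(diags))
    for k in diags[:n_keep]:
        for i in range(n_subs - k):
            j = i + k
            d = math.dist(subs[i], subs[j])
            if d < P[i]:                     # each pair updates both endpoints
                P[i], I[i] = d, j
            if d < P[j]:
                P[j], I[j] = d, i
    return P, I


T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 7.0, 0.0, 5.0, 4.0]
exact_P, exact_I = approx_matrix_profile(T, 3, percentage=1.0)
approx_P, approx_I = approx_matrix_profile(T, 3, percentage=0.5)
```

Because the approximate run sees a subset of the same distances, every value in approx_P is an upper bound on the corresponding value in exact_P, and the bound tightens as more diagonals are visited.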
stumpi

stumpy.stumpi(T, m, excl_zone=None, egress=True)
Compute an incremental z-normalized matrix profile for streaming data. This is based on the online STOMPI and STAMPI algorithms.
Parameters: 
stumpy.P_
The updated matrix profile for T
Type: ndarray

stumpy.I_
The updated matrix profile indices for T
Type: ndarray

stumpy.left_P_
The updated left matrix profile for T
Type: ndarray

stumpy.left_I_
The updated left matrix profile indices for T
Type: ndarray

stumpy.T_
The updated time series or sequence for which the matrix profile and matrix profile indices are computed
Type: ndarray

stumpy.update(t)
Append a single new data point, t, to the time series, T, and update the matrix profile
Notes
DOI: 10.1007/s10618-017-0519-9
See Table V
Note that line 11 is missing an important sqrt operation!
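The incremental idea can be sketched as follows: each appended point completes exactly one new subsequence, which is compared against all earlier subsequences. This is a simplified pure-Python illustration, not STUMPY's implementation. The class NaiveStream is hypothetical, it never egresses old points, and it assumes an exclusion zone of ceil(m/4); only the attribute names P_, I_, T_ and the method update(t) mirror the listing above.

```python
import math


def znorm(x):
    """Return a z-normalized copy of x (a constant input maps to zeros)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in x]


class NaiveStream:
    """Hypothetical helper: grows T one point at a time, without egress."""

    def __init__(self, T, m):
        self.m = m
        self.excl = math.ceil(m / 4)
        self.T_, self.P_, self.I_ = [], [], []
        for t in T:
            self.update(t)

    def update(self, t):
        self.T_.append(t)
        if len(self.T_) < self.m:
            return                            # no complete subsequence yet
        new = len(self.T_) - self.m           # index of the new subsequence
        q = znorm(self.T_[new:])
        best_d, best_i = math.inf, -1
        for i in range(new):
            if new - i <= self.excl:          # skip trivial matches
                continue
            d = math.dist(q, znorm(self.T_[i:i + self.m]))
            if d < best_d:
                best_d, best_i = d, i
            if d < self.P_[i]:                # the new subsequence may improve old entries
                self.P_[i], self.I_[i] = d, new
        self.P_.append(best_d)
        self.I_.append(best_i)


stream = NaiveStream([0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0], m=3)
stream.update(7.0)
```

Each call to update costs time linear in the number of existing subsequences, instead of recomputing the whole profile from scratch.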

mstump

stumpy.mstump(T, m, include=None, discords=False)
Compute the multidimensional z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _mstump function which computes the multidimensional matrix profile and multidimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.
Parameters:  T (ndarray) – The time series or sequence for which to compute the multidimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
 m (int) – Window size
 include (list, ndarray) –
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
 discords (bool) – When set to True, this reverses the distance matrix which results in a multidimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
Returns:  P (ndarray) – The multidimensional matrix profile. Each column of the array corresponds to each matrix profile for a given dimension (i.e., the first column is the 1D matrix profile and the second column is the 2D matrix profile).
 I (ndarray) – The multidimensional matrix profile index where each column of the array corresponds to each matrix profile index for a given dimension.
Notes
See mSTAMP Algorithm
mstumped

stumpy.mstumped(dask_client, T, m, include=None, discords=False)
Compute the multidimensional z-normalized matrix profile with a distributed dask cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _mstump function which computes the multidimensional matrix profile according to STOMP. Note that only self-joins are supported.
Parameters:  dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
 T (ndarray) – The time series or sequence for which to compute the multidimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
 m (int) – Window size
 include (list, ndarray) –
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multidimensional motif search. For more information, see Section IV D in:
 discords (bool) – When set to True, this reverses the distance matrix which results in a multidimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
Returns:  P (ndarray) – The multidimensional matrix profile. Each column of the array corresponds to each matrix profile for a given dimension (i.e., the first column is the 1D matrix profile and the second column is the 2D matrix profile).
 I (ndarray) – The multidimensional matrix profile index where each column of the array corresponds to each matrix profile index for a given dimension.
Notes
See mSTAMP Algorithm
aamp

stumpy.aamp(T_A, m, T_B=None, ignore_trivial=True)
Compute the non-normalized (i.e., without z-normalization) matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _aamp function which computes the matrix profile according to AAMP.
Parameters:  T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices.
Return type: ndarray
Notes
See Algorithm 1
Note that we have extended this algorithm for AB-joins as well.
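The practical difference between the non-normalized and z-normalized distances can be seen on two subsequences with the same shape but a different offset. This is a small pure-Python illustration; the helper znorm is hypothetical and not part of STUMPY.

```python
import math


def znorm(x):
    """Return a z-normalized copy of x (a constant input maps to zeros)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in x]


a = [1.0, 3.0, 2.0]
b = [11.0, 13.0, 12.0]                  # same shape as a, shifted by +10

d_raw = math.dist(a, b)                 # non-normalized Euclidean distance
d_z = math.dist(znorm(a), znorm(b))     # z-normalized Euclidean distance
```

Here d_raw is large because of the offset while d_z is zero, so a non-normalized profile is the better fit when absolute levels matter and a z-normalized profile when only shape matters.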
aamped

stumpy.aamped(dask_client, T_A, m, T_B=None, ignore_trivial=True)
Compute the non-normalized (i.e., without z-normalization) matrix profile
This is a highly distributed implementation around the Numba JIT-compiled parallelized _aamp function which computes the non-normalized matrix profile according to AAMP.
Parameters:  dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
 T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices.
Return type: ndarray
Notes
See Algorithm 1
Note that we have extended this algorithm for AB-joins as well.
gpu_aamp

stumpy.gpu_aamp(*args, **kwargs)
Compute the non-normalized (i.e., without z-normalization) matrix profile with one or more GPU devices
This is a convenience wrapper around the Numba cuda.jit _gpu_aamp function which computes the non-normalized matrix profile according to a modified version of GPU-STOMP.
Parameters:  T_A (ndarray) – The time series or sequence for which to compute the matrix profile
 m (int) – Window size
 T_B ((optional) ndarray) – The time series or sequence that contains your query subsequences of interest. Default is None which corresponds to a self-join.
 ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
 device_id (int or list) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in cuda.list_devices()].
Returns: out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
Return type: ndarray
Notes
See Algorithm 1
Note that we have extended this algorithm for AB-joins as well.
See Table II, Figure 5, and Figure 6
aampi

stumpy.aampi(T, m, excl_zone=None, egress=True)
Compute an incremental non-normalized (i.e., without z-normalization) matrix profile for streaming data
Parameters: 
stumpy.P_
The updated matrix profile for T
Type: ndarray

stumpy.I_
The updated matrix profile indices for T
Type: ndarray

stumpy.left_P_
The updated left matrix profile for T
Type: ndarray

stumpy.left_I_
The updated left matrix profile indices for T
Type: ndarray

stumpy.T_
The updated time series or sequence for which the matrix profile and matrix profile indices are computed
Type: ndarray

stumpy.update(t)
Append a single new data point, t, to the time series, T, and update the matrix profile
Notes
See Algorithm 1
Note that we have extended this algorithm for AB-joins as well.

atsc

stumpy.atsc(IL, IR, j)
Compute the anchored time series chain (ATSC)
Parameters:  IL (ndarray) – Left matrix profile indices
 IR (ndarray) – Right matrix profile indices
 j (int) – The index value for which to compute the ATSC
Returns: output – Anchored time series chain for index, j
Return type: ndarray
Notes
See Table I
This is the implementation for the anchored time series chains (ATSC).
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
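A chain anchored at j can be grown by repeatedly following the right matrix profile index and checking that the left index of the neighbor points back. This is a minimal pure-Python sketch of that idea, not the library source: anchored_chain is a hypothetical name, the IL and IR values are made up, and -1 is assumed to mean "no neighbor".

```python
def anchored_chain(IL, IR, j):
    """Follow right neighbors from j while each link is confirmed by IL."""
    chain = [j]
    for _ in range(len(IL)):               # bounded for-loop instead of a while-loop
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:      # stop when the link is not bidirectional
            break
        j = nxt
        chain.append(j)
    return chain


# Hypothetical left/right matrix profile indices (-1 means "no neighbor")
IL = [-1, 0, 1, 0, 2]
IR = [1, 2, 4, 4, -1]

chain_from_0 = anchored_chain(IL, IR, 0)
chain_from_3 = anchored_chain(IL, IR, 3)
```

The for-loop can run at most len(IL) times, so the sketch always terminates even if the index arrays are malformed.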
allc

stumpy.allc(IL, IR)
Compute the all-chain set (ALLC)
Parameters:  IL (ndarray) – Left matrix profile indices
 IR (ndarray) – Right matrix profile indices
Returns:  S (list(ndarray)) – All-chain set
 C (ndarray) – Anchored time series chain for the longest chain (also known as the unanchored chain)
Notes
See Table II
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.
The all-chain set, S, is returned as a list of unique numpy arrays.
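The relationship between the all-chain set and the unanchored chain can be sketched by growing one chain from every index that is not already part of an earlier chain, then picking the longest. This is an illustrative pure-Python sketch with hypothetical names (anchored_chain, all_chains) and made-up IL and IR values, with -1 assumed to mean "no neighbor"; it returns plain lists rather than numpy arrays.

```python
def anchored_chain(IL, IR, j):
    """Follow right neighbors from j while each link is confirmed by IL."""
    chain = [j]
    for _ in range(len(IL)):
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break
        j = nxt
        chain.append(j)
    return chain


def all_chains(IL, IR):
    """Return the list of chains and the longest (unanchored) chain."""
    is_start = [True] * len(IL)
    S = []
    for i in range(len(IL)):
        if not is_start[i]:                # already covered by an earlier chain
            continue
        chain = anchored_chain(IL, IR, i)
        for k in chain[1:]:
            is_start[k] = False            # members cannot start a new chain
        S.append(chain)
    return S, max(S, key=len)


# Hypothetical left/right matrix profile indices (-1 means "no neighbor")
IL = [-1, 0, 1, 0, 2]
IR = [1, 2, 4, 4, -1]
S, C = all_chains(IL, IR)
```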
fluss

stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)
Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
Essentially, this is a wrapper to compute the corrected arc curve and regime locations.
Parameters:  I (ndarray) – The matrix profile indices for the time series of interest
 L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
 n_regimes (int) – The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.
 m (int) – The subsequence length. This is expected to be the same value as the window size used to compute the matrix profile and matrix profile index.
 excl_factor (int) – The multiplying factor for the regime exclusion zone
 custom_iac (np.array) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve
Returns:  cac (ndarray) – A corrected arc curve (CAC)
 regime_locs (ndarray) – The locations of the regimes
Notes
See Section A
This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).
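The corrected arc curve can be sketched by counting, at each position, how many nearest-neighbor arcs pass over it and dividing by an idealized count. This is a simplified pure-Python illustration, not STUMPY's implementation: corrected_arc_curve is a hypothetical name, and the parabolic idealized arc curve 2k(n-k)/n is an assumption that may differ from how the library builds its IAC.

```python
def corrected_arc_curve(I, L, excl_factor=5):
    """Return a CAC for matrix profile indices I (values clipped to at most 1)."""
    n = len(I)
    diff = [0] * (n + 1)
    for i, j in enumerate(I):
        lo, hi = min(i, j), max(i, j)
        diff[lo] += 1                      # an arc opens at its left end
        diff[hi] -= 1                      # and closes at its right end
    ac, running = [], 0
    for k in range(n):
        running += diff[k]
        ac.append(running)                 # arcs crossing position k
    iac = [2.0 * k * (n - k) / n for k in range(n)]   # assumed parabolic IAC
    cac = [min(a / b, 1.0) if b > 0 else 1.0 for a, b in zip(ac, iac)]
    edge = excl_factor * L
    for k in range(n):
        if k < edge or k >= n - edge:      # mask edge effects
            cac[k] = 1.0
    return cac


# Toy indices with two regimes: neighbors never cross the boundary near 9/10
I = [(i + 5) % 10 for i in range(10)] + [10 + (i + 5) % 10 for i in range(10)]
cac = corrected_arc_curve(I, L=1, excl_factor=1)
regime_loc = cac.index(min(cac))
```

Few arcs cross a regime boundary, so the minimum of the curve marks the most likely regime change.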
floss

stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None)
Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
Parameters:  mp (ndarray) – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
 T (ndarray) – The 1D time series data used to generate the matrix profile and matrix profile indices found in mp. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from T instead of using the bidirectional matrix profile.
 m (int) – The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the L parameter.
 L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
 excl_factor (int) – The multiplying factor for the regime exclusion zone. Note that this is unrelated to the excl_zone used to compute the matrix profile.
 n_iter (int) – Number of iterations to average over when determining the parameters for the IAC beta distribution
 n_samples (int) – Number of distribution samples to draw during each iteration when computing the IAC
 custom_iac (np.array) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve

stumpy.cac_1d_
A 1-dimensional corrected arc curve (CAC) updated as a result of ingressing a single new data point and egressing a single old data point.
Type: ndarray

stumpy.P_
The matrix profile updated as a result of ingressing a single new data point and egressing a single old data point.
Type: ndarray

stumpy.I_
The (right) matrix profile indices updated as a result of ingressing a single new data point and egressing a single old data point.
Type: ndarray

stumpy.T_
The updated time series, T
Type: ndarray

stumpy.update(t)
Ingress a new data point, t, onto the time series, T, followed by egressing the oldest single data point from T. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.
Notes
DOI: 10.1109/ICDM.2017.21 <https://www.cs.ucr.edu/~eamonn/Segmentation_ICDM.pdf>
See Section C
This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).