STUMPY API
Overview
stumpy.stump – Compute the z-normalized matrix profile
stumpy.stumped – Compute the z-normalized matrix profile with a distributed dask cluster
stumpy.gpu_stump – Compute the z-normalized matrix profile with one or more GPU devices
stumpy.mass – Compute the distance profile using the MASS algorithm
stumpy.scrump – Compute an approximate z-normalized matrix profile
stumpy.stumpi – Compute an incremental z-normalized matrix profile for streaming data
stumpy.mstump – Compute the multi-dimensional z-normalized matrix profile
stumpy.mstumped – Compute the multi-dimensional z-normalized matrix profile with a distributed dask cluster
stumpy.subspace – Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
stumpy.atsc – Compute the anchored time series chain (ATSC)
stumpy.allc – Compute the all-chain set (ALLC)
stumpy.fluss – Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
stumpy.floss – Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
stumpy.ostinato – Find the z-normalized consensus motif of multiple time series
stumpy.ostinatoed – Find the z-normalized consensus motif of multiple time series with a distributed dask cluster
stumpy.gpu_ostinato – Find the z-normalized consensus motif of multiple time series with one or more GPU devices
stumpy.mpdist – Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
stumpy.mpdisted – Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a distributed dask cluster
stumpy.gpu_mpdist – Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
stumpy.motifs – Discover the top motifs for time series T
stumpy.match – Find all matches of a query Q in a time series T
stumpy.snippets – Identify the top k snippets that best represent the time series, T
stumpy.stimp – Compute the Pan Matrix Profile
stumpy.stimped – Compute the Pan Matrix Profile with a distributed dask cluster
stumpy.gpu_stimp – Compute the Pan Matrix Profile with one or more GPU devices
stump

stumpy.stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True)
Compute the z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.
 Parameters
T_A (ndarray) – The time series or sequence for which to compute the matrix profile
m (int) – Window size
T_B (ndarray, default None) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
ignore_trivial (bool, default True) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
 Return type
ndarray
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in the Table II where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, then our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
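To make the return shape and the exclusion zone concrete, the following is a minimal brute-force sketch of a z-normalized self-join in plain Python. It is not how stump is implemented (stump follows STOMPopt and runs under Numba); the helper names znorm and naive_self_join are hypothetical, and treating the exclusion zone as ceil(m/4) positions on either side of each subsequence is an assumption based on the m/4 default described above.

```python
import math


def znorm(x):
    """Return a z-normalized copy of a list of floats (assumes non-constant input)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def naive_self_join(T, m, excl_zone_denom=4):
    """Brute-force z-normalized self-join returning (P, I).

    P[i] is the distance from subsequence i to its nearest non-trivial
    neighbor and I[i] is the index of that neighbor.
    """
    n_subseq = len(T) - m + 1  # length of the matrix profile
    excl_zone = math.ceil(m / excl_zone_denom)  # assumed exclusion zone
    subs = [znorm(T[i:i + m]) for i in range(n_subseq)]
    P = [math.inf] * n_subseq
    I = [-1] * n_subseq
    for i in range(n_subseq):
        for j in range(n_subseq):
            if abs(i - j) <= excl_zone:
                continue  # skip trivial matches near i
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I


T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 5.0, 4.0]
P, I = naive_self_join(T, m=3)
print(len(P))  # one entry per subsequence: len(T) - m + 1
print(I[1], I[5])  # the repeated pattern [1, 3, 2] at indices 1 and 5
```

The quadratic double loop is only meant to show what each entry of the matrix profile represents; it is far too slow for real time series.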
stumped

stumpy.stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True, normalize=True)
Compute the z-normalized matrix profile with a distributed dask cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to STOMPopt with Pearson correlations.
 Parameters
dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
T_A (ndarray) – The time series or sequence for which to compute the matrix profile
m (int) – Window size
T_B (ndarray, default None) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
ignore_trivial (bool, default True) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
 Return type
ndarray
Notes
DOI: 10.1007/s10115-017-1138-x
See Section 4.5
The above reference outlines a general approach for traversing the distance matrix in a diagonal fashion rather than in a row-wise fashion.
See Section 3.1 and Section 3.3
The above reference outlines the use of the Pearson correlation via Welford’s centered sum-of-products along each diagonal of the distance matrix in place of the sliding window dot product found in the original STOMP method.
See Table II
This is a Dask distributed implementation of stump that scales across multiple servers and is a convenience wrapper around the parallelized stump._stump function.
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in the Table II where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, then our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
gpu_stump

stumpy.gpu_stump(T_A, m, T_B=None, ignore_trivial=True, device_id=0, normalize=True)
Compute the z-normalized matrix profile with one or more GPU devices
This is a convenience wrapper around the Numba cuda.jit _gpu_stump function which computes the matrix profile according to GPU-STOMP.
 Parameters
T_A (ndarray) – The time series or sequence for which to compute the matrix profile
m (int) – Window size
T_B (ndarray, default None) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded. Default is None which corresponds to a self-join.
ignore_trivial (bool, default True) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
device_id (int or list, default 0) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
out – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
 Return type
ndarray
Notes
See Table II, Figure 5, and Figure 6
Time series, T_A, will be annotated with the distance location (or index) of all its subsequences in another time series, T_B.
Return: For every subsequence, Q, in T_A, you will get a distance and index for the closest subsequence in T_B. Thus, the array returned will have length T_A.shape[0]-m+1. Additionally, the left and right matrix profiles are also returned.
Note: Unlike in the Table II where T_A.shape is expected to be equal to T_B.shape, this implementation is generalized so that the shapes of T_A and T_B can be different. In the case where T_A.shape == T_B.shape, then our algorithm reduces down to the same algorithm found in Table II.
Additionally, unlike STAMP where the exclusion zone is m/2, the default exclusion zone for STOMP is m/4 (See Definition 3 and Figure 3).
For self-joins, set ignore_trivial = True in order to avoid the trivial match.
Note that left and right matrix profiles are only available for self-joins.
mass

stumpy.mass(Q, T, M_T=None, Σ_T=None, normalize=True)
Compute the distance profile using the MASS algorithm
This is a convenience wrapper around the Numba JIT-compiled _mass function.
 Parameters
Q (ndarray) – Query array or subsequence
T (ndarray) – Time series or sequence
M_T (ndarray, default None) – Sliding mean of T
Σ_T (ndarray, default None) – Sliding standard deviation of T
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
distance_profile – Distance profile
 Return type
ndarray
Notes
See Table II
Note that Q and T are not directly required to calculate D.
Note: Unlike the Matrix Profile I paper, here M_T and Σ_T can be calculated once for all subsequences of T and passed in, which removes the redundant computation.
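The benefit of passing in M_T and Σ_T can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's _mass function: the helper names sliding_mean_std and distance_profile are hypothetical, and the sliding dot product is computed with a naive loop rather than the FFT-based approach that MASS uses. The sliding statistics are computed once and can then be reused for any number of queries against the same T.

```python
import math


def sliding_mean_std(T, m):
    """Mean and population standard deviation of every length-m window of T."""
    M_T, S_T = [], []
    for i in range(len(T) - m + 1):
        w = T[i:i + m]
        mu = sum(w) / m
        M_T.append(mu)
        S_T.append(math.sqrt(sum((v - mu) ** 2 for v in w) / m))
    return M_T, S_T


def distance_profile(Q, T, M_T, S_T):
    """Z-normalized distance from Q to every window of T using precomputed stats."""
    m = len(Q)
    mu_Q = sum(Q) / m
    sigma_Q = math.sqrt(sum((v - mu_Q) ** 2 for v in Q) / m)
    D = []
    for i in range(len(T) - m + 1):
        # Naive sliding dot product (MASS would obtain this via FFT)
        QT = sum(q * t for q, t in zip(Q, T[i:i + m]))
        # Pearson correlation between Q and the i-th window of T
        rho = (QT - m * mu_Q * M_T[i]) / (m * sigma_Q * S_T[i])
        # Clamp at zero to guard against tiny negative rounding errors
        D.append(math.sqrt(max(0.0, 2 * m * (1 - rho))))
    return D


T = [0.0, 1.0, 3.0, 2.0, 9.0, 1.0, 3.0, 2.0, 5.0, 4.0]
M_T, S_T = sliding_mean_std(T, 3)  # computed once, reusable for every query
D = distance_profile([1.0, 3.0, 2.0], T, M_T, S_T)
print(len(D))
```

The sketch assumes that neither Q nor any window of T is constant, since a zero standard deviation would cause a division by zero.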
scrump

stumpy.scrump(T_A, m, T_B=None, ignore_trivial=True, percentage=0.01, pre_scrump=False, s=None, normalize=True)
Compute an approximate z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _stump function which computes the matrix profile according to SCRIMP.
 Parameters
T_A (ndarray) – The time series or sequence for which to compute the matrix profile
T_B (ndarray) – The time series or sequence that will be used to annotate T_A. For every subsequence in T_A, its nearest neighbor in T_B will be recorded.
m (int) – Window size
ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
percentage (float) – Approximate percentage completed. The value is between 0.0 and 1.0.
pre_scrump (bool) – A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++ and may lead to faster convergence
s (int) – The size of the PreSCRIMP fixed interval. If pre_scrump=True and s=None, then s will automatically be set to s=int(np.ceil(m / config.STUMPY_EXCL_ZONE_DENOM)), the size of the exclusion zone.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.

stumpy.
P_
¶ The updated matrix profile
 Type
ndarray

stumpy.
I_
¶ The updated matrix profile indices
 Type
ndarray

stumpy.
update
()¶ Update the matrix profile and the matrix profile indices by computing additional new distances (limited by percentage) that make up the full distance matrix.
Notes
See Algorithm 1 and Algorithm 2
stumpi

stumpy.stumpi(T, m, egress=True, normalize=True)
Compute an incremental z-normalized matrix profile for streaming data
This is based on the online STOMPI and STAMPI algorithms.
 Parameters
T (ndarray) – The time series or sequence for which the matrix profile and matrix profile indices will be returned
m (int) – Window size
egress (bool, default True) – If set to True, the oldest data point in the time series is removed and the time series length remains constant rather than forever increasing
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this class gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized class decorator.

stumpy.P_
The updated matrix profile for T
 Type
ndarray

stumpy.I_
The updated matrix profile indices for T
 Type
ndarray

stumpy.left_P_
The updated left matrix profile for T
 Type
ndarray

stumpy.left_I_
The updated left matrix profile indices for T
 Type
ndarray

stumpy.T_
The updated time series or sequence for which the matrix profile and matrix profile indices are computed
 Type
ndarray

stumpy.update(t)
Append a single new data point, t, to the time series, T, and update the matrix profile
Notes
DOI: 10.1007/s10618-017-0519-9
See Table V
Note that line 11 is missing an important sqrt operation!
mstump

stumpy.mstump(T, m, include=None, discords=False, normalize=True)
Compute the multi-dimensional z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile and multi-dimensional matrix profile index according to mSTOMP, a variant of mSTAMP. Note that only self-joins are supported.
 Parameters
T (ndarray) – The time series or sequence for which to compute the multidimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
m (int) – Window size
include (list, ndarray, default None) –
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multi-dimensional motif search. For more information, see Section IV D in:
discords (bool, default False) – When set to True, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
P (ndarray) – The multidimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1D matrix profile and the second row is the 2D matrix profile).
I (ndarray) – The multidimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.
Notes
See mSTAMP Algorithm
mstumped

stumpy.mstumped(dask_client, T, m, include=None, discords=False, normalize=True)
Compute the multi-dimensional z-normalized matrix profile with a distributed dask cluster
This is a highly distributed implementation around the Numba JIT-compiled parallelized _mstump function which computes the multi-dimensional matrix profile according to STOMP. Note that only self-joins are supported.
 Parameters
dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
T (ndarray) – The time series or sequence for which to compute the multidimensional matrix profile. Each row in T represents data from a different dimension while each column in T represents data from the same dimension.
m (int) – Window size
include (list, ndarray, default None) –
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multi-dimensional motif search. For more information, see Section IV D in:
discords (bool, default False) – When set to True, this reverses the distance matrix which results in a multi-dimensional matrix profile that favors larger matrix profile values (i.e., discords) rather than smaller values (i.e., motifs). Note that indices in include are still maintained and respected.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
P (ndarray) – The multidimensional matrix profile. Each row of the array corresponds to each matrix profile for a given dimension (i.e., the first row is the 1D matrix profile and the second row is the 2D matrix profile).
I (ndarray) – The multidimensional matrix profile index where each row of the array corresponds to each matrix profile index for a given dimension.
Notes
See mSTAMP Algorithm
subspace

stumpy.subspace(T, m, subseq_idx, nn_idx, k, include=None, discords=False, normalize=True)
Compute the k-dimensional matrix profile subspace for a given subsequence index and its nearest neighbor index
 Parameters
T (ndarray) – The time series or sequence for which the multidimensional matrix profile, multidimensional matrix profile indices were computed
m (int) – Window size
subseq_idx (int) – The subsequence index in T
nn_idx (int) – The nearest neighbor index in T
k (int) – The subset number of dimensions out of D = T.shape[0]-dimensions to return the subspace for
include (ndarray, default None) –
A list of (zero-based) indices corresponding to the dimensions in T that must be included in the constrained multi-dimensional motif search. For more information, see Section IV D in:
discords (bool, default False) – When set to True, this reverses the distance profile to favor discords rather than motifs. Note that indices in include are still maintained and respected.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
S (ndarray) – An array that contains the `k`th-dimensional subspace for the subsequence with index equal to subseq_idx
atsc

stumpy.atsc(IL, IR, j)
Compute the anchored time series chain (ATSC)
Note that since the matrix profile indices, IL and IR, are precomputed, this function is agnostic to subsequence normalization.
 Parameters
IL (ndarray) – Left matrix profile indices
IR (ndarray) – Right matrix profile indices
j (int) – The index value for which to compute the ATSC
 Returns
out – Anchored time series chain for index, j
 Return type
ndarray
Notes
See Table I
This is the implementation for the anchored time series chains (ATSC).
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
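The bounded for-loop mentioned above can be sketched in plain Python as follows. This is an illustration rather than the library's code: the function name anchored_chain is hypothetical, plain lists stand in for ndarrays, and the use of -1 to mark a missing left or right neighbor is an assumption. Starting from the anchor j, the chain is extended to the right neighbor only while that neighbor's left neighbor points back to the current index.

```python
def anchored_chain(IL, IR, j):
    """Follow right neighbors from anchor j while each link is mutual.

    IL and IR are the left and right matrix profile indices. A value of -1
    is assumed to mean that no neighbor exists.
    """
    chain = [j]
    # A chain can never be longer than the index arrays, so a bounded
    # for-loop is guaranteed to terminate, unlike an open-ended while-loop.
    for _ in range(len(IL)):
        nxt = IR[j]
        if nxt == -1 or IL[nxt] != j:
            break  # no right neighbor, or the link is not mutual
        j = nxt
        chain.append(j)
    return chain


IL = [-1, 0, 1, 1, 3]
IR = [1, 3, 4, 4, -1]
print(anchored_chain(IL, IR, 0))  # prints [0, 1, 3, 4]
print(anchored_chain(IL, IR, 2))  # prints [2]
```

In the second call the chain stops immediately because the right neighbor of index 2 is index 4, whose left neighbor is index 3 rather than index 2.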
allc

stumpy.allc(IL, IR)
Compute the all-chain set (ALLC)
Note that since the matrix profile indices, IL and IR, are precomputed, this function is agnostic to subsequence normalization.
 Parameters
IL (ndarray) – Left matrix profile indices
IR (ndarray) – Right matrix profile indices
 Returns
S (list(ndarray)) – All-chain set
C (ndarray) – Anchored time series chain for the longest chain (also known as the unanchored chain)
Notes
See Table II
Unlike the original paper, we’ve replaced the while-loop with a more stable for-loop.
This is the implementation for the all-chain set (ALLC) and the unanchored chain is simply the longest one among the all-chain set. Both the all-chain set and unanchored chain are returned.
The all-chain set, S, is returned as a list of unique numpy arrays.
fluss

stumpy.fluss(I, L, n_regimes, excl_factor=5, custom_iac=None)
Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) for static data (i.e., batch processing)
Essentially, this is a wrapper to compute the corrected arc curve and regime locations. Note that since the matrix profile indices, I, are precomputed, this function is agnostic to subsequence normalization.
 Parameters
I (ndarray) – The matrix profile indices for the time series of interest
L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
n_regimes (int) – The number of regimes to search for. This is one more than the number of regime changes as denoted in the original paper.
m (int) – The subsequence length. This is expected to be the same value as the window size used to compute the matrix profile and matrix profile index.
excl_factor (int, default 5) – The multiplying factor for the regime exclusion zone
custom_iac (ndarray, default None) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve
 Returns
cac (ndarray) – A corrected arc curve (CAC)
regime_locs (ndarray) – The locations of the regimes
Notes
See Section A
This is the implementation for Fast Low-cost Unipotent Semantic Segmentation (FLUSS).
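As a rough illustration of the quantity that the corrected arc curve is built from, the sketch below counts how many nearest-neighbor arcs pass over each position, given only the matrix profile indices. It is a simplified sketch, not the library's implementation: the function name arc_curve is hypothetical, and the correction by the idealized arc curve (IAC), the handling of edge effects via L, and the regime exclusion zone are all omitted.

```python
def arc_curve(I):
    """Count the nearest-neighbor arcs that span each gap between positions.

    I[i] is the nearest-neighbor index of subsequence i. Entry k of the
    result counts the arcs that cross the gap between positions k and k + 1.
    """
    n = len(I)
    marks = [0] * (n + 1)
    for i, j in enumerate(I):
        lo, hi = min(i, j), max(i, j)
        marks[lo] += 1  # the arc starts covering gaps at lo
        marks[hi] -= 1  # the arc stops covering gaps at hi
    ac, running = [], 0
    for k in range(n):
        running += marks[k]  # prefix sum turns marks into per-gap counts
        ac.append(running)
    return ac


# Indices 0-2 only point to each other, and so do indices 3-5
I = [1, 2, 0, 4, 5, 3]
print(arc_curve(I))  # prints [2, 2, 0, 2, 2, 0]
```

No arc crosses the gap between positions 2 and 3, which is the kind of low point that suggests a regime change. The final entry is always zero because no arc can extend past the last position.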
floss

stumpy.floss(mp, T, m, L, excl_factor=5, n_iter=1000, n_samples=1000, custom_iac=None, normalize=True)
Compute the Fast Low-cost Online Semantic Segmentation (FLOSS) for streaming data
 Parameters
mp (ndarray) – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
T (ndarray) – A 1-D time series data used to generate the matrix profile and matrix profile indices found in mp. Note that the right matrix profile index is used and the right matrix profile is intelligently recomputed on the fly from T instead of using the bidirectional matrix profile.
m (int) – The window size for computing sliding window mass. This is identical to the window size used in the matrix profile calculation. For managing edge effects, see the L parameter.
L (int) – The subsequence length that is set roughly to be one period length. This is likely to be the same value as the window size, m, used to compute the matrix profile and matrix profile index but it can be different since this is only used to manage edge effects and has no bearing on any of the IAC or CAC core calculations.
excl_factor (int, default 5) – The multiplying factor for the regime exclusion zone. Note that this is unrelated to the excl_zone used to compute the matrix profile.
n_iter (int, default 1000) – Number of iterations to average over when determining the parameters for the IAC beta distribution
n_samples (int, default 1000) – Number of distribution samples to draw during each iteration when computing the IAC
custom_iac (ndarray, default None) – A custom idealized arc curve (IAC) that will be used for correcting the arc curve
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances

stumpy.cac_1d_
A 1-dimensional corrected arc curve (CAC) updated as a result of ingressing a single new data point and egressing a single old data point.
 Type
ndarray

stumpy.P_
The matrix profile updated as a result of ingressing a single new data point and egressing a single old data point.
 Type
ndarray

stumpy.I_
The (right) matrix profile indices updated as a result of ingressing a single new data point and egressing a single old data point.
 Type
ndarray

stumpy.T_
The updated time series, T
 Type
ndarray

stumpy.update(t)
Ingress a new data point, t, onto the time series, T, followed by egressing the oldest single data point from T. Then, update the 1-dimensional corrected arc curve (CAC_1D) and the matrix profile.
Notes
DOI: 10.1109/ICDM.2017.21 (https://www.cs.ucr.edu/~eamonn/Segmentation_ICDM.pdf)
See Section C
This is the implementation for Fast Low-cost Online Semantic Segmentation (FLOSS).
ostinato

stumpy.ostinato(Ts, m, normalize=True)
Find the z-normalized consensus motif of multiple time series
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
 Parameters
Ts (list) – A list of time series for which to find the most central consensus motif
m (int) – Window size
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
central_radius (float) – Radius of the most central consensus motif
central_Ts_idx (int) – The time series index in Ts which contains the most central consensus motif
central_subseq_idx (int) – The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif
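The definition of the radius above can be made concrete with a brute-force sketch in plain Python. It illustrates the definition only and is not the ostinato algorithm itself, which avoids this exhaustive search; the helper names znorm, nn_dist, and best_radius are hypothetical, ties are resolved by keeping the first candidate found, and the search for the most central motif among tied candidates is not shown.

```python
import math


def znorm(x):
    """Return a z-normalized copy of a list of floats (assumes non-constant input)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def nn_dist(q, T, m):
    """Smallest distance between z-normalized q and any length-m window of T."""
    best = math.inf
    for i in range(len(T) - m + 1):
        w = znorm(T[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        best = min(best, d)
    return best


def best_radius(Ts, m):
    """Return (radius, Ts_idx, subseq_idx) for the smallest radius in Ts."""
    best = (math.inf, -1, -1)
    for a, T in enumerate(Ts):
        for i in range(len(T) - m + 1):
            q = znorm(T[i:i + m])
            # The radius must reach a nearest neighbor in every other series,
            # so it is the largest of the per-series nearest-neighbor distances.
            radius = max(nn_dist(q, Ts[b], m) for b in range(len(Ts)) if b != a)
            if radius < best[0]:
                best = (radius, a, i)
    return best


Ts = [
    [0.0, 1.0, 3.0, 2.0, 7.0, 5.0],
    [4.0, 1.0, 3.0, 2.0, 0.0, 6.0],
    [9.0, 8.0, 1.0, 3.0, 2.0, 4.0],
]
radius, Ts_idx, subseq_idx = best_radius(Ts, 3)
print(Ts_idx, subseq_idx)  # the shared pattern [1, 3, 2] in the first series
```

Because the pattern [1, 3, 2] occurs in all three series, its radius is zero, and the first occurrence found (series 0, index 1) is reported.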
ostinatoed

stumpy.ostinatoed(dask_client, Ts, m, normalize=True)
Find the z-normalized consensus motif of multiple time series with a distributed dask cluster
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
 Parameters
dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
Ts (list) – A list of time series for which to find the most central consensus motif
m (int) – Window size
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
central_radius (float) – Radius of the most central consensus motif
central_Ts_idx (int) – The time series index in Ts which contains the most central consensus motif
central_subseq_idx (int) – The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif
gpu_ostinato

stumpy.gpu_ostinato(Ts, m, device_id=0, normalize=True)
Find the z-normalized consensus motif of multiple time series with one or more GPU devices
This is a wrapper around the vanilla version of the ostinato algorithm which finds the best radius and a helper function that finds the most central conserved motif.
 Parameters
Ts (list) – A list of time series for which to find the most central consensus motif
m (int) – Window size
device_id (int or list, default 0) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
central_radius (float) – Radius of the most central consensus motif
central_Ts_idx (int) – The time series index in Ts which contains the most central consensus motif
central_subseq_idx (int) – The subsequence index within time series Ts[central_Ts_idx] that contains the most central consensus motif
Notes
See Table 2
The ostinato algorithm proposed in the paper finds the best radius in Ts. Intuitively, the radius is the minimum distance of a subsequence to encompass at least one nearest neighbor subsequence from all other time series. The best radius in Ts is the minimum radius amongst all radii. Some data sets might contain multiple subsequences which have the same optimal radius. The greedy Ostinato algorithm only finds one of them, which might not be the most central motif. The most central motif amongst the subsequences with the best radius is the one with the smallest mean distance to nearest neighbors in all other time series. To find this central motif it is necessary to search the subsequences with the best radius via stumpy.ostinato._get_central_motif
mpdist

stumpy.mpdist(T_A, T_B, m, percentage=0.05, k=None, normalize=True)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the `k`th smallest value as the reported distance. Note that MPdist is a measure and not a metric. Therefore, it does not obey the triangle inequality but the method is highly scalable.
 Parameters
T_A (ndarray) – The first time series or sequence for which to compute the matrix profile
T_B (ndarray) – The second time series or sequence for which to compute the matrix profile
m (int) – Window size
percentage (float, default 0.05) – The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.
k (int, default None) – Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
MPdist – The matrix profile distance
 Return type
float
Notes
See Section III
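As a rough illustration of the definition above, the following brute-force sketch builds the AB-join and BA-join profiles, concatenates and sorts them, and picks the k-th smallest value. It is not the library's implementation: the helper names are hypothetical, k is treated as a zero-based position that is clamped to the last element, and no subsequence is assumed to be constant.

```python
import math

def znorm(x):
    # z-normalize a subsequence (assumes it is not constant)
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def join_profile(T_A, T_B, m):
    # for each subsequence of T_A, the distance to its nearest neighbor in T_B
    subs_A = [znorm(T_A[i:i + m]) for i in range(len(T_A) - m + 1)]
    subs_B = [znorm(T_B[j:j + m]) for j in range(len(T_B) - m + 1)]
    return [min(math.dist(a, b) for b in subs_B) for a in subs_A]

def mpdist_sketch(T_A, T_B, m, k):
    # concatenate the AB-join and BA-join profiles, sort, take the k-th value
    P_ABBA = sorted(join_profile(T_A, T_B, m) + join_profile(T_B, T_A, m))
    return P_ABBA[min(k, len(P_ABBA) - 1)]

# Hypothetical data
A = [0, 1, 3, 2, 5, 4, 7, 6]
B = [1, 0, 2, 5, 3, 4, 6, 9]
d_AB = mpdist_sketch(A, B, 3, k=1)
d_BA = mpdist_sketch(B, A, 3, k=1)
d_AA = mpdist_sketch(A, A, 3, k=1)
```

Since both joins feed the same sorted pool, the sketch is symmetric in its two inputs, and comparing a series with itself gives zero.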
mpdisted

stumpy.mpdisted(dask_client, T_A, T_B, m, percentage=0.05, k=None, normalize=True)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with a distributed dask cluster
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric: it does not obey the triangle inequality, but the method is highly scalable.
 Parameters
dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
T_A (ndarray) – The first time series or sequence for which to compute the matrix profile
T_B (ndarray) – The second time series or sequence for which to compute the matrix profile
m (int) – Window size
percentage (float, default 0.05) – The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.
k (int, default None) – Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
MPdist – The matrix profile distance
 Return type
float
Notes
See Section III
gpu_mpdist

stumpy.gpu_mpdist(T_A, T_B, m, percentage=0.05, k=None, device_id=0, normalize=True)
Compute the z-normalized matrix profile distance (MPdist) measure between any two time series with one or more GPU devices
The MPdist distance measure considers two time series to be similar if they share many subsequences, regardless of the order of matching subsequences. MPdist concatenates and sorts the output of an AB-join and a BA-join and returns the k-th smallest value as the reported distance. Note that MPdist is a measure and not a metric: it does not obey the triangle inequality, but the method is highly scalable.
 Parameters
T_A (ndarray) – The first time series or sequence for which to compute the matrix profile
T_B (ndarray) – The second time series or sequence for which to compute the matrix profile
m (int) – Window size
percentage (float, default 0.05) – The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0. This parameter is ignored when k is not None.
k (int, default None) – Specify the k-th value in the concatenated matrix profiles to return. When k is not None, then the percentage parameter is ignored.
device_id (int or list, default 0) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
MPdist – The matrix profile distance
 Return type
float
Notes
See Section III
motifs

stumpy.motifs(T, P, min_neighbors=1, max_distance=None, cutoff=None, max_matches=10, max_motifs=1, normalize=True)
Discover the top motifs for time series T
A subsequence, Q, becomes a candidate motif if there are at least min_neighbors other subsequence matches in T (outside the exclusion zone) with a distance less than or equal to max_distance.
 Parameters
T (ndarray) – The time series or sequence
P (ndarray) – Matrix Profile of T
min_neighbors (int, default 1) – The minimum number of similar matches a subsequence needs to have in order to be considered a motif. This defaults to 1, which means that a subsequence must have at least one similar match in order to be considered a motif.
max_distance (float or function, default None) – For a candidate motif, Q, and a non-trivial subsequence, S, max_distance is the maximum distance allowed between Q and S so that S is considered a match of Q. If max_distance is a function, then it must be a function that accepts a single parameter, D, in its function signature, which is the distance profile between Q and T. If None, this defaults to max(np.mean(D) - 2 * np.std(D), np.min(D)).
cutoff (float, default None) – The largest matrix profile value (distance) that a candidate motif is allowed to have. If None, this defaults to max(np.mean(P) - 2 * np.std(P), np.min(P)).
max_matches (int, default 10) – The maximum number of similar matches of a motif representative to be returned. The resulting matches are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, all matches within max_distance of the motif representative will be returned. Note that the first match is always the self-match/trivial-match for each motif.
max_motifs (int, default 1) – The maximum number of motifs to return
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
motif_distances (ndarray) – The distances corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the distance for the self-match/trivial-match for each motif.
motif_indices (ndarray) – The indices corresponding to a set of subsequence matches for each motif. Note that the first column always corresponds to the index for the self-match/trivial-match for each motif.
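The default threshold used for max_distance and cutoff when they are None can be sketched as follows. This is a standalone illustration in plain Python rather than library code: default_threshold is a hypothetical helper, and statistics.pstdev is used because it is the population standard deviation, which is what np.std computes by default.

```python
import statistics

def default_threshold(D):
    # max(mean(D) - 2 * std(D), min(D)): never drops below the smallest distance
    return max(statistics.fmean(D) - 2 * statistics.pstdev(D), min(D))

# Widely spread distances: mean - 2 * std falls below the minimum,
# so the threshold collapses to min(D) and only the best match qualifies
spread = [0.0, 1.0, 4.0, 4.2, 3.9, 4.1]
t_spread = default_threshold(spread)

# One unusually close match among many distant ones: mean - 2 * std wins
tight = [0.0] + [10.0] * 99
t_tight = default_threshold(tight)
```

The outer max is what guarantees that the threshold is never stricter than the closest distance actually present.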
match

stumpy.match(Q, T, M_T=None, Σ_T=None, max_distance=None, max_matches=None, normalize=True)
Find all matches of a query Q in a time series T
Returns the distances and indices of subsequences whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). An exclusion zone is applied around each occurrence before searching for the next one.
 Parameters
Q (ndarray) – The query sequence. It doesn’t have to be a subsequence of T
T (ndarray) – The time series of interest
M_T (ndarray, default None) – Sliding mean of time series, T
Σ_T (ndarray, default None) – Sliding standard deviation of time series, T
max_distance (float or function, default None) – Maximum distance between Q and a subsequence S for S to be considered a match. If a function, then it has to be a function of one argument D, which will be the distance profile of Q with T (a 1D numpy array of size n-m+1). If None, defaults to max(np.mean(D) - 2 * np.std(D), np.min(D)), i.e. at least the closest match will be returned.
max_matches (int, default None) – The maximum number of similar occurrences to be returned. The resulting occurrences are sorted by distance, so a value of 10 means that the indices of the 10 most similar subsequences are returned. If None, then all occurrences are returned.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
out – The first column consists of distances of subsequences of T whose distances to Q are less than or equal to max_distance, sorted by distance (lowest to highest). The second column consists of the corresponding indices in T.
 Return type
ndarray
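The selection procedure described above (take the closest subsequence, mask an exclusion zone around it, repeat) can be sketched on a precomputed distance profile. This is only an illustration: find_matches and its excl_zone argument are hypothetical, and the width of the exclusion zone that the library actually applies is not specified here.

```python
import math

def find_matches(D, max_distance, excl_zone, max_matches=None):
    D = list(D)  # work on a copy so the caller's profile is untouched
    matches = []
    while max_matches is None or len(matches) < max_matches:
        idx = min(range(len(D)), key=D.__getitem__)
        # stop when nothing is left or the best remaining distance is too large
        if not math.isfinite(D[idx]) or D[idx] > max_distance:
            break
        matches.append((D[idx], idx))
        # mask the neighborhood so overlapping subsequences are not reported
        for j in range(max(0, idx - excl_zone), min(len(D), idx + excl_zone + 1)):
            D[j] = math.inf
    return matches

# Hypothetical distance profile of a query against a time series
D = [0.1, 0.2, 3.0, 2.5, 0.4, 0.3, 2.9, 0.15]
found = find_matches(D, max_distance=0.5, excl_zone=1)
```

Note that index 1 (distance 0.2) and index 4 (distance 0.4) are below the threshold but are skipped because they fall inside the exclusion zone of a closer match.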
snippets

stumpy.snippets(T, m, k, percentage=1.0, s=None, mpdist_percentage=0.05, mpdist_k=None, normalize=True)
Identify the top k snippets that best represent the time series, T
 Parameters
T (ndarray) – The time series or sequence for which to find the snippets
m (int) – The snippet window size
k (int) – The desired number of snippets
percentage (float, default 1.0) – With the length of each non-overlapping subsequence, S[i], set to m, this is the percentage of S[i] (i.e., percentage * m) to set s to. When percentage == 1.0, then the full length of S[i] is used to compute the mpdist_vect. When percentage < 1.0, then shorter subsequences from S[i] are used to compute mpdist_vect.
s (int, default None) – With the length of each non-overlapping subsequence, S[i], set to m, this is essentially the sub-subsequence length (i.e., a shorter part of S[i]). When s == m, then the full length of S[i] is used to compute the mpdist_vect. When s < m, then shorter subsequences with length s from each S[i] are used to compute mpdist_vect. When s is not None, then the percentage parameter is ignored.
mpdist_percentage (float, default 0.05) – The percentage of distances that will be used to report mpdist. The value is between 0.0 and 1.0.
mpdist_k (int, default None) – Specify the k-th value in the concatenated matrix profiles to return. When mpdist_k is not None, then the mpdist_percentage parameter is ignored.
normalize (bool, default True) – When set to True, this z-normalizes subsequences prior to computing distances. Otherwise, this function gets rerouted to its complementary non-normalized equivalent set in the @core.non_normalized function decorator.
 Returns
snippets (ndarray) – The top k snippets
snippets_indices (ndarray) – The index locations for each of top k snippets
snippets_profiles (ndarray) – The MPdist profiles for each of the top k snippets
snippets_fractions (ndarray) – The fraction of data that each of the top k snippets represents
snippets_areas (ndarray) – The area under the curve corresponding to each profile for each of the top k snippets
snippets_regimes (ndarray) – The index slices corresponding to the set of regimes for each of the top k snippets. The first column is the (zero-based) snippet index while the second and third columns correspond to the (inclusive) regime start indices and the (exclusive) regime stop indices, respectively.
Notes
See Table I
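To make the shape of the fractions and regimes outputs concrete, here is a small sketch that derives them from a set of per-snippet profiles. It assumes that each position is assigned to the snippet with the lowest profile value at that position; that assignment rule, the helper name regimes_from_profiles, and the row ordering are assumptions for illustration and may differ from what the library returns.

```python
from itertools import groupby

def regimes_from_profiles(profiles):
    n = len(profiles[0])
    # assumption: each position belongs to the snippet with the lowest profile
    labels = [min(range(len(profiles)), key=lambda s: profiles[s][i]) for i in range(n)]
    fractions = [labels.count(s) / n for s in range(len(profiles))]
    regimes, start = [], 0
    for snippet_idx, run in groupby(labels):
        length = len(list(run))
        # (snippet index, inclusive start, exclusive stop)
        regimes.append((snippet_idx, start, start + length))
        start += length
    return fractions, regimes

# Hypothetical profiles for two snippets over six positions
profiles = [
    [0.1, 0.2, 0.9, 0.8, 0.7, 0.1],
    [0.5, 0.6, 0.2, 0.1, 0.3, 0.4],
]
fractions, regimes = regimes_from_profiles(profiles)
```

Each regime row can be used directly as a half-open slice, for example T[start:stop].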
stimp

stumpy.stimp(T, min_m=3, max_m=None, step=1, percentage=0.01, pre_scrump=True)
Compute the Pan Matrix Profile
This is based on the SKIMP algorithm.
 Parameters
T (ndarray) – The time series or sequence for which to compute the pan matrix profile
min_m (int, default 3) – The starting (or minimum) subsequence window size for which a matrix profile may be computed
max_m (int, default None) – The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m is None, this is set to the maximum allowable subsequence window size
step (int, default 1) – The step between subsequence window sizes
percentage (float, default 0.01) – The percentage of the full matrix profile to compute for each subsequence window size. When percentage < 1.0, then the scrump algorithm is used. Otherwise, the stump algorithm is used when the exact matrix profile is requested.
pre_scrump (bool, default True) – A flag for whether or not to perform the PreSCRIMP calculation prior to computing SCRIMP. If set to True, this is equivalent to computing SCRIMP++. This parameter is ignored when percentage = 1.0.

PAN_ (ndarray) – The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile
M_ (ndarray) – The full list of (breadth-first-search (level) ordered) subsequence window sizes

update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile
Notes
See Table 2
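The idea of visiting window sizes in breadth-first (level) order can be sketched as below: the middle window size is processed first, then the middles of the two halves, and so on, so that coarse coverage of the whole range is available early. The function bfs_order is a hypothetical helper, and the library's exact ordering may differ from this one.

```python
from collections import deque

def bfs_order(values):
    values = sorted(values)
    order = []
    queue = deque([(0, len(values))])  # half-open index ranges still to split
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        order.append(values[mid])
        queue.append((lo, mid))       # left half, visited on the next level
        queue.append((mid + 1, hi))   # right half, visited on the next level
    return order

# Window sizes from 3 through 9 with a step of 1
window_sizes = bfs_order(range(3, 10))
```

For this input the order is 6, then 4 and 8, then 3, 5, 7, and 9, so each successive call to an update step would refine the picture level by level.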
stimped

stumpy.stimped(dask_client, T, min_m=3, max_m=None, step=1)
Compute the Pan Matrix Profile with a distributed dask cluster
This is based on the SKIMP algorithm.
 Parameters
dask_client (client) – A Dask Distributed client that is connected to a Dask scheduler and Dask workers. Setting up a Dask distributed cluster is beyond the scope of this library. Please refer to the Dask Distributed documentation.
T (ndarray) – The time series or sequence for which to compute the pan matrix profile
min_m (int, default 3) – The starting (or minimum) subsequence window size for which a matrix profile may be computed
max_m (int, default None) – The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m is None, this is set to the maximum allowable subsequence window size
step (int, default 1) – The step between subsequence window sizes

PAN_ (ndarray) – The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile
M_ (ndarray) – The full list of (breadth-first-search (level) ordered) subsequence window sizes

update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile
Notes
See Table 2
gpu_stimp

stumpy.gpu_stimp(T, min_m=3, max_m=None, step=1, device_id=0)
Compute the Pan Matrix Profile with one or more GPU devices
This is based on the SKIMP algorithm.
 Parameters
T (ndarray) – The time series or sequence for which to compute the pan matrix profile
min_m (int, default 3) – The starting (or minimum) subsequence window size for which a matrix profile may be computed
max_m (int, default None) – The stopping (or maximum) subsequence window size for which a matrix profile may be computed. When max_m is None, this is set to the maximum allowable subsequence window size
step (int, default 1) – The step between subsequence window sizes
device_id (int or list, default 0) – The (GPU) device number to use. The default value is 0. A list of valid device ids (int) may also be provided for parallel GPU-STUMP computation. A list of all valid device ids can be obtained by executing [device.id for device in numba.cuda.list_devices()].

PAN_ (ndarray) – The transformed (i.e., normalized, contrasted, binarized, and repeated) pan matrix profile
M_ (ndarray) – The full list of (breadth-first-search (level) ordered) subsequence window sizes

update():
Compute the next matrix profile using the next available (breadth-first-search (level) ordered) subsequence window size and update the pan matrix profile
Notes
See Table 2