Stats

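Functions for the statistical analysis of Sequence and Rhythm objects: autocorrelation and cross-correlation, edit distance, Fourier analysis, and summary measures such as the coefficient of variation, nPVI, rhythmic entropy, and ugof.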

thebeat.stats.acf_df(sequence, resolution, smoothing_window=None, smoothing_sd=None)

Perform autocorrelation analysis on a Sequence object, and return a pandas.DataFrame object containing the results.

Parameters
  • sequence (Sequence) – A Sequence object.

  • resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.

  • smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns

DataFrame containing two columns: the timestamps, and the autocorrelation factor.

Return type

pandas.DataFrame

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

Examples

>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import acf_df
>>> rng = np.random.default_rng(seed=123)  # for reproducibility
>>> seq = thebeat.core.Sequence.generate_random_uniform(n_events=10, a=400, b=600, rng=rng)
>>> df = acf_df(seq, smoothing_window=50, smoothing_sd=20, resolution=10)
>>> print(df.head(3))
   timestamp  correlation
0          0     1.000000
1         10     0.851373
2         20     0.590761

thebeat.stats.acf_plot(sequence, resolution, max_lag=None, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Autocorrelation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)

Plot the autocorrelation function of a Sequence object.

Parameters
  • sequence (Sequence) – A Sequence object.

  • resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration_in_ms / resolution.

  • max_lag (Optional[float], default: None) – The maximum lag to be plotted. Defaults to the sequence duration.

  • smoothing_window (Optional[float], default: None) – The window (in milliseconds) within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

  • style (str, default: 'seaborn-v0_8') – Style used by matplotlib. See matplotlib style sheets reference.

  • title (str, default: 'Autocorrelation') – If desired, one can provide a title for the plot. This takes precedence over using the Sequence or SoundSequence name attribute as the title of the plot (if the object has one).

  • x_axis_label (str, default: 'Lag') – A label for the x axis.

  • y_axis_label (str, default: 'Correlation') – A label for the y axis.

  • figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().

  • dpi (int, default: 100) – The desired output resolution of the plot in dots per inch (DPI). This refers to the dpi parameter in matplotlib.pyplot.figure().

  • ax (Optional[Axes], default: None) – If desired, one can provide an existing matplotlib.axes.Axes object to plot the autocorrelation plot on. This is for instance useful if you want to plot multiple autocorrelation plots on the same figure.

  • suppress_display (bool, default: False) – If True, matplotlib.pyplot.show() is not run.

Return type

tuple[Figure, Axes]

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.acf_values(sequence, resolution, smoothing_window=None, smoothing_sd=None)

Perform autocorrelation on a Sequence object and return an array of unstandardized correlation factors, one for each lag in steps of resolution.

Parameters
  • sequence (Sequence) – A Sequence object.

  • resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.

  • smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Return type

ndarray

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

This function uses numpy.correlate() to calculate the correlations.
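
The procedure can be illustrated with a minimal sketch. It is not thebeat's implementation: the helper name acf_sketch, the way onsets are placed on the grid, and the shape of the smoothing kernel are all assumptions made for illustration. Only the use of numpy.correlate() and the relation n_lags = sequence_duration / resolution come from the description above.

```python
import numpy as np

def acf_sketch(onsets, resolution, smoothing_window=None, smoothing_sd=None):
    """Hypothetical helper: unstandardized autocorrelation of event onsets."""
    onsets = np.asarray(onsets, dtype=float)
    # One grid step per `resolution`; a 1 marks an event onset.
    n_steps = int(np.ceil(onsets.max() / resolution)) + 1
    signal = np.zeros(n_steps)
    signal[np.round(onsets / resolution).astype(int)] = 1.0

    if smoothing_window and smoothing_sd:
        # Assumed smoothing: convolve with a normal pdf sampled on the grid.
        n_half = int((smoothing_window / 2) // resolution)
        x = np.arange(-n_half, n_half + 1) * resolution
        kernel = np.exp(-0.5 * (x / smoothing_sd) ** 2) / (smoothing_sd * np.sqrt(2 * np.pi))
        signal = np.convolve(signal, kernel, mode="same")

    # mode="full" returns negative and positive lags; keep lag 0 and upwards.
    full = np.correlate(signal, signal, mode="full")
    return full[full.size // 2:]

# Five events, 500 ms apart, analysed at a resolution of 10 ms.
onsets = [0, 500, 1000, 1500, 2000]
values = acf_sketch(onsets, resolution=10)
smoothed = acf_sketch(onsets, resolution=10, smoothing_window=50, smoothing_sd=20)
```

Here the sequence lasts 2000 ms, so there are 2000 / 10 = 200 lags plus lag 0. Without smoothing, the value at lag 0 equals the number of events, and the value at a lag of 500 ms (index 50) equals the number of event pairs that are 500 ms apart.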

thebeat.stats.ccf_df(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)

Perform cross-correlation analysis on two Sequence objects, and return a pandas.DataFrame object containing the results.

Parameters
  • test_sequence (Sequence) – The test sequence.

  • reference_sequence (Sequence) – The reference sequence.

  • resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the cross-correlation function is calculated as n_lags = sequence_duration / resolution.

  • smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns

DataFrame containing two columns: the timestamps, and the cross-correlation factor.

Return type

pandas.DataFrame

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.ccf_plot(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Cross-correlation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)

Calculate and plot the cross-correlation function (CCF) between two Sequence objects. The test sequence is compared to the reference sequence.

Parameters
  • test_sequence (Sequence) – The test sequence.

  • reference_sequence (Sequence) – The reference sequence.

  • resolution – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.

  • smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

  • style (str, default: 'seaborn-v0_8') – The matplotlib style to use. See matplotlib style reference.

  • title (str, default: 'Cross-correlation') – The title of the plot.

  • x_axis_label (str, default: 'Lag') – The label of the x axis.

  • y_axis_label (str, default: 'Correlation') – The label of the y axis.

  • figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().

  • dpi (int, default: 100) – The resolution of the plot in dots per inch. This refers to the dpi parameter in matplotlib.pyplot.figure().

  • ax (Optional[Axes], default: None) – A matplotlib.axes.Axes object. If None, a new Figure and Axes is created.

  • suppress_display (bool, default: False) – If True, the plot is not displayed. This is useful e.g. if you only want to save the plot to a file.

Return type

tuple[Figure, Axes]

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.ccf_values(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)

Returns the unstandardized cross-correlation function (CCF) for two Sequence objects. The test sequence is compared to the reference sequence.

Parameters
  • test_sequence (Sequence) – The test sequence.

  • reference_sequence (Sequence) – The reference sequence.

  • resolution (float) – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.

  • smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.

  • smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns

The unstandardized cross-correlation function.

Return type

ndarray

thebeat.stats.edit_distance_rhythm(test_rhythm, reference_rhythm, smallest_note_value=16)

Calculates the edit/Levenshtein distance between two rhythms. The smallest_note_value determines the underlying grid that is used. For example, if it is 16, the underlying grid is composed of 1/16th notes.

Note

Based on the procedure described in Post and Toussaint [PT11].

Parameters
  • test_rhythm (Rhythm) – The rhythm to be tested.

  • reference_rhythm (Rhythm) – The rhythm to which test_rhythm will be compared.

  • smallest_note_value (int, default: 16) – The smallest note value that is used in the underlying grid. 16 means 1/16th notes, 4 means 1/4th notes, etc.

Return type

float

Examples

>>> from thebeat.music import Rhythm
>>> from thebeat.stats import edit_distance_rhythm
>>> test_rhythm = Rhythm.from_fractions([1/4, 1/4, 1/4, 1/4])
>>> reference_rhythm = Rhythm.from_fractions([1/4, 1/8, 1/8, 1/4, 1/4])
>>> print(edit_distance_rhythm(test_rhythm, reference_rhythm))
1

thebeat.stats.edit_distance_sequence(test_sequence, reference_sequence, resolution)

Calculates the edit/Levenshtein distance between two sequences.

If Sequences are not quantized to resolution, they will be quantized to that resolution first.

Note

The resolution also represents the underlying grid. If, for example, the resolution is 50, that means that a grid will be created with steps of 50. The onsets of the sequence are then placed on the grid for both sequences. The resulting sequences consist of ones and zeros, where ones represent the event onsets. This string for test_sequence is compared to the string of the reference_sequence. Note that test_sequence and reference_sequence can be interchanged without an effect on the results.

Parameters
  • test_sequence (Sequence) – The sequence to be tested.

  • reference_sequence (Sequence) – The sequence to which test_sequence will be compared.

  • resolution (int) – The resolution to which the sequences will be quantized. If the sequences are already quantized to this resolution, they will not be quantized again.

Return type

float
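
The grid-based comparison described in the note above can be sketched in plain Python. The helper names onsets_to_grid and levenshtein are hypothetical, and the sketch works on lists of onsets rather than Sequence objects; it illustrates the idea and is not thebeat's own code.

```python
def onsets_to_grid(onsets, resolution):
    """Place onsets on a grid with steps of `resolution`; 1 marks an onset."""
    positions = [round(onset / resolution) for onset in onsets]
    grid = [0] * (max(positions) + 1)
    for position in positions:
        grid[position] = 1
    return grid

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

# The third onset of the reference is 50 ms late, i.e. one grid step.
test_grid = onsets_to_grid([0, 500, 1000, 1500], resolution=50)
reference_grid = onsets_to_grid([0, 500, 1050, 1500], resolution=50)
distance = levenshtein(test_grid, reference_grid)
```

Shifting one onset by a single grid step changes two grid positions, so the distance here is 2. Swapping the two arguments gives the same result.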

thebeat.stats.fft_plot(sequence, unit_size, x_min=0, x_max=None, style='seaborn-v0_8', title='Fourier transform', x_axis_label='Cycles per unit', y_axis_label='Absolute power', figsize=None, dpi=100, ax=None, suppress_display=False)

Plots the Fourier transform of a Sequence object. The unit_size parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds. Note that the first frame is discarded since it will always have the highest power, yet is not informative.

Parameters
  • sequence (Sequence) – The sequence.

  • unit_size (float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.

  • x_min (float, default: 0) – The minimum number of cycles per unit to be plotted.

  • x_max (Optional[float], default: None) – The maximum number of cycles per unit to be plotted.

  • style (str, default: 'seaborn-v0_8') – The matplotlib style to use. See matplotlib style reference.

  • title (str, default: 'Fourier transform') – The title of the plot.

  • x_axis_label (str, default: 'Cycles per unit') – The label of the x axis.

  • y_axis_label (str, default: 'Absolute power') – The label of the y axis.

  • figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().

  • dpi (int, default: 100) – The resolution of the plot in dots per inch.

  • ax (Optional[Axes], default: None) – A matplotlib Axes object to plot on. If not provided, a new figure and axes will be created.

  • suppress_display (bool, default: False) – If True, the plot will not be displayed.

Return type

tuple[Figure, Axes]

Returns

  • fig – The matplotlib Figure object.

  • ax – The matplotlib Axes object.
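
To show how unit_size relates to the 'Cycles per unit' axis, here is a rough sketch of the computation behind such a plot. It assumes onsets given in whole milliseconds and one signal sample per millisecond; the helper name fft_sketch and these choices are illustrative assumptions, not a description of thebeat's internals.

```python
import numpy as np

def fft_sketch(onsets, unit_size):
    """Hypothetical helper: frequencies (cycles per unit) and FFT magnitudes."""
    onsets = np.asarray(onsets)
    # One sample per time step (here: per millisecond); a 1 marks an onset.
    signal = np.zeros(int(onsets.max()) + 1)
    signal[np.round(onsets).astype(int)] = 1.0

    magnitudes = np.abs(np.fft.rfft(signal))
    # A sample spacing of 1 / unit_size expresses frequencies in cycles per unit.
    freqs = np.fft.rfftfreq(signal.size, d=1 / unit_size)
    # Discard the first frame (0 cycles per unit), as described above.
    return freqs[1:], magnitudes[1:]

# Twenty isochronous events with an IOI of 500 ms, unit of 1000 ms (one second).
onsets_ms = np.arange(20) * 500
freqs, magnitudes = fft_sketch(onsets_ms, unit_size=1000)

# Look for the strongest component below 3 cycles per unit.
in_range = freqs <= 3
peak = freqs[in_range][np.argmax(magnitudes[in_range])]
```

With an IOI of 500 ms and a unit of 1000 ms, the strongest component in this range lies at roughly 2 cycles per unit, that is, two events per second.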

Examples

>>> from thebeat import Sequence
>>> from thebeat.stats import fft_plot
>>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25)  # milliseconds
>>> fft_plot(seq, unit_size=1000)
(<Figure size 800x550 with 1 Axes>, <AxesSubplot: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
>>> seq = Sequence.generate_random_normal(n_events=100, mu=0.5, sigma=0.025)  # seconds
>>> fft_plot(seq, unit_size=1, x_max=5)
(<Figure size 800x550 with 1 Axes>, <AxesSubplot: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)

thebeat.stats.get_cov(sequence)

Calculate the coefficient of variation of the inter-onset intervals (IOIs) in a thebeat.core.Sequence object.

Parameters

sequence (Sequence) – A thebeat.core.Sequence object.

Returns

The coefficient of variation of the sequence.

Return type

float
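
The coefficient of variation is the standard deviation of the IOIs divided by their mean. A small sketch on a plain list of IOIs follows; using the population standard deviation is an assumption here, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(iois):
    """Standard deviation of the IOIs divided by their mean."""
    # Assumption: population standard deviation (no Bessel correction).
    return statistics.pstdev(iois) / statistics.mean(iois)

cov = coefficient_of_variation([450, 500, 550, 500])
cov_isochronous = coefficient_of_variation([500, 500, 500, 500])
```

An isochronous sequence has no variability in its IOIs, so its coefficient of variation is 0.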

thebeat.stats.get_npvi(sequence)

This function calculates the normalized pairwise variability index (nPVI) for a provided Sequence or SoundSequence object, or for an iterable of inter-onset intervals (IOIs).

Parameters

sequence (Sequence) – Either a Sequence or SoundSequence object, or an iterable containing inter-onset intervals (IOIs).

Returns

The nPVI for the provided sequence.

Return type

numpy.float64

Notes

The normalized pairwise variability index (nPVI) is a measure of the variability of adjacent temporal intervals. The nPVI is zero for sequences that are perfectly isochronous. See Jadoul et al. [JRT+16] and Ravignani and Norton [RN17] for more information on its use in rhythm research.
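
As an illustration, the standard nPVI formula can be written out for a plain list of IOIs: for each pair of adjacent intervals, the absolute difference is divided by the pair's mean, and the average of these terms is multiplied by 100. The function name npvi_sketch is hypothetical, and this is a sketch of the formula rather than thebeat's own code.

```python
def npvi_sketch(iois):
    """nPVI = 100 / (m - 1) * sum(|d_k - d_k+1| / ((d_k + d_k+1) / 2))."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(iois, iois[1:])]
    return 100 * sum(terms) / len(terms)

npvi_isochronous = npvi_sketch([500] * 9)
npvi_two_intervals = npvi_sketch([400, 600])
```

For the two intervals 400 and 600, the single term is 200 / 500 = 0.4, which gives an nPVI of 40.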

Examples

>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import get_npvi
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=500)
>>> print(get_npvi(seq))
0.0
>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=10,mu=500,sigma=50,rng=rng)
>>> print(get_npvi(seq))
37.6263174529546

thebeat.stats.get_rhythmic_entropy(sequence, bin_fraction=0.03125)

Calculate Shannon entropy from bins. This is a measure of rhythmic complexity. If many different ‘note durations’ are present, entropy is high. If only a few are present, entropy is low. A sequence that is completely isochronous has a Shannon entropy of 0.

The bin size is determined from the average inter-onset interval in the thebeat.core.Sequence object (i.e. the tempo) and the bin_fraction. The bin_fraction corresponds to temporal sensitivity. The default is 1/32nd of the average IOI. This implies that the smallest note value that can be detected is a 1/32nd note.

Parameters
  • sequence (Union[Sequence, Rhythm]) – The thebeat.core.Sequence object for which Shannon entropy is calculated.

  • bin_fraction (float, default: 0.03125) – The fraction of the average inter-onset interval (IOI) that determines the bin size. It is multiplied by the average IOI to get the bin size.

Example

A Sequence has an average IOI of 500 ms. With a bin_fraction of 0.03125 (corresponding to a 1/32nd note value), the bins will have a size of 15.625 ms. The entropy is calculated from the number of IOIs in each bin.
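
The binning step can be sketched as follows for a plain list of IOIs. The function name, the placement of the bin edges (multiples of the bin size, starting at 0), and the use of a base-2 logarithm are assumptions made for this illustration.

```python
import math
from collections import Counter

def rhythmic_entropy_sketch(iois, bin_fraction=0.03125):
    """Shannon entropy (in bits) of the IOIs after binning."""
    mean_ioi = sum(iois) / len(iois)
    bin_size = mean_ioi * bin_fraction
    # Assumed binning: each IOI falls in the bin given by floor(ioi / bin_size).
    counts = Counter(int(ioi // bin_size) for ioi in iois)
    probabilities = [count / len(iois) for count in counts.values()]
    return sum(p * math.log2(1 / p) for p in probabilities)

entropy_isochronous = rhythmic_entropy_sketch([500] * 8)
entropy_two_durations = rhythmic_entropy_sketch([250, 250, 500, 500])
```

The isochronous sequence fills a single bin and has an entropy of 0. Two equally frequent durations fill two bins with probability 0.5 each, which gives an entropy of 1 bit.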

References

#todo add reference here for this type of entropy calculation.

thebeat.stats.get_ugof_isochronous(test_sequence, reference_ioi, output_statistic='mean')

This function calculates the universal goodness of fit (ugof) measure. The ugof statistic quantifies how well a theoretical sequence describes a sequence at hand (the test_sequence). This function can only calculate ugof using a theoretical sequence that is isochronous.

The reference_ioi is the IOI of an isochronous theoretical sequence.

Parameters
  • test_sequence (Sequence) – A Sequence or SoundSequence object that will be compared to an isochronous sequence with an IOI of reference_ioi.

  • reference_ioi (float) – A number (float or int) representing the IOI of an isochronous sequence.

  • output_statistic (str, default: 'mean') – Either ‘mean’ (the default) or ‘median’. This determines whether for the individual ugof values we take the mean or the median as the output statistic.

Returns

The ugof statistic.

Return type

numpy.float64

Notes

This measure is described in Burchardt et al. [BBKnornschild21]. An R implementation of the ugof measure is also available on GitHub.
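
One way to read the measure is sketched below: each onset is compared to the nearest onset of the isochronous reference, and the deviation is divided by the largest possible deviation, which is half of reference_ioi. The function name is hypothetical, and the sketch assumes that the reference sequence starts at 0; it is not thebeat's own code.

```python
import statistics

def ugof_isochronous_sketch(onsets, reference_ioi, output_statistic="mean"):
    """Deviation from the nearest theoretical onset, scaled to the range 0-1."""
    max_deviation = reference_ioi / 2
    values = []
    for onset in onsets:
        # Nearest onset of an isochronous reference that starts at 0 (assumed).
        nearest = round(onset / reference_ioi) * reference_ioi
        values.append(abs(onset - nearest) / max_deviation)
    if output_statistic == "median":
        return statistics.median(values)
    return statistics.mean(values)

# Ten isochronous onsets with an IOI of 1000, compared to a reference IOI of 68.21.
onsets = [i * 1000 for i in range(10)]
ugof = ugof_isochronous_sketch(onsets, reference_ioi=68.21)
perfect_fit = ugof_isochronous_sketch(onsets, reference_ioi=1000)
```

For these onsets and a reference IOI of 68.21, the sketch gives approximately 0.418. A reference IOI that matches the sequence exactly gives 0.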

Examples

>>> import thebeat
>>> from thebeat.stats import get_ugof_isochronous
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=1000)
>>> ugof = get_ugof_isochronous(seq, reference_ioi=68.21)
>>> print(ugof)
0.41817915

thebeat.stats.ks_test(sequence, reference_distribution='normal', alternative='two-sided')

This function returns the D statistic and p value of a one-sample Kolmogorov-Smirnov test. It calculates how different the distribution of inter-onset intervals (IOIs) is compared to the provided reference distribution.

If p is significant it means that the IOIs are not distributed according to the provided reference distribution.

Parameters
  • sequence (Sequence) – A Sequence object.

  • reference_distribution (str, default: 'normal') – Either ‘normal’ or ‘uniform’. The distribution against which the distribution of inter-onset intervals (IOIs) is compared.

  • alternative (str, default: 'two-sided') – Either ‘two-sided’, ‘less’ or ‘greater’. See scipy.stats.kstest() for more information.

Returns

A SciPy named tuple containing the D statistic and the p value.

Return type

scipy.stats._stats_py.KstestResult

Notes

This function uses scipy.stats.kstest(). For more information about the use of the Kolmogorov-Smirnov test in rhythm research, see Jadoul et al. [JRT+16] and Ravignani and Norton [RN17].
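
For readers who want to see what a one-sample test on IOIs looks like with scipy.stats.kstest() directly, here is a minimal sketch. Estimating the mean and standard deviation of the reference normal distribution from the IOIs themselves is an assumption of this sketch, and the IOIs are drawn with NumPy rather than taken from a Sequence object.

```python
import numpy as np
from scipy import stats

# Simulated IOIs: 100 draws from a normal distribution (mean 500, SD 25).
rng = np.random.default_rng(seed=123)
iois = rng.normal(loc=500, scale=25, size=100)

# Compare the IOIs to a normal distribution with their own mean and SD (assumed).
result = stats.kstest(iois, "norm", args=(iois.mean(), iois.std()),
                      alternative="two-sided")
d_statistic, p_value = result.statistic, result.pvalue
```

A large p value means there is no evidence that the IOIs deviate from the reference distribution.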

Examples

>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import ks_test
>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=100,mu=500,sigma=25,rng=rng)
>>> print(ks_test(seq))
KstestResult(statistic=0.07176677141846549, pvalue=0.6608009345687911)