Stats
Some cool stats stuff, still need to add.
- thebeat.stats.acf_df(sequence, resolution, smoothing_window=None, smoothing_sd=None)
Perform autocorrelation analysis on a Sequence object, and return a pandas.DataFrame object containing the results.
- Parameters
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns
DataFrame containing two columns: the timestamps, and the autocorrelation factor.
- Return type
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
Examples
>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import acf_df
>>> rng = np.random.default_rng(seed=123)  # for reproducibility
>>> seq = thebeat.core.Sequence.generate_random_uniform(n_events=10, a=400, b=600, rng=rng)
>>> df = acf_df(seq, smoothing_window=50, smoothing_sd=20, resolution=10)
>>> print(df.head(3))
   timestamp  correlation
0          0     1.000000
1         10     0.851373
2         20     0.590761
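To make the role of resolution concrete, the following plain-Python sketch places onsets on a grid and counts the lags. It is an illustration only, not thebeat's implementation: the helper name `onsets_to_grid` is hypothetical, and both the choice to end the grid at the last onset and the reading of sequence_duration as the span between first and last onset are assumptions.

```python
def onsets_to_grid(onsets, resolution):
    """Return a list of 0/1 samples with a 1 at every (rounded) onset position."""
    n_steps = round(max(onsets) / resolution) + 1  # assumption: grid ends at the last onset
    grid = [0] * n_steps
    for onset in onsets:
        grid[round(onset / resolution)] = 1
    return grid


onsets_ms = [0, 500, 1000, 1500]  # an isochronous sequence in milliseconds
resolution = 10                   # one grid step per 10 ms

grid = onsets_to_grid(onsets_ms, resolution)

# n_lags = sequence_duration / resolution (duration taken as last minus first onset)
n_lags = (max(onsets_ms) - min(onsets_ms)) / resolution

print(len(grid), sum(grid), n_lags)  # 151 4 150.0
```

A smaller resolution gives a finer grid and therefore more lags, at the cost of a longer computation.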
- thebeat.stats.acf_plot(sequence, resolution, max_lag=None, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Autocorrelation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)
This function can be used for plotting an autocorrelation plot from a Sequence.
- Parameters
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration_in_ms / resolution.
max_lag (Optional[float], default: None) – The maximum lag to be plotted. Defaults to the sequence duration.
smoothing_window (Optional[float], default: None) – The window (in milliseconds) within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
style (str, default: 'seaborn-v0_8') – Style used by matplotlib. See matplotlib style sheets reference.
title (str, default: 'Autocorrelation') – If desired, one can provide a title for the plot. This takes precedence over using the Sequence or SoundSequence name attribute as the title of the plot (if the object has one).
x_axis_label (str, default: 'Lag') – A label for the x axis.
y_axis_label (str, default: 'Correlation') – A label for the y axis.
figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The desired output resolution of the plot in dots per inch (DPI). This refers to the dpi parameter in matplotlib.pyplot.figure().
ax (Optional[Axes], default: None) – If desired, one can provide an existing matplotlib.axes.Axes object to plot the autocorrelation plot on. This is for instance useful if you want to plot multiple autocorrelation plots on the same figure.
suppress_display (bool, default: False) – If True, matplotlib.pyplot.show() is not run.
- Return type
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
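As a rough illustration of what smoothing_window and smoothing_sd control, the sketch below spreads each onset of a 0/1 grid over its neighbours using a normal probability density function. This is a minimal sketch under assumptions, not the procedure from Ravignani and Norton or thebeat's code: the helper names are hypothetical, the kernel is centred on zero, and its peak is scaled to 1.

```python
from statistics import NormalDist


def gaussian_kernel(smoothing_window, smoothing_sd, resolution):
    """Normal pdf sampled every `resolution` units across the window, peak scaled to 1."""
    half_steps = int((smoothing_window / 2) // resolution)
    density = NormalDist(mu=0, sigma=smoothing_sd)
    weights = [density.pdf(step * resolution) for step in range(-half_steps, half_steps + 1)]
    peak = max(weights)
    return [w / peak for w in weights]


def smooth(grid, kernel):
    """Convolve the 0/1 grid with the kernel, keeping the original length."""
    half = len(kernel) // 2
    out = [0.0] * len(grid)
    for i, value in enumerate(grid):
        if value == 0:
            continue
        for j, weight in enumerate(kernel):
            pos = i + j - half
            if 0 <= pos < len(grid):
                out[pos] += value * weight
    return out


grid = [0] * 21
grid[10] = 1  # a single onset in the middle of the grid

kernel = gaussian_kernel(smoothing_window=50, smoothing_sd=20, resolution=10)
smoothed = smooth(grid, kernel)
print([round(v, 3) for v in smoothed[7:14]])
```

With a wider window or a larger standard deviation, onsets that are slightly misaligned still overlap after smoothing, so they contribute to the correlation at nearby lags.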
- thebeat.stats.acf_values(sequence, resolution, smoothing_window=None, smoothing_sd=None)
Perform autocorrelation. This function takes a Sequence object and returns an array of unstandardized correlation factors, one for each step of resolution.
- Parameters
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Return type
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
This function uses numpy.correlate() to calculate the correlations.
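For readers who want to see what an unstandardized autocorrelation amounts to, here is a dependency-free sketch that computes the same sums as the non-negative-lag half of `numpy.correlate(signal, signal, 'full')`. The function name `autocorrelation` is hypothetical, and whether thebeat post-processes the result any further is not stated here.

```python
def autocorrelation(signal):
    """Unstandardized autocorrelation for lags 0 .. len(signal) - 1."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


# Onsets every 4 grid steps (e.g. onsets 0, 400, 800, 1200 with resolution=100).
signal = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
acf = autocorrelation(signal)
print(acf[:9])  # [4, 0, 0, 0, 3, 0, 0, 0, 2]
```

The value at lag 0 equals the number of onsets, and the peaks at lags 4 and 8 reflect the regular spacing. The values are raw sums rather than correlations scaled to 1, which is what "unstandardized" refers to.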
- thebeat.stats.ccf_df(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)
Perform cross-correlation analysis on two Sequence objects, and return a pandas.DataFrame object containing the results.
- Parameters
test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the cross-correlation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns
DataFrame containing two columns: the timestamps, and the cross-correlation factor.
- Return type
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_plot(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Cross-correlation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)
Calculate and plot the cross-correlation function (CCF) between two Sequence objects. The test sequence is compared to the reference sequence.
- Parameters
test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
style (str, default: 'seaborn-v0_8') – The matplotlib style to use. See matplotlib style reference.
title (str, default: 'Cross-correlation') – The title of the plot.
x_axis_label (str, default: 'Lag') – The label of the x axis.
y_axis_label (str, default: 'Correlation') – The label of the y axis.
figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The resolution of the plot in dots per inch. This refers to the dpi parameter in matplotlib.pyplot.figure().
ax (Optional[Axes], default: None) – A matplotlib.axes.Axes object. If None, a new Figure and Axes is created.
suppress_display (bool, default: False) – If True, the plot is not displayed. This is useful e.g. if you only want to save the plot to a file.
- Return type
- Returns
fig – The matplotlib.figure.Figure object.
ax – The matplotlib.axes.Axes object.
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_values(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)
Returns the unstandardized cross-correlation function (CCF) for two Sequence objects. The test sequence is compared to the reference sequence.
- Parameters
test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution (float) – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (Optional[float], default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (Optional[float], default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns
The unstandardized cross-correlation function.
- Return type
correlation
- thebeat.stats.edit_distance_rhythm(test_rhythm, reference_rhythm, smallest_note_value=16)
Calculates the edit/Levenshtein distance between two rhythms. The smallest_note_value determines the underlying grid that is used. If e.g. 16, the underlying grid is composed of 1/16th notes.
Note
Based on the procedure described in Post and Toussaint [PT11].
- Parameters
- Return type
Examples
>>> from thebeat.music import Rhythm
>>> from thebeat.stats import edit_distance_rhythm
>>> test_rhythm = Rhythm.from_fractions([1/4, 1/4, 1/4, 1/4])
>>> reference_rhythm = Rhythm.from_fractions([1/4, 1/8, 1/8, 1/4, 1/4])
>>> print(edit_distance_rhythm(test_rhythm, reference_rhythm))
1
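The grid idea behind this distance can be sketched without thebeat. The code below is an illustration under assumptions, not the library's implementation or the exact procedure of Post and Toussaint: the helpers `rhythm_to_grid` and `levenshtein` are hypothetical names, and each note is assumed to be encoded as a 1 (its onset) followed by zeros for the remaining grid steps.

```python
from fractions import Fraction


def rhythm_to_grid(note_values, smallest_note_value=16):
    """Turn note values (fractions of a whole note) into a string of onsets (1) and rests (0)."""
    grid = ""
    for value in note_values:
        steps = Fraction(value) * smallest_note_value
        if steps.denominator != 1:
            raise ValueError(f"{value} does not fit a 1/{smallest_note_value} grid")
        grid += "1" + "0" * (int(steps) - 1)
    return grid


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


test = rhythm_to_grid([Fraction(1, 4)] * 4)
reference = rhythm_to_grid([Fraction(1, 4), Fraction(1, 8), Fraction(1, 8),
                            Fraction(1, 4), Fraction(1, 4)])
print(test)                          # 1000100010001000
print(reference)                     # 1000101010001000
print(levenshtein(test, reference))  # 1
```

On a 1/16th grid the two rhythms differ in a single position (the extra eighth-note onset), which is why one substitution suffices.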
- thebeat.stats.edit_distance_sequence(test_sequence, reference_sequence, resolution)
Calculates the edit/Levenshtein distance between two sequences.
If Sequences are not quantized to resolution, they will be quantized to that resolution first.
Note
The resolution also represents the underlying grid. If, for example, the resolution is 50, that means that a grid will be created with steps of 50. The onsets of the sequence are then placed on the grid for both sequences. The resulting sequences consist of ones and zeros, where ones represent the event onsets. This string for test_sequence is compared to the string of the reference_sequence. Note that test_sequence and reference_sequence can be interchanged without an effect on the results.
- Parameters
test_sequence (Sequence) – The sequence to be tested.
reference_sequence (Sequence) – The sequence to which test_sequence will be compared.
resolution (int) – The resolution to which the sequences will be quantized. If the sequences are already quantized to this resolution, they will not be quantized again.
- Return type
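The quantization step described in the note can be pictured as follows. This is a small sketch with a hypothetical helper `onsets_to_string`; rounding each onset to the nearest grid step is an assumption about how quantization works, not a statement about thebeat's internals.

```python
def onsets_to_string(onsets, resolution):
    """Quantize onsets to the nearest grid step and mark them with ones."""
    steps = [round(onset / resolution) for onset in onsets]
    grid = ["0"] * (max(steps) + 1)
    for step in steps:
        grid[step] = "1"
    return "".join(grid)


# Slightly jittered onsets versus exact onsets, both on a grid with steps of 50.
test_string = onsets_to_string([0, 98, 203, 300], resolution=50)
reference_string = onsets_to_string([0, 100, 200, 300], resolution=50)
print(test_string)       # 1010101
print(reference_string)  # 1010101
```

Because both strings are identical after quantization, their edit distance would be 0. A coarser resolution therefore hides small timing deviations, while a finer one exposes them.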
- thebeat.stats.fft_plot(sequence, unit_size, x_min=0, x_max=None, style='seaborn-v0_8', title='Fourier transform', x_axis_label='Cycles per unit', y_axis_label='Absolute power', figsize=None, dpi=100, ax=None, suppress_display=False)
Plots the Fourier transform of a Sequence object. The unit_size parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds. Note that the first frame is discarded since it will always have the highest power, yet is not informative.
- Parameters
sequence (Sequence) – The sequence.
unit_size (float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.
x_min (float, default: 0) – The minimum number of cycles per unit to be plotted.
x_max (Optional[float], default: None) – The maximum number of cycles per unit to be plotted.
style (str, default: 'seaborn-v0_8') – The matplotlib style to use. See matplotlib style reference.
title (str, default: 'Fourier transform') – The title of the plot.
x_axis_label (str, default: 'Cycles per unit') – The label of the x axis.
y_axis_label (str, default: 'Absolute power') – The label of the y axis.
figsize (Optional[tuple], default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The resolution of the plot in dots per inch.
ax (Optional[Axes], default: None) – A matplotlib Axes object to plot on. If not provided, a new figure and axes will be created.
suppress_display (bool, default: False) – If True, the plot will not be displayed.
- Return type
- Returns
fig – The matplotlib Figure object.
ax – The matplotlib Axes object.
Examples
>>> from thebeat import Sequence
>>> from thebeat.stats import fft_plot
>>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25)  # milliseconds
>>> fft_plot(seq, unit_size=1000)
(<Figure size 800x550 with 1 Axes>, <AxesSubplot: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
>>> seq = Sequence.generate_random_normal(n_events=100, mu=0.5, sigma=0.025)  # seconds
>>> fft_plot(seq, unit_size=1, x_max=5)
(<Figure size 800x550 with 1 Axes>, <AxesSubplot: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
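To show how unit_size turns Fourier bins into "cycles per unit", the sketch below computes a few discrete Fourier coefficients of a 0/1 onset signal by hand. It is not thebeat's implementation: the helper `dft_power` is hypothetical, and the one-sample-per-millisecond grid whose length includes the final interval is an assumption.

```python
import cmath


def dft_power(signal, k):
    """Magnitude of the k-th discrete Fourier coefficient of the signal."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, x in enumerate(signal)))


unit_size = 1000                 # the sequence is in milliseconds
onsets_ms = range(0, 4000, 500)  # 8 isochronous onsets, IOI = 500 ms
signal = [0] * 4000              # one sample per millisecond (assumed grid)
for onset in onsets_ms:
    signal[onset] = 1

# Bin k corresponds to k / len(signal) cycles per sample; multiplying by
# unit_size expresses that as cycles per unit (here: cycles per second).
# Bin 0 is skipped, mirroring the discarded first frame.
spectrum = {k * unit_size / len(signal): dft_power(signal, k) for k in range(1, 13)}
peak_frequency = max(spectrum, key=spectrum.get)
print(peak_frequency)  # 2.0
```

An IOI of 500 ms corresponds to two events per second, so the peak appears at 2 cycles per unit when unit_size is 1000.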
- thebeat.stats.get_cov(sequence)
Calculate the coefficient of variation of the inter-onset intervals (IOIs) in a thebeat.core.Sequence object.
- Parameters
sequence (Sequence) – A thebeat.core.Sequence object.
- Returns
The coefficient of variation of the sequence.
- Return type
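The coefficient of variation is the standard deviation of the IOIs divided by their mean. A minimal sketch, assuming the population standard deviation (the variant used by the library is not stated here) and using a hypothetical function name:

```python
from statistics import mean, pstdev


def coefficient_of_variation(iois):
    """Standard deviation of the IOIs divided by their mean."""
    return pstdev(iois) / mean(iois)  # assumption: population standard deviation


print(coefficient_of_variation([500, 500, 500, 500]))  # 0.0
print(coefficient_of_variation([400, 600, 400, 600]))  # 0.2
```

Because the result is a ratio, it does not depend on whether the IOIs are expressed in seconds or milliseconds.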
- thebeat.stats.get_npvi(sequence)
This function calculates the normalized pairwise variability index (nPVI) for a provided Sequence or SoundSequence object, or for an iterable of inter-onset intervals (IOIs).
- Parameters
sequence (Sequence) – Either a Sequence or SoundSequence object, or an iterable containing inter-onset intervals (IOIs).
- Returns
The nPVI for the provided sequence.
- Return type
numpy.float64
Notes
The normalized pairwise variability index (nPVI) is a measure of the variability of adjacent temporal intervals. The nPVI is zero for sequences that are perfectly isochronous. See Jadoul et al. [JRT+16] and Ravignani and Norton [RN17] for more information on its use in rhythm research.
Examples
>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import get_npvi
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=500)
>>> print(get_npvi(seq))
0.0
>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=10, mu=500, sigma=50, rng=rng)
>>> print(get_npvi(seq))
37.6263174529546
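For reference, the standard nPVI formula averages the absolute difference between each pair of adjacent intervals, normalized by the pair's mean, and multiplies by 100. The sketch below writes that formula out in plain Python; the function name `npvi` is hypothetical and the code is not taken from thebeat.

```python
def npvi(iois):
    """Normalized pairwise variability index of a list of inter-onset intervals."""
    pairs = list(zip(iois[:-1], iois[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(pairs)


print(npvi([500, 500, 500, 500]))  # 0.0
print(npvi([250, 750, 250, 750]))  # 100.0
```

The second sequence alternates between short and long intervals, so every adjacent pair contributes the same large normalized difference.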
- thebeat.stats.get_rhythmic_entropy(sequence, bin_fraction=0.03125)
Calculate Shannon entropy from bins. This is a measure of rhythmic complexity. If many different ‘note durations’ are present, entropy is high. If only a few are present, entropy is low. A sequence that is completely isochronous has a Shannon entropy of 0.
The bin size is determined from the average inter-onset interval in the thebeat.core.Sequence object (i.e. the tempo) and the bin_fraction. The bin_fraction corresponds to temporal sensitivity. The default is 1/32nd of the average IOI. This implies that the smallest note value that can be detected is a 1/32nd note.
- Parameters
sequence (Union[Sequence, Rhythm]) – The thebeat.core.Sequence object for which Shannon entropy is calculated.
bin_fraction (float, default: 0.03125) – The fraction of the average inter-onset interval (IOI) that determines the bin size. It is multiplied by the average IOI to get the bin size.
Example
A Sequence has an average IOI of 500 ms. With a bin_fraction of 0.03125 (corresponding to a 1/32nd note value) the bins will have a size of 15.625 ms. The entropy will be calculated from the number of IOIs in each bin.
References
#todo add reference here for this type of entropy calculation.
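The binning described above can be sketched as follows. This is an illustration under assumptions rather than thebeat's code: the function name is reused only for readability, each IOI is assigned to a bin by flooring, and the entropy uses a base-2 logarithm (the base used by the library is not stated here).

```python
import math
from collections import Counter


def rhythmic_entropy(iois, bin_fraction=0.03125):
    """Shannon entropy (in bits) of IOIs binned relative to the mean IOI."""
    bin_size = bin_fraction * sum(iois) / len(iois)
    counts = Counter(math.floor(ioi / bin_size) for ioi in iois)
    total = len(iois)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(rhythmic_entropy([500, 500, 500, 500]))   # 0.0
print(rhythmic_entropy([250, 250, 500, 1000]))  # 1.5
```

In the second call the mean IOI is 500, so the bin size is 15.625 as in the example above. The three distinct durations fall into three bins with proportions 0.5, 0.25 and 0.25, giving 1.5 bits.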
- thebeat.stats.get_ugof_isochronous(test_sequence, reference_ioi, output_statistic='mean')
This function calculates the universal goodness of fit (ugof) measure. The ugof statistic quantifies how well a theoretical sequence describes a sequence at hand (the test_sequence). This function can only calculate ugof using a theoretical sequence that is isochronous.
The reference_ioi is the IOI of an isochronous theoretical sequence.
- Parameters
test_sequence (Sequence) – A Sequence or SoundSequence object that will be compared to the isochronous theoretical sequence defined by reference_ioi.
reference_ioi (float) – A number (float or int) representing the IOI of an isochronous sequence.
output_statistic (str, default: 'mean') – Either ‘mean’ (the default) or ‘median’. This determines whether for the individual ugof values we take the mean or the median as the output statistic.
- Returns
The ugof statistic.
- Return type
numpy.float64
Notes
This measure is described in Burchardt et al. [BBKnornschild21]. Please also refer to this GitHub page for an R implementation of the ugof measure.
Examples
>>> import thebeat
>>> from thebeat.stats import get_ugof_isochronous
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=1000)
>>> ugof = get_ugof_isochronous(seq, reference_ioi=68.21)
>>> print(ugof)
0.41817915
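The idea behind the measure can be sketched as follows: each onset is compared with the nearest point of an isochronous grid, and the deviation is divided by the largest possible deviation (half the reference IOI). This is a minimal sketch with a hypothetical function name. It assumes the theoretical grid starts at 0 and extends as far as needed, so it is not guaranteed to reproduce thebeat's output for every input.

```python
from statistics import mean, median


def ugof_isochronous(onsets, reference_ioi, output_statistic="mean"):
    """Deviation of each onset from the nearest grid point, scaled to the range 0..1."""
    max_deviation = reference_ioi / 2
    values = []
    for onset in onsets:
        nearest = round(onset / reference_ioi) * reference_ioi
        values.append(abs(onset - nearest) / max_deviation)
    return mean(values) if output_statistic == "mean" else median(values)


print(ugof_isochronous([0, 500, 1000, 1500], reference_ioi=500))  # 0.0
print(ugof_isochronous([0, 625, 1000, 1375], reference_ioi=500))  # 0.25
```

A value of 0 means every onset falls exactly on the theoretical grid, while values closer to 1 mean the onsets lie about halfway between grid points.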
- thebeat.stats.ks_test(sequence, reference_distribution='normal', alternative='two-sided')
This function returns the D statistic and p value of a one-sample Kolmogorov-Smirnov test. It calculates how different the distribution of inter-onset intervals (IOIs) is compared to the provided reference distribution.
If p is significant it means that the IOIs are not distributed according to the provided reference distribution.
- Parameters
reference_distribution (str, default: 'normal') – Either ‘normal’ or ‘uniform’. The distribution against which the distribution of inter-onset intervals (IOIs) is compared.
alternative (str, default: 'two-sided') – Either ‘two-sided’, ‘less’ or ‘greater’. See scipy.stats.kstest() for more information.
- Returns
A SciPy named tuple containing the D statistic and the p value.
- Return type
scipy.stats._stats_py.KstestResult
Notes
This function uses scipy.stats.kstest(). For more information about the use of the Kolmogorov-Smirnov test in rhythm research, see Jadoul et al. [JRT+16] and Ravignani and Norton [RN17].
Examples
>>> import numpy as np
>>> import thebeat
>>> from thebeat.stats import ks_test
>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=100, mu=500, sigma=25, rng=rng)
>>> print(ks_test(seq))
KstestResult(statistic=0.07176677141846549, pvalue=0.6608009345687911)
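To illustrate what the D statistic measures, the sketch below computes the largest gap between the empirical distribution of the IOIs and a normal distribution. It is an illustration only and does not replace scipy.stats.kstest(): the function name is hypothetical, fitting the reference distribution with the mean and population standard deviation of the IOIs is an assumption, and no p value is computed.

```python
from statistics import NormalDist, mean, pstdev


def ks_statistic_normal(iois):
    """Two-sided one-sample KS statistic D against a normal fitted to the IOIs."""
    reference = NormalDist(mean(iois), pstdev(iois))  # assumption: fitted parameters
    ordered = sorted(iois)
    n = len(ordered)
    d = 0.0
    for i, value in enumerate(ordered):
        cdf = reference.cdf(value)
        # The empirical CDF jumps from i/n to (i+1)/n at this value.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


iois = [480, 495, 500, 505, 520]
d_statistic = ks_statistic_normal(iois)
print(round(d_statistic, 3))
```

A small D means the empirical distribution of the IOIs stays close to the reference distribution everywhere, which is consistent with a non-significant p value in the full test.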