Evaluating Extracted Musical Features with Versions
School/Affiliation: MAPLE Lab, McMaster University
Virtual or In-person: In-person
Despite the widespread development of Music Information Retrieval (MIR) tools, little attention has been devoted to testing and evaluating their accuracy and robustness. For subjective features with no perceptual ground truth, evaluation and testing are considerably more challenging.
In the Western classical piano tradition, musical features such as tempo and timbre are interpretive cues defined by performers and may therefore vary between performances or versions, whereas cues such as mode and the number of onsets are structurally defined by the composer and should not vary between versions.
We extracted these features from the first eight measures of 16 versions of the 24 preludes from J.S. Bach’s Well-Tempered Clavier, Book 1, using three MIR tools (MIRtoolbox, Essentia, and librosa). We computed a standardized variability metric for each prelude and feature to compare their relative variability. Results show significant differences between features, as well as significantly more variability across all features extracted with MIRtoolbox.
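The abstract does not name the standardized variability metric, so the following is only a minimal sketch that assumes the coefficient of variation (sample standard deviation divided by the absolute mean) computed across versions for each prelude and feature. The function names, dictionary layout, and numeric values are hypothetical illustrations, not data or code from the study, and a categorical feature such as mode would need a different measure.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    ASSUMPTION: used here as a stand-in for the unnamed
    "standardized variability metric" in the abstract.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(m)


def variability_table(extracted):
    """Map each (prelude, feature) key to the CV of its per-version values."""
    return {key: coefficient_of_variation(vals) for key, vals in extracted.items()}


# Hypothetical per-version values (made up for illustration only).
extracted = {
    ("prelude_01", "tempo_bpm"): [60.0, 72.0, 66.0, 80.0],   # interpretive cue
    ("prelude_01", "onset_count"): [128, 128, 127, 128],     # structural cue
}

table = variability_table(extracted)
for (prelude, feature), cv in sorted(table.items()):
    print(f"{prelude} {feature}: CV = {cv:.4f}")
```

Because the coefficient of variation is unitless, features on different scales (beats per minute versus onset counts) can be placed side by side. Under the a priori assumption in the text, a structural cue would be expected to show a lower value than an interpretive cue for the same prelude.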
This method offers a novel framework for testing musical feature extraction: analyzing multiple versions of the same piece of music can determine whether an extracted feature’s variability is consistent with our a priori assumptions, and which factors may contribute to these patterns of variability. The approach may also be useful for other features where a ground truth is difficult to determine or otherwise unavailable, or as a procedure for selecting an algorithm or tool.