Commit Graph

5 Commits

Author SHA1 Message Date
Philipp Wolfer
d1a7063c56 PICARD-2361: Fix clustering using removed files 2021-12-14 12:29:37 +01:00
Philipp Wolfer
b4ccb1618d PICARD-2340: Use configured Various Artists name for clusters without artist 2021-11-23 15:31:58 +01:00
Philipp Wolfer
2aefcd051a PICARD-2339: Ensure clustering uses most common spelling of the same artist
This restores the previous behavior, where a cluster's primary artist is based on the tokenized artist name, but the most common real spelling is then used.
2021-11-23 15:17:07 +01:00
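The behavior described in this commit message can be illustrated with a minimal sketch. It is not Picard's actual code: the `tokenize` normalization (lowercasing and dropping non-word characters) and the `primary_spellings` helper are hypothetical names, chosen only to show how variants can be grouped under a tokenized key while the most frequent real spelling is kept for display.

```python
import re
from collections import Counter, defaultdict


def tokenize(name):
    # Hypothetical normalization: lowercase and drop non-word characters.
    # The real tokenizer may differ.
    return re.sub(r"\W+", "", name.lower())


def primary_spellings(artists):
    """Map each tokenized artist name to its most common real spelling."""
    spellings = defaultdict(Counter)
    for artist in artists:
        # Count every real spelling under its shared tokenized key.
        spellings[tokenize(artist)][artist] += 1
    # For each key, keep the spelling that occurred most often.
    return {
        token: counts.most_common(1)[0][0]
        for token, counts in spellings.items()
    }


result = primary_spellings(["AC/DC", "AC/DC", "ACDC", "Björk"])
```

Here "AC/DC" and "ACDC" share one tokenized key, and the more frequent spelling "AC/DC" is the one chosen for that group.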
Philipp Wolfer
bb48705357 PICARD-2339: Simplify clustering algorithm
The existing code was using the Levenshtein distance to calculate similarity, which resulted in O(n^2) performance. But since only exact matches were used (similarity threshold 1.0), this was not necessary.

This new implementation uses simple comparison for string equality and performs in O(n).
2021-11-23 15:16:59 +01:00
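As a rough illustration of why exact matching allows a linear-time approach, the sketch below groups items by a normalized key in a dictionary, so each item is handled once instead of being compared against every other item. It is an assumption-laden example, not the code from this commit: `tokenize`, `cluster_by_key`, and the dictionary-based file records are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(name):
    # Hypothetical normalization: lowercase and drop non-word characters.
    return re.sub(r"\W+", "", name.lower())


def cluster_by_key(files, key):
    """Group files whose tokenized `key` value is exactly equal.

    Each file is visited once and placed with a dictionary lookup, so the
    running time grows linearly with the number of files (ignoring the
    cost of normalizing each string).
    """
    clusters = defaultdict(list)
    for f in files:
        clusters[tokenize(f[key])].append(f)
    return dict(clusters)


files = [
    {"album": "Abbey Road", "title": "Come Together"},
    {"album": "abbey road", "title": "Something"},
    {"album": "Help!", "title": "Yesterday"},
]
clusters = cluster_by_key(files, "album")
```

With a similarity threshold of 1.0, a pairwise distance comparison and an equality check on the normalized strings select the same pairs, which is why a hash-based grouping can stand in for the quadratic comparison.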
Philipp Wolfer
0ff35391be Added tests for clustering algorithm
Co-authored-by: Laurent Monin <github@norz.org>
2021-11-23 15:11:15 +01:00