Adaptive Playlist Enhancements
Submitted by John Richard Moser
Link to original bug (#165745)
Description
I've seen a few adaptive playlist engines, and I think there are some new ideas that would be interesting to see deployed to tune adaptive databases both passively and aggressively.
Similarity webs and gradients could be added so that random playlists could be based on what's "similar" (according to passive and aggressive information gathering from the user) to high-rated songs and "dissimilar" to low-rated ones.
It'd be interesting to let the user set a similarity/difference value between two songs on a smooth scale, so that a similarity web or gradient could be built and used. This would allow the user to relate songs directly by how similar they seem to the user, rather than by any form of spectral analysis or metadata.
Such relations could be supplied by the user through a small information box between the library browser (artist, album, genre lists) and the playlist. In the simplest possible form, this information box would say:

[X] <Title of last song> is ----|- similar to current song

The user would check the box, called the "Relation Toggle," and adjust the slider at leisure.
Songs also would be implicitly related. The user would have to activate implicit relations and ask Rhythmbox to calculate implicit relations from all relations, thus generating relations for all song pairs. In implicit mode, if the Relation Toggle is unchecked, then the given relation is implicit. Implicit relations are different from user-defined relations only in that they are regenerated during an implicit regeneration.
For a hypothetical example, song A is 95% similar to song B, song B is 95% similar to song C. By averaging these, we get that song C is 95% similar to song A. In reality, song C is 80% similar to song A, and so the user adjusts the slider and gives us an explicit relationship of 80%.
Implicit relations are initialized to 0 when calculation begins. To calculate the implicit relation between SongA and SongB, find the two songs SongC and SongD that are explicitly related to SongA and to which SongB is most similar, and use the explicit relations SongB has to them. If two such common related songs are not available, the implicit relationship between SongA and SongB is automatically 0 (fully dissimilar).
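The description above leaves the exact combining step open, so the following is only a rough sketch of one possible reading. It assumes similarities are stored as numbers from 0 to 1, that each path through a common song is estimated by averaging its two explicit relations (as in the A/B/C example), and that the two path estimates are then averaged. The names `pair_key` and `implicit_relation` are made up for illustration.

```python
def pair_key(a, b):
    """Return the two song ids in a fixed order so (a, b) and (b, a) match."""
    return (a, b) if a <= b else (b, a)


def implicit_relation(explicit, song_a, song_b):
    """Estimate the similarity of song_a and song_b from explicit relations.

    `explicit` maps pair_key(x, y) -> similarity in [0, 1].
    Assumption: average each two-step path, then average the two best paths.
    """
    songs = {s for pair in explicit for s in pair}
    common = []
    for other in songs:
        if other in (song_a, song_b):
            continue
        a_rel = explicit.get(pair_key(song_a, other))
        b_rel = explicit.get(pair_key(song_b, other))
        if a_rel is not None and b_rel is not None:
            # Sort key first: how similar song_b is to the common song.
            common.append((b_rel, a_rel))

    # Without two common related songs the relation stays at its initial 0.
    if len(common) < 2:
        return 0.0

    # Keep the two common songs that song_b is most similar to.
    best_two = sorted(common, reverse=True)[:2]
    path_estimates = [(a_rel + b_rel) / 2 for b_rel, a_rel in best_two]
    return sum(path_estimates) / len(path_estimates)
```

An explicit relation set by the user would simply take precedence over whatever this function returns for the same pair.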
This relationship scheme should let the human mind decide which songs are similar or different, even though that can't be done very well mathematically. This should automatically account for {{hard,soft},{loud,gentle},genre}. Genres such as rock, metal, jazz, piano concerto and so on would be played when the user wants to hear them, provided they are similar to songs the user wants to hear.
By playing songs similar to those the user "wants to hear," Rhythmbox can play songs the user "wants to hear" but hasn't particularly thought of. The rating system can thus take advantage of this information to become more robust. Alternately, the rating system can be ignored in favor of "similarity playlists" which define a set of songs to base playback on using the similarity gradients.
Song selection using the similarity gradients is easy. Random songs can be played by selecting songs with similarity >= N. A large database would store these relationships; you could possibly choose a random song from the list returned by "SELECT * FROM tbl_similarity WHERE (song1 = 'SONG' OR song2 = 'SONG') AND rating > $N;"
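As a minimal sketch of that selection step, the query can be run against SQLite with bound parameters instead of pasted-in strings. The table and column names come from the query above; the column types, the 0 to 1 rating scale, and the helper names `create_table` and `pick_similar_song` are assumptions for illustration.

```python
import random
import sqlite3


def create_table(conn):
    """Create the relation table (schema is an assumption, not a spec)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tbl_similarity ("
        " song1 TEXT NOT NULL,"
        " song2 TEXT NOT NULL,"
        " rating REAL NOT NULL,"
        " PRIMARY KEY (song1, song2))"
    )


def pick_similar_song(conn, song, threshold, rng=random):
    """Return a random song whose relation to `song` is above `threshold`.

    Returns None when no stored relation qualifies.
    """
    rows = conn.execute(
        "SELECT song1, song2 FROM tbl_similarity"
        " WHERE (song1 = ? OR song2 = ?) AND rating > ?",
        (song, song, threshold),
    ).fetchall()
    if not rows:
        return None
    song1, song2 = rng.choice(rows)
    # The row names both songs; hand back the one that is not `song`.
    return song2 if song1 == song else song1
```

Because a pair is stored only once, the query has to check both columns, and the caller has to work out which side of the row is the "other" song.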
These relationships would not be generated or stored until a user defined a relationship or requested that implicit relationships be calculated. It would be the user's decision, because for N songs there would be at most N(N-1)/2 entries (one per pair of songs). Let's say a user had 738 songs. The relationships would total 37.8 megabytes. It takes 69 megabytes for 1000 songs and 6.9 gigabytes for 10000.
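The figures above can be checked with a few lines of arithmetic. The per-entry size is not stated in the text; 139 bytes is an assumption back-calculated from the 37.8 megabyte figure for 738 songs, with 1 megabyte taken as 10^6 bytes.

```python
# Assumption: about 139 bytes per stored relation, inferred from
# 37.8 MB for 738 songs. The real size depends on the storage format.
BYTES_PER_RELATION = 139


def max_relations(song_count):
    """Number of unordered song pairs: N * (N - 1) / 2."""
    return song_count * (song_count - 1) // 2


def storage_bytes(song_count, bytes_per_relation=BYTES_PER_RELATION):
    """Worst-case storage if every pair of songs has a stored relation."""
    return max_relations(song_count) * bytes_per_relation


if __name__ == "__main__":
    for n in (738, 1000, 10000):
        print(n, max_relations(n), storage_bytes(n) / 1e6, "MB")
```

Under that assumption, 738 songs give 271,953 pairs and roughly 37.8 MB, 1000 songs give about 69.4 MB, and 10000 songs give about 6.9 GB, which matches the quadratic growth described above.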
It would likely be possible to calculate implicit relations on the fly, and to heuristically discard explicit relations when every implicit relation could still be calculated the same without them and the discarded relations would themselves come out the same when recalculated. This would be OK if there were few relations, and would help keep the stored relations small.
The table storing relations would have to store two-way relations between files, and so could be uniquely indexed with (song1, song2). To do this properly, song1's identifying string must come alphabetically before song2's.
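A small sketch of that ordering rule, assuming SQLite and the same hypothetical tbl_similarity layout as in the query earlier. The helper names are invented for illustration, and Python's plain string comparison stands in for "alphabetically"; any fixed, consistent ordering would work as well.

```python
import sqlite3


def create_table(conn):
    """Relation table with a unique (song1, song2) key (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tbl_similarity ("
        " song1 TEXT NOT NULL,"
        " song2 TEXT NOT NULL,"
        " rating REAL NOT NULL,"
        " PRIMARY KEY (song1, song2))"
    )


def ordered_pair(a, b):
    """Put the two identifying strings in a fixed order, smaller first."""
    if a == b:
        raise ValueError("a song cannot be related to itself")
    return (a, b) if a < b else (b, a)


def set_relation(conn, a, b, rating):
    """Store or overwrite the single two-way relation between a and b."""
    song1, song2 = ordered_pair(a, b)
    conn.execute(
        "INSERT OR REPLACE INTO tbl_similarity (song1, song2, rating)"
        " VALUES (?, ?, ?)",
        (song1, song2, rating),
    )


def get_relation(conn, a, b):
    """Look up the relation regardless of the order the songs are given in."""
    song1, song2 = ordered_pair(a, b)
    row = conn.execute(
        "SELECT rating FROM tbl_similarity WHERE song1 = ? AND song2 = ?",
        (song1, song2),
    ).fetchone()
    return row[0] if row else None
```

Normalizing the order before every read and write means a pair can never end up stored twice as (A, B) and (B, A), so the unique index is enough to keep one row per relation.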
It's an interesting idea, but it has implications.