Menu

#124 fulltext search

open
nobody
None
5
2016-10-08
2016-08-26
No

The search feature of minidlna is useful but in many situations it is insufficient (typically, if I search for "Mozart piano" in my music collection, I get no answers, despite the fact that I have albums whose title contains "Mozart" and tracks, on these albums, whose title contains "piano").

When searching for an album, a full-text search across all fields (title and artist) of an album and of all of its children tracks would be immensely more useful. I understand this may not be strictly DLNA-compliant, but such a feature could be controlled by an option in the configuration file.

I have patched minidlna to do exactly this. I confirm that it is immensely useful, but my patch is a bit of a hack so instead of uploading it let me describe exactly how this can be achieved.
The idea is to use a new table in the sqlite3 database, say a table named bk. The following SQL commands can be tested on the command-line with sqlite3 on the minidlna database file.

Table creation:

CREATE VIRTUAL TABLE bk USING fts4(tokenize=unicode61,
DETAIL_ID INTEGER DEFAULT NULL,
CONTENT TEXT DEFAULT NULL);

Populating it from the other tables (once the other tables are built):

INSERT INTO bk
SELECT o.DETAIL_ID, group_concat(de.TITLE, ' ') || ' ' || d.ARTIST || ' ' || d.TITLE
from
OBJECTS o
join DETAILS d on (d.ID = o.DETAIL_ID)
join OBJECTS e on (o.OBJECT_ID = e.PARENT_ID)
join DETAILS de on (de.ID = e.DETAIL_ID)
where o.OBJECT_ID glob '$' and e.CLASS like 'item.audioItem%' and o.CLASS = 'container.album.musicAlbum'
group by o.DETAIL_ID;
INSERT INTO bk(bk) VALUES('optimize');

Doing an actual full-text search:

SELECT o.OBJECT_ID, o.PARENT_ID, o.REF_ID, o.DETAIL_ID, o.CLASS, d.SIZE, d.TITLE, d.DURATION, d.BITRATE, d.SAMPLERATE, d.ARTIST, d.ALBUM, d.GENRE, d.COMMENT, d.CHANNELS,
d.TRACK, d.DATE, d.RESOLUTION, d.THUMBNAIL, d.CREATOR, d.DLNA_PN, d.MIME, d.ALBUM_ART, d.ROTATION, d.DISC
from
bk f
left join DETAILS d on (d.ID = f.DETAIL_ID)
left join OBJECTS o on (f.DETAIL_ID = o.DETAIL_ID)
where o.REF_ID is NULL and o.OBJECT_ID glob '$' and f.CONTENT MATCH 'mozart piano'
group by o.DETAIL_ID limit 0, 50;

Added benefits: such a full-text search ignores diacritics (so it is easy to search for Dvorak, for example, no matter how the name is spellt across the database). In addition, this full-text search is orders of magnitude faster than the current search function of minidlna.

It would be really nice if such a feature was implemented in minidlna.

Discussion

  • Brett Kerwin

    Brett Kerwin - 2016-08-26
    Post awaiting moderation.
  • Shrimpkin

    Shrimpkin - 2016-09-06

    The deficiency is in your client not minidlna. Your client should send a search like below which will give you the results you want and is supported by minidlna. Although, most clients have simple search functions or none at all.

    upnp:class derivedfrom "object.item.audioItem" and (upnp:artist contains "mozart" and dc:title contains "piano")
    

    This is not my project but I don't see this feature being added? You can post your hack for others in Tickets > Patches.

     
  • Brett Kerwin

    Brett Kerwin - 2016-09-06

    Dear Shrimpkin,

    The search string you suggest does not address the problems solved by a full-text search across all fields of all tracks of each album.

    First, how does the user know that "mozart" should be contained in the artist (or composer) field of the album he is looking for, rather than, say, in a track title or in the album title? Second, I am not sure what the search string you quote does exactly, but I suspect it returns tracks as well as albums. My idea was to return albums, not individual tracks, though the search should extend across all the albums' tracks' fields; in practice this seems more useful. Moreover, I am not sure, but I doubt that DLNA even allows for searches of the kind "return all albums which contain a track whose title contains "piano" and such that the composer of the track, or of the album, contains "mozart"". Third, as far as I know, with the search string you suggest, minidlna would make a distinction between Dvorak and Dvořák, which is something that the user may not want.

    I completely agree that the deficiency is not in minidlna. But it is not in my client either: it is in the protocol, which has something for database-type searches but nothing for Google-like searches. The simple solution I suggest slightly deviates from the protocol, but only so slightly, and it need not be enabled by default.

    In addition, sticking to a search string as the one you suggest implies that in order for this to be useful, the user needs to have a tidy database, where the artist, the composer, the orchestra, the conductor, the instruments, the type of the piece, etc are all specified in the right fields, not semi-randomly. Now, a user who needs a search tool typically has a quite large database, and a user who has a quite large database typically uses an online resource (freedb, Gracenote, etc) to populate it. Therefore such a user's database is typically riddled with inconsistencies. So, even if one neglects the diacritics issue and even if one assumes that the client is sophisticated enough to make intelligent queries (an assumption which, as you note, is already very unrealistic), the search string you suggest would be, in practice, of little use to many users (though I agree that the theory says otherwise).

    Brett

     
  • Shrimpkin

    Shrimpkin - 2016-09-16

    OK, I got what you're trying to do. Still think the client needs to handle this. For example, one client (BubbleUPnP) sends several SearchCriteria strings for each search done on the client.

    1) Use the 'or' operator to search other fields. 2) Maybe you can find the album you want in two searches? First, search for matching tracks. Then, search for the album based on the selected track. 3) Use the 'or' operator or a wildcard like 'devo__k'.

    FTS tables are interesting and improves search speed but there are tradeoffs to consider. As I mentioned, you can post a patch for others.

     
  • Brett Kerwin

    Brett Kerwin - 2016-09-24

    I understand your reluctancy but still believe the FTS table approach works better in practice, for at least three significant reasons: 1) search speed, 2) the usefulness of ignoring diacritics, 3) the impossibility, in practice, for many use cases, to modify the client (e.g., the user may be bound to a closed-source client which is the only client compatible with his hardware). I agree that in theory 3) should not count as an argument. But 1) and 2) do count, even in theory. As regards 1), I did not carry out extensive tests but am quite sure that on slow hardware such as a cheap NAS, the two-step search you suggest will take a long time for every search while the FTS search is blazingly fast.

    You mention tradeoffs. Does this approach have any drawback besides enabling the user, by a configurable option, to make minidlna slightly break the dlna protocol?

    I posted a patch under Tickets/Patches. (I messed up the link there, sorry, I am new to Sourceforge, surprisingly enough there does not seem to be a way to preview posts.)
    But again it is a bit of a hack, I'd be happy if someone could implement it properly.
    Based on the SQL requests I wrote above, doing so should be easy for anyone who completely understands the minidlna source code.

     
  • Shrimpkin

    Shrimpkin - 2016-10-01

    The tradeoff is memory for speed. I doubt search speed is going to be a problem (timeouts) unless your database is very large. Not sure how you would implement properly since the search you want to perform is very specific.

    I did a little testing and found no improvements using FTS tables when the table needs to be built (or rebuilt on db updates). Of course, searches done after FTS tables are (re)built are much faster. I used the attached patch for comparision testing. It uses bang (!) type searching and hardcoded search terms.

    If you are doing similar searches, another option is using keyword based containers. Ex: Music > Album > Keywords > ... in scanner.c with provided keywords. The benefit is the container can be accessed by clients which do not support searching.

    BTW, there is an option for preview in the message toolbar.

     
  • Brett Kerwin

    Brett Kerwin - 2016-10-01

    Thanks for your interest.

    Sure, when the FTS table needs to be built or rebuilt, there is no search speed improvement. But, first, in practice, many users would perform searches much more often than database changes, so that building or rebuilding would, in practice, be something which happens rarely (for this use case). And second, a proper implementation would not rebuild the whole FTS at every database change, and would not build it at the time of the first search but instead at the time of the construction of the rest of the database. I proceeded in this way because I did not understand the code well enough to do otherwise (this is what I meant by "a quick hack") and because in practice there was little drawback for my use case. But it should be easy to simply update the FTS table whenever a database change is detected (rather than destroy it and rebuild it). In such an implementation, all searches would be very fast.

    I did not test memory consumption of the FTS table and cannot say anything about it.
    I would have guessed it remains reasonable. These days, cheap hardware tends to have more memory than CPU speed. On my NAS it works fine.

     
  • Shrimpkin

    Shrimpkin - 2016-10-08

    From what I have read about FTS tables, some of the overhead can be quite expensive. It may be when only certain features are enabled?

    If I needed FTS table support, I would probably get rid of the group_concat and data duplication. Just move the columns into the FTS table. I imagine most of the modifications would be with the SQL statements and very little additional code would be needed. My knowledge of FTS is limited and this may not be feasible.

    If you want a better implementation, I believe you will have to code it yourself? The main files that will need modification: inotify, metadata, scanner, upnpsoap. Quite a bit of the other code is based on miniupnp project.

     

Log in to post a comment.