A FreeDB MySQL importer
Status: Beta
Brought to you by:
ichaer
File | Date | Author | Commit |
---|---|---|---|
perl | 2006-05-08 | ichaer | [r1] |
php_demo | 2006-05-08 | ichaer | [r1] |
sql | 2006-05-08 | ichaer | [r1] |
LICENSE.txt | 2006-05-08 | ichaer | [r1] |
README.txt | 2006-05-08 | ichaer | [r1] |
freedb2mysql - A FreeDB MySQL importer (or is it?)
------------------------------------------
Project page: http://www.sourceforge.net/projects/freedb2mysql/
------------------------------------------
Version: 0.8
Created: March 2006


1. Overview
-----------
This is a bunch of scripts that build a reasonably normalized,
search-friendly version of FreeDB that could, theoretically, be held in any
DBMS. I wrote it for MySQL, and it works pretty well there.


2. How to run it
-----------------
Before running any Perl or PHP script, check the database configuration in
them!

The execution sequence for the conversion should be:

- Run 'freedb.sql' on the database to create all the tables you'll need. It
  expects to be run in a schema called `freedb`, and all the following
  scripts expect that too. Hope you don't mind.

- Run 'FreedbPrepareDb.pl'. Don't forget to set the correct freedb path in
  line 17 before doing so! As I ran it under Windows, it currently looks
  like this:
      $fileReader->insertFiles('C:/freedb');
  But a Linux user's path might look more like this:
      $fileReader->insertFiles('~/freedb');
  This script loads a file control table, which is used to find the paths of
  the input files that still need to be processed. The loading process might
  take a very long time, so it is very good to be able to interrupt it and
  continue from where it stopped without corrupting the whole thing.
  Under Windows* at least, this script takes a few hours to complete its
  job...

- Run 'FreedbInsert.pl'. It will actually load the data into the database.
  It can be stopped gracefully by creating a file named 'stop' in its
  directory (I thought about catching signals, but in the environment I was
  working in, this approach was more practical; you can change it if it
  bothers you).
  This script took me about 4 days to process the whole FreeDB under
  Windows* on a 1.8GHz Pentium 4 with 1GB of RAM.
  Errors while reading or inserting data are written to STDERR and logged to
  the freedb_fileerror table. When I ran it, I got 2,016 rejections for
  various reasons - most were bizarre situations or entries containing
  values that were too large for the database definitions. I simply ignored
  those entries; it's a small number and most of them didn't look good
  anyway, but I suppose one could recover them with some clever scripting
  and a bit of manual patching.

- Run 'disc_sdx.pl'. This script creates a table relating albums to
  soundexes of their titles and artists. Kind of a data warehouse for
  typo-ignoring searches.

- Run 'create_cd_info.sql' in the database. This little script creates the
  cd_info table, which uses a mix of soundexes and fulltext indexes for a
  (hopefully) typo-ignoring, fast search database.

- Run 'searchCd_procedure.sql'. This creates the searchCd procedure, which
  should do fast searches in the cd_info table. I'm not sure if that
  procedure is thread-safe, nor if it is the best approach for the
  problem... its algorithm, though, is valid, and I believe it is also
  pretty optimized. You could port it to another language (say, PHP) if it
  isn't good the way it is.
  You have to run this script directly in the mysql client. phpMyAdmin
  (which I use for this kind of thing) doesn't like it.

- Test the final result =). I made a little test page for that,
  'search.html', which relies on 'searchAlbum.php' to get results from the
  MySQL FreeDB built in the previous steps; it might be useful as a demo of
  how this whole shebang works.

Oh... and no, I'm not always finishing the statements I prepare in my Perl
scripts. I know it isn't a pretty thing to do, but I'm so lazy, and these
scripts aren't meant to be run constantly... here, let's make it sound
prettier:
TODO: finish all statements before exiting in perl scripts.


3. Post-execution considerations
-----------------
The freedb_file and freedb_fileerror tables are auxiliary and may be removed
after the data has been loaded into the DBMS. The disc_search table,
although possibly interesting in itself, may also be removed after cd_info
has been generated.


4. File listing
-----------------
+ sql - Database table and procedure creation scripts.
    - freedb.sql - Table creation script.
    - create_cd_info.sql - Creates the cd_info table, which uses a mix of
      soundexes and fulltext indexes for a (hopefully) typo-ignoring, fast
      search database.
    - searchCd_procedure.sql - A small procedure built to quickly retrieve
      results from the cd_info table.
+ perl - Freedb data loading perl scripts.
    - FreedbFileReader.pm - Class encapsulating freedb file control.
    - FreedbEntry.pm - Class encapsulating various freedb entry functions.
    - FreedbPrepareDb.pl - Loads the freedb_file table.
    - FreedbInsert.pl - This little script does all the file reading and
      data inserting.
    - disc_sdx.pl - Loads data into the disc_search table. This table
      associates albums with a soundex string containing the artist name
      and album title. This script reads its input from the database and
      writes its result back to it.
+ php - A little search page that uses the database search procedure to find
  stuff.
    - search.html - Basic search form.
    - searchAlbum.php - Search results retriever.


5. Licensing
-----------------
This program (err, batch of scripts) is licensed under the GNU Lesser
General Public License. If you ever want to read a full copy of it, it can
be found here:
    http://www.gnu.org/licenses/lgpl.html
It basically states that you can do whatever you want with the program,
including incorporating it in your own proprietary program, as long as you
are fair and don't try to cheat the author. All I ask for is proper
credit =).
If you ever have a good idea concerning this program, contact me, maybe we
can do something neat!


6. Thanks
-----------------
Haven't seen this section in many READMEs, but I just had to have one =).
I'd like to thank Michael Kaiser, from the FreeDB project, to whom I first
sent news of this little perl hack that freedb2mysql is. If he hadn't
answered my e-mail, I wouldn't have started up a project on SF.
Thanks also to Andre Bianchi, who dislikes Windows* so much he rejected all
the alternative MP3 renaming projects I told him of (and there are a few
good ones out there for Windows*!) and made me think of continuing in the
pursuit of a normalized, easy and fast to search CD information database.

Have fun!
Iúri Chaer
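
Appendix sketch: resumable loading with a 'stop' file

Section 2 describes two ideas that work together: a file control table that
remembers which input files are still pending, and a file named 'stop' that
ends the load gracefully. The sketch below illustrates that pattern only. It
is written in Python with an in-memory SQLite database rather than the
project's Perl and MySQL, and the table name file_control, its columns, and
the functions prepare() and load() are hypothetical names chosen for the
example, not the ones used by FreedbPrepareDb.pl or FreedbInsert.pl.

```python
import os
import sqlite3
import tempfile


def prepare(conn, paths):
    """Record every input file once; running it again adds no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_control ("
        "path TEXT PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO file_control (path) VALUES (?)",
        [(p,) for p in paths],
    )
    conn.commit()


def load(conn, stop_file, handle):
    """Process pending files until none remain or stop_file exists.

    The stop file is checked between files, so a file is never left half
    done. Returns the number of files processed in this run.
    """
    done = 0
    while not os.path.exists(stop_file):
        row = conn.execute(
            "SELECT path FROM file_control WHERE processed = 0 "
            "ORDER BY path LIMIT 1"
        ).fetchone()
        if row is None:
            break  # nothing left to do
        handle(row[0])
        # Mark the file only after it was handled, then commit, so an
        # interrupted run resumes from the first unprocessed file.
        conn.execute(
            "UPDATE file_control SET processed = 1 WHERE path = ?", row
        )
        conn.commit()
        done += 1
    return done


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    prepare(conn, ["rock/0001", "rock/0002", "jazz/0003"])
    stop = os.path.join(tempfile.mkdtemp(), "stop")
    print(load(conn, stop, lambda path: print("loading", path)))
```

Creating the stop file while load() is running makes it return after the
current file; deleting the file and calling load() again picks up the
remaining entries.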
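
Appendix sketch: soundex keys for typo-ignoring searches

The disc_search and cd_info tables rely on soundexes of artist names and
album titles so that small spelling mistakes still match. As a rough
illustration of why that works, here is the classic four-character American
Soundex in Python. This is an assumption made for the example: this README
does not say how disc_sdx.pl computes its soundexes, and MySQL's own
SOUNDEX() function follows somewhat different rules and can return longer
strings. The function names soundex() and search_key() are hypothetical.

```python
# Map each consonant to its Soundex digit.
CODES = {}
for digit, letters in enumerate(
    ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(out) + "000")[:4]


def search_key(artist, title):
    """Join the soundex of every word in the artist name and album title."""
    codes = [soundex(w) for w in (artist + " " + title).split()]
    return " ".join(c for c in codes if c)


if __name__ == "__main__":
    print(search_key("Pink Floyd", "The Wall"))
    print(search_key("Pink Floid", "The Wal"))
```

Both calls in the demo produce the same key, which is the property a
typo-ignoring search depends on: the misspelled query and the stored album
reduce to identical soundex strings and can be compared with a plain
equality or index lookup.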