A FreeDB MySQL importer Code

Status: Beta
Brought to you by: ichaer
File	Date	Author	Commit
perl	2006-05-08	ichaer	[r1]
php_demo	2006-05-08	ichaer	[r1]
sql	2006-05-08	ichaer	[r1]
LICENSE.txt	2006-05-08	ichaer	[r1]
README.txt	2006-05-08	ichaer	[r1]
Read Me

freedb2mysql - A FreeDB MySQL importer (or is it?)
------------------------------------------

Project page: http://www.sourceforge.net/projects/freedb2mysql/
------------------------------------------

Version: 0.8
Created: March 2006



1. Overview
-----------

  This is a bunch of scripts that build a reasonably normalized,
search-friendly version of FreeDB that could, theoretically, be held in any
DBMS. I wrote it for MySQL, and it works pretty well there.



2. How to run it
-----------------

  Before running any Perl or PHP script, check the database configuration in
them!


  The execution sequence for the conversion should be:

    - Run 'freedb.sql' on the database to create all the tables you'll need.
      It expects to be run in a schema called `freedb`, and all the following
      scripts expect that too. Hope you don't mind.

    - Run 'FreedbPrepareDb.pl'. Don't forget to set the correct freedb path
      in line 17 before doing so! As I ran it under Windows, it currently
      looks like this:
          $fileReader->insertFiles('C:/freedb');
      But a Linux user's path might look more like this:
          $fileReader->insertFiles('~/freedb');
      This script will load a file control table, which is used to find out the
      path of the input files which still need to be processed. The loading
      process might take a very long time, and it is very good to be able to
      interrupt it and continue from where it stopped without corrupting the
      whole thing. Under Windows* at least, this script takes a few hours to
      complete its job...

    - Run 'FreedbInsert.pl'. It will actually load the data into the database.
      It can be stopped gracefully by creating a file named 'stop' in its
      directory (I thought about catching signals, but in the environment I
      was working in, this approach was more practical; you can change it if
      it bothers you). This script took about 4 days to process the whole
      FreeDB under Windows* on a 1.8GHz Pentium 4 with 1GB of RAM.
      Errors while reading or inserting data will be written to STDERR and
      logged to the freedb_fileerror table. When I ran it, I got 2,016
      rejections for various reasons - most were bizarre situations or entries
      containing values that were too large for the database definitions. I
      simply ignored those entries: it's a small number and most of them didn't
      look good anyway. But I suppose one could fix them with some clever
      scripting and a bit of manual patching.

    - Run 'disc_sdx.pl'. This script will create a table relating albums to
      soundexes of their titles and artists. Kind of a data warehouse for
      typo-ignoring searches.

    - Run 'create_cd_info.sql' in the database. This little script creates
      the cd_info table, which uses a mix of soundexes and fulltext indexes
      for a (hopefully) typo-ignoring, fast search database.

    - Run 'searchCd_procedure.sql'. This creates the searchCd procedure, which
      should do fast searches in the cd_info table. I'm not sure if that
      procedure is thread-safe, nor if it is the best approach for the
      problem... its algorithm, though, is valid, and I believe it is also
      pretty optimized. You could port it to another language (say, PHP)
      if it isn't good the way it is.
      You have to run this script directly in the mysql command-line client.
      phpMyAdmin (which I use for this kind of thing) doesn't like it.

    - Test the final result =). I made a little test page for that,
      'search.html', that relies on 'searchAlbum.php' to get results from the
      MySQL FreeDB built in the previous steps; it might be useful as a demo
      of how this whole shebang works.
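
  The resumable loading and the 'stop' file trick from the steps above can be
sketched roughly like this. This is only an illustration, not the code in
'FreedbInsert.pl': the process_pending function, the in-memory %control hash
(standing in for the file control table) and the temporary 'stop' path are all
made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Process every path not yet marked as done, in sorted order.
# $control   - hash ref: path => 0 (pending) or 1 (processed); a hypothetical
#              in-memory stand-in for the file control table
# $stop_file - path of the file whose existence requests a graceful stop
# $handler   - code ref that reads one file and inserts its data
# Returns the number of files handled in this run.
sub process_pending {
    my ($control, $stop_file, $handler) = @_;
    my @pending = grep { !$control->{$_} } keys %$control;
    my $done = 0;
    for my $path (sort @pending) {
        last if -e $stop_file;    # checked between files, never mid-file
        $handler->($path);
        $control->{$path} = 1;    # a later run will skip this file
        $done++;
    }
    return $done;
}

# Demo with made-up file names.
my %control = map { $_ => 0 } qw(jazz/0003 rock/0001 rock/0002);
my $stop = tempdir(CLEANUP => 1) . '/stop';

# First run: the handler simulates someone creating 'stop' during the run.
my $first = process_pending(\%control, $stop, sub {
    open my $fh, '>', $stop or die "cannot create $stop: $!";
    close $fh;
});

unlink $stop;    # remove the stop file before resuming
my $second = process_pending(\%control, $stop, sub { return });

print "first run: $first file(s), second run: $second file(s)\n";
```

  The point of checking for the stop file only between files is that each
file is either fully processed and marked, or not touched at all, so an
interrupted run can simply be started again.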

  Oh... and no, I'm not always finishing the statements I prepare in my Perl
scripts. I know it isn't a pretty thing to do, but I'm so lazy, and these
scripts aren't meant to be run constantly... here, let's make it sound
prettier:
TODO: finish all statements before exiting in Perl scripts.



3. Post-execution considerations
-----------------

  The freedb_file and freedb_fileerror tables are auxiliary and may be removed
after data has been loaded to the DBMS. The disc_search table, although
possibly interesting in itself, may also be removed after cd_info has been
generated.



4. File listing
-----------------

 + sql - Database table and procedure creation scripts.
   - freedb.sql             - Table creation script.
   - create_cd_info.sql     - Creates the cd_info table, that uses a mix of
                              soundexes and fulltext indexes for a (hopefully)
                              typo-ignoring, fast search database.
   - searchCd_procedure.sql - A small procedure built to quickly retrieve
                              results from the cd_info table.

 + perl - Freedb data loading perl scripts.
   - FreedbFileReader.pm - Class encapsulating freedb file control.
   - FreedbEntry.pm      - Class encapsulating various freedb entry functions.
   - FreedbPrepareDb.pl  - Loads the freedb_file table.
   - FreedbInsert.pl     - This little script does all the file reading and
                           data inserting.
   - disc_sdx.pl         - Loads data into the disc_search table. This table
                           associates albums with a soundex string containing
                           the artist name and album title. This script reads
                           its input from the database and writes its result
                           back to it.


 + php_demo - A little search page that uses the database search procedure to
              find stuff.
   - search.html     - Basic search form.
   - searchAlbum.php - Search results retriever.
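
  To give an idea of the soundex strings that disc_sdx.pl is described as
storing, here is a small sketch of a plain American Soundex and of a search
key built from an artist name and an album title. It is an assumption-laden
illustration, not the project's code: the soundex and search_key functions
are written for this example, the actual script may build its strings
differently, and MySQL's own SOUNDEX() does not return exactly the same codes.

```perl
use strict;
use warnings;

# Soundex digit for each consonant; vowels, Y, H and W have no digit.
my %CODE;
@CODE{qw(B F P V)}         = (1) x 4;
@CODE{qw(C G J K Q S X Z)} = (2) x 8;
@CODE{qw(D T)}             = (3) x 2;
$CODE{L}                   = 4;
@CODE{qw(M N)}             = (5) x 2;
$CODE{R}                   = 6;

# Classic four-character American Soundex of a single word.
# Anything other than ASCII letters is dropped before coding.
sub soundex {
    my ($word) = @_;
    (my $letters = uc $word) =~ s/[^A-Z]//g;
    return '' if $letters eq '';

    my @chars  = split //, $letters;
    my $first  = shift @chars;
    my $result = $first;
    my $prev   = exists $CODE{$first} ? $CODE{$first} : 0;

    for my $ch (@chars) {
        next if $ch eq 'H' || $ch eq 'W';    # H and W do not separate codes
        my $code = exists $CODE{$ch} ? $CODE{$ch} : 0;
        $result .= $code if $code && $code != $prev;
        $prev = $code;                       # a vowel resets the previous code
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);    # pad short codes with zeros
}

# Hypothetical search key: the soundex of every word in artist and title.
sub search_key {
    my ($artist, $title) = @_;
    my @codes = grep { length } map { soundex($_) } split ' ', "$artist $title";
    return join ' ', @codes;
}

print search_key('The Beatles', 'Abbey Road'), "\n";
print search_key('The Beetles', 'Abby Road'),  "\n";
```

  Both calls print the same key, which is what makes a lookup on such a
column forgiving of typos in the search terms.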



5. Licensing
-----------------

  This program (err, batch of scripts) is licensed under the GNU Lesser
General Public License. If you ever want to read a full copy of it, it can be
found here:
    http://www.gnu.org/licenses/lgpl.html
  It basically states that you can do whatever you want with the program,
including incorporating it in your own proprietary program, as long as you
are fair and don't try to cheat the author. All I ask for is proper credit =).
If you ever have a good idea concerning this program, contact me, maybe we can
do something neat!



6. Thanks
-----------------

  Haven't seen this section in many READMEs, but I just had to have one =). I'd
like to thank Michael Kaiser, from the FreeDB project, to whom I first sent
news of this little perl hack that freedb2mysql is. If he hadn't answered my
e-mail, I wouldn't have started up a project in SF.
  And to Andre Bianchi, who dislikes Windows* so much he rejected all the
alternative MP3 renaming projects I told him of (and there are a few good ones
out there for Windows*!) and made me think of continuing in the pursuit of a
normalized, easy and fast to search CD information database.





Have fun!
Iúri Chaer