Menu

#657 Use MySQL blob-based tables for images

open
nobody
5
2006-08-18
2006-08-18
No

I would really like to have the ability to store and
manage images in a MySQL database/table using BlOBs.

It doesn't seem like it should be too difficult to
implement (I know, easy for me to say it's a SMOP).
There was an article on DevShed entitled "Blobbing Data
with PHP and MySQL". I am not a programmer and don't
even play one on TV, but I did manage to modify and
implement the code there to create a simple database
for uploading, storing and showing image (and other
large) files.

http://www.devarticles.com/c/a/MySQL/Blobbing-Data-With-PHP-and-MySQL/

Discussion

  • KosherJava

    KosherJava - 2006-08-18

    Logged In: YES
    user_id=634811

    This was discussed on the developer list in Jan 2004. Here
    is a post from John:

    By: John Finlay - yalnifjProject AdminAccepting Donations
    RE: mySQL Blobs for images?
    2004-01-08 19:36
    There are some disadvantages to storing the images as blobs
    in MySQL and it can be difficult to make work.

    The big disadvantage is that the queries are slow. When I
    originally did the Employee Scheduler application I stored
    the employee's pictures in a blob field in the same user
    table. This slowed down the database queries on the order of
    100 times slower.

    As long as you use a seperate table just for storing blobs I
    don't think there will be any major database problems.

    The other problems you will run into is setting the MIME
    type headers correctly and using PHP to print out the binary
    data. If you get those right it should work great.

    --John

     
  • Bill Conlon

    Bill Conlon - 2006-09-18

    Logged In: YES
    user_id=67152

    I've done a lot of applications that store images in mysql,
    and I find many advantages to this approach.

    Any suggestions on how to get involved?

     
  • Michael Greis

    Michael Greis - 2006-09-18

    Logged In: YES
    user_id=1578309

    The PhpGedView team could best respond on how you could get
    involved.

    Thinking out loud, it seems that this can be divided into
    two pieces - the data schema for the images and the changes
    to the PHP coding that manages the images.

    One possible schema would have several tables:
    - One table with the full size images
    - One table with the thumbnails and a pointer to the table
    entry for the full size images, along with information about
    the image (including text for searching for the image).
    - One table with pairwise entries, with each entry
    consisting of a family/individual identifier and a media
    identifier, so you can easily query and select all the
    images associated with an individual or family

    Each individual or family can have a primary (thumbnail)
    image that is displayed on its page and on charts with it.
    You'd have to give some thought to how best to represent
    that, since you'd want fast lookup and retrieval of the
    thumbnails for display (hence putting the full-size images
    in a separate table).

    If there were a one-to-one relationship between
    indviduals/families and PRIMARY images, you could include a
    field in the "thumbnail" table designating that image as the
    PRIMARY image for a specific individual/family. However,
    there is nothing that currently prohibits you from having
    the same image serve as the primary image for multiple
    individuals. You could either enforce that one-to-one
    (probably less desirable) or put those relationships elsewhere.

    Another question would be how to make the names of all the
    people linked to the image searchable as well (i.e., when
    you search the photos, you'd like to be able to use the name
    of a person who has been linked to the image, presumably as
    he or she is in it).

    The second part would involve changes to the PHP so the
    images are retrieved from MySQL rather than a directory on
    the disk. The "manage media" portion would be relatively
    self-contained, but the changes for the use of the images
    would be more extensive throughout the code and would, I
    imagine, require the cooperation of a number of the coders
    working on PHP.

    Perhaps one of the owners/coders could offer some thoughts.

     
  • Bill Conlon

    Bill Conlon - 2006-09-18

    Logged In: YES
    user_id=67152

    Having done this many times, generally all files get stored
    in a single table. Doesn't matter what size they are.

    A related tables manages image meta data (size, description,
    copyright, etc.).

    Related tables allow many to many relationships of
    images/GED entities -- eg., many people can be in one image;
    one person can be in many images.

    Choosing primary image to be displayed is presumably already
    done. Presumably, we only need to change the table/key
    relationship in existing code that manages iamges.

     
  • John Finlay

    John Finlay - 2006-09-18

    Logged In: YES
    user_id=300048

    I see no advantage to storing images in the database as
    opposed to storing them on the file system.

    Why would we want to implement this?

    --John

     
  • Bill Conlon

    Bill Conlon - 2006-09-18

    Logged In: YES
    user_id=67152

    Here are a few reasons.

    1. unified access control
    Images stored in the file system rely on completely
    different authentication and access control mechanism, such
    as .htacess. all of the admin controls that apply to
    existing data elements would apply to image data. (As
    opposed to merely controlling a file system pointer).
    Setting up .htaccess control for nested file systems and
    multiple users, with different privileges is not for the
    faint of heart.

    2. scalability
    Though less of an issue with modern file systems,
    directories of many thousands of files would get slow.
    Databases don't have this issue. Also I would rather use
    mySQL's replication services than mirror the file systems.

    3. The many-to-many relationships of images to persons
    seems to me to be a logical extension of the GedView
    capabilities. Of course, there is the issue of how to
    re-export GEDCOMs.

    4. File ssytem security. Don't know about you, but I'm
    awful reluctant to give the php user access to anything
    other than a specified database. I don't want it traversing
    my file system, I certainly don't want it writing to my file
    system, and I especially don't want want it executing
    anything. In any event, it's hard to execute a trojan
    stored in the database, as opposed to getting it into the
    file system.

     
  • Michael Greis

    Michael Greis - 2006-09-19

    Logged In: YES
    user_id=1578309

    In addition to those system/access reasons, it just seems a
    lot cleaner to have all that data in a managed environment,
    where it is more easily backed up and more portable.

    The limitations of the GEDCOM format and standard have been
    worked around by creating references to external files for
    images, which is a real kluge. I haven't followed this long
    enough to have a sense whether an XML standard will really
    be implemented, but there is an inherent logic to defining
    and using containers for objects that are not bound by the
    constraint of a file system. (If you think about it, the
    idea of long text passages implemented as continuation lines
    in a GEDCOM file could potentially be better implemented in
    an external environment as well.)

     
  • Michael Greis

    Michael Greis - 2006-09-19

    Logged In: YES
    user_id=1578309

    Question for wmconlon -

    I suggested separating the images from the thumbnails
    because the images themselves are only displayed when
    specifically requested, whereas the thumbnails are used all
    over the PhpGedView screens. I'm not a SQL performance
    person, but it didn't seem to make sense to be putting
    potentially very large items in a table that has to have
    fast lookup and response. (I'm thinking of the system
    itself and how it handles and retrieves the data from the
    tables; I know the keys will ensure fast searching.) And
    since there's a one-to-one correspondence with the
    thumbnails (you create a thumbnail for each image), it
    seemed reasonable to separate the full-size images from the
    thumbnails.

    Would it in fact make little or no difference from a
    performance standpoint?

     
  • John Finlay

    John Finlay - 2006-09-19

    Logged In: YES
    user_id=300048

    1. unified access control
    I can see this as a valid reason, however it can be acheived
    without a database. Just put the image files in a context
    outside of the web directory and use the same logic for
    displaying them from a database for displaying them from
    another location.

    2. scalability
    Having implemented blob storage for images before, I would
    rather not have a bloated and slow database. Eventually all
    storage goes back to the file system and the hard drive even
    if it is in the database. All it is doing is shifting where
    the resources are.

    PGV allows you to have a hierarchy of folders in your media
    directory. So not all files are in a single directory.

    If you have the ability to replicate a database, then I
    would also think that you could setup rsync to replicate the
    filesystem.

    3. The many-to-many relationships of images to persons...
    This already exists and is supported by GEDCOM. You don't
    need a database for it.

    4. File sytem security.
    Beyond what you listed in #1, I don't see it as being any
    more of a security risk. If you allow users to put
    something "anywhere" on the server, and then allow them to
    get it back, then it is a security risk.

    Here are some disadvantages to storing images in a database:
    A. FTP access
    You cannot upload images outside of the program. Most of
    the PGV admins use FTP to upload/download files. Without
    FTP access to their images we would have to also provide
    clients for batch processing etc.

    B. PHP upload limits
    Related to "A" above, most PHP installations limit you to a
    2MB upload size. If you have objects that are larger than
    2MB you will have no way to get them into the database.
    There are also things besides images that need to be linked
    to people.

    C. Memory limits
    PGV is already pushing the memory limits of PHP. If we then
    have to grab blobs out of the database just to show the
    pictures, we will be increasing the required memory limits.

    --John

     
  • Bill Conlon

    Bill Conlon - 2006-09-19

    Logged In: YES
    user_id=67152

    1. unified access control.

    disagree. If you want to control access to the image, the
    HTTP GET request must be processed by the applicaion. images
    in the file system bypass the applicaiton (even if links to
    those images are generated by the app) and are accessible
    directly via a URI, subject only to apache access control,
    which is not as granular nor as easy to manage as
    application-based ACLs.

    2. scalability.
    Not al file systems are created equal. Your point is valid
    about disk I/O -- certainly the image needs to come off the
    disk no matter how it is stored. My experience is clearly
    different, as I find that retrieving images from mySQL ISAM
    tables is very fast.

    Yes, we could run rsync on every update. Tell me how a web
    farm would be scaled. User A uploads image A.jpg to Server
    A and User B uploads image A.jpg to Server B. They might or
    might not be the same image, but someone will have their
    image overwritten during mirroring.

    This brings up another issue -- naming. Images in the file
    system depend on their path/name to be "unique" whereas
    images in the db have a unique key. We can use an MD5 hash
    to identify suspected duplicates.

    3. Actually, I don't know anything about the GEDCOM
    standard. I'm glad to hear it wouldn't be a problem to
    deploy the many-to-many relationships back to a downloaded
    GEDCOM.

    4. security.
    Assuming I can get a shell, I can ./my_trojan.jpg, but I
    can't execute a BLOB

    A. FTP.
    ok. I don't run FTP on my systems for my own security
    reasons. It seems to me that if the master image is
    maintained at a GEDview server, the downloaded GEDCOM would
    just reference the URI at the server:
    http://myserver.com/image.php?_uid1=FE123
    so there wouldn't be a need for bulk downloads. Regarding
    uploads, yes, it would be convenient to bulk process images
    into the db.

    B. Upload limits.
    Yes. And mysql max_allowed_packet comes into play. This
    might be a reason to write a utility for uploading large
    images. Speaking of which, I agree that any mime type could
    be stored (that's what I do in my apps).

    C. Memory
    Hmm. That is a problem. I guess I need to install
    phpGEDView to see this in action. But we could use mod_perl
    to show the images:
    http://myserver.com/cgi-perl/image.pl?_uid1=FE123
    to relieve the php interpreter.

    --bill

     
  • John Finlay

    John Finlay - 2006-09-19

    Logged In: YES
    user_id=300048

    Bill,

    1. unified access control
    I think you misunderstood me. If the image is on the file
    system outside the web root, you cannot access it from the
    webserver using a URL. Just as you would stream it from the
    database and alter the HTTP headers to return it from a
    database, you would do the same thing to stream it from
    another location on the filesystem. You would have to do
    the same thing to alter the HTTP headers to set the correct
    MIME type etc that you do from a database. It is exactly
    the same concept, but instead of getting the blob from the
    database you get it from a non accessible location on disk
    outside the web root. Then the program has to provide all
    access to it and you don't have to deal with .htaccess.

    2. Scalabilty
    I used to work for a large library on their digital
    collections department. We evaluated a lot of software to
    help run our online digital collections. Those that stored
    digital media on the file system stored them using a hash or
    renamed the images.

    But for 99% of PGV admins this is unnecessary. Most of them
    will not be scaling to multiple servers and most of them
    will only be confused if their images do not have the same
    filename on the server, in the GEDCOM, and on their home
    computers.

    4. Security
    If you can get a shell, you have a lot more to worry about
    than someone executing a file they uploaded. It is slightly
    more difficult to execute something from the db, but still
    just as possible. Good security programming principles are
    the best way to ensure security.

    Also, only authenticated users may upload files. So you
    would know who uploaded any file to your server.

    From my perspective, I still don't see a reason to put
    development resources towards this when there are so many
    other things that need to be worked on. We still have a
    great deal of user-friendliness issues to work out when it
    comes to media.

    But if someone from the open source community wants to work
    on it as an optional feature, then we could include it in
    the project.

    --John

     
  • Michael Greis

    Michael Greis - 2006-09-19

    Logged In: YES
    user_id=1578309

    This is a great discussion and I really appreciate the back
    and forth. I'm learning a lot and it's great to dig in to
    some of these questions and issues that have applicability
    beyond this specific "feature" request.

    Follow-up on a few of the points:

    - "There are also things besides images that need to be
    linked to people."

    Absolutely. Since you can store MIME-type information, you
    can potentially make available many forms of information
    under "media", and they would be stored along with
    information about how to "access" them. Might that not make
    it easier, down the road, to allow a user to launch another
    program to use or update information (images, video, voice,
    text)? The items are also nicely encapsulated in the
    database where they can't be accessed outside of the program.

    2. "scalability
    Having implemented blob storage for images before, I would
    rather not have a bloated and slow database. Eventually all
    storage goes back to the file system and the hard drive even
    if it is in the database. "

    That makes sense and was why I was thinking that the
    full-scale "images" should be in their own table - they
    won't actually be accessed that often.

    3. FTP access

    I'm not sure I understand why this is a problem. Couldn't
    you just continue to use whatever you use now to get the
    files onto the system and then run the procedure that puts
    them into the database?

    That eliminates the entire problem of figuring out the
    hierarchy and naming convention for the files. Having
    loaded a small number of images into PhpGedView, I would
    like nothing better than to load images without having to
    worry about creating unique file names for them.

     
  • Bill Conlon

    Bill Conlon - 2006-09-19

    Logged In: YES
    user_id=67152

    1. unified access control
    sure. I was thinking you meant Apache Alias directives.
    But this now involves a php file read into $blob as opposed
    to sql_select into $blob. Memory use is roughly the same.
    I need to think more about this, as it seems like taking
    image control away from the app opens up potential database
    inconsistencies.

    2. scalability
    yes. but the filename is part of the http header, so as long
    as it is metadata in a table, it doesn't change for the client.

    4. Security
    Ahh, but the web server (say user apache) now has write
    access, and I presume execute so it can create a new
    directory. So I have to be sure there are no potential
    flaws in the php code I'm running. But now I see why you
    keep the uploads out of the web root.

    Resources.
    My first post was "how do I get involved?" Can you tell me
    whether there is an architecture in place for 'mods' since
    the ideal approach would be to have a media_store mod that
    just supplanted to file system store.

     
  • John Finlay

    John Finlay - 2006-09-19

    Logged In: YES
    user_id=300048

    > Resources.
    > My first post was "how do I get involved?" Can you tell me
    > whether there is an architecture in place for 'mods' since
    > the ideal approach would be to have a media_store mod that
    > just supplanted to file system store.
    >

    PGV does not have an abstraction for where uploads are
    stored. The changes to implement one would be minimal since
    there are only a few plages where files can be uploaded from.

    PGV does have an architecture for plugins or modules similar
    to the *Nuke CMS systems. Others have successfully ported
    Gallery and other image manipulation programs over to PGV
    modules. It would probably be a good idea to build on an
    existing open source app like Gallery to do this. You can
    get the latest PGV and the Gallery mod from the SVN.

    I haven't looked at Gallery for some time, so I don't know
    if they store the images in the DB. When I last used
    gallery they did not use a DB but that was several years ago.

    --John

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.