Re: [Linkbat-devel] Data format, XML to DB conversion

On Wednesday 20 November 2002 17:24, Shanta McBain wrote:

<extropia stuff snipped>
> I am a bit at a loss as to what is so special about CSV text files. I
> see them only as a poor and limited (hard to display code) form of data
> storage. When data is stored in SQL there is no need for them at all
> except as an alternate storage system for a host that does not have
> access to SQL. That being said, I still use SQL queries to search the
> files for the records and the contents of the records you want to display
> in the "View".

That's exactly it: "a host that does not have access to SQL", and that's 
really the only reason. A corollary is someone who does not have the skill 
or desire to set up an SQL database, but that is less important to me than 
someone who cannot set one up. However, it will be a while before it is a 
"product" that we can package and make available on SourceForge, etc. I just 
don't want us to paint ourselves into a corner. 

> > Rama has volunteered to work on the code to convert the XML files into
> > something the database can handle. Shanta is working on the "presentation
> > layer", so obviously you two will be working closely together. When the
> > code is done for the CSV->XML conversion, I would like Luu to start
> > working on the code to validate the XML files. That is, making sure the
> > right tags are there, tags match, etc.
>
> I will have the CSV files in MySQL in a few hours. I have code to do
> this by simply importing the contents into the SQL table. It's PHP, but it
> works. From that point on, switching the code from one DataSource to
> another takes a simple change to one variable in the site setup file.
> Export the contents of the SQL table to a | delimited file (though any
> delimiter can be used) and there you have switched.

Does that mean the eXtropia code would need little change to use a CSV data 
source instead of SQL?
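
To make sure I understand the round trip, here is a rough sketch of it in 
Python. It is only an illustration: it uses SQLite as a stand-in for MySQL, 
and the table name "ku", the function names and the sample columns are all 
made up. It is not the PHP import code or anything from eXtropia.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text, table="ku"):
    """Import CSV text (header row first) into a new, untyped table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
    return header


def dump_delimited(conn, table="ku", delimiter="|"):
    """Export the whole table as delimited text, header row first."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return out.getvalue()


# Hypothetical sample data, only to exercise the two functions.
sample = (
    "id,term,definition\n"
    "1,inode,File metadata record\n"
    "2,shell,Command interpreter\n"
)
conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
load_csv(conn, sample)
exported = dump_delimited(conn)
```

If the switch is really this mechanical, the only thing that changes between 
the two data sources is the delimiter and where the rows come from.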

> Validation, field
> contents, required fields and many other checks are already in eXtropia;
> one just has to tell it what you wish to check for. I don't know if this
> is what you are referring to with XML validation, as I know little of
> XML.

I'm referring to the actual XML files before they are imported into the 
database. If someone forgets a closing tag or makes some other mistake, we 
should know about it before we try to read the file into the database. 
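
As a minimal sketch of the kind of check I mean, something like the following 
would do. It assumes Python and its standard XML parser; the tag names in 
REQUIRED_TAGS and the function name are placeholders, since we have not fixed 
the KU file format yet.

```python
import xml.etree.ElementTree as ET

# Placeholder tag names; the real list depends on the KU type.
REQUIRED_TAGS = ("title", "body")


def check_ku_xml(xml_text, required=REQUIRED_TAGS):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Unclosed or mismatched tags end up here.
        return [f"not well-formed: {err}"]
    # Well-formed, so check that each required child tag is present.
    return [f"missing <{tag}>" for tag in required if root.find(tag) is None]


good = "<ku><title>inode</title><body>text</body></ku>"
unclosed = "<ku><title>inode</title><body>text</ku>"
incomplete = "<ku><title>inode</title></ku>"
```

The first step catches tags that do not match, the second catches tags that 
are missing. Anything that returns a non-empty list would be rejected before 
the import is attempted.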

> > Following are some thoughts and ideas about how the tables can be
> > created. I have no particular affection for any of these ideas, so feel
> > free to tell me they stink.
> >
> > The first thing we need to decide for the data model is how to store the
> > KUs themselves. Since different KU types have different attributes, I do
> > not see a way of storing them all in a single table. Instead, I would see
> > one table for each KU type and then a central table that contains the
> > unique KU ID and then the KU type. When making a cross reference from one
> > KU to another, we would then go through this central table.
> >
> > Next is the inter-relationship between the KUs, which is a core aspect of
> > linkbat. There is a potential that each KU (regardless of type) can
> > reference every other type of KU. However, I cannot really imagine an
> > index for each relationship. For example, the Concept KU has an index
> > listing all of the Concept KUs and with pointers to all of the referenced
> > Glossary KUs.  Then another index with the MoreInfo KUs. In my mind that
> > would mean too many tables. If there are five KU types and each has an
> > index for the relationship to the others, then we would have 20 tables
> > (5x4). With 6 KU tables, 30 tables (6x5). Is my math right?
> >
> > However, the question is whether 20 or 30 indexes is "too many".  Does
> > having separate indexes for each relationship provide any advantages?
> > Quicker access? If so, does it compensate for the extra work to manage
> > 20-30 tables??
>
> The SQL table takes care of that, which is another reason to store all
> data in SQL. You can assign indexes within each table.

I'm aware of that. However, these are not quite the same indexes that an 
RDBMS like MySQL would generate automatically. It's not an index of a single 
table used to speed up access, but rather a table containing the 
relationships between the KUs. These would be something we would have to 
define and fill ourselves. I'm just curious about the performance of having 
1 versus 30 tables, weighed against the administration of 30 tables. Is that 
really an issue? 
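
For the record, the table count works out as one table per ordered pair of 
KU types. A quick sketch in Python; Concept, Glossary and MoreInfo are real 
KU types, the other names are placeholders just to reach five and six:

```python
from itertools import permutations


def relationship_tables(ku_types):
    """One table name per ordered pair of distinct KU types."""
    return [f"{a}_to_{b}" for a, b in permutations(ku_types, 2)]


# "TypeD", "TypeE" and "TypeF" are placeholders, not real KU types.
five = ["Concept", "Glossary", "MoreInfo", "TypeD", "TypeE"]
six = five + ["TypeF"]

count_five = len(relationship_tables(five))  # 5 x 4
count_six = len(relationship_tables(six))    # 6 x 5
```

So the number grows as n x (n - 1), which is where the 20 and 30 come from.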

Based on the discussion following this part, I think only a single table will 
be needed, since MySQL should be able to handle the amount of data we are 
dealing with.
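
Roughly, I picture the single-table layout like the sketch below. This is 
only an assumption to get the discussion going: it uses SQLite through Python 
rather than MySQL, and every table name, column name and sample row is made 
up.

```python
import sqlite3

SCHEMA = """
-- Central table: one row per KU, whatever its type.
CREATE TABLE ku (
    ku_id   INTEGER PRIMARY KEY,
    ku_type TEXT NOT NULL
);
-- One table per KU type for the type-specific attributes.
CREATE TABLE concept (
    ku_id INTEGER PRIMARY KEY REFERENCES ku(ku_id),
    title TEXT
);
CREATE TABLE glossary (
    ku_id INTEGER PRIMARY KEY REFERENCES ku(ku_id),
    term  TEXT
);
-- The single relationship table: any KU can point at any other KU.
CREATE TABLE ku_link (
    from_ku INTEGER NOT NULL REFERENCES ku(ku_id),
    to_ku   INTEGER NOT NULL REFERENCES ku(ku_id),
    PRIMARY KEY (from_ku, to_ku)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO ku VALUES (?, ?)",
    [(1, "concept"), (2, "glossary"), (3, "glossary")],
)
conn.execute("INSERT INTO concept VALUES (1, 'filesystem')")
conn.executemany("INSERT INTO glossary VALUES (?, ?)", [(2, "inode"), (3, "shell")])
conn.executemany("INSERT INTO ku_link VALUES (?, ?)", [(1, 2), (1, 3)])


def linked(conn, ku_id, ku_type):
    """IDs of KUs of the given type that the given KU references."""
    rows = conn.execute(
        "SELECT ku.ku_id FROM ku_link "
        "JOIN ku ON ku.ku_id = ku_link.to_ku "
        "WHERE ku_link.from_ku = ? AND ku.ku_type = ? "
        "ORDER BY ku.ku_id",
        (ku_id, ku_type),
    )
    return [row[0] for row in rows]
```

Here linked(conn, 1, "glossary") gives the Glossary KUs referenced by 
Concept KU 1. The KU type comes from the central table, so one relationship 
table covers every pairing.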

Regards,

jimmo
-- 
---------------------------------------
"Be more concerned with your character than with your reputation. Your
character is what you really are while your reputation is merely what others
think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial:  http://www.linux-tutorial.info