[Linkbat-devel] Data format, XML to DB conversion
Brought to you by:
jimmo
From: James M. <lin...@ji...> - 2002-11-20 13:33:16
|
Hi All! I think we have reached the point where we need to be talking about the actual data/table structure in CSV text or a database. For now let's just work on the database model, to make things simpler. However, we all need to keep in mind that I eventually want to have the system work with CSV text files. Rama has volunteered to work on the code to convert the XML files into something the database can handle. Shanta is working on the "presentation layer", so obviously you two will be working closely together. When the code is done for the CSV->XML conversion, I would like Luu to start working on the code to validate the XML files. That is, making sure the right tags are there, tags match, etc. Following are some thoughts and ideas about how the tables can be created. I have no particular affection for any of these ideas, so feel free to tell me they stink. The first thing we need to decided for the data model is how to store the KUs themselves. Since different KU types have different attributes, I do not see a way of storing them all in a single table. Instead, I would see one table for each KU type and then a central table that contains the unique KU ID and then the KU type. When making a cross reference from one KU to another, we would then go through this central table. Next is the inter-relationship between the KUs, which is a core aspect of linkbat. There is a potential that each KU (regardless of type) can reference every other type of KU. However, I cannot really imagine an index for each relationship. For example, the Concept KU has an index listing all of the Concept KUs and with pointers to all of the referenced Glossary KUs. Then another index with the MoreInfo KUs. In my mind that would mean too many tables. If there are five KU types and each has a index for the relationship to the others, then we would have 20 tables (5x4). With 6 KU tables, 30 tables (6x5). Is my math right? However, the question is whether 20 or 30 indexes is "too many". Does having seperate indexes for each relationship provide any advantages? Quicker access? If so, does it compensate for the extra work to manage 20-30 tables?? Alternatively, I could imagine one index that keeps tracks of the type of KUs, as well: UNIQUE_KU_ID:REFERENCED_TYPE:REFERENCED_KU_ID UNIQUE_KU_ID - The ID of the current KU REFERENCED_TYPE - The type of KU that this KU is referencing REFERENCED_KU_ID - The ID of that KU that is being refereced. A query could be done sorted by UNIQUE_KU_ID and then REFERENCED_TYPE. Once we have that, the code to present a list of referenced KUs grouped by KU type would be easy. Currently we are only dealing with a few thousands KUs. However, what about when we get to tens of thousands?? Hopefully MySQL should be in a position to deal with that **few** records. So, I would not see it as a problem to have all of the KUs referenced in a single table like that. Is the REFERENCED_TYPE even necessary here? We obviously need a table that contains the attributes of each KU and a unique ID, so the REFERENCED_KU_ID can be used to get the type through the central table that I mentioned above and we have the KU type. On the other hand, to get a list of all referenced KUs in the correct order we could have a query like this: select REFERENCED_KU_ID from WHATEVER_TABLE where UNIQUE_KU_ID = 'CURRENT_KU_ID" sort by REFERENCED_TYPE This is a single table access and is obviously faster than mutliple access. Therefore, I think this would more than make up for having an extra field in the index. Since we can expect that new data files will have mixed KU types, I am thinking that a single index which also contains the KU type would be more efficient. Each time we read a new KU, it is inserted into that single index, so we don't have to deal with mulitple indexes. Comments? Although not KUs in the true sense, I think we need an index by topic and skill level (if we decide to implement that). This would be used in generateing the technical FAQs or lists of all info on a particular topic. What about the reverse references. For example a list of all Concept KUs that reference a particular Glossary KU. This is not the same as the list of Glossary KUs that this Concept KU references. We click on a link that displays a Glossary KU, with a link to all the KUs that it references. I would also like links to all of the KUs that reference this glossary term. Although it seems logical that each Concept KU that references a particular Glossary KU should also be referenced by that Glossary KU, it is likely that we will forget the reverse references when create the XML files. Think about the glossary terms on the content pages. Each time we add a new page we have to update the referenced glossary KUs to point back to this page. I'm sure there is a lot more, but I should get this off to start the discussion going. Regards, jimmo -- --------------------------------------- "Be more concerned with your character than with your reputation. Your character is what you really are while your reputation is merely what others think you are." -- John Wooden --------------------------------------- Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info |