RE: [Linkbat-devel] Linkbat Dataflow
From: Hal F. G. <hgo...@pr...> - 2002-11-25 03:03:38
I have been on the road; I'll read through all the emails this week. What are my tasks?

-----Original Message-----
From: lin...@li... [mailto:lin...@li...] On Behalf Of James Mohr
Sent: Sunday, November 24, 2002 12:02 PM
To: lin...@li...
Subject: [Linkbat-devel] Linkbat Dataflow

Hi All!

I have included a diagram of how I envision the dataflow within Linkbat. It is also available online (linkbat.sourceforge.net/dataflow.html).

At the bottom we have the data management layer, which, as its name implies, is where the management of the data occurs. That is, new data, new KUs, new KU types, changes, etc. are inserted into the system at this level. Above that is the data access layer, which provides the data to the presentation layer. Between the data management and data access layers are the Perl routines that read the XML files and convert the data into the appropriate files. Note that there are two lines from the XML files into the data access layer: one into CSV and one into SQL. My ultimate goal is to give the user a choice as to which format the data is stored in.

My expectation is that it will be unavoidable to store the data in intermediate files during the conversion, or at the very least the data will be in a form in memory that allows for easy conversion to CSV. That is, the KUs and reference indexes will probably be stored in arrays, and it would be a simple matter of looping through the arrays and writing the data out to text files. However, instead of writing the data to text files, it could simply be inserted into the database tables. Therefore, I could imagine a command-line option that determines where the data is written: CSV file or DB.

One important aspect of the code is adding data to the system. Once the data is in the database, adding new KUs becomes a key issue. Each KU will need a unique ID, which I think should be assigned during the conversion to CSV/DB and not stored within the XML file. This ID will be used for the references between KUs. I see a problem there when we add data to the system, since the new data will not have any knowledge of the existing KU IDs. One solution I see is a table that contains the relationship between KU text and KU ID, and which probably should also contain the KU type. Each row would look like this:

KUID:KU_TEXT_FROM_XML_FILE:KU_TYPE

When a new KU is added, we can reference existing KUs using this table. However, how do we add the references from existing KUs to the new ones? We cannot simply add them to the existing database, as they will not be in the original XML files. Therefore, I see the only way to add KUs as being to redo the **complete** data import each time we add a new KU. Comments? Ideas?

In the transition from data access to presentation I see an "issue" with the different data sources. I would like the code that the presentation layer uses to be **independent** of the data source. That is, the presentation layer makes a request of the data access layer to deliver a specific piece or set of data, and the data access layer takes care of the rest. For example, the presentation layer would call functions similar to this:

get_single_content_ku(KU_ID)
get_all_moreinfo_ku(TOPIC)
get_all_related_moreinfo_ku(KU_ID)

In each case the presentation layer wants a piece or set of data and asks the data access layer to deliver it. It is then the responsibility of the data access layer to determine the data source and the proper method of accessing that data source in order to deliver the requested data.
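To make this concrete, a rough Perl sketch of such a data access layer might look like the following. Everything in it is only illustration: the module name LinkbatData, the ku.csv file, the SQLite database and its "ku" table are placeholders I made up, not actual Linkbat code. The point is only that the get_* functions hide the data source from the caller.

# LinkbatData.pm -- sketch only; file names and table layout are placeholders
package LinkbatData;

use strict;
use warnings;
use DBI;

require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(set_data_source get_single_content_ku);

my $data_source = 'csv';    # 'csv' or 'db', chosen once by the caller

sub set_data_source {
    my ($source) = @_;
    die "unknown data source '$source'\n" unless $source =~ /^(csv|db)$/;
    $data_source = $source;
}

# The presentation layer only ever calls this; it never needs to know
# whether the answer came from a text file or a database table.
sub get_single_content_ku {
    my ($ku_id) = @_;
    return $data_source eq 'db' ? _ku_from_db($ku_id) : _ku_from_csv($ku_id);
}

sub _ku_from_csv {
    my ($ku_id) = @_;
    open my $fh, '<', 'ku.csv' or die "cannot open ku.csv: $!\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        # assumed row format: KUID:KU_TEXT:KU_TYPE
        my ( $id, $text, $type ) = split /:/, $line, 3;
        if ( $id eq $ku_id ) {
            close $fh;
            return { id => $id, text => $text, type => $type };
        }
    }
    close $fh;
    return undef;
}

sub _ku_from_db {
    my ($ku_id) = @_;
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=linkbat.db', '', '',
        { RaiseError => 1 } );
    my $row = $dbh->selectrow_hashref(
        'SELECT ku_id AS id, ku_text AS text, ku_type AS type
           FROM ku WHERE ku_id = ?',
        undef, $ku_id
    );
    $dbh->disconnect;
    return $row;
}

1;

get_all_moreinfo_ku() and get_all_related_moreinfo_ku() would follow the same pattern, each with a _from_csv and a _from_db variant behind it.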
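On the presentation side the calling code would then stay the same whatever the data source happens to be. Again, just a sketch; the KU ID 42 is an arbitrary example:

#!/usr/bin/perl
# presentation-layer sketch, e.g. the top of a CGI script
use strict;
use warnings;
use LinkbatData qw(set_data_source get_single_content_ku);

set_data_source('db');    # or 'csv'; nothing else in this script changes

my $ku = get_single_content_ku(42);
print "$ku->{type}: $ku->{text}\n" if $ku;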
By standardizing the interface, I see it being a lot easier to have different delivery methods. Each can load the necessary Perl modules (for example) and simply call the appropriate functions. How the delivery process interacts with the web server is dependent on the delivery method and the web server. Comments? Ideas?

Regards,

jimmo
--
---------------------------------------
"Be more concerned with your character than with your reputation. Your character is what you really are, while your reputation is merely what others think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info