Thread: [Linkbat-devel] Linkbat Dataflow
From: James M. <lin...@ji...> - 2002-11-24 15:32:28
Attachments:
dataflow.png
Hi All! I have included a diagram of how I envision the dataflow within linkbat. It is also available online (linkbat.sourceforge.net/dataflow.html).

At the bottom we have the data management layer, which, as its name implies, is where the management of the data occurs. That is, new data, new KUs, new KU types, changes, etc. are inserted into the system at this level. Above that is the data access layer, which provides the data to the presentation layer. Between the data management and data access layers are the Perl routines that read the XML files and convert the data into the appropriate files. Note that there are two lines from the XML files into the data access layer, one into CSV and one into SQL. My ultimate goal is to give the user a choice as to which format the data is stored in.

My perception is that it will be unavoidable to store the data in intermediate files during the conversion, or at the very least the data will be in a form in memory that allows for easy conversion to CSV. That is, the KUs and reference indexes will probably be stored in arrays, and it would be a simple matter of looping through the arrays and writing the data out to text files. However, instead of writing the data to text files, it could simply be inserted into the database tables. Therefore, I could imagine a command line option that determines where the data is written: CSV file or DB.

One important aspect of the code is adding data to the system. Once it is in the database, adding new KUs becomes a key issue. Each KU will need to have a unique ID, which I think should be added during the conversion to CSV/DB and not stored within the XML file. This ID will be used for the references between KUs. I see a problem there when we add data to the system, since the new data will not have any knowledge of the KU IDs. I see one solution as a table that contains the relationship between KU text and KU ID, which probably should also contain the KU type. So each row looks like this:

KUID:KU_TEXT_FROM_XML_FILE:KU_TYPE

When a new KU is added we can reference existing KUs using this table. However, how do we add the references from existing KUs to the new ones? We cannot simply add them to the existing database, as they will not be in the original XML files. Therefore, I see the only way to add KUs is to redo the **complete** data import each time we add a new KU. Comments? Ideas?

In the transition from data access to presentation I see an "issue" with the different data sources. I would like the code that the presentation layer uses to be **independent** of the data source. That is, the presentation layer makes a request of the data access layer to deliver a specific piece or set of data, and the data access layer takes care of the rest. For example, the presentation layer would call functions similar to this:

get_single_content_ku(KU_ID)
get_all_moreinfo_ku(TOPIC)
get_all_related_moreinfo_ku(KU_ID)

In each case the presentation layer wants a piece or set of data and asks the data access layer to deliver it. It is then the responsibility of the data access layer to determine the data source and the proper method to access it in order to deliver the requested data. By standardizing the interface, I see it being a lot easier to have different delivery methods. Each can load the necessary Perl modules (for example) and simply call the appropriate functions. How the delivery process interacts with the web server depends on the delivery method and the web server. Comments? Ideas?
Regards, jimmo

--
---------------------------------------
"Be more concerned with your character than with your reputation. Your character is what you really are while your reputation is merely what others think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info
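[To make the standardized interface described in the message above concrete, here is a minimal Perl sketch of how the data access layer could dispatch between a CSV and an SQL back end. The package name, file and table names, column layout, and the LINKBAT_SOURCE switch are illustrative assumptions, not existing Linkbat code.]

    package LinkbatData;

    use strict;
    use warnings;
    use Text::CSV;
    use DBI;

    # Which back end to use -- in the real system this could come from the
    # command line option mentioned above.  Either 'csv' or 'db'.
    my $source = $ENV{LINKBAT_SOURCE} || 'csv';

    # Presentation-layer entry point: return one content KU by its ID,
    # independent of where the data actually lives.
    sub get_single_content_ku {
        my ($ku_id) = @_;
        return $source eq 'db' ? _ku_from_db($ku_id) : _ku_from_csv($ku_id);
    }

    # CSV back end: scan the file written by the conversion step.
    sub _ku_from_csv {
        my ($ku_id) = @_;
        my $csv = Text::CSV->new({ binary => 1 });
        open my $fh, '<', 'content_ku.csv' or die "content_ku.csv: $!";
        while ( my $row = $csv->getline($fh) ) {
            # Assumed column layout: KUID, KU_TEXT, KU_TYPE
            return { id => $row->[0], text => $row->[1], type => $row->[2] }
                if $row->[0] eq $ku_id;
        }
        return undef;
    }

    # SQL back end: the same request answered from a database table.
    sub _ku_from_db {
        my ($ku_id) = @_;
        my $dbh = DBI->connect( 'dbi:mysql:linkbat', 'linkbat', '',
                                { RaiseError => 1 } );
        return $dbh->selectrow_hashref(
            'SELECT kuid AS id, ku_text AS text, ku_type AS type
               FROM content_ku WHERE kuid = ?',
            undef, $ku_id );
    }

    1;

[The presentation layer would then only need to use LinkbatData and call LinkbatData::get_single_content_ku($ku_id); switching from CSV to SQL would not touch the presentation code at all.]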
From: Hal F. G. <hgo...@pr...> - 2002-11-25 03:03:38
Attachments:
Hal F Gottfried (hgottfried@protechpts.com).vcf
I have been on the road, I'll read through all the emails this week. What are my tasks?
From: James M. <lin...@ji...> - 2002-11-26 20:51:02
Hi All! Sorry, I'm doing it again. I forgot to keep a discussion with Hal Gottfried on the list. Hal and I have been discussing the XML implementation in the data files, and he has said he can write the DTD once we get the model finalized.

So, to sum up our discussions, we both seem to agree that accessing the data will be sped up by a conversion to either CSV or a database. Hal made a couple of comments that made me think we could actually create a lot of the pages in advance rather than at run time, thus speeding things up even more. Since the data is static once copied to the server, there is no reason to parse most of it on the fly. For example:

- Content pages. They are parsed at run time to create the links and popups for the glossary, links to other tutorial pages, etc.
- Glossary pages. Created completely at run time and include links to other glossary terms and to the pages that reference the glossary term.
- MoreInfo pages. Created completely at run time and include links to other sites.

I see no reason why these cannot be created in advance. We would just need to make sure that whatever display mechanism we use knows how to load the correct page. To me that's a heck of a lot easier than creating the pages on the fly.

This brings up a **huge** question, which is particularly of interest to Shanta. If we create all of the pages in advance, there does not seem to be a pressing need for an SQL database. The reason for it was efficiency during data access. So, the question is whether we gain anything by creating one. In my mind, it doesn't matter if the pre-processing takes longer, as it would with an SQL DB. How do the rest of you see this?

Hal suggested including the PageID as an attribute of the <Type> tag, and we were both thinking that we could make the sub-type an attribute of the <Type> tag as well:

<Type id="45">Page</Type>
<Type url="http://www.linux.org">MoreInfo</Type>
<Type subtype="verb">Glossary</Type>

This makes sense, as the sub-type is a characteristic of the type and not of the actual KU. For the questions, we have the issue of an empty <Reason> tag as a reminder to add something. It might be simpler to have the text of the Reason as an attribute of the tag, like this:

<reason answer="why this is correct" />

If it's empty we don't carry around the extra baggage. Did I forget anything important, Hal?

Regards, jimmo

--
---------------------------------------
"Be more concerned with your character than with your reputation. Your character is what you really are while your reputation is merely what others think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info
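[To make the pre-generation idea in the message above concrete, here is a minimal Perl sketch that writes static glossary pages ahead of time. The directory layout, field names, and the inline sample record are placeholders standing in for whatever the XML conversion actually delivers; none of this is existing Linkbat code.]

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Path qw(make_path);

    # Placeholder record standing in for a glossary KU delivered by the
    # XML conversion: an ID, the term, its definition, and the IDs of
    # content pages that reference the term.
    my @glossary = (
        {   id            => 7,
            term          => 'inode',
            definition    => 'The on-disk structure that describes a file.',
            referenced_by => [ 12, 45 ],
        },
    );

    make_path('html/glossary');

    for my $ku (@glossary) {
        my $file = "html/glossary/$ku->{id}.html";
        open my $out, '>', $file or die "$file: $!";

        print $out "<html><head><title>$ku->{term}</title></head><body>\n";
        print $out "<h1>$ku->{term}</h1>\n<p>$ku->{definition}</p>\n";

        # Links back to the content pages that reference this glossary term.
        print $out "<ul>\n";
        print $out qq{<li><a href="../content/$_.html">Page $_</a></li>\n}
            for @{ $ku->{referenced_by} };
        print $out "</ul>\n</body></html>\n";

        close $out;
    }

[The display mechanism would then only have to map a glossary KU ID to html/glossary/<id>.html rather than assembling the page on every request.]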
From: Hal F. G. <hgo...@pr...> - 2002-11-26 21:07:39
Nope James, sounds good to me, I think you covered everything. I'm also reviewing all of the other mails that were sent to see if there is anything that needs my attention. BTW: I am Hal, an XML instructor with an IT training company.