RE: [Linkbat-devel] Linkbat Dataflow
From: Hal F. G. <hgo...@pr...> - 2002-11-25 03:03:38
I have been on the road; I'll read through all the emails this week. What are my tasks?

-----Original Message-----
From: lin...@li... [mailto:lin...@li...] On Behalf Of James Mohr
Sent: Sunday, November 24, 2002 12:02 PM
To: lin...@li...
Subject: [Linkbat-devel] Linkbat Dataflow

Hi All!

I have included a diagram of how I envision the dataflow within Linkbat. It is also available online (linkbat.sourceforge.net/dataflow.html).

At the bottom we have the data management layer, which, as its name implies, is where the management of the data occurs. That is, new data, new KUs, new KU types, changes, etc. are inserted into the system at this level. Above that is the data access layer, which provides the data to the presentation layer. Between the data management and data access layers are the Perl routines that read the XML files and convert the data into the appropriate files. Note that there are two lines from the XML files into the data access layer: one into CSV and one into SQL. My ultimate goal is to give the user a choice as to which format the data is stored in.

My expectation is that it will be unavoidable to store the data in intermediate files during the conversion, or at the very least the data will be in a form in memory that allows for easy conversion to CSV. That is, the KUs and reference indexes will probably be stored in arrays, and it would be a simple matter of looping through the arrays and writing the data out to text files. However, instead of writing the data to text files, it could simply be inserted into the database tables. Therefore, I could imagine a command-line option that determines where the data is written: CSV file or DB.

One important aspect of the code is adding data to the system. Once the data is in the database, adding new KUs becomes a key issue. Each KU will need a unique ID, which I think should be assigned during the conversion to CSV/DB and not stored within the XML file. This ID will be used for the references between KUs. I see a problem there when we add data to the system, since the new data will not have any knowledge of the existing KU IDs. One solution I see is a table that contains the relationship between KU text and KU ID, and which probably should also contain the KU type. Each row would look like this:

KUID:KU_TEXT_FROM_XML_FILE:KU_TYPE

When a new KU is added, we can reference existing KUs using this table. However, how do we add the references from existing KUs to the new ones? We cannot simply add them to the existing database, as they will not be in the original XML files. Therefore, I see the only way to add KUs as being to redo the **complete** data import each time we add a new KU. Comments? Ideas?

In the transition from data access to presentation I see an "issue" with the different data sources. I would like the code that the presentation layer uses to be **independent** of the data source. That is, the presentation layer makes a request of the data access layer to deliver a specific piece or set of data, and the data access layer takes care of the rest. For example, the presentation layer would call functions similar to this:

get_single_content_ku(KU_ID)
get_all_moreinfo_ku(TOPIC)
get_all_related_moreinfo_ku(KU_ID)

In each case the presentation layer wants a piece or set of data and asks the data access layer to deliver it. It is then the responsibility of the data access layer to determine the data source and the proper method of accessing that data source in order to deliver the requested data.
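To make this concrete, a rough Perl sketch of such a data access layer might look like the following. Everything in it is only illustration: the module name LinkbatData, the ku.csv file, the SQLite database and its "ku" table are placeholders I made up, not actual Linkbat code. The point is only that the get_* functions hide the data source from the caller.

# LinkbatData.pm -- sketch only; file names and table layout are placeholders
package LinkbatData;

use strict;
use warnings;
use DBI;

require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(set_data_source get_single_content_ku);

my $data_source = 'csv';    # 'csv' or 'db', chosen once by the caller

sub set_data_source {
    my ($source) = @_;
    die "unknown data source '$source'\n" unless $source =~ /^(csv|db)$/;
    $data_source = $source;
}

# The presentation layer only ever calls this; it never needs to know
# whether the answer came from a text file or a database table.
sub get_single_content_ku {
    my ($ku_id) = @_;
    return $data_source eq 'db' ? _ku_from_db($ku_id) : _ku_from_csv($ku_id);
}

sub _ku_from_csv {
    my ($ku_id) = @_;
    open my $fh, '<', 'ku.csv' or die "cannot open ku.csv: $!\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        # assumed row format: KUID:KU_TEXT:KU_TYPE
        my ( $id, $text, $type ) = split /:/, $line, 3;
        if ( $id eq $ku_id ) {
            close $fh;
            return { id => $id, text => $text, type => $type };
        }
    }
    close $fh;
    return undef;
}

sub _ku_from_db {
    my ($ku_id) = @_;
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=linkbat.db', '', '',
        { RaiseError => 1 } );
    my $row = $dbh->selectrow_hashref(
        'SELECT ku_id AS id, ku_text AS text, ku_type AS type
           FROM ku WHERE ku_id = ?',
        undef, $ku_id
    );
    $dbh->disconnect;
    return $row;
}

1;

get_all_moreinfo_ku() and get_all_related_moreinfo_ku() would follow the same pattern, each with a _from_csv and a _from_db variant behind it.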
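On the presentation side the calling code would then stay the same whatever the data source happens to be. Again, just a sketch; the KU ID 42 is an arbitrary example:

#!/usr/bin/perl
# presentation-layer sketch, e.g. the top of a CGI script
use strict;
use warnings;
use LinkbatData qw(set_data_source get_single_content_ku);

set_data_source('db');    # or 'csv'; nothing else in this script changes

my $ku = get_single_content_ku(42);
print "$ku->{type}: $ku->{text}\n" if $ku;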
By standardizing the interface, I see it being a lot easier to have different delivery methods. Each can load the necessary Perl modules (for example) and simply call the appropriate functions. How the delivery process interacts with the web server is dependent on the delivery method and the web server. Comments? Ideas?

Regards,

jimmo
--
---------------------------------------
"Be more concerned with your character than with your reputation. Your character is what you really are, while your reputation is merely what others think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info