[Linkbat-devel] Re: questions (CSV to XML conversion)


Hi Luan!

(For the rest of you, please take a look at this as well; I would like some
feedback.)

Wow! That was quick. 

First, note that I replied to the linkbat-devel mailing list. I think all of
these discussions should be on the list. The rest of my comments are below.

On Monday 18 November 2002 21:20, Luan Luu wrote:
> 1.  Do you want the code automatic create the output file, or we could
> manually in the command line to concatenate to new xml file?

I think you should just write to standard output. This gives us greater
flexibility, since we can redirect the output to any file name we choose.
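
A minimal sketch of the standard-output approach, in case it helps. The
script name dyk2xml.pl and the helper unit_xml() are made up for
illustration; the input is assumed to be the id:page:text format of dyk.data.

```perl
#!/usr/bin/perl
# Sketch only: convert "id:page:text" records to XML on standard output.
# unit_xml() is a hypothetical helper name, not part of any existing code.
use strict;
use warnings;

# Build the XML for one record. The split is limited to 3 fields so that
# colons inside the text are preserved.
sub unit_xml {
    my ($line) = @_;
    chomp $line;
    my ($id, $page, $text) = split /:/, $line, 3;
    return " <KnowledgeUnit>\n"
         . "   <Attributes>\n"
         . "      <Type>Concept</Type>\n"
         . "      <Text>$text</Text>\n"
         . "   </Attributes>\n"
         . "   <Pages>\n"
         . "      <PageRef primary=\"true\">$page</PageRef>\n"
         . "   </Pages>\n"
         . " </KnowledgeUnit>\n";
}

# Only run the conversion when a data file is named on the command line.
if (@ARGV) {
    print "<KnowledgeUnits>\n";
    while (my $line = <>) {
        print unit_xml($line);
    }
    print "</KnowledgeUnits>\n";
}
```

The output file is then chosen on the command line, for example:
perl dyk2xml.pl dyk.data > dyk.xml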

> 2.  Only one top level is allowed, so, i put another <KUs> tag wrap around
> the <knowledgeUnit> tags. is that ok?

In the specification, I had defined the top-level element as <KnowledgeUnits>
(with the 's' at the end). In general, I set it up so that the containers are
all plurals, like <KnowledgeUnits> and <MoreInfos>.

> check the first one out, see if it is correct. let me know. thanks.

So far that looks great! However, a couple of things. The <PageRef> should
contain the file name out of the page.data file, not the page number. These
numbers will more or less disappear once the conversion is made. Also, please
put a blank line between the KnowledgeUnits.
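
To illustrate the <PageRef> change, here is a sketch that swaps the page
number for a file name and separates units with a blank line. The layout of
page.data is not shown in this thread, so the "number:filename" format below
is purely an assumption, and the helper names are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: each page.data line looks like "number:filename".
# The real layout may differ; adjust the split accordingly.
sub load_pages {
    my (@lines) = @_;
    my %file_for;
    for my $line (@lines) {
        chomp $line;
        my ($num, $file) = split /:/, $line, 2;
        $file_for{$num} = $file if defined $file;
    }
    return \%file_for;
}

# Return the <PageRef> element, using the file name when it is known and
# falling back to the original number otherwise.
sub page_ref {
    my ($file_for, $page_id) = @_;
    my $ref = exists $file_for->{$page_id} ? $file_for->{$page_id} : $page_id;
    return qq{      <PageRef primary="true">$ref</PageRef>\n};
}

# Join finished <KnowledgeUnit> blocks with one blank line between them.
# Each block is expected to end in a newline already.
sub join_units {
    my (@units) = @_;
    return join "\n", @units;
}
```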

What I was thinking about was going through the original data files and
adding topics. It would be fairly straightforward to define a handful of
topics (administration, network, security, users, hardware, etc.) and add
them to the CSV files. That would mean an extra field to parse, but the code
is pretty much written. In fact, I could add multiple topics, and if the
field is empty, the script just doesn't print a topic tag. How does that sound?
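
A rough sketch of the optional topic field, under assumptions not settled in
this thread: the topics sit in a fourth colon-separated field, multiple
topics are comma-separated, and the tags are called <Topics> and <Topic>
(following the plural-container convention). An empty or missing field
prints nothing. The helper name topics_xml() is hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: records look like "id:page:text:topic1,topic2" and the text
# itself contains no colon. The tag names <Topics>/<Topic> are guesses.
sub topics_xml {
    my ($line) = @_;
    chomp $line;
    my (undef, undef, undef, $field) = split /:/, $line, 4;
    return '' unless defined $field;

    # Split on commas, trim surrounding whitespace, drop empty entries.
    my @topics;
    for my $topic (split /,/, $field) {
        $topic =~ s/^\s+|\s+$//g;
        push @topics, $topic if length $topic;
    }
    return '' unless @topics;

    return "   <Topics>\n"
         . join('', map { "      <Topic>$_</Topic>\n" } @topics)
         . "   </Topics>\n";
}
```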

I am thinking that we could save a fair bit of time that way. If at least one
(or maybe even 2 or 3) topics are already present and can be added
automatically, then we don't need to do it by hand. Granted, sub-topics will
probably need to be added later, but we will have saved some work. While the
data is still in CSV form, it is a lot easier to add information in bulk.

Comments, anyone? Are there other references that we could add in bulk now?

Regards,

Jim
=================
NOTE: I snipped most of the stuff. This is just an example to see the results 
and the original file. 

 <KnowledgeUnit>
   <Attributes>
      <Type>Concept</Type>
      <Text>Linux can be started from any partition.</Text>
   </Attributes>
   <Pages>
      <PageRef primary="true">106</PageRef>
   </Pages>
   <Questions>

   </Questions>
 </KnowledgeUnit>
 <KnowledgeUnit>
   <Attributes>
      <Type>Concept</Type>
      <Text>Linux can combine multiple drives into a single RAID system, even if the drives are of different types.</Text>
   </Attributes>
   <Pages>
      <PageRef primary="true">127</PageRef>
   </Pages>
   <Questions>

   </Questions>
 </KnowledgeUnit>
 <KnowledgeUnit>
   <Attributes>
      <Type>Concept</Type>
      <Text>Unwanted cron output can be redirected just like any other command.</Text>
   </Attributes>
   <Pages>
      <PageRef primary="true">21</PageRef>
   </Pages>
   <Questions>

   </Questions>
 </KnowledgeUnit>
 <KnowledgeUnit>
   <Attributes>
      <Type>Concept</Type>
      <Text>Cron is started through an rc-script like most system daemons.</Text>
   </Attributes>
   <Pages>
      <PageRef primary="true">67</PageRef>
   </Pages>
   <Questions>

   </Questions>
 </KnowledgeUnit>
 <KnowledgeUnit>
   <Attributes>
      <Type>Concept</Type>
      <Text>Cron can be disabled through the /etc/rc.config file.</Text>
   </Attributes>
   <Pages>
      <PageRef primary="true">67</PageRef>
   </Pages>
   <Questions>

1:106:Linux can be started from any partition.
2:127:Linux can combine multiple drives into a single RAID system, even if the drives are of different types.
3:21:Unwanted cron output can be redirected just like any other command.
4:67:Cron is started through an rc-script like most system daemons.
5:67:Cron can be disabled through the /etc/rc.config file.

#!/usr/bin/perl
# Convert dyk.data (colon-separated id:page:text records) into XML,
# written to standard output.
use strict;
use warnings;

my $file = "dyk.data";
open(my $in, '<', $file) or die "Cannot open $file: $!";

my $type = "Concept";    # hardcoded for now

# Required: only one top-level element is allowed.
print "<KUs>\n";
while (my $line = <$in>) {
        chomp $line;
        # Limit the split to 3 fields so colons inside the text survive.
        my ($id, $page_id, $text) = split(/:/, $line, 3);
        next unless defined $text;    # skip blank or malformed lines
        # Print the output for this record.
        print " <KnowledgeUnit>\n";
        print "   <Attributes>\n";
        print "      <Type>$type</Type>\n";
        print "      <Text>$text</Text>\n";
        print "   </Attributes>\n";
        print "   <Pages>\n";
        print "      <PageRef primary=\"true\">$page_id</PageRef>\n";
        print "   </Pages>\n";
        print "   <Questions>\n";
        print "      \n";    # blank for now, insert the questions later
        print "   </Questions>\n";
        print " </KnowledgeUnit>\n";
}
print "</KUs>\n";
close $in;