[Linkbat-devel] Re: question on moreinfo.data (Everyone please read)


(Note this was sent to the list.)

Hey everyone, the conversion is almost done (well, at least the code for it). 
Thanks Luu! However, there are some important questions to answer NOW, before 
we continue. PLEASE, please, please read this and give me your input.

On Tuesday 19 November 2002 00:31, Luan Luu wrote:
> According to moreinfo.data, the format is:
> ID#:TYPE:DESCRIPTION:LOCATION
>
> and the XML is
>
> <KnowledgeUnit>
> <Attributes>
>   <Type sub-type="[TYPE]" location="[LOCATION]">MoreInfo</Type>
>   <Text>[DESCRIPTION]</Text>
> </Attributes>
> </KnowledgeUnit>
>
> Regarding the brackets: do they stand for the type, location,
> and description, like that?

Perfect. The only question is whether we should actually do it that way or 
not. That is, should the sub-type and location be attributes within the 
<Type> tag, or should they be separate tags, i.e. <SubType> and <Location>?

My gut feeling is that they should be attributes within the <Type> tag. They 
are not necessarily attributes of the KU; rather, they provide additional 
information about the type.

Comments anyone? This needs to be answered before we continue. 
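For reference, here is a minimal sketch of the attribute layout. The real conversion is done in Perl, so this Python version is only an illustration; the function name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def more_info_ku(sub_type, description, location):
    """Build a MoreInfo KU; sub-type and location are attributes of <Type>."""
    ku = ET.Element("KnowledgeUnit")
    attributes = ET.SubElement(ku, "Attributes")
    type_elem = ET.SubElement(
        attributes, "Type", {"sub-type": sub_type, "location": location}
    )
    type_elem.text = "MoreInfo"
    ET.SubElement(attributes, "Text").text = description
    return ku


# Hypothetical sample values, not taken from moreinfo.data.
ku = more_info_ku("HOWTO", "Networking HOWTO", "http://www.example.com/howto")
print(ET.tostring(ku, encoding="unicode"))
```

In this sketch, switching to separate <SubType> and <Location> tags would only change the SubElement calls, so either choice is cheap to implement; the question is really about what reads best in the data.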

> Should the URL be an absolute path with http in front?

Yes. You will note that in the data file the URLs begin with just // and not 
http://. This is because I made the unwise decision to use the colon (:) as 
the field separator. However, the colon appears frequently in Linux text 
(especially in URLs), so it became a problem.
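A sketch of how a converter might restore the prefix, assuming each line really is ID#:TYPE:DESCRIPTION:LOCATION. Splitting at most three times keeps any colons inside LOCATION (a port number, say), but a colon inside DESCRIPTION would still shift the fields, which is exactly the separator problem. The function name and sample line are hypothetical.

```python
def parse_more_info(line):
    """Split one moreinfo.data line into (id, type, description, location)."""
    # maxsplit=3: colons inside LOCATION survive, colons in DESCRIPTION do not.
    ku_id, ku_type, description, location = line.rstrip("\n").split(":", 3)
    # Locations are stored as //host/path because ':' is the field separator.
    if location.startswith("//"):
        location = "http:" + location
    return ku_id, ku_type, description, location


# Hypothetical sample line.
print(parse_more_info("42:HOWTO:Networking HOWTO://www.example.com/howto\n"))
```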

I was going to change to something like a pipe (|), which occurs less 
frequently. Regardless, we will still have a problem, since the odds are that 
whatever character we use will appear in the text somewhere in the data.

Obviously, this is not a problem when we import directly into a database. 
As I said to Shanta, I don't have a problem with importing the files 
directly into a database for the first release. However, eventually I want 
the system to be independent of the data source (CSV, database) and 
independent of the presentation (eXtropia, other portal). Therefore, we need 
to consider a new separator.

Suggestions?
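One possible answer, offered only as a sketch: instead of hunting for a character that never appears, quote any field that contains the separator, the way CSV does. Python's standard csv module does this for any delimiter; the pipe and the sample row below are assumptions.

```python
import csv
import io


def encode(rows, delimiter="|"):
    """Write rows, quoting any field that contains the delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL).writerows(rows)
    return buf.getvalue()


def decode(text, delimiter="|"):
    """Read rows back; quoted fields may contain the delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical row whose description contains both ':' and '|'.
rows = [["17", "HOWTO", "Pipes: cmd1 | cmd2", "//www.example.com/pipes"]]
data = encode(rows)
print(data)
assert decode(data) == rows
```

With quoting, the choice of separator stops mattering much, and the same files stay readable by any CSV-aware tool if we later switch data sources.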

> Inside the tyk XML question tags there is a topicRef tag, which is the 
> reference to the PAGE_ID. So, do you want to put the PAGE_ID in the 
> topicRef tags, or the actual topic name from page.data?

The TopicRef is a reference to a topic, such as Administration, Networking, 
Security, etc. This is just text. There are also PageRef tags, and these contain 
the page *name* from page.data. However, I am no longer sure we should do it 
that way (see below). This needs to be changed in tykToXML.pl.

Keep in mind that the questions will exist only within a KU. This KU will have 
a primary page so we automatically have the primary page for the question. 
However, the question could reference multiple topics. 

We have a problem with some of the questions where an angle bracket is one of 
the answers ("What symbol is used to 'pipe' two commands together?"). This 
means we get two angle brackets together (<< or >>), which could confuse the 
XML parser: a literal < inside element text is not well-formed XML, so it has 
to be written as &lt; (and > is usually written as &gt;). There are only a few 
of these, and we can change them by hand. We just need to be aware of them.
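If the converter escapes answer text instead of copying it verbatim, these cases stop being special. A minimal Python sketch using the standard xml.sax.saxutils.escape, which turns &, < and > into entity references; the function name and the sample answer are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def answer_text_xml(answer):
    """Wrap a raw answer in <Text>, escaping &, < and >."""
    return "<Text>" + escape(answer) + "</Text>"


snippet = answer_text_xml(">")
print(snippet)                      # <Text>&gt;</Text>
print(ET.fromstring(snippet).text)  # >
```

ElementTree applies the same escaping automatically when you set an element's text and serialize it, so building the XML with a library rather than by string concatenation avoids the problem altogether.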

Also watch the format of the answers, even for the T/F questions:

        <Correct>
                <Text>T</Text>
                <Reason>why this answer is correct</Reason>
        </Correct>

not just 

<Correct>T</Correct>

I think that as much as possible it is better to have the same format for all 
types of questions.  More than likely, "fill in the blank" type questions 
won't have an <Incorrect> answer, but I still want to have a <Reason> tag to 
provide an explanation why the answer is correct. 

However, since we do not have the reasons yet, I think you should simply leave 
the tag as <Reason></Reason>. If we fill in placeholder text such as "put your 
reason why is correct/incorrect.", we might forget to change it, and then 
displaying that text would look silly. If the <Reason> tag is empty, we can 
just ignore it. Alternatively, we could simply not include the <Reason> tag at 
this point. What do you think?
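Either option is easy for the display code to handle. A sketch that treats an empty <Reason> and a missing one the same way; the <Question> wrapper element and the function name are hypothetical, while <Correct>, <Text> and <Reason> follow the format above.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Question> wrapper; <Correct>/<Text>/<Reason> follow the format above.
QUESTION = """<Question>
  <Correct>
    <Text>T</Text>
    <Reason></Reason>
  </Correct>
</Question>"""


def correct_answers(question_xml):
    """Return (answer, reason) pairs; reason is None if <Reason> is empty or absent."""
    root = ET.fromstring(question_xml)
    answers = []
    for correct in root.findall("Correct"):
        text = (correct.findtext("Text") or "").strip()
        # findtext gives "" for an empty element and None for a missing one.
        reason = (correct.findtext("Reason") or "").strip() or None
        answers.append((text, reason))
    return answers


print(correct_answers(QUESTION))  # [('T', None)]
```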

For the Glossary KUs, please create a <GlossaryTerms> container holding 
GlossaryRefs to the other terms. These are the numbers at the end of each 
line in glossary.data. They are the ID numbers of the other glossary terms. 
Therefore, instead of reading in each line from glossary.data and processing 
it, you will need to read it all at once and put it into an array, then parse 
that array.
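A two-pass sketch of that in Python. The exact layout of glossary.data isn't spelled out here, so the line format below (ID:TERM:DEFINITION:REF_IDS, with the referenced IDs comma-separated at the end) is an assumption, as is the <Term> element name. Reading everything first means every ID is known before the references are resolved; this sketch resolves them to term names, but keeping the raw IDs would be a one-line change.

```python
import xml.etree.ElementTree as ET


def parse_glossary(lines):
    """Build one Glossary KU per line of glossary.data, resolving cross-references.

    ASSUMED line format (hypothetical): ID:TERM:DEFINITION:REF_IDS, where
    REF_IDS is a comma-separated list of other glossary IDs.
    """
    # Pass 1: read every line into an array so that all IDs are known.
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        gid, term, rest = line.split(":", 2)
        definition, refs = rest.rsplit(":", 1)  # colons in DEFINITION survive
        ref_ids = [r.strip() for r in refs.split(",") if r.strip()]
        entries.append((gid, term, definition, ref_ids))
    names = {gid: term for gid, term, _, _ in entries}

    # Pass 2: turn the trailing IDs into <GlossaryRef>s inside <GlossaryTerms>.
    kus = []
    for _, term, definition, ref_ids in entries:
        ku = ET.Element("KnowledgeUnit")
        ET.SubElement(ku, "Term").text = term  # hypothetical element name
        ET.SubElement(ku, "Text").text = definition
        container = ET.SubElement(ku, "GlossaryTerms")
        for ref in ref_ids:
            if ref in names:  # skip IDs with no matching glossary entry
                ET.SubElement(container, "GlossaryRef").text = names[ref]
        kus.append(ku)
    return kus
```

Typical use would be `parse_glossary(open("glossary.data").readlines())`, which is the "read it all at once, then parse the array" approach.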

EVERYONE PLEASE READ AND COMMENT:
Currently the glossary.idx file contains a list of pages that contain each 
glossary item. This is created by an external script and is **not** done when 
the glossary item is loaded. That would take way too much time. The question 
is whether we should have PageRefs within the Glossary KU.

Personally, I do not think so. We can create the glossary-to-page_id index 
along with everything else. If we include the page ID/page name within the 
Glossary KU, then every time we add a new glossary item we would need to go 
looking for all of the pages that contain that term. Of course, we already 
need to search for those pages to add the <Glossary> tag within each page. 
However, I see adding PageRefs within the Glossary KU as unnecessary work, 
since we can create the index by other, more efficient means.
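For comparison, a rough sketch of what the external index step amounts to: scan the page texts once per term and record the IDs of the pages that mention it. The function, its input shape and the whole-word, case-insensitive matching rule are assumptions, not necessarily how the existing script works.

```python
import re


def build_glossary_index(pages, terms):
    """Map each glossary term to the sorted IDs of pages that mention it.

    pages: dict of page_id -> page text (hypothetical input shape).
    Matching is case-insensitive and whole-word, which is an assumption.
    """
    index = {}
    for term in terms:
        # (?<!\w) and (?!\w) act as word boundaries even for terms like "/etc".
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        index[term] = sorted(pid for pid, text in pages.items() if pattern.search(text))
    return index
```

Adding a new glossary term then just means rerunning this step; nothing inside the Glossary KU has to change.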

EVERYONE PLEASE READ AND COMMENT:
It just hit me that we might be building a trap for ourselves. If we use the 
full path instead of the ID number, we will have problems if we ever rename 
the file, move it to a different directory, etc. I **expect** to be moving 
files to different directories very soon! I want to change the order of the 
files and their locations. As we get more content, I can imagine that we will 
change locations again.

I see three options:

Use the full-path as the PageRef:
- Easy to find/insert the reference we want
- Tracking down the actual page from a KU is easy
- In the display code we don't need to do a look up to display the page.
- PROBLEM: moving/renaming the file. Since the XML files are text, we can use 
sed/perl to make a global change.

Use just the page name without the path:
- Once the file name is defined, it is less likely that the page name will be 
changed. 
- PROBLEM: We must have *completely unique* page names. We cannot have a "Known 
Problems" page in the Network section AND one in the Printing section. They 
must be named "Known Problems-Network" and "Known Problems-Printing" (or 
something like that).

Use the ID as the PageRef:
- Remains constant, independent of the actual name of the file.
- PROBLEM: Need to do a look-up to find the correct file. However, since 
page.data is currently sorted by chapter/section, I have found that it is not 
all that hard. For the existing moreinfo and DYK entries, one PageRef can be 
inserted automatically. Still, if we want to include more PageRefs, we will 
have to do it by hand and look up the ID, but we would have to look it up 
anyway to get the full path. So whether we look up and insert the page name or 
the page ID, it's the same amount of work.

I still like the idea of using the full path and NOT an ID number. You need 
to do a look-up anyway to find the ID or the correct text for the full path. 
Tracking down the original page from the XML file is straightforward. Making 
a change would be a simple matter of running a sed/perl script. We could even 
write it in advance and make it part of our "utility" package:

rename_page.pl [-f filename] original_name new_name

It then scans all PageRefs in the named file and changes them accordingly. 
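A minimal Python sketch of the rename_page.pl idea. The script doesn't exist yet, so everything here is hypothetical, including the name rename_page_refs and the stdin-to-stdout fallback when -f is omitted. It only touches text that is exactly a PageRef's content, so the same string elsewhere in the file is left alone.

```python
import argparse
import re
import sys


def rename_page_refs(xml_text, original_name, new_name):
    """Replace <PageRef>original_name</PageRef> with new_name; return (text, count)."""
    pattern = re.compile(
        r"(<PageRef>\s*)" + re.escape(original_name) + r"(\s*</PageRef>)"
    )
    # A function replacement keeps any backslashes in new_name literal.
    return pattern.subn(lambda m: m.group(1) + new_name + m.group(2), xml_text)


def main(argv=None):
    """Command-line wrapper: rename_page [-f filename] original_name new_name."""
    parser = argparse.ArgumentParser(description="Rename a page in all PageRefs.")
    parser.add_argument("-f", dest="filename",
                        help="XML file to edit in place (default: stdin to stdout)")
    parser.add_argument("original_name")
    parser.add_argument("new_name")
    args = parser.parse_args(argv)

    if args.filename:
        with open(args.filename, encoding="utf-8") as fh:
            text = fh.read()
        new_text, count = rename_page_refs(text, args.original_name, args.new_name)
        with open(args.filename, "w", encoding="utf-8") as fh:
            fh.write(new_text)
    else:
        new_text, count = rename_page_refs(
            sys.stdin.read(), args.original_name, args.new_name)
        sys.stdout.write(new_text)
    print(f"{count} PageRef(s) changed", file=sys.stderr)
```

Saved as a script, it would end with the usual `if __name__ == "__main__": main()` guard.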

Using an ID number bothers me because it makes the construct dependent on an 
external file, or else we are imposing a structure on it unnecessarily. Either 
way, the knowledge base is not self-contained.

EVERYONE PLEASE READ AND COMMENT:
We have a similar problem with the MoreInfoRefs for the Page KUs. Currently 
they are referenced by their ID number and Luu did the same thing in her 
code. However, once again, I am not happy with the idea of using ID numbers 
instead of text. So, do we reference the MoreInfo KUs by their text instead??

I have pretty much decided to go through the existing data files and add up to 
three topics. I will add these to the **end** of each line for all of the 
data files. So, Luu, could you change the code to create a <Topics> container 
and <TopicRefs> for all of the data files? Note that I will probably not list 
three topics for everything, so the code will need to be smart enough to 
recognize this. Since you are probably asleep already, I can work on it 
today and send you at least one file with the topics, so you will see the 
format.
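A sketch of that "smart enough" part, assuming the topics are simply extra fields after the usual ones (e.g. after the four fields of moreinfo.data). The separator is still undecided, so the pipe below is a placeholder, and the function name is hypothetical. Missing or empty topic fields are skipped, and no <Topics> container is written when there are none.

```python
import xml.etree.ElementTree as ET

MAX_TOPICS = 3


def add_topics(ku, fields, base_field_count):
    """Add <Topics>/<TopicRef> for any fields after the first base_field_count.

    Lines may carry zero to three trailing topics; empty fields are skipped.
    """
    topics = [f.strip() for f in fields[base_field_count:] if f.strip()]
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"expected at most {MAX_TOPICS} topics, got {len(topics)}")
    if topics:
        container = ET.SubElement(ku, "Topics")
        for topic in topics:
            ET.SubElement(container, "TopicRef").text = topic
    return ku


# Hypothetical moreinfo.data line with two topics; '|' stands in for the
# not-yet-chosen separator.
line = "42|HOWTO|Networking HOWTO|//www.example.com/howto|Networking|Administration"
ku = add_topics(ET.Element("KnowledgeUnit"), line.split("|"), base_field_count=4)
print(ET.tostring(ku, encoding="unicode"))
```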

Regards,

jimmo
-- 
---------------------------------------
"Be more concerned with your character than with your reputation. Your
character is what you really are while your reputation is merely what others
think you are." -- John Wooden
---------------------------------------
Be sure to visit the Linux Tutorial:  http://www.linux-tutorial.info