(Note this was sent to the list.)
Hey everyone, the conversion is almost done (well, at least the code for it).
Thanks Luu! However, there are some important questions to answer NOW, before
we continue. PLEASE, please, please read this and give me your input.
On Tuesday 19 November 2002 00:31, Luan Luu wrote:
> according the the moreinfo.data, the format:
> the XML are
> <Type sub-type="[TYPE]" location="[LOCATION]">MoreInfo</Type>
> In the reference to the brackets, is the pointer the the type, location,
> and description like that?
Perfect. The only question is whether we should actually do it that way or
not. That is, should the sub-type and location be attributes within the
<Type> tag or should they be separate tags, i.e. <SubType> <Location>?
My gut feeling is that they should be attributes within the <Type> tag. They
are not necessarily attributes of the KU, but rather provide additional info for the
Comments anyone? This needs to be answered before we continue.
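To make the two choices concrete, here is a minimal sketch that builds both
layouts with Python's standard xml.etree.ElementTree. Our own scripts are in
Perl; Python is used here only for illustration. The sample values ("Book",
"Chapter 3") and the function names are made up, and placing <SubType> and
<Location> as siblings of <Type> is just one possible reading of option B.

```python
import xml.etree.ElementTree as ET

# Sample values are invented for illustration only.
SUB_TYPE, LOCATION, TEXT = "Book", "Chapter 3", "MoreInfo"


def as_attributes(sub_type, location, text):
    """Option A: sub-type and location are attributes of <Type>."""
    elem = ET.Element("Type", {"sub-type": sub_type, "location": location})
    elem.text = text
    return elem


def as_separate_tags(sub_type, location, text):
    """Option B: <SubType> and <Location> are separate tags next to <Type>."""
    elems = []
    for tag, value in (("Type", text), ("SubType", sub_type), ("Location", location)):
        elem = ET.Element(tag)
        elem.text = value
        elems.append(elem)
    return elems


if __name__ == "__main__":
    print(ET.tostring(as_attributes(SUB_TYPE, LOCATION, TEXT), encoding="unicode"))
    for elem in as_separate_tags(SUB_TYPE, LOCATION, TEXT):
        print(ET.tostring(elem, encoding="unicode"))
```

With option A the extra info travels with the <Type> tag itself; with option B
each piece can be read or omitted independently.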
> is the url be absolute path with the http infront right?
Yes. You will note that in the data file, they just begin with // and not
http://. This is because I made an unwise decision to use the colon (:) as
the field separator. However, the colon appears frequently in Linux
(especially in URLs), so it became a problem.
I was going to change to something like a pipe (|), which occurs less
frequently. Regardless, we will have a problem, since the odds are that
whatever character we use will appear in text somewhere in the data.
Obviously this is not a problem when we import directly into a database and,
as I said to Shanta, I don't have a problem with importing the files
directly into a database for the first release. However, eventually I want
the system to be independent of the data source (CSV, database) and
independent of the presentation (eXtropia, other portal). Therefore, we need
to consider a new separator.
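One way around the "whatever character we pick will show up in the data"
problem is to quote fields instead of hunting for a perfect separator. The
sketch below is only an illustration in Python (not our Perl code), and the
sample record is invented. It shows a naive split on the colon breaking on a
URL, and the standard csv module round-tripping the same record with a pipe
separator even though one field contains a pipe.

```python
import csv
import io

# Invented sample record: ID, title, URL, free text containing both ':' and '|'.
record = ["42", "Linux Tutorial", "http://www.linux-tutorial.info", "a|b: notes"]

# Naive approach: join and split on ':' yields extra fields because of the URL.
naive_line = ":".join(record)
naive_fields = naive_line.split(":")

# Quoted approach: csv wraps any field that contains the delimiter in quotes,
# so the separator may safely appear inside the data.
buffer = io.StringIO()
csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_MINIMAL).writerow(record)
quoted_line = buffer.getvalue()
parsed = next(csv.reader(io.StringIO(quoted_line), delimiter="|"))

if __name__ == "__main__":
    print(len(record), len(naive_fields))
    print(quoted_line.strip())
    print(parsed == record)
```

The same idea should carry over to Perl with a CSV-aware parser rather than a
plain split, but that is an assumption about how we would implement it.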
> Inside the tyk xml question tags, there is a topicRef tag, which is the
> reference of the PAGE_ID. So, do you want to put the Page_id in the
> topicRef tags or the actual topic name in the page.data ?
The TopicRef is a reference to a topic, such as Administration, Networking,
Security, etc. This is just text. There are also PageRef tags, and these contain
the page *name* from page.data. However, I am no longer sure we should do it
that way (see below). This needs to be changed in tykToXML.pl.
Keep in mind that the questions will exist only within a KU. This KU will have
a primary page so we automatically have the primary page for the question.
However, the question could reference multiple topics.
We have a problem with some of the questions where an angle bracket is one of
the answers ("What symbol is used to "pipe" two commands together?"). This
means we have two angle brackets together (<< or >>), which could confuse the
XML parser. There are only a few and we can change them by hand. We just need
to be aware of them.
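If we ever want the conversion to handle these automatically instead of by
hand, the usual fix is to escape the answer text before writing it. This is a
small Python sketch of that idea, not what the Perl converter currently does.
The <Answer> tag name and the answer_element helper are placeholders I made
up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def answer_element(text):
    """Return an <Answer> element as a string with &, < and > escaped.

    <Answer> is a hypothetical tag name used only for this illustration.
    """
    return "<Answer>" + escape(text) + "</Answer>"


if __name__ == "__main__":
    for raw in ("|", "<<", ">>"):
        xml_text = answer_element(raw)
        # The parser gives back the original, unescaped answer text.
        print(xml_text, ET.fromstring(xml_text).text == raw)
```

The escaped form is what sits in the file; any XML parser hands the original
characters back when the KU is read.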
Also watch the format of the answers, even for the T/F questions:
<Reason>why this answer is correct</Reason>
I think that as much as possible it is better to have the same format for all
types of questions. More than likely, "fill in the blank" type questions
won't have an <Incorrect> answer, but I still want to have a <Reason> tag to
provide an explanation why the answer is correct.
However, since we do not yet have the reasons, I think you should simply leave
it as <Reason></Reason>. If we leave the text as "put your reason why is
correct/incorrect. " we might forget to change it, and displaying that
text would look silly. If the <Reason> tag is empty, we can just ignore it.
OR we could simply not include the <Reason> tag at this point. What do you
With the Glossary KUs please create a <GlossaryTerms> container with the
GlossaryRefs to the other terms. These are the numbers at the end of each
line in glossary.data. They are the ID numbers of the other glossary terms.
Therefore, instead of reading in each line from glossary.data and processing
it, you will need to read it all at once and put it into an array, then parse
EVERYONE PLEASE READ AND COMMENT:
Currently the glossary.idx file contains a list of pages that contain each
glossary item. This is created by an external script and is **not** done when
the glossary item is loaded. That would take way too much time. The question
is whether we should have PageRefs within the Glossary KU.
Personally, I do not think so. We can create the index of glossary-page_id
along with everything else. If we include the page ID/page name within the
Glossary KU and add a new glossary item, then we would need to go looking for
all of the pages that have that glossary term. Obviously we need to search
for the pages to add the <Glossary> tag within the page. However, I just see
it as unnecessary work to add PageRefs within the Glossary KU since we can
create the index by other, more efficient means.
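As a rough idea of what "create the index by other means" could look like,
here is a Python sketch that scans page XML for <Glossary> tags and builds a
term-to-pages index. Everything about the input is an assumption: the <Page>
and <Para> wrappers, the page names, and the idea that the text inside
<Glossary> is the term itself. It is not how glossary.idx is built today.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_glossary_index(pages):
    """Map each glossary term to the sorted list of pages that mention it.

    `pages` is a dict of page name -> page XML text (hypothetical layout).
    """
    index = defaultdict(set)
    for page_name, xml_text in pages.items():
        root = ET.fromstring(xml_text)
        for node in root.iter("Glossary"):
            term = (node.text or "").strip()
            if term:
                index[term].add(page_name)
    return {term: sorted(names) for term, names in index.items()}


if __name__ == "__main__":
    sample_pages = {
        "Shells": "<Page><Para>A <Glossary>pipe</Glossary> joins commands.</Para></Page>",
        "Printing": "<Page><Para>Use a <Glossary>pipe</Glossary> or a "
                    "<Glossary>spooler</Glossary>.</Para></Page>",
    }
    for term, names in sorted(build_glossary_index(sample_pages).items()):
        print(term, names)
```

Because the index is derived from the pages, adding a new glossary item means
rerunning the scan rather than editing PageRefs inside the Glossary KU.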
EVERYONE PLEASE READ AND COMMENT:
It just hit me that we might be building a trap for ourselves. If we use the
full path instead of the ID number, we will have problems if we ever rename
the file, move it to a different directory, etc. I **expect** to be moving
files to different directories real soon! I want to change the order of the
files and their locations. As we get more content, I can imagine that we
will change locations again.
I see three options:
Use the full-path as the PageRef:
- Easy to find/insert the reference we want
- Tracking down the actual page from a KU is easy
- In the display code we don't need to do a look up to display the page.
- PROBLEM: moving/renaming the file. However, since the XML files are text, we
can use sed/perl to make a global change.
Use just the page name without the path:
- Once the file name is defined, it is less likely that the page name will be
- PROBLEM: We must have *completely unique* page names. We cannot have a "Known
Problems" in the Network section AND in the Printing section. They must be
named "Known Problems-Network" and "Known Problems-Printing" (or something
Use the ID as the PageRef:
- Remains constant, independent of the actual name of the file.
- PROBLEM: Need to do a look-up to find the correct file. However, since
page.data is currently sorted by chapter/section, I have found that it is not
all that hard. For the existing moreinfo and DYK entries one PageRef can be
inserted automatically. Still, if we want to include more PageRefs, we will
have to do it by hand and look up the ID, but we will have to look it up
anyway to get the full path. So whether we look up and insert the page name or
the page ID, it's the same amount of work.
I still like the idea of using the full path and NOT an ID number. You need
to do a look-up anyway to find the ID or the correct text for the full path.
Tracking down the original page from the XML file is straightforward. Making
a change would be a simple matter of running a sed/perl script. We could even
write it in advance and it becomes a part of our "utility" package:
rename_page.pl [-f filename] original_name new_name
It then scans all PageRefs in the named file and changes them accordingly.
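To show how small such a utility could be, here is a sketch of the core
replacement step. It is written in Python purely for illustration (the real
tool would be rename_page.pl), and it assumes PageRefs look like
<PageRef>full/path</PageRef> with no attributes, which we have not yet
decided. The function name and the sample paths are made up.

```python
import re


def rename_page_refs(xml_text, original_name, new_name):
    """Replace original_name with new_name inside <PageRef> tags only.

    Returns (new_text, number_of_refs_changed). Only exact matches of the
    whole PageRef content are changed, so "/net/intro2" is left alone when
    renaming "/net/intro".
    """
    pattern = re.compile(
        r"(<PageRef>\s*)" + re.escape(original_name) + r"(\s*</PageRef>)"
    )
    # A function is used as the replacement so that backslashes in new_name
    # are inserted literally rather than treated as regex escapes.
    return pattern.subn(lambda m: m.group(1) + new_name + m.group(2), xml_text)


if __name__ == "__main__":
    sample = (
        "<KU><PageRef>/net/intro</PageRef>"
        "<PageRef>/net/intro2</PageRef></KU>"
    )
    text, changed = rename_page_refs(sample, "/net/intro", "/network/intro")
    print(changed)
    print(text)
```

Restricting the match to the PageRef content keeps the script from touching
the same string where it happens to appear in ordinary page text.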
Using an ID number bothers me because it makes the construct dependent on an
external file, or we are imposing a structure on it unnecessarily. Therefore,
the knowledge base is not self-contained.
EVERYONE PLEASE READ AND COMMENT:
We have a similar problem with the MoreInfoRefs for the Page KUs. Currently
they are referenced by their ID number and Luu did the same thing in her
code. However, once again, I am not happy with the idea of using ID numbers
instead of text. So, do we reference the text of the MoreInfo KUs?
I have pretty much decided to go through the existing data files and add up to
three topics. I will add these to the **end** of each line for all of the
data files. So, Luu, could you change the code to create a <Topics> container
and <TopicRefs> for all of the data files? Note that I will probably not list
three topics for everything. Therefore, the code will need to be smart enough
to recognize this. Since you are probably asleep already, I can work on it
today and send you at least one file with the topics, so you will see the
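For the <Topics> container, the "smart enough" part mostly means skipping
topic fields that are missing or empty. A minimal Python sketch of that step
follows (illustration only, not the Perl converter). It assumes the caller
has already split the line and passes in just the trailing topic fields; the
function name and the max_topics parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def topics_element(topic_fields, max_topics=3):
    """Build a <Topics> container with one <TopicRef> per non-empty topic.

    topic_fields holds the trailing fields of a data line (zero to three
    entries, some possibly blank). Blank entries are skipped.
    """
    topics = ET.Element("Topics")
    names = [field.strip() for field in topic_fields if field.strip()]
    for name in names[:max_topics]:
        ET.SubElement(topics, "TopicRef").text = name
    return topics


if __name__ == "__main__":
    elem = topics_element(["Networking", "", " Security "])
    print(ET.tostring(elem, encoding="unicode"))
```

A line with no topics at all simply produces an empty <Topics> container,
which the display code can ignore.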
"Be more concerned with your character than with your reputation. Your
character is what you really are while your reputation is merely what others
think you are." -- John Wooden
Be sure to visit the Linux Tutorial: http://www.linux-tutorial.info