Re: [Linkbat-devel] Re: question on moreinfo.data (Everyone please read)


P.S. I have already added a Category field in anticipation of Jim's idea
of giving the user more questions in a category if they make errors.

On Wed, 2002-11-20 at 12:18, Shanta McBain wrote:
> On Wed, 2002-11-20 at 12:05, Luan Luu wrote:
> > Hi All,
> > 
> > In my opinion the separator ":" is fine for now, since the other candidate 
> > characters also appear in some of the other data files.
> > 
> 
> > The Reason tag should be there and stay empty, not be removed, because when you 
> > extract the XML to the DB, I guess we expect a REASON tag to be there.
> > 
> > Is it possible for the tyk.data file to be modified a little bit? As you 
> > mentioned, there are a couple of characters in there that cannot be converted 
> > into valid XML, such as the lone '&' on line 45, which needs to change to 
> > '&amp;'. The other '&' characters in the data file were already in 
> > converted form, so they are all right; only the one mentioned above causes a 
> > problem. Also, I tried to convert all the < and > into '&lt;' and '&gt;' 
> > respectively. Is that OK?
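
A minimal sketch of the escaping described above, written in Python rather than the project's Perl. It assumes that anything shaped like `&name;`, `&#38;`, or `&#x26;` is an entity that was already converted and should be left alone. The function name is hypothetical.

```python
import re

# An ampersand that does NOT already start an entity such as
# &amp;  &lt;  &#38;  or  &#x26;
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_for_xml(text):
    """Escape lone '&' plus every '<' and '>' without double-escaping."""
    # Ampersands first, so the '&' introduced by &lt; / &gt; is never touched.
    text = _BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    print(escape_for_xml("cut & paste"))
    print(escape_for_xml("already &amp; done"))
    print(escape_for_xml("cmd >> file"))
```

Because the ampersand pass skips existing entities, running the function twice on the same text gives the same result as running it once.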
> > 
> 
> This file is already converted to MySQL.
> Add whatever fields you like.
> To see it, go to my devel site:
> 
> http://webcthelpdesk.com/cgi-bin/Linkbat/linkbat.cgi?site=Linkbat
> Create an account and log in to see the link to the table.
> 
> 
> > You mentioned the Topics and TopicRef tags should be inserted into all the data 
> > files. Where do you think they should go?
> > 
> > For the new dyk.data.NEW you sent me, it would produce this:
> > 
> > <KnowledgeUnit>
> >    <Attributes>
> >       <Type>Concept</Type>
> >       <Text>Linux can be started from any partition.</Text>
> >    </Attributes>
> >    <Pages>
> >       <PageRef primary="true">106</PageRef>
> >    </Pages>
> >    <Questions>
> > 
> >    </Questions>
> > </KnowledgeUnit>
> > 
> > Where should the Topics and TopicRef tags go?
> > 
> > thanks.
> > 
> > Best Regards
> > -Luu
> > 
> > 
> > 
> > 
> > 
> > >From: James Mohr <lin...@ji...>
> > >Reply-To: lin...@ji...
> > >To: lin...@li...
> > >Subject: [Linkbat-devel] Re: question on moreinfo.data (Everyone please read)
> > >Date: Tue, 19 Nov 2002 11:03:51 +0100
> > >
> > >(Note this was sent to the list.)
> > >
> > >Hey everyone, the conversion is almost done (well, at least the code for it).
> > >Thanks Luu! However, there are some important questions to answer NOW, before
> > >we continue. PLEASE, please, please read this and give me your input.
> > >
> > >On Tuesday 19 November 2002 00:31, Luan Luu wrote:
> > > > According to moreinfo.data, the format is:
> > > > ID#:TYPE:DESCRIPTION:LOCATION
> > > >
> > > > and the XML is:
> > > >
> > > > <KnowledgeUnit>
> > > > <Attributes>
> > > >   <Type sub-type="[TYPE]" location="[LOCATION]">MoreInfo</Type>
> > > >   <Text>[DESCRIPTION]</Text>
> > > > </Attributes>
> > > > </KnowledgeUnit>
> > > >
> > > > Do the bracketed names point to the type, location, and description
> > > > like that?
> > >
> > >Perfect. The only question is whether we should actually do it that way or
> > >not. That is, should the sub-type and location be attributes within the
> > ><Type> tag or should they be separate tags, i.e. <SubType> and <Location>?
> > >
> > >My gut feeling is that they should be attributes within the <Type> tag. They
> > >are not necessarily attributes of the KU, but rather provide additional info
> > >for the type.
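
A rough sketch of the attribute-style mapping, in Python rather than the project's Perl. It assumes one `ID#:TYPE:DESCRIPTION:LOCATION` record per line, that DESCRIPTION itself contains no colon, and that a LOCATION starting with `//` should get `http:` put in front. The function name and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET


def moreinfo_to_ku(line):
    """Turn one moreinfo.data record into a <KnowledgeUnit> element."""
    # maxsplit=3 keeps any later colons (e.g. a port number) inside LOCATION.
    # The record ID is read but not used, since the XML above has no slot for it.
    _record_id, sub_type, description, location = line.rstrip("\n").split(":", 3)
    if location.startswith("//"):
        location = "http:" + location  # assumption: URLs were stored without "http:"

    ku = ET.Element("KnowledgeUnit")
    attrs = ET.SubElement(ku, "Attributes")
    type_el = ET.SubElement(
        attrs, "Type", {"sub-type": sub_type, "location": location}
    )
    type_el.text = "MoreInfo"
    ET.SubElement(attrs, "Text").text = description
    return ku


if __name__ == "__main__":
    sample = "7:URL:Example site://www.example.org:8080/doc"  # hypothetical record
    print(ET.tostring(moreinfo_to_ku(sample), encoding="unicode"))
```

Switching to separate `<SubType>` and `<Location>` tags would only change the two lines that build `type_el`, so the decision does not affect the parsing side.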
> > >
> > >Comments anyone? This needs to be answered before we continue.
> > >
> > > > The URL should be an absolute path with http in front, right?
> > >
> > >Yes. You will note that in the data file, they just begin with // and not
> > >http://. This was because I made an unwise decision to use the colon (:) as
> > >the field separator. However, the colon appears frequently in Linux
> > >(especially in URLs), so it became a problem.
> > >
> > >I was going to change to something like a pipe (|), which occurs less
> > >frequently. Regardless, we will have a problem, since the odds are that
> > >whatever character we use will appear in text somewhere in the data.
> > >
> > >Obviously this is not a problem when we import directly into a database.
> > >As I said to Shanta, I don't have a problem with importing the files
> > >directly into a database for the first release. Eventually, however, I want
> > >the system to be independent of the data source (CSV, database) and
> > >independent of the presentation (eXtropia, other portal). Therefore, we need
> > >to consider a new separator.
> > >
> > >Suggestions?
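
One possible direction, sketched in Python rather than the project's Perl: keep a single-character separator but quote any field that happens to contain it, which is what CSV-style quoting does. This is only an illustration of the idea, not a decision for the project; the helper names and the sample record are hypothetical.

```python
import csv
import io


def write_records(records, delimiter="|"):
    """Serialize rows, quoting only the fields that contain the delimiter."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerows(records)
    return buf.getvalue()


def read_records(text, delimiter="|"):
    """Parse rows back, undoing the quoting applied by write_records."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    rows = [["3", "URL", "Pipes: using | in the shell", "http://example.org/a|b"]]
    text = write_records(rows)
    print(text, end="")
    print(read_records(text) == rows)
```

With quoting in place the choice of separator matters much less, because a field may contain the separator and still survive a round trip.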
> > >
> > > > Inside the tyk xml question tags, there is a topicRef tag, which is the
> > > > reference of the PAGE_ID.  So, do you want to put the Page_id in the
> > > > topicRef tags or the actual topic name in the page.data ?
> > >
> > >The TopicRef is a reference to a topic, such as Administration, Networking,
> > >Security, etc. This is just text. There are PageRef tags and these contain
> > >the page *name* from page.data. However, I am no longer sure we should do it
> > >that way (see below). This needs to be changed in tykToXML.pl.
> > >
> > >Keep in mind that the questions will exist only within a KU. This KU will have
> > >a primary page so we automatically have the primary page for the question.
> > >However, the question could reference multiple topics.
> > >
> > >We have a problem with some of the questions where an angle bracket is one of
> > >the answers ("What symbol is used to "pipe" two commands together?"). This
> > >means we have two angle brackets together (<< or >>), which could confuse the
> > >XML parser. There are only a few and we can change them by hand. We just need
> > >to be aware of them.
> > >
> > >Also watch the format of the answers, even for the T/F questions:
> > >
> > >         <Correct>
> > >                 <Text>T</Text>
> > >                 <Reason>why this answer is correct</Reason>
> > >         </Correct>
> > >
> > >not just
> > >
> > ><Correct>T</Correct>
> > >
> > >I think that as much as possible it is better to have the same format for all
> > >types of questions. More than likely, "fill in the blank" type questions
> > >won't have an <Incorrect> answer, but I still want to have a <Reason> tag to
> > >provide an explanation why the answer is correct.
> > >
> > >However, since we do not yet have the reasons, I think you should simply leave
> > >it as <Reason></Reason>. If we leave placeholder text such as "put your reason
> > >why is correct/incorrect." we might forget to change it, and then displaying
> > >that text would look silly. If the <Reason> tag is empty, we can just ignore it.
> > >OR we could simply not include the <Reason> tag at this point. What do you
> > >think?
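
A small sketch of the "always emit `<Reason>`, ignore it when empty" option, in Python rather than the project's Perl. The helper names are hypothetical, and the display side is assumed to treat a missing tag and an empty tag the same way.

```python
import xml.etree.ElementTree as ET


def make_answer(tag, text, reason=""):
    """Build <Correct> or <Incorrect> with a <Text> and a possibly empty <Reason>."""
    answer = ET.Element(tag)
    ET.SubElement(answer, "Text").text = text
    # The <Reason> element is always created; its text stays unset when empty.
    ET.SubElement(answer, "Reason").text = reason or None
    return answer


def reason_of(answer):
    """Return the reason text, or None when the tag is absent or empty."""
    reason = answer.findtext("Reason", default="")
    return reason.strip() or None


if __name__ == "__main__":
    correct = make_answer("Correct", "T")
    print(ET.tostring(correct, encoding="unicode"))
    print(reason_of(correct))
```

Since `reason_of` handles both cases, the choice between an empty tag and no tag only affects the writer, not the reader.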
> > >
> > >With the Glossary KUs please create a <GlossaryTerms> container with the
> > >GlossaryRefs to the other terms. These are the numbers at the end of each
> > >line in glossary.data. They are the ID numbers of the other glossary terms.
> > >Therefore, instead of reading in each line from glossary.data and processing
> > >it, you will need to read it all at once and put it into an array, then parse
> > >that array.
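
The two-pass approach might look roughly like this, in Python rather than the project's Perl. The exact layout of glossary.data is not given in this thread, so the sketch ASSUMES `ID:TERM:DEFINITION:REF,REF,...` with the related IDs comma-separated in the last field, and it assumes each GlossaryRef should carry the referenced term's text rather than its ID. Both assumptions, and the function name, are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_glossary_refs(lines):
    """Map each term to a <GlossaryTerms> element listing its related terms."""
    # Pass 1: read everything, so that IDs can be resolved to terms later.
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        term_id, term, rest = line.split(":", 2)
        # ASSUMED layout: the text after the last colon is the list of IDs.
        _definition, _, refs = rest.rpartition(":")
        records.append((term_id, term, [r for r in refs.split(",") if r]))
    term_by_id = {term_id: term for term_id, term, _ in records}

    # Pass 2: now every referenced ID is known, whatever order the lines had.
    result = {}
    for _term_id, term, ref_ids in records:
        container = ET.Element("GlossaryTerms")
        for ref_id in ref_ids:
            if ref_id in term_by_id:  # silently skip dangling references
                ET.SubElement(container, "GlossaryRef").text = term_by_id[ref_id]
        result[term] = container
    return result


if __name__ == "__main__":
    sample = [
        "1:kernel:The core of the OS:2,3",
        "2:module:Loadable kernel code:1",
        "3:driver:Code for a device:",
    ]
    for term, el in build_glossary_refs(sample).items():
        print(term, ET.tostring(el, encoding="unicode"))
```

Processing line by line would fail on the first record here, because terms 2 and 3 have not been seen yet when term 1 refers to them.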
> > >
> > >EVERYONE PLEASE READ AND COMMENT:
> > >Currently the glossary.idx file contains a list of pages that contain each
> > >glossary item. This is created by an external script and is **not** done when
> > >the glossary item is loaded. That would take way too much time. The question
> > >is whether we should have PageRefs within the Glossary KU.
> > >
> > >Personally, I do not think so. We can create the index of glossary-page_id
> > >along with everything else. If we include the page ID/page name within the
> > >Glossary KU and add a new glossary item, then we would need to go looking for
> > >all of the pages that have that glossary term. Obviously we need to search
> > >for the pages to add the <Glossary> tag within the page. However, I just see
> > >it as unnecessary work to add PageRefs within the Glossary KU since we can
> > >create the index by other, more efficient means.
> > >
> > >
> > >EVERYONE PLEASE READ AND COMMENT:
> > >It just hit me that we might be building a trap for ourselves. If we use the
> > >full path instead of the ID number, we will have problems if we ever rename
> > >the file, move it to a different directory, etc. I **expect** to be moving
> > >files to different directories real soon! I want to change the order of the
> > >files and their locations. As we get more content, I can imagine that we
> > >will change locations again.
> > >
> > >I see three options:
> > >
> > >Use the full-path as the PageRef:
> > >- Easy to find/insert the reference we want
> > >- Tracking down the actual page from a KU is easy
> > >- In the display code we don't need to do a look-up to display the page.
> > >- PROBLEM: moving/renaming the file. Since the XML files are text, we can use
> > >sed/perl to make a global change.
> > >
> > >Use just the page name without the path:
> > >- Once the file name is defined, it is less likely that the page name will be
> > >changed.
> > >- PROBLEM: We must have *completely unique* page names. We cannot have a "Known
> > >Problems" in the Network section AND in the Printing section. They must be
> > >named "Known Problems-Network" and "Known Problems-Printing" (or something
> > >like that).
> > >
> > >Use the ID as the PageRef:
> > >- Remains constant, independent of the actual name of the file.
> > >- PROBLEM: Need to do a look-up to find the correct file. However, since
> > >page.data is currently sorted by chapter/section, I have found that it is not
> > >all that hard. For the existing moreinfo and DYK entries one PageRef can be
> > >inserted automatically. Still, if we want to include more PageRefs, we will
> > >have to do it by hand and look up the ID, but we will have to look it up anyway
> > >to get the full path. So whether we look up and insert the page name or
> > >the page ID, it's the same amount of work.
> > >
> > >I still like the idea of using the full-path and NOT an ID number. You need
> > >to do a look-up anyway to find the ID or the correct text for the full path.
> > >Tracking down the original page from the XML file is straightforward. Making
> > >a change would be a simple matter of running a sed/perl script. We could even
> > >write it in advance and it becomes a part of our "utility" package:
> > >
> > >rename_page.pl [-f filename] original_name new_name
> > >
> > >It then scans all PageRefs in the named file and changes them accordingly.
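
The core of such a utility could be sketched like this, in Python rather than the Perl the script name implies. It is a plain text substitution, in the spirit of the sed/perl approach above, that only touches the text between `<PageRef ...>` and `</PageRef>`. The function name is hypothetical, and it assumes the page names appear in the XML exactly as typed (no escaped characters).

```python
import re


def rename_page_refs(xml_text, original_name, new_name):
    """Replace original_name with new_name inside every matching <PageRef>.

    Returns the new text and the number of PageRefs changed.
    """
    pattern = re.compile(
        r"(<PageRef\b[^>]*>)\s*" + re.escape(original_name) + r"\s*(</PageRef>)"
    )
    # A function replacement avoids backslashes in new_name being treated
    # as regex group references.
    return pattern.subn(lambda m: m.group(1) + new_name + m.group(2), xml_text)


if __name__ == "__main__":
    xml_text = (
        "<Pages>\n"
        '   <PageRef primary="true">Network/Known Problems</PageRef>\n'
        "   <PageRef>Printing/Known Problems</PageRef>\n"
        "</Pages>\n"
    )
    new_text, changed = rename_page_refs(
        xml_text, "Network/Known Problems", "Networking/Known Problems"
    )
    print(changed)
    print(new_text, end="")
```

Matching the whole element content, rather than any occurrence of the old name, keeps a rename of one page from touching another page whose path merely contains the same text.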
> > >
> > >Using an ID number bothers me because it makes the construct dependent on an
> > >external file or we are imposing a structure on it unnecessarily. Therefore,
> > >the knowledge base is not self-contained.
> > >
> > >EVERYONE PLEASE READ AND COMMENT:
> > >We have a similar problem with the MoreInfoRefs for the Page KUs. Currently
> > >they are referenced by their ID number and Luu did the same thing in her
> > >code. However, once again, I am not happy with the idea of using ID numbers
> > >instead of text. So, do we reference the text of the MoreInfo KUs??
> > >
> > >I have pretty much decided to go through the existing data files and add up to
> > >three topics. I will add these to the **end** of each line for all of the
> > >data files. So, Luu, could you change the code to create a <Topics> container
> > >and <TopicRefs> for all of the data files? Note that I will probably not list
> > >three topics for everything. Therefore, the code will need to be smart enough
> > >to recognize this. Since you are probably asleep already I can work on it
> > >today and send you at least one file with the topics, so you will see the
> > >format.
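
A sketch of the "smart enough" part, in Python rather than the project's Perl. The file format for the topics has not been sent yet, so the function takes the topic fields as an already-split list and makes no assumption about how they sit at the end of the line. It drops empty fields and caps the result at three. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_TOPICS = 3  # "up to three topics" per record


def build_topics(topic_fields):
    """Build a <Topics> container holding one <TopicRef> per non-empty topic."""
    topics = ET.Element("Topics")
    names = [t.strip() for t in topic_fields if t and t.strip()]
    for name in names[:MAX_TOPICS]:
        ET.SubElement(topics, "TopicRef").text = name
    return topics


if __name__ == "__main__":
    # A record with only two topics filled in; the middle field is empty.
    print(ET.tostring(build_topics(["Networking", "", " Security "]), encoding="unicode"))
```

A record with no topics at all still gets an empty `<Topics>` container, which keeps the structure the same for every KU.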
> > >
> > >Regards,
> > >
> > >jimmo
> > >--
> > >---------------------------------------
> > >"Be more concerned with your character than with your reputation. Your
> > >character is what you really are while your reputation is merely what others
> > >think you are." -- John Wooden
> > >---------------------------------------
> > >Be sure to visit the Linux Tutorial:  http://www.linux-tutorial.info
> > >
> > >
> > >_______________________________________________
> > >Linkbat-devel mailing list
> > >Lin...@li...
> > >https://lists.sourceforge.net/lists/listinfo/linkbat-devel
> -- 
> Shanta McBain <sh...@fo...>
-- 
Shanta McBain <sh...@fo...>