Re: [Tsaphan-developers] Verse Diff program

Sorry Mike, the package is

import java.util.*;

For the StringTokenizer class..  but you probably knew that ;)

-P

Patrick Lacson wrote:
> 
> Nick/Mike,
> 
> Great thoughts, guys!  I agree w/ you, Nick: the Word of God is the Word
> of God.  A visual difference will be displayed; however, perfection *is*
> the goal.
> 
> Word ordering may increase the complexity of this algo, since we now have
> more expression cases to handle.  However, it is a very interesting problem
> to tackle.
> 
> Mike, try and come up with a simple implementation of the dual
> token-array algorithm and if you're not familiar with the tokenizer
> here's how to get it going..
> 
> import java.util.*;
> 
> public class VerseDiff {
>         public static void main(String[] args) {
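>                 // Tokenize the attempted verse on whitespace
>                 StringTokenizer st = new StringTokenizer("For God so loved the world");
>                 String[] attempt_arr = new String[st.countTokens()];
>                 for (int i = 0; st.hasMoreTokens(); i++) {
>                         attempt_arr[i] = st.nextToken();
>                 }
> 
>                 // Again, tokenize the actual verse by querying the db
>                 // (placeholder value here; the real one would come from the db)
>                 String actual_verse_value = "For God so loved the world";
>                 StringTokenizer st_actual = new StringTokenizer(actual_verse_value);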
> 
> 
>                 /**
>                   * Do comparison and spit out result in some format
>                   * Handle each case for matching
>                   */
> 
>                 // case 1
> 
>                 // case 2
> 
>                 // case 3
> 
>                 // repeat if necessary (or recurse ;)
> 
>                 /**
>                   * return the actual score or percentage correct
>                   * with some debug info regarding what words were missed/incorrect
>                   */
> 
>         }
> }
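As a minimal sketch of the simplest form of the dual token-array idea (not a settled design), the following compares the two arrays position by position. The class and method names (`PositionalCompare`, `tokenize`, `countPositionalMatches`) are made up for illustration. It also shows why this naive version is not enough: one omitted word shifts every later token out of alignment.

```java
import java.util.StringTokenizer;

class PositionalCompare {
    // Split a verse into whitespace-delimited tokens.
    static String[] tokenize(String verse) {
        StringTokenizer st = new StringTokenizer(verse);
        String[] tokens = new String[st.countTokens()];
        for (int i = 0; st.hasMoreTokens(); i++) {
            tokens[i] = st.nextToken();
        }
        return tokens;
    }

    // Count the positions at which both arrays hold exactly the same token.
    static int countPositionalMatches(String[] attempt, String[] actual) {
        int matches = 0;
        int n = Math.min(attempt.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (attempt[i].equals(actual[i])) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] actual = tokenize("For God so loved the world");
        String[] attempt = tokenize("For God loved the world"); // "so" omitted
        // Prints "2 of 6 tokens match": everything after the gap is misaligned.
        System.out.println(countPositionalMatches(attempt, actual)
                + " of " + actual.length + " tokens match");
    }
}
```

Only "For" and "God" line up here, even though four of the five attempted words are correct, so some form of resynchronization is needed on top of this.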
> 
> Have fun!
> 
> -P
> 
> Nick Haight wrote:
> >
> > Some thoughts...
> >
> > Should there be a score or should it be pass/fail?  This is the Word of God,
> > you know.  Is anything less than perfect acceptable?
> >
> > How about a visual summary, showing the user's entry compared to the correct
> > verse.  All errors could be highlighted.  This might simplify the task of
> > specifically identifying what words were out of order, what words were
> > omitted, etc.
> >
> > ==========================================================================
> > Your entry:
> > All scripture is *insired* by God, * *profitable for teaching, *for
> > correction, for reproof*, for training in righteousness; so that the man of
> > God may be adequate, equipped for every good work.
> >
> > Verse:
> > All scripture is inspired by God, and profitable for teaching, for reproof,
> > for correction, for training in righteousness; so that the man of God
> > may be adequate, equipped for every good work.
> >
> > ===========================================================================
> >
> > The highlights (represented by *) reveal that:
> > - "insired" is a spelling error.
> > - " " before profitable shows "and" was omitted.
> > - "profitable...reproof" is out of order.  "for training" is where the order
> > resumes correctly.
> >
> > Different colored highlights can be used, too.
> >
> > Does this help?
> >
> > Nick
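One possible way to produce such a highlighted summary, sketched here under assumptions: align the two token lists with a longest-common-subsequence (LCS) table and star every attempted token that falls outside the common subsequence. LCS is a technique swapped in for illustration, not something the thread has settled on, and `HighlightDiff` and `highlight` are hypothetical names. This sketch stars only wrong or extra words in the entry; it does not insert a marker for omitted words.

```java
import java.util.ArrayList;
import java.util.List;

class HighlightDiff {
    // Wrap in '*' every attempted token that is not part of the longest
    // common subsequence shared with the actual verse.
    static String highlight(String attempt, String actual) {
        String[] a = attempt.trim().split("\\s+");
        String[] b = actual.trim().split("\\s+");

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting attempted tokens in order.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length) {
            if (j < b.length && a[i].equals(b[j])) {
                out.add(a[i]);            // token is in the right place
                i++;
                j++;
            } else if (j < b.length && lcs[i][j + 1] > lcs[i + 1][j]) {
                j++;                      // actual word was omitted; not marked here
            } else {
                out.add("*" + a[i] + "*"); // wrong, misspelled, or misplaced token
                i++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Prints: All scripture is *insired* by God
        System.out.println(highlight(
                "All scripture is insired by God",
                "All scripture is inspired by God"));
    }
}
```

The same walk could emit color codes instead of asterisks, or a separate marker for omitted words at the point where `j` is advanced.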
> >
> > -----Original Message-----
> > From: Patrick Lacson [mailto:pa...@la...]
> > Sent: Wednesday, April 18, 2001 2:22 PM
> > To: Mike Lucas
> > Cc: htw-list
> > Subject: [Tsaphan-developers] Verse Diff program
> >
> > Mike,
> >
> > I'm ccing the htw list because we need everybody's feedback on this Diff
> > algorithm.  Also, the basic requirements need feedback:
> >
> > 1)  Compare 2 String types
> > 2)  Allow comparison preference level (check for punctuation,
> > CaPITilaZaTion, etc.. -- how accurate)
> > 3)  Compute the percentage based on how accurate/inaccurate the
> > attempted verse is vs. the actual verse
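Requirement 2 could be handled by normalizing each token before it is compared. The sketch below assumes two independent switches (ignore case, ignore punctuation) stand in for the "preference level"; that split, and the names `VerseNormalizer`, `normalize`, and `tokensMatch`, are assumptions for illustration only.

```java
import java.util.Locale;

class VerseNormalizer {
    // Reduce a token to the form used for comparison at the chosen strictness.
    static String normalize(String token, boolean ignoreCase, boolean ignorePunctuation) {
        String t = token;
        if (ignorePunctuation) {
            t = t.replaceAll("\\p{Punct}", ""); // strips ASCII punctuation
        }
        if (ignoreCase) {
            t = t.toLowerCase(Locale.ROOT);
        }
        return t;
    }

    // Two tokens match if they are equal after normalization.
    static boolean tokensMatch(String attempt, String actual,
                               boolean ignoreCase, boolean ignorePunctuation) {
        return normalize(attempt, ignoreCase, ignorePunctuation)
                .equals(normalize(actual, ignoreCase, ignorePunctuation));
    }

    public static void main(String[] args) {
        System.out.println(tokensMatch("scripture", "Scripture", true, false));  // true
        System.out.println(tokensMatch("scripture", "Scripture", false, false)); // false
        System.out.println(tokensMatch("God,", "God", false, true));             // true
    }
}
```

With both switches off the comparison is exact, which corresponds to the strictest level.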
> >
> > So here's a suggested test case (for 2Tim 3:16-17)
> > ##################################################################
> > All Scripture is inspired by God and profitable for teaching,
> > for reproof, for correction, for training in righteousness; so
> > that the man of God may be adequate, equipped for every good work.
> > ##################################################################
> >
> > Attempted verse:
> > ----------------
> > All scripture is insired by God, profitable for teaching, for correction,
> > for reproof, for training in righteousness; so that the man of God
> > may be adequate, equipped for every good work.
> >
> > RESPONSE:
> > ----------
> > a)  Misspelled -- "insired"
> > b)  Incorrect -- "God, profitable for teaching, for correction, for
> > reproof, "
> > c)  Score is 70%
> >
> > My basic approach would be to use 2 arrays and tokenize the 2 strings
> > into the arrays:
> >
> > actual_arr[0] = "All";
> > actual_arr[1] = "Scripture";
> > actual_arr[2] = "is";
> > actual_arr[3] = "inspired";
> > actual_arr[4] = "by";
> > ...
> >
> > attempt_arr[0] = "All";
> > attempt_arr[1] = "Scripture";
> > attempt_arr[2] = "is";
> > attempt_arr[3] = "insired";  // red flag here for misspelled
> > ...
> >
> > // continue to process
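To raise the "misspelled" flag rather than just "wrong word", one option is an edit-distance check between the two tokens at a given position. The sketch below uses Levenshtein distance, which is an assumption brought in for illustration; the threshold `MAX_TYPO_DISTANCE` and the names `SpellingCheck` and `classify` are hypothetical.

```java
class SpellingCheck {
    // Hypothetical cut-off: at most this many single-character edits
    // still counts as a misspelling of the expected word.
    static final int MAX_TYPO_DISTANCE = 2;

    // Classic Levenshtein distance (insert, delete, substitute), two-row version.
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            int[] cur = new int[t.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[t.length()];
    }

    // Label an attempted token relative to the expected token.
    static String classify(String attempt, String actual) {
        if (attempt.equals(actual)) {
            return "match";
        }
        return editDistance(attempt, actual) <= MAX_TYPO_DISTANCE
                ? "misspelled" : "incorrect";
    }

    public static void main(String[] args) {
        System.out.println(classify("insired", "inspired")); // misspelled
        System.out.println(classify("by", "God"));           // incorrect
    }
}
```

A fixed threshold is crude for very short words; scaling it by word length would be one refinement.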
> >
> > Compare the 2 arrays (attempted/actual) for word matches, misspelled
> > words, and punctuation marks.  So familiarize yourself with the
> > StringTokenizer class and the Diff algo in the jcore.utils.Diff
> > package.  This may not be the best way to do this, but this at least
> > allows some easy answers right off the bat regarding the accuracy of
> > their verse attempt, missing words/punctuation (basically any token),
> > and non-matching words.
> >
> > However the algo *may* get lost from a few missing words, so we have to
> > make it smarter in figuring out where the remaining words are.. this is
> > where the real challenge of this diff algo lies: maintaining context and
> > pattern matching (via regular expressions??).
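One simple way to keep from getting lost, sketched here as an assumption rather than a chosen design: when the tokens stop lining up, look a few positions ahead in the actual verse for the current attempted token, report the skipped words as missing, and resume from there. `ResyncCompare`, `findMissing`, and the `MAX_LOOKAHEAD` limit are hypothetical names and values.

```java
import java.util.ArrayList;
import java.util.List;

class ResyncCompare {
    // Hypothetical limit on how far ahead to search in the actual verse.
    static final int MAX_LOOKAHEAD = 3;

    // Return the actual-verse tokens that the attempt appears to have skipped.
    static List<String> findMissing(String[] attempt, String[] actual) {
        List<String> missing = new ArrayList<>();
        int i = 0; // index into attempt
        int j = 0; // index into actual
        while (i < attempt.length && j < actual.length) {
            if (attempt[i].equals(actual[j])) {
                i++;
                j++;
                continue;
            }
            // Mismatch: search ahead in the actual verse for attempt[i].
            int found = -1;
            for (int k = j + 1; k <= j + MAX_LOOKAHEAD && k < actual.length; k++) {
                if (attempt[i].equals(actual[k])) {
                    found = k;
                    break;
                }
            }
            if (found >= 0) {
                for (int k = j; k < found; k++) {
                    missing.add(actual[k]); // words the user left out
                }
                i++;
                j = found + 1;              // back in sync
            } else {
                i++;                        // treat as a wrong word in place
                j++;
            }
        }
        // Anything left in the actual verse was never attempted.
        for (; j < actual.length; j++) {
            missing.add(actual[j]);
        }
        return missing;
    }

    public static void main(String[] args) {
        String[] actual = "For God so loved the world".split(" ");
        String[] attempt = "For God loved the world".split(" ");
        System.out.println(findMissing(attempt, actual)); // [so]
    }
}
```

This greedy lookahead handles omitted words but not reordered phrases; a full diff or LCS alignment would be the heavier alternative.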
> >
> > So think about this approach and let me know what pros/cons you see --
> > shoot your ideas out and let me know cuz I need as much feedback from
> > everybody on this as I can.. Here are some questions I had to ask myself
> > about this design:
> >
> > 1) Does it make sense to use 2 token-arrays for comparison?
> > 2) How do we maintain context if the 2nd array is missing words, and how do
> > we catch up to the original?
> > 3) Should we even do this approach?
> > 4) Are there other systems out there that have a text parser already
> > available that we can reuse?
> > 5) How should we score the attempt?? Number of words, misspelled words,
> > missing words, etc..
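For question 5, one possible scoring scheme is a weighted penalty over the word count of the actual verse. The weights below (half a word for a misspelling, a full word for a missing or incorrect word) are purely illustrative assumptions, as are the names `VerseScore` and `score`; nothing in the thread fixes them.

```java
class VerseScore {
    // Hypothetical penalty weights, in units of "one word".
    static final double MISSPELLED_PENALTY = 0.5;
    static final double MISSING_PENALTY = 1.0;
    static final double INCORRECT_PENALTY = 1.0;

    // Percentage correct, clamped to the range 0..100.
    static double score(int totalWords, int misspelled, int missing, int incorrect) {
        if (totalWords <= 0) {
            throw new IllegalArgumentException("totalWords must be positive");
        }
        double penalty = MISSPELLED_PENALTY * misspelled
                + MISSING_PENALTY * missing
                + INCORRECT_PENALTY * incorrect;
        double pct = 100.0 * (totalWords - penalty) / totalWords;
        return Math.max(0.0, Math.min(100.0, pct));
    }

    public static void main(String[] args) {
        // 10-word verse, 2 misspelled words, 1 missing word: prints 80.0
        System.out.println(score(10, 2, 1, 0));
    }
}
```

A pass/fail mode falls out of the same function by requiring a score of exactly 100.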
> >
> > Sorry if I'm being a bit verbose, but I'm very excited that we have
> > another developer on the squad to help us out with this..  I'd like to
> > bounce all ideas to the list and get everybody involved developer or not
> > just to see if things are making sense -- I tend to think too short-term
> > and neglect long-term implications.
> >
> > -P
> >
> > Michael Lucas wrote:
> > >
> > > Pat,
> > >
> > >         I've done a limited amount of network programming and know nothing
> > > about threads.  I don't mind trying to work on it, if you are not too
> > > constrained on time while I learn about it.  Either project is good for
> > > me.
> > >
> > >                                         - Mike
> >
> > _______________________________________________
> > Tsaphan-developers mailing list
> > Tsa...@li...
> > http://lists.sourceforge.net/lists/listinfo/tsaphan-developers