From: Patrick L. <pa...@la...> - 2001-04-18 23:17:48
Sorry, Mike, the package for the StringTokenizer class is
import java.util.*;
but you probably knew that ;)
-P
Patrick Lacson wrote:
>
> Nick/Mike,
>
> Great thoughts, guys! I agree with you, Nick: the Word of God is the Word
> of God. A visual diff will be displayed; however, perfection *is* the
> goal.
>
> Word ordering may increase the complexity of this algo, since now we have
> more expression cases to handle. However, it is a very interesting problem
> to tackle.
>
> Mike, try to come up with a simple implementation of the dual
> token-array algorithm. If you're not familiar with the tokenizer,
> here's how to get it going:
>
> import java.util.*;
>
> public class VerseDiff {
>     public static void main(String[] args) {
>         // Tokenize the attempted verse on whitespace
>         StringTokenizer st = new StringTokenizer("For God so loved the world");
>         String[] attempt_arr = new String[st.countTokens()];
>         for (int i = 0; st.hasMoreTokens(); i++) {
>             attempt_arr[i] = st.nextToken();
>         }
>
>         // Again, tokenize the actual verse. In the real program this value
>         // comes from querying the db; a placeholder literal stands in here.
>         String actual_verse_value = "For God so loved the world";
>         StringTokenizer st_actual = new StringTokenizer(actual_verse_value);
>         String[] actual_arr = new String[st_actual.countTokens()];
>         for (int i = 0; st_actual.hasMoreTokens(); i++) {
>             actual_arr[i] = st_actual.nextToken();
>         }
>
>         /*
>          * Do comparison and spit out result in some format
>          * Handle each case for matching
>          */
>
>         // case 1
>
>         // case 2
>
>         // case 3
>
>         // repeat if necessary (or recurse ;)
>
>         /*
>          * return the actual score or percentage correct
>          * with some debug info regarding what words were missed/incorrect
>          */
>     }
> }
>
> Have fun!
>
> -P
>
> Nick Haight wrote:
> >
> > Some thoughts...
> >
> > Should there be a score or should it be pass/fail? This is the Word of God,
> > you know. Is anything less than perfect acceptable?
> >
> > How about a visual summary showing the user's entry compared to the correct
> > verse? All errors could be highlighted. This might simplify the task of
> > identifying exactly which words were out of order, which words were
> > omitted, etc.
> >
> > ==========================================================================
> > Your entry:
> > All scripture is *insired* by God, * *profitable for teaching, *for
> > correction, for reproof*, for training in righteousness; so that the man of God
> > may be adequate, equipped for every good work.
> >
> > Verse:
> > All scripture is inspired by God, and profitable for teaching, for reproof,
> > for correction, for training in righteousness; so that the man of God
> > may be adequate, equipped for every good work.
> >
> > ===========================================================================
> >
> > The highlights (represented by *) reveal that:
> > - "insired" is a spelling error.
> > - The empty highlight "* *" before "profitable" shows that "and" was omitted.
> > - "profitable ... reproof" is out of order; the correct order resumes at
> > "for training".
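Nick's highlighting idea could be sketched in Java roughly as follows. This is a minimal sketch, assuming whitespace tokenization and exact, case-sensitive word matching (punctuation stays attached to words). It uses a word-level longest common subsequence (LCS), a swapped-in technique not named in the thread, and `VerseHighlight`/`highlight` are hypothetical names. Attempt words outside the LCS (misspelled, extra, or out of order) get wrapped in `*...*`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: highlight attempt words that are not part of the
// longest common subsequence (LCS) of words shared with the actual verse.
// Assumes whitespace tokenization and exact, case-sensitive matching.
public class VerseHighlight {

    static String highlight(String attempt, String actual) {
        String[] a = attempt.trim().split("\\s+");
        String[] b = actual.trim().split("\\s+");
        int n = a.length, m = b.length;

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table front to back; attempt words the LCS skips get *...*
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n) {
            if (j < m && a[i].equals(b[j])) {
                out.add(a[i]);          // word in the right place
                i++;
                j++;
            } else if (j < m && lcs[i][j + 1] >= lcs[i + 1][j]) {
                j++;                    // actual word missing from attempt: no mark
            } else {
                out.add("*" + a[i] + "*"); // wrong, extra, or out-of-order word
                i++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(highlight(
                "All scripture is insired by God",
                "All scripture is inspired by God"));
    }
}
```

Omitted words such as "and" produce no mark in this version; showing them the way Nick's "* *" does would need a second marker emitted whenever the walk skips a word of the actual verse.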
> >
> > Different colored highlights can be used, too.
> >
> > Does this help?
> >
> > Nick
> >
> > -----Original Message-----
> > From: Patrick Lacson [mailto:pa...@la...]
> > Sent: Wednesday, April 18, 2001 2:22 PM
> > To: Mike Lucas
> > Cc: htw-list
> > Subject: [Tsaphan-developers] Verse Diff program
> >
> > Mike,
> >
> > I'm CCing the htw list because we need everybody's feedback on this diff
> > algorithm. The basic requirements also need feedback:
> >
> > 1) Compare 2 Strings
> > 2) Allow a comparison preference level (check for punctuation,
> > CaPITilaZaTion, etc.: how strict should the match be?)
> > 3) Compute a percentage based on how accurate/inaccurate the
> > attempted verse is vs. the actual verse
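Requirement 2 could be handled as a normalization step applied to both tokens before they are compared. A minimal sketch, assuming the preference level boils down to two hypothetical flags, `checkCase` and `checkPunctuation`; the class and method names are illustrative only.

```java
import java.util.Locale;

// Hypothetical sketch of a comparison "preference level": two flags decide
// whether case and punctuation differences count as errors.
public class TokenNormalizer {

    static String normalize(String token, boolean checkCase, boolean checkPunctuation) {
        String t = token;
        if (!checkPunctuation) {
            // Drop everything that is not a letter or digit, e.g. "God," -> "God"
            t = t.replaceAll("[^\\p{L}\\p{N}]", "");
        }
        if (!checkCase) {
            t = t.toLowerCase(Locale.ROOT);
        }
        return t;
    }

    static boolean sameWord(String attempt, String actual,
                            boolean checkCase, boolean checkPunctuation) {
        return normalize(attempt, checkCase, checkPunctuation)
                .equals(normalize(actual, checkCase, checkPunctuation));
    }

    public static void main(String[] args) {
        System.out.println(sameWord("scripture", "Scripture", false, false)); // ignore case
        System.out.println(sameWord("scripture", "Scripture", true, false));  // case counts
        System.out.println(sameWord("God,", "God", false, true));             // punctuation counts
    }
}
```

Normalizing both sides with the same flags keeps the rest of the diff code unaware of the preference level.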
> >
> > So here's a suggested test case (for 2Tim 3:16-17)
> > ##################################################################
> > All Scripture is inspired by God and profitable for teaching,
> > for reproof, for correction, for training in righteousness; so
> > that the man of God may be adequate, equipped for every good work.
> > ##################################################################
> >
> > Attempted verse:
> > ----------------
> > All scripture is insired by God, profitable for teaching, for correction,
> > for reproof, for training in righteousness; so that the man of God
> > may be adequate, equipped for every good work.
> >
> > RESPONSE:
> > ----------
> > a) Misspelled -- "insired"
> > b) Incorrect -- "God, profitable for teaching, for correction, for reproof, "
> > c) Score is 70%
> >
> > My basic approach would be to use 2 arrays and tokenize the 2 strings
> > into the arrays:
> >
> > actual_arr[0] = "All";
> > actual_arr[1] = "Scripture";
> > actual_arr[2] = "is";
> > actual_arr[3] = "inspired";
> > actual_arr[4] = "by";
> > ...
> >
> > attempt_arr[0] = "All";
> > attempt_arr[1] = "scripture"; // lowercase in the attempt
> > attempt_arr[2] = "is";
> > attempt_arr[3] = "insired";   // red flag here: misspelled
> > ...
> >
> > // continue to process
> >
> > Compare the 2 arrays (attempted/actual) for word matches, misspelled
> > words, and punctuation marks. So familiarize yourself with the
> > StringTokenizer class and the Diff algo in the jcore.utils.Diff
> > package. This may not be the best way to do this, but it at least
> > gives some easy answers right off the bat about the accuracy of the
> > verse attempt: missing words/punctuation (basically any token) and
> > non-matching words.
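The two-array comparison above might look like this in its simplest form: walk both arrays index by index and classify each pair. A minimal sketch; the "misspelled" rule (Levenshtein edit distance of at most 2) is an assumed threshold, not part of the proposal, and `ArrayCompare` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare attempt_arr and actual_arr position by position.
// "Misspelled" = Levenshtein distance <= 2 (an assumed threshold).
public class ArrayCompare {

    // Classic two-row dynamic-programming Levenshtein distance
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] cur = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // reuse the two rows
        }
        return prev[t.length()];
    }

    static List<String> compare(String[] attempt, String[] actual) {
        List<String> report = new ArrayList<>();
        int n = Math.max(attempt.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (i >= attempt.length) {
                report.add("missing: " + actual[i]);
            } else if (i >= actual.length) {
                report.add("extra: " + attempt[i]);
            } else if (attempt[i].equals(actual[i])) {
                continue; // exact match, nothing to report
            } else if (editDistance(attempt[i], actual[i]) <= 2) {
                report.add("misspelled: " + attempt[i] + " (expected " + actual[i] + ")");
            } else {
                report.add("incorrect: " + attempt[i] + " (expected " + actual[i] + ")");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        String[] actual = "All Scripture is inspired by".split(" ");
        String[] attempt = "All Scripture is insired by".split(" ");
        compare(attempt, actual).forEach(System.out::println);
    }
}
```

Because it pairs index i with index i, a single missing word shifts every later pair out of line, so everything after an omission gets reported as incorrect. A fixed threshold of 2 is also loose for two-letter words like "by" vs. "is"; scaling it with word length may work better.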
> >
> > However, the algo *may* get lost after a few missing words, so we have to
> > make it smarter at figuring out where the remaining words are. This is
> > where the real challenge of this diff algo lies: maintaining context and
> > pattern matching (via regular expressions??).
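One way to "catch up" after missing words is a greedy look-ahead: on a mismatch, search a few tokens ahead in the actual verse for the current attempted word (the attempt skipped words), then a few tokens ahead in the attempt for the current actual word (the attempt added words), and only otherwise call it a wrong word. This is a sketch under stated assumptions: the window size of 5 and all names are hypothetical, and it is not based on the jcore.utils.Diff package mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy look-ahead resynchronization. WINDOW = 5 is an
// arbitrary assumption about how far ahead to search for an anchor word.
public class CatchUp {
    static final int WINDOW = 5;

    static List<String> align(String[] attempt, String[] actual) {
        List<String> report = new ArrayList<>();
        int i = 0, j = 0;
        while (i < attempt.length && j < actual.length) {
            if (attempt[i].equals(actual[j])) {
                i++; j++;
                continue;
            }
            int k = indexWithin(actual, j + 1, attempt[i]);
            if (k >= 0) {
                // The attempt skipped actual[j..k-1]; resume at actual[k]
                for (int x = j; x < k; x++) report.add("missing: " + actual[x]);
                j = k;
                continue;
            }
            int h = indexWithin(attempt, i + 1, actual[j]);
            if (h >= 0) {
                // The attempt inserted attempt[i..h-1]; resume at attempt[h]
                for (int x = i; x < h; x++) report.add("extra: " + attempt[x]);
                i = h;
                continue;
            }
            // No nearby anchor: treat it as a one-for-one wrong word
            report.add("wrong: " + attempt[i] + " (expected " + actual[j] + ")");
            i++; j++;
        }
        while (j < actual.length) report.add("missing: " + actual[j++]);
        while (i < attempt.length) report.add("extra: " + attempt[i++]);
        return report;
    }

    // First index in [from, from + WINDOW) where tokens[index] equals word, else -1
    static int indexWithin(String[] tokens, int from, String word) {
        for (int k = from; k < tokens.length && k < from + WINDOW; k++) {
            if (tokens[k].equals(word)) return k;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] actual = "inspired by God and profitable for teaching".split(" ");
        String[] attempt = "inspired by God profitable for teaching".split(" ");
        align(attempt, actual).forEach(System.out::println);
    }
}
```

Greedy look-ahead can anchor on the wrong occurrence of a common word like "for", which appears several times in this verse; an LCS-based diff avoids that at the cost of an O(n·m) table.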
> >
> > So think about this approach and let me know what pros/cons you see.
> > Shoot your ideas out, cuz I need as much feedback from everybody on this
> > as I can get. Here are some questions I had to ask myself about this
> > design:
> >
> > 1) Does it make sense to use 2 token-arrays for comparison?
> > 2) How do we maintain context if the 2nd array is missing words? How do
> > we catch up to the original?
> > 3) Should we even take this approach?
> > 4) Are there other systems out there that already have a text parser
> > we can reuse?
> > 5) How should we score the attempt? By number of words, misspelled words,
> > missing words, etc.?
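For question 5, here is one possible scoring rule with made-up weights: each word of the actual verse earns full credit when matched exactly, half credit when misspelled, and nothing when missing or wrong, while each extra word costs half a point. The weights and names are assumptions for illustration, not a proposal from the thread, and the counts are assumed to come from whatever comparison step is chosen.

```java
// Hypothetical scoring sketch; MISSPELLED_CREDIT and EXTRA_PENALTY are
// made-up weights, not values agreed on by the list.
public class VerseScore {
    static final double MISSPELLED_CREDIT = 0.5;
    static final double EXTRA_PENALTY = 0.5;

    // Percentage in [0, 100] based on per-word counts from a comparison step.
    // Missing and wrong words simply earn no credit.
    static double score(int actualWords, int exact, int misspelled, int extra) {
        if (actualWords <= 0) {
            throw new IllegalArgumentException("actual verse has no words");
        }
        double credit = exact + MISSPELLED_CREDIT * misspelled - EXTRA_PENALTY * extra;
        double pct = 100.0 * credit / actualWords;
        return Math.max(0.0, Math.min(100.0, pct)); // clamp to [0, 100]
    }

    public static void main(String[] args) {
        // 10-word verse: 8 exact, 1 misspelled, 1 missing, no extra words
        System.out.println(score(10, 8, 1, 0)); // prints 85.0
    }
}
```

With a rule like this, Nick's pass/fail option is just a check for a score of 100.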
> >
> > Sorry if I'm being a bit verbose, but I'm very excited that we have
> > another developer on the squad to help us out with this. I'd like to
> > bounce all ideas off the list and get everybody involved, developer or
> > not, just to see if things make sense; I tend to think too short-term
> > and neglect long-term implications.
> >
> > -P
> >
> > Michael Lucas wrote:
> > >
> > > Pat,
> > >
> > > I've done a limited amount of network programming and know nothing
> > > about threads. I don't mind trying to work on it if you are not too
> > > constrained on time while I learn about it. Either project is good for
> > > me.
> > >
> > > - Mike
> >
> > _______________________________________________
> > Tsaphan-developers mailing list
> > Tsa...@li...
> > http://lists.sourceforge.net/lists/listinfo/tsaphan-developers