From: Thomas M. <tsm...@us...> - 2002-12-11 16:18:46
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv5862/java/opennlp/maxent Modified Files: GISModel.java Log Message: Added int getNumOutcomes method Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** GISModel.java 20 Nov 2002 02:44:12 -0000 1.8 --- GISModel.java 11 Dec 2002 16:18:41 -0000 1.9 *************** *** 201,204 **** --- 201,211 ---- } + /** Returns the number of outcomes for this model. + * @return The number of outcomes. + **/ + public int getNumOutcomes() { + return(numOutcomes); + } + /** |
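A minimal sketch of how calling code might use the new accessor. The class name below is made up, the GISModel instance is taken as given, and getOutcome(int) is assumed to be public, as the eval() javadoc elsewhere in this module implies:

    import opennlp.maxent.GISModel;

    public class OutcomeListerSketch {
        // List every outcome the model can predict, sizing the loop with
        // the getNumOutcomes() method added in this commit.
        public static void listOutcomes(GISModel model) {
            for (int oid = 0; oid < model.getNumOutcomes(); oid++) {
                System.out.println(oid + "\t" + model.getOutcome(oid));
            }
        }
    }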
From: Thomas M. <tsm...@us...> - 2002-11-20 03:05:28
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv5209/maxent Modified Files: GISTrainer.java Log Message: Fixed cases where parameters which only occured with a single output weren't getting updated. Ended up getting rid of pabi and cfvals structures. These have been replaced with the data for a single event, double[] modelDistribution, and this is used to update the modifiers for a single event and then updated for each additional event. This change made it easier to initialize the modleDistribution to the uniform distribution which was necessary to fix teh above problem. Also moved the computation of modelDistribution into it's own routine which is name eval and is almost exactly the same as GISModel.eval w/o doing the context string to integer mappings. Made correction constant non-optional. When the events all have the same number of contexts then the model tries to make the expected value of the correction constant nearly 0. This is needed because while the number of contexts may be same it is very unlikly that all context occur with all outcomes. Finally I made nextItteration return a double which is the log-likelihood from the previous itteration. At some point there isn't enough accuracy in a double to make further iterations useful so the routine may stop prematurly when the decrease in log-likelihood is too small. Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** GISTrainer.java 23 Apr 2002 16:10:07 -0000 1.6 --- GISTrainer.java 20 Nov 2002 03:05:25 -0000 1.7 *************** *** 1,3 **** ! /////////////////////////////////////////////////////////////////////////////// // Copyright (C) 2001 Jason Baldridge and Gann Bierner // --- 1,3 ---- ! ///////////////////////////////////////////////////////////////////////////// // Copyright (C) 2001 Jason Baldridge and Gann Bierner // *************** *** 49,53 **** private boolean printMessages = false; ! private int numTokens; // # of event tokens private int numPreds; // # of predicates --- 49,53 ---- private boolean printMessages = false; ! private int numTokens; // # of event tokens private int numPreds; // # of predicates *************** *** 58,66 **** // a global variable for adding probabilities in an array ! private double PABISUM; // records the array of predicates seen in each event private int[][] contexts; // records the num of times an event has been seen, paired to // int[][] contexts --- 58,69 ---- // a global variable for adding probabilities in an array ! private double SUM; // records the array of predicates seen in each event private int[][] contexts; + // records the array of outcomes seen in each event + private int[] outcomes; + // records the num of times an event has been seen, paired to // int[][] contexts *************** *** 91,102 **** private int[] predkeys; ! // a boolean to track if all events have same number of active features ! private boolean needCorrection; ! // initialize the GIS constant ! private int constant = 1; // stores inverse of constant after it is determined private double constantInverse; // the correction parameter of the model ! private double correctionParam = 0.0; // observed expectation of correction feature private double cfObservedExpect; --- 94,103 ---- private int[] predkeys; ! // GIS constant number of feattures fired ! 
private int constant; // stores inverse of constant after it is determined private double constantInverse; // the correction parameter of the model ! private double correctionParam; // observed expectation of correction feature private double cfObservedExpect; *************** *** 105,116 **** private double CFMOD; ! // stores the value of corrections feature for each event's predicate list, ! // expanded to include all outcomes which might come from those predicates. ! private TIntIntHashMap[] cfvals; ! // Normalized Probabilities Of Outcomes Given Context: p(a|b_i) ! // Stores the computation of each iterations for the update to the ! // modifiers (and therefore the params) ! private TIntDoubleHashMap[] pabi; // make all values in an TIntDoubleHashMap return to 0.0 --- 106,119 ---- private double CFMOD; ! private final double NEAR_ZERO = 0.01; ! private final double LLThreshold = 0.0001; ! // Stores the output of the current model on a single event durring ! // training. This we be reset for every event for every itteration. ! double[] modelDistribution; ! // Stores the number of features that get fired per event ! int[] numfeats; ! // initial probability for all outcomes. ! double iprob; // make all values in an TIntDoubleHashMap return to 0.0 *************** *** 120,154 **** }; ! // divide all values in the TIntDoubleHashMap pabi[TID] by the sum of ! // all values in the map. ! private TDoubleFunction normalizePABI = ! new TDoubleFunction() { ! public double execute(double arg) { return arg / PABISUM; } ! }; ! ! // add the previous iteration's parameters to the computation of the ! // modifiers of this iteration. ! private TIntDoubleProcedure addParamsToPABI = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! pabi[TID].adjustValue(oid, arg); ! return true; ! } ! }; ! ! // add the correction parameter and exponentiate it ! private TIntDoubleProcedure addCorrectionToPABIandExponentiate = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! if (needCorrection) ! arg = arg + (correctionParam * cfvals[TID].get(oid)); ! arg = Math.exp(arg); ! PABISUM += arg; ! pabi[TID].put(oid, arg); ! return true; ! } ! }; ! ! // update the modifiers based on the new pabi values private TIntDoubleProcedure updateModifiers = new TIntDoubleProcedure() { --- 123,127 ---- }; ! // update the modifiers based on the modelDistribution for this event values private TIntDoubleProcedure updateModifiers = new TIntDoubleProcedure() { *************** *** 156,160 **** modifiers[PID].put(oid, arg ! + (pabi[TID].get(oid) * numTimesEventsSeen[TID])); return true; --- 129,133 ---- modifiers[PID].put(oid, arg ! + (modelDistribution[oid] * numTimesEventsSeen[TID])); return true; *************** *** 167,185 **** public boolean execute(int oid, double arg) { params[PID].put(oid, ! arg ! + (constantInverse * ! (observedExpects[PID].get(oid) ! - Math.log(modifiers[PID].get(oid))))); ! return true; ! } ! }; ! ! // update the correction feature modifier, which will then be used to ! // updated the correction parameter ! private TIntDoubleProcedure updateCorrectionFeatureModifier = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! CFMOD += ! arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } --- 140,145 ---- public boolean execute(int oid, double arg) { params[PID].put(oid, ! arg +(observedExpects[PID].get(oid) ! 
- Math.log(modifiers[PID].get(oid)))); return true; } *************** *** 250,259 **** display("Incorporating indexed data for training... \n"); contexts = di.contexts; numTimesEventsSeen = di.numTimesEventsSeen; numTokens = contexts.length; ! //printTable(contexts); ! needCorrection = false; // determine the correction constant and its inverse, and check to see --- 210,221 ---- display("Incorporating indexed data for training... \n"); contexts = di.contexts; + outcomes = di.outcomeList; numTimesEventsSeen = di.numTimesEventsSeen; numTokens = contexts.length; ! //printTable(contexts); ! // a boolean to track if all events have same number of active features ! boolean needCorrection = false; // determine the correction constant and its inverse, and check to see *************** *** 269,277 **** } } constantInverse = 1.0/constant; - outcomeLabels = di.outcomeLabels; numOutcomes = outcomeLabels.length; predLabels = di.predLabels; --- 231,252 ---- } } + + int cfvalSum = 0; + for (TID=0; TID<numTokens; TID++) + cfvalSum += (constant - contexts[TID].length) + * numTimesEventsSeen[TID]; + if (cfvalSum == 0) { + cfObservedExpect = Math.log(NEAR_ZERO);//nearly zero so log is defined + } + else { + cfObservedExpect = Math.log(cfvalSum); + } + + display("done.\n"); constantInverse = 1.0/constant; outcomeLabels = di.outcomeLabels; numOutcomes = outcomeLabels.length; + iprob = Math.log(1.0/numOutcomes); predLabels = di.predLabels; *************** *** 296,300 **** // the data. The default is to assume that we observed "1/10th" of a // feature during training. ! final double smoothingObservation = Math.log(_smoothingObservation); // Get the observed expectations of the features. Strictly speaking, --- 271,276 ---- // the data. The default is to assume that we observed "1/10th" of a // feature during training. ! final double smoothingObservation = _smoothingObservation; ! final double logSmoothingObservation = Math.log(_smoothingObservation); // Get the observed expectations of the features. Strictly speaking, *************** *** 338,404 **** observedExpects[PID].compact(); } ! predCount = null; // don't need it anymore display("...done.\n"); ! pabi = new TIntDoubleHashMap[numTokens]; ! ! if (needCorrection) { ! // initialize both the pabi table and the cfvals matrix ! display("Computing correction feature matrix... "); ! ! cfvals = new TIntIntHashMap[numTokens]; ! for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(initialCapacity, loadFactor); ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! predkeys = params[PID].keys(); ! for (int i=0; i<predkeys.length; i++) { ! OID = predkeys[i]; ! if (!cfvals[TID].increment(OID)) { ! cfvals[TID].put(OID, 1); ! pabi[TID].put(OID, 0.0); ! } ! } ! } ! cfvals[TID].compact(); ! pabi[TID].compact(); ! } ! ! for (TID=0; TID<numTokens; TID++) { ! predkeys = cfvals[TID].keys(); ! for (int i=0; i<predkeys.length; i++) { ! OID = predkeys[i]; ! cfvals[TID].put(OID, constant - cfvals[TID].get(OID)); ! } ! } ! ! // compute observed expectation of correction feature (E_p~ f_l) ! int cfvalSum = 0; ! for (TID=0; TID<numTokens; TID++) ! cfvalSum += (constant - contexts[TID].length) ! * numTimesEventsSeen[TID]; ! ! cfObservedExpect = Math.log(cfvalSum); ! ! display("done.\n"); ! ! } ! else { ! // initialize just the pabi table ! pabi = new TIntDoubleHashMap[numTokens]; ! for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! 
for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! predkeys = params[PID].keys(); ! for (int i=0; i<predkeys.length; i++) ! pabi[TID].put(predkeys[i], 0.0); ! } ! pabi[TID].compact(); ! } ! } /***************** Find the parameters ************************/ --- 314,324 ---- observedExpects[PID].compact(); } ! correctionParam = 0.0; predCount = null; // don't need it anymore display("...done.\n"); ! modelDistribution = new double[numOutcomes]; ! numfeats = new int[numOutcomes]; /***************** Find the parameters ************************/ *************** *** 418,421 **** --- 338,343 ---- /* Estimate and return the model parameters. */ private void findParameters(int iterations) { + double prevLL = 0.0; + double currLL = 0.0; display("Performing " + iterations + " iterations.\n"); for (int i=1; i<=iterations; i++) { *************** *** 423,434 **** else if (i<100) display(" " + i + ": "); else display(i + ": "); ! nextIteration(); } // kill a bunch of these big objects now that we don't need them observedExpects = null; - pabi = null; modifiers = null; - cfvals = null; numTimesEventsSeen = null; contexts = null; --- 345,364 ---- else if (i<100) display(" " + i + ": "); else display(i + ": "); ! currLL=nextIteration(); ! if (i > 1) { ! if (prevLL > currLL) { ! System.err.println("Model Diverging: loglikelihood decreased"); ! break; ! } ! if (currLL-prevLL < LLThreshold) { ! break; ! } ! } ! prevLL=currLL; } // kill a bunch of these big objects now that we don't need them observedExpects = null; modifiers = null; numTimesEventsSeen = null; contexts = null; *************** *** 436,468 **** ! /* Compute one iteration of GIS */ ! private void nextIteration() { ! ! // compute table probabilities of outcomes given contexts ! CFMOD = 0.0; ! for (TID=0; TID<numTokens; TID++) { ! pabi[TID].transformValues(backToZeros); ! ! for (int j=0; j<contexts[TID].length; j++) ! params[contexts[TID][j]].forEachEntry(addParamsToPABI); ! PABISUM = 0.0; // PABISUM is computed in the next line's procedure ! pabi[TID].forEachEntry(addCorrectionToPABIandExponentiate); ! if (PABISUM > 0.0) pabi[TID].transformValues(normalizePABI); ! if (needCorrection) ! pabi[TID].forEachEntry(updateCorrectionFeatureModifier); ! } ! display("."); // compute contribution of p(a|b_i) for each feature and the new // correction parameter for (TID=0; TID<numTokens; TID++) { ! for (int j=0; j<contexts[TID].length; j++) { ! // do not remove the next line since we need to know PID ! // globally for the updateModifiers procedure used after it ! PID = contexts[TID][j]; ! modifiers[PID].forEachEntry(updateModifiers); ! } } display("."); --- 366,433 ---- ! /** ! * Use this model to evaluate a context and return an array of the ! * likelihood of each outcome given that context. ! * ! * @param context The integers of the predicates which have been ! * observed at the present decision point. ! * @return The normalized probabilities for the outcomes given the ! * context. The indexes of the double[] are the outcome ! * ids, and the actual string representation of the ! * outcomes can be obtained from the method ! * getOutcome(int i). ! */ ! public void eval(int[] context, double[] outsums) { ! for (int oid=0; oid<numOutcomes; oid++) { ! outsums[oid] = iprob; ! numfeats[oid] = 0; ! } ! int[] activeOutcomes; ! for (int i=0; i<context.length; i++) { ! TIntDoubleHashMap predParams = params[context[i]]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! 
numfeats[oid]++; ! outsums[oid] += constantInverse * predParams.get(oid); ! } ! } ! double SUM = 0.0; ! for (int oid=0; oid<numOutcomes; oid++) { ! outsums[oid] = Math.exp(outsums[oid] ! + ((1.0 - ! (numfeats[oid]/constant)) ! * correctionParam)); ! SUM += outsums[oid]; ! } ! for (int oid=0; oid<numOutcomes; oid++) ! outsums[oid] /= SUM; ! ! } ! + /* Compute one iteration of GIS and retutn log-likelihood.*/ + private double nextIteration() { // compute contribution of p(a|b_i) for each feature and the new // correction parameter + double loglikelihood = 0.0; + CFMOD=0.0; for (TID=0; TID<numTokens; TID++) { ! // modeldistribution and PID are globals used in ! // the updateModifiers procedure. They need to be set. ! eval(contexts[TID],modelDistribution); ! for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! modifiers[PID].forEachEntry(updateModifiers); ! for (OID=0;OID<numOutcomes;OID++) { ! if (!modifiers[PID].containsKey(OID)) { ! CFMOD+=modelDistribution[OID]*numTimesEventsSeen[TID]; ! } ! } ! loglikelihood+=Math.log(modelDistribution[outcomes[TID]]); ! } ! CFMOD+=constant-contexts[TID].length; } display("."); *************** *** 473,483 **** modifiers[PID].transformValues(backToZeros); // re-initialize to 0.0's } - if (CFMOD > 0.0) ! correctionParam += ! constantInverse * (cfObservedExpect - Math.log(CFMOD)); ! display(".\n"); ! } --- 438,446 ---- modifiers[PID].transformValues(backToZeros); // re-initialize to 0.0's } if (CFMOD > 0.0) ! correctionParam +=(cfObservedExpect - Math.log(CFMOD)); ! display(". loglikelihood="+loglikelihood+"\n"); ! return(loglikelihood); } |
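The heart of this change is the stopping rule in findParameters(): keep iterating only while the log-likelihood keeps growing by a meaningful amount. A standalone sketch of just that rule, with made-up log-likelihood values standing in for real calls to nextIteration():

    public class LLStopSketch {
        static final double LL_THRESHOLD = 0.0001;   // same threshold the trainer uses

        public static void main(String[] args) {
            // Fake per-iteration log-likelihoods in place of nextIteration().
            double[] ll = { -1000.0, -800.0, -700.0, -699.99995 };
            double prevLL = 0.0;
            for (int i = 1; i <= ll.length; i++) {
                double currLL = ll[i - 1];
                if (i > 1) {
                    if (prevLL > currLL) {
                        System.err.println("Model Diverging: loglikelihood decreased");
                        break;
                    }
                    if (currLL - prevLL < LL_THRESHOLD) {
                        break;                        // gain too small to keep going
                    }
                }
                prevLL = currLL;
                System.out.println(i + ": loglikelihood=" + currLL);
            }
        }
    }

With these values the loop runs three iterations and then stops on the fourth, whose gain of 0.00005 falls under the threshold.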
From: Thomas M. <tsm...@us...> - 2002-11-20 02:44:15
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv30495/maxent Modified Files: GISModel.java Log Message: Added eval method so distribution could be passed in rathar then allocated durring each call. Left old interface in place but modified it to use the new eval method. Also made numfeats a class level variable. Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** GISModel.java 19 Apr 2002 09:29:24 -0000 1.7 --- GISModel.java 20 Nov 2002 02:44:12 -0000 1.8 *************** *** 38,41 **** --- 38,43 ---- private final double iprob; private final double fval; + + private int[] numfeats; public GISModel (TIntDoubleHashMap[] _params, *************** *** 57,65 **** iprob = Math.log(1.0/numOutcomes); fval = 1.0/correctionConstant; ! } - - /** * Use this model to evaluate a context and return an array of the --- 59,65 ---- iprob = Math.log(1.0/numOutcomes); fval = 1.0/correctionConstant; ! numfeats = new int[numOutcomes]; } /** * Use this model to evaluate a context and return an array of the *************** *** 75,98 **** */ public final double[] eval(String[] context) { ! double[] outsums = new double[numOutcomes]; ! int[] numfeats = new int[numOutcomes]; ! ! for (int oid=0; oid<numOutcomes; oid++) { outsums[oid] = iprob; numfeats[oid] = 0; } - - int[] activeOutcomes; for (int i=0; i<context.length; i++) { ! if (pmap.containsKey(context[i])) { ! TIntDoubleHashMap predParams = ! params[pmap.get(context[i])]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! numfeats[oid]++; ! outsums[oid] += fval * predParams.get(oid); ! } ! } } --- 75,111 ---- */ public final double[] eval(String[] context) { ! return(eval(context,new double[numOutcomes])); ! } ! ! /** ! * Use this model to evaluate a context and return an array of the ! * likelihood of each outcome given that context. ! * ! * @param context The names of the predicates which have been observed at ! * the present decision point. ! * @param outsums This is where the distribution is stored. ! * @return The normalized probabilities for the outcomes given the ! * context. The indexes of the double[] are the outcome ! * ids, and the actual string representation of the ! * outcomes can be obtained from the method ! * getOutcome(int i). ! */ ! public final double[] eval(String[] context, double[] outsums) { ! int[] activeOutcomes; ! for (int oid=0; oid<numOutcomes; oid++) { outsums[oid] = iprob; numfeats[oid] = 0; } for (int i=0; i<context.length; i++) { ! if (pmap.containsKey(context[i])) { ! TIntDoubleHashMap predParams = ! params[pmap.get(context[i])]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! numfeats[oid]++; ! outsums[oid] += fval * predParams.get(oid); ! } ! } } |
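A sketch of what the new eval(String[], double[]) overload buys a caller: the probability array is allocated once up front and refilled for every event instead of being newly allocated per call. The model and contexts are assumed to exist already, and getNumOutcomes() is the accessor from the commit further up this page:

    import opennlp.maxent.GISModel;

    public class ReuseBufferSketch {
        // Classify a batch of contexts with one shared probability buffer.
        public static void classify(GISModel model, String[][] contexts) {
            double[] probs = new double[model.getNumOutcomes()];   // allocated once
            for (int e = 0; e < contexts.length; e++) {
                model.eval(contexts[e], probs);                    // refills probs in place
                int best = 0;
                for (int oid = 1; oid < probs.length; oid++) {
                    if (probs[oid] > probs[best]) {
                        best = oid;
                    }
                }
                System.out.println(model.getOutcome(best) + " " + probs[best]);
            }
        }
    }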
From: Thomas M. <tsm...@us...> - 2002-11-20 02:41:33
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv30038/maxent Modified Files: DataIndexer.java Log Message: Fixed bug where singleton events are dropped. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** DataIndexer.java 19 Apr 2002 09:59:53 -0000 1.9 --- DataIndexer.java 20 Nov 2002 02:41:30 -0000 1.10 *************** *** 66,70 **** System.out.print("\tComputing event counts... "); events = computeEventCounts(eventStream,predicateIndex,cutoff); ! System.out.println("done."); System.out.print("\tIndexing... "); --- 66,70 ---- System.out.print("\tComputing event counts... "); events = computeEventCounts(eventStream,predicateIndex,cutoff); ! System.out.println("done. "+events.size()+" events"); System.out.print("\tIndexing... "); *************** *** 157,167 **** if (! predicatesInOut.containsKey(ec[j])) { if (counter.increment(ec[j])) { - if (counter.get(ec[j]) >= cutoff) { - predicatesInOut.put(ec[j], predicateIndex++); - counter.remove(ec[j]); - } } else { counter.put(ec[j], 1); } } } --- 157,167 ---- if (! predicatesInOut.containsKey(ec[j])) { if (counter.increment(ec[j])) { } else { counter.put(ec[j], 1); } + if (counter.get(ec[j]) >= cutoff) { + predicatesInOut.put(ec[j], predicateIndex++); + counter.remove(ec[j]); + } } } *************** *** 208,211 **** --- 208,214 ---- eventsToCompare.add(ce); } + else { + System.err.println("Dropped event "+ev.getOutcome()+":"+Arrays.asList(ev.getContext())); + } // recycle the TIntArrayList indexedContext.resetQuick(); |
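The bug was one of ordering: the old loop only compared the count against the cutoff on a predicate's second and later sightings, so a predicate whose very first sighting already met the cutoff (a singleton, with cutoff 1) was never indexed and its events were silently dropped. A sketch of the corrected order using a plain java.util.HashMap in place of Trove's TObjectIntHashMap, with hypothetical names throughout:

    import java.util.HashMap;
    import java.util.Map;

    public class CutoffSketch {
        // Bump the count first, then test it against the cutoff on every
        // sighting, including the first one.
        public static Map promote(String[] predicates, int cutoff) {
            Map counter = new HashMap();    // predicate -> times seen so far
            Map promoted = new HashMap();   // predicate -> index, once past the cutoff
            int nextIndex = 0;
            for (int j = 0; j < predicates.length; j++) {
                String p = predicates[j];
                if (!promoted.containsKey(p)) {
                    Integer c = (Integer) counter.get(p);
                    int count = (c == null) ? 1 : c.intValue() + 1;
                    counter.put(p, new Integer(count));
                    if (count >= cutoff) {               // now runs on the first sighting too
                        promoted.put(p, new Integer(nextIndex++));
                        counter.remove(p);
                    }
                }
            }
            return promoted;
        }
    }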
From: Jason B. <jas...@us...> - 2002-04-30 08:48:39
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv19942/src/java/opennlp/maxent Modified Files: BasicContextGenerator.java Log Message: Fixed bug: BasicContextGenerator was retaining whitespace. Index: BasicContextGenerator.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/BasicContextGenerator.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** BasicContextGenerator.java 20 Nov 2001 17:05:37 -0000 1.1 --- BasicContextGenerator.java 30 Apr 2002 08:48:35 -0000 1.2 *************** *** 39,51 **** public String[] getContext(Object o) { String s = (String)o; ! int prevIndex = 0; int index = s.indexOf(' '); List cuts = new ArrayList(); while (index != -1) { ! cuts.add(s.substring(prevIndex, index)); prevIndex = index; index = s.indexOf(' ', ++index); } ! cuts.add(s.substring(prevIndex, s.length())); return (String[])cuts.toArray(new String[cuts.size()]); } --- 39,51 ---- public String[] getContext(Object o) { String s = (String)o; ! int prevIndex = -1; int index = s.indexOf(' '); List cuts = new ArrayList(); while (index != -1) { ! cuts.add(s.substring(prevIndex+1, index)); prevIndex = index; index = s.indexOf(' ', ++index); } ! cuts.add(s.substring(prevIndex+1, s.length())); return (String[])cuts.toArray(new String[cuts.size()]); } |
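The fix is in where each cut starts: beginning every substring at prevIndex+1 skips the space itself, so the second and later predicates no longer carry a leading blank. A small standalone version of the corrected loop, with a throwaway main() to show the effect on an example string:

    import java.util.ArrayList;
    import java.util.List;

    public class SplitSketch {
        // Reimplements just the loop from BasicContextGenerator.getContext.
        public static String[] split(String s) {
            int prevIndex = -1;
            int index = s.indexOf(' ');
            List cuts = new ArrayList();
            while (index != -1) {
                cuts.add(s.substring(prevIndex + 1, index));
                prevIndex = index;
                index = s.indexOf(' ', ++index);
            }
            cuts.add(s.substring(prevIndex + 1, s.length()));
            return (String[]) cuts.toArray(new String[cuts.size()]);
        }

        public static void main(String[] args) {
            String[] toks = split("f1=a f2=b f3=c");
            for (int i = 0; i < toks.length; i++) {
                System.out.println("[" + toks[i] + "]");   // prints [f1=a] [f2=b] [f3=c], no leading spaces
            }
        }
    }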
From: Jason B. <jas...@us...> - 2002-04-30 08:48:39
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv19942 Modified Files: CHANGES Log Message: Fixed bug: BasicContextGenerator was retaining whitespace. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** CHANGES 25 Apr 2002 15:01:07 -0000 1.14 --- CHANGES 30 Apr 2002 08:48:35 -0000 1.15 *************** *** 1,4 **** --- 1,6 ---- 1.2.10 ------ + Fixed minor bug (found by Arno Erpenbeck) in BasicContextGenerator: it + was retaining whitespace in the contextual predicates. (Jason) Added error message to TrainEval's eval() method. (Jason) |
From: Jason B. <jas...@us...> - 2002-04-25 15:08:06
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv23879/src/java/opennlp/maxent Modified Files: TrainEval.java Log Message: Added usage message to TrainEval's eval() method. Index: TrainEval.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/TrainEval.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** TrainEval.java 14 Nov 2001 17:39:56 -0000 1.2 --- TrainEval.java 25 Apr 2002 15:01:07 -0000 1.3 *************** *** 96,115 **** } } ! FileReader datafr = new FileReader(args[g.getOptind()]); ! ! if(train) { MaxentModel m = train(new EventCollectorAsStream(e.getEventCollector(datafr)), cutoff); ! new BinaryGISModelWriter((GISModel)m, new File(dir+stem)).persist(); } else { MaxentModel model = ! new BinaryGISModelReader(new File(dir+stem)).getModel(); ! if(local) e.localEval(model, datafr, e, verbose); ! else eval(model, datafr, e, verbose); } } --- 96,139 ---- } } + + int lastIndex = g.getOptind(); + if (lastIndex >= args.length) { + System.out.println("This is a usage message from opennlp.maxent.TrainEval. You have called the training procedure for a maxent application with the incorrect arguments. These are the options:"); + + System.out.println("\nOptions for defining the model location and name:"); + System.out.println(" -d <directoryName>"); + System.out.println("\tThe directory in which to store the model."); + System.out.println(" -s <modelName>"); + System.out.println("\tThe name of the model, e.g. EnglishPOS.bin.gz or NameFinder.txt."); + + System.out.println("\nOptions for training:"); + System.out.println(" -c <cutoff>"); + System.out.println("\tAn integer cutoff level to reduce infrequent contextual predicates."); + System.out.println(" -t\tTrain a model. If absent, the given model will be loaded and evaluated."); + System.out.println("\nOptions for evaluation:"); + System.out.println(" -l\t the evaluation method of class that uses the model. If absent, TrainEval's eval method is used."); + System.out.println(" -v\t verbose."); + System.out.println("\nThe final argument is the data file to be loaded and used for either training or evaluation."); + System.out.println("\nAs an example for training:\n java opennlp.grok.preprocess.postag.POSTaggerME -t -d ./ -s EnglishPOS.bin.gz -c 7 postag.data"); + System.exit(0); + } + + FileReader datafr = new FileReader(args[lastIndex]); ! if (train) { MaxentModel m = train(new EventCollectorAsStream(e.getEventCollector(datafr)), cutoff); ! new SuffixSensitiveGISModelWriter((GISModel)m, ! new File(dir+stem)).persist(); } else { MaxentModel model = ! new SuffixSensitiveGISModelReader(new File(dir+stem)).getModel(); ! if (local) { e.localEval(model, datafr, e, verbose); ! } else { eval(model, datafr, e, verbose); + } } } |
From: Jason B. <jas...@us...> - 2002-04-25 15:08:05
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv23879 Modified Files: CHANGES Log Message: Added usage message to TrainEval's eval() method. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** CHANGES 19 Apr 2002 12:34:02 -0000 1.13 --- CHANGES 25 Apr 2002 15:01:07 -0000 1.14 *************** *** 2,5 **** --- 2,7 ---- ------ + Added error message to TrainEval's eval() method. (Jason) + 1.2.9 (Bug fix release) |
From: Jason B. <jas...@us...> - 2002-04-23 16:10:12
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv15793 Modified Files: GISTrainer.java Log Message: Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** GISTrainer.java 9 Apr 2002 09:15:10 -0000 1.5 --- GISTrainer.java 23 Apr 2002 16:10:07 -0000 1.6 *************** *** 236,240 **** * will be trained. * @param iterations The number of GIS iterations to perform. ! * @param cutoff The number of times a feature must be seen in order * to be relevant for training. * @return The newly trained model, which can be used immediately or saved --- 236,240 ---- * will be trained. * @param iterations The number of GIS iterations to perform. ! * @param cutoff The number of times a predicate must be seen in order * to be relevant for training. * @return The newly trained model, which can be used immediately or saved |
From: Jason B. <jas...@us...> - 2002-04-19 12:34:09
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv1852 Modified Files: CHANGES build.xml Log Message: Updated version number after release. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** CHANGES 9 Apr 2002 09:46:59 -0000 1.12 --- CHANGES 19 Apr 2002 12:34:02 -0000 1.13 *************** *** 1,5 **** ! 1.2.9 _____ --- 1,15 ---- ! 1.2.10 ! ------ ! ! ! 1.2.9 (Bug fix release) _____ + Modified the cutoff loop in DataIndexer to use the increment() method + of TObjectIntHashMap. (Jason) + + Fixed a bug (found by Chieu Hai Leong) in which the correctionConstant + of GISModel was an int that was used in division. Now, + correctionConstant is a double. (Jason) Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** build.xml 9 Apr 2002 09:46:59 -0000 1.17 --- build.xml 19 Apr 2002 12:34:02 -0000 1.18 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.9"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.10"/> <property name="year" value="2002"/> |
From: Jason B. <jas...@us...> - 2002-04-19 09:59:56
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv25071/src/java/opennlp/maxent Modified Files: DataIndexer.java Log Message: Modified the cutoff loop to use the increment() method of TObjectIntHashMap. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** DataIndexer.java 3 Jan 2002 16:43:23 -0000 1.8 --- DataIndexer.java 19 Apr 2002 09:59:53 -0000 1.9 *************** *** 156,165 **** for (int j=0; j<ec.length; j++) { if (! predicatesInOut.containsKey(ec[j])) { ! int count = counter.get(ec[j]) + 1; ! if (count >= cutoff) { ! predicatesInOut.put(ec[j], predicateIndex++); ! counter.remove(ec[j]); ! } else { ! counter.put(ec[j], count); } } --- 156,166 ---- for (int j=0; j<ec.length; j++) { if (! predicatesInOut.containsKey(ec[j])) { ! if (counter.increment(ec[j])) { ! if (counter.get(ec[j]) >= cutoff) { ! predicatesInOut.put(ec[j], predicateIndex++); ! counter.remove(ec[j]); ! } ! } else { ! counter.put(ec[j], 1); } } |
From: Jason B. <jas...@us...> - 2002-04-19 09:29:28
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv14897/src/java/opennlp/maxent Modified Files: GISModel.java Log Message: Fixed bug: correctionConstant is now a double rather than an int. Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** GISModel.java 27 Dec 2001 19:20:26 -0000 1.6 --- GISModel.java 19 Apr 2002 09:29:24 -0000 1.7 *************** *** 32,36 **** private final TObjectIntHashMap pmap; private final String[] ocNames; ! private final int correctionConstant; private final double correctionParam; --- 32,36 ---- private final TObjectIntHashMap pmap; private final String[] ocNames; ! private final double correctionConstant; private final double correctionParam; *************** *** 51,55 **** params = _params; ocNames = _ocNames; ! correctionConstant = _correctionConstant; correctionParam = _correctionParam; --- 51,55 ---- params = _params; ocNames = _ocNames; ! correctionConstant = (double)_correctionConstant; correctionParam = _correctionParam; *************** *** 214,218 **** data[1] = pmap; data[2] = ocNames; ! data[3] = new Integer(correctionConstant); data[4] = new Double(correctionParam); return data; --- 214,218 ---- data[1] = pmap; data[2] = ocNames; ! data[3] = new Integer((int)correctionConstant); data[4] = new Double(correctionParam); return data; |
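The underlying pitfall, reduced to two lines: dividing one int by another truncates toward zero, so any ratio formed with an int correctionConstant silently collapses to 0 (or 1) instead of a fraction. The variable names below are illustrative, not the exact expression inside GISModel:

    public class IntDivisionSketch {
        public static void main(String[] args) {
            int numfeats = 3;
            int correctionConstant = 7;
            System.out.println(numfeats / correctionConstant);            // 0, int division truncates
            System.out.println(numfeats / (double) correctionConstant);   // 0.428571..., the intended ratio
        }
    }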
From: Jason B. <jas...@us...> - 2002-04-09 11:13:34
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv27381/src/java/opennlp/maxent Modified Files: GISTrainer.java Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** GISTrainer.java 8 Apr 2002 16:14:06 -0000 1.4 --- GISTrainer.java 9 Apr 2002 09:15:10 -0000 1.5 *************** *** 132,136 **** new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! pabi[TID].put(oid, pabi[TID].get(oid) + arg); return true; } --- 132,136 ---- new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! pabi[TID].adjustValue(oid, arg); return true; } *************** *** 180,184 **** new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! CFMOD += arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } --- 180,185 ---- new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! CFMOD += ! arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } *************** *** 357,363 **** for (int i=0; i<predkeys.length; i++) { OID = predkeys[i]; ! if (cfvals[TID].containsKey(OID)) { ! cfvals[TID].put(OID, cfvals[TID].get(OID) + 1); ! } else { cfvals[TID].put(OID, 1); pabi[TID].put(OID, 0.0); --- 358,362 ---- for (int i=0; i<predkeys.length; i++) { OID = predkeys[i]; ! if (!cfvals[TID].increment(OID)) { cfvals[TID].put(OID, 1); pabi[TID].put(OID, 0.0); |
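A tiny sketch of the pattern this commit adopts, assuming the gnu.trove package name for the bundled Trove 0.1.4 jar: adjustValue() bumps an existing entry in one call where the old code did a get() followed by a put():

    import gnu.trove.TIntDoubleHashMap;   // package name assumed for the bundled Trove

    public class AdjustValueSketch {
        public static void main(String[] args) {
            TIntDoubleHashMap params = new TIntDoubleHashMap();
            params.put(7, 0.5);                        // key 7 is just an example outcome id

            params.put(7, params.get(7) + 0.25);       // old style: two lookups plus a store
            params.adjustValue(7, 0.25);               // new style: one call does the same bump

            System.out.println(params.get(7));         // 1.0
        }
    }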
From: Jason B. <jas...@us...> - 2002-04-09 11:08:24
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv27381 Modified Files: CHANGES Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** CHANGES 8 Apr 2002 16:14:06 -0000 1.10 --- CHANGES 9 Apr 2002 09:15:08 -0000 1.11 *************** *** 2,5 **** --- 2,8 ---- _____ + Modified GISTrainer to use the new increment() and adjustValue() + methods available in Trove 0.1.4 hashmaps. (Jason) + Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses. The initial capacity is half the number of |
From: Jason B. <jas...@us...> - 2002-04-09 11:08:12
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv27381/lib Modified Files: LIBNOTES trove.jar Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** LIBNOTES 8 Apr 2002 16:13:27 -0000 1.9 --- LIBNOTES 9 Apr 2002 09:15:08 -0000 1.10 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.3 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.4 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 Binary files /tmp/cvsetPal3 and /tmp/cvscfpul1 differ |
From: Jason B. <jas...@us...> - 2002-04-09 09:47:05
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv4172 Modified Files: CHANGES build.xml Log Message: Upgraded the version to 1.2.9 Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** CHANGES 9 Apr 2002 09:15:08 -0000 1.11 --- CHANGES 9 Apr 2002 09:46:59 -0000 1.12 *************** *** 1,3 **** ! 1.2.7 _____ --- 1,8 ---- ! 1.2.9 ! _____ ! ! ! ! 1.2.8 _____ Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** build.xml 14 Jan 2002 14:58:14 -0000 1.16 --- build.xml 9 Apr 2002 09:46:59 -0000 1.17 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.7"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.9"/> <property name="year" value="2002"/> *************** *** 131,136 **** <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${Name}/**" > ! <exclude name="${Name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> --- 131,136 ---- <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${name}/**" > ! <exclude name="${name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> |
From: Jason B. <jas...@us...> - 2002-04-08 16:14:13
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv7102/src/java/opennlp/maxent Modified Files: GISTrainer.java Log Message: Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** GISTrainer.java 27 Dec 2001 19:20:26 -0000 1.3 --- GISTrainer.java 8 Apr 2002 16:14:06 -0000 1.4 *************** *** 306,313 **** observedExpects = new TIntDoubleHashMap[numPreds]; for (PID=0; PID<numPreds; PID++) { ! params[PID] = new TIntDoubleHashMap(); ! modifiers[PID] = new TIntDoubleHashMap(); ! observedExpects[PID] = new TIntDoubleHashMap(); for (OID=0; OID<numOutcomes; OID++) { if (predCount[PID][OID] > 0) { --- 306,324 ---- observedExpects = new TIntDoubleHashMap[numPreds]; + int initialCapacity; + float loadFactor = (float)0.9; + if (numOutcomes < 3) { + initialCapacity = 2; + loadFactor = (float)1.0; + } else if (numOutcomes < 5) { + initialCapacity = 2; + } else { + initialCapacity = (int)numOutcomes/2; + } for (PID=0; PID<numPreds; PID++) { ! params[PID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! modifiers[PID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! observedExpects[PID] = ! new TIntDoubleHashMap(initialCapacity, loadFactor); for (OID=0; OID<numOutcomes; OID++) { if (predCount[PID][OID] > 0) { *************** *** 339,344 **** cfvals = new TIntIntHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(); ! pabi[TID] = new TIntDoubleHashMap(); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; --- 350,355 ---- cfvals = new TIntIntHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(initialCapacity, loadFactor); ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; *************** *** 381,385 **** pabi = new TIntDoubleHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; --- 392,396 ---- pabi = new TIntDoubleHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; |
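The same sizing heuristic, shown on a plain java.util.HashMap so it stands alone: very small outcome sets get a tiny, fully loaded table, and larger ones get roughly half the outcome count as the initial capacity with a 0.9 load factor, which keeps the many per-predicate maps compact:

    import java.util.HashMap;

    public class CapacitySketch {
        // Pick an initial capacity and load factor the way this commit does
        // for GISTrainer's per-predicate hashmaps.
        public static HashMap newOutcomeMap(int numOutcomes) {
            int initialCapacity;
            float loadFactor = 0.9f;
            if (numOutcomes < 3) {
                initialCapacity = 2;
                loadFactor = 1.0f;
            } else if (numOutcomes < 5) {
                initialCapacity = 2;
            } else {
                initialCapacity = numOutcomes / 2;
            }
            return new HashMap(initialCapacity, loadFactor);
        }
    }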
From: Jason B. <jas...@us...> - 2002-04-08 16:14:12
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv7102 Modified Files: CHANGES Log Message: Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** CHANGES 3 Jan 2002 16:14:41 -0000 1.9 --- CHANGES 8 Apr 2002 16:14:06 -0000 1.10 *************** *** 2,5 **** --- 2,8 ---- _____ + Set up the GISTrainer to use an initial capacity and load factor for + the big hashmaps it uses. The initial capacity is half the number of + outcomes, and the load factor is 0.9. (Jason) (opennlp.maxent.DataIndexer) Do not index events with 0 active features. |
From: Jason B. <jas...@us...> - 2002-04-08 16:13:33
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv6747 Modified Files: LIBNOTES trove.jar Log Message: Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** LIBNOTES 14 Jan 2002 14:58:15 -0000 1.8 --- LIBNOTES 8 Apr 2002 16:13:27 -0000 1.9 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.2 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.3 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 Binary files /tmp/cvsaat9na and /tmp/cvsA5a8Ob differ |
From: Jason B. <jas...@us...> - 2002-01-20 15:09:29
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv12828/lib Modified Files: trove.jar Log Message: Updated to v0.1.2 of trove. Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 Binary files /tmp/cvsh5f0uO and /tmp/cvsGbBVau differ |
From: Jason B. <jas...@us...> - 2002-01-14 14:58:19
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv25510/lib Modified Files: LIBNOTES trove.jar Log Message: Upgraded to trove v0.1.2 and moved maxent to devel v1.2.7. Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** LIBNOTES 2002/01/02 20:00:39 1.7 --- LIBNOTES 2002/01/14 14:58:15 1.8 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.1 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.2 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 Binary files /tmp/cvsVkUnLd and /tmp/cvsIOr0ui differ |
From: Jason B. <jas...@us...> - 2002-01-14 14:58:18
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv25510 Modified Files: build.xml Log Message: Upgraded to trove v0.1.2 and moved maxent to devel v1.2.7. Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** build.xml 2002/01/03 16:14:41 1.15 --- build.xml 2002/01/14 14:58:14 1.16 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.6"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.7"/> <property name="year" value="2002"/> |
From: Eric F. <er...@us...> - 2002-01-03 16:43:26
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv11068/src/java/opennlp/maxent Modified Files: DataIndexer.java Log Message: bug fix: replace ComparableEvent[] array with an ArrayList so that we don't make assumptions about the size of the event index until we've filtered out events that have no active features. The native array approach was a problem inasmuch as it could contain null entries (for the dropped events) that would break the sorting routine. ArrayList avoids this pitfall by sorting just the parts of the underlying array that have entries. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** DataIndexer.java 2002/01/03 14:34:29 1.7 --- DataIndexer.java 2002/01/03 16:43:23 1.8 *************** *** 59,63 **** TObjectIntHashMap predicateIndex; TLinkedList events; ! ComparableEvent[] eventsToCompare; predicateIndex = new TObjectIntHashMap(); --- 59,63 ---- TObjectIntHashMap predicateIndex; TLinkedList events; ! List eventsToCompare; predicateIndex = new TObjectIntHashMap(); *************** *** 90,114 **** * @since maxent 1.2.6 */ ! private void sortAndMerge(ComparableEvent[] eventsToCompare) { ! Arrays.sort(eventsToCompare); ! int numEvents = eventsToCompare.length; int numUniqueEvents = 1; // assertion: eventsToCompare.length >= 1 ! if (eventsToCompare.length <= 1) { return; // nothing to do; edge case (see assertion) } ! ComparableEvent ce = eventsToCompare[0]; for (int i=1; i<numEvents; i++) { ! if (ce.compareTo(eventsToCompare[i]) == 0) { ce.seen++; // increment the seen count ! eventsToCompare[i] = null; // kill the duplicate } else { ! ce = eventsToCompare[i]; // a new champion emerges... numUniqueEvents++; // increment the # of unique events } } ! System.out.println("done. Reduced " + eventsToCompare.length + " events to " + numUniqueEvents + "."); --- 90,116 ---- * @since maxent 1.2.6 */ ! private void sortAndMerge(List eventsToCompare) { ! Collections.sort(eventsToCompare); ! int numEvents = eventsToCompare.size(); int numUniqueEvents = 1; // assertion: eventsToCompare.length >= 1 ! if (numEvents <= 1) { return; // nothing to do; edge case (see assertion) } ! ComparableEvent ce = (ComparableEvent)eventsToCompare.get(0); for (int i=1; i<numEvents; i++) { ! ComparableEvent ce2 = (ComparableEvent)eventsToCompare.get(i); ! ! if (ce.compareTo(ce2) == 0) { ce.seen++; // increment the seen count ! eventsToCompare.set(i, null); // kill the duplicate } else { ! ce = ce2; // a new champion emerges... numUniqueEvents++; // increment the # of unique events } } ! System.out.println("done. Reduced " + numEvents + " events to " + numUniqueEvents + "."); *************** *** 118,122 **** for (int i = 0, j = 0; i<numEvents; i++) { ! ComparableEvent evt = eventsToCompare[i]; if (null == evt) { continue; // this was a dupe, skip over it. --- 120,124 ---- for (int i = 0, j = 0; i<numEvents; i++) { ! ComparableEvent evt = (ComparableEvent)eventsToCompare.get(i); if (null == evt) { continue; // this was a dupe, skip over it. *************** *** 168,173 **** } ! private ComparableEvent[] index(TLinkedList events, ! TObjectIntHashMap predicateIndex) { TObjectIntHashMap omap = new TObjectIntHashMap(); --- 170,175 ---- } ! private List index(TLinkedList events, ! 
TObjectIntHashMap predicateIndex) { TObjectIntHashMap omap = new TObjectIntHashMap(); *************** *** 175,179 **** int outcomeCount = 0; int predCount = 0; ! ComparableEvent[] eventsToCompare = new ComparableEvent[numEvents]; TIntArrayList indexedContext = new TIntArrayList(); --- 177,181 ---- int outcomeCount = 0; int predCount = 0; ! List eventsToCompare = new ArrayList(numEvents); TIntArrayList indexedContext = new TIntArrayList(); *************** *** 181,184 **** --- 183,187 ---- Event ev = (Event)events.removeFirst(); String[] econtext = ev.getContext(); + ComparableEvent ce; int predID, ocID; *************** *** 201,206 **** // drop events with no active features if (indexedContext.size() > 0) { ! eventsToCompare[eventIndex] = ! new ComparableEvent(ocID, indexedContext.toNativeArray()); } // recycle the TIntArrayList --- 204,209 ---- // drop events with no active features if (indexedContext.size() > 0) { ! ce = new ComparableEvent(ocID, indexedContext.toNativeArray()); ! eventsToCompare.add(ce); } // recycle the TIntArrayList |
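Why the switch from ComparableEvent[] to a List matters, shown with Integers standing in for events: a fixed-size array keeps null slots for the events that were dropped, and sorting Comparables through a null throws, whereas a list only ever contains the events that were actually added:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class SortNullsSketch {
        public static void main(String[] args) {
            Integer[] withGap = { new Integer(3), null, new Integer(1) };
            try {
                Arrays.sort(withGap);                   // what the old array-based code risked
            } catch (NullPointerException e) {
                System.out.println("sorting an array with a dropped-event gap fails: " + e);
            }

            List kept = new ArrayList();                // what the new code does
            kept.add(new Integer(3));
            kept.add(new Integer(1));                   // the dropped event is simply never added
            Collections.sort(kept);
            System.out.println(kept);                   // [1, 3]
        }
    }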
From: Jason B. <jas...@us...> - 2002-01-03 16:14:45
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv4931 Modified Files: CHANGES build.xml Log Message: Just some text modifications that I did while making the release and forgot to commit. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** CHANGES 2002/01/03 14:34:29 1.8 --- CHANGES 2002/01/03 16:14:41 1.9 *************** *** 39,42 **** --- 39,44 ---- 1.2.6 ----- + Summary: efficiency improvements for model training. + Removed Colt dependency in favor of GNU Trove. (Eric) *************** *** 52,56 **** There is still more to be done in this department, however. (Eric) ! The output directory is now "output" instead of "build". (Jason) 1.2.4 --- 54,59 ---- There is still more to be done in this department, however. (Eric) ! The output directory of the build structure is now "output" instead of ! "build". (Jason) 1.2.4 Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** build.xml 2002/01/02 11:31:30 1.14 --- build.xml 2002/01/03 16:14:41 1.15 *************** *** 11,15 **** <property name="name" value="maxent"/> <property name="version" value="1.2.6"/> ! <property name="year" value="2001"/> <echo message="----------- ${Name} ${version} [${year}] ------------"/> --- 11,15 ---- <property name="name" value="maxent"/> <property name="version" value="1.2.6"/> ! <property name="year" value="2002"/> <echo message="----------- ${Name} ${version} [${year}] ------------"/> *************** *** 122,126 **** </addfiles> </jlink> - <delete file="${build.dir}/${name}-${DSTAMP}.jar" /> </target> --- 122,125 ---- *************** *** 132,137 **** <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${name}/**" > ! <exclude name="${name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> --- 131,136 ---- <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${Name}/**" > ! <exclude name="${Name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> |
From: Jason B. <jas...@us...> - 2002-01-03 16:14:45
Update of /cvsroot/maxent/maxent/docs In directory usw-pr-cvs1:/tmp/cvs-serv4931/docs Modified Files: about.html index.html Log Message: Just some text modifications that I did while making the release and forgot to commit. Index: about.html =================================================================== RCS file: /cvsroot/maxent/maxent/docs/about.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** about.html 2001/10/30 09:52:44 1.1 --- about.html 2002/01/03 16:14:41 1.2 *************** *** 155,159 **** <h2>Authors</h2> ! <p>The opennlp.maxent package was built by <a href="http://www.cogsci.ed.ac.uk/~jmb/">Jason Baldridge</a>, <a href="http://www.cis.upenn.edu/~tsmorton/">Tom Morton</a>, and <a --- 155,159 ---- <h2>Authors</h2> ! <p>The opennlp.maxent package was originally built by <a href="http://www.cogsci.ed.ac.uk/~jmb/">Jason Baldridge</a>, <a href="http://www.cis.upenn.edu/~tsmorton/">Tom Morton</a>, and <a *************** *** 169,172 **** --- 169,176 ---- (POS tagger, end of sentence detector, tokenizer, name finder) possible! + </p> + + <p>Eric Friedman has been steadily improving the efficiency and design + of the package since version 1.2.0. </p> Index: index.html =================================================================== RCS file: /cvsroot/maxent/maxent/docs/index.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** index.html 2001/10/30 09:52:44 1.3 --- index.html 2002/01/03 16:14:41 1.4 *************** *** 41,46 **** <p> This web page contains some details about maximum entropy and using ! the opennlp.maxent package. It is updated periodically, but check out ! the <a href="https://sourceforge.net/project/?group_id=5961">Sourceforge page for Maxent</a> for the latest news. You can also ask questions and --- 41,46 ---- <p> This web page contains some details about maximum entropy and using ! the opennlp.maxent package. It is updated only periodically, so check ! out the <a href="https://sourceforge.net/project/?group_id=5961">Sourceforge page for Maxent</a> for the latest news. You can also ask questions and *************** *** 69,73 **** <h3> Email: <a href="mailto:jm...@co...">jm...@co...</a><br> ! 2001 October 29 <br> <br> <A href="http://sourceforge.net"> <IMG src="http://sourceforge.net/sflogo.php?group_id=5961&type=1" width="88" height="31" border="0"></A> <br> --- 69,73 ---- <h3> Email: <a href="mailto:jm...@co...">jm...@co...</a><br> ! 2002 January 02 <br> <br> <A href="http://sourceforge.net"> <IMG src="http://sourceforge.net/sflogo.php?group_id=5961&type=1" width="88" height="31" border="0"></A> <br> |