From: Thomas M. <tsm...@us...> - 2002-12-11 16:18:46
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv5862/java/opennlp/maxent Modified Files: GISModel.java Log Message: Added int getNumOutcomes method Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** GISModel.java 20 Nov 2002 02:44:12 -0000 1.8 --- GISModel.java 11 Dec 2002 16:18:41 -0000 1.9 *************** *** 201,204 **** --- 201,211 ---- } + /** Returns the number of outcomes for this model. + * @return The number of outcomes. + **/ + public int getNumOutcomes() { + return(numOutcomes); + } + /** |
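A minimal sketch of how calling code might use the new accessor. The class name below is made up, the GISModel instance is taken as given, and getOutcome(int) is assumed to be public, as the eval() javadoc elsewhere in this module implies:

    import opennlp.maxent.GISModel;

    public class OutcomeListerSketch {
        // List every outcome the model can predict, sizing the loop with
        // the getNumOutcomes() method added in this commit.
        public static void listOutcomes(GISModel model) {
            for (int oid = 0; oid < model.getNumOutcomes(); oid++) {
                System.out.println(oid + "\t" + model.getOutcome(oid));
            }
        }
    }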
From: Thomas M. <tsm...@us...> - 2002-11-20 03:05:28
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv5209/maxent Modified Files: GISTrainer.java Log Message: Fixed cases where parameters which only occured with a single output weren't getting updated. Ended up getting rid of pabi and cfvals structures. These have been replaced with the data for a single event, double[] modelDistribution, and this is used to update the modifiers for a single event and then updated for each additional event. This change made it easier to initialize the modleDistribution to the uniform distribution which was necessary to fix teh above problem. Also moved the computation of modelDistribution into it's own routine which is name eval and is almost exactly the same as GISModel.eval w/o doing the context string to integer mappings. Made correction constant non-optional. When the events all have the same number of contexts then the model tries to make the expected value of the correction constant nearly 0. This is needed because while the number of contexts may be same it is very unlikly that all context occur with all outcomes. Finally I made nextItteration return a double which is the log-likelihood from the previous itteration. At some point there isn't enough accuracy in a double to make further iterations useful so the routine may stop prematurly when the decrease in log-likelihood is too small. Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** GISTrainer.java 23 Apr 2002 16:10:07 -0000 1.6 --- GISTrainer.java 20 Nov 2002 03:05:25 -0000 1.7 *************** *** 1,3 **** ! /////////////////////////////////////////////////////////////////////////////// // Copyright (C) 2001 Jason Baldridge and Gann Bierner // --- 1,3 ---- ! ///////////////////////////////////////////////////////////////////////////// // Copyright (C) 2001 Jason Baldridge and Gann Bierner // *************** *** 49,53 **** private boolean printMessages = false; ! private int numTokens; // # of event tokens private int numPreds; // # of predicates --- 49,53 ---- private boolean printMessages = false; ! private int numTokens; // # of event tokens private int numPreds; // # of predicates *************** *** 58,66 **** // a global variable for adding probabilities in an array ! private double PABISUM; // records the array of predicates seen in each event private int[][] contexts; // records the num of times an event has been seen, paired to // int[][] contexts --- 58,69 ---- // a global variable for adding probabilities in an array ! private double SUM; // records the array of predicates seen in each event private int[][] contexts; + // records the array of outcomes seen in each event + private int[] outcomes; + // records the num of times an event has been seen, paired to // int[][] contexts *************** *** 91,102 **** private int[] predkeys; ! // a boolean to track if all events have same number of active features ! private boolean needCorrection; ! // initialize the GIS constant ! private int constant = 1; // stores inverse of constant after it is determined private double constantInverse; // the correction parameter of the model ! private double correctionParam = 0.0; // observed expectation of correction feature private double cfObservedExpect; --- 94,103 ---- private int[] predkeys; ! // GIS constant number of feattures fired ! 
private int constant; // stores inverse of constant after it is determined private double constantInverse; // the correction parameter of the model ! private double correctionParam; // observed expectation of correction feature private double cfObservedExpect; *************** *** 105,116 **** private double CFMOD; ! // stores the value of corrections feature for each event's predicate list, ! // expanded to include all outcomes which might come from those predicates. ! private TIntIntHashMap[] cfvals; ! // Normalized Probabilities Of Outcomes Given Context: p(a|b_i) ! // Stores the computation of each iterations for the update to the ! // modifiers (and therefore the params) ! private TIntDoubleHashMap[] pabi; // make all values in an TIntDoubleHashMap return to 0.0 --- 106,119 ---- private double CFMOD; ! private final double NEAR_ZERO = 0.01; ! private final double LLThreshold = 0.0001; ! // Stores the output of the current model on a single event durring ! // training. This we be reset for every event for every itteration. ! double[] modelDistribution; ! // Stores the number of features that get fired per event ! int[] numfeats; ! // initial probability for all outcomes. ! double iprob; // make all values in an TIntDoubleHashMap return to 0.0 *************** *** 120,154 **** }; ! // divide all values in the TIntDoubleHashMap pabi[TID] by the sum of ! // all values in the map. ! private TDoubleFunction normalizePABI = ! new TDoubleFunction() { ! public double execute(double arg) { return arg / PABISUM; } ! }; ! ! // add the previous iteration's parameters to the computation of the ! // modifiers of this iteration. ! private TIntDoubleProcedure addParamsToPABI = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! pabi[TID].adjustValue(oid, arg); ! return true; ! } ! }; ! ! // add the correction parameter and exponentiate it ! private TIntDoubleProcedure addCorrectionToPABIandExponentiate = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! if (needCorrection) ! arg = arg + (correctionParam * cfvals[TID].get(oid)); ! arg = Math.exp(arg); ! PABISUM += arg; ! pabi[TID].put(oid, arg); ! return true; ! } ! }; ! ! // update the modifiers based on the new pabi values private TIntDoubleProcedure updateModifiers = new TIntDoubleProcedure() { --- 123,127 ---- }; ! // update the modifiers based on the modelDistribution for this event values private TIntDoubleProcedure updateModifiers = new TIntDoubleProcedure() { *************** *** 156,160 **** modifiers[PID].put(oid, arg ! + (pabi[TID].get(oid) * numTimesEventsSeen[TID])); return true; --- 129,133 ---- modifiers[PID].put(oid, arg ! + (modelDistribution[oid] * numTimesEventsSeen[TID])); return true; *************** *** 167,185 **** public boolean execute(int oid, double arg) { params[PID].put(oid, ! arg ! + (constantInverse * ! (observedExpects[PID].get(oid) ! - Math.log(modifiers[PID].get(oid))))); ! return true; ! } ! }; ! ! // update the correction feature modifier, which will then be used to ! // updated the correction parameter ! private TIntDoubleProcedure updateCorrectionFeatureModifier = ! new TIntDoubleProcedure() { ! public boolean execute(int oid, double arg) { ! CFMOD += ! arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } --- 140,145 ---- public boolean execute(int oid, double arg) { params[PID].put(oid, ! arg +(observedExpects[PID].get(oid) ! 
- Math.log(modifiers[PID].get(oid)))); return true; } *************** *** 250,259 **** display("Incorporating indexed data for training... \n"); contexts = di.contexts; numTimesEventsSeen = di.numTimesEventsSeen; numTokens = contexts.length; ! //printTable(contexts); ! needCorrection = false; // determine the correction constant and its inverse, and check to see --- 210,221 ---- display("Incorporating indexed data for training... \n"); contexts = di.contexts; + outcomes = di.outcomeList; numTimesEventsSeen = di.numTimesEventsSeen; numTokens = contexts.length; ! //printTable(contexts); ! // a boolean to track if all events have same number of active features ! boolean needCorrection = false; // determine the correction constant and its inverse, and check to see *************** *** 269,277 **** } } constantInverse = 1.0/constant; - outcomeLabels = di.outcomeLabels; numOutcomes = outcomeLabels.length; predLabels = di.predLabels; --- 231,252 ---- } } + + int cfvalSum = 0; + for (TID=0; TID<numTokens; TID++) + cfvalSum += (constant - contexts[TID].length) + * numTimesEventsSeen[TID]; + if (cfvalSum == 0) { + cfObservedExpect = Math.log(NEAR_ZERO);//nearly zero so log is defined + } + else { + cfObservedExpect = Math.log(cfvalSum); + } + + display("done.\n"); constantInverse = 1.0/constant; outcomeLabels = di.outcomeLabels; numOutcomes = outcomeLabels.length; + iprob = Math.log(1.0/numOutcomes); predLabels = di.predLabels; *************** *** 296,300 **** // the data. The default is to assume that we observed "1/10th" of a // feature during training. ! final double smoothingObservation = Math.log(_smoothingObservation); // Get the observed expectations of the features. Strictly speaking, --- 271,276 ---- // the data. The default is to assume that we observed "1/10th" of a // feature during training. ! final double smoothingObservation = _smoothingObservation; ! final double logSmoothingObservation = Math.log(_smoothingObservation); // Get the observed expectations of the features. Strictly speaking, *************** *** 338,404 **** observedExpects[PID].compact(); } ! predCount = null; // don't need it anymore display("...done.\n"); ! pabi = new TIntDoubleHashMap[numTokens]; ! ! if (needCorrection) { ! // initialize both the pabi table and the cfvals matrix ! display("Computing correction feature matrix... "); ! ! cfvals = new TIntIntHashMap[numTokens]; ! for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(initialCapacity, loadFactor); ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! predkeys = params[PID].keys(); ! for (int i=0; i<predkeys.length; i++) { ! OID = predkeys[i]; ! if (!cfvals[TID].increment(OID)) { ! cfvals[TID].put(OID, 1); ! pabi[TID].put(OID, 0.0); ! } ! } ! } ! cfvals[TID].compact(); ! pabi[TID].compact(); ! } ! ! for (TID=0; TID<numTokens; TID++) { ! predkeys = cfvals[TID].keys(); ! for (int i=0; i<predkeys.length; i++) { ! OID = predkeys[i]; ! cfvals[TID].put(OID, constant - cfvals[TID].get(OID)); ! } ! } ! ! // compute observed expectation of correction feature (E_p~ f_l) ! int cfvalSum = 0; ! for (TID=0; TID<numTokens; TID++) ! cfvalSum += (constant - contexts[TID].length) ! * numTimesEventsSeen[TID]; ! ! cfObservedExpect = Math.log(cfvalSum); ! ! display("done.\n"); ! ! } ! else { ! // initialize just the pabi table ! pabi = new TIntDoubleHashMap[numTokens]; ! for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! 
for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! predkeys = params[PID].keys(); ! for (int i=0; i<predkeys.length; i++) ! pabi[TID].put(predkeys[i], 0.0); ! } ! pabi[TID].compact(); ! } ! } /***************** Find the parameters ************************/ --- 314,324 ---- observedExpects[PID].compact(); } ! correctionParam = 0.0; predCount = null; // don't need it anymore display("...done.\n"); ! modelDistribution = new double[numOutcomes]; ! numfeats = new int[numOutcomes]; /***************** Find the parameters ************************/ *************** *** 418,421 **** --- 338,343 ---- /* Estimate and return the model parameters. */ private void findParameters(int iterations) { + double prevLL = 0.0; + double currLL = 0.0; display("Performing " + iterations + " iterations.\n"); for (int i=1; i<=iterations; i++) { *************** *** 423,434 **** else if (i<100) display(" " + i + ": "); else display(i + ": "); ! nextIteration(); } // kill a bunch of these big objects now that we don't need them observedExpects = null; - pabi = null; modifiers = null; - cfvals = null; numTimesEventsSeen = null; contexts = null; --- 345,364 ---- else if (i<100) display(" " + i + ": "); else display(i + ": "); ! currLL=nextIteration(); ! if (i > 1) { ! if (prevLL > currLL) { ! System.err.println("Model Diverging: loglikelihood decreased"); ! break; ! } ! if (currLL-prevLL < LLThreshold) { ! break; ! } ! } ! prevLL=currLL; } // kill a bunch of these big objects now that we don't need them observedExpects = null; modifiers = null; numTimesEventsSeen = null; contexts = null; *************** *** 436,468 **** ! /* Compute one iteration of GIS */ ! private void nextIteration() { ! ! // compute table probabilities of outcomes given contexts ! CFMOD = 0.0; ! for (TID=0; TID<numTokens; TID++) { ! pabi[TID].transformValues(backToZeros); ! ! for (int j=0; j<contexts[TID].length; j++) ! params[contexts[TID][j]].forEachEntry(addParamsToPABI); ! PABISUM = 0.0; // PABISUM is computed in the next line's procedure ! pabi[TID].forEachEntry(addCorrectionToPABIandExponentiate); ! if (PABISUM > 0.0) pabi[TID].transformValues(normalizePABI); ! if (needCorrection) ! pabi[TID].forEachEntry(updateCorrectionFeatureModifier); ! } ! display("."); // compute contribution of p(a|b_i) for each feature and the new // correction parameter for (TID=0; TID<numTokens; TID++) { ! for (int j=0; j<contexts[TID].length; j++) { ! // do not remove the next line since we need to know PID ! // globally for the updateModifiers procedure used after it ! PID = contexts[TID][j]; ! modifiers[PID].forEachEntry(updateModifiers); ! } } display("."); --- 366,433 ---- ! /** ! * Use this model to evaluate a context and return an array of the ! * likelihood of each outcome given that context. ! * ! * @param context The integers of the predicates which have been ! * observed at the present decision point. ! * @return The normalized probabilities for the outcomes given the ! * context. The indexes of the double[] are the outcome ! * ids, and the actual string representation of the ! * outcomes can be obtained from the method ! * getOutcome(int i). ! */ ! public void eval(int[] context, double[] outsums) { ! for (int oid=0; oid<numOutcomes; oid++) { ! outsums[oid] = iprob; ! numfeats[oid] = 0; ! } ! int[] activeOutcomes; ! for (int i=0; i<context.length; i++) { ! TIntDoubleHashMap predParams = params[context[i]]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! 
numfeats[oid]++; ! outsums[oid] += constantInverse * predParams.get(oid); ! } ! } ! double SUM = 0.0; ! for (int oid=0; oid<numOutcomes; oid++) { ! outsums[oid] = Math.exp(outsums[oid] ! + ((1.0 - ! (numfeats[oid]/constant)) ! * correctionParam)); ! SUM += outsums[oid]; ! } ! for (int oid=0; oid<numOutcomes; oid++) ! outsums[oid] /= SUM; ! ! } ! + /* Compute one iteration of GIS and retutn log-likelihood.*/ + private double nextIteration() { // compute contribution of p(a|b_i) for each feature and the new // correction parameter + double loglikelihood = 0.0; + CFMOD=0.0; for (TID=0; TID<numTokens; TID++) { ! // modeldistribution and PID are globals used in ! // the updateModifiers procedure. They need to be set. ! eval(contexts[TID],modelDistribution); ! for (int j=0; j<contexts[TID].length; j++) { ! PID = contexts[TID][j]; ! modifiers[PID].forEachEntry(updateModifiers); ! for (OID=0;OID<numOutcomes;OID++) { ! if (!modifiers[PID].containsKey(OID)) { ! CFMOD+=modelDistribution[OID]*numTimesEventsSeen[TID]; ! } ! } ! loglikelihood+=Math.log(modelDistribution[outcomes[TID]]); ! } ! CFMOD+=constant-contexts[TID].length; } display("."); *************** *** 473,483 **** modifiers[PID].transformValues(backToZeros); // re-initialize to 0.0's } - if (CFMOD > 0.0) ! correctionParam += ! constantInverse * (cfObservedExpect - Math.log(CFMOD)); ! display(".\n"); ! } --- 438,446 ---- modifiers[PID].transformValues(backToZeros); // re-initialize to 0.0's } if (CFMOD > 0.0) ! correctionParam +=(cfObservedExpect - Math.log(CFMOD)); ! display(". loglikelihood="+loglikelihood+"\n"); ! return(loglikelihood); } |
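The heart of this change is the stopping rule in findParameters(): keep iterating only while the log-likelihood keeps growing by a meaningful amount. A standalone sketch of just that rule, with made-up log-likelihood values standing in for real calls to nextIteration():

    public class LLStopSketch {
        static final double LL_THRESHOLD = 0.0001;   // same threshold the trainer uses

        public static void main(String[] args) {
            // Fake per-iteration log-likelihoods in place of nextIteration().
            double[] ll = { -1000.0, -800.0, -700.0, -699.99995 };
            double prevLL = 0.0;
            for (int i = 1; i <= ll.length; i++) {
                double currLL = ll[i - 1];
                if (i > 1) {
                    if (prevLL > currLL) {
                        System.err.println("Model Diverging: loglikelihood decreased");
                        break;
                    }
                    if (currLL - prevLL < LL_THRESHOLD) {
                        break;                        // gain too small to keep going
                    }
                }
                prevLL = currLL;
                System.out.println(i + ": loglikelihood=" + currLL);
            }
        }
    }

With these values the loop runs three iterations and then stops on the fourth, whose gain of 0.00005 falls under the threshold.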
From: Thomas M. <tsm...@us...> - 2002-11-20 02:44:15
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv30495/maxent Modified Files: GISModel.java Log Message: Added eval method so distribution could be passed in rathar then allocated durring each call. Left old interface in place but modified it to use the new eval method. Also made numfeats a class level variable. Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** GISModel.java 19 Apr 2002 09:29:24 -0000 1.7 --- GISModel.java 20 Nov 2002 02:44:12 -0000 1.8 *************** *** 38,41 **** --- 38,43 ---- private final double iprob; private final double fval; + + private int[] numfeats; public GISModel (TIntDoubleHashMap[] _params, *************** *** 57,65 **** iprob = Math.log(1.0/numOutcomes); fval = 1.0/correctionConstant; ! } - - /** * Use this model to evaluate a context and return an array of the --- 59,65 ---- iprob = Math.log(1.0/numOutcomes); fval = 1.0/correctionConstant; ! numfeats = new int[numOutcomes]; } /** * Use this model to evaluate a context and return an array of the *************** *** 75,98 **** */ public final double[] eval(String[] context) { ! double[] outsums = new double[numOutcomes]; ! int[] numfeats = new int[numOutcomes]; ! ! for (int oid=0; oid<numOutcomes; oid++) { outsums[oid] = iprob; numfeats[oid] = 0; } - - int[] activeOutcomes; for (int i=0; i<context.length; i++) { ! if (pmap.containsKey(context[i])) { ! TIntDoubleHashMap predParams = ! params[pmap.get(context[i])]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! numfeats[oid]++; ! outsums[oid] += fval * predParams.get(oid); ! } ! } } --- 75,111 ---- */ public final double[] eval(String[] context) { ! return(eval(context,new double[numOutcomes])); ! } ! ! /** ! * Use this model to evaluate a context and return an array of the ! * likelihood of each outcome given that context. ! * ! * @param context The names of the predicates which have been observed at ! * the present decision point. ! * @param outsums This is where the distribution is stored. ! * @return The normalized probabilities for the outcomes given the ! * context. The indexes of the double[] are the outcome ! * ids, and the actual string representation of the ! * outcomes can be obtained from the method ! * getOutcome(int i). ! */ ! public final double[] eval(String[] context, double[] outsums) { ! int[] activeOutcomes; ! for (int oid=0; oid<numOutcomes; oid++) { outsums[oid] = iprob; numfeats[oid] = 0; } for (int i=0; i<context.length; i++) { ! if (pmap.containsKey(context[i])) { ! TIntDoubleHashMap predParams = ! params[pmap.get(context[i])]; ! activeOutcomes = predParams.keys(); ! for (int j=0; j<activeOutcomes.length; j++) { ! int oid = activeOutcomes[j]; ! numfeats[oid]++; ! outsums[oid] += fval * predParams.get(oid); ! } ! } } |
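A sketch of what the new eval(String[], double[]) overload buys a caller: the probability array is allocated once up front and refilled for every event instead of being newly allocated per call. The model and contexts are assumed to exist already, and getNumOutcomes() is the accessor from the commit further up this page:

    import opennlp.maxent.GISModel;

    public class ReuseBufferSketch {
        // Classify a batch of contexts with one shared probability buffer.
        public static void classify(GISModel model, String[][] contexts) {
            double[] probs = new double[model.getNumOutcomes()];   // allocated once
            for (int e = 0; e < contexts.length; e++) {
                model.eval(contexts[e], probs);                    // refills probs in place
                int best = 0;
                for (int oid = 1; oid < probs.length; oid++) {
                    if (probs[oid] > probs[best]) {
                        best = oid;
                    }
                }
                System.out.println(model.getOutcome(best) + " " + probs[best]);
            }
        }
    }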
From: Thomas M. <tsm...@us...> - 2002-11-20 02:41:33
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory sc8-pr-cvs1:/tmp/cvs-serv30038/maxent Modified Files: DataIndexer.java Log Message: Fixed bug where singleton events are dropped. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** DataIndexer.java 19 Apr 2002 09:59:53 -0000 1.9 --- DataIndexer.java 20 Nov 2002 02:41:30 -0000 1.10 *************** *** 66,70 **** System.out.print("\tComputing event counts... "); events = computeEventCounts(eventStream,predicateIndex,cutoff); ! System.out.println("done."); System.out.print("\tIndexing... "); --- 66,70 ---- System.out.print("\tComputing event counts... "); events = computeEventCounts(eventStream,predicateIndex,cutoff); ! System.out.println("done. "+events.size()+" events"); System.out.print("\tIndexing... "); *************** *** 157,167 **** if (! predicatesInOut.containsKey(ec[j])) { if (counter.increment(ec[j])) { - if (counter.get(ec[j]) >= cutoff) { - predicatesInOut.put(ec[j], predicateIndex++); - counter.remove(ec[j]); - } } else { counter.put(ec[j], 1); } } } --- 157,167 ---- if (! predicatesInOut.containsKey(ec[j])) { if (counter.increment(ec[j])) { } else { counter.put(ec[j], 1); } + if (counter.get(ec[j]) >= cutoff) { + predicatesInOut.put(ec[j], predicateIndex++); + counter.remove(ec[j]); + } } } *************** *** 208,211 **** --- 208,214 ---- eventsToCompare.add(ce); } + else { + System.err.println("Dropped event "+ev.getOutcome()+":"+Arrays.asList(ev.getContext())); + } // recycle the TIntArrayList indexedContext.resetQuick(); |
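The bug was one of ordering: the old loop only compared the count against the cutoff on a predicate's second and later sightings, so a predicate whose very first sighting already met the cutoff (a singleton, with cutoff 1) was never indexed and its events were silently dropped. A sketch of the corrected order using a plain java.util.HashMap in place of Trove's TObjectIntHashMap, with hypothetical names throughout:

    import java.util.HashMap;
    import java.util.Map;

    public class CutoffSketch {
        // Bump the count first, then test it against the cutoff on every
        // sighting, including the first one.
        public static Map promote(String[] predicates, int cutoff) {
            Map counter = new HashMap();    // predicate -> times seen so far
            Map promoted = new HashMap();   // predicate -> index, once past the cutoff
            int nextIndex = 0;
            for (int j = 0; j < predicates.length; j++) {
                String p = predicates[j];
                if (!promoted.containsKey(p)) {
                    Integer c = (Integer) counter.get(p);
                    int count = (c == null) ? 1 : c.intValue() + 1;
                    counter.put(p, new Integer(count));
                    if (count >= cutoff) {               // now runs on the first sighting too
                        promoted.put(p, new Integer(nextIndex++));
                        counter.remove(p);
                    }
                }
            }
            return promoted;
        }
    }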
From: Jason B. <jas...@us...> - 2002-04-30 08:48:39
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv19942/src/java/opennlp/maxent Modified Files: BasicContextGenerator.java Log Message: Fixed bug: BasicContextGenerator was retaining whitespace. Index: BasicContextGenerator.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/BasicContextGenerator.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** BasicContextGenerator.java 20 Nov 2001 17:05:37 -0000 1.1 --- BasicContextGenerator.java 30 Apr 2002 08:48:35 -0000 1.2 *************** *** 39,51 **** public String[] getContext(Object o) { String s = (String)o; ! int prevIndex = 0; int index = s.indexOf(' '); List cuts = new ArrayList(); while (index != -1) { ! cuts.add(s.substring(prevIndex, index)); prevIndex = index; index = s.indexOf(' ', ++index); } ! cuts.add(s.substring(prevIndex, s.length())); return (String[])cuts.toArray(new String[cuts.size()]); } --- 39,51 ---- public String[] getContext(Object o) { String s = (String)o; ! int prevIndex = -1; int index = s.indexOf(' '); List cuts = new ArrayList(); while (index != -1) { ! cuts.add(s.substring(prevIndex+1, index)); prevIndex = index; index = s.indexOf(' ', ++index); } ! cuts.add(s.substring(prevIndex+1, s.length())); return (String[])cuts.toArray(new String[cuts.size()]); } |
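The fix is in where each cut starts: beginning every substring at prevIndex+1 skips the space itself, so the second and later predicates no longer carry a leading blank. A small standalone version of the corrected loop, with a throwaway main() to show the effect on an example string:

    import java.util.ArrayList;
    import java.util.List;

    public class SplitSketch {
        // Reimplements just the loop from BasicContextGenerator.getContext.
        public static String[] split(String s) {
            int prevIndex = -1;
            int index = s.indexOf(' ');
            List cuts = new ArrayList();
            while (index != -1) {
                cuts.add(s.substring(prevIndex + 1, index));
                prevIndex = index;
                index = s.indexOf(' ', ++index);
            }
            cuts.add(s.substring(prevIndex + 1, s.length()));
            return (String[]) cuts.toArray(new String[cuts.size()]);
        }

        public static void main(String[] args) {
            String[] toks = split("f1=a f2=b f3=c");
            for (int i = 0; i < toks.length; i++) {
                System.out.println("[" + toks[i] + "]");   // prints [f1=a] [f2=b] [f3=c], no leading spaces
            }
        }
    }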
From: Jason B. <jas...@us...> - 2002-04-30 08:48:39
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv19942 Modified Files: CHANGES Log Message: Fixed bug: BasicContextGenerator was retaining whitespace. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** CHANGES 25 Apr 2002 15:01:07 -0000 1.14 --- CHANGES 30 Apr 2002 08:48:35 -0000 1.15 *************** *** 1,4 **** --- 1,6 ---- 1.2.10 ------ + Fixed minor bug (found by Arno Erpenbeck) in BasicContextGenerator: it + was retaining whitespace in the contextual predicates. (Jason) Added error message to TrainEval's eval() method. (Jason) |
From: Jason B. <jas...@us...> - 2002-04-25 15:08:06
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv23879/src/java/opennlp/maxent Modified Files: TrainEval.java Log Message: Added usage message to TrainEval's eval() method. Index: TrainEval.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/TrainEval.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** TrainEval.java 14 Nov 2001 17:39:56 -0000 1.2 --- TrainEval.java 25 Apr 2002 15:01:07 -0000 1.3 *************** *** 96,115 **** } } ! FileReader datafr = new FileReader(args[g.getOptind()]); ! ! if(train) { MaxentModel m = train(new EventCollectorAsStream(e.getEventCollector(datafr)), cutoff); ! new BinaryGISModelWriter((GISModel)m, new File(dir+stem)).persist(); } else { MaxentModel model = ! new BinaryGISModelReader(new File(dir+stem)).getModel(); ! if(local) e.localEval(model, datafr, e, verbose); ! else eval(model, datafr, e, verbose); } } --- 96,139 ---- } } + + int lastIndex = g.getOptind(); + if (lastIndex >= args.length) { + System.out.println("This is a usage message from opennlp.maxent.TrainEval. You have called the training procedure for a maxent application with the incorrect arguments. These are the options:"); + + System.out.println("\nOptions for defining the model location and name:"); + System.out.println(" -d <directoryName>"); + System.out.println("\tThe directory in which to store the model."); + System.out.println(" -s <modelName>"); + System.out.println("\tThe name of the model, e.g. EnglishPOS.bin.gz or NameFinder.txt."); + + System.out.println("\nOptions for training:"); + System.out.println(" -c <cutoff>"); + System.out.println("\tAn integer cutoff level to reduce infrequent contextual predicates."); + System.out.println(" -t\tTrain a model. If absent, the given model will be loaded and evaluated."); + System.out.println("\nOptions for evaluation:"); + System.out.println(" -l\t the evaluation method of class that uses the model. If absent, TrainEval's eval method is used."); + System.out.println(" -v\t verbose."); + System.out.println("\nThe final argument is the data file to be loaded and used for either training or evaluation."); + System.out.println("\nAs an example for training:\n java opennlp.grok.preprocess.postag.POSTaggerME -t -d ./ -s EnglishPOS.bin.gz -c 7 postag.data"); + System.exit(0); + } + + FileReader datafr = new FileReader(args[lastIndex]); ! if (train) { MaxentModel m = train(new EventCollectorAsStream(e.getEventCollector(datafr)), cutoff); ! new SuffixSensitiveGISModelWriter((GISModel)m, ! new File(dir+stem)).persist(); } else { MaxentModel model = ! new SuffixSensitiveGISModelReader(new File(dir+stem)).getModel(); ! if (local) { e.localEval(model, datafr, e, verbose); ! } else { eval(model, datafr, e, verbose); + } } } |
From: Jason B. <jas...@us...> - 2002-04-25 15:08:05
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv23879 Modified Files: CHANGES Log Message: Added usage message to TrainEval's eval() method. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** CHANGES 19 Apr 2002 12:34:02 -0000 1.13 --- CHANGES 25 Apr 2002 15:01:07 -0000 1.14 *************** *** 2,5 **** --- 2,7 ---- ------ + Added error message to TrainEval's eval() method. (Jason) + 1.2.9 (Bug fix release) |
From: Jason B. <jas...@us...> - 2002-04-23 16:10:12
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv15793 Modified Files: GISTrainer.java Log Message: Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** GISTrainer.java 9 Apr 2002 09:15:10 -0000 1.5 --- GISTrainer.java 23 Apr 2002 16:10:07 -0000 1.6 *************** *** 236,240 **** * will be trained. * @param iterations The number of GIS iterations to perform. ! * @param cutoff The number of times a feature must be seen in order * to be relevant for training. * @return The newly trained model, which can be used immediately or saved --- 236,240 ---- * will be trained. * @param iterations The number of GIS iterations to perform. ! * @param cutoff The number of times a predicate must be seen in order * to be relevant for training. * @return The newly trained model, which can be used immediately or saved |
From: Jason B. <jas...@us...> - 2002-04-19 12:34:09
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv1852 Modified Files: CHANGES build.xml Log Message: Updated version number after release. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** CHANGES 9 Apr 2002 09:46:59 -0000 1.12 --- CHANGES 19 Apr 2002 12:34:02 -0000 1.13 *************** *** 1,5 **** ! 1.2.9 _____ --- 1,15 ---- ! 1.2.10 ! ------ ! ! ! 1.2.9 (Bug fix release) _____ + Modified the cutoff loop in DataIndexer to use the increment() method + of TObjectIntHashMap. (Jason) + + Fixed a bug (found by Chieu Hai Leong) in which the correctionConstant + of GISModel was an int that was used in division. Now, + correctionConstant is a double. (Jason) Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** build.xml 9 Apr 2002 09:46:59 -0000 1.17 --- build.xml 19 Apr 2002 12:34:02 -0000 1.18 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.9"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.10"/> <property name="year" value="2002"/> |
From: Jason B. <jas...@us...> - 2002-04-19 09:59:56
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv25071/src/java/opennlp/maxent Modified Files: DataIndexer.java Log Message: Modified the cutoff loop to use the increment() method of TObjectIntHashMap. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** DataIndexer.java 3 Jan 2002 16:43:23 -0000 1.8 --- DataIndexer.java 19 Apr 2002 09:59:53 -0000 1.9 *************** *** 156,165 **** for (int j=0; j<ec.length; j++) { if (! predicatesInOut.containsKey(ec[j])) { ! int count = counter.get(ec[j]) + 1; ! if (count >= cutoff) { ! predicatesInOut.put(ec[j], predicateIndex++); ! counter.remove(ec[j]); ! } else { ! counter.put(ec[j], count); } } --- 156,166 ---- for (int j=0; j<ec.length; j++) { if (! predicatesInOut.containsKey(ec[j])) { ! if (counter.increment(ec[j])) { ! if (counter.get(ec[j]) >= cutoff) { ! predicatesInOut.put(ec[j], predicateIndex++); ! counter.remove(ec[j]); ! } ! } else { ! counter.put(ec[j], 1); } } |
From: Jason B. <jas...@us...> - 2002-04-19 09:29:28
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv14897/src/java/opennlp/maxent Modified Files: GISModel.java Log Message: Fixed bug: correctionConstant is now a double rather than an int. Index: GISModel.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISModel.java,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** GISModel.java 27 Dec 2001 19:20:26 -0000 1.6 --- GISModel.java 19 Apr 2002 09:29:24 -0000 1.7 *************** *** 32,36 **** private final TObjectIntHashMap pmap; private final String[] ocNames; ! private final int correctionConstant; private final double correctionParam; --- 32,36 ---- private final TObjectIntHashMap pmap; private final String[] ocNames; ! private final double correctionConstant; private final double correctionParam; *************** *** 51,55 **** params = _params; ocNames = _ocNames; ! correctionConstant = _correctionConstant; correctionParam = _correctionParam; --- 51,55 ---- params = _params; ocNames = _ocNames; ! correctionConstant = (double)_correctionConstant; correctionParam = _correctionParam; *************** *** 214,218 **** data[1] = pmap; data[2] = ocNames; ! data[3] = new Integer(correctionConstant); data[4] = new Double(correctionParam); return data; --- 214,218 ---- data[1] = pmap; data[2] = ocNames; ! data[3] = new Integer((int)correctionConstant); data[4] = new Double(correctionParam); return data; |
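The underlying pitfall, reduced to two lines: dividing one int by another truncates toward zero, so any ratio formed with an int correctionConstant silently collapses to 0 (or 1) instead of a fraction. The variable names below are illustrative, not the exact expression inside GISModel:

    public class IntDivisionSketch {
        public static void main(String[] args) {
            int numfeats = 3;
            int correctionConstant = 7;
            System.out.println(numfeats / correctionConstant);            // 0, int division truncates
            System.out.println(numfeats / (double) correctionConstant);   // 0.428571..., the intended ratio
        }
    }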
From: Jason B. <jas...@us...> - 2002-04-09 11:13:34
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv27381/src/java/opennlp/maxent Modified Files: GISTrainer.java Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** GISTrainer.java 8 Apr 2002 16:14:06 -0000 1.4 --- GISTrainer.java 9 Apr 2002 09:15:10 -0000 1.5 *************** *** 132,136 **** new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! pabi[TID].put(oid, pabi[TID].get(oid) + arg); return true; } --- 132,136 ---- new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! pabi[TID].adjustValue(oid, arg); return true; } *************** *** 180,184 **** new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! CFMOD += arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } --- 180,185 ---- new TIntDoubleProcedure() { public boolean execute(int oid, double arg) { ! CFMOD += ! arg * cfvals[TID].get(oid) * numTimesEventsSeen[TID]; return true; } *************** *** 357,363 **** for (int i=0; i<predkeys.length; i++) { OID = predkeys[i]; ! if (cfvals[TID].containsKey(OID)) { ! cfvals[TID].put(OID, cfvals[TID].get(OID) + 1); ! } else { cfvals[TID].put(OID, 1); pabi[TID].put(OID, 0.0); --- 358,362 ---- for (int i=0; i<predkeys.length; i++) { OID = predkeys[i]; ! if (!cfvals[TID].increment(OID)) { cfvals[TID].put(OID, 1); pabi[TID].put(OID, 0.0); |
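A tiny sketch of the pattern this commit adopts, assuming the gnu.trove package name for the bundled Trove 0.1.4 jar: adjustValue() bumps an existing entry in one call where the old code did a get() followed by a put():

    import gnu.trove.TIntDoubleHashMap;   // package name assumed for the bundled Trove

    public class AdjustValueSketch {
        public static void main(String[] args) {
            TIntDoubleHashMap params = new TIntDoubleHashMap();
            params.put(7, 0.5);                        // key 7 is just an example outcome id

            params.put(7, params.get(7) + 0.25);       // old style: two lookups plus a store
            params.adjustValue(7, 0.25);               // new style: one call does the same bump

            System.out.println(params.get(7));         // 1.0
        }
    }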
From: Jason B. <jas...@us...> - 2002-04-09 11:08:24
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv27381 Modified Files: CHANGES Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** CHANGES 8 Apr 2002 16:14:06 -0000 1.10 --- CHANGES 9 Apr 2002 09:15:08 -0000 1.11 *************** *** 2,5 **** --- 2,8 ---- _____ + Modified GISTrainer to use the new increment() and adjustValue() + methods available in Trove 0.1.4 hashmaps. (Jason) + Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses. The initial capacity is half the number of |
From: Jason B. <jas...@us...> - 2002-04-09 11:08:12
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv27381/lib Modified Files: LIBNOTES trove.jar Log Message: Made use of new increment() and adjustValue() methods available for Trove hashmaps. Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** LIBNOTES 8 Apr 2002 16:13:27 -0000 1.9 --- LIBNOTES 9 Apr 2002 09:15:08 -0000 1.10 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.3 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.4 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 Binary files /tmp/cvsetPal3 and /tmp/cvscfpul1 differ |
From: Jason B. <jas...@us...> - 2002-04-09 09:47:05
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv4172 Modified Files: CHANGES build.xml Log Message: Upgraded the version to 1.2.9 Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** CHANGES 9 Apr 2002 09:15:08 -0000 1.11 --- CHANGES 9 Apr 2002 09:46:59 -0000 1.12 *************** *** 1,3 **** ! 1.2.7 _____ --- 1,8 ---- ! 1.2.9 ! _____ ! ! ! ! 1.2.8 _____ Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** build.xml 14 Jan 2002 14:58:14 -0000 1.16 --- build.xml 9 Apr 2002 09:46:59 -0000 1.17 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.7"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.9"/> <property name="year" value="2002"/> *************** *** 131,136 **** <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${Name}/**" > ! <exclude name="${Name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> --- 131,136 ---- <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${name}/**" > ! <exclude name="${name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> |
From: Jason B. <jas...@us...> - 2002-04-08 16:14:13
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv7102/src/java/opennlp/maxent Modified Files: GISTrainer.java Log Message: Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses Index: GISTrainer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/GISTrainer.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** GISTrainer.java 27 Dec 2001 19:20:26 -0000 1.3 --- GISTrainer.java 8 Apr 2002 16:14:06 -0000 1.4 *************** *** 306,313 **** observedExpects = new TIntDoubleHashMap[numPreds]; for (PID=0; PID<numPreds; PID++) { ! params[PID] = new TIntDoubleHashMap(); ! modifiers[PID] = new TIntDoubleHashMap(); ! observedExpects[PID] = new TIntDoubleHashMap(); for (OID=0; OID<numOutcomes; OID++) { if (predCount[PID][OID] > 0) { --- 306,324 ---- observedExpects = new TIntDoubleHashMap[numPreds]; + int initialCapacity; + float loadFactor = (float)0.9; + if (numOutcomes < 3) { + initialCapacity = 2; + loadFactor = (float)1.0; + } else if (numOutcomes < 5) { + initialCapacity = 2; + } else { + initialCapacity = (int)numOutcomes/2; + } for (PID=0; PID<numPreds; PID++) { ! params[PID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! modifiers[PID] = new TIntDoubleHashMap(initialCapacity, loadFactor); ! observedExpects[PID] = ! new TIntDoubleHashMap(initialCapacity, loadFactor); for (OID=0; OID<numOutcomes; OID++) { if (predCount[PID][OID] > 0) { *************** *** 339,344 **** cfvals = new TIntIntHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(); ! pabi[TID] = new TIntDoubleHashMap(); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; --- 350,355 ---- cfvals = new TIntIntHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! cfvals[TID] = new TIntIntHashMap(initialCapacity, loadFactor); ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; *************** *** 381,385 **** pabi = new TIntDoubleHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; --- 392,396 ---- pabi = new TIntDoubleHashMap[numTokens]; for (TID=0; TID<numTokens; TID++) { ! pabi[TID] = new TIntDoubleHashMap(initialCapacity, loadFactor); for (int j=0; j<contexts[TID].length; j++) { PID = contexts[TID][j]; |
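The same sizing heuristic, shown on a plain java.util.HashMap so it stands alone: very small outcome sets get a tiny, fully loaded table, and larger ones get roughly half the outcome count as the initial capacity with a 0.9 load factor, which keeps the many per-predicate maps compact:

    import java.util.HashMap;

    public class CapacitySketch {
        // Pick an initial capacity and load factor the way this commit does
        // for GISTrainer's per-predicate hashmaps.
        public static HashMap newOutcomeMap(int numOutcomes) {
            int initialCapacity;
            float loadFactor = 0.9f;
            if (numOutcomes < 3) {
                initialCapacity = 2;
                loadFactor = 1.0f;
            } else if (numOutcomes < 5) {
                initialCapacity = 2;
            } else {
                initialCapacity = numOutcomes / 2;
            }
            return new HashMap(initialCapacity, loadFactor);
        }
    }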
From: Jason B. <jas...@us...> - 2002-04-08 16:14:12
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv7102 Modified Files: CHANGES Log Message: Set up the GISTrainer to use an initial capacity and load factor for the big hashmaps it uses Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** CHANGES 3 Jan 2002 16:14:41 -0000 1.9 --- CHANGES 8 Apr 2002 16:14:06 -0000 1.10 *************** *** 2,5 **** --- 2,8 ---- _____ + Set up the GISTrainer to use an initial capacity and load factor for + the big hashmaps it uses. The initial capacity is half the number of + outcomes, and the load factor is 0.9. (Jason) (opennlp.maxent.DataIndexer) Do not index events with 0 active features. |
From: Jason B. <jas...@us...> - 2002-04-08 16:13:33
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv6747 Modified Files: LIBNOTES trove.jar Log Message: Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** LIBNOTES 14 Jan 2002 14:58:15 -0000 1.8 --- LIBNOTES 8 Apr 2002 16:13:27 -0000 1.9 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.2 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.3 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 Binary files /tmp/cvsaat9na and /tmp/cvsA5a8Ob differ |
From: Jason B. <jas...@us...> - 2002-01-20 15:09:29
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv12828/lib Modified Files: trove.jar Log Message: Updated to v0.1.2 of trove. Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 Binary files /tmp/cvsh5f0uO and /tmp/cvsGbBVau differ |
From: Jason B. <jas...@us...> - 2002-01-14 14:58:19
Update of /cvsroot/maxent/maxent/lib In directory usw-pr-cvs1:/tmp/cvs-serv25510/lib Modified Files: LIBNOTES trove.jar Log Message: Upgraded to trove v0.1.2 and moved maxent to devel v1.2.7. Index: LIBNOTES =================================================================== RCS file: /cvsroot/maxent/maxent/lib/LIBNOTES,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** LIBNOTES 2002/01/02 20:00:39 1.7 --- LIBNOTES 2002/01/14 14:58:15 1.8 *************** *** 29,33 **** trove.jar ! GNU Trove, version 0.1.1 Homepage: http://trove4j.sf.net License: LGPL --- 29,33 ---- trove.jar ! GNU Trove, version 0.1.2 Homepage: http://trove4j.sf.net License: LGPL Index: trove.jar =================================================================== RCS file: /cvsroot/maxent/maxent/lib/trove.jar,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 Binary files /tmp/cvsVkUnLd and /tmp/cvsIOr0ui differ |
From: Jason B. <jas...@us...> - 2002-01-14 14:58:18
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv25510 Modified Files: build.xml Log Message: Upgraded to trove v0.1.2 and moved maxent to devel v1.2.7. Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** build.xml 2002/01/03 16:14:41 1.15 --- build.xml 2002/01/14 14:58:14 1.16 *************** *** 10,14 **** <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.6"/> <property name="year" value="2002"/> --- 10,14 ---- <property name="Name" value="Maxent"/> <property name="name" value="maxent"/> ! <property name="version" value="1.2.7"/> <property name="year" value="2002"/> |
From: Eric F. <er...@us...> - 2002-01-03 16:43:26
Update of /cvsroot/maxent/maxent/src/java/opennlp/maxent In directory usw-pr-cvs1:/tmp/cvs-serv11068/src/java/opennlp/maxent Modified Files: DataIndexer.java Log Message: bug fix: replace ComparableEvent[] array with an ArrayList so that we don't make assumptions about the size of the event index until we've filtered out events that have no active features. The native array approach was a problem inasmuch as it could contain null entries (for the dropped events) that would break the sorting routine. ArrayList avoids this pitfall by sorting just the parts of the underlying array that have entries. Index: DataIndexer.java =================================================================== RCS file: /cvsroot/maxent/maxent/src/java/opennlp/maxent/DataIndexer.java,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** DataIndexer.java 2002/01/03 14:34:29 1.7 --- DataIndexer.java 2002/01/03 16:43:23 1.8 *************** *** 59,63 **** TObjectIntHashMap predicateIndex; TLinkedList events; ! ComparableEvent[] eventsToCompare; predicateIndex = new TObjectIntHashMap(); --- 59,63 ---- TObjectIntHashMap predicateIndex; TLinkedList events; ! List eventsToCompare; predicateIndex = new TObjectIntHashMap(); *************** *** 90,114 **** * @since maxent 1.2.6 */ ! private void sortAndMerge(ComparableEvent[] eventsToCompare) { ! Arrays.sort(eventsToCompare); ! int numEvents = eventsToCompare.length; int numUniqueEvents = 1; // assertion: eventsToCompare.length >= 1 ! if (eventsToCompare.length <= 1) { return; // nothing to do; edge case (see assertion) } ! ComparableEvent ce = eventsToCompare[0]; for (int i=1; i<numEvents; i++) { ! if (ce.compareTo(eventsToCompare[i]) == 0) { ce.seen++; // increment the seen count ! eventsToCompare[i] = null; // kill the duplicate } else { ! ce = eventsToCompare[i]; // a new champion emerges... numUniqueEvents++; // increment the # of unique events } } ! System.out.println("done. Reduced " + eventsToCompare.length + " events to " + numUniqueEvents + "."); --- 90,116 ---- * @since maxent 1.2.6 */ ! private void sortAndMerge(List eventsToCompare) { ! Collections.sort(eventsToCompare); ! int numEvents = eventsToCompare.size(); int numUniqueEvents = 1; // assertion: eventsToCompare.length >= 1 ! if (numEvents <= 1) { return; // nothing to do; edge case (see assertion) } ! ComparableEvent ce = (ComparableEvent)eventsToCompare.get(0); for (int i=1; i<numEvents; i++) { ! ComparableEvent ce2 = (ComparableEvent)eventsToCompare.get(i); ! ! if (ce.compareTo(ce2) == 0) { ce.seen++; // increment the seen count ! eventsToCompare.set(i, null); // kill the duplicate } else { ! ce = ce2; // a new champion emerges... numUniqueEvents++; // increment the # of unique events } } ! System.out.println("done. Reduced " + numEvents + " events to " + numUniqueEvents + "."); *************** *** 118,122 **** for (int i = 0, j = 0; i<numEvents; i++) { ! ComparableEvent evt = eventsToCompare[i]; if (null == evt) { continue; // this was a dupe, skip over it. --- 120,124 ---- for (int i = 0, j = 0; i<numEvents; i++) { ! ComparableEvent evt = (ComparableEvent)eventsToCompare.get(i); if (null == evt) { continue; // this was a dupe, skip over it. *************** *** 168,173 **** } ! private ComparableEvent[] index(TLinkedList events, ! TObjectIntHashMap predicateIndex) { TObjectIntHashMap omap = new TObjectIntHashMap(); --- 170,175 ---- } ! private List index(TLinkedList events, ! 
TObjectIntHashMap predicateIndex) { TObjectIntHashMap omap = new TObjectIntHashMap(); *************** *** 175,179 **** int outcomeCount = 0; int predCount = 0; ! ComparableEvent[] eventsToCompare = new ComparableEvent[numEvents]; TIntArrayList indexedContext = new TIntArrayList(); --- 177,181 ---- int outcomeCount = 0; int predCount = 0; ! List eventsToCompare = new ArrayList(numEvents); TIntArrayList indexedContext = new TIntArrayList(); *************** *** 181,184 **** --- 183,187 ---- Event ev = (Event)events.removeFirst(); String[] econtext = ev.getContext(); + ComparableEvent ce; int predID, ocID; *************** *** 201,206 **** // drop events with no active features if (indexedContext.size() > 0) { ! eventsToCompare[eventIndex] = ! new ComparableEvent(ocID, indexedContext.toNativeArray()); } // recycle the TIntArrayList --- 204,209 ---- // drop events with no active features if (indexedContext.size() > 0) { ! ce = new ComparableEvent(ocID, indexedContext.toNativeArray()); ! eventsToCompare.add(ce); } // recycle the TIntArrayList |
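Why the switch from ComparableEvent[] to a List matters, shown with Integers standing in for events: a fixed-size array keeps null slots for the events that were dropped, and sorting Comparables through a null throws, whereas a list only ever contains the events that were actually added:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class SortNullsSketch {
        public static void main(String[] args) {
            Integer[] withGap = { new Integer(3), null, new Integer(1) };
            try {
                Arrays.sort(withGap);                   // what the old array-based code risked
            } catch (NullPointerException e) {
                System.out.println("sorting an array with a dropped-event gap fails: " + e);
            }

            List kept = new ArrayList();                // what the new code does
            kept.add(new Integer(3));
            kept.add(new Integer(1));                   // the dropped event is simply never added
            Collections.sort(kept);
            System.out.println(kept);                   // [1, 3]
        }
    }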
From: Jason B. <jas...@us...> - 2002-01-03 16:14:45
Update of /cvsroot/maxent/maxent In directory usw-pr-cvs1:/tmp/cvs-serv4931 Modified Files: CHANGES build.xml Log Message: Just some text modifications that I did while making the release and forgot to commit. Index: CHANGES =================================================================== RCS file: /cvsroot/maxent/maxent/CHANGES,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** CHANGES 2002/01/03 14:34:29 1.8 --- CHANGES 2002/01/03 16:14:41 1.9 *************** *** 39,42 **** --- 39,44 ---- 1.2.6 ----- + Summary: efficiency improvements for model training. + Removed Colt dependency in favor of GNU Trove. (Eric) *************** *** 52,56 **** There is still more to be done in this department, however. (Eric) ! The output directory is now "output" instead of "build". (Jason) 1.2.4 --- 54,59 ---- There is still more to be done in this department, however. (Eric) ! The output directory of the build structure is now "output" instead of ! "build". (Jason) 1.2.4 Index: build.xml =================================================================== RCS file: /cvsroot/maxent/maxent/build.xml,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** build.xml 2002/01/02 11:31:30 1.14 --- build.xml 2002/01/03 16:14:41 1.15 *************** *** 11,15 **** <property name="name" value="maxent"/> <property name="version" value="1.2.6"/> ! <property name="year" value="2001"/> <echo message="----------- ${Name} ${version} [${year}] ------------"/> --- 11,15 ---- <property name="name" value="maxent"/> <property name="version" value="1.2.6"/> ! <property name="year" value="2002"/> <echo message="----------- ${Name} ${version} [${year}] ------------"/> *************** *** 122,126 **** </addfiles> </jlink> - <delete file="${build.dir}/${name}-${DSTAMP}.jar" /> </target> --- 122,125 ---- *************** *** 132,137 **** <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${name}/**" > ! <exclude name="${name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> --- 131,136 ---- <tar tarfile="${name}-${version}-src.tar" basedir="../" ! includes="${Name}/**" > ! <exclude name="${Name}/docs/api/**"/> <exclude name="**/CVS"/> </tar> |
From: Jason B. <jas...@us...> - 2002-01-03 16:14:45
Update of /cvsroot/maxent/maxent/docs In directory usw-pr-cvs1:/tmp/cvs-serv4931/docs Modified Files: about.html index.html Log Message: Just some text modifications that I did while making the release and forgot to commit. Index: about.html =================================================================== RCS file: /cvsroot/maxent/maxent/docs/about.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** about.html 2001/10/30 09:52:44 1.1 --- about.html 2002/01/03 16:14:41 1.2 *************** *** 155,159 **** <h2>Authors</h2> ! <p>The opennlp.maxent package was built by <a href="http://www.cogsci.ed.ac.uk/~jmb/">Jason Baldridge</a>, <a href="http://www.cis.upenn.edu/~tsmorton/">Tom Morton</a>, and <a --- 155,159 ---- <h2>Authors</h2> ! <p>The opennlp.maxent package was originally built by <a href="http://www.cogsci.ed.ac.uk/~jmb/">Jason Baldridge</a>, <a href="http://www.cis.upenn.edu/~tsmorton/">Tom Morton</a>, and <a *************** *** 169,172 **** --- 169,176 ---- (POS tagger, end of sentence detector, tokenizer, name finder) possible! + </p> + + <p>Eric Friedman has been steadily improving the efficiency and design + of the package since version 1.2.0. </p> Index: index.html =================================================================== RCS file: /cvsroot/maxent/maxent/docs/index.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** index.html 2001/10/30 09:52:44 1.3 --- index.html 2002/01/03 16:14:41 1.4 *************** *** 41,46 **** <p> This web page contains some details about maximum entropy and using ! the opennlp.maxent package. It is updated periodically, but check out ! the <a href="https://sourceforge.net/project/?group_id=5961">Sourceforge page for Maxent</a> for the latest news. You can also ask questions and --- 41,46 ---- <p> This web page contains some details about maximum entropy and using ! the opennlp.maxent package. It is updated only periodically, so check ! out the <a href="https://sourceforge.net/project/?group_id=5961">Sourceforge page for Maxent</a> for the latest news. You can also ask questions and *************** *** 69,73 **** <h3> Email: <a href="mailto:jm...@co...">jm...@co...</a><br> ! 2001 October 29 <br> <br> <A href="http://sourceforge.net"> <IMG src="http://sourceforge.net/sflogo.php?group_id=5961&type=1" width="88" height="31" border="0"></A> <br> --- 69,73 ---- <h3> Email: <a href="mailto:jm...@co...">jm...@co...</a><br> ! 2002 January 02 <br> <br> <A href="http://sourceforge.net"> <IMG src="http://sourceforge.net/sflogo.php?group_id=5961&type=1" width="88" height="31" border="0"></A> <br> |