From: <lor...@us...> - 2013-02-18 14:17:03
|
Revision: 3901 http://dl-learner.svn.sourceforge.net/dl-learner/?rev=3901&view=rev Author: lorenz_b Date: 2013-02-18 14:16:54 +0000 (Mon, 18 Feb 2013) Log Message: ----------- Removed Stanford models because they can be loaded via Maven now. Removed Paths: ------------- trunk/components-ext/src/main/resources/tbsl/models/README-Models.txt trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger.props trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger.props Deleted: trunk/components-ext/src/main/resources/tbsl/models/README-Models.txt =================================================================== --- trunk/components-ext/src/main/resources/tbsl/models/README-Models.txt 2013-02-18 14:14:10 UTC (rev 3900) +++ trunk/components-ext/src/main/resources/tbsl/models/README-Models.txt 2013-02-18 14:16:54 UTC (rev 3901) @@ -1,90 +0,0 @@ -Stanford POS Tagger, v. 3.0.2 - 2011-05-15. -Copyright (c) 2002-2011 The Board of Trustees of -The Leland Stanford Junior University. All Rights Reserved. - -This document contains (some) information about the models included in -this release and that may be downloaded for the POS tagger website at -http://nlp.stanford.edu/software/tagger.shtml . If you have downloaded -the full tagger, all of the models mentioned in this document are in the -downloaded package in the same directory as this readme. Otherwise, -included in the download are two -English taggers, and the other taggers may be downloaded from the -website. All taggers are accompanied by the props files used to create -them; please examine these files for more detailed information about the -creation of the taggers. - -For English, the bidirectional taggers are slightly more accurate, but -tag much more slowly; choose the appropriate tagger based on your -speed/performance needs. - -English taggers ---------------------------- -bidirectional-distsim-wsj-0-18.tagger -Trained on WSJ sections 0-18 using a bidirectional architecture and -including word shape and distributional similarity features. -Penn Treebank tagset. -Performance: -97.28% correct on WSJ 19-21 -(90.46% correct on unknown words) - -left3words-wsj-0-18.tagger -Trained on WSJ sections 0-18 using the left3words architecture and -includes word shape features. Penn tagset. -Performance: -96.97% correct on WSJ 19-21 -(88.85% correct on unknown words) - -left3words-distsim-wsj-0-18.tagger -Trained on WSJ sections 0-18 using the left3words architecture and -includes word shape and distributional similarity features. Penn tagset. -Performance: -97.01% correct on WSJ 19-21 -(89.81% correct on unknown words) - - -Chinese tagger ---------------------------- -chinese.tagger -Trained on a combination of Chinese Treebank texts from Chinese and Hong -Kong sources. -LDC Chinese Treebank POS tag set. -Performance: -94.13% on a combination of Chinese and Hong Kong texts -(78.92% on unknown words) - -Arabic tagger ---------------------------- -arabic-accurate.tagger -Trained on the *entire* ATB p1-3. -When trained on the train part of the ATB p1-3 split done for the 2005 -JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets -the following performance: -Performance: -96.50% on dev portion according to Diab split -(80.59% on unknown words) - -arabic-fast.tagger -4x speed improvement over "accurate". -Performance: -96.34% on dev portion according to Diab split -(80.28% on unknown words) - - -German tagger ---------------------------- -german-accurate.tagger -Trained on the first 80% of the Negra corpus, which uses the STTS tagset. -The Stuttgart-Tübingen Tagset (STTS) is a set of 54 tags for annotating -German text corpora with part-of-speech labels, which was jointly -developed by the Institut für maschinelle Sprachverarbeitung of the -University of Stuttgart and the Seminar für Sprachwissenschaft of the -University of Tübingen. See: -http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html -Performance: -96.90% on the first half of the remaining 20% of the Negra corpus (dev set) -(90.33% on unknown words) - -german-fast.tagger -8x speed improvement over "accurate". -Performance: -96.61% overall / 86.72% unknown. Deleted: trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger =================================================================== (Binary files differ) Deleted: trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger.props =================================================================== --- trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger.props 2013-02-18 14:14:10 UTC (rev 3900) +++ trunk/components-ext/src/main/resources/tbsl/models/bidirectional-distsim-wsj-0-18.tagger.props 2013-02-18 14:16:54 UTC (rev 3901) @@ -1,33 +0,0 @@ -## tagger training invoked at Fri Apr 15 01:00:51 PDT 2011 with arguments: - model = bidirectional-distsim-wsj-0-18.tagger - arch = bidirectional5words,naacl2003unknowns,wordshapes(-1,1),distsim(../data/pos-tagger/training/english/egw.bnc.200.pruned,-1,1),distsimconjunction(../data/pos-tagger/training/english/egw.bnc.200.pruned,-1,1) - trainFile = ../data/pos-tagger/training/english/train-wsj-0-18 - closedClassTags = - closedClassTagThreshold = 40 - curWordMinFeatureThresh = 2 - debug = false - debugPrefix = - tagSeparator = _ - encoding = UTF-8 - iterations = 100 - lang = english - learnClosedClassTags = false - minFeatureThresh = 2 - openClassTags = -rareWordMinFeatureThresh = 5 - rareWordThresh = 5 - search = owlqn - sgml = false - sigmaSquared = 0.5 - regL1 = 0.75 - tagInside = - tokenize = true - tokenizerFactory = - tokenizerOptions = - verbose = false - verboseResults = true - veryCommonWordThresh = 250 - xmlInput = - outputFile = - outputFormat = slashTags - outputFormatOptions = Deleted: trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger =================================================================== (Binary files differ) Deleted: trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger.props =================================================================== --- trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger.props 2013-02-18 14:14:10 UTC (rev 3900) +++ trunk/components-ext/src/main/resources/tbsl/models/left3words-wsj-0-18.tagger.props 2013-02-18 14:16:54 UTC (rev 3901) @@ -1,33 +0,0 @@ -## tagger training invoked at Fri Apr 15 07:35:09 PDT 2011 with arguments: - model = left3words-wsj-0-18.tagger - arch = left3words,naacl2003unknowns,wordshapes(-1,1) - trainFile = ../data/pos-tagger/training/english/train-wsj-0-18 - closedClassTags = - closedClassTagThreshold = 40 - curWordMinFeatureThresh = 2 - debug = false - debugPrefix = - tagSeparator = _ - encoding = UTF-8 - iterations = 100 - lang = english - learnClosedClassTags = false - minFeatureThresh = 2 - openClassTags = -rareWordMinFeatureThresh = 10 - rareWordThresh = 5 - search = owlqn - sgml = false - sigmaSquared = 0.0 - regL1 = 0.75 - tagInside = - tokenize = true - tokenizerFactory = - tokenizerOptions = - verbose = false - verboseResults = true - veryCommonWordThresh = 250 - xmlInput = - outputFile = - outputFormat = slashTags - outputFormatOptions = This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |