ZORE - Browse /SegTagParsing at SourceForge.net

Name	Modified	Size	Downloads / Week
readme_analyzer.txt	2014-12-17	3.5 kB	0
ZAnalyzer1.1.rar	2014-12-17	234.7 MB	0
Totals: 2 Items		234.7 MB	0

This directory provides a Java interface for Chinese word segmentation, POS tagging, and dependency parsing with zpar, using the POS tag set and dependency tag set of the Peking University Multi-view Treebank (PMT), on 64-bit Windows and Linux (Fedora).

The dll files were compiled with the cross-platform, open-source IDE "Code::Blocks" (version 13.12) and the MinGW64 C++ compiler, on Windows 8.1 and Linux (Fedora), respectively.

You may use the dlls for the Java interface directly, without compiling them yourself.

Using zpar in 64-bit Java (V1.7) on Windows:
(1)Trained models: One trained model, "model_parse_pmt1", for parsing and two trained models, "model_tag_pfr1" and "model_tag_pfr6", for joint segmentation and tagging are in the dir "model/".
(2)Dlls: Copy the dlls "cn_nlp_Parser.dll" and "cn_nlp_Tagger.dll", together with the two runtime dlls "libgcc_s_seh-1.dll" and "libstdc++-6.dll", from "dll/64/" to your Java project directory.
(3)Examples: Examples of using ZParser and ZTagger are given in the dir "src/cn". You can use the parser and tagger separately or jointly; refer to the usage shown in the examples.
(4)User dict for word segmentation and POS tagging: You can supply a user dictionary in the file "userdict.txt". In this file (UTF-8 encoding), each line contains a word and a POS tag separated by a tab. If you do not have a proper POS tag for a word, you may use the default tag "n".
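
The user dictionary format in (4) can be produced programmatically. The following is a minimal sketch, not part of ZAnalyzer: the class name UserDictWriter and its methods are hypothetical helpers introduced here for illustration. It only assumes what the readme states: UTF-8 encoding, one "word<TAB>tag" pair per line, and "n" as the fallback tag. The code stays within Java 7 APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not shipped with ZAnalyzer) for building userdict.txt. */
final class UserDictWriter {
    /** Default tag the readme suggests for words without a proper POS tag. */
    static final String DEFAULT_TAG = "n";

    /** True if the string contains a tab or line break, which would corrupt a line. */
    static boolean hasSeparator(String s) {
        return s.indexOf('\t') >= 0 || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
    }

    /** Formats one entry as "word<TAB>tag", falling back to the default tag. */
    static String formatEntry(String word, String tag) {
        if (word == null || word.isEmpty() || hasSeparator(word)) {
            throw new IllegalArgumentException("word must be non-empty, without tabs or line breaks");
        }
        String t = (tag == null || tag.trim().isEmpty()) ? DEFAULT_TAG : tag.trim();
        if (hasSeparator(t)) {
            throw new IllegalArgumentException("tag must not contain tabs or line breaks");
        }
        return word + "\t" + t;
    }

    /** Writes all entries to the given file in UTF-8, one entry per line. */
    static void write(Path file, Map<String, String> entries) throws IOException {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            lines.add(formatEntry(e.getKey(), e.getValue()));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<String, String>();
        entries.put("\u5206\u8bcd", "n");        // the word "fenci" with an explicit tag
        entries.put("\u8bed\u6599", null);       // the word "yuliao", no tag known: default "n"
        write(Paths.get("userdict.txt"), entries);
    }
}
```

Running main writes a two-line "userdict.txt" into the current directory; where the tagger expects to find that file is shown in the examples under "src/cn" and is not assumed here.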
Note: the tagger and parser cannot process files whose names contain Chinese characters.
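
To catch this restriction before calling the tagger or parser, a file name can be checked up front. The sketch below is an illustration only: FileNameCheck and its methods are hypothetical names, not part of ZAnalyzer. It assumes that "Chinese characters" means characters in the Unicode Han script, and that only the file name (not its parent directories) matters; the readme does not confirm either point, so a stricter check may be needed in practice.

```java
import java.io.File;

/** Hypothetical pre-flight check (not shipped with ZAnalyzer). */
final class FileNameCheck {
    /** True if the string contains at least one Han-script code point. */
    static boolean containsHan(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                return true;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return false;
    }

    /** True if the last component of the path (the file name) contains Han characters. */
    static boolean fileNameHasHan(String path) {
        return containsHan(new File(path).getName());
    }

    public static void main(String[] args) {
        for (String path : args) {
            if (fileNameHasHan(path)) {
                System.err.println("Rename before processing: " + path);
            }
        }
    }
}
```

Full-width punctuation is not classified as Han script, so this check does not flag it; rejecting every non-ASCII character instead would be the more conservative choice.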

About the models:
(1)The model "model_parser_arceager_mvt_origin_autopos" can be used for Chinese dependency parsing. 

If you use it, please cite the following paper:

@InProceedings{qiu-EtAl:2014:Coling2,
  author    = {Qiu, Likun  and  Zhang, Yue  and  Jin, Peng  and  Wang, Houfeng},
  title     = {Multi-view Chinese Treebanking},
  booktitle = {Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers},
  month     = {August},
  year      = {2014},
  address   = {Dublin, Ireland},
  publisher = {Dublin City University and Association for Computational Linguistics},
  pages     = {257--268},
  url       = {http://www.aclweb.org/anthology/C14-1026}
}
(2)The models "model_tag_science" and "model_tag_pfr6" are trained on the People's Daily Corpus of January 1998 plus a few sentences from the scientific domain, and on the People's Daily Corpus of January to June 2000, respectively.

If you use them in your paper, please cite the paper listed under (1) above (qiu-EtAl:2014:Coling2) together with the following paper:
@article{yu2003specification,
  title={Specification for corpus processing at Peking University: Word segmentation, {POS} tagging and phonetic notation},
  author={Yu, Shiwen and Duan, Huiming and Zhu, Xuefeng and Swen, Bin and Chang, Baobao},
  journal={Journal of {Chinese} Language and Computing},
  volume={13},
  number={2},
  pages={121--158},
  year={2003}
}

Source: readme_analyzer.txt, updated 2014-12-17