Home

Zhongkai Fu

wordseg is a word segmentation module implemented in C#. It segments text into tokens and labels each token's attributes according to its context and semantics, using forward maximum matching and CRF (conditional random field) algorithms.
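To illustrate the dictionary-based maximum matching step mentioned above, here is a minimal Python sketch of the general technique. It is not wordseg's actual C# implementation: the function name `forward_max_match`, the `toy_lexicon` contents, and the single-character fallback are all assumptions made for this example.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation: at each position, take the
    longest lexicon entry that starts there (hypothetical helper)."""
    if max_len is None:
        # Longest entry in the lexicon bounds the candidate window.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so unknown characters become one-character tokens.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon invented for this example only.
toy_lexicon = {"百度公司", "百度", "公司", "名字", "源于"}
print(forward_max_match("百度公司的名字源于", toy_lexicon))
# → ['百度公司', '的', '名字', '源于']
```

Pure maximum matching always prefers the longest dictionary entry, so it cannot resolve ambiguities that depend on context. That is the kind of case a statistical model such as CRF is meant to handle.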

The following sentences are examples of input to be segmented:
张晓晨和付仲恺一起坐在家(西坝河东里社区)里的沙发上看非诚勿扰。
百度公司的名字源于“众里寻他千百度”这诗句。

After wordseg segments the sentences above, the result for each sentence is as follows:
张晓晨[PER] 和 付仲恺[PER] 一起 坐 在 家 ( 西坝河东里社区[LOC] ) 里 的 沙发[PDT] 上 看 非 诚 勿扰 。
百度公司[ORG] 的 名字 源于 “ 众 里 寻 他 千百度 ” 这 诗句 。

In the output above, if a token has attributes, they are appended to the corresponding token inside "[]".
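For readers who want to consume this output in their own programs, the following Python sketch splits a result line into (token, attribute) pairs. It assumes, based only on the sample output above, that tokens are separated by whitespace and that an attribute is a run of uppercase letters in a trailing "[]". The name `parse_segmented` is hypothetical and is not part of wordseg.

```python
import re

# Token text followed by an optional trailing attribute such as [PER].
# The uppercase-only tag pattern is an assumption from the sample output.
TOKEN_RE = re.compile(r"^(?P<text>.+?)(?:\[(?P<tag>[A-Z]+)\])?$")


def parse_segmented(line):
    """Return a list of (token, attribute-or-None) pairs (hypothetical helper)."""
    pairs = []
    for item in line.split():
        match = TOKEN_RE.match(item)
        pairs.append((match.group("text"), match.group("tag")))
    return pairs


print(parse_segmented("百度公司[ORG] 的 名字 源于"))
# → [('百度公司', 'ORG'), ('的', None), ('名字', None), ('源于', None)]
```

Tokens without an attribute come back with `None`, so callers can filter for labeled tokens without re-parsing the brackets.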

Because wordseg uses a statistical model to segment text by context, for the same substring in different contexts, dif


Project Members: