foray-commit Mailing List for FOray (Page 75)
Modular XSL-FO Implementation for Java.
Status: Alpha
Brought to you by:
victormote
From: <vic...@us...> - 2021-11-11 13:39:31
Revision: 12026
http://sourceforge.net/p/foray/code/12026
Author: victormote
Date: 2021-11-11 13:39:17 +0000 (Thu, 11 Nov 2021)
Log Message:
-----------
Normal dictionary editing. Converted all multi-word <w> elements to <phrase>.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
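
For readers who want to reproduce the kind of conversion the log message describes, here is a minimal sketch. It is not the tooling used for this commit, which the message does not describe. It assumes one entry per line in the form `<w><t>...</t></w>`, treats any whitespace inside `<t>` as the mark of a multi-word entry, and leaves entries with extra child elements (for example `<verb .../>`) untouched. The names `convert_line` and `convert_lines` are hypothetical.

```python
import re

# One-line <w> entry whose <t> text contains at least one whitespace character.
# Entries with additional children after </t> do not match and are left alone.
_MULTI_WORD = re.compile(r"^(\s*)<w><t>([^<]*\s[^<]*)</t></w>(\s*)$")


def convert_line(line: str) -> str:
    """Return the line with a multi-word <w> entry rewritten as <phrase>."""
    match = _MULTI_WORD.match(line)
    if match is None:
        return line  # single word, hyphen/equals compound, or not an entry
    indent, text, trailing = match.groups()
    return f"{indent}<phrase><t>{text}</t></phrase>{trailing}"


def convert_lines(lines):
    """Apply convert_line to every line, preserving order."""
    return [convert_line(line) for line in lines]


if __name__ == "__main__":
    sample = [
        "<w><t>a</t></w>",
        "<w><t>a cap-pel-la</t></w>",
        "<w><t>a=plen-ty</t></w>",
    ]
    for converted in convert_lines(sample):
        print(converted)
```

Entries joined with `=` (such as `a=plen-ty`) contain no space, so under this assumption they stay as `<w>` elements, which matches what the diff below shows.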
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-10 22:44:12 UTC (rev 12025)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-11 13:39:17 UTC (rev 12026)
@@ -54,12 +54,12 @@
-->
<w><t>a</t></w>
-<w><t>a cap-pel-la</t></w>
-<w><t>a for-ti-o-ri</t></w>
-<w><t>a go-go</t></w>
-<w><t>a pos-te-ri-o-ri</t></w>
-<w><t>a pri-o-ri</t></w>
-<w><t>a tem-po</t></w>
+<phrase><t>a cap-pel-la</t></phrase>
+<phrase><t>a for-ti-o-ri</t></phrase>
+<phrase><t>a go-go</t></phrase>
+<phrase><t>a pos-te-ri-o-ri</t></phrase>
+<phrase><t>a pri-o-ri</t></phrase>
+<phrase><t>a tem-po</t></phrase>
<w><t>a=plen-ty</t></w>
<w><t>a-a</t></w>
<w><t>Aa-chen</t></w>
@@ -73,15 +73,15 @@
<w><t>Aar-gau</t></w>
<w><t>Aar-hus</t></w>
<w><t>Aar-on</t></w>
-<w><t>Aa-ron's beard</t></w>
-<w><t>Aa-ron's rod</t></w>
+<phrase><t>Aa-ron's beard</t></phrase>
+<phrase><t>Aa-ron's rod</t></phrase>
<w><t>Aa-ron's=beard</t></w>
<w><t>Aa-ron-ic</t></w>
<w><t>Aa-ron-i-cal</t></w>
<w><t>Aa-ron-ite</t></w>
<w><t>Aar-on-it-ic</t></w>
-<w><t>ab in-i-ti-o</t></w>
-<w><t>ab o-vo</t></w>
+<phrase><t>ab in-i-ti-o</t></phrase>
+<phrase><t>ab o-vo</t></phrase>
<w><t>ab-a</t></w>
<w><t>a-ba</t></w>
<w><t>a-bac</t></w>
@@ -162,7 +162,7 @@
<w><t>Abbe-vill-i-an</t></w>
<w><t>Ab-bey</t></w>
<w><t>ab-bey</t></w>
-<w><t>Ab-bey The-a-tre</t></w>
+<phrase><t>Ab-bey The-a-tre</t></phrase>
<w><t>ab-bey-stead</t></w>
<w><t>ab-bey-stede</t></w>
<w><t>Ab-bie</t></w>
@@ -206,7 +206,7 @@
<w><t>ab-duce</t></w>
<w><t>ab-duced</t></w>
<w><t>ab-du-cens</t></w>
-<w><t>ab-du-cens nerve</t></w>
+<phrase><t>ab-du-cens nerve</t></phrase>
<w><t>ab-du-cent</t></w>
<w><t>ab-du-cen-tes</t></w>
<w><t>ab-duc-ing</t></w>
@@ -214,7 +214,7 @@
<w><t>ab-duc-tion</t></w>
<w><t>ab-duc-tor</t></w>
<w><t>Ab-dul=A-ziz</t></w>
-<w><t>Ab-dul=Ha-mid II</t></w>
+<phrase><t>Ab-dul=Ha-mid II</t></phrase>
<w><t>a-beam</t></w>
<w><t>a-be-ce-dar-i-a</t></w>
<w><t>a-be-ce-dar-i-an</t></w>
@@ -227,7 +227,7 @@
<w><t>Ab-e-lard</t></w>
<w><t>a-bele</t></w>
<w><t>A-be-li-an</t></w>
-<w><t>A-be-li-an group</t></w>
+<phrase><t>A-be-li-an group</t></phrase>
<w><t>a-bel-mosk</t></w>
<w><t>A-be-nez-ra</t></w>
<w><t>Ab-e-o-ku-ta</t></w>
@@ -235,7 +235,7 @@
<w><t>Ab-er-crom-bie</t></w>
<w><t>Ab-er-dare</t></w>
<w><t>Ab-er-deen</t></w>
-<w><t>Ab-er-deen An-gus</t></w>
+<phrase><t>Ab-er-deen An-gus</t></phrase>
<w><t>Ab-er-deen-shire</t></w>
<w><t>Ab-er-do-ni-an</t></w>
<w><t>A-ber-glau-be</t></w>
@@ -283,7 +283,7 @@
<w><t>A-bie</t></w>
<w><t>ab-i-ent</t></w>
<w><t>ab-i-e-tate</t></w>
-<w><t>ab-i-et-ic ac-id</t></w>
+<phrase><t>ab-i-et-ic ac-id</t></phrase>
<w><t>ab-i-gail</t></w>
<w><t>Ab-i-gail</t></w>
<w><t>A-bi-hu</t></w>
@@ -346,14 +346,14 @@
<w><t>ab-la-tion</t></w>
<w><t>ab-la-ti-val</t></w>
<w><t>ab-la-tive</t></w>
-<w><t>ab-la-tive ab-so-lute</t></w>
+<phrase><t>ab-la-tive ab-so-lute</t></phrase>
<w><t>ab-la-tor</t></w>
<w><t>ab-laut</t></w>
<w><t>a-blaze</t></w>
<w><t>a-ble</t></w>
-<w><t>a-ble rat-ing</t></w>
+<phrase><t>a-ble rat-ing</t></phrase>
<w><t>a-ble=bod-ied</t></w>
-<w><t>a-ble=bod-ied sea-man</t></w>
+<phrase><t>a-ble=bod-ied sea-man</t></phrase>
<w><t>a-ble=bod-ied-ness</t></w>
<w><t>ab-le-gate</t></w>
<w><t>a-bleph-a-rous</t></w>
@@ -381,9 +381,9 @@
<w><t>ab-ne-ga-tion</t></w>
<w><t>ab-ne-ga-tor</t></w>
<w><t>Ab-ner</t></w>
-<w><t>Ab-ney lev-el</t></w>
+<phrase><t>Ab-ney lev-el</t></phrase>
<w><t>ab-nor-mal</t></w>
-<w><t>ab-nor-mal psy-chol-o-gy</t></w>
+<phrase><t>ab-nor-mal psy-chol-o-gy</t></phrase>
<w><t>ab-nor-mal-cy</t></w>
<w><t>ab-nor-mal-ise</t></w>
<w><t>ab-nor-mal-ised</t></w>
@@ -422,7 +422,7 @@
<w><t>ab-o-ma-sum</t></w>
<w><t>ab-o-ma-sus</t></w>
<w><t>a-bom-i-na-ble</t></w>
-<w><t>a-bom-i-na-ble snow-man</t></w>
+<phrase><t>a-bom-i-na-ble snow-man</t></phrase>
<w><t>a-bom-i-na-ble-ness</t></w>
<w><t>a-bom-i-na-bly</t></w>
<w><t>a-bom-i-nate</t></w>
@@ -465,7 +465,7 @@
<w><t>a-bout=face</t></w>
<w><t>a-bout=ship</t></w>
<w><t>a-bove</t></w>
-<w><t>a-bove board</t></w>
+<phrase><t>a-bove board</t></phrase>
<w><t>a-bove-board</t></w>
<w><t>a-bove-ground</t></w>
<w><t>a-box</t></w>
@@ -524,7 +524,7 @@
<w><t>ab-scis-sae</t></w>
<w><t>ab-scis-sas</t></w>
<w><t>ab-scis-sion</t></w>
-<w><t>ab-scis-sion lay-er</t></w>
+<phrase><t>ab-scis-sion lay-er</t></phrase>
<w><t>ab-scond</t></w>
<w><t>ab-scond-er</t></w>
<w><t>Ab-se-con</t></w>
@@ -531,13 +531,13 @@
<w><t>ab-seil</t></w>
<w><t>ab-sence</t></w>
<w><t>ab-sent</t></w>
-<w><t>ab-sent with-out leave</t></w>
+<phrase><t>ab-sent with-out leave</t></phrase>
<w><t>ab-sent=mind-ed</t></w>
<w><t>ab-sent=mind-ed-ness</t></w>
<w><t>ab-sen-ta-tion</t></w>
-<w><t>ab-sen-te re-o</t></w>
+<phrase><t>ab-sen-te re-o</t></phrase>
<w><t>ab-sen-tee</t></w>
-<w><t>ab-sen-tee land-lord</t></w>
+<phrase><t>ab-sen-tee land-lord</t></phrase>
<w><t>ab-sen-tee-ism</t></w>
<w><t>ab-sent-er</t></w>
<w><t>ab-sen-ti-a</t></w>
@@ -550,22 +550,22 @@
<w><t>ab-sin-thi-al</t></w>
<w><t>ab-sin-thi-an</t></w>
<w><t>ab-sinth-ism</t></w>
-<w><t>ab-sit o-men</t></w>
+<phrase><t>ab-sit o-men</t></phrase>
<w><t>Ab-so-lute</t></w>
<w><t>ab-so-lute</t></w>
-<w><t>ab-so-lute al-co-hol</t></w>
-<w><t>ab-so-lute ceil-ing</t></w>
-<w><t>ab-so-lute hu-mid-i-ty</t></w>
-<w><t>ab-so-lute mag-ni-tude</t></w>
-<w><t>ab-so-lute ma-jor-i-ty</t></w>
-<w><t>ab-so-lute mon-ar-chy</t></w>
-<w><t>ab-so-lute mu-sic</t></w>
-<w><t>ab-so-lute pitch</t></w>
-<w><t>ab-so-lute tem-per-a-ture</t></w>
-<w><t>ab-so-lute u-nit</t></w>
-<w><t>ab-so-lute val-ue</t></w>
-<w><t>ab-so-lute vis-cos-i-ty</t></w>
-<w><t>ab-so-lute ze-ro</t></w>
+<phrase><t>ab-so-lute al-co-hol</t></phrase>
+<phrase><t>ab-so-lute ceil-ing</t></phrase>
+<phrase><t>ab-so-lute hu-mid-i-ty</t></phrase>
+<phrase><t>ab-so-lute mag-ni-tude</t></phrase>
+<phrase><t>ab-so-lute ma-jor-i-ty</t></phrase>
+<phrase><t>ab-so-lute mon-ar-chy</t></phrase>
+<phrase><t>ab-so-lute mu-sic</t></phrase>
+<phrase><t>ab-so-lute pitch</t></phrase>
+<phrase><t>ab-so-lute tem-per-a-ture</t></phrase>
+<phrase><t>ab-so-lute u-nit</t></phrase>
+<phrase><t>ab-so-lute val-ue</t></phrase>
+<phrase><t>ab-so-lute vis-cos-i-ty</t></phrase>
+<phrase><t>ab-so-lute ze-ro</t></phrase>
<w><t>ab-so-lute-ly</t></w>
<w><t>ab-so-lute-ness</t></w>
<w><t>ab-so-lu-tion</t></w>
@@ -591,7 +591,7 @@
<w><t>ab-sor-be-fa-cient</t></w>
<w><t>ab-sorb-en-cy</t></w>
<w><t>ab-sorb-ent</t></w>
-<w><t>ab-sorb-ent cot-ton</t></w>
+<phrase><t>ab-sorb-ent cot-ton</t></phrase>
<w><t>ab-sorb-er</t></w>
<w><t>ab-sorb-ing</t></w>
<w><t>ab-sorb-ing-ly</t></w>
@@ -599,7 +599,7 @@
<w><t>ab-sorp-ti-om-e-ter</t></w>
<w><t>ab-sorp-ti-o-met-ric</t></w>
<w><t>ab-sorp-tion</t></w>
-<w><t>ab-sorp-tion spec-trum</t></w>
+<phrase><t>ab-sorp-tion spec-trum</t></phrase>
<w><t>ab-sorp-tive</t></w>
<w><t>ab-sorp-tive-ness</t></w>
<w><t>ab-sorp-tiv-i-ty</t></w>
@@ -623,9 +623,9 @@
<w><t>ab-sti-nent</t></w>
<w><t>ab-sti-nent-ly</t></w>
<w><t>ab-stract</t></w>
-<w><t>ab-stract ex-pres-sion-ism</t></w>
-<w><t>ab-stract noun</t></w>
-<w><t>ab-stract of ti-tle</t></w>
+<phrase><t>ab-stract ex-pres-sion-ism</t></phrase>
+<phrase><t>ab-stract noun</t></phrase>
+<phrase><t>ab-stract of ti-tle</t></phrase>
<w><t>ab-stract-ed</t></w>
<w><t>ab-stract-ed-ly</t></w>
<w><t>ab-stract-ed-ness</t></w>
@@ -649,12 +649,12 @@
<w><t>ab-surd-ly</t></w>
<w><t>ab-surd-ness</t></w>
<w><t>Ab-syr-tus</t></w>
-<w><t>Ab-u Dha-bi</t></w>
-<w><t>Ab-u Sim-bel</t></w>
+<phrase><t>Ab-u Dha-bi</t></phrase>
+<phrase><t>Ab-u Sim-bel</t></phrase>
<w><t>A-bu=Bakr</t></w>
<w><t>A-bu=Bekr</t></w>
<w><t>A-bu-kir</t></w>
-<w><t>A-bu-kir Bay</t></w>
+<phrase><t>A-bu-kir Bay</t></phrase>
<w><t>A-bul-fe-da</t></w>
<w><t>a-bu-li-a</t></w>
<w><t>a-bu-lic</t></w>
@@ -696,7 +696,7 @@
<w><t>a-bys-sal</t></w>
<w><t>Ab-ys-sin-i-a</t></w>
<w><t>Ab-ys-sin-i-an</t></w>
-<w><t>Ab-ys-sin-i-an cat</t></w>
+<phrase><t>Ab-ys-sin-i-an cat</t></phrase>
<w><t>Ab-é-lard</t></w>
<w><t>ac=glob-u-lin</t></w>
<w><t>A-ca-cal-lis</t></w>
@@ -705,7 +705,7 @@
<w><t>ac-a-dem-i-a</t></w>
<w><t>ac-a-de-mi-a</t></w>
<w><t>ac-a-dem-ic</t></w>
-<w><t>ac-a-dem-ic dress</t></w>
+<phrase><t>ac-a-dem-ic dress</t></phrase>
<w><t>ac-a-dem-i-cal</t></w>
<w><t>ac-a-dem-i-cal-ly</t></w>
<w><t>ac-a-dem-i-cals</t></w>
@@ -757,7 +757,7 @@
<w><t>ac-a-rine</t></w>
<w><t>A-car-nan</t></w>
<w><t>ac-a-roid</t></w>
-<w><t>ac-a-roid gum</t></w>
+<phrase><t>ac-a-roid gum</t></phrase>
<w><t>ac-a-rol-o-gist</t></w>
<w><t>ac-a-rol-o-gy</t></w>
<w><t>ac-a-ro-pho-bi-a</t></w>
@@ -819,8 +819,8 @@
<w><t>ac-cept-er</t></w>
<w><t>ac-cep-tor</t></w>
<w><t>ac-cess</t></w>
-<w><t>ac-cess road</t></w>
-<w><t>ac-cess time</t></w>
+<phrase><t>ac-cess road</t></phrase>
+<phrase><t>ac-cess time</t></phrase>
<w><t>ac-ces-sa-ri-ly</t></w>
<w><t>ac-ces-sa-ri-ness</t></w>
<w><t>ac-ces-sa-ry</t></w>
@@ -828,7 +828,7 @@
<w><t>ac-ces-si-ble</t></w>
<w><t>ac-ces-si-bly</t></w>
<w><t>ac-ces-sion</t></w>
-<w><t>ac-ces-sion num-ber</t></w>
+<phrase><t>ac-ces-sion num-ber</t></phrase>
<w><t>ac-ces-sion-al</t></w>
<w><t>ac-ces-so-ri-al</t></w>
<w><t>ac-ces-so-ri-i</t></w>
@@ -839,14 +839,14 @@
<w><t>ac-ces-so-rized</t></w>
<w><t>ac-ces-so-riz-ing</t></w>
<w><t>ac-ces-so-ry</t></w>
-<w><t>ac-ces-so-ry fruit</t></w>
-<w><t>ac-ces-so-ry nerve</t></w>
+<phrase><t>ac-ces-so-ry fruit</t></phrase>
+<phrase><t>ac-ces-so-ry nerve</t></phrase>
<w><t>ac-ciac-ca-tu-ra</t></w>
<w><t>ac-ciac-catu-ras</t></w>
<w><t>ac-ciac-ca-tu-re</t></w>
<w><t>ac-ci-dence</t></w>
<w><t>ac-ci-dent</t></w>
-<w><t>ac-ci-dent in-sur-ance</t></w>
+<phrase><t>ac-ci-dent in-sur-ance</t></phrase>
<w><t>ac-ci-dent=prone</t></w>
<w><t>ac-ci-den-tal</t></w>
<w><t>ac-ci-den-tal-ism</t></w>
@@ -891,9 +891,9 @@
<w><t>ac-com-mo-dat-ing</t></w>
<w><t>ac-com-mo-dat-ing-ly</t></w>
<w><t>ac-com-mo-da-tion</t></w>
-<w><t>ac-com-mo-da-tion ad-dress</t></w>
-<w><t>ac-com-mo-da-tion bill</t></w>
-<w><t>ac-com-mo-da-tion lad-der</t></w>
+<phrase><t>ac-com-mo-da-tion ad-dress</t></phrase>
+<phrase><t>ac-com-mo-da-tion bill</t></phrase>
+<phrase><t>ac-com-mo-da-tion lad-der</t></phrase>
<w><t>ac-com-mo-da-tion-al</t></w>
<w><t>ac-com-mo-da-tive</t></w>
<w><t>ac-com-mo-da-tive-ness</t></w>
@@ -924,7 +924,7 @@
<w><t>ac-cord-ing</t></w>
<w><t>ac-cord-ing-ly</t></w>
<w><t>ac-cor-di-on</t></w>
-<w><t>ac-cor-di-on pleats</t></w>
+<phrase><t>ac-cor-di-on pleats</t></phrase>
<w><t>ac-cor-di-on-ist</t></w>
<w><t>ac-cost</t></w>
<w><t>ac-cost-a-ble</t></w>
@@ -934,10 +934,10 @@
<w><t>ac-coucheuse</t></w>
<w><t>ac-cou-cheuse</t></w>
<w><t>ac-count</t></w>
-<w><t>ac-count day</t></w>
-<w><t>ac-count for</t></w>
-<w><t>ac-count pay-a-ble</t></w>
-<w><t>ac-count re-ceiv-a-ble</t></w>
+<phrase><t>ac-count day</t></phrase>
+<phrase><t>ac-count for</t></phrase>
+<phrase><t>ac-count pay-a-ble</t></phrase>
+<phrase><t>ac-count re-ceiv-a-ble</t></phrase>
<w><t>ac-count-a-bil-i-ty</t></w>
<w><t>ac-count-a-ble</t></w>
<w><t>ac-count-a-ble-ness</t></w>
@@ -990,7 +990,7 @@
<w><t>ac-cu-mu-lat-ed</t></w>
<w><t>ac-cu-mu-lat-ing</t></w>
<w><t>ac-cu-mu-la-tion</t></w>
-<w><t>ac-cu-mu-la-tion point</t></w>
+<phrase><t>ac-cu-mu-la-tion point</t></phrase>
<w><t>ac-cu-mu-la-tive</t></w>
<w><t>ac-cu-mu-la-tive-ly</t></w>
<w><t>ac-cu-mu-la-tive-ness</t></w>
@@ -1067,14 +1067,14 @@
<w><t>ac-et-an-i-lide</t></w>
<w><t>ac-et-a-nis-i-dine</t></w>
<w><t>ac-e-tate</t></w>
-<w><t>ac-e-tate ray-on</t></w>
+<phrase><t>ac-e-tate ray-on</t></phrase>
<w><t>ac-e-tat-ed</t></w>
<w><t>ac-e-ta-tion</t></w>
<w><t>ac-et-a-zol-a-mide</t></w>
<w><t>A-ce-tes</t></w>
<w><t>a-ce-tic</t></w>
-<w><t>a-ce-tic ac-id</t></w>
-<w><t>a-ce-tic an-hy-dride</t></w>
+<phrase><t>a-ce-tic ac-id</t></phrase>
+<phrase><t>a-ce-tic an-hy-dride</t></phrase>
<w><t>a-cet-i-fi-ca-tion</t></w>
<w><t>a-cet-i-fied</t></w>
<w><t>a-cet-i-fi-er</t></w>
@@ -1091,7 +1091,7 @@
<w><t>ac-e-to-met-ri-cal-ly</t></w>
<w><t>ac-e-tom-e-try</t></w>
<w><t>ac-e-tone</t></w>
-<w><t>ac-e-tone bod-y</t></w>
+<phrase><t>ac-e-tone bod-y</t></phrase>
<w><t>ac-e-ton-ic</t></w>
<w><t>ac-e-to-ni-trile</t></w>
<w><t>ac-e-to-phe-net-i-din</t></w>
@@ -1124,13 +1124,13 @@
<w><t>a-cet-y-liz-er</t></w>
<w><t>a-cet-y-liz-ing</t></w>
<w><t>a-ce-tyl-meth-yl-car-bi-nol</t></w>
-<w><t>ac-e-tyl-sal-i-cyl-ic ac-id</t></w>
+<phrase><t>ac-e-tyl-sal-i-cyl-ic ac-id</t></phrase>
<w><t>ace-y=deuc-y</t></w>
<w><t>A-chab</t></w>
<w><t>A-chad</t></w>
<w><t>A-chae-a</t></w>
<w><t>A-chae-an</t></w>
-<w><t>A-chae-an League</t></w>
+<phrase><t>A-chae-an League</t></phrase>
<w><t>A-chae-me-nes</t></w>
<w><t>Ach-ae-me-ni-an</t></w>
<w><t>A-chae-me-nid</t></w>
@@ -1161,18 +1161,18 @@
<w><t>a-chieve</t></w>
<w><t>a-chieved</t></w>
<w><t>a-chieve-ment</t></w>
-<w><t>a-chieve-ment age</t></w>
-<w><t>a-chieve-ment quo-tient</t></w>
-<w><t>a-chieve-ment test</t></w>
+<phrase><t>a-chieve-ment age</t></phrase>
+<phrase><t>a-chieve-ment quo-tient</t></phrase>
+<phrase><t>a-chieve-ment test</t></phrase>
<w><t>a-chiev-er</t></w>
<w><t>a-chiev-ing</t></w>
<w><t>a-chi-la-ry</t></w>
<w><t>Ach-ill</t></w>
-<w><t>Ach-ill Is-land</t></w>
+<phrase><t>Ach-ill Is-land</t></phrase>
<w><t>Ach-il-le-an</t></w>
<w><t>A-chil-les</t></w>
-<w><t>A-chil-les heel</t></w>
-<w><t>A-chil-les ten-don</t></w>
+<phrase><t>A-chil-les heel</t></phrase>
+<phrase><t>A-chil-les ten-don</t></phrase>
<w><t>A-chim-a-as</t></w>
<w><t>A-chim-e-lech</t></w>
<w><t>a-chim-e-nes</t></w>
@@ -1200,8 +1200,8 @@
<w><t>ach-ro-mat</t></w>
<w><t>ach-ro-mate</t></w>
<w><t>ach-ro-mat-ic</t></w>
-<w><t>ach-ro-mat-ic col-or</t></w>
-<w><t>ach-ro-mat-ic lens</t></w>
+<phrase><t>ach-ro-mat-ic col-or</t></phrase>
+<phrase><t>ach-ro-mat-ic lens</t></phrase>
<w><t>ach-ro-mat-i-cal-ly</t></w>
<w><t>ach-ro-ma-tic-i-ty</t></w>
<w><t>a-chro-ma-tin</t></w>
@@ -1231,11 +1231,11 @@
<w><t>a-cic-u-lat-ed</t></w>
<w><t>a-cic-u-lum</t></w>
<w><t>ac-id</t></w>
-<w><t>ac-id drop</t></w>
-<w><t>ac-id rock</t></w>
-<w><t>ac-id soil</t></w>
-<w><t>ac-id test</t></w>
-<w><t>ac-id val-ue</t></w>
+<phrase><t>ac-id drop</t></phrase>
+<phrase><t>ac-id rock</t></phrase>
+<phrase><t>ac-id soil</t></phrase>
+<phrase><t>ac-id test</t></phrase>
+<phrase><t>ac-id val-ue</t></phrase>
<w><t>ac-id=fast</t></w>
<w><t>ac-id=fast-ness</t></w>
<w><t>ac-id=form-ing</t></w>
@@ -1267,7 +1267,7 @@
<w><t>ac-i-do-phile</t></w>
<w><t>ac-i-do-phil-ic</t></w>
<w><t>ac-i-doph-i-lus</t></w>
-<w><t>ac-i-doph-i-lus milk</t></w>
+<phrase><t>ac-i-doph-i-lus milk</t></phrase>
<w><t>ac-i-do-sis</t></w>
<w><t>ac-i-dot-ic</t></w>
<w><t>a-cid-u-lant</t></w>
@@ -1313,7 +1313,7 @@
<w><t>ac-lau-rin</t></w>
<w><t>ac-le</t></w>
<w><t>a-cleis-to-car-di-a</t></w>
-<w><t>a-clin-ic line</t></w>
+<phrase><t>a-clin-ic line</t></phrase>
<w><t>ac-maes-the-sia</t></w>
<w><t>ac-me</t></w>
<w><t>ac-mes-the-sia</t></w>
@@ -1344,9 +1344,9 @@
<w><t>a-co-ni-tum</t></w>
<w><t>acor-a-ble</t></w>
<w><t>a-corn</t></w>
-<w><t>a-corn bar-na-cle</t></w>
-<w><t>a-corn valve</t></w>
-<w><t>a-corn worm</t></w>
+<phrase><t>a-corn bar-na-cle</t></phrase>
+<phrase><t>a-corn valve</t></phrase>
+<phrase><t>a-corn worm</t></phrase>
<w><t>a-corned</t></w>
<w><t>a-cos-mism</t></w>
<w><t>a-cos-mist</t></w>
@@ -1360,9 +1360,9 @@
<w><t>a-cous-mas</t></w>
<w><t>a-cous-ma-ta</t></w>
<w><t>a-cous-tic</t></w>
-<w><t>a-cous-tic fea-ture</t></w>
-<w><t>a-cous-tic nerve</t></w>
-<w><t>a-cous-tic pho-net-ics</t></w>
+<phrase><t>a-cous-tic fea-ture</t></phrase>
+<phrase><t>a-cous-tic nerve</t></phrase>
+<phrase><t>a-cous-tic pho-net-ics</t></phrase>
<w><t>a-cous-ti-cal</t></w>
<w><t>a-cous-ti-cal-ly</t></w>
<w><t>ac-ous-ti-cian</t></w>
@@ -1384,8 +1384,8 @@
<w><t>ac-quir-a-ble</t></w>
<w><t>ac-quire</t></w>
<w><t>ac-quired</t></w>
-<w><t>ac-quired char-ac-ter-is-tic</t></w>
-<w><t>ac-quired taste</t></w>
+<phrase><t>ac-quired char-ac-ter-is-tic</t></phrase>
+<phrase><t>ac-quired taste</t></phrase>
<w><t>ac-quire-ment</t></w>
<w><t>ac-quir-er</t></w>
<w><t>ac-quir-ing</t></w>
@@ -1413,7 +1413,7 @@
<w><t>a-crid-i-ty</t></w>
<w><t>ac-rid-ly</t></w>
<w><t>ac-ri-fla-vine</t></w>
-<w><t>ac-ri-fla-vine hy-dro-chlor-ide</t></w>
+<phrase><t>ac-ri-fla-vine hy-dro-chlor-ide</t></phrase>
<w><t>Ac-ri-lan</t></w>
<w><t>ac-ri-mo-ni-ous</t></w>
<w><t>ac-ri-mo-ni-ous-ly</t></w>
@@ -1503,14 +1503,14 @@
<w><t>ac-ryl-al-de-hyde</t></w>
<w><t>ac-ry-late</t></w>
<w><t>a-cryl-ic</t></w>
-<w><t>a-cryl-ic ac-id</t></w>
-<w><t>a-cryl-ic fi-bre</t></w>
-<w><t>a-cryl-ic res-in</t></w>
+<phrase><t>a-cryl-ic ac-id</t></phrase>
+<phrase><t>a-cryl-ic fi-bre</t></phrase>
+<phrase><t>a-cryl-ic res-in</t></phrase>
<w><t>ac-ry-lo-ni-trile</t></w>
<w><t>ac-ryl-yl</t></w>
<w><t>ac-ry-lyl</t></w>
<w><t>act</t></w>
-<w><t>act of con-tri-tion</t></w>
+<phrase><t>act of con-tri-tion</t></phrase>
<w><t>Ac-ta</t></w>
<w><t>act-a-bil-i-ty</t></w>
<w><t>act-a-ble</t></w>
@@ -1526,11 +1526,11 @@
<w><t>ac-tin-ic</t></w>
<w><t>ac-tin-i-cal-ly</t></w>
<w><t>ac-ti-nide</t></w>
-<w><t>ac-ti-nide se-ries</t></w>
+<phrase><t>ac-ti-nide se-ries</t></phrase>
<w><t>ac-tin-i-form</t></w>
<w><t>ac-tin-ism</t></w>
<w><t>ac-tin-i-um</t></w>
-<w><t>ac-tin-i-um se-ries</t></w>
+<phrase><t>ac-tin-i-um se-ries</t></phrase>
<w><t>ac-ti-no-ba-cil-li</t></w>
<w><t>ac-ti-no-bac-il-lo-sis</t></w>
<w><t>ac-ti-no-bac-il-lot-ic</t></w>
@@ -1569,10 +1569,10 @@
<w><t>ac-ti-no-u-ra-ni-um</t></w>
<w><t>ac-ti-no-zo-an</t></w>
<w><t>ac-tion</t></w>
-<w><t>ac-tion paint-ing</t></w>
-<w><t>ac-tion po-ten-tial</t></w>
-<w><t>ac-tion re-play</t></w>
-<w><t>ac-tion sta-tions</t></w>
+<phrase><t>ac-tion paint-ing</t></phrase>
+<phrase><t>ac-tion po-ten-tial</t></phrase>
+<phrase><t>ac-tion re-play</t></phrase>
+<phrase><t>ac-tion sta-tions</t></phrase>
<w><t>ac-tion-a-ble</t></w>
<w><t>ac-tion-a-bly</t></w>
<w><t>ac-tion-less</t></w>
@@ -1580,16 +1580,16 @@
<w><t>Ac-ti-um</t></w>
<w><t>ac-ti-vate</t></w>
<w><t>ac-ti-vat-ed</t></w>
-<w><t>ac-ti-vat-ed a-lu-mi-na</t></w>
-<w><t>ac-ti-vat-ed car-bon</t></w>
-<w><t>ac-ti-vat-ed sludge</t></w>
+<phrase><t>ac-ti-vat-ed a-lu-mi-na</t></phrase>
+<phrase><t>ac-ti-vat-ed car-bon</t></phrase>
+<phrase><t>ac-ti-vat-ed sludge</t></phrase>
<w><t>ac-ti-vat-ing</t></w>
<w><t>ac-ti-va-tion</t></w>
<w><t>ac-ti-va-tor</t></w>
<w><t>ac-tive</t></w>
-<w><t>ac-tive du-ty</t></w>
-<w><t>ac-tive list</t></w>
-<w><t>ac-tive serv-ice</t></w>
+<phrase><t>ac-tive du-ty</t></phrase>
+<phrase><t>ac-tive list</t></phrase>
+<phrase><t>ac-tive serv-ice</t></phrase>
<w><t>ac-tive-ly</t></w>
<w><t>ac-tive-ness</t></w>
<w><t>ac-tiv-ism</t></w>
@@ -1651,8 +1651,8 @@
<w><t>ac-u-sec-tor</t></w>
<w><t>a-cut-ance</t></w>
<w><t>a-cute</t></w>
-<w><t>a-cute ac-cent</t></w>
-<w><t>a-cute arch</t></w>
+<phrase><t>a-cute ac-cent</t></phrase>
+<phrase><t>a-cute arch</t></phrase>
<w><t>a-cute-ly</t></w>
<w><t>a-cute-ness</t></w>
<w><t>a-cu-ti-lin-gual</t></w>
@@ -1664,12 +1664,12 @@
<w><t>ac-yl-at-ing</t></w>
<w><t>ac-yl-a-tion</t></w>
<w><t>a-cyl-o-in</t></w>
-<w><t>ad ho-mi-nem</t></w>
-<w><t>ad in-fi-ni-tum</t></w>
-<w><t>ad in-ter-im</t></w>
-<w><t>ad li-tem</t></w>
-<w><t>ad nau-se-am</t></w>
-<w><t>ad va-lo-rem</t></w>
+<phrase><t>ad ho-mi-nem</t></phrase>
+<phrase><t>ad in-fi-ni-tum</t></phrase>
+<phrase><t>ad in-ter-im</t></phrase>
+<phrase><t>ad li-tem</t></phrase>
+<phrase><t>ad nau-se-am</t></phrase>
+<phrase><t>ad va-lo-rem</t></phrase>
<w><t>ad=lib</t></w>
<w><t>A-da</t></w>
<w><t>A-da-bel</t></w>
@@ -1686,7 +1686,7 @@
<w><t>A-dal</t></w>
<w><t>Ad-al-bert</t></w>
<w><t>Ad-am</t></w>
-<w><t>Ad-am's ap-ple</t></w>
+<phrase><t>Ad-am's ap-ple</t></phrase>
<w><t>Ad-am's=nee-dle</t></w>
<w><t>Ad-am=and=Eve</t></w>
<w><t>Ad-a-ma</t></w>
@@ -1701,7 +1701,7 @@
<w><t>Ad-am-ite</t></w>
<w><t>Ad-am-it-ic</t></w>
<w><t>Ad-ams</t></w>
-<w><t>Ad-ams=Stokes syn-drome</t></w>
+<phrase><t>Ad-ams=Stokes syn-drome</t></phrase>
<w><t>ad-ams-ite</t></w>
<w><t>A-da-na</t></w>
<w><t>A-da-pa</t></w>
@@ -1716,19 +1716,19 @@
<w><t>a-dapt-er</t></w>
<w><t>a-dap-tion</t></w>
<w><t>a-dap-tive</t></w>
-<w><t>a-dap-tive ra-di-a-tion</t></w>
+<phrase><t>a-dap-tive ra-di-a-tion</t></phrase>
<w><t>a-dap-tive-ly</t></w>
<w><t>a-dap-tive-ness</t></w>
<w><t>a-dap-tor</t></w>
<w><t>A-dar</t></w>
-<w><t>A-dar She-ni</t></w>
+<phrase><t>A-dar She-ni</t></phrase>
<w><t>a-dat</t></w>
<w><t>ad-ax-i-al</t></w>
-<w><t>add</t></w>
+<w><t>add</t><verb regular-root="true"/></w>
<w><t>add-a-ble</t></w>
<w><t>Ad-dams</t></w>
<w><t>ad-dax</t></w>
-<w><t>add-ed sixth</t></w>
+<phrase><t>add-ed sixth</t></phrase>
<w><t>add-ed-ly</t></w>
<w><t>ad-dend</t></w>
<w><t>ad-den-dum</t></w>
@@ -1744,11 +1744,11 @@
<w><t>ad-dic-tion</t></w>
<w><t>ad-dic-tive</t></w>
<w><t>Ad-die</t></w>
-<w><t>add-ing ma-chine</t></w>
+<phrase><t>add-ing ma-chine</t></phrase>
<w><t>Ad-ding-ton</t></w>
-<w><t>Ad-dis Ab-a-ba</t></w>
+<phrase><t>Ad-dis Ab-a-ba</t></phrase>
<w><t>Ad-di-son</t></w>
-<w><t>Ad-di-son's dis-ease</t></w>
+<phrase><t>Ad-di-son's dis-ease</t></phrase>
<w><t>Ad-di-so-ni-an</t></w>
<w><t>ad-dit-a-ment</t></w>
<w><t>ad-dit-a-men-ta-ry</t></w>
@@ -1823,7 +1823,7 @@
<w><t>ad-e-no-sar-co-mas</t></w>
<w><t>ad-e-no-sar-co-ma-ta</t></w>
<w><t>a-den-o-sine</t></w>
-<w><t>a-den-o-sine tri-phos-phate</t></w>
+<phrase><t>a-den-o-sine tri-phos-phate</t></phrase>
<w><t>ad-e-no-vi-rus</t></w>
<w><t>ad-e-no-vi-rus-es</t></w>
<w><t>ad-e-nyl-py-ro-phos-phate</t></w>
@@ -1896,11 +1896,11 @@
<w><t>ad-i-po-pex-ic</t></w>
<w><t>ad-i-po-pex-is</t></w>
<w><t>ad-i-pose</t></w>
-<w><t>ad-i-pose fin</t></w>
+<phrase><t>ad-i-pose fin</t></phrase>
<w><t>ad-i-pose-ness</t></w>
<w><t>ad-i-pos-i-ty</t></w>
<w><t>Ad-i-ron-dack</t></w>
-<w><t>Ad-i-ron-dack Moun-tains</t></w>
+<phrase><t>Ad-i-ron-dack Moun-tains</t></phrase>
<w><t>Ad-i-ron-dacks</t></w>
<w><t>ad-it</t></w>
<w><t>A-dit-ya</t></w>
@@ -1952,14 +1952,14 @@
<w><t>ad-jus-tor</t></w>
<w><t>ad-ju-tan-cy</t></w>
<w><t>ad-ju-tant</t></w>
-<w><t>ad-ju-tant bird</t></w>
-<w><t>ad-ju-tant gen-er-al</t></w>
+<phrase><t>ad-ju-tant bird</t></phrase>
+<phrase><t>ad-ju-tant gen-er-al</t></phrase>
<w><t>ad-ju-vant</t></w>
<w><t>Ad-lai</t></w>
<w><t>Ad-ler</t></w>
<w><t>Ad-le-ri-an</t></w>
<w><t>Ad-ley</t></w>
-<w><t>A-d-lie Coast</t></w>
+<phrase><t>A-d-lie Coast</t></phrase>
<w><t>Ad-mah</t></w>
<w><t>ad-man</t></w>
<w><t>ad-mass</t></w>
@@ -1995,11 +1995,11 @@
<w><t>Ad-mi-ral</t></w>
<w><t>ad-mi-ral-ship</t></w>
<w><t>ad-mi-ral-ty</t></w>
-<w><t>Ad-mi-ral-ty Board</t></w>
-<w><t>Ad-mi-ral-ty House</t></w>
-<w><t>Ad-mi-ral-ty Is-lands</t></w>
-<w><t>Ad-mi-ral-ty mile</t></w>
-<w><t>Ad-mi-ral-ty Range</t></w>
+<phrase><t>Ad-mi-ral-ty Board</t></phrase>
+<phrase><t>Ad-mi-ral-ty House</t></phrase>
+<phrase><t>Ad-mi-ral-ty Is-lands</t></phrase>
+<phrase><t>Ad-mi-ral-ty mile</t></phrase>
+<phrase><t>Ad-mi-ral-ty Range</t></phrase>
<w><t>ad-mi-ra-tion</t></w>
<w><t>ad-mi-ra-tive</t></w>
<w><t>ad-mi-ra-tive-ly</t></w>
@@ -2041,7 +2041,7 @@
<w><t>ad-noun</t></w>
<w><t>a-do</t></w>
<w><t>a-do-be</t></w>
-<w><t>a-do-be flat</t></w>
+<phrase><t>a-do-be flat</t></phrase>
<w><t>ad-o-les-cence</t></w>
<w><t>ad-o-les-cent</t></w>
<w><t>ad-o-les-cent-ly</t></w>
@@ -2097,8 +2097,8 @@
<w><t>A-dras-tos</t></w>
<w><t>A-dras-tus</t></w>
<w><t>ad-re-nal</t></w>
-<w><t>ad-re-nal gland</t></w>
-<w><t>ad-re-nal in-suf-fi-cien-cy</t></w>
+<phrase><t>ad-re-nal gland</t></phrase>
+<phrase><t>ad-re-nal in-suf-fi-cien-cy</t></phrase>
<w><t>ad-re-nal-ec-to-mize</t></w>
<w><t>ad-re-nal-ec-to-mized</t></w>
<w><t>ad-re-nal-ec-to-miz-ing</t></w>
@@ -2111,18 +2111,18 @@
<w><t>a-dre-nine</t></w>
<w><t>a-dre-no-cor-ti-co-troph-ic</t></w>
<w><t>a-dre-no-cor-ti-co-trop-ic</t></w>
-<w><t>a-dre-no-cor-ti-co-trop-ic hor-mone</t></w>
+<phrase><t>a-dre-no-cor-ti-co-trop-ic hor-mone</t></phrase>
<w><t>ad-res</t></w>
<w><t>A-dres-tus</t></w>
<w><t>a-dret</t></w>
<w><t>A-dri-aen</t></w>
<w><t>A-dri-an</t></w>
-<w><t>A-dri-an IV</t></w>
+<phrase><t>A-dri-an IV</t></phrase>
<w><t>A-dri-an-o-ple</t></w>
<w><t>A-dri-an-op-o-lis</t></w>
<w><t>A-dri-a-nop-o-lis</t></w>
<w><t>A-dri-at-ic</t></w>
-<w><t>A-dri-at-ic Sea</t></w>
+<phrase><t>A-dri-at-ic Sea</t></phrase>
<w><t>A-dri-enne</t></w>
<w><t>a-drift</t></w>
<w><t>a-droit</t></w>
@@ -2142,7 +2142,7 @@
<w><t>ad-sorp-tion</t></w>
<w><t>ad-sorp-tive</t></w>
<w><t>ad-sorp-tive-ly</t></w>
-<w><t>ad-su-ki bean</t></w>
+<phrase><t>ad-su-ki bean</t></phrase>
<w><t>ad-sum</t></w>
<w><t>ad-u-la-res-cence</t></w>
<w><t>ad-u-la-res-cent</t></w>
@@ -2185,10 +2185,10 @@
<w><t>adv</t></w>
<w><t>Ad-vai-ta</t></w>
<w><t>ad-vance</t></w>
-<w><t>ad-vance guard</t></w>
-<w><t>ad-vance poll</t></w>
+<phrase><t>ad-vance guard</t></phrase>
+<phrase><t>ad-vance poll</t></phrase>
<w><t>ad-vanced</t></w>
-<w><t>ad-vanced gas=cooled re-ac-tor</t></w>
+<phrase><t>ad-vanced gas=cooled re-ac-tor</t></phrase>
<w><t>ad-vance-ment</t></w>
<w><t>ad-vanc-er</t></w>
<w><t>ad-vanc-es</t></w>
@@ -2205,7 +2205,7 @@
<w><t>ad-ve-nae</t></w>
<w><t>Ad-vent</t></w>
<w><t>ad-vent</t></w>
-<w><t>Ad-vent Sun-day</t></w>
+<phrase><t>Ad-vent Sun-day</t></phrase>
<w><t>Ad-vent-ism</t></w>
<w><t>Ad-vent-ist</t></w>
<w><t>ad-ven-ti-ti-a</t></w>
@@ -2216,7 +2216,7 @@
<w><t>ad-ven-tive</t></w>
<w><t>ad-ven-tive-ly</t></w>
<w><t>ad-ven-ture</t></w>
-<w><t>ad-ven-ture play-ground</t></w>
+<phrase><t>ad-ven-ture play-ground</t></phrase>
<w><t>ad-ven-tured</t></w>
<w><t>ad-ven-ture-ful</t></w>
<w><t>ad-ven-tur-er</t></w>
@@ -2240,7 +2240,7 @@
<w><t>ad-ver-sa-tive</t></w>
<w><t>ad-ver-sa-tive-ly</t></w>
<w><t>ad-verse</t></w>
-<w><t>ad-verse pos-ses-sion</t></w>
+<phrase><t>ad-verse pos-ses-sion</t></phrase>
<w><t>ad-verse-ly</t></w>
<w><t>ad-verse-ness</t></w>
<w><t>ad-ver-si-ty</t></w>
@@ -2285,12 +2285,12 @@
<w><t>ad-vo-cat-ing</t></w>
<w><t>ad-vo-ca-tion</t></w>
<w><t>ad-voc-a-to-ry</t></w>
-<w><t>ad-vo-ca-tus di-ab-o-li</t></w>
+<phrase><t>ad-vo-ca-tus di-ab-o-li</t></phrase>
<w><t>ad-vow-son</t></w>
<w><t>advt</t></w>
<w><t>A-dy-ge</t></w>
<w><t>A-dy-gei</t></w>
-<w><t>A-dy-gei Au-ton-o-mous Re-gion</t></w>
+<phrase><t>A-dy-gei Au-ton-o-mous Re-gion</t></phrase>
<w><t>A-dy-ghe</t></w>
<w><t>ad-y-na-mi-a</t></w>
<w><t>ad-y-nam-ic</t></w>
@@ -2299,7 +2299,7 @@
<w><t>adze</t></w>
<w><t>A-dzhar</t></w>
<w><t>A-dzhar-i-stan</t></w>
-<w><t>ad-zu-ki bean</t></w>
+<phrase><t>ad-zu-ki bean</t></phrase>
<w><t>Ae-ac-i-des</t></w>
<w><t>Ae-a-cus</t></w>
<w><t>Ae-ae-a</t></w>
@@ -2324,8 +2324,8 @@
<w><t>Ae-gae-on</t></w>
<w><t>Ae-ga-tes</t></w>
<w><t>Ae-ge-an</t></w>
-<w><t>Ae-ge-an Is-lands</t></w>
-<w><t>Ae-ge-an Sea</t></w>
+<phrase><t>Ae-ge-an Is-lands</t></phrase>
+<phrase><t>Ae-ge-an Sea</t></phrase>
<w><t>Ae-ge-ri-a</t></w>
<w><t>Ae-ges-ta</t></w>
<w><t>Ae-ge-us</t></w>
@@ -2355,7 +2355,7 @@
<w><t>A-el-la</t></w>
<w><t>A-el-lo</t></w>
<w><t>Ae-ne-as</t></w>
-<w><t>Ae-ne-as Sil-vi-us</t></w>
+<phrase><t>Ae-ne-as Sil-vi-us</t></phrase>
<w><t>Ae-ne-id</t></w>
<w><t>A-e-ne-o-lith-ic</t></w>
<w><t>a-e-ne-ous</t></w>
@@ -2362,8 +2362,8 @@
<w><t>Ae-ni-us</t></w>
<w><t>Ae-o-li-a</t></w>
<w><t>Ae-o-li-an</t></w>
-<w><t>ae-o-li-an harp</t></w>
-<w><t>Ae-o-li-an Is-lands</t></w>
+<phrase><t>ae-o-li-an harp</t></phrase>
+<phrase><t>Ae-o-li-an Is-lands</t></phrase>
<w><t>Ae-ol-ic</t></w>
<w><t>Ae-o-li-des</t></w>
<w><t>ae-ol-i-pile</t></w>
@@ -2388,9 +2388,9 @@
<w><t>aer-en-chy-ma</t></w>
<w><t>Aer-i-a</t></w>
<w><t>aer-i-al</t></w>
-<w><t>aer-i-al lad-der</t></w>
-<w><t>aer-i-al per-spec-tive</t></w>
-<w><t>aer-i-al ping-pong</t></w>
+<phrase><t>aer-i-al lad-der</t></phrase>
+<phrase><t>aer-i-al per-spec-tive</t></phrase>
+<phrase><t>aer-i-al ping-pong</t></phrase>
<w><t>aer-i-al-ist</t></w>
<w><t>aer-i-al-i-ty</t></w>
<w><t>aer-i-al-ly</t></w>
@@ -2404,7 +2404,7 @@
<w><t>aer-i-fy</t></w>
<w><t>aer-i-fy-ing</t></w>
<w><t>aer-o</t></w>
-<w><t>aer-o en-gine</t></w>
+<phrase><t>aer-o en-gine</t></phrase>
<w><t>aer-o-bac-te-ri-o-log-i-cal</t></w>
<w><t>aer-o-bac-te-ri-o-log-i-cal-ly</t></w>
<w><t>aer-o-bac-te-ri-ol-o-gist</t></w>
@@ -2478,7 +2478,7 @@
<w><t>aer-o-naut</t></w>
<w><t>aer-o-naut-ic</t></w>
<w><t>aer-o-nau-ti-cal</t></w>
-<w><t>aer-o-nau-ti-cal en-gi-neer-ing</t></w>
+<phrase><t>aer-o-nau-ti-cal en-gi-neer-ing</t></phrase>
<w><t>aer-o-nau-ti-cal-ly</t></w>
<w><t>aer-o-naut-ics</t></w>
<w><t>aer-o-neu-ro-sis</t></w>
@@ -2498,8 +2498,8 @@
<w><t>aer-o-pho-tog-ra-phy</t></w>
<w><t>aer-o-phyte</t></w>
<w><t>aer-o-plane</t></w>
-<w><t>aer-o-plane cloth</t></w>
-<w><t>aer-o-plane spin</t></w>
+<phrase><t>aer-o-plane cloth</t></phrase>
+<phrase><t>aer-o-plane spin</t></phrase>
<w><t>aer-o-plank-ton</t></w>
<w><t>aer-o-pulse</t></w>
<w><t>aer-o-scep-sis</t></w>
@@ -2587,7 +2587,7 @@
<w><t>af-fa-bly</t></w>
<w><t>af-fair</t></w>
<w><t>af-faire</t></w>
-<w><t>af-faire d'hon-neur</t></w>
+<phrase><t>af-faire d'hon-neur</t></phrase>
<w><t>af-fairs</t></w>
<w><t>af-fect</t></w>
<w><t>af-fec-ta-tion</t></w>
@@ -2597,11 +2597,11 @@
<w><t>af-fect-er</t></w>
<w><t>af-fect-ing</t></w>
<w><t>af-fect-ing-ly</t></w>
-<w><t>af-fec-tion</t></w>
-<w><t>af-fec-tion-al</t></w>
+<w><t>af-fec-tion</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>af-fec-tion-al</t><adjective extensible="false"/></w>
<w><t>af-fec-tion-al-ly</t></w>
-<w><t>af-fec-tion-ate</t></w>
-<w><t>af-fec-tion-ate-ly</t></w>
+<w><t>af-fec-tion-ate</t><adjective extensible="false"/></w>
+<w><t>af-fec-tion-ate-ly</t><adverb/></w>
<w><t>af-fec-tion-ate-ness</t></w>
<w><t>af-fec-tive</t></w>
<w><t>af-fec-tiv-i-ty</t></w>
@@ -2621,8 +2621,8 @@
<w><t>af-fil-i-at-ed</t></w>
<w><t>af-fil-i-at-ing</t></w>
<w><t>af-fil-i-a-tion</t></w>
-<w><t>af-fil-i-a-tion or-der</t></w>
-<w><t>af-fil-i-a-tion pro-ceed-ings</t></w>
+<phrase><t>af-fil-i-a-tion or-der</t></phrase>
+<phrase><t>af-fil-i-a-tion pro-ceed-ings</t></phrase>
<w><t>af-fi-nal</t></w>
<w><t>af-fine</t></w>
<w><t>af-fined</t></w>
@@ -2688,7 +2688,7 @@
<w><t>af-fu-sion</t></w>
<w><t>af-ghan</t></w>
<w><t>Af-ghan</t></w>
-<w><t>Af-ghan hound</t></w>
+<phrase><t>Af-ghan hound</t></phrase>
<w><t>af-ghan-ets</t></w>
<w><t>Af-ghan-i</t></w>
<w><t>af-ghan-i</t></w>
@@ -2719,9 +2719,9 @@
<w><t>Af-ric</t></w>
<w><t>Af-ri-ca</t></w>
<w><t>Af-ri-can</t></w>
-<w><t>Af-ri-can lil-y</t></w>
-<w><t>Af-ri-can ma-hog-a-ny</t></w>
-<w><t>Af-ri-can vi-o-let</t></w>
+<phrase><t>Af-ri-can lil-y</t></phrase>
+<phrase><t>Af-ri-can ma-hog-a-ny</t></phrase>
+<phrase><t>Af-ri-can vi-o-let</t></phrase>
<w><t>Af-ri-can-der</t></w>
<w><t>Af-ri-can-der-ism</t></w>
<w><t>Af-ri-can-ise</t></w>
@@ -2783,7 +2783,7 @@
<w><t>af-ter-sen-sa-tion</t></w>
<w><t>af-ter-shaft</t></w>
<w><t>af-ter-shaft-ed</t></w>
-<w><t>af-ter-shave lo-tion</t></w>
+<phrase><t>af-ter-shave lo-tion</t></phrase>
<w><t>af-ter-shock</t></w>
<w><t>af-ter-taste</t></w>
<w><t>af-ter-thought</t></w>
@@ -2797,8 +2797,8 @@
<w><t>aft-most</t></w>
<w><t>A-fyon</t></w>
<w><t>a-ga</t></w>
-<w><t>A-ga Khan</t></w>
-<w><t>A-ga Khan IV</t></w>
+<phrase><t>A-ga Khan</t></phrase>
+<phrase><t>A-ga Khan IV</t></phrase>
<w><t>Ag-a-bus</t></w>
<w><t>Ag-a-cles</t></w>
<w><t>A-ga-dir</t></w>
@@ -2847,7 +2847,7 @@
<w><t>A-gas-tya</t></w>
<w><t>ag-a-ta</t></w>
<w><t>ag-ate</t></w>
-<w><t>ag-ate line</t></w>
+<phrase><t>ag-ate line</t></phrase>
<w><t>ag-ate-like</t></w>
<w><t>ag-ate-ware</t></w>
<w><t>Ag-a-tha</t></w>
@@ -2867,9 +2867,9 @@
<w><t>ag-ba</t></w>
<w><t>agcy</t></w>
<w><t>age</t></w>
-<w><t>age har-den-ing</t></w>
-<w><t>age of con-sent</t></w>
-<w><t>Age of Rea-son</t></w>
+<phrase><t>age har-den-ing</t></phrase>
+<phrase><t>age of con-sent</t></phrase>
+<phrase><t>Age of Rea-son</t></phrase>
<w><t>age=old</t></w>
<w><t>a-ged</t></w>
<w><t>a-ged-ly</t></w>
@@ -2893,8 +2893,8 @@
<w><t>A-ge-nois</t></w>
<w><t>A-ge-nor</t></w>
<w><t>a-gent</t></w>
-<w><t>a-gent of pro-duc-tion</t></w>
-<w><t>a-gent pro-vo-ca-teur</t></w>
+<phrase><t>a-gent of pro-duc-tion</t></phrase>
+<phrase><t>a-gent pro-vo-ca-teur</t></phrase>
<w><t>a-gent=gen-er-al</t></w>
<w><t>a-gen-tial</t></w>
<w><t>a-gen-ti-val</t></w>
@@ -3044,7 +3044,7 @@
<w><t>ag-nos-tic</t></w>
<w><t>ag-nos-ti-cal-ly</t></w>
<w><t>ag-nos-ti-cism</t></w>
-<w><t>Ag-nus De-i</t></w>
+<phrase><t>Ag-nus De-i</t></phrase>
<w><t>a-go</t></w>
<w><t>a-gog</t></w>
<w><t>a-gog-ics</t></w>
@@ -3054,7 +3054,7 @@
<w><t>a-gone</t></w>
<w><t>a-go-nes</t></w>
<w><t>a-gon-ic</t></w>
-<w><t>a-gon-ic line</t></w>
+<phrase><t>a-gon-ic line</t></phrase>
<w><t>ag-o-nise</t></w>
<w><t>ag-o-nised</t></w>
<w><t>ag-o-nis-ing</t></w>
@@ -3068,7 +3068,7 @@
<w><t>ag-o-niz-ing</t></w>
<w><t>ag-o-niz-ing-ly</t></w>
<w><t>ag-o-ny</t></w>
-<w><t>ag-o-ny col-umn</t></w>
+<phrase><t>ag-o-ny col-umn</t></phrase>
<w><t>ag-o-ra</t></w>
<w><t>a-go-ra</t></w>
<w><t>ag-o-rae</t></w>
@@ -3209,7 +3209,7 @@
<w><t>a-hue-hue-te</t></w>
<w><t>a-hull</t></w>
<w><t>a-hun-gered</t></w>
-<w><t>A-hu-ra Maz-da</t></w>
+<phrase><t>A-hu-ra Maz-da</t></phrase>
<w><t>A-huz-zath</t></w>
<w><t>Ah-vaz</t></w>
<w><t>Ah-ve-nan-maa</t></w>
@@ -3272,31 +3272,31 @@
<w><t>Ai-nu</t></w>
<w><t>air</t></w>
<w><t>A-ir</t></w>
-<w><t>air a-lert</t></w>
-<w><t>air blad-der</t></w>
-<w><t>air chief mar-shal</t></w>
-<w><t>air com-mo-dore</t></w>
-<w><t>air con-di-tion-ing</t></w>
-<w><t>air cor-ri-dor</t></w>
-<w><t>air cov-er</t></w>
-<w><t>air cur-tain</t></w>
-<w><t>air cush-ion</t></w>
-<w><t>air cyl-in-der</t></w>
-<w><t>air em-bo-lism</t></w>
-<w><t>air host-ess</t></w>
-<w><t>air jack-et</t></w>
-<w><t>air let-ter</t></w>
-<w><t>air mar-shal</t></w>
-<w><t>Air Of-fic-er</t></w>
-<w><t>air pock-et</t></w>
-<w><t>air pow-er</t></w>
-<w><t>air ri-fle</t></w>
-<w><t>air sta-tion</t></w>
-<w><t>air ter-mi-nal</t></w>
-<w><t>air traf-fic</t></w>
-<w><t>air tur-bine</t></w>
-<w><t>air ves-i-cle</t></w>
-<w><t>air vice=mar-shal</t></w>
+<phrase><t>air a-lert</t></phrase>
+<phrase><t>air blad-der</t></phrase>
+<phrase><t>air chief mar-shal</t></phrase>
+<phrase><t>air com-mo-dore</t></phrase>
+<phrase><t>air con-di-tion-ing</t></phrase>
+<phrase><t>air cor-ri-dor</t></phrase>
+<phrase><t>air cov-er</t></phrase>
+<phrase><t>air cur-tain</t></phrase>
+<phrase><t>air cush-ion</t></phrase>
+<phrase><t>air cyl-in-der</t></phrase>
+<phrase><t>air em-bo-lism</t></phrase>
+<phrase><t>air host-ess</t></phrase>
+<phrase><t>air jack-et</t></phrase>
+<phrase><t>air let-ter</t></phrase>
+<phrase><t>air mar-shal</t></phrase>
+<phrase><t>Air Of-fic-er</t></phrase>
+<phrase><t>air pock-et</t></phrase>
+<phrase><t>air pow-er</t></phrase>
+<phrase><t>air ri-fle</t></phrase>
+<phrase><t>air sta-tion</t></phrase>
+<phrase><t>air ter-mi-nal</t></phrase>
+<phrase><t>air traf-fic</t></phrase>
+<phrase><t>air tur-bine</t></phrase>
+<phrase><t>air ves-i-cle</t></phrase>
+<phrase><t>air vice=mar-shal</t></phrase>
<w><t>air=breath-er</t></w>
<w><t>air=con-di-tion</t></w>
<w><t>air=hard-en-ing</t></w>
@@ -3303,10 +3303,10 @@
<w><t>air=in-take</t></w>
<w><t>air=mind-ed</t></w>
<w><t>air=mind-ed-ness</t></w>
-<w><t>air=raid ward-en</t></w>
-<w><t>air=sea res-cue</t></w>
+<phrase><t>air=raid ward-en</t></phrase>
+<phrase><t>air=sea res-cue</t></phrase>
<w><t>air=to=sur-face</t></w>
-<w><t>air=traf-fic con-trol</t></w>
+<phrase><t>air=traf-fic con-trol</t></phrase>
<w><t>air=twist-ed</t></w>
<w><t>air-bill</t></w>
<w><t>air-boat</t></w>
@@ -3317,8 +3317,8 @@
<w><t>air-burst</t></w>
<w><t>air-bus</t></w>
<w><t>air-craft</t></w>
-<w><t>air-craft car-ri-er</t></w>
-<w><t>air-craft cloth</t></w>
+<phrase><t>air-craft car-ri-er</t></phrase>
+<phrase><t>air-craft cloth</t></phrase>
<w><t>air-craft-man</t></w>
<w><t>air-crafts-man</t></w>
<w><t>air-crafts-wom-an</t></w>
@@ -3462,16 +3462,16 @@
<w><t>A-ku-ta-ga-wa</t></w>
<w><t>ak-va-vit</t></w>
<w><t>Ak-yab</t></w>
-<w><t>al den-te</t></w>
-<w><t>Al Fai-y</t></w>
-<w><t>Al Fat-ah</t></w>
-<w><t>Al Ha-sa</t></w>
-<w><t>Al Ho-fuf</t></w>
-<w><t>Al Hu-fuf</t></w>
-<w><t>Al Ma-di-nah</t></w>
-<w><t>Al Man-s-rah</t></w>
-<w><t>Al Man-su-rah</t></w>
-<w><t>Al Si-rat</t></w>
+<phrase><t>al den-te</t></phrase>
+<phrase><t>Al Fai-y</t></phrase>
+<phrase><t>Al Fat-ah</t></phrase>
+<phrase><t>Al Ha-sa</t></phrase>
+<phrase><t>Al Ho-fuf</t></phrase>
+<phrase><t>Al Hu-fuf</t></phrase>
+<phrase><t>Al Ma-di-nah</t></phrase>
+<phrase><t>Al Man-s-rah</t></phrase>
+<phrase><t>Al Man-su-rah</t></phrase>
+<phrase><t>Al Si-rat</t></phrase>
<w><t>al=Fus-tat</t></w>
<w><t>Al=Ga-zel</t></w>
<w><t>Al=Is-kan-da-rî-yah</t></w>
@@ -3530,7 +3530,7 @@
<w><t>A-lar-cón</t></w>
<w><t>Al-a-ric</t></w>
<w><t>a-larm</t></w>
-<w><t>a-larm clock</t></w>
+<phrase><t>a-larm clock</t></phrase>
<w><t>a-larm-a-ble</t></w>
<w><t>a-larm-ed-ly</t></w>
<w><t>a-larm-ing-ly</t></w>
@@ -3541,9 +3541,9 @@
<w><t>a-la-ry</t></w>
<w><t>a-las</t></w>
<w><t>A-las-ka</t></w>
-<w><t>A-las-ka High-way</t></w>
-<w><t>A-las-ka Pen-in-su-la</t></w>
-<w><t>A-las-ka Range</t></w>
+<phrase><t>A-las-ka High-way</t></phrase>
+<phrase><t>A-las-ka Pen-in-su-la</t></phrase>
+<phrase><t>A-las-ka Range</t></phrase>
<w><t>A-las-kan</t></w>
<w><t>Al-as-tair</t></w>
<w><t>Al-as-ter</t></w>
@@ -3555,7 +3555,7 @@
<w><t>alb</t></w>
<w><t>al-ba</t></w>
<w><t>Al-ba</t></w>
-<w><t>Al-ba Lon-ga</t></w>
+<phrase><t>Al-ba Lon-ga</t></phrase>
<w><t>Al-ba-ce-te</t></w>
<w><t>al-ba-core</t></w>
<w><t>Al-ba-my-cin</t></w>
@@ -3577,7 +3577,7 @@
<w><t>Al-bee</t></w>
<w><t>al-be-it</t></w>
<w><t>Al-be-marle</t></w>
-<w><t>Al-be-marle Sound</t></w>
+<phrase><t>Al-be-marle Sound</t></phrase>
<w><t>Al-ben</t></w>
<w><t>al-ber-go</t></w>
<w><t>Al-ber-ich</t></w>
@@ -3584,8 +3584,8 @@
<w><t>Al-be-ro-ni</t></w>
<w><t>Al-bert</t></w>
<w><t>al-bert</t></w>
-<w><t>Al-bert Ed-ward</t></w>
-<w><t>Al-bert I</t></w>
+<phrase><t>Al-bert Ed-ward</t></phrase>
+<phrase><t>Al-bert I</t></phrase>
<w><t>Al-ber-ta</t></w>
<w><t>Al-ber-ti</t></w>
<w><t>Al-ber-ti-na</t></w>
@@ -3593,7 +3593,7 @@
<w><t>Al-ber-tist</t></w>
<w><t>al-bert-ite</t></w>
<w><t>Al-ber-to</t></w>
-<w><t>Al-ber-tus Mag-nus</t></w>
+<phrase><t>Al-ber-tus Mag-nus</t></phrase>
<w><t>Al-bert-ville</t></w>
<w><t>al-ber-type</t></w>
<w><t>al-bes-cence</t></w>
@@ -3661,7 +3661,7 @@
<w><t>al-cal-de</t></w>
<w><t>al-ca-lig-e-nes</t></w>
<w><t>Al-ca-lá</t></w>
-<w><t>Al-can High-way</t></w>
+<phrase><t>Al-can High-way</t></phrase>
<w><t>Al-can-dre</t></w>
<w><t>al-cap-ton</t></w>
<w><t>al-cap-ton-u-ri-a</t></w>
@@ -3710,7 +3710,7 @@
<w><t>al-co-hol-ic</t></w>
<w><t>al-co-hol-i-cal-ly</t></w>
<w><t>al-co-hol-ic-i-ty</t></w>
-<w><t>Al-co-hol-ics A-non-y-mous</t></w>
+<phrase><t>Al-co-hol-ics A-non-y-mous</t></phrase>
<w><t>al-co-hol-ise</t></w>
<w><t>al-co-hol-ised</t></w>
<w><t>al-co-hol-is-ing</t></w>
@@ -3744,8 +3744,8 @@
<w><t>Al-den</t></w>
<w><t>al-der</t></w>
<w><t>Al-der</t></w>
-<w><t>al-der buck-thorn</t></w>
-<w><t>al-der fly</t></w>
+<phrase><t>al-der buck-thorn</t></phrase>
+<phrase><t>al-der fly</t></phrase>
<w><t>al-der-fly</t></w>
<w><t>al-der-man</t></w>
<w><t>al-der-man-cy</t></w>
@@ -3758,7 +3758,7 @@
<w><t>Al-dine</t></w>
<w><t>Al-ding-ton</t></w>
<w><t>Al-dis</t></w>
-<w><t>Al-dis lamp</t></w>
+<phrase><t>Al-dis lamp</t></phrase>
<w><t>Al-do</t></w>
<w><t>al-do-hex-ose</t></w>
<w><t>al-dol</t></w>
@@ -3771,7 +3771,7 @@
<w><t>Al-dridge=Brown-hills</t></w>
<w><t>al-drin</t></w>
<w><t>Al-dus</t></w>
-<w><t>Al-dus Ma-nu-ti-us</t></w>
+<phrase><t>Al-dus Ma-nu-ti-us</t></phrase>
<w><t>ale</t></w>
<w><t>A-le-a</t></w>
<w><t>A-le-ar-di</t></w>
@@ -3807,7 +3807,7 @@
<w><t>a-lem-bi-cat-ed</t></w>
<w><t>A-le-mán</t></w>
<w><t>A-len-con</t></w>
-<w><t>A-len-con lace</t></w>
+<phrase><t>A-len-con lace</t></phrase>
<w><t>A-lene</t></w>
<w><t>A-len-çon</t></w>
<w><t>A-lep</t></w>
@@ -3815,7 +3815,7 @@
<w><t>a-leph=null</t></w>
<w><t>a-leph=ze-ro</t></w>
<w><t>A-lep-po</t></w>
-<w><t>A-lep-po gall</t></w>
+<phrase><t>A-lep-po gall</t></phrase>
<w><t>a-ler-ce</t></w>
<w><t>a-le-ri-on</t></w>
<w><t>a-lert</t></w>
@@ -3844,7 +3844,7 @@
<w><t>A-le-us</t></w>
<w><t>Al-eut</t></w>
<w><t>A-leu-tian</t></w>
-<w><t>A-leu-tian Is-lands</t></w>
+<phrase><t>A-leu-tian Is-lands</t></phrase>
<w><t>A-leu-tians</t></w>
<w><t>al-e-vin</t></w>
<w><t>ale-wife</t></w>
@@ -3852,14 +3852,14 @@
<w><t>A-lex-a</t></w>
<w><t>Al-ex-an-der</t></w>
<w><t>al-ex-an-der</t></w>
-<w><t>Al-ex-an-der Ar-chi-pel-a-go</t></w>
-<w><t>Al-ex-an-der I</t></w>
-<w><t>Al-ex-an-der I Is-land</t></w>
-<w><t>Al-ex-an-der II</t></w>
-<w><t>Al-ex-an-der III</t></w>
-<w><t>Al-ex-an-der Nev-ski</t></w>
-<w><t>Al-ex-an-der the Great</t></w>
-<w><t>Al-ex-an-der VI</t></w>
+<phrase><t>Al-ex-an-der Ar-chi-pel-a-go</t></phrase>
+<phrase><t>Al-ex-an-der I</t></phrase>
+<phrase><t>Al-ex-an-der I Is-land</t></phrase>
+<phrase><t>Al-ex-an-der II</t></phrase>
+<phrase><t>Al-ex-an-der III</t></phrase>
+<phrase><t>Al-ex-an-der Nev-ski</t></phrase>
+<phrase><t>Al-ex-an-der the Great</t></phrase>
+<phrase><t>Al-ex-an-der VI</t></phrase>
<w><t>al-ex-an-ders</t></w>
<w><t>Al-ex-an-der-son</t></w>
<w><t>Al-ex-an-dra</t></w>
@@ -3881,8 +3881,8 @@
<w><t>al-ex-in-ic</t></w>
<w><t>a-lex-i-phar-mic</t></w>
<w><t>A-lex-is</t></w>
-<w><t>A-lex-is Mi-khai-lo-vich</t></w>
-<w><t>A-lex-i-us I Com-ne-nus</t></w>
+<phrase><t>A-lex-is Mi-khai-lo-vich</t></phrase>
+<phrase><t>A-lex-i-us I Com-ne-nus</t></phrase>
<w><t>ale-yard</t></w>
<w><t>alfa</t></w>
<w><t>Al-fa-dir</t></w>
@@ -3895,12 +3895,12 @@
<w><t>al-fil-a-ri-a</t></w>
<w><t>al-fil-e-ri-a</t></w>
<w><t>Al-fon-so</t></w>
-<w><t>Al-fon-so XIII</t></w>
+<phrase><t>Al-fon-so XIII</t></phrase>
<w><t>Al-fon-son</t></w>
<w><t>al-for-ja</t></w>
<w><t>Al-fra-ga-nus</t></w>
<w><t>Al-fred</t></w>
-<w><t>Al-fred the Great</t></w>
+<phrase><t>Al-fred the Great</t></phrase>
<w><t>Al-fre-da</t></w>
<w><t>Al-fre-do</t></w>
<w><t>al-fres-co</t></w>
@@ -3916,7 +3916,7 @@
<w><t>al-gar-ro-ba</t></w>
<w><t>al-ge-bra</t></w>
<w><t>al-ge-bra-ic</t></w>
-<w><t>al-ge-bra-ic num-ber</t></w>
+<phrase><t>al-ge-bra-ic num-ber</t></phrase>
<w><t>al-ge-bra-i-cal</t></w>
<w><t>al-ge-bra-i-cal-ly</t></w>
<w><t>al-ge-bra-ist</t></w>
@@ -3943,7 +3943,7 @@
<w><t>Al-giers</t></w>
<w><t>al-gin</t></w>
<w><t>al-gi-nate</t></w>
-<w><t>al-gin-ic ac-id</t></w>
+<phrase><t>al-gin-ic ac-id</t></phrase>
<w><t>al-goid</t></w>
<w><t>Al-gol</t></w>
<w><t>al-go-lag-ni-a</t></w>
@@ -3964,7 +3964,7 @@
<w><t>Al-gon-kin</t></w>
<w><t>Al-gon-qui-an</t></w>
<w><t>Al-gon-quin</t></w>
-<w><t>Al-gon-quin Park</t></w>
+<phrase><t>Al-gon-quin Park</t></phrase>
<w><t>al-goph-a-gous</t></w>
<w><t>al-go-pho-bi-a</t></w>
<w><t>al-gor</t></w>
@@ -3981,8 +3981,8 @@
<w><t>Al-ham-bresque</t></w>
<w><t>Al-ha-zen</t></w>
<w><t>A-li</t></w>
-<w><t>A-li Ba-ba</t></w>
-<w><t>A-li Pa-sha</t></w>
+<phrase><t>A-li Ba-ba</t></phrase>
+<phrase><t>A-li Pa-sha</t></phrase>
<w><t>A-li-a-cen-sis</t></w>
<w><t>alias</t></w>
<w><t>a-li-as</t></w>
@@ -3992,7 +3992,7 @@
<w><t>al-i-ble</t></w>
<w><t>Al-i-can-te</t></w>
<w><t>Al-ice</t></w>
-<w><t>Al-ice Springs</t></w>
+<phrase><t>Al-ice Springs</t></phrase>
<w><t>Al-ice=in=Won-der-land</t></w>
<w><t>Al-ice-ville</t></w>
<w><t>A-li-cia</t></w>
@@ -4020,7 +4020,7 @@
<w><t>a-light</t></w>
<w><t>a-light-ed</t></w>
<w><t>a-light-ing</t></w>
-<w><t>a-light-ing gear</t></w>
+<phrase><t>a-light-ing gear</t></phrase>
<w><t>a-lign</t></w>
<w><t>a-lign-er</t></w>
<w><t>a-lign-ment</t></w>
@@ -4030,7 +4030,7 @@
<w><t>al-i-men-tal</t></w>
<w><t>al-i-men-tal-ly</t></w>
<w><t>al-i-men-ta-ry</t></w>
-<w><t>al-i-men-ta-ry ca-nal</t></w>
+<phrase><t>al-i-men-ta-ry ca-nal</t></phrase>
<w><t>al-i-men-ta-tion</t></w>
<w><t>al-i-men-ta-tive</t></w>
<w><t>al-i-men-ta-tive-ly</t></w>
@@ -4078,9 +4078,9 @@
<w><t>al-ka-les-cen-cy</t></w>
<w><t>al-ka-les-cent</t></w>
<w><t>al-ka-li</t></w>
-<w><t>al-ka-li flat</t></w>
-<w><t>al-ka-li met-al</t></w>
-<w><t>al-ka-li soil</t></w>
+<phrase><t>al-ka-li flat</t></phrase>
+<phrase><t>al-ka-li met-al</t></phrase>
+<phrase><t>al-ka-li soil</t></phrase>
<w><t>al-kal-ic</t></w>
<w><t>al-ka-li-fi-a-ble</t></w>
<w><t>al-ka-li-fied</t></w>
@@ -4091,7 +4091,7 @@
<w><t>al-ka-li-met-ri-cal-ly</t></w>
<w><t>al-ka-lim-e-try</t></w>
<w><t>al-ka-line</t></w>
-<w><t>al-ka-line earth</t></w>
+<phrase><t>al-ka-line earth</t></phrase>
<w><t>al-ka-lin-ise</t></w>
<w><t>al-ka-lin-ised</t></w>
<w><t>al-ka-lin-is-ing</t></w>
@@ -4130,7 +4130,7 @@
<w><t>Al-ko-ran</t></w>
<w><t>alk-ox-ide</t></w>
<w><t>al-ky</t></w>
-<w><t>al-kyd res-in</t></w>
+<phrase><t>al-kyd res-in</t></phrase>
<w><t>al-kyl</t></w>
<w><t>al-kyl-a-tion</t></w>
<w><t>al-kyl-ic</t></w>
@@ -4144,7 +4144,7 @@
<w><t>all=ex-pense</t></w>
<w><t>all=ex-pens-es=paid</t></w>
<w><t>all=fired-ly</t></w>
-<w><t>all=fly-ing tail</t></w>
+<phrase><t>all=fly-ing tail</t></phrase>
<w><t>all=im-por-tant</t></w>
<w><t>all=in</t></w>
<w><t>all=in-clu-sive</t></w>
@@ -4154,8 +4154,8 @@
<w><t>all=round-er</t></w>
<w><t>all=weath-er</t></w>
<w><t>Al-la</t></w>
-<w><t>al-la bre-ve</t></w>
-<w><t>al-la pri-ma</t></w>
+<phrase><t>al-la bre-ve</t></phrase>
+<phrase><t>al-la pri-ma</t></phrase>
<w><t>al-la-ches-the-sia</t></w>
<w><t>Al-lah</t></w>
<w><t>Al-lah-a-bad</t></w>
@@ -4185,7 +4185,7 @@
<w><t>Al-le-ghe-ni-an</t></w>
<w><t>Al-le-ghe-nies</t></w>
<w><t>Al-le-ghe-ny</t></w>
-<w><t>Al-le-ghe-ny Moun-tains</t></w>
+<phrase><t>Al-le-ghe-ny Moun-tains</t></phrase>
<w><t>al-le-giance</t></w>
<w><t>al-le-giant</t></w>
<w><t>al-leg-ing</t></w>
@@ -4241,7 +4241,7 @@
<w><t>al-le-vi-a-tor</t></w>
<w><t>al-le-vi-a-to-ry</t></w>
<w><t>al-ley</t></w>
-<w><t>al-ley cat</t></w>
+<phrase><t>al-ley cat</t></phrase>
<w><t>al-ley-fired-est</t></w>
<w><t>al-ley-way</t></w>
<w><t>All-hal-low-mas</t></w>
@@ -4261,8 +4261,8 @@
<w><t>al-li-gat-ed</t></w>
<w><t>al-li-gat-ing</t></w>
<w><t>al-li-ga-tor</t></w>
-<w><t>al-li-ga-tor pear</t></w>
-<w><t>al-li-ga-tor pep-per</t></w>
+<phrase><t>al-li-ga-tor pear</t></phrase>
+<phrase><t>al-li-ga-tor pep-per</t></phrase>
<w><t>al-li-ga-tor-fish</t></w>
<w><t>al-li-ga-tor-fish-es</t></w>
<w><t>al-li-sion</t></w>
@@ -4367,7 +4367,7 @@
<w><t>al-lowed</t></w>
<w><t>al-low-ed-ly</t></w>
<w><t>al-loy</t></w>
-<w><t>al-loyed junc-tion</t></w>
+<phrase><t>al-loyed junc-tion</t></phrase>
<w><t>all-round</t></w>
<w><t>all-seed</t></w>
<w><t>all-spice</t></w>
@@ -4388,7 +4388,7 @@
<w><t>al-lu-sive-ness</t></w>
<w><t>al-lu-vi-a</t></w>
<w><t>al-lu-vi-al</t></w>
-<w><t>al-lu-vi-al fan</t></w>
+<phrase><t>al-lu-vi-al fan</t></phrase>
<w><t>al-lu-vi-ium</t></w>
<w><t>al-lu-vi-on</t></w>
<w><t>al-lu-vi-um</t></w>
@@ -4396,9 +4396,9 @@
<w><t>All-var</t></w>
<w><t>al-ly</t></w>
<w><t>al-lyl</t></w>
-<w><t>al-lyl al-co-hol</t></w>
-<w><t>al-lyl res-in</t></w>
-<w><t>al-lyl sul-phide</t></w>
+<phrase><t>al-lyl al-co-hol</t></phrase>
+<phrase><t>al-lyl res-in</t></phrase>
+<phrase><t>al-lyl sul-phide</t></phrase>
<w><t>al-lyl-ic</t></w>
<w><t>al-lyl-thi-o-u-re-a</t></w>
<w><t>Al-lyn</t></w>
@@ -4406,7 +4406,7 @@
<w><t>all-you</t></w>
<w><t>al-ma</t></w>
<w><t>Al-ma</t></w>
-<w><t>al-ma ma-ter</t></w>
+<phrase><t>al-ma ma-ter</t></phrase>
<w><t>Al-ma=A-ta</t></w>
<w><t>Al-ma=Tad-e-ma</t></w>
<w><t>al-ma-can-tar</t></w>
@@ -4510,7 +4510,7 @@
<w><t>al-pen-glow</t></w>
<w><t>al-pen-horn</t></w>
<w><t>al-pen-stock</t></w>
-<w><t>Alpes Ma-ri-times</t></w>
+<phrase><t>Alpes Ma-ri-times</t></phrase>
<w><t>Alpes=de=Haute=Pro-vence</t></w>
<w><t>Alpes=Ma-ri-times</t></w>
<w><t>al-pes-trine</t></w>
@@ -4517,13 +4517,13 @@
<w><t>Al-pet-ra-gi-us</t></w>
<w><t>al-pha</t></w>
<w><t>Al-pha</t></w>
-<w><t>al-pha and o-me-ga</t></w>
-<w><t>Al-pha Cen-tau-ri</t></w>
-<w><t>al-pha i-ron</t></w>
-<w><t>al-pha par-ti-cle</t></w>
-<w><t>al-pha priv-a-tive</t></w>
-<w><t>al-pha ray</t></w>
-<w><t>al-pha rhythm</t></w>
+<phrase><t>al-pha and o-me-ga</t></phrase>
+<phrase><t>Al-pha Cen-tau-ri</t></phrase>
+<phrase><t>al-pha i-ron</t></phrase>
+<phrase><t>al-pha par-ti-cle</t></phrase>
+<phrase><t>al-pha priv-a-tive</t></phrase>
+<phrase><t>al-pha ray</t></phrase>
+<phrase><t>al-pha rhythm</t></phrase>
<w><t>al-pha=hy-poph-a-mine</t></w>
<w><t>al-pha=naph-thol</t></w>
<w><t>al-pha=naph-thyl-thi-o-u-re-a</t></w>
@@ -4581,7 +4581,7 @@
<w><t>Al-ta</t></w>
<w><t>Al-ta-de-na</t></w>
<w><t>Al-tai</t></w>
-<w><t>Al-tai Moun-tains</t></w>
+<phrase><t>Al-tai Moun-tains</t></phrase>
<w><t>Al-ta-ian</t></w>
<w><t>Al-ta-ic</t></w>
<w><t>Al-tair</t></w>
@@ -4588,7 +4588,7 @@
<w><t>Al-ta-ir</t></w>
<w><t>Al-ta-mi-ra</t></w>
<w><t>al-tar</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
-<w><t>al-tar boy</t></w>
+<phrase><t>al-tar boy</t></phrase>
<w><t>al-tar-age</t></w>
<w><t>al-tar-piece</t></w>
<w><t>Al-ta-vis-ta</t></w>
@@ -4597,7 +4597,7 @@
<w><t>Alt-dorf</t></w>
<w><t>Alt-dor-fer</t></w>
<w><t>al-ter</t></w>
-<w><t>al-ter e-go</t></w>
+<phrase><t>al-ter e-go</t></phrase>
<w><t>al-ter-a-bil-i-ty</t></w>
<w><t>al-ter-a-ble</t></w>
<w><t>al-ter-a-ble-ness</t></w>
@@ -4609,20 +4609,20 @@
<w><t>al-ter-cat-ed</t></w>
<w><t>al-ter-cat-ing</t></w>
<w><t>al-ter-ca-tion</t></w>
-<w><t>al-tered chord</t></w>
+<phrase><t>al-tered chord</t></phrase>
<w><t>al-tern</t></w>
<w><t>al-ter-nant</t></w>
<w><t>al-ter-nate</t></w>
-<w><t>al-ter-nate an-gles</t></w>
+<phrase><t>al-ter-nate an-gles</t></phrase>
<w><t>al-ter-nat-ed</t></w>
<w><t>al-ter-nate-ly</t></w>
<w><t>al-ter-nate-ness</t></w>
<w><t>al-ter-nat-ing</t></w>
-<w><t>al-ter-nat-ing cur-rent</t></w>
-<w><t>al-ter-nat-ing=gra-di-ent fo-cus-ing</t></w>
+<phrase><t>al-ter-nat-ing cur-rent</t></phrase>
+<phrase><t>al-ter-nat-ing=gra-di-ent fo-cus-ing</t></phrase>
<w><t>al-ter-nat-ing-ly</t></w>
<w><t>al-ter-na-tion</t></w>
-<w><t>al-ter-na-tion of gen-er-a-tions</t></w>
+<phrase><t>al-ter-na-tion of gen-er-a-tions</t></phrase>
<w><t>al-ter-na-tive</t></w>
<w><t>al-ter-na-tive-ly</t></w>
<w><t>al-ter-na-tive-ness</t></w>
@@ -4650,8 +4650,8 @@
<w><t>al-ti-tude</t></w>
<w><t>al-ti-tu-di-nal</t></w>
<w><t>al-to</t></w>
-<w><t>al-to clef</t></w>
-<w><t>al-to horn</t></w>
+<phrase><t>al-to clef</t></phrase>
+<phrase><t>al-to horn</t></phrase>
<w><t>al-to=re-lie-vo</t></w>
<w><t>al-to=ri-lie-vo</t></w>
<w><t>al-to-cu-mu-lus</t></w>
@@ -4682,9 +4682,9 @@
<w><t>a-lu-mi-nis-ing</t></w>
<w><t>a-lu-mi-nite</t></w>
<w><t>al-u-min-i-um</t></w>
-<w><t>a-lu-min-i-um hy-drox-ide</t></w>
-<w><t>a-lu-min-i-um ox-ide</t></w>
-<w><t>a-lu-min-i-um sul-phate</t></w>
+<phrase><t>a-lu-min-i-um hy-drox-ide</t></phrase>
+<phrase><t>a-lu-min-i-um ox-ide</t></phrase>
+<phrase><t>a-lu-min-i-um sul-phate</t></phrase>
<w><t>a-lu-mi-nize</t></w>
<w><t>a-lu-mi-nized</t></w>
<w><t>a-lu-mi-niz-ing</t></w>
@@ -4823,7 +4823,7 @@
<w><t>a-maz-ing-ly</t></w>
<w><t>am-a-zon</t></w>
<w><t>Am-a-zon</t></w>
-<w><t>am-a-zon ant</t></w>
+<phrase><t>am-a-zon ant</t></phrase>
<w><t>A-ma-zo-nas</t></w>
<w><t>Am-a-zo-ni-an</t></w>
<w><t>Am-a-zo-nis</t></w>
@@ -4920,7 +4920,7 @@
<w><t>Am-brose</t></w>
<w><t>am-bro-sia</t></w>
<w><t>am-bro-si-a</t></w>
-<w><t>am-bro-si-a bee-tle</t></w>
+<phrase><t>am-bro-si-a bee-tle</t></phrase>
<w><t>am-bro-si-a-ceous</t></w>
<w><t>am-bro-sial</t></w>
<w><t>am-bro-sial-ly</t></w>
@@ -4933,7 +4933,7 @@
<w><t>am-bu-lac-ral</t></w>
<w><t>am-bu-lac-rum</t></w>
<w><t>am-bu-lance</t></w>
-<w><t>am-bu-lance chas-er</t></w>
+<phrase><t>am-bu-lance chas-er</t></phrase>
<w><t>am-bu-lant</t></w>
<w><t>am-bu-lante</t></w>
<w><t>am-bu-late</t></w>
@@ -4978,7 +4978,7 @@
<w><t>am-e-lo-blas-tic</t></w>
<w><t>a-men</t></w>
<w><t>A-men</t></w>
-<w><t>a-men cor-ner</t></w>
+<phrase><t>a-men cor-ner</t></phrase>
<w><t>A-men=Ra</t></w>
<w><t>a-me-na-bil-i-ty</t></w>
<w><t>a-me-na-ble</t></w>
@@ -4991,10 +4991,10 @@
<w><t>a-mend-er</t></w>
<w><t>a-mend-ment</t></w>
<w><t>a-mends</t></w>
-<w><t>A-men-ho-tep III</t></w>
-<w><t>Am-en-ho-tep IV</t></w>
-<w><t>A-men-hot-pe III</t></w>
-<w><t>Am-en-hot-pe IV</t></w>
+<phrase><t>A-men-ho-tep III</t></phrase>
+<phrase><t>Am-en-ho-tep IV</t></phrase>
+<phrase><t>A-men-hot-pe III</t></phrase>
+<phrase><t>Am-en-hot-pe IV</t></phrase>
<w><t>a-men-i-ty</t></w>
<w><t>a-men-or-rhe-a</t></w>
<w><t>a-men-or-rhe-al</t></w>
@@ -5016,23 +5016,23 @@
<w><t>a-merc-er</t></w>
<w><t>a-merc-ing</t></w>
<w><t>A-mer-i-ca</t></w>
-<w><t>A-mer-i-ca's Cup</t></w>
+<phrase><t>A-mer-i-ca's Cup</t></phrase>
<w><t>A-mer-i-can</t></w>
-<w><t>A-mer-i-can al-oe</t></w>
-<w><t>A-mer-i-can Beau-ty</t></w>
-<w><t>A-mer-i-can cha-me-le-on</t></w>
-<w><t>A-mer-i-can cheese</t></w>
-<w><t>A-mer-i-can cloth</t></w>
-<w><t>A-mer-i-can ea-gle</t></w>
-<w><t>A-mer-i-can Ex-pe-di-tion-ar-y Forc-es</t></w>
-<w><t>A-mer-i-can foot-ball</t></w>
-<w><t>A-mer-i-can In-di-an</t></w>
-<w><t>A-mer-i-can In-di-an Move-ment</t></w>
-<w><t>A-mer-i-can plan</t></w>
-<w><t>A-mer-i-can Rev-o-lu-tion</t></w>
-<w><t>A-mer-i-can Sa-mo-a</t></w>
-<w><t>A-mer-i-can Stand-ard Ver-sion</t></w>
-<w><t>A-mer-i-can tryp-a-no-so-mi-a-sis</t></w>
+<phrase><t>A-mer-i-can al-oe</t></phrase>
+<phrase><t>A-mer-i-can Beau-ty</t></phrase>
+<phrase><t>A-mer-i-can cha-me-le-on</t></phrase>
+<phrase><t>A-mer-i-can cheese</t></phrase>
+<phrase><t>A-mer-i-can cloth</t></phrase>
+<phrase><t>A-mer-i-can ea-gle</t></phrase>
+<phrase><t>A-mer-i-can Ex-pe-di-tion-ar-y Forc-es</t></phrase>
+<phrase><t>A-mer-i-can foot-ball</t></phrase>
+<phrase><t>A-mer-i-can In-di-an</t></phrase>
+<phrase><t>A-mer-i-can In-di-an Move-ment</t></phrase>
+<phrase><t>A-mer-i-can plan</t></phrase>
+<phrase><t>A-mer-i-can Rev-o-lu-tion</t></phrase>
+<phrase><t>A-mer-i-can Sa-mo-a</t></phrase>
+<phrase><t>A-mer-i-can Stand-ard Ver-sion</t></phrase>
+<phrase><t>A-mer-i-can tryp-a-no-so-mi-a-sis</t></phrase>
<w><t>A-mer-i-ca-na</t></w>
<w><t>A-mer-i-can-ise</t></w>
<w><t>A-mer-i-can-ised</t></w>
@@ -5050,7 +5050,7 @@
<w><t>A-mer-i-ca-no</t></w>
<w><t>am-er-i-ci-um</t></w>
<w><t>A-me-ri-go</t></w>
-<w><t>A-mer-i-go Ve-spuc-ci</t></w>
+<phrase><t>A-mer-i-go Ve-spuc-ci</t></phrase>
<w><t>Am-er-ind</t></w>
<w><t>Am-er-in-dian</t></w>
<w><t>Am-er-in-di-an</t></w>
@@ -5086,7 +5086,7 @@
<w><t>am-i-ca-ble-ness</t></w>
<w><t>am-i-ca-bly</t></w>
<w><t>am-ice</t></w>
-<w><t>a-mi-cus cu-ri-ae</t></w>
+<phrase><t>a-mi-cus cu-ri-ae</t></phrase>
<w><t>a-mid</t></w>
<w><t>A-mi-da</t></w>
<w><t>A-mi-dah</t></w>
@@ -5126,10 +5126,10 @@
<w><t>a-min-ic</t></w>
<w><t>a-min-i-ty</t></w>
<w><t>a-mi-no</t></w>
-<w><t>a-mi-no ac-id</t></w>
-<w><t>a-mi-no res-in</t></w>
+<phrase><t>a-mi-no ac-id</t></phrase>
+<phrase><t>a-mi-no res-in</t></phrase>
<w><t>a-mi-no-ben-zene</t></w>
-<w><t>a-mi-no-ben-zo-ic ac-id</t></w>
+<phrase><t>a-mi-no-ben-zo-ic ac-id</t></phrase>
<w><t>a-mi-no-phe-nol</t></w>
<w><t>am-i-noph-er-ase</t></w>
<w><t>a-mi-no-phyl-line</t></w>
@@ -5167,7 +5167,7 @@
<w><t>am-mo-nate</t></w>
<w><t>am-mo-nia</t></w>
<w><t>am-mo-ni-a</t></w>
-<w><t>am-mo-ni-a so-lu-tion</t></w>
+<phrase><t>am-mo-ni-a so-lu-tion</t></phrase>
<w><t>am-mo-ni-ac</t></w>
<w><t>am-mo-ni-a-cal</t></w>
<w><t>am-mo-ni-a-cum</t></w>
@@ -5188,12 +5188,12 @@
<w><t>Am-mon-it-ish</t></w>
<w><t>am-mon-i-toid</t></w>
<w><t>am-mo-ni-um</t></w>
-<w><t>am-mo-ni-um car-ba-mate</t></w>
-<w><t>am-mo-ni-um car-bon-ate</t></w>
-<w><t>am-mo-ni-um chlo-ride</t></w>
-<w><t>am-mo-ni-um hy-drox-ide</t></w>
-<w><t>am-mo-ni-um ni-trate</t></w>
-<w><t>am-mo-ni-um sul-phate</t></w>
+<phrase><t>am-mo-ni-um car-ba-mate</t></phrase>
+<phrase><t>am-mo-ni-um car-bon-ate</t></phrase>
+<phrase><t>am-mo-ni-um chlo-ride</t></phrase>
+<phrase><t>am-mo-ni-um hy-drox-ide</t></phrase>
+<phrase><t>am-mo-ni-um ni-trate</t></phrase>
+<phrase><t>am-mo-ni-um sul-phate</t></phrase>
<w><t>am-mo-no</t></w>
<w><t>am-mo-noid</t></w>
<w><t>am-mo-no-lit-ic</t></w>
@@ -5226,7 +5226,7 @@
<w><t>am-oe-be-an</t></w>
<w><t>am-oe-bi-a-sis</t></w>
<w><t>a-moe-bic</t></w>
-<w><t>a-moe-bic dys-en-ter-y</t></w>
+<phrase><t>a-moe-bic dys-en-ter-y</t></phrase>
<w><t>a-moe-bo-cyte</t></w>
<w><t>a-moe-boid</t></w>
<w><t>a-moe-boid-ism</t></w>
@@ -5239,7 +5239,7 @@
<w><t>a-mon-til-la-do</t></w>
<w><t>A-mo-pa-on</t></w>
<w><t>A-mor</t></w>
-<w><t>a-mor pa-tri-ae</t></w>
+<phrase><t>a-mor pa-tri-ae</t></phrase>
<w><t>a-mo-ra</t></w>
<w><t>a-mo-ra-im</t></w>
<w><t>a-mor-al</t></w>
@@ -5430,7 +5430,7 @@
<w><t>am-pli-fy</t></w>
<w><t>am-pli-fy-ing</t></w>
<w><t>am-pli-tude</t></w>
-<w><t>am-pli-tude mod-u-la-tion</t></w>
+<phrase><t>am-pli-tude mod-u-la-tion</t></phrase>
<w><t>am-ply</t></w>
<w><t>am-poule</t></w>
<w><t>am-pul</t></w>
@@ -5465,13 +5465,13 @@
<w><t>am-trac</t></w>
<w><t>am-track</t></w>
<w><t>amu</t></w>
-<w><t>A-mu Dar-ya</t></w>
+<phrase><t>A-mu Dar-ya</t></phrase>
<w><t>a-muck</t></w>
<w><t>a-mu-gis</t></w>
<w><t>am-u-let</t></w>
<w><t>A-mu-li-us</t></w>
<w><t>A-mund-sen</t></w>
-<w><t>A-mund-sen Sea</t></w>
+<phrase><t>A-mund-sen Sea</t></phrase>
<w><t>A-mur</t></w>
<w><t>a-mur-ca</t></w>
<w><t>a-mus-a-ble</t></w>
@@ -5479,8 +5479,8 @@
<w><t>a-mused</t></w>
<w><t>a-mus-ed-ly</t></w>
<w><t>a-muse-ment</t></w>
-<w><t>a-muse-ment ar-cade</t></w>
-<w><t>a-muse-ment park</t></w>
+<phrase><t>a-muse-ment ar-cade</t></phrase>
+<phrase><t>a-muse-ment park</t></phrase>
<w><t>a-mus-er</t></w>
<w><t>a-mu-si-a</t></w>
<w><t>a-mus-ing</t></w>
@@ -5507,9 +5507,9 @@
<w><t>a-myg-da-loi-dal</t></w>
<w><t>a-myg-dule</t></w>
<w><t>am-yl</t></w>
-<w><t>am-yl ac-e-tate</t></w>
-<w><t>am-yl al-co-hol</t></w>
-<w><t>am-yl ni-trite</t></w>
+<phrase><t>am-yl ac-e-tate</t></phrase>
+<phrase><t>am-yl al-co-hol</t></phrase>
+<phrase><t>am-yl ni-trite</t></phrase>
<w><t>am-y-la-ceous</t></w>
<w><t>am-yl-ase</t></w>
<w><t>am-yl-ene</t></w>
@@ -5534,7 +5534,7 @@
<w><t>a-myx-or-rhoe-a</t></w>
<w><t>A-mé-dée</t></w>
<w><t>an</t></w>
-<w><t>AN Oth-er</t></w>
+<phrase><t>AN Oth-er</t></phrase>
<w><t>an't</t></w>
<w><t>an=end</t></w>
<w><t>an-a</t></w>
@@ -5553,7 +5553,7 @@
<w><t>an-a-bi-ot-ic</t></w>
<w><t>an-a-bleps</t></w>
<w><t>an-a-bol-ic</t></w>
-<w><t>an-a-bol-ic ster-oid</t></w>
+<phrase><t>an-a-bol-ic ster-oid</t></phrase>
<w><t>a-nab-o-lism</t></w>
<w><t>a-nab-o-lite</t></w>
<w><t>an-a-branch</t></w>
@@ -5654,13 +5654,13 @@
<w><t>an-a-gram-ma-tized</t></w>
<w><t>an-a-gram-ma-tiz-ing</t></w>
<w><t>An-a-heim</t></w>
-<w><t>a-nak ku-ching</t></w>
+<phrase><t>a-nak ku-ching</t></phrase>
<w><t>An-a-kim</t></w>
<w><t>anal</t></w>
<w><t>a-nal</t></w>
-<w><t>a-nal ca-nal</t></w>
-<w><t>a-nal fin</t></w>
-<w><t>a-nal in-ter-course</t></w>
+<phrase><t>a-nal ca-nal</t></phrase>
+<phrase><t>a-nal fin</t></phrase>
+<phrase><t>a-nal in-ter-course</t></phrase>
<w><t>a-nal-cime</t></w>
<w><t>a-nal-cite</t></w>
<w><t>an-a-lec-ta</t></w>
@@ -5676,8 +5676,8 @@
<w><t>an-al-gi-a</t></w>
<w><t>a-nal-ly</t></w>
<w><t>an-a-log</t></w>
-<w><t>an-a-log com-put-er</t></w>
-<w><t>an-a-log=dig-it-al con-vert-er</t></w>
+<phrase><t>an-a-log com-put-er</t></phrase>
+<phrase><t>an-a-log=dig-it-al con-vert-er</t></phrase>
<w><t>a-na-lo-gi-a</t></w>
<w><t>an-a-log-ic</t></w>
<w><t>an-a-log-i-cal</t></w>
@@ -5707,13 +5707,13 @@
<w><t>an-a-lys-er</t></w>
<w><t>an-a-lys-ing</t></w>
<w><t>a-nal-y-sis</t></w>
-<w><t>a-nal-y-sis of var-i-ance</t></w>
-<w><t>a-nal-y-sis si-tus</t></w>
+<phrase><t>a-nal-y-sis of var-i-ance</t></phrase>
+<phrase><t>a-nal-y-sis si-tus</t></phrase>
<w><t>an-a-lyst</t></w>
<w><t>an-a-lyt-ic</t></w>
-<w><t>an-a-lyt-ic psy-chol-o-gy</t></w>
+<phrase><t>an-a-lyt-ic psy-chol-o-gy</t></phrase>
<w><t>an-a-lyt-i-cal</t></w>
-<w><t>an-a-lyt-i-cal ge-om-e-try</t></w>
+<phrase><t>an-a-lyt-i-cal ge-om-e-try</t></phrase>
<w><t>an-a-lyt-i-cal-ly</t></w>
<w><t>an-a-lyt-ics</t></w>
<w><t>an-a-ly-tique</t></w>
@@ -5824,7 +5824,7 @@
<w><t>An-a-tol-ic</t></w>
<w><t>an-a-tom-ic</t></w...
[truncated message content]
From: <vic...@us...> - 2021-11-10 22:44:14
Revision: 12025
http://sourceforge.net/p/foray/code/12025
Author: victormote
Date: 2021-11-10 22:44:12 +0000 (Wed, 10 Nov 2021)
Log Message:
-----------
Make constructors public and document the registration process better.
Modified Paths:
--------------
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Language4a.java
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java 2021-11-10 20:56:15 UTC (rev 12024)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java 2021-11-10 22:44:12 UTC (rev 12025)
@@ -83,249 +83,249 @@
/* Checkstyle: Allow Magic Numbers that are hard-coded data. */
static {
- Country4a.registerCountry(UNDETERMINED);
- Country4a.registerCountry(new Country4a("AALAND ISLANDS", "AX", "ALA", (short) 248));
- Country4a.registerCountry(new Country4a("AFGHANISTAN", "AF", "AFG", (short) 4));
- Country4a.registerCountry(new Country4a("ALBANIA", "AL", "ALB", (short) 8));
- Country4a.registerCountry(new Country4a("ALGERIA", "DZ", "DZA", (short) 12));
- Country4a.registerCountry(new Country4a("AMERICAN SAMOA", "AS", "ASM", (short) 16));
- Country4a.registerCountry(new Country4a("ANDORRA", "AD", "AND", (short) 20));
- Country4a.registerCountry(new Country4a("ANGOLA", "AO", "AGO", (short) 24));
- Country4a.registerCountry(new Country4a("ANGUILLA", "AI", "AIA", (short) 660));
- Country4a.registerCountry(new Country4a("ANTARCTICA", "AQ", "ATA", (short) 10));
- Country4a.registerCountry(new Country4a("ANTIGUA AND BARBUDA", "AG", "ATG", (short) 28));
- Country4a.registerCountry(new Country4a("ARGENTINA", "AR", "ARG", (short) 32));
- Country4a.registerCountry(new Country4a("ARMENIA", "AM", "ARM", (short) 51));
- Country4a.registerCountry(new Country4a("ARUBA", "AW", "ABW", (short) 533));
- Country4a.registerCountry(new Country4a("AUSTRALIA", "AU", "AUS", (short) 36));
- Country4a.registerCountry(new Country4a("AUSTRIA", "AT", "AUT", (short) 40));
- Country4a.registerCountry(new Country4a("AZERBAIJAN", "AZ", "AZE", (short) 31));
- Country4a.registerCountry(new Country4a("BAHAMAS", "BS", "BHS", (short) 44));
- Country4a.registerCountry(new Country4a("BAHRAIN", "BH", "BHR", (short) 48));
- Country4a.registerCountry(new Country4a("BANGLADESH", "BD", "BGD", (short) 50));
- Country4a.registerCountry(new Country4a("BARBADOS", "BB", "BRB", (short) 52));
- Country4a.registerCountry(new Country4a("BELARUS", "BY", "BLR", (short) 112));
- Country4a.registerCountry(new Country4a("BELGIUM", "BE", "BEL", (short) 56));
- Country4a.registerCountry(new Country4a("BELIZE", "BZ", "BLZ", (short) 84));
- Country4a.registerCountry(new Country4a("BENIN", "BJ", "BEN", (short) 204));
- Country4a.registerCountry(new Country4a("BERMUDA", "BM", "BMU", (short) 60));
- Country4a.registerCountry(new Country4a("BHUTAN", "BT", "BTN", (short) 64));
- Country4a.registerCountry(new Country4a("BOLIVIA", "BO", "BOL", (short) 68));
- Country4a.registerCountry(new Country4a("BOSNIA AND HERZEGOWINA", "BA", "BIH", (short) 70));
- Country4a.registerCountry(new Country4a("BOTSWANA", "BW", "BWA", (short) 72));
- Country4a.registerCountry(new Country4a("BOUVET ISLAND", "BV", "BVT", (short) 74));
- Country4a.registerCountry(new Country4a("BRAZIL", "BR", "BRA", (short) 76));
- Country4a.registerCountry(new Country4a("BRITISH INDIAN OCEAN TERRITORY", "IO", "IOT", (short) 86));
- Country4a.registerCountry(new Country4a("BRUNEI DARUSSALAM", "BN", "BRN", (short) 96));
- Country4a.registerCountry(new Country4a("BULGARIA", "BG", "BGR", (short) 100));
- Country4a.registerCountry(new Country4a("BURKINA FASO", "BF", "BFA", (short) 854));
- Country4a.registerCountry(new Country4a("BURUNDI", "BI", "BDI", (short) 108));
- Country4a.registerCountry(new Country4a("CAMBODIA", "KH", "KHM", (short) 116));
- Country4a.registerCountry(new Country4a("CAMEROON", "CM", "CMR", (short) 120));
- Country4a.registerCountry(new Country4a("CANADA", "CA", "CAN", (short) 124));
- Country4a.registerCountry(new Country4a("CAPE VERDE", "CV", "CPV", (short) 132));
- Country4a.registerCountry(new Country4a("CAYMAN ISLANDS", "KY", "CYM", (short) 136));
- Country4a.registerCountry(new Country4a("CENTRAL AFRICAN REPUBLIC", "CF", "CAF", (short) 140));
- Country4a.registerCountry(new Country4a("CHAD", "TD", "TCD", (short) 148));
- Country4a.registerCountry(new Country4a("CHILE", "CL", "CHL", (short) 152));
- Country4a.registerCountry(new Country4a("CHINA", "CN", "CHN", (short) 156));
- Country4a.registerCountry(new Country4a("CHRISTMAS ISLAND", "CX", "CXR", (short) 162));
- Country4a.registerCountry(new Country4a("COCOS (KEELING) ISLANDS", "CC", "CCK", (short) 166));
- Country4a.registerCountry(new Country4a("COLOMBIA", "CO", "COL", (short) 170));
- Country4a.registerCountry(new Country4a("COMOROS", "KM", "COM", (short) 174));
- Country4a.registerCountry(new Country4a("CONGO, Democratic Republic of (was Zaire)", "CD", "COD", (short) 180));
- Country4a.registerCountry(new Country4a("CONGO, Republic of", "CG", "COG", (short) 178));
- Country4a.registerCountry(new Country4a("COOK ISLANDS", "CK", "COK", (short) 184));
- Country4a.registerCountry(new Country4a("COSTA RICA", "CR", "CRI", (short) 188));
- Country4a.registerCountry(new Country4a("COTE D'IVOIRE", "CI", "CIV", (short) 384));
- Country4a.registerCountry(new Country4a("CROATIA (local name: Hrvatska)", "HR", "HRV", (short) 191));
- Country4a.registerCountry(new Country4a("CUBA", "CU", "CUB", (short) 192));
- Country4a.registerCountry(new Country4a("CYPRUS", "CY", "CYP", (short) 196));
- Country4a.registerCountry(new Country4a("CZECH REPUBLIC", "CZ", "CZE", (short) 203));
- Country4a.registerCountry(new Country4a("DENMARK", "DK", "DNK", (short) 208));
- Country4a.registerCountry(new Country4a("DJIBOUTI", "DJ", "DJI", (short) 262));
- Country4a.registerCountry(new Country4a("DOMINICA", "DM", "DMA", (short) 212));
- Country4a.registerCountry(new Country4a("DOMINICAN REPUBLIC", "DO", "DOM", (short) 214));
- Country4a.registerCountry(new Country4a("ECUADOR", "EC", "ECU", (short) 218));
- Country4a.registerCountry(new Country4a("EGYPT", "EG", "EGY", (short) 818));
- Country4a.registerCountry(new Country4a("EL SALVADOR", "SV", "SLV", (short) 222));
- Country4a.registerCountry(new Country4a("EQUATORIAL GUINEA", "GQ", "GNQ", (short) 226));
- Country4a.registerCountry(new Country4a("ERITREA", "ER", "ERI", (short) 232));
- Country4a.registerCountry(new Country4a("ESTONIA", "EE", "EST", (short) 233));
- Country4a.registerCountry(new Country4a("ETHIOPIA", "ET", "ETH", (short) 231));
- Country4a.registerCountry(new Country4a("FALKLAND ISLANDS (MALVINAS)", "FK", "FLK", (short) 238));
- Country4a.registerCountry(new Country4a("FAROE ISLANDS", "FO", "FRO", (short) 234));
- Country4a.registerCountry(new Country4a("FIJI", "FJ", "FJI", (short) 242));
- Country4a.registerCountry(new Country4a("FINLAND", "FI", "FIN", (short) 246));
- Country4a.registerCountry(new Country4a("FRANCE", "FR", "FRA", (short) 250));
- Country4a.registerCountry(new Country4a("FRENCH GUIANA", "GF", "GUF", (short) 254));
- Country4a.registerCountry(new Country4a("FRENCH POLYNESIA", "PF", "PYF", (short) 258));
- Country4a.registerCountry(new Country4a("FRENCH SOUTHERN TERRITORIES", "TF", "ATF", (short) 260));
- Country4a.registerCountry(new Country4a("GABON", "GA", "GAB", (short) 266));
- Country4a.registerCountry(new Country4a("GAMBIA", "GM", "GMB", (short) 270));
- Country4a.registerCountry(new Country4a("GEORGIA", "GE", "GEO", (short) 268));
- Country4a.registerCountry(new Country4a("GERMANY", "DE", "DEU", (short) 276));
- Country4a.registerCountry(new Country4a("GHANA", "GH", "GHA", (short) 288));
- Country4a.registerCountry(new Country4a("GIBRALTAR", "GI", "GIB", (short) 292));
- Country4a.registerCountry(new Country4a("GREECE", "GR", "GRC", (short) 300));
- Country4a.registerCountry(new Country4a("GREENLAND", "GL", "GRL", (short) 304));
- Country4a.registerCountry(new Country4a("GRENADA", "GD", "GRD", (short) 308));
- Country4a.registerCountry(new Country4a("GUADELOUPE", "GP", "GLP", (short) 312));
- Country4a.registerCountry(new Country4a("GUAM", "GU", "GUM", (short) 316));
- Country4a.registerCountry(new Country4a("GUATEMALA", "GT", "GTM", (short) 320));
- Country4a.registerCountry(new Country4a("GUINEA", "GN", "GIN", (short) 324));
- Country4a.registerCountry(new Country4a("GUINEA-BISSAU", "GW", "GNB", (short) 624));
- Country4a.registerCountry(new Country4a("GUYANA", "GY", "GUY", (short) 328));
- Country4a.registerCountry(new Country4a("HAITI", "HT", "HTI", (short) 332));
- Country4a.registerCountry(new Country4a("HEARD AND MC DONALD ISLANDS", "HM", "HMD", (short) 334));
- Country4a.registerCountry(new Country4a("HONDURAS", "HN", "HND", (short) 340));
- Country4a.registerCountry(new Country4a("HONG KONG", "HK", "HKG", (short) 344));
- Country4a.registerCountry(new Country4a("HUNGARY", "HU", "HUN", (short) 348));
- Country4a.registerCountry(new Country4a("ICELAND", "IS", "ISL", (short) 352));
- Country4a.registerCountry(new Country4a("INDIA", "IN", "IND", (short) 356));
- Country4a.registerCountry(new Country4a("INDONESIA", "ID", "IDN", (short) 360));
- Country4a.registerCountry(new Country4a("IRAN (ISLAMIC REPUBLIC OF)", "IR", "IRN", (short) 364));
- Country4a.registerCountry(new Country4a("IRAQ", "IQ", "IRQ", (short) 368));
- Country4a.registerCountry(new Country4a("IRELAND", "IE", "IRL", (short) 372));
- Country4a.registerCountry(new Country4a("ISRAEL", "IL", "ISR", (short) 376));
- Country4a.registerCountry(new Country4a("ITALY", "IT", "ITA", (short) 380));
- Country4a.registerCountry(new Country4a("JAMAICA", "JM", "JAM", (short) 388));
- Country4a.registerCountry(new Country4a("JAPAN", "JP", "JPN", (short) 392));
- Country4a.registerCountry(new Country4a("JORDAN", "JO", "JOR", (short) 400));
- Country4a.registerCountry(new Country4a("KAZAKHSTAN", "KZ", "KAZ", (short) 398));
- Country4a.registerCountry(new Country4a("KENYA", "KE", "KEN", (short) 404));
- Country4a.registerCountry(new Country4a("KIRIBATI", "KI", "KIR", (short) 296));
- Country4a.registerCountry(new Country4a("KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF", "KP", "PRK", (short) 408));
- Country4a.registerCountry(new Country4a("KOREA, REPUBLIC OF", "KR", "KOR", (short) 410));
- Country4a.registerCountry(new Country4a("KUWAIT", "KW", "KWT", (short) 414));
- Country4a.registerCountry(new Country4a("KYRGYZSTAN", "KG", "KGZ", (short) 417));
- Country4a.registerCountry(new Country4a("LAO PEOPLE'S DEMOCRATIC REPUBLIC", "LA", "LAO", (short) 418));
- Country4a.registerCountry(new Country4a("LATVIA", "LV", "LVA", (short) 428));
- Country4a.registerCountry(new Country4a("LEBANON", "LB", "LBN", (short) 422));
- Country4a.registerCountry(new Country4a("LESOTHO", "LS", "LSO", (short) 426));
- Country4a.registerCountry(new Country4a("LIBERIA", "LR", "LBR", (short) 430));
- Country4a.registerCountry(new Country4a("LIBYAN ARAB JAMAHIRIYA", "LY", "LBY", (short) 434));
- Country4a.registerCountry(new Country4a("LIECHTENSTEIN", "LI", "LIE", (short) 438));
- Country4a.registerCountry(new Country4a("LITHUANIA", "LT", "LTU", (short) 440));
- Country4a.registerCountry(new Country4a("LUXEMBOURG", "LU", "LUX", (short) 442));
- Country4a.registerCountry(new Country4a("MACAU", "MO", "MAC", (short) 446));
- Country4a.registerCountry(new Country4a(
+ Country4a.register(UNDETERMINED);
+ Country4a.register(new Country4a("AALAND ISLANDS", "AX", "ALA", (short) 248));
+ Country4a.register(new Country4a("AFGHANISTAN", "AF", "AFG", (short) 4));
+ Country4a.register(new Country4a("ALBANIA", "AL", "ALB", (short) 8));
+ Country4a.register(new Country4a("ALGERIA", "DZ", "DZA", (short) 12));
+ Country4a.register(new Country4a("AMERICAN SAMOA", "AS", "ASM", (short) 16));
+ Country4a.register(new Country4a("ANDORRA", "AD", "AND", (short) 20));
+ Country4a.register(new Country4a("ANGOLA", "AO", "AGO", (short) 24));
+ Country4a.register(new Country4a("ANGUILLA", "AI", "AIA", (short) 660));
+ Country4a.register(new Country4a("ANTARCTICA", "AQ", "ATA", (short) 10));
+ Country4a.register(new Country4a("ANTIGUA AND BARBUDA", "AG", "ATG", (short) 28));
+ Country4a.register(new Country4a("ARGENTINA", "AR", "ARG", (short) 32));
+ Country4a.register(new Country4a("ARMENIA", "AM", "ARM", (short) 51));
+ Country4a.register(new Country4a("ARUBA", "AW", "ABW", (short) 533));
+ Country4a.register(new Country4a("AUSTRALIA", "AU", "AUS", (short) 36));
+ Country4a.register(new Country4a("AUSTRIA", "AT", "AUT", (short) 40));
+ Country4a.register(new Country4a("AZERBAIJAN", "AZ", "AZE", (short) 31));
+ Country4a.register(new Country4a("BAHAMAS", "BS", "BHS", (short) 44));
+ Country4a.register(new Country4a("BAHRAIN", "BH", "BHR", (short) 48));
+ Country4a.register(new Country4a("BANGLADESH", "BD", "BGD", (short) 50));
+ Country4a.register(new Country4a("BARBADOS", "BB", "BRB", (short) 52));
+ Country4a.register(new Country4a("BELARUS", "BY", "BLR", (short) 112));
+ Country4a.register(new Country4a("BELGIUM", "BE", "BEL", (short) 56));
+ Country4a.register(new Country4a("BELIZE", "BZ", "BLZ", (short) 84));
+ Country4a.register(new Country4a("BENIN", "BJ", "BEN", (short) 204));
+ Country4a.register(new Country4a("BERMUDA", "BM", "BMU", (short) 60));
+ Country4a.register(new Country4a("BHUTAN", "BT", "BTN", (short) 64));
+ Country4a.register(new Country4a("BOLIVIA", "BO", "BOL", (short) 68));
+ Country4a.register(new Country4a("BOSNIA AND HERZEGOWINA", "BA", "BIH", (short) 70));
+ Country4a.register(new Country4a("BOTSWANA", "BW", "BWA", (short) 72));
+ Country4a.register(new Country4a("BOUVET ISLAND", "BV", "BVT", (short) 74));
+ Country4a.register(new Country4a("BRAZIL", "BR", "BRA", (short) 76));
+ Country4a.register(new Country4a("BRITISH INDIAN OCEAN TERRITORY", "IO", "IOT", (short) 86));
+ Country4a.register(new Country4a("BRUNEI DARUSSALAM", "BN", "BRN", (short) 96));
+ Country4a.register(new Country4a("BULGARIA", "BG", "BGR", (short) 100));
+ Country4a.register(new Country4a("BURKINA FASO", "BF", "BFA", (short) 854));
+ Country4a.register(new Country4a("BURUNDI", "BI", "BDI", (short) 108));
+ Country4a.register(new Country4a("CAMBODIA", "KH", "KHM", (short) 116));
+ Country4a.register(new Country4a("CAMEROON", "CM", "CMR", (short) 120));
+ Country4a.register(new Country4a("CANADA", "CA", "CAN", (short) 124));
+ Country4a.register(new Country4a("CAPE VERDE", "CV", "CPV", (short) 132));
+ Country4a.register(new Country4a("CAYMAN ISLANDS", "KY", "CYM", (short) 136));
+ Country4a.register(new Country4a("CENTRAL AFRICAN REPUBLIC", "CF", "CAF", (short) 140));
+ Country4a.register(new Country4a("CHAD", "TD", "TCD", (short) 148));
+ Country4a.register(new Country4a("CHILE", "CL", "CHL", (short) 152));
+ Country4a.register(new Country4a("CHINA", "CN", "CHN", (short) 156));
+ Country4a.register(new Country4a("CHRISTMAS ISLAND", "CX", "CXR", (short) 162));
+ Country4a.register(new Country4a("COCOS (KEELING) ISLANDS", "CC", "CCK", (short) 166));
+ Country4a.register(new Country4a("COLOMBIA", "CO", "COL", (short) 170));
+ Country4a.register(new Country4a("COMOROS", "KM", "COM", (short) 174));
+ Country4a.register(new Country4a("CONGO, Democratic Republic of (was Zaire)", "CD", "COD", (short) 180));
+ Country4a.register(new Country4a("CONGO, Republic of", "CG", "COG", (short) 178));
+ Country4a.register(new Country4a("COOK ISLANDS", "CK", "COK", (short) 184));
+ Country4a.register(new Country4a("COSTA RICA", "CR", "CRI", (short) 188));
+ Country4a.register(new Country4a("COTE D'IVOIRE", "CI", "CIV", (short) 384));
+ Country4a.register(new Country4a("CROATIA (local name: Hrvatska)", "HR", "HRV", (short) 191));
+ Country4a.register(new Country4a("CUBA", "CU", "CUB", (short) 192));
+ Country4a.register(new Country4a("CYPRUS", "CY", "CYP", (short) 196));
+ Country4a.register(new Country4a("CZECH REPUBLIC", "CZ", "CZE", (short) 203));
+ Country4a.register(new Country4a("DENMARK", "DK", "DNK", (short) 208));
+ Country4a.register(new Country4a("DJIBOUTI", "DJ", "DJI", (short) 262));
+ Country4a.register(new Country4a("DOMINICA", "DM", "DMA", (short) 212));
+ Country4a.register(new Country4a("DOMINICAN REPUBLIC", "DO", "DOM", (short) 214));
+ Country4a.register(new Country4a("ECUADOR", "EC", "ECU", (short) 218));
+ Country4a.register(new Country4a("EGYPT", "EG", "EGY", (short) 818));
+ Country4a.register(new Country4a("EL SALVADOR", "SV", "SLV", (short) 222));
+ Country4a.register(new Country4a("EQUATORIAL GUINEA", "GQ", "GNQ", (short) 226));
+ Country4a.register(new Country4a("ERITREA", "ER", "ERI", (short) 232));
+ Country4a.register(new Country4a("ESTONIA", "EE", "EST", (short) 233));
+ Country4a.register(new Country4a("ETHIOPIA", "ET", "ETH", (short) 231));
+ Country4a.register(new Country4a("FALKLAND ISLANDS (MALVINAS)", "FK", "FLK", (short) 238));
+ Country4a.register(new Country4a("FAROE ISLANDS", "FO", "FRO", (short) 234));
+ Country4a.register(new Country4a("FIJI", "FJ", "FJI", (short) 242));
+ Country4a.register(new Country4a("FINLAND", "FI", "FIN", (short) 246));
+ Country4a.register(new Country4a("FRANCE", "FR", "FRA", (short) 250));
+ Country4a.register(new Country4a("FRENCH GUIANA", "GF", "GUF", (short) 254));
+ Country4a.register(new Country4a("FRENCH POLYNESIA", "PF", "PYF", (short) 258));
+ Country4a.register(new Country4a("FRENCH SOUTHERN TERRITORIES", "TF", "ATF", (short) 260));
+ Country4a.register(new Country4a("GABON", "GA", "GAB", (short) 266));
+ Country4a.register(new Country4a("GAMBIA", "GM", "GMB", (short) 270));
+ Country4a.register(new Country4a("GEORGIA", "GE", "GEO", (short) 268));
+ Country4a.register(new Country4a("GERMANY", "DE", "DEU", (short) 276));
+ Country4a.register(new Country4a("GHANA", "GH", "GHA", (short) 288));
+ Country4a.register(new Country4a("GIBRALTAR", "GI", "GIB", (short) 292));
+ Country4a.register(new Country4a("GREECE", "GR", "GRC", (short) 300));
+ Country4a.register(new Country4a("GREENLAND", "GL", "GRL", (short) 304));
+ Country4a.register(new Country4a("GRENADA", "GD", "GRD", (short) 308));
+ Country4a.register(new Country4a("GUADELOUPE", "GP", "GLP", (short) 312));
+ Country4a.register(new Country4a("GUAM", "GU", "GUM", (short) 316));
+ Country4a.register(new Country4a("GUATEMALA", "GT", "GTM", (short) 320));
+ Country4a.register(new Country4a("GUINEA", "GN", "GIN", (short) 324));
+ Country4a.register(new Country4a("GUINEA-BISSAU", "GW", "GNB", (short) 624));
+ Country4a.register(new Country4a("GUYANA", "GY", "GUY", (short) 328));
+ Country4a.register(new Country4a("HAITI", "HT", "HTI", (short) 332));
+ Country4a.register(new Country4a("HEARD AND MC DONALD ISLANDS", "HM", "HMD", (short) 334));
+ Country4a.register(new Country4a("HONDURAS", "HN", "HND", (short) 340));
+ Country4a.register(new Country4a("HONG KONG", "HK", "HKG", (short) 344));
+ Country4a.register(new Country4a("HUNGARY", "HU", "HUN", (short) 348));
+ Country4a.register(new Country4a("ICELAND", "IS", "ISL", (short) 352));
+ Country4a.register(new Country4a("INDIA", "IN", "IND", (short) 356));
+ Country4a.register(new Country4a("INDONESIA", "ID", "IDN", (short) 360));
+ Country4a.register(new Country4a("IRAN (ISLAMIC REPUBLIC OF)", "IR", "IRN", (short) 364));
+ Country4a.register(new Country4a("IRAQ", "IQ", "IRQ", (short) 368));
+ Country4a.register(new Country4a("IRELAND", "IE", "IRL", (short) 372));
+ Country4a.register(new Country4a("ISRAEL", "IL", "ISR", (short) 376));
+ Country4a.register(new Country4a("ITALY", "IT", "ITA", (short) 380));
+ Country4a.register(new Country4a("JAMAICA", "JM", "JAM", (short) 388));
+ Country4a.register(new Country4a("JAPAN", "JP", "JPN", (short) 392));
+ Country4a.register(new Country4a("JORDAN", "JO", "JOR", (short) 400));
+ Country4a.register(new Country4a("KAZAKHSTAN", "KZ", "KAZ", (short) 398));
+ Country4a.register(new Country4a("KENYA", "KE", "KEN", (short) 404));
+ Country4a.register(new Country4a("KIRIBATI", "KI", "KIR", (short) 296));
+ Country4a.register(new Country4a("KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF", "KP", "PRK", (short) 408));
+ Country4a.register(new Country4a("KOREA, REPUBLIC OF", "KR", "KOR", (short) 410));
+ Country4a.register(new Country4a("KUWAIT", "KW", "KWT", (short) 414));
+ Country4a.register(new Country4a("KYRGYZSTAN", "KG", "KGZ", (short) 417));
+ Country4a.register(new Country4a("LAO PEOPLE'S DEMOCRATIC REPUBLIC", "LA", "LAO", (short) 418));
+ Country4a.register(new Country4a("LATVIA", "LV", "LVA", (short) 428));
+ Country4a.register(new Country4a("LEBANON", "LB", "LBN", (short) 422));
+ Country4a.register(new Country4a("LESOTHO", "LS", "LSO", (short) 426));
+ Country4a.register(new Country4a("LIBERIA", "LR", "LBR", (short) 430));
+ Country4a.register(new Country4a("LIBYAN ARAB JAMAHIRIYA", "LY", "LBY", (short) 434));
+ Country4a.register(new Country4a("LIECHTENSTEIN", "LI", "LIE", (short) 438));
+ Country4a.register(new Country4a("LITHUANIA", "LT", "LTU", (short) 440));
+ Country4a.register(new Country4a("LUXEMBOURG", "LU", "LUX", (short) 442));
+ Country4a.register(new Country4a("MACAU", "MO", "MAC", (short) 446));
+ Country4a.register(new Country4a(
"MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF", "MK", "MKD", (short) 807));
- Country4a.registerCountry(new Country4a("MADAGASCAR", "MG", "MDG", (short) 450));
- Country4a.registerCountry(new Country4a("MALAWI", "MW", "MWI", (short) 454));
- Country4a.registerCountry(new Country4a("MALAYSIA", "MY", "MYS", (short) 458));
- Country4a.registerCountry(new Country4a("MALDIVES", "MV", "MDV", (short) 462));
- Country4a.registerCountry(new Country4a("MALI", "ML", "MLI", (short) 466));
- Country4a.registerCountry(new Country4a("MALTA", "MT", "MLT", (short) 470));
- Country4a.registerCountry(new Country4a("MARSHALL ISLANDS", "MH", "MHL", (short) 584));
- Country4a.registerCountry(new Country4a("MARTINIQUE", "MQ", "MTQ", (short) 474));
- Country4a.registerCountry(new Country4a("MAURITANIA", "MR", "MRT", (short) 478));
- Country4a.registerCountry(new Country4a("MAURITIUS", "MU", "MUS", (short) 480));
- Country4a.registerCountry(new Country4a("MAYOTTE", "YT", "MYT", (short) 175));
- Country4a.registerCountry(new Country4a("MEXICO", "MX", "MEX", (short) 484));
- Country4a.registerCountry(new Country4a("MICRONESIA, FEDERATED STATES OF", "FM", "FSM", (short) 583));
- Country4a.registerCountry(new Country4a("MOLDOVA, REPUBLIC OF", "MD", "MDA", (short) 498));
- Country4a.registerCountry(new Country4a("MONACO", "MC", "MCO", (short) 492));
- Country4a.registerCountry(new Country4a("MONGOLIA", "MN", "MNG", (short) 496));
- Country4a.registerCountry(new Country4a("MONTSERRAT", "MS", "MSR", (short) 500));
- Country4a.registerCountry(new Country4a("MOROCCO", "MA", "MAR", (short) 504));
- Country4a.registerCountry(new Country4a("MOZAMBIQUE", "MZ", "MOZ", (short) 508));
- Country4a.registerCountry(new Country4a("MYANMAR", "MM", "MMR", (short) 104));
- Country4a.registerCountry(new Country4a("NAMIBIA", "NA", "NAM", (short) 516));
- Country4a.registerCountry(new Country4a("NAURU", "NR", "NRU", (short) 520));
- Country4a.registerCountry(new Country4a("NEPAL", "NP", "NPL", (short) 524));
- Country4a.registerCountry(new Country4a("NETHERLANDS", "NL", "NLD", (short) 528));
- Country4a.registerCountry(new Country4a("NETHERLANDS ANTILLES", "AN", "ANT", (short) 530));
- Country4a.registerCountry(new Country4a("NEW CALEDONIA", "NC", "NCL", (short) 540));
- Country4a.registerCountry(new Country4a("NEW ZEALAND", "NZ", "NZL", (short) 554));
- Country4a.registerCountry(new Country4a("NICARAGUA", "NI", "NIC", (short) 558));
- Country4a.registerCountry(new Country4a("NIGER", "NE", "NER", (short) 562));
- Country4a.registerCountry(new Country4a("NIGERIA", "NG", "NGA", (short) 566));
- Country4a.registerCountry(new Country4a("NIUE", "NU", "NIU", (short) 570));
- Country4a.registerCountry(new Country4a("NORFOLK ISLAND", "NF", "NFK", (short) 574));
- Country4a.registerCountry(new Country4a("NORTHERN MARIANA ISLANDS", "MP", "MNP", (short) 580));
- Country4a.registerCountry(new Country4a("NORWAY", "NO", "NOR", (short) 578));
- Country4a.registerCountry(new Country4a("OMAN", "OM", "OMN", (short) 512));
- Country4a.registerCountry(new Country4a("PAKISTAN", "PK", "PAK", (short) 586));
- Country4a.registerCountry(new Country4a("PALAU", "PW", "PLW", (short) 585));
- Country4a.registerCountry(new Country4a("PALESTINIAN TERRITORY, Occupied", "PS", "PSE", (short) 275));
- Country4a.registerCountry(new Country4a("PANAMA", "PA", "PAN", (short) 591));
- Country4a.registerCountry(new Country4a("PAPUA NEW GUINEA", "PG", "PNG", (short) 598));
- Country4a.registerCountry(new Country4a("PARAGUAY", "PY", "PRY", (short) 600));
- Country4a.registerCountry(new Country4a("PERU", "PE", "PER", (short) 604));
- Country4a.registerCountry(new Country4a("PHILIPPINES", "PH", "PHL", (short) 608));
- Country4a.registerCountry(new Country4a("PITCAIRN", "PN", "PCN", (short) 612));
- Country4a.registerCountry(new Country4a("POLAND", "PL", "POL", (short) 616));
- Country4a.registerCountry(new Country4a("PORTUGAL", "PT", "PRT", (short) 620));
- Country4a.registerCountry(new Country4a("PUERTO RICO", "PR", "PRI", (short) 630));
- Country4a.registerCountry(new Country4a("QATAR", "QA", "QAT", (short) 634));
- Country4a.registerCountry(new Country4a("REUNION", "RE", "REU", (short) 638));
- Country4a.registerCountry(new Country4a("ROMANIA", "RO", "ROU", (short) 642));
- Country4a.registerCountry(new Country4a("RUSSIAN FEDERATION", "RU", "RUS", (short) 643));
- Country4a.registerCountry(new Country4a("RWANDA", "RW", "RWA", (short) 646));
- Country4a.registerCountry(new Country4a("SAINT HELENA", "SH", "SHN", (short) 654));
- Country4a.registerCountry(new Country4a("SAINT KITTS AND NEVIS", "KN", "KNA", (short) 659));
- Country4a.registerCountry(new Country4a("SAINT LUCIA", "LC", "LCA", (short) 662));
- Country4a.registerCountry(new Country4a("SAINT PIERRE AND MIQUELON", "PM", "SPM", (short) 666));
- Country4a.registerCountry(new Country4a("SAINT VINCENT AND THE GRENADINES", "VC", "VCT", (short) 670));
- Country4a.registerCountry(new Country4a("SAMOA", "WS", "WSM", (short) 882));
- Country4a.registerCountry(new Country4a("SAN MARINO", "SM", "SMR", (short) 674));
- Country4a.registerCountry(new Country4a("SAO TOME AND PRINCIPE", "ST", "STP", (short) 678));
- Country4a.registerCountry(new Country4a("SAUDI ARABIA", "SA", "SAU", (short) 682));
- Country4a.registerCountry(new Country4a("SENEGAL", "SN", "SEN", (short) 686));
- Country4a.registerCountry(new Country4a("SERBIA AND MONTENEGRO", "CS", "SCG", (short) 891));
- Country4a.registerCountry(new Country4a("SEYCHELLES", "SC", "SYC", (short) 690));
- Country4a.registerCountry(new Country4a("SIERRA LEONE", "SL", "SLE", (short) 694));
- Country4a.registerCountry(new Country4a("SINGAPORE", "SG", "SGP", (short) 702));
- Country4a.registerCountry(new Country4a("SLOVAKIA", "SK", "SVK", (short) 703));
- Country4a.registerCountry(new Country4a("SLOVENIA", "SI", "SVN", (short) 705));
- Country4a.registerCountry(new Country4a("SOLOMON ISLANDS", "SB", "SLB", (short) 90));
- Country4a.registerCountry(new Country4a("SOMALIA", "SO", "SOM", (short) 706));
- Country4a.registerCountry(new Country4a("SOUTH AFRICA", "ZA", "ZAF", (short) 710));
- Country4a.registerCountry(new Country4a(
+ Country4a.register(new Country4a("MADAGASCAR", "MG", "MDG", (short) 450));
+ Country4a.register(new Country4a("MALAWI", "MW", "MWI", (short) 454));
+ Country4a.register(new Country4a("MALAYSIA", "MY", "MYS", (short) 458));
+ Country4a.register(new Country4a("MALDIVES", "MV", "MDV", (short) 462));
+ Country4a.register(new Country4a("MALI", "ML", "MLI", (short) 466));
+ Country4a.register(new Country4a("MALTA", "MT", "MLT", (short) 470));
+ Country4a.register(new Country4a("MARSHALL ISLANDS", "MH", "MHL", (short) 584));
+ Country4a.register(new Country4a("MARTINIQUE", "MQ", "MTQ", (short) 474));
+ Country4a.register(new Country4a("MAURITANIA", "MR", "MRT", (short) 478));
+ Country4a.register(new Country4a("MAURITIUS", "MU", "MUS", (short) 480));
+ Country4a.register(new Country4a("MAYOTTE", "YT", "MYT", (short) 175));
+ Country4a.register(new Country4a("MEXICO", "MX", "MEX", (short) 484));
+ Country4a.register(new Country4a("MICRONESIA, FEDERATED STATES OF", "FM", "FSM", (short) 583));
+ Country4a.register(new Country4a("MOLDOVA, REPUBLIC OF", "MD", "MDA", (short) 498));
+ Country4a.register(new Country4a("MONACO", "MC", "MCO", (short) 492));
+ Country4a.register(new Country4a("MONGOLIA", "MN", "MNG", (short) 496));
+ Country4a.register(new Country4a("MONTSERRAT", "MS", "MSR", (short) 500));
+ Country4a.register(new Country4a("MOROCCO", "MA", "MAR", (short) 504));
+ Country4a.register(new Country4a("MOZAMBIQUE", "MZ", "MOZ", (short) 508));
+ Country4a.register(new Country4a("MYANMAR", "MM", "MMR", (short) 104));
+ Country4a.register(new Country4a("NAMIBIA", "NA", "NAM", (short) 516));
+ Country4a.register(new Country4a("NAURU", "NR", "NRU", (short) 520));
+ Country4a.register(new Country4a("NEPAL", "NP", "NPL", (short) 524));
+ Country4a.register(new Country4a("NETHERLANDS", "NL", "NLD", (short) 528));
+ Country4a.register(new Country4a("NETHERLANDS ANTILLES", "AN", "ANT", (short) 530));
+ Country4a.register(new Country4a("NEW CALEDONIA", "NC", "NCL", (short) 540));
+ Country4a.register(new Country4a("NEW ZEALAND", "NZ", "NZL", (short) 554));
+ Country4a.register(new Country4a("NICARAGUA", "NI", "NIC", (short) 558));
+ Country4a.register(new Country4a("NIGER", "NE", "NER", (short) 562));
+ Country4a.register(new Country4a("NIGERIA", "NG", "NGA", (short) 566));
+ Country4a.register(new Country4a("NIUE", "NU", "NIU", (short) 570));
+ Country4a.register(new Country4a("NORFOLK ISLAND", "NF", "NFK", (short) 574));
+ Country4a.register(new Country4a("NORTHERN MARIANA ISLANDS", "MP", "MNP", (short) 580));
+ Country4a.register(new Country4a("NORWAY", "NO", "NOR", (short) 578));
+ Country4a.register(new Country4a("OMAN", "OM", "OMN", (short) 512));
+ Country4a.register(new Country4a("PAKISTAN", "PK", "PAK", (short) 586));
+ Country4a.register(new Country4a("PALAU", "PW", "PLW", (short) 585));
+ Country4a.register(new Country4a("PALESTINIAN TERRITORY, Occupied", "PS", "PSE", (short) 275));
+ Country4a.register(new Country4a("PANAMA", "PA", "PAN", (short) 591));
+ Country4a.register(new Country4a("PAPUA NEW GUINEA", "PG", "PNG", (short) 598));
+ Country4a.register(new Country4a("PARAGUAY", "PY", "PRY", (short) 600));
+ Country4a.register(new Country4a("PERU", "PE", "PER", (short) 604));
+ Country4a.register(new Country4a("PHILIPPINES", "PH", "PHL", (short) 608));
+ Country4a.register(new Country4a("PITCAIRN", "PN", "PCN", (short) 612));
+ Country4a.register(new Country4a("POLAND", "PL", "POL", (short) 616));
+ Country4a.register(new Country4a("PORTUGAL", "PT", "PRT", (short) 620));
+ Country4a.register(new Country4a("PUERTO RICO", "PR", "PRI", (short) 630));
+ Country4a.register(new Country4a("QATAR", "QA", "QAT", (short) 634));
+ Country4a.register(new Country4a("REUNION", "RE", "REU", (short) 638));
+ Country4a.register(new Country4a("ROMANIA", "RO", "ROU", (short) 642));
+ Country4a.register(new Country4a("RUSSIAN FEDERATION", "RU", "RUS", (short) 643));
+ Country4a.register(new Country4a("RWANDA", "RW", "RWA", (short) 646));
+ Country4a.register(new Country4a("SAINT HELENA", "SH", "SHN", (short) 654));
+ Country4a.register(new Country4a("SAINT KITTS AND NEVIS", "KN", "KNA", (short) 659));
+ Country4a.register(new Country4a("SAINT LUCIA", "LC", "LCA", (short) 662));
+ Country4a.register(new Country4a("SAINT PIERRE AND MIQUELON", "PM", "SPM", (short) 666));
+ Country4a.register(new Country4a("SAINT VINCENT AND THE GRENADINES", "VC", "VCT", (short) 670));
+ Country4a.register(new Country4a("SAMOA", "WS", "WSM", (short) 882));
+ Country4a.register(new Country4a("SAN MARINO", "SM", "SMR", (short) 674));
+ Country4a.register(new Country4a("SAO TOME AND PRINCIPE", "ST", "STP", (short) 678));
+ Country4a.register(new Country4a("SAUDI ARABIA", "SA", "SAU", (short) 682));
+ Country4a.register(new Country4a("SENEGAL", "SN", "SEN", (short) 686));
+ Country4a.register(new Country4a("SERBIA AND MONTENEGRO", "CS", "SCG", (short) 891));
+ Country4a.register(new Country4a("SEYCHELLES", "SC", "SYC", (short) 690));
+ Country4a.register(new Country4a("SIERRA LEONE", "SL", "SLE", (short) 694));
+ Country4a.register(new Country4a("SINGAPORE", "SG", "SGP", (short) 702));
+ Country4a.register(new Country4a("SLOVAKIA", "SK", "SVK", (short) 703));
+ Country4a.register(new Country4a("SLOVENIA", "SI", "SVN", (short) 705));
+ Country4a.register(new Country4a("SOLOMON ISLANDS", "SB", "SLB", (short) 90));
+ Country4a.register(new Country4a("SOMALIA", "SO", "SOM", (short) 706));
+ Country4a.register(new Country4a("SOUTH AFRICA", "ZA", "ZAF", (short) 710));
+ Country4a.register(new Country4a(
"SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS", "GS", "SGS", (short) 239));
- Country4a.registerCountry(new Country4a("SPAIN", "ES", "ESP", (short) 724));
- Country4a.registerCountry(new Country4a("SRI LANKA", "LK", "LKA", (short) 144));
- Country4a.registerCountry(new Country4a("SUDAN", "SD", "SDN", (short) 736));
- Country4a.registerCountry(new Country4a("SURINAME", "SR", "SUR", (short) 740));
- Country4a.registerCountry(new Country4a("SVALBARD AND JAN MAYEN ISLANDS", "SJ", "SJM", (short) 744));
- Country4a.registerCountry(new Country4a("SWAZILAND", "SZ", "SWZ", (short) 748));
- Country4a.registerCountry(new Country4a("SWEDEN", "SE", "SWE", (short) 752));
- Country4a.registerCountry(new Country4a("SWITZERLAND", "CH", "CHE", (short) 756));
- Country4a.registerCountry(new Country4a("SYRIAN ARAB REPUBLIC", "SY", "SYR", (short) 760));
- Country4a.registerCountry(new Country4a("TAIWAN", "TW", "TWN", (short) 158));
- Country4a.registerCountry(new Country4a("TAJIKISTAN", "TJ", "TJK", (short) 762));
- Country4a.registerCountry(new Country4a("TANZANIA, UNITED REPUBLIC OF", "TZ", "TZA", (short) 834));
- Country4a.registerCountry(new Country4a("THAILAND", "TH", "THA", (short) 764));
- Country4a.registerCountry(new Country4a("TIMOR-LESTE", "TL", "TLS", (short) 626));
- Country4a.registerCountry(new Country4a("TOGO", "TG", "TGO", (short) 768));
- Country4a.registerCountry(new Country4a("TOKELAU", "TK", "TKL", (short) 772));
- Country4a.registerCountry(new Country4a("TONGA", "TO", "TON", (short) 776));
- Country4a.registerCountry(new Country4a("TRINIDAD AND TOBAGO", "TT", "TTO", (short) 780));
- Country4a.registerCountry(new Country4a("TUNISIA", "TN", "TUN", (short) 788));
- Country4a.registerCountry(new Country4a("TURKEY", "TR", "TUR", (short) 792));
- Country4a.registerCountry(new Country4a("TURKMENISTAN", "TM", "TKM", (short) 795));
- Country4a.registerCountry(new Country4a("TURKS AND CAICOS ISLANDS", "TC", "TCA", (short) 796));
- Country4a.registerCountry(new Country4a("TUVALU", "TV", "TUV", (short) 798));
- Country4a.registerCountry(new Country4a("UGANDA", "UG", "UGA", (short) 800));
- Country4a.registerCountry(new Country4a("UKRAINE", "UA", "UKR", (short) 804));
- Country4a.registerCountry(new Country4a("UNITED ARAB EMIRATES", "AE", "ARE", (short) 784));
- Country4a.registerCountry(new Country4a("UNITED KINGDOM", "GB", "GBR", (short) 826));
- Country4a.registerCountry(new Country4a("UNITED STATES", "US", "USA", (short) 840));
- Country4a.registerCountry(new Country4a("UNITED STATES MINOR OUTLYING ISLANDS", "UM", "UMI", (short) 581));
- Country4a.registerCountry(new Country4a("URUGUAY", "UY", "URY", (short) 858));
- Country4a.registerCountry(new Country4a("UZBEKISTAN", "UZ", "UZB", (short) 860));
- Country4a.registerCountry(new Country4a("VANUATU", "VU", "VUT", (short) 548));
- Country4a.registerCountry(new Country4a("VATICAN CITY STATE (HOLY SEE)", "VA", "VAT", (short) 336));
- Country4a.registerCountry(new Country4a("VENEZUELA", "VE", "VEN", (short) 862));
- Country4a.registerCountry(new Country4a("VIET NAM", "VN", "VNM", (short) 704));
- Country4a.registerCountry(new Country4a("VIRGIN ISLANDS (BRITISH)", "VG", "VGB", (short) 92));
- Country4a.registerCountry(new Country4a("VIRGIN ISLANDS (U.S.)", "VI", "VIR", (short) 850));
- Country4a.registerCountry(new Country4a("WALLIS AND FUTUNA ISLANDS", "WF", "WLF", (short) 876));
- Country4a.registerCountry(new Country4a("WESTERN SAHARA", "EH", "ESH", (short) 732));
- Country4a.registerCountry(new Country4a("YEMEN", "YE", "YEM", (short) 887));
- Country4a.registerCountry(new Country4a("ZAMBIA", "ZM", "ZMB", (short) 894));
- Country4a.registerCountry(new Country4a("ZIMBABWE", "ZW", "ZWE", (short) 716));
+ Country4a.register(new Country4a("SPAIN", "ES", "ESP", (short) 724));
+ Country4a.register(new Country4a("SRI LANKA", "LK", "LKA", (short) 144));
+ Country4a.register(new Country4a("SUDAN", "SD", "SDN", (short) 736));
+ Country4a.register(new Country4a("SURINAME", "SR", "SUR", (short) 740));
+ Country4a.register(new Country4a("SVALBARD AND JAN MAYEN ISLANDS", "SJ", "SJM", (short) 744));
+ Country4a.register(new Country4a("SWAZILAND", "SZ", "SWZ", (short) 748));
+ Country4a.register(new Country4a("SWEDEN", "SE", "SWE", (short) 752));
+ Country4a.register(new Country4a("SWITZERLAND", "CH", "CHE", (short) 756));
+ Country4a.register(new Country4a("SYRIAN ARAB REPUBLIC", "SY", "SYR", (short) 760));
+ Country4a.register(new Country4a("TAIWAN", "TW", "TWN", (short) 158));
+ Country4a.register(new Country4a("TAJIKISTAN", "TJ", "TJK", (short) 762));
+ Country4a.register(new Country4a("TANZANIA, UNITED REPUBLIC OF", "TZ", "TZA", (short) 834));
+ Country4a.register(new Country4a("THAILAND", "TH", "THA", (short) 764));
+ Country4a.register(new Country4a("TIMOR-LESTE", "TL", "TLS", (short) 626));
+ Country4a.register(new Country4a("TOGO", "TG", "TGO", (short) 768));
+ Country4a.register(new Country4a("TOKELAU", "TK", "TKL", (short) 772));
+ Country4a.register(new Country4a("TONGA", "TO", "TON", (short) 776));
+ Country4a.register(new Country4a("TRINIDAD AND TOBAGO", "TT", "TTO", (short) 780));
+ Country4a.register(new Country4a("TUNISIA", "TN", "TUN", (short) 788));
+ Country4a.register(new Country4a("TURKEY", "TR", "TUR", (short) 792));
+ Country4a.register(new Country4a("TURKMENISTAN", "TM", "TKM", (short) 795));
+ Country4a.register(new Country4a("TURKS AND CAICOS ISLANDS", "TC", "TCA", (short) 796));
+ Country4a.register(new Country4a("TUVALU", "TV", "TUV", (short) 798));
+ Country4a.register(new Country4a("UGANDA", "UG", "UGA", (short) 800));
+ Country4a.register(new Country4a("UKRAINE", "UA", "UKR", (short) 804));
+ Country4a.register(new Country4a("UNITED ARAB EMIRATES", "AE", "ARE", (short) 784));
+ Country4a.register(new Country4a("UNITED KINGDOM", "GB", "GBR", (short) 826));
+ Country4a.register(new Country4a("UNITED STATES", "US", "USA", (short) 840));
+ Country4a.register(new Country4a("UNITED STATES MINOR OUTLYING ISLANDS", "UM", "UMI", (short) 581));
+ Country4a.register(new Country4a("URUGUAY", "UY", "URY", (short) 858));
+ Country4a.register(new Country4a("UZBEKISTAN", "UZ", "UZB", (short) 860));
+ Country4a.register(new Country4a("VANUATU", "VU", "VUT", (short) 548));
+ Country4a.register(new Country4a("VATICAN CITY STATE (HOLY SEE)", "VA", "VAT", (short) 336));
+ Country4a.register(new Country4a("VENEZUELA", "VE", "VEN", (short) 862));
+ Country4a.register(new Country4a("VIET NAM", "VN", "VNM", (short) 704));
+ Country4a.register(new Country4a("VIRGIN ISLANDS (BRITISH)", "VG", "VGB", (short) 92));
+ Country4a.register(new Country4a("VIRGIN ISLANDS (U.S.)", "VI", "VIR", (short) 850));
+ Country4a.register(new Country4a("WALLIS AND FUTUNA ISLANDS", "WF", "WLF", (short) 876));
+ Country4a.register(new Country4a("WESTERN SAHARA", "EH", "ESH", (short) 732));
+ Country4a.register(new Country4a("YEMEN", "YE", "YEM", (short) 887));
+ Country4a.register(new Country4a("ZAMBIA", "ZM", "ZMB", (short) 894));
+ Country4a.register(new Country4a("ZIMBABWE", "ZW", "ZWE", (short) 716));
USA = Country4a.findFrom3Char("USA");
FINLAND = Country4a.findFrom3Char("FIN");
@@ -352,12 +352,17 @@
/**
* Constructor.
+ * Client code should not ordinarily use this constructor, but should use {@link #findFromAlpha(String)},
+ * {@link #findFrom2Char(String)}, {@link #findFrom3Char(String)}, or {@link #findFromNumeric(int)} to obtain an
+ * instance of this class.
+ * If a new instance does need to be created, it must then be registered using {@link #register(Country4a)} for the
+ * methods mentioned above to be able to find it.
* @param englishName The English name of this country.
* @param alpha2Code The two-character code for this country.
* @param alpha3Code The three-character code for this country.
* @param numericCode The numeric code for this country.
*/
- private Country4a(final String englishName, final String alpha2Code, final String alpha3Code,
+ public Country4a(final String englishName, final String alpha2Code, final String alpha3Code,
final short numericCode) {
this.englishName = englishName;
this.alpha2Code = alpha2Code;
@@ -365,6 +370,16 @@
this.numericCode = numericCode;
}
+ /**
+ * Registers country information.
+ * @param country The country to be registered.
+ */
+ public static void register(final Country4a country) {
+ Country4a.map2char.put(country.getAlpha2Code(), country);
+ Country4a.map3char.put(country.getAlpha3Code(), country);
+ Country4a.mapNumeric.put(country.getNumericCode(), country);
+ }
+
@Override
public boolean isValid() {
return this.numericCode >= 0;
@@ -391,16 +406,6 @@
}
/**
- * Registers country information.
- * @param country The country to be registered.
- */
- private static void registerCountry(final Country4a country) {
- Country4a.map2char.put(country.getAlpha2Code(), country);
- Country4a.map3char.put(country.getAlpha3Code(), country);
- Country4a.mapNumeric.put(country.getNumericCode(), country);
- }
-
- /**
* Finds an instance of this class from a given 2-character or 3-character country code as defined in ISO 3166.
* @param countryCode A 2-letter or 3-letter ISO 3166 code.
* @return The instance matching {@code countryCode}, or null if the code has not been registered.
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Language4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Language4a.java 2021-11-10 20:56:15 UTC (rev 12024)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Language4a.java 2021-11-10 22:44:12 UTC (rev 12025)
@@ -677,12 +677,16 @@
/**
* Constructor.
+ * Client code should not ordinarily use this constructor, but should use {@link #findFromAlpha(String)},
+ * {@link #findFrom2Char(String)}, or {@link #findFrom3Char(String)} to obtain an instance of this class.
+ * If a new instance does need to be created, it must then be registered using {@link #register(Language4a)} for the
+ * methods mentioned above to be able to find it.
* @param alpha3Code The three-character code for this language.
* @param alpha2Code The two-character code for this language.
* @param englishName The English name of this language.
* @param frenchName The French name of this language.
*/
- private Language4a(final String alpha3Code, final String alpha2Code, final String englishName,
+ public Language4a(final String alpha3Code, final String alpha2Code, final String englishName,
final String frenchName) {
this.alpha3Code = alpha3Code;
this.alpha2Code = alpha2Code;
@@ -690,6 +694,32 @@
this.frenchName = frenchName;
}
+ /**
+ * Registers language information.
+ * @param language The language to be registered.
+ * @throws IllegalArgumentException If alpha2Code has already been registered, if alpha3Code is null, or if
+ * alpha3Code has already been registered.
+ */
+ public static void register(final Language4a language) {
+ final String alpha2Code = language.getAlpha2Code();
+ if (alpha2Code != null) {
+ if (Language4a.map2char.containsKey(alpha2Code)) {
+ throw new IllegalArgumentException("Language 2-char code already registered: " + alpha2Code);
+ }
+ }
+
+ final String alpha3Code = language.getAlpha3Code();
+ if (alpha3Code == null || Language4a.map3char.containsKey(alpha3Code)) {
+ throw new IllegalArgumentException("Language 3-char code null or already registered: " + alpha3Code);
+ }
+
+ /* If we got this far, the language parameter can be registered. */
+ if (alpha2Code != null) {
+ Language4a.map2char.put(language.getAlpha2Code(), language);
+ }
+ Language4a.map3char.put(language.getAlpha3Code(), language);
+ }
+
@Override
public String getAlpha2Code() {
return this.alpha2Code;
@@ -711,18 +741,6 @@
}
/**
- * Registers language information.
- * @param language The language to be registered.
- */
- public static void register(final Language4a language) {
- final String alpha2Code = language.getAlpha2Code();
- if (alpha2Code != null) {
- Language4a.map2char.put(language.getAlpha2Code(), language);
- }
- Language4a.map3char.put(language.getAlpha3Code(), language);
- }
-
- /**
* Finds an instance of this class from a given 2-character or 3-character
* language code as defined in ISO 639.
* @param languageCode A 2-letter or 3-letter ISO 639 code.
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 20:56:15 UTC (rev 12024)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 22:44:12 UTC (rev 12025)
@@ -35,6 +35,7 @@
import java.lang.Character.UnicodeScript;
import java.util.HashMap;
+import java.util.Iterator;
import java.util.Map;
/**
@@ -337,6 +338,11 @@
/**
* Constructor.
+ * Client code should not ordinarily use this constructor, but should use {@link #findFromAlpha(String)},
+ * {@link #findFromNumeric(int)}, {@link #findFromIcu4jCode(int)}, or {@link #findFromUnicodeScript(UnicodeScript)}
+ * to obtain an instance of this class.
+ * If a new instance does need to be created, it must then be registered using {@link #register(Script4a)} for the
+ * methods mentioned above to be able to find it.
* @param alpha The ISO-15924 alpha code for this script.
* @param numeric The ISO-15924 numeric code for this script.
* @param englishName The ISO-15924 English name of this script.
@@ -344,7 +350,7 @@
* @param icu4jCode The ICU4J code for this script.
* @param unicodeScript The Unicode script corresponding to this ISO-15924 script.
*/
- private Script4a(final String alpha, final short numeric, final String englishName, final String frenchName,
+ public Script4a(final String alpha, final short numeric, final String englishName, final String frenchName,
final byte icu4jCode, final UnicodeScript unicodeScript) {
this.alphaCode = alpha;
this.numericCode = numeric;
@@ -354,6 +360,29 @@
this.unicodeScript = unicodeScript;
}
+ /**
+ * Registers script information.
+ * @param script The script to be registered.
+ * @throws IllegalArgumentException If any components of the script have already been registered.
+ */
+ public static void register(final Script4a script) {
+ if (findFromAlpha(script.getAlphaCode()) != null) {
+ throw new IllegalArgumentException("ISO 15924 Alpha code already registered: " + script.getAlphaCode());
+ }
+ if (findFromNumeric(script.getNumericCode()) != null) {
+ throw new IllegalArgumentException("ISO 15924 Numeric code already registered: " + script.getNumericCode());
+ }
+ if (findFromIcu4jCode(script.getIcu4jCode()) != null) {
+ throw new IllegalArgumentException("ICU4J code already registered: " + script.getIcu4jCode());
+ }
+ if (findFromUnicodeScript(script.getUnicodeScript()) != null) {
+ throw new IllegalArgumentException("Unicode script already registered: " + script.getUnicodeScript());
+ }
+ /* Convert alpha map key to lower-case, to make searches case-insensitive. */
+ Script4a.mapAlpha.put(script.getAlphaCode().toLowerCase(), script);
+ Script4a.mapNumeric.put(script.getNumericCode(), script);
+ }
+
@Override
public String getAlphaCode() {
return this.alphaCode;
@@ -385,17 +414,7 @@
}
/**
- * Registers script information.
- * @param script The script to be registered.
- */
- public static void register(final Script4a script) {
- /* Convert alpha map key to lower-case, to make searches case-insensitive. */
- Script4a.mapAlpha.put(script.getAlphaCode().toLowerCase(), script);
- Script4a.mapNumeric.put(script.getNumericCode(), script);
- }
-
- /**
- * Finds an instance of this class from a given alpha script code as defined in ISO 15924.
+ * Finds an instance of this class from a given ISO 15924 alpha code.
* This search is case-insensitive.
* @param alphaCode The 4-letter ISO 15924 code.
* @return The instance matching {@code alphaCode}, or null if the code is not registered.
@@ -407,7 +426,7 @@
}
/**
- * Finds an instance of this class from a given alpha script code as defined in ISO 15924.
+ * Finds an instance of this class from a given ISO 15924 numeric code.
* @param numericCode The numeric ISO 15924 code.
* @return The instance matching {@code numericCode}, or null if the code is not registered.
*/
@@ -419,6 +438,48 @@
return Script4a.mapNumeric.get((short) numericCode);
}
+ /**
+ * Finds an instance of this class from a given ICU4J script code.
+ * @param icu4jCode The icu4j code.
+ * @return The instance matching {@code icu4jCode}, or null if the code is not registered.
+ */
+ public static Script4a findFromIcu4jCode(final int icu4jCode) {
+ /* If needed, we can do better on performance here at the expense of memory. */
+ if (icu4jCode == ICU4J_NOT_SUPPORTED) {
+ return null;
+ }
+ final Iterator<Map.Entry<Short, Script4a>> iterator = Script4a.mapNumeric.entrySet().iterator();
+ while (iterator.hasNext()) {
+ final Map.Entry<Short, Script4a> entry = iterator.next();
+ final Script4a script = entry.getValue();
+ if (script.getIcu4jCode() == icu4jCode) {
+ return script;
+ }
+ }
+ return null;
+ }
+
+ /**
+ * Finds an instance of this class from a given Unicode script code.
+ * @param unicodeScript The Unicode script.
+ * @return The instance matching {@code unicodeScript}, or null if the code is not registered.
+ */
+ public static Script4a findFromUnicodeScript(final UnicodeScript unicodeScript) {
+ /* If needed, we can do better on performance here at the expense of memory. */
+ final Iterator<Map.Entry<Short, Script4a>> iterator = Script4a.mapNumeric.entrySet().iterator();
+ if (unicodeScript == null) {
+ return null;
+ }
+ while (iterator.hasNext()) {
+ final Map.Entry<Short, Script4a> entry = iterator.next();
+ final Script4a script = entry.getValue();
+ if (script.getUnicodeScript() == unicodeScript) {
+ return script;
+ }
+ }
+ return null;
+ }
+
@Override
public String toString() {
return this.alphaCode;
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
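
The register-then-find pattern introduced in r12025 above can be sketched in isolation. This is a minimal illustration under stated assumptions, not FOray code: `CodeRegistrySketch`, `Entry`, and the lower-casing of both keys are hypothetical (the diff lower-cases only the Script4a alpha key). The sketch validates every code before touching any map, so a rejected registration leaves the registry unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the registry classes in the diff; not FOray code. */
final class CodeRegistrySketch {

    /** Minimal value type: a required 3-letter code plus an optional 2-letter code. */
    static final class Entry {
        private final String alpha3Code;
        private final String alpha2Code;

        Entry(final String alpha3Code, final String alpha2Code) {
            this.alpha3Code = alpha3Code;
            this.alpha2Code = alpha2Code;
        }

        String getAlpha3Code() {
            return this.alpha3Code;
        }

        String getAlpha2Code() {
            return this.alpha2Code;
        }
    }

    /** Keys are stored lower-case so that lookups are case-insensitive (an assumption of this sketch). */
    private static final Map<String, Entry> MAP_2_CHAR = new HashMap<>();
    private static final Map<String, Entry> MAP_3_CHAR = new HashMap<>();

    private CodeRegistrySketch() { }

    /**
     * Registers an entry, rejecting a null 3-char code and any duplicate code.
     * All checks run before any map is modified.
     */
    static void register(final Entry entry) {
        final String alpha3 = entry.getAlpha3Code();
        if (alpha3 == null) {
            throw new IllegalArgumentException("3-char code is required.");
        }
        final String key3 = alpha3.toLowerCase(Locale.ROOT);
        if (MAP_3_CHAR.containsKey(key3)) {
            throw new IllegalArgumentException("3-char code already registered: " + alpha3);
        }
        final String alpha2 = entry.getAlpha2Code();
        final String key2 = alpha2 == null ? null : alpha2.toLowerCase(Locale.ROOT);
        if (key2 != null && MAP_2_CHAR.containsKey(key2)) {
            throw new IllegalArgumentException("2-char code already registered: " + alpha2);
        }
        /* If we got this far, the entry can be registered. */
        MAP_3_CHAR.put(key3, entry);
        if (key2 != null) {
            MAP_2_CHAR.put(key2, entry);
        }
    }

    /** Returns the entry for a 3-char code, or null if it is not registered. */
    static Entry findFrom3Char(final String code) {
        return code == null ? null : MAP_3_CHAR.get(code.toLowerCase(Locale.ROOT));
    }

    /** Returns the entry for a 2-char code, or null if it is not registered. */
    static Entry findFrom2Char(final String code) {
        return code == null ? null : MAP_2_CHAR.get(code.toLowerCase(Locale.ROOT));
    }
}
```

Client code would normally call only the `find...` methods. `register` is needed only for codes that the static data does not already cover.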
From: <vic...@us...> - 2021-11-10 21:08:15
Revision: 12024
http://sourceforge.net/p/foray/code/12024
Author: victormote
Date: 2021-11-10 20:56:15 +0000 (Wed, 10 Nov 2021)
Log Message:
-----------
Add icu4j and java unicode script relationships.
Modified Paths:
--------------
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 17:55:29 UTC (rev 12023)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 20:56:15 UTC (rev 12024)
@@ -68,178 +68,245 @@
/* Checkstyle: Allow Magic Numbers that are hard-coded data. */
static {
- Script4a.register(new Script4a("Arab", (short) 160, "Arabic", "arabe", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Armn", (short) 230, "Armenian", "arménien", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Bali", (short) 360, "Balinese", "balinais", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Batk", (short) 365, "Batak", "batak", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Beng", (short) 325, "Bengali", "bengalî", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Blis", (short) 550, "Blissymbols", "symboles Bliss", ICU4J_NOT_SUPPORTED,
- null));
- Script4a.register(new Script4a("Bopo", (short) 285, "Bopomofo", "bopomofo", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Brah", (short) 300, "Brahmi", "brâhmî", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Brai", (short) 570, "Braille", "braille", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Bugi", (short) 367, "Buginese", "bouguis", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Buhd", (short) 372, "Buhid", "bouhide", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Arab", (short) 160, "Arabic", "arabe",
+ (byte) UScript.ARABIC, UnicodeScript.ARABIC));
+ Script4a.register(new Script4a("Armn", (short) 230, "Armenian", "arménien",
+ (byte) UScript.ARMENIAN, UnicodeScript.ARMENIAN));
+ Script4a.register(new Script4a("Bali", (short) 360, "Balinese", "balinais",
+ (byte) UScript.BALINESE, UnicodeScript.BALINESE));
+ Script4a.register(new Script4a("Batk", (short) 365, "Batak", "batak",
+ (byte) UScript.BATAK, UnicodeScript.BATAK));
+ Script4a.register(new Script4a("Beng", (short) 325, "Bengali", "bengalî",
+ (byte) UScript.BENGALI, UnicodeScript.BENGALI));
+ Script4a.register(new Script4a("Blis", (short) 550, "Blissymbols", "symboles Bliss",
+ (byte) UScript.BLISSYMBOLS, null));
+ Script4a.register(new Script4a("Bopo", (short) 285, "Bopomofo", "bopomofo",
+ (byte) UScript.BOPOMOFO, UnicodeScript.BOPOMOFO));
+ Script4a.register(new Script4a("Brah", (short) 300, "Brahmi", "brâhmî",
+ (byte) UScript.BRAHMI, UnicodeScript.BRAHMI));
+ Script4a.register(new Script4a("Brai", (short) 570, "Braille", "braille",
+ (byte) UScript.BRAILLE, UnicodeScript.BRAILLE));
+ Script4a.register(new Script4a("Bugi", (short) 367, "Buginese", "bouguis",
+ (byte) UScript.BUGINESE, UnicodeScript.BUGINESE));
+ Script4a.register(new Script4a("Buhd", (short) 372, "Buhid", "bouhide",
+ (byte) UScript.BUHID, UnicodeScript.BUHID));
Script4a.register(new Script4a(
"Cans", (short) 440, "Unified Canadian Aboriginal Syllabics",
- "syllabaire autochtone canadien unifié", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Cari", (short) 201, "Carian", "carien", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Cham", (short) 358, "Cham", "cham (cam, tcham)", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Cher", (short) 445, "Cherokee", "tchérokî", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Cirt", (short) 291, "Cirth", "cirth", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Copt", (short) 204, "Coptic", "copte", ICU4J_NOT_SUPPORTED, null));
+ "syllabaire autochtone canadien unifié",
+ (byte) UScript.CANADIAN_ABORIGINAL, UnicodeScript.CANADIAN_ABORIGINAL));
+ Script4a.register(new Script4a("Cari", (short) 201, "Carian", "carien",
+ (byte) UScript.CARIAN, UnicodeScript.CARIAN));
+ Script4a.register(new Script4a("Cham", (short) 358, "Cham", "cham (cam, tcham)",
+ (byte) UScript.CHAM, UnicodeScript.CHAM));
+ Script4a.register(new Script4a("Cher", (short) 445, "Cherokee", "tchérokî",
+ (byte) UScript.CHEROKEE, UnicodeScript.CHEROKEE));
+ Script4a.register(new Script4a("Cirt", (short) 291, "Cirth", "cirth",
+ (byte) UScript.CIRTH, null));
+ Script4a.register(new Script4a("Copt", (short) 204, "Coptic", "copte",
+ (byte) UScript.COPTIC, UnicodeScript.COPTIC));
Script4a.register(new Script4a("Cprt", (short) 403, "Cypriot", "syllabaire chypriote",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.CYPRIOT, UnicodeScript.CYPRIOT));
Script4a.register(new Script4a("Cyrl", (short) 220, "Cyrillic", "cyrillique",
(byte) UScript.CYRILLIC, Character.UnicodeScript.CYRILLIC));
Script4a.register(new Script4a(
"Cyrs", (short) 221, "Cyrillic (Old Church Slavonic variant)", "cyrillique (variante slavonne)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.OLD_CHURCH_SLAVONIC_CYRILLIC, null));
Script4a.register(new Script4a("Deva", (short) 315, "Devanagari (Nagari)", "dévanâgarî",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.DEVANAGARI, UnicodeScript.DEVANAGARI));
Script4a.register(new Script4a("Dsrt", (short) 250, "Deseret (Mormon)", "déseret (mormon)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.DESERET, UnicodeScript.DESERET));
Script4a.register(new Script4a("Egyd", (short) 70, "Egyptian demotic", "démotique égyptien",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.DEMOTIC_EGYPTIAN, null));
Script4a.register(new Script4a("Egyh", (short) 60, "Egyptian hieratic", "hiératique égyptien",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.HIERATIC_EGYPTIAN, null));
Script4a.register(new Script4a("Egyp", (short) 50, "Egyptian hieroglyphs", "hiéroglyphes égyptiens",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Ethi", (short) 430, "Ethiopic (Ge?ez)", "éthiopien (ge?ez, guèze)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a(
- "Geok", (short) 241, "Khutsuri (Asomtavruli and Nuskhuri)",
- "khoutsouri (assomtavrouli et nouskhouri)", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.EGYPTIAN_HIEROGLYPHS, UnicodeScript.EGYPTIAN_HIEROGLYPHS));
+ Script4a.register(new Script4a("Ethi", (short) 430, "Ethiopic (Geʻez)", "éthiopien (geʻez, guèze)",
+ (byte) UScript.ETHIOPIC, UnicodeScript.ETHIOPIC));
+ Script4a.register(new Script4a("Geok", (short) 241, "Khutsuri (Asomtavruli and Nuskhuri)",
+ "khoutsouri (assomtavrouli et nouskhouri)", (byte) UScript.KHUTSURI, null));
Script4a.register(new Script4a("Geor", (short) 240, "Georgian (Mkhedruli)", "géorgien (mkhédrouli)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Glag", (short) 225, "Glagolitic", "glagolitique", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Goth", (short) 206, "Gothic", "gotique", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Grek", (short) 200, "Greek", "grec", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.GEORGIAN, UnicodeScript.GEORGIAN));
+ Script4a.register(new Script4a("Glag", (short) 225, "Glagolitic", "glagolitique",
+ (byte) UScript.GLAGOLITIC, UnicodeScript.GLAGOLITIC));
+ Script4a.register(new Script4a("Goth", (short) 206, "Gothic", "gotique",
+ (byte) UScript.GOTHIC, UnicodeScript.GOTHIC));
+ Script4a.register(new Script4a("Grek", (short) 200, "Greek", "grec",
+ (byte) UScript.GREEK, UnicodeScript.GREEK));
Script4a.register(new Script4a("Gujr", (short) 320, "Gujarati", "goudjarâtî (gujrâtî)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Guru", (short) 310, "Gurmukhi", "gourmoukhî", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.GUJARATI, UnicodeScript.GUJARATI));
+ Script4a.register(new Script4a("Guru", (short) 310, "Gurmukhi", "gourmoukhî",
+ (byte) UScript.GURMUKHI, UnicodeScript.GURMUKHI));
Script4a.register(new Script4a("Hang", (short) 286, "Hangul (Hangul, Hangeul)", "hangûl (hangul, hangeul)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.HANGUL, UnicodeScript.HANGUL));
Script4a.register(new Script4a("Hani", (short) 500, "Han (Hanzi, Kanji, Hanja)", "idéogrammes han",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.HAN, UnicodeScript.HAN));
Script4a.register(new Script4a("Hano", (short) 371, "Hanunoo (Hanunóo)", "hanounóo",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.HANUNOO, UnicodeScript.HANUNOO));
Script4a.register(new Script4a(
"Hans", (short) 501, "Han (Simplified variant)", "idéogrammes han (variante simplifiée)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.SIMPLIFIED_HAN, null));
Script4a.register(new Script4a(
"Hant", (short) 502, "Han (Traditional variant)", "idéogrammes han (variante traditionnelle)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Hebr", (short) 125, "Hebrew", "hébreu", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Hira", (short) 410, "Hiragana", "hiragana", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Hmng", (short) 450, "Pahawh Hmong", "pahawh hmong", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a(
- "Hrkt", (short) 412, "(alias for Hiragana + Katakana)", "(alias pour hiragana + katakana)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.TRADITIONAL_HAN, null));
+ Script4a.register(new Script4a("Hebr", (short) 125, "Hebrew", "hébreu",
+ (byte) UScript.HEBREW, UnicodeScript.HEBREW));
+ Script4a.register(new Script4a("Hira", (short) 410, "Hiragana", "hiragana",
+ (byte) UScript.HIRAGANA, UnicodeScript.HIRAGANA));
+ Script4a.register(new Script4a("Hmng", (short) 450, "Pahawh Hmong", "pahawh hmong",
+ (byte) UScript.PAHAWH_HMONG, null));
+ Script4a.register(new Script4a("Hrkt", (short) 412, "(alias for Hiragana + Katakana)",
+ "(alias pour hiragana + katakana)",
+ (byte) UScript.KATAKANA_OR_HIRAGANA, null));
Script4a.register(new Script4a("Hung", (short) 176, "Old Hungarian", "ancien hongrois",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Inds", (short) 610, "Indus (Harappan)", "indus", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.OLD_HUNGARIAN, null));
+ Script4a.register(new Script4a("Inds", (short) 610, "Indus (Harappan)", "indus",
+ (byte) UScript.HARAPPAN_INDUS, null));
Script4a.register(new Script4a(
"Ital", (short) 210, "Old Italic (Etruscan, Oscan, etc.)", "ancien italique (étrusque, osque, etc.)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a(
- "Java", (short) 361, "Javanese", "javanais", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a(
- "Jpan", (short) 413, "Japanese (alias for Han + Hiragana + Katakana)",
- "japonais (alias pour han + hiragana + katakana)", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Kali", (short) 357, "Kayah Li", "kayah li", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Kana", (short) 411, "Katakana", "katakana", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Khar", (short) 305, "Kharoshthi", "kharochthî", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Khmr", (short) 355, "Khmer", "khmer", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Knda", (short) 345, "Kannada", "kannara (canara)", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Lana", (short) 351, "Lanna", "lanna", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Laoo", (short) 356, "Lao", "laotien", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.OLD_ITALIC, UnicodeScript.OLD_ITALIC));
+ Script4a.register(new Script4a("Java", (short) 361, "Javanese", "javanais",
+ (byte) UScript.JAVANESE, UnicodeScript.JAVANESE));
+ Script4a.register(new Script4a("Jpan", (short) 413, "Japanese (alias for Han + Hiragana + Katakana)",
+ "japonais (alias pour han + hiragana + katakana)",
+ (byte) UScript.JAPANESE, null));
+ Script4a.register(new Script4a("Kali", (short) 357, "Kayah Li", "kayah li",
+ (byte) UScript.KAYAH_LI, UnicodeScript.KAYAH_LI));
+ Script4a.register(new Script4a("Kana", (short) 411, "Katakana", "katakana",
+ (byte) UScript.KATAKANA, UnicodeScript.KATAKANA));
+ Script4a.register(new Script4a("Khar", (short) 305, "Kharoshthi", "kharochthî",
+ (byte) UScript.KHAROSHTHI, UnicodeScript.KHAROSHTHI));
+ Script4a.register(new Script4a("Khmr", (short) 355, "Khmer", "khmer",
+ (byte) UScript.KHMER, UnicodeScript.KHMER));
+ Script4a.register(new Script4a("Knda", (short) 345, "Kannada", "kannara (canara)",
+ (byte) UScript.KANNADA, UnicodeScript.KANNADA));
+ Script4a.register(new Script4a("Lana", (short) 351, "Lanna", "lanna",
+ (byte) UScript.LANNA, null));
+ Script4a.register(new Script4a("Laoo", (short) 356, "Lao", "laotien",
+ (byte) UScript.LAO, UnicodeScript.LAO));
Script4a.register(new Script4a("Latf", (short) 217, "Latin (Fraktur variant)", "latin (variante brisée)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.LATIN_FRAKTUR, null));
Script4a.register(new Script4a("Latg", (short) 216, "Latin (Gaelic variant)", "latin (variante gaélique)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.LATIN_GAELIC, null));
Script4a.register(new Script4a("Latn", (short) 215, "Latin", "latin",
(byte) UScript.LATIN, Character.UnicodeScript.LATIN));
Script4a.register(new Script4a("Lepc", (short) 335, "Lepcha (Róng)", "lepcha (róng)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Limb", (short) 336, "Limbu", "limbou", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Lina", (short) 400, "Linear A", "linéaire A", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Linb", (short) 401, "Linear B", "linéaire B", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Lyci", (short) 202, "Lycian", "lycien", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Lydi", (short) 116, "Lydian", "lydien", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Mand", (short) 140, "Mandaean", "mandéen", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.LEPCHA, UnicodeScript.LEPCHA));
+ Script4a.register(new Script4a("Limb", (short) 336, "Limbu", "limbou",
+ (byte) UScript.LIMBU, UnicodeScript.LIMBU));
+ Script4a.register(new Script4a("Lina", (short) 400, "Linear A", "linéaire A",
+ (byte) UScript.LINEAR_A, null));
+ Script4a.register(new Script4a("Linb", (short) 401, "Linear B", "linéaire B",
+ (byte) UScript.LINEAR_B, UnicodeScript.LINEAR_B));
+ Script4a.register(new Script4a("Lyci", (short) 202, "Lycian", "lycien",
+ (byte) UScript.LYCIAN, UnicodeScript.LYCIAN));
+ Script4a.register(new Script4a("Lydi", (short) 116, "Lydian", "lydien",
+ (byte) UScript.LYDIAN, UnicodeScript.LYDIAN));
+ Script4a.register(new Script4a("Mand", (short) 140, "Mandaean", "mandéen",
+ (byte) UScript.MANDAEAN, null));
Script4a.register(new Script4a("Maya", (short) 90, "Mayan hieroglyphs", "hiéroglyphes mayas",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Mero", (short) 100, "Meroitic", "méroïtique", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Mlym", (short) 347, "Malayalam", "malayâlam", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Mong", (short) 145, "Mongolian", "mongol", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a(
- "Moon", (short) 218, "Moon (Moon code, Moon script, Moon type)", "écriture Moon",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.MAYAN_HIEROGLYPHS, null));
+ Script4a.register(new Script4a("Mero", (short) 100, "Meroitic", "méroïtique",
+ (byte) UScript.MEROITIC, null));
+ Script4a.register(new Script4a("Mlym", (short) 347, "Malayalam", "malayâlam",
+ (byte) UScript.MALAYALAM, UnicodeScript.MALAYALAM));
+ Script4a.register(new Script4a("Mong", (short) 145, "Mongolian", "mongol",
+ (byte) UScript.MONGOLIAN, UnicodeScript.MONGOLIAN));
+ Script4a.register(new Script4a("Moon", (short) 218, "Moon (Moon code, Moon script, Moon type)", "écriture Moon",
+ (byte) UScript.MOON, null));
Script4a.register(new Script4a("Mtei", (short) 337, "Meitei Mayek (Meithei, Meetei)", "meitei mayek",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Mymr", (short) 350, "Myanmar (Burmese)", "birman", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Nkoo", (short) 165, "N’Ko", "n’ko", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Ogam", (short) 212, "Ogham", "ogam", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.MEITEI_MAYEK, UnicodeScript.MEETEI_MAYEK));
+ Script4a.register(new Script4a("Mymr", (short) 350, "Myanmar (Burmese)", "birman",
+ (byte) UScript.MYANMAR, UnicodeScript.MYANMAR));
+ Script4a.register(new Script4a("Nkoo", (short) 165, "N’Ko", "n’ko",
+ (byte) UScript.NKO, UnicodeScript.NKO));
+ Script4a.register(new Script4a("Ogam", (short) 212, "Ogham", "ogam",
+ (byte) UScript.OGHAM, UnicodeScript.OGHAM));
Script4a.register(new Script4a("Olck", (short) 261, "Ol Chiki (Ol Cemet’, Ol, Santali)", "ol tchiki",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Orkh", (short) 175, "Orkhon", "orkhon", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Orya", (short) 327, "Oriya", "oriyâ", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Osma", (short) 260, "Osmanya", "osmanais", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Perm", (short) 227, "Old Permic", "ancien permien", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Phag", (short) 331, "Phags-pa", "’phags pa", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Phnx", (short) 115, "Phoenician", "phénicien", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.OL_CHIKI, UnicodeScript.OL_CHIKI));
+ Script4a.register(new Script4a("Orkh", (short) 175, "Orkhon", "orkhon",
+ (byte) UScript.ORKHON, null));
+ Script4a.register(new Script4a("Orya", (short) 327, "Oriya", "oriyâ",
+ (byte) UScript.ORIYA, UnicodeScript.ORIYA));
+ Script4a.register(new Script4a("Osma", (short) 260, "Osmanya", "osmanais",
+ (byte) UScript.OSMANYA, UnicodeScript.OSMANYA));
+ Script4a.register(new Script4a("Perm", (short) 227, "Old Permic", "ancien permien",
+ (byte) UScript.OLD_PERMIC, null));
+ Script4a.register(new Script4a("Phag", (short) 331, "Phags-pa", "’phags pa",
+ (byte) UScript.PHAGS_PA, UnicodeScript.PHAGS_PA));
+ Script4a.register(new Script4a("Phnx", (short) 115, "Phoenician", "phénicien",
+ (byte) UScript.PHOENICIAN, UnicodeScript.PHOENICIAN));
Script4a.register(new Script4a("Plrd", (short) 282, "Pollard Phonetic", "phonétique de Pollard",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.PHONETIC_POLLARD, null));
Script4a.register(new Script4a("Rjng", (short) 363, "Rejang, Redjang, Kaganga", "redjang",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Roro", (short) 620, "Rongorongo", "rongorongo", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Runr", (short) 211, "Runic", "runique", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Sara", (short) 292, "Sarati", "sarati", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Saur", (short) 344, "Saurashtra", "saurachtra", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.REJANG, UnicodeScript.REJANG));
+ Script4a.register(new Script4a("Roro", (short) 620, "Rongorongo", "rongorongo",
+ (byte) UScript.RONGORONGO, null));
+ Script4a.register(new Script4a("Runr", (short) 211, "Runic", "runique",
+ (byte) UScript.RUNIC, UnicodeScript.RUNIC));
+ Script4a.register(new Script4a("Sara", (short) 292, "Sarati", "sarati",
+ (byte) UScript.SARATI, null));
+ Script4a.register(new Script4a("Saur", (short) 344, "Saurashtra", "saurachtra",
+ (byte) UScript.SAURASHTRA, UnicodeScript.SAURASHTRA));
Script4a.register(new Script4a("Sgnw", (short) 95, "SignWriting", "SignÉcriture, SignWriting",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.SIGN_WRITING, null));
Script4a.register(new Script4a("Shaw", (short) 281, "Shavian (Shaw)", "shavien (Shaw)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Sinh", (short) 348, "Sinhala", "singhalais", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Sund", (short) 362, "Sundanese", "sundanais", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Sylo", (short) 316, "Syloti Nagri", "sylotî nâgrî", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Syrc", (short) 135, "Syriac", "syriaque", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.SHAVIAN, UnicodeScript.SHAVIAN));
+ Script4a.register(new Script4a("Sinh", (short) 348, "Sinhala", "singhalais",
+ (byte) UScript.SINHALA, UnicodeScript.SINHALA));
+ Script4a.register(new Script4a("Sund", (short) 362, "Sundanese", "sundanais",
+ (byte) UScript.SUNDANESE, UnicodeScript.SUNDANESE));
+ Script4a.register(new Script4a("Sylo", (short) 316, "Syloti Nagri", "sylotî nâgrî",
+ (byte) UScript.SYLOTI_NAGRI, UnicodeScript.SYLOTI_NAGRI));
+ Script4a.register(new Script4a("Syrc", (short) 135, "Syriac", "syriaque",
+ (byte) UScript.SYRIAC, UnicodeScript.SYRIAC));
Script4a.register(new Script4a(
"Syre", (short) 138, "Syriac (Estrangelo variant)", "syriaque (variante estranghélo)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.ESTRANGELO_SYRIAC, null));
Script4a.register(new Script4a(
"Syrj", (short) 137, "Syriac (Western variant)", "syriaque (variante occidentale)",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.WESTERN_SYRIAC, null));
Script4a.register(new Script4a(
"Syrn", (short) 136, "Syriac (Eastern variant)", "syriaque (variante orientale)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Tagb", (short) 373, "Tagbanwa", "tagbanoua", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Tale", (short) 353, "Tai Le", "taï-le", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.EASTERN_SYRIAC, null));
+ Script4a.register(new Script4a("Tagb", (short) 373, "Tagbanwa", "tagbanoua",
+ (byte) UScript.TAGBANWA, UnicodeScript.TAGBANWA));
+ Script4a.register(new Script4a("Tale", (short) 353, "Tai Le", "taï-le",
+ (byte) UScript.TAI_LE, UnicodeScript.TAI_LE));
Script4a.register(new Script4a("Talu", (short) 354, "New Tai Lue", "nouveau taï-lue",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Taml", (short) 346, "Tamil", "tamoul", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Telu", (short) 340, "Telugu", "télougou", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Teng", (short) 290, "Tengwar", "tengwar", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.NEW_TAI_LUE, UnicodeScript.NEW_TAI_LUE));
+ Script4a.register(new Script4a("Taml", (short) 346, "Tamil", "tamoul",
+ (byte) UScript.TAMIL, UnicodeScript.TAMIL));
+ Script4a.register(new Script4a("Telu", (short) 340, "Telugu", "télougou",
+ (byte) UScript.TELUGU, UnicodeScript.TELUGU));
+ Script4a.register(new Script4a("Teng", (short) 290, "Tengwar", "tengwar",
+ (byte) UScript.TENGWAR, null));
Script4a.register(new Script4a("Tfng", (short) 120, "Tifinagh (Berber)", "tifinagh (berbère)",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Tglg", (short) 370, "Tagalog", "tagal", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Thaa", (short) 170, "Thaana", "thâna", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Thai", (short) 352, "Thai", "thaï", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Tibt", (short) 330, "Tibetan", "tibétain", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Ugar", (short) 040, "Ugaritic", "ougaritique", ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Vaii", (short) 470, "Vai", "vaï", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.TIFINAGH, UnicodeScript.TIFINAGH));
+ Script4a.register(new Script4a("Tglg", (short) 370, "Tagalog", "tagal",
+ (byte) UScript.TAGALOG, UnicodeScript.TAGALOG));
+ Script4a.register(new Script4a("Thaa", (short) 170, "Thaana", "thâna",
+ (byte) UScript.THAANA, UnicodeScript.THAANA));
+ Script4a.register(new Script4a("Thai", (short) 352, "Thai", "thaï",
+ (byte) UScript.THAI, UnicodeScript.THAI));
+ Script4a.register(new Script4a("Tibt", (short) 330, "Tibetan", "tibétain",
+ (byte) UScript.TIBETAN, UnicodeScript.TIBETAN));
+ Script4a.register(new Script4a("Ugar", (short) 040, "Ugaritic", "ougaritique",
+ (byte) UScript.UGARITIC, UnicodeScript.UGARITIC));
+ Script4a.register(new Script4a("Vaii", (short) 470, "Vai", "vaï",
+ (byte) UScript.VAI, UnicodeScript.VAI));
Script4a.register(new Script4a("Visp", (short) 280, "Visible Speech", "parole visible",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.VISIBLE_SPEECH, null));
Script4a.register(new Script4a("Xpeo", (short) 030, "Old Persian", "cunéiforme persépolitain",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.OLD_PERSIAN, UnicodeScript.OLD_PERSIAN));
Script4a.register(new Script4a(
"Xsux", (short) 020, "Cuneiform, Sumero-Akkadian", "cunéiforme suméro-akkadien",
- ICU4J_NOT_SUPPORTED, null));
- Script4a.register(new Script4a("Yiii", (short) 460, "Yi", "yi", ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.CUNEIFORM, UnicodeScript.CUNEIFORM));
+ Script4a.register(new Script4a("Yiii", (short) 460, "Yi", "yi",
+ (byte) UScript.YI, UnicodeScript.YI));
Script4a.register(new Script4a(
"Zxxx", (short) 997, "Code for unwritten languages", "codet pour les langues non écrites",
- ICU4J_NOT_SUPPORTED, null));
+ (byte) UScript.UNWRITTEN_LANGUAGES, null));
Script4a.register(Script4a.UNDETERMINED);
Script4a.register(new Script4a(
"Zzzz", (short) 999, "Code for uncoded script", "codet pour écriture non codée",
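A note on the numeric codes in the diff above: several registrations (for example Ugar, Xpeo and Xsux) pass literals such as `(short) 040`, `(short) 030` and `(short) 020`. In Java, an integer literal with a leading zero is octal, so these evaluate to 32, 24 and 16, not to the ISO 15924 numbers 040, 030 and 020. Whether anything depends on the stored value is not visible in this message. The following is a minimal, self-contained sketch of the language behaviour only; the class name `OctalLiteralDemo` and the helper `parseNumericCode` are hypothetical and are not part of FOray.

```java
/** Hypothetical demo class; not part of FOray. */
final class OctalLiteralDemo {

    private OctalLiteralDemo() { }

    /**
     * Parses a zero-padded ISO 15924 numeric code (e.g. "040") as a decimal number.
     * Hypothetical helper, shown only as one way to avoid octal literals.
     */
    static short parseNumericCode(final String code) {
        return Short.parseShort(code, 10);
    }

    public static void main(final String[] args) {
        // A leading zero makes a Java integer literal octal: 040 is 4 * 8 + 0.
        final short octal = (short) 040;
        // Without the leading zero the literal is decimal.
        final short decimal = (short) 40;

        System.out.println(octal);                   // prints 32
        System.out.println(decimal);                 // prints 40
        System.out.println(parseNumericCode("040")); // prints 40
    }
}
```

If the decimal value is what is intended, writing the literal without the leading zero (`(short) 40`) is the smallest change; parsing from a string keeps the zero-padded form readable at the cost of a runtime call.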
From: <vic...@us...> - 2021-11-10 17:55:31
Revision: 12023
http://sourceforge.net/p/foray/code/12023
Author: victormote
Date: 2021-11-10 17:55:29 +0000 (Wed, 10 Nov 2021)
Log Message:
-----------
Conform to new aXSL requirements for Script.
Modified Paths:
--------------
trunk/foray/foray-common/build.gradle
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
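Judging from the Script4a.java diff in this message, each registered script now carries two extra constructor arguments: an ICU4J code and a `java.lang.Character.UnicodeScript` value, with `null` used where no JDK constant is supplied. Script4a hard-codes these values. Purely as an illustration of the relationship between ISO 15924 alpha-4 codes and the JDK enum, the sketch below looks the constant up at run time with `UnicodeScript.forName`, which accepts script name aliases and throws `IllegalArgumentException` for names it does not know. The class `ScriptLookupSketch` and the method `jdkScriptFor` are hypothetical names, not FOray or aXSL API, and the ICU4J side is left out so that the example needs only the JDK.

```java
import java.lang.Character.UnicodeScript;
import java.util.Optional;

/** Hypothetical sketch; not part of FOray or aXSL. */
final class ScriptLookupSketch {

    private ScriptLookupSketch() { }

    /**
     * Returns the JDK UnicodeScript for an ISO 15924 alpha-4 code,
     * or an empty Optional if the running JDK does not know the code.
     */
    static Optional<UnicodeScript> jdkScriptFor(final String iso15924Code) {
        try {
            // forName accepts both full script names and their aliases.
            return Optional.of(UnicodeScript.forName(iso15924Code));
        } catch (final IllegalArgumentException e) {
            // Codes with no Unicode script in this JDK (e.g. "Teng") end up here.
            return Optional.empty();
        }
    }

    public static void main(final String[] args) {
        System.out.println(jdkScriptFor("Latn"));
        System.out.println(jdkScriptFor("Cyrl"));
        System.out.println(jdkScriptFor("Teng"));
    }
}
```

A run-time lookup like this depends on the Unicode version of the running JDK, which may be one reason to prefer the explicit constants used in the commit.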
Modified: trunk/foray/foray-common/build.gradle
===================================================================
--- trunk/foray/foray-common/build.gradle 2021-11-10 17:22:53 UTC (rev 12022)
+++ trunk/foray/foray-common/build.gradle 2021-11-10 17:55:29 UTC (rev 12023)
@@ -4,6 +4,7 @@
api group: 'org.slf4j', name: 'slf4j-api', version: slf4jVersion
api group: 'commons-io', name: 'commons-io', version: commonsIoVersion
api group: 'xml-resolver', name: 'xml-resolver', version: xmlResolverVersion
+ api group: 'com.ibm.icu', name: 'icu4j', version: icu4jVersion
api group: 'org.axsl', name: 'axsl-common', version: axslVersion
api group: 'org.axsl', name: 'axsl-ps', version: axslVersion
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java 2021-11-10 17:22:53 UTC (rev 12022)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Country4a.java 2021-11-10 17:55:29 UTC (rev 12023)
@@ -394,7 +394,7 @@
* Registers country information.
* @param country The country to be registered.
*/
- public static void registerCountry(final Country4a country) {
+ private static void registerCountry(final Country4a country) {
Country4a.map2char.put(country.getAlpha2Code(), country);
Country4a.map3char.put(country.getAlpha3Code(), country);
Country4a.mapNumeric.put(country.getNumericCode(), country);
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 17:22:53 UTC (rev 12022)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Script4a.java 2021-11-10 17:55:29 UTC (rev 12023)
@@ -31,6 +31,9 @@
import org.axsl.common.i18n.Script;
+import com.ibm.icu.lang.UScript;
+
+import java.lang.Character.UnicodeScript;
import java.util.HashMap;
import java.util.Map;
@@ -41,10 +44,6 @@
*/
public final class Script4a implements Script {
- /** The script representing an undetermined value. */
- public static final Script4a UNDETERMINED = new Script4a("Zyyy", (short) 998, "Code for undetermined script",
- "codet pour écriture indéterminée");
-
/** The Latin script. */
public static final Script4a LATIN;
@@ -51,6 +50,13 @@
/** The Cyrillic script. */
public static final Script4a CYRILLIC;
+ /** Constant return value for {@link #getIcu4jCode()}. */
+ private static final byte ICU4J_NOT_SUPPORTED = -2;
+
+ /** The script representing an undetermined value. */
+ public static final Script4a UNDETERMINED = new Script4a("Zyyy", (short) 998, "Code for undetermined script",
+ "codet pour écriture indéterminée", ICU4J_NOT_SUPPORTED, null);
+
/** The initial size of the data structures. */
private static final int INITIAL_CAPACITY = 150;
@@ -62,140 +68,182 @@
/* Checkstyle: Allow Magic Numbers that are hard-coded data. */
static {
- Script4a.register(new Script4a("Arab", (short) 160, "Arabic", "arabe"));
- Script4a.register(new Script4a("Armn", (short) 230, "Armenian", "arménien"));
- Script4a.register(new Script4a("Bali", (short) 360, "Balinese", "balinais"));
- Script4a.register(new Script4a("Batk", (short) 365, "Batak", "batak"));
- Script4a.register(new Script4a("Beng", (short) 325, "Bengali", "bengalî"));
- Script4a.register(new Script4a("Blis", (short) 550, "Blissymbols", "symboles Bliss"));
- Script4a.register(new Script4a("Bopo", (short) 285, "Bopomofo", "bopomofo"));
- Script4a.register(new Script4a("Brah", (short) 300, "Brahmi", "brâhmî"));
- Script4a.register(new Script4a("Brai", (short) 570, "Braille", "braille"));
- Script4a.register(new Script4a("Bugi", (short) 367, "Buginese", "bouguis"));
- Script4a.register(new Script4a("Buhd", (short) 372, "Buhid", "bouhide"));
+ Script4a.register(new Script4a("Arab", (short) 160, "Arabic", "arabe", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Armn", (short) 230, "Armenian", "arménien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Bali", (short) 360, "Balinese", "balinais", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Batk", (short) 365, "Batak", "batak", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Beng", (short) 325, "Bengali", "bengalî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Blis", (short) 550, "Blissymbols", "symboles Bliss", ICU4J_NOT_SUPPORTED,
+ null));
+ Script4a.register(new Script4a("Bopo", (short) 285, "Bopomofo", "bopomofo", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Brah", (short) 300, "Brahmi", "brâhmî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Brai", (short) 570, "Braille", "braille", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Bugi", (short) 367, "Buginese", "bouguis", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Buhd", (short) 372, "Buhid", "bouhide", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
"Cans", (short) 440, "Unified Canadian Aboriginal Syllabics",
- "syllabaire autochtone canadien unifié"));
- Script4a.register(new Script4a("Cari", (short) 201, "Carian", "carien"));
- Script4a.register(new Script4a("Cham", (short) 358, "Cham", "cham (cam, tcham)"));
- Script4a.register(new Script4a("Cher", (short) 445, "Cherokee", "tchérokî"));
- Script4a.register(new Script4a("Cirt", (short) 291, "Cirth", "cirth"));
- Script4a.register(new Script4a("Copt", (short) 204, "Coptic", "copte"));
- Script4a.register(new Script4a("Cprt", (short) 403, "Cypriot", "syllabaire chypriote"));
- Script4a.register(new Script4a("Cyrl", (short) 220, "Cyrillic", "cyrillique"));
+ "syllabaire autochtone canadien unifié", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cari", (short) 201, "Carian", "carien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cham", (short) 358, "Cham", "cham (cam, tcham)", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cher", (short) 445, "Cherokee", "tchérokî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cirt", (short) 291, "Cirth", "cirth", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Copt", (short) 204, "Coptic", "copte", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cprt", (short) 403, "Cypriot", "syllabaire chypriote",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Cyrl", (short) 220, "Cyrillic", "cyrillique",
+ (byte) UScript.CYRILLIC, Character.UnicodeScript.CYRILLIC));
Script4a.register(new Script4a(
- "Cyrs", (short) 221, "Cyrillic (Old Church Slavonic variant)", "cyrillique (variante slavonne)"));
- Script4a.register(new Script4a("Deva", (short) 315, "Devanagari (Nagari)", "dévanâgarî"));
- Script4a.register(new Script4a("Dsrt", (short) 250, "Deseret (Mormon)", "déseret (mormon)"));
- Script4a.register(new Script4a("Egyd", (short) 070, "Egyptian demotic", "démotique égyptien"));
- Script4a.register(new Script4a("Egyh", (short) 060, "Egyptian hieratic", "hiératique égyptien"));
- Script4a.register(new Script4a("Egyp", (short) 050, "Egyptian hieroglyphs", "hiéroglyphes égyptiens"));
- Script4a.register(new Script4a("Ethi", (short) 430, "Ethiopic (Ge?ez)", "éthiopien (ge?ez, guèze)"));
+ "Cyrs", (short) 221, "Cyrillic (Old Church Slavonic variant)", "cyrillique (variante slavonne)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Deva", (short) 315, "Devanagari (Nagari)", "dévanâgarî",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Dsrt", (short) 250, "Deseret (Mormon)", "déseret (mormon)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Egyd", (short) 070, "Egyptian demotic", "démotique égyptien",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Egyh", (short) 060, "Egyptian hieratic", "hiératique égyptien",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Egyp", (short) 050, "Egyptian hieroglyphs", "hiéroglyphes égyptiens",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Ethi", (short) 430, "Ethiopic (Ge?ez)", "éthiopien (ge?ez, guèze)",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
"Geok", (short) 241, "Khutsuri (Asomtavruli and Nuskhuri)",
- "khoutsouri (assomtavrouli et nouskhouri)"));
- Script4a.register(new Script4a("Geor", (short) 240, "Georgian (Mkhedruli)", "géorgien (mkhédrouli)"));
- Script4a.register(new Script4a("Glag", (short) 225, "Glagolitic", "glagolitique"));
- Script4a.register(new Script4a("Goth", (short) 206, "Gothic", "gotique"));
- Script4a.register(new Script4a("Grek", (short) 200, "Greek", "grec"));
- Script4a.register(new Script4a("Gujr", (short) 320, "Gujarati", "goudjarâtî (gujrâtî)"));
- Script4a.register(new Script4a("Guru", (short) 310, "Gurmukhi", "gourmoukhî"));
- Script4a.register(new Script4a("Hang", (short) 286, "Hangul (Hangul, Hangeul)", "hangûl (hangul, hangeul)"));
- Script4a.register(new Script4a("Hani", (short) 500, "Han (Hanzi, Kanji, Hanja)", "idéogrammes han"));
- Script4a.register(new Script4a("Hano", (short) 371, "Hanunoo (Hanunóo)", "hanounóo"));
+ "khoutsouri (assomtavrouli et nouskhouri)", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Geor", (short) 240, "Georgian (Mkhedruli)", "géorgien (mkhédrouli)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Glag", (short) 225, "Glagolitic", "glagolitique", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Goth", (short) 206, "Gothic", "gotique", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Grek", (short) 200, "Greek", "grec", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Gujr", (short) 320, "Gujarati", "goudjarâtî (gujrâtî)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Guru", (short) 310, "Gurmukhi", "gourmoukhî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hang", (short) 286, "Hangul (Hangul, Hangeul)", "hangûl (hangul, hangeul)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hani", (short) 500, "Han (Hanzi, Kanji, Hanja)", "idéogrammes han",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hano", (short) 371, "Hanunoo (Hanunóo)", "hanounóo",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Hans", (short) 501, "Han (Simplified variant)", "idéogrammes han (variante simplifiée)"));
+ "Hans", (short) 501, "Han (Simplified variant)", "idéogrammes han (variante simplifiée)",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Hant", (short) 502, "Han (Traditional variant)", "idéogrammes han (variante traditionnelle)"));
- Script4a.register(new Script4a("Hebr", (short) 125, "Hebrew", "hébreu"));
- Script4a.register(new Script4a("Hira", (short) 410, "Hiragana", "hiragana"));
- Script4a.register(new Script4a("Hmng", (short) 450, "Pahawh Hmong", "pahawh hmong"));
+ "Hant", (short) 502, "Han (Traditional variant)", "idéogrammes han (variante traditionnelle)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hebr", (short) 125, "Hebrew", "hébreu", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hira", (short) 410, "Hiragana", "hiragana", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hmng", (short) 450, "Pahawh Hmong", "pahawh hmong", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Hrkt", (short) 412, "(alias for Hiragana + Katakana)", "(alias pour hiragana + katakana)"));
- Script4a.register(new Script4a("Hung", (short) 176, "Old Hungarian", "ancien hongrois"));
- Script4a.register(new Script4a("Inds", (short) 610, "Indus (Harappan)", "indus"));
+ "Hrkt", (short) 412, "(alias for Hiragana + Katakana)", "(alias pour hiragana + katakana)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Hung", (short) 176, "Old Hungarian", "ancien hongrois",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Inds", (short) 610, "Indus (Harappan)", "indus", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Ital", (short) 210, "Old Italic (Etruscan, Oscan, etc.)", "ancien italique (étrusque, osque, etc.)"));
+ "Ital", (short) 210, "Old Italic (Etruscan, Oscan, etc.)", "ancien italique (étrusque, osque, etc.)",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Java", (short) 361, "Javanese", "javanais"));
+ "Java", (short) 361, "Javanese", "javanais", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
"Jpan", (short) 413, "Japanese (alias for Han + Hiragana + Katakana)",
- "japonais (alias pour han + hiragana + katakana)"));
- Script4a.register(new Script4a("Kali", (short) 357, "Kayah Li", "kayah li"));
- Script4a.register(new Script4a("Kana", (short) 411, "Katakana", "katakana"));
- Script4a.register(new Script4a("Khar", (short) 305, "Kharoshthi", "kharochthî"));
- Script4a.register(new Script4a("Khmr", (short) 355, "Khmer", "khmer"));
- Script4a.register(new Script4a("Knda", (short) 345, "Kannada", "kannara (canara)"));
- Script4a.register(new Script4a("Lana", (short) 351, "Lanna", "lanna"));
- Script4a.register(new Script4a("Laoo", (short) 356, "Lao", "laotien"));
- Script4a.register(new Script4a("Latf", (short) 217, "Latin (Fraktur variant)", "latin (variante brisée)"));
- Script4a.register(new Script4a("Latg", (short) 216, "Latin (Gaelic variant)", "latin (variante gaélique)"));
- Script4a.register(new Script4a("Latn", (short) 215, "Latin", "latin"));
- Script4a.register(new Script4a("Lepc", (short) 335, "Lepcha (Róng)", "lepcha (róng)"));
- Script4a.register(new Script4a("Limb", (short) 336, "Limbu", "limbou"));
- Script4a.register(new Script4a("Lina", (short) 400, "Linear A", "linéaire A"));
- Script4a.register(new Script4a("Linb", (short) 401, "Linear B", "linéaire B"));
- Script4a.register(new Script4a("Lyci", (short) 202, "Lycian", "lycien"));
- Script4a.register(new Script4a("Lydi", (short) 116, "Lydian", "lydien"));
- Script4a.register(new Script4a("Mand", (short) 140, "Mandaean", "mandéen"));
- Script4a.register(new Script4a("Maya", (short) 90, "Mayan hieroglyphs", "hiéroglyphes mayas"));
- Script4a.register(new Script4a("Mero", (short) 100, "Meroitic", "méroïtique"));
- Script4a.register(new Script4a("Mlym", (short) 347, "Malayalam", "malayâlam"));
- Script4a.register(new Script4a("Mong", (short) 145, "Mongolian", "mongol"));
+ "japonais (alias pour han + hiragana + katakana)", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Kali", (short) 357, "Kayah Li", "kayah li", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Kana", (short) 411, "Katakana", "katakana", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Khar", (short) 305, "Kharoshthi", "kharochthî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Khmr", (short) 355, "Khmer", "khmer", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Knda", (short) 345, "Kannada", "kannara (canara)", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Lana", (short) 351, "Lanna", "lanna", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Laoo", (short) 356, "Lao", "laotien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Latf", (short) 217, "Latin (Fraktur variant)", "latin (variante brisée)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Latg", (short) 216, "Latin (Gaelic variant)", "latin (variante gaélique)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Latn", (short) 215, "Latin", "latin",
+ (byte) UScript.LATIN, Character.UnicodeScript.LATIN));
+ Script4a.register(new Script4a("Lepc", (short) 335, "Lepcha (Róng)", "lepcha (róng)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Limb", (short) 336, "Limbu", "limbou", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Lina", (short) 400, "Linear A", "linéaire A", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Linb", (short) 401, "Linear B", "linéaire B", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Lyci", (short) 202, "Lycian", "lycien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Lydi", (short) 116, "Lydian", "lydien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mand", (short) 140, "Mandaean", "mandéen", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Maya", (short) 90, "Mayan hieroglyphs", "hiéroglyphes mayas",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mero", (short) 100, "Meroitic", "méroïtique", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mlym", (short) 347, "Malayalam", "malayâlam", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mong", (short) 145, "Mongolian", "mongol", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Moon", (short) 218, "Moon (Moon code, Moon script, Moon type)", "écriture Moon"));
- Script4a.register(new Script4a("Mtei", (short) 337, "Meitei Mayek (Meithei, Meetei)", "meitei mayek"));
- Script4a.register(new Script4a("Mymr", (short) 350, "Myanmar (Burmese)", "birman"));
- Script4a.register(new Script4a("Nkoo", (short) 165, "N’Ko", "n’ko"));
- Script4a.register(new Script4a("Ogam", (short) 212, "Ogham", "ogam"));
- Script4a.register(new Script4a("Olck", (short) 261, "Ol Chiki (Ol Cemet’, Ol, Santali)", "ol tchiki"));
- Script4a.register(new Script4a("Orkh", (short) 175, "Orkhon", "orkhon"));
- Script4a.register(new Script4a("Orya", (short) 327, "Oriya", "oriyâ"));
- Script4a.register(new Script4a("Osma", (short) 260, "Osmanya", "osmanais"));
- Script4a.register(new Script4a("Perm", (short) 227, "Old Permic", "ancien permien"));
- Script4a.register(new Script4a("Phag", (short) 331, "Phags-pa", "’phags pa"));
- Script4a.register(new Script4a("Phnx", (short) 115, "Phoenician", "phénicien"));
- Script4a.register(new Script4a("Plrd", (short) 282, "Pollard Phonetic", "phonétique de Pollard"));
- Script4a.register(new Script4a("Rjng", (short) 363, "Rejang, Redjang, Kaganga", "redjang"));
- Script4a.register(new Script4a("Roro", (short) 620, "Rongorongo", "rongorongo"));
- Script4a.register(new Script4a("Runr", (short) 211, "Runic", "runique"));
- Script4a.register(new Script4a("Sara", (short) 292, "Sarati", "sarati"));
- Script4a.register(new Script4a("Saur", (short) 344, "Saurashtra", "saurachtra"));
- Script4a.register(new Script4a("Sgnw", (short) 95, "SignWriting", "SignÉcriture, SignWriting"));
- Script4a.register(new Script4a("Shaw", (short) 281, "Shavian (Shaw)", "shavien (Shaw)"));
- Script4a.register(new Script4a("Sinh", (short) 348, "Sinhala", "singhalais"));
- Script4a.register(new Script4a("Sund", (short) 362, "Sundanese", "sundanais"));
- Script4a.register(new Script4a("Sylo", (short) 316, "Syloti Nagri", "sylotî nâgrî"));
- Script4a.register(new Script4a("Syrc", (short) 135, "Syriac", "syriaque"));
+ "Moon", (short) 218, "Moon (Moon code, Moon script, Moon type)", "écriture Moon",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mtei", (short) 337, "Meitei Mayek (Meithei, Meetei)", "meitei mayek",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Mymr", (short) 350, "Myanmar (Burmese)", "birman", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Nkoo", (short) 165, "N’Ko", "n’ko", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Ogam", (short) 212, "Ogham", "ogam", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Olck", (short) 261, "Ol Chiki (Ol Cemet’, Ol, Santali)", "ol tchiki",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Orkh", (short) 175, "Orkhon", "orkhon", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Orya", (short) 327, "Oriya", "oriyâ", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Osma", (short) 260, "Osmanya", "osmanais", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Perm", (short) 227, "Old Permic", "ancien permien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Phag", (short) 331, "Phags-pa", "’phags pa", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Phnx", (short) 115, "Phoenician", "phénicien", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Plrd", (short) 282, "Pollard Phonetic", "phonétique de Pollard",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Rjng", (short) 363, "Rejang, Redjang, Kaganga", "redjang",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Roro", (short) 620, "Rongorongo", "rongorongo", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Runr", (short) 211, "Runic", "runique", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Sara", (short) 292, "Sarati", "sarati", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Saur", (short) 344, "Saurashtra", "saurachtra", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Sgnw", (short) 95, "SignWriting", "SignÉcriture, SignWriting",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Shaw", (short) 281, "Shavian (Shaw)", "shavien (Shaw)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Sinh", (short) 348, "Sinhala", "singhalais", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Sund", (short) 362, "Sundanese", "sundanais", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Sylo", (short) 316, "Syloti Nagri", "sylotî nâgrî", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Syrc", (short) 135, "Syriac", "syriaque", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Syre", (short) 138, "Syriac (Estrangelo variant)", "syriaque (variante estranghélo)"));
+ "Syre", (short) 138, "Syriac (Estrangelo variant)", "syriaque (variante estranghélo)",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Syrj", (short) 137, "Syriac (Western variant)", "syriaque (variante occidentale)"));
+ "Syrj", (short) 137, "Syriac (Western variant)", "syriaque (variante occidentale)",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Syrn", (short) 136, "Syriac (Eastern variant)", "syriaque (variante orientale)"));
- Script4a.register(new Script4a("Tagb", (short) 373, "Tagbanwa", "tagbanoua"));
- Script4a.register(new Script4a("Tale", (short) 353, "Tai Le", "taï-le"));
- Script4a.register(new Script4a("Talu", (short) 354, "New Tai Lue", "nouveau taï-lue"));
- Script4a.register(new Script4a("Taml", (short) 346, "Tamil", "tamoul"));
- Script4a.register(new Script4a("Telu", (short) 340, "Telugu", "télougou"));
- Script4a.register(new Script4a("Teng", (short) 290, "Tengwar", "tengwar"));
- Script4a.register(new Script4a("Tfng", (short) 120, "Tifinagh (Berber)", "tifinagh (berbère)"));
- Script4a.register(new Script4a("Tglg", (short) 370, "Tagalog", "tagal"));
- Script4a.register(new Script4a("Thaa", (short) 170, "Thaana", "thâna"));
- Script4a.register(new Script4a("Thai", (short) 352, "Thai", "thaï"));
- Script4a.register(new Script4a("Tibt", (short) 330, "Tibetan", "tibétain"));
- Script4a.register(new Script4a("Ugar", (short) 040, "Ugaritic", "ougaritique"));
- Script4a.register(new Script4a("Vaii", (short) 470, "Vai", "vaï"));
- Script4a.register(new Script4a("Visp", (short) 280, "Visible Speech", "parole visible"));
- Script4a.register(new Script4a("Xpeo", (short) 030, "Old Persian", "cunéiforme persépolitain"));
+ "Syrn", (short) 136, "Syriac (Eastern variant)", "syriaque (variante orientale)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Tagb", (short) 373, "Tagbanwa", "tagbanoua", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Tale", (short) 353, "Tai Le", "taï-le", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Talu", (short) 354, "New Tai Lue", "nouveau taï-lue",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Taml", (short) 346, "Tamil", "tamoul", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Telu", (short) 340, "Telugu", "télougou", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Teng", (short) 290, "Tengwar", "tengwar", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Tfng", (short) 120, "Tifinagh (Berber)", "tifinagh (berbère)",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Tglg", (short) 370, "Tagalog", "tagal", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Thaa", (short) 170, "Thaana", "thâna", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Thai", (short) 352, "Thai", "thaï", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Tibt", (short) 330, "Tibetan", "tibétain", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Ugar", (short) 040, "Ugaritic", "ougaritique", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Vaii", (short) 470, "Vai", "vaï", ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Visp", (short) 280, "Visible Speech", "parole visible",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Xpeo", (short) 030, "Old Persian", "cunéiforme persépolitain",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Xsux", (short) 020, "Cuneiform, Sumero-Akkadian", "cunéiforme suméro-akkadien"));
- Script4a.register(new Script4a("Yiii", (short) 460, "Yi", "yi"));
+ "Xsux", (short) 020, "Cuneiform, Sumero-Akkadian", "cunéiforme suméro-akkadien",
+ ICU4J_NOT_SUPPORTED, null));
+ Script4a.register(new Script4a("Yiii", (short) 460, "Yi", "yi", ICU4J_NOT_SUPPORTED, null));
Script4a.register(new Script4a(
- "Zxxx", (short) 997, "Code for unwritten languages", "codet pour les langues non écrites"));
+ "Zxxx", (short) 997, "Code for unwritten languages", "codet pour les langues non écrites",
+ ICU4J_NOT_SUPPORTED, null));
Script4a.register(Script4a.UNDETERMINED);
Script4a.register(new Script4a(
- "Zzzz", (short) 999, "Code for uncoded script", "codet pour écriture non codée"));
+ "Zzzz", (short) 999, "Code for uncoded script", "codet pour écriture non codée",
+ ICU4J_NOT_SUPPORTED, null));
LATIN = Script4a.findFromAlpha("Latn");
CYRILLIC = Script4a.findFromAlpha("Cyrl");
@@ -202,31 +250,41 @@
}
/* Checkstyle: Restart Magic Number checking. */
- /** The English name of this script. */
+ /** The ISO-15924 English name of this script. */
private String englishName;
- /** The French name of this script. */
+ /** The ISO-15924 French name of this script. */
private String frenchName;
- /** The alpha code for this script. */
+ /** The ISO-15924 alpha code for this script. */
private String alphaCode;
- /** The numeric code for this script. */
+ /** The ISO-15924 numeric code for this script. */
private short numericCode;
+ /** The ICU4J code for this script. */
+ private byte icu4jCode;
+
+ /** The Unicode script corresponding to this ISO-15924 script. */
+ private UnicodeScript unicodeScript;
+
/**
* Constructor.
- * @param alpha The alpha code for this script.
- * @param numeric The numeric code for this script.
- * @param englishName The English name of this script.
- * @param frenchName The French name of this script.
+ * @param alpha The ISO-15924 alpha code for this script.
+ * @param numeric The ISO-15924 numeric code for this script.
+ * @param englishName The ISO-15924 English name of this script.
+ * @param frenchName The ISO-15924 French name of this script.
+ * @param icu4jCode The ICU4J code for this script.
+ * @param unicodeScript The Unicode script corresponding to this ISO-15924 script.
*/
- private Script4a(final String alpha, final short numeric,
- final String englishName, final String frenchName) {
+ private Script4a(final String alpha, final short numeric, final String englishName, final String frenchName,
+ final byte icu4jCode, final UnicodeScript unicodeScript) {
this.alphaCode = alpha;
this.numericCode = numeric;
this.englishName = englishName;
this.frenchName = frenchName;
+ this.icu4jCode = icu4jCode;
+ this.unicodeScript = unicodeScript;
}
@Override
@@ -249,6 +307,16 @@
return this.frenchName;
}
+ @Override
+ public int getIcu4jCode() {
+ return this.icu4jCode;
+ }
+
+ @Override
+ public UnicodeScript getUnicodeScript() {
+ return this.unicodeScript;
+ }
+
/**
* Registers script information.
* @param script The script to be registered.
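A note on the numeric codes in the registrations above: a few are written with a leading zero, such as `(short) 040` for Ugaritic, `(short) 030` for Old Persian and `(short) 020` for Cuneiform. In Java a leading zero marks an octal literal, so these compile to 32, 24 and 16 rather than the ISO-15924 values 40, 30 and 20. The sketch below only demonstrates the language rule. The class and method names are hypothetical, and whether FOray ever looks scripts up by numeric code is not shown in this diff.

```java
// Hypothetical demo class; not part of FOray.
final class OctalLiteralSketch {

    /** The value as it appears in the diff: a leading zero makes this octal. */
    static short asWritten() {
        return (short) 040;
    }

    /** The same ISO-15924 code written as a plain decimal literal. */
    static short asDecimal() {
        return (short) 40;
    }

    /** Formats a numeric code with the three digits used in ISO-15924 listings. */
    static String threeDigit(final short code) {
        return String.format("%03d", code);
    }

    public static void main(final String[] args) {
        System.out.println("written as 040: " + asWritten());
        System.out.println("written as 40:  " + asDecimal());
        System.out.println("zero-padded for display: " + threeDigit(asDecimal()));
    }
}
```

Keeping the literal decimal and padding only at display time is one way to avoid the mismatch.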
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <vic...@us...> - 2021-11-10 17:22:56
Revision: 12022
http://sourceforge.net/p/foray/code/12022
Author: victormote
Date: 2021-11-10 17:22:53 +0000 (Wed, 10 Nov 2021)
Log Message:
-----------
Minor doc cleanup.
Modified Paths:
--------------
trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Orthography4a.java
Modified: trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Orthography4a.java
===================================================================
--- trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Orthography4a.java 2021-11-09 13:50:21 UTC (rev 12021)
+++ trunk/foray/foray-common/src/main/java/org/foray/common/i18n/Orthography4a.java 2021-11-10 17:22:53 UTC (rev 12022)
@@ -200,7 +200,7 @@
* Indicates whether the language in the orthography contains a valid 3-character alpha code.
* @param orthography The orthography to be tested.
* Note that this does not need to be an instance of this class, but can be any instance of {@link Orthography}.
- * @return True if and only if the orthography contains a valid 3-character alpha code.
+ * @return True if and only if the orthography contains a language with a valid 3-character alpha code.
*/
public static boolean is3CharacterLanguageCodeValid(final Orthography orthography) {
if (orthography == null
@@ -212,10 +212,10 @@
}
/**
- * Indicates whether the language in the orthography contains a valid 3-character alpha code.
+ * Indicates whether the country in the orthography contains a valid 3-character alpha code.
* @param orthography The orthography to be tested.
* Note that this does not need to be an instance of this class, but can be any instance of {@link Orthography}.
- * @return True if and only if the orthography contains a valid 3-character alpha code.
+ * @return True if and only if the orthography contains a country with a valid 3-character alpha code.
*/
public static boolean is3CharacterCountryCodeValid(final Orthography orthography) {
if (orthography == null
From: <vic...@us...> - 2021-11-09 13:50:24
Revision: 12021
http://sourceforge.net/p/foray/code/12021
Author: victormote
Date: 2021-11-09 13:50:21 +0000 (Tue, 09 Nov 2021)
Log Message:
-----------
Update main English orthography.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-09 12:56:37 UTC (rev 12020)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-09 13:50:21 UTC (rev 12021)
@@ -2279,8 +2279,8 @@
<w><t>ad-vi-so-ry</t></w>
<w><t>ad-vo-caat</t></w>
<w><t>ad-vo-ca-cy</t></w>
-<w><t>ad-vo-cate</t></w>
-<w><t>Ad-vo-cate De-pute</t></w>
+<w><t>ad-vo-cate</t><noun number="pluralizable" convertible-to-possessive="true"/><verb/></w>
+<phrase><t>Ad-vo-cate De-pute</t></phrase>
<w><t>ad-vo-cat-ed</t></w>
<w><t>ad-vo-cat-ing</t></w>
<w><t>ad-vo-ca-tion</t></w>
@@ -34402,17 +34402,17 @@
<w><t>cours-es</t></w>
<w><t>cours-ing</t></w>
<w><t>Court</t></w>
-<w><t>court</t></w>
-<w><t>court cir-cu-lar</t></w>
-<w><t>court mar-tial</t></w>
-<w><t>Court of Ap-peal</t></w>
-<w><t>Court of Ex-cheq-uer</t></w>
-<w><t>court of hon-or</t></w>
-<w><t>court of in-quir-y</t></w>
-<w><t>Court of Jus-ti-ci-ar-y</t></w>
-<w><t>Court of Ses-sion</t></w>
-<w><t>court plas-ter</t></w>
-<w><t>court ten-nis</t></w>
+<w><t>court</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
+<phrase><t>court cir-cu-lar</t></phrase>
+<phrase><t>court mar-tial</t></phrase>
+<phrase><t>Court of Ap-peal</t></phrase>
+<phrase><t>Court of Ex-cheq-uer</t></phrase>
+<phrase><t>court of hon-or</t></phrase>
+<phrase><t>court of in-quir-y</t></phrase>
+<phrase><t>Court of Jus-ti-ci-ar-y</t></phrase>
+<phrase><t>Court of Ses-sion</t></phrase>
+<phrase><t>court plas-ter</t></phrase>
+<phrase><t>court ten-nis</t></phrase>
<w><t>court=bar-on</t></w>
<w><t>court=bouil-lon</t></w>
<w><t>court=mar-tial</t></w>
@@ -47277,12 +47277,12 @@
<w><t>ef-flu-vi-al</t></w>
<w><t>ef-flu-vi-um</t></w>
<w><t>ef-flux</t></w>
-<w><t>ef-fort</t></w>
-<w><t>ef-fort-ful</t></w>
-<w><t>ef-fort-ful-ly</t></w>
-<w><t>ef-fort-less</t></w>
-<w><t>ef-fort-less-ly</t></w>
-<w><t>ef-fort-less-ness</t></w>
+<w><t>ef-fort</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>ef-fort-ful</t><adjective/></w>
+<w><t>ef-fort-ful-ly</t><adverb/></w>
+<w><t>ef-fort-less</t><adjective/></w>
+<w><t>ef-fort-less-ly</t><adverb/></w>
+<w><t>ef-fort-less-ness</t><noun number="singular"/></w>
<w><t>ef-frac-tion</t></w>
<w><t>ef-frac-tor</t></w>
<w><t>ef-fron-ter-ies</t></w>
@@ -62820,9 +62820,9 @@
<w><t>go-by</t></w>
<w><t>GOC</t></w>
<w><t>Go-clen-i-us</t></w>
-<w><t>God</t></w>
-<w><t>god</t></w>
-<w><t>God's a-cre</t></w>
+<w><t>God</t><noun number="singular" convertible-to-possessive="true"/></w>
+<w><t>god</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<phrase><t>God's a-cre</t></phrase>
<w><t>God=aw-ful</t></w>
<w><t>god=fear-ing</t></w>
<w><t>God=fear-ing</t></w>
@@ -91941,7 +91941,7 @@
<w><t>ma-qui</t></w>
<w><t>ma-quill-age</t></w>
<w><t>ma-quis</t></w>
-<w><t>mar</t></w>
+<w><t>mar</t><verb regular-root="true"/></w>
<w><t>Mar</t></w>
<w><t>Ma-r</t></w>
<w><t>Mar del Pla-ta</t></w>
@@ -92342,7 +92342,7 @@
<w><t>mar-riage-a-ble-ness</t></w>
<w><t>mar-ried</t></w>
<w><t>mar-ried-ly</t></w>
-<w><t>mar-ring</t></w>
+<!--<w><t>mar-ring</t></w>-->
<w><t>mar-ron</t></w>
<w><t>mar-rons gla-c</t></w>
<w><t>mar-row</t></w>
Modified: trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml 2021-11-09 12:56:37 UTC (rev 12020)
+++ trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml 2021-11-09 13:50:21 UTC (rev 12021)
@@ -23,7 +23,7 @@
<match>^([a-zA-Z\-]+)’s$</match>
<replace>$1</replace>
<derivative-rule>
- <noun regular-root="true"/>
+ <noun convertible-to-possessive="true"/>
<derivative-type type="possessive"/>
</derivative-rule>
</derivative-pattern>
@@ -31,7 +31,7 @@
<match>^([a-zA-Z\-]+)ies$</match>
<replace>$1y</replace>
<derivative-rule>
- <noun regular-root="true"/>
+ <noun number="pluralizable"/>
<derivative-type type="plural"/>
</derivative-rule>
<derivative-rule>
@@ -38,6 +38,10 @@
<verb regular-root="true"/>
<derivative-type type="verb-form" desc="3rd person singular, present tense"/>
</derivative-rule>
+ <derivative-rule>
+ <cardinal/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
</derivative-pattern>
<derivative-pattern desc="ends with /-ied/, stem ends with /y/">
<match>^([a-zA-Z\-]+)ied$</match>
@@ -52,7 +56,7 @@
<match>^([a-zA-Z\-]+)([sxz]|sh|ch)es$</match>
<replace>$1$2</replace>
<derivative-rule>
- <noun regular-root="true"/>
+ <noun number="pluralizable"/>
<derivative-type type="plural"/>
</derivative-rule>
<derivative-rule>
@@ -59,12 +63,16 @@
<verb regular-root="true"/>
<derivative-type type="verb-form" desc="3rd person singular present tense"/>
</derivative-rule>
+ <derivative-rule>
+ <cardinal/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
</derivative-pattern>
<derivative-pattern desc="ends with /-s/">
<match>^([a-zA-Z\-]+)s$</match>
<replace>$1</replace>
<derivative-rule>
- <noun regular-root="true"/>
+ <noun number="pluralizable"/>
<derivative-type type="plural"/>
</derivative-rule>
<derivative-rule>
@@ -72,10 +80,23 @@
<derivative-type type="plural"/>
</derivative-rule>
<derivative-rule>
+ <cardinal/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
+ <derivative-rule>
<verb regular-root="true"/>
<derivative-type type="verb-form" desc="3rd person singular present tense"/>
</derivative-rule>
</derivative-pattern>
+ <derivative-pattern desc="ends with double consonant, then /-ed/">
+ <match>^([a-zA-Z\-]+)([bcdfgklmnprstvz])(\2)ed$</match>
+ <replace>$1$2</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="past tense"/>
+ <derivative-type type="past-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
<derivative-pattern desc="ends with /-ed/">
<match>^([a-zA-Z\-]+)ed$</match>
<replace>$1</replace>
@@ -94,6 +115,14 @@
<derivative-type type="past-participle" desc="past tense"/>
</derivative-rule>
</derivative-pattern>
+ <derivative-pattern desc="ends with double consonant, then /-ing/">
+ <match>^([a-zA-Z\-]+)([bcdfgklmnprstvz])(\2)ing$</match>
+ <replace>$1$2</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="present-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
<derivative-pattern desc="ends with /-ing/, stem ends with silent /e/">
<match>^([a-zA-Z\-]+)ing$</match>
<replace>$1e</replace>
@@ -114,7 +143,7 @@
<match>^([a-zA-Z\-]+)er$</match>
<replace>$1</replace>
<derivative-rule>
- <adjective regular-root="true"/>
+ <adjective extensible="true"/>
<derivative-type type="comparative" desc="single-syllable root"/>
</derivative-rule>
</derivative-pattern>
@@ -122,7 +151,7 @@
<match>^([a-zA-Z\-]+)est$</match>
<replace>$1</replace>
<derivative-rule>
- <adjective regular-root="true"/>
+ <adjective extensible="true"/>
<derivative-type type="superlative" desc="single-syllable root"/>
</derivative-rule>
</derivative-pattern>
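The two double-consonant patterns added in this revision can be tried out with `java.util.regex`. The sketch below copies the `<match>` and `<replace>` values from the configuration verbatim. How FOray itself applies them is not shown in this diff, so the class name, the method names and the use of `Matcher.replaceFirst` are assumptions made for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of FOray.
final class DoubleConsonantSketch {

    /** The match expression of the "double consonant, then /-ed/" pattern. */
    private static final Pattern DOUBLED_ED =
            Pattern.compile("^([a-zA-Z\\-]+)([bcdfgklmnprstvz])(\\2)ed$");

    /** The match expression of the "double consonant, then /-ing/" pattern. */
    private static final Pattern DOUBLED_ING =
            Pattern.compile("^([a-zA-Z\\-]+)([bcdfgklmnprstvz])(\\2)ing$");

    /** The replace expression shared by both patterns: stem plus one consonant. */
    private static final String REPLACEMENT = "$1$2";

    /** Returns the candidate root if the word matches, otherwise empty. */
    private static Optional<String> root(final Pattern pattern, final String word) {
        final Matcher matcher = pattern.matcher(word);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(matcher.replaceFirst(REPLACEMENT));
    }

    static Optional<String> rootOfPast(final String word) {
        return root(DOUBLED_ED, word);
    }

    static Optional<String> rootOfParticiple(final String word) {
        return root(DOUBLED_ING, word);
    }

    public static void main(final String[] args) {
        System.out.println(rootOfPast("marred"));
        System.out.println(rootOfParticiple("marring"));
        System.out.println(rootOfPast("walked"));
    }
}
```

Here `marred` and `marring` both reduce to `mar`, which fits the dictionary change in this revision that flags `mar` as a regular-root verb and comments out `mar-ring`. A word such as `called` also matches and yields `cal`, so the candidate root presumably still has to be found in a dictionary as a verb with `regular-root="true"` before the derivative is accepted.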
From: <vic...@us...> - 2021-11-09 12:56:40
Revision: 12020
http://sourceforge.net/p/foray/code/12020
Author: victormote
Date: 2021-11-09 12:56:37 +0000 (Tue, 09 Nov 2021)
Log Message:
-----------
Remove more element content from spell checker.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-09 12:16:18 UTC (rev 12019)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-09 12:56:37 UTC (rev 12020)
@@ -33,6 +33,7 @@
import org.foray.common.i18n.Orthography4a;
import org.foray.common.i18n.Script4a;
import org.foray.common.primitive.ObjectUtils;
+import org.foray.common.primitive.StringUtils;
import org.foray.common.primitive.XmlUtils;
import org.foray.hyphen.HyphenationServer4a;
import org.foray.hyphen.HyphenationServerConfig;
@@ -149,7 +150,10 @@
* entities. */
// private EntityResolver entityResolver;
- /** A reusable buffer. */
+ /** Flag indicating whether text parsing is active. Some elements contain text that should not be accumulated. */
+ private boolean textParsingActive;
+
+ /** A reusable buffer used to accumulate the parsed text content. */
private StringBuilder charBuffer = new StringBuilder();
/** The element stack. */
@@ -165,7 +169,7 @@
private HyphenationServer4a server;
/** The list of elements having no content but that can be placed in the middle of a word. */
- private List<String> elementIgnoreList = Arrays.asList(new String[] {"Page"});
+ private List<String> elementIgnoreList = Arrays.asList(new String[] {"Page", "MendOut", "Comment", "ToDo"});
/** The list of dictionaries that are currently active, i.e. that match the current orthography. */
private List<Dictionary> currentDictionaries = new ArrayList<Dictionary>();
@@ -296,6 +300,7 @@
@Override
public void startDocument() throws SAXException {
+ this.textParsingActive = true;
}
@@ -313,11 +318,6 @@
@Override
public void startElement(final String uri, final String localName, final String qName, final Attributes attributes)
throws SAXException {
- final Element element = new Element();
- element.namespace = uri;
- element.localName = localName;
- element.qName = qName;
-
/* Some elements, having no content, can be placed in the middle of a word, making it look like that word is
* two words. Ignore such elements. */
if (this.elementIgnoreList.contains(localName)) {
@@ -324,6 +324,16 @@
return;
}
+ final Element element = new Element();
+ element.namespace = uri;
+ element.localName = localName;
+ element.qName = qName;
+
+ parseOrthography(attributes, element);
+ this.elementStack.push(element);
+ }
+
+ private void parseOrthography(final Attributes attributes, final Element element) {
String languageAttr = null;
languageAttr = attributes.getValue("xml:lang");
if (languageAttr == null) {
@@ -370,7 +380,6 @@
}
}
- this.elementStack.push(element);
}
@@ -395,7 +404,7 @@
checkWords(words);
/* Clear the character buffer. */
- this.charBuffer.delete(0, this.charBuffer.length());
+ StringUtils.clear(this.charBuffer);
/* This element should match the top of the element stack. Pop it. */
if (element.matches(uri, localName, qName)) {
@@ -466,7 +475,9 @@
@Override
public void characters(final char[] buffer, final int offset, final int length) {
- this.charBuffer.append(buffer, offset, length);
+ if (this.textParsingActive) {
+ this.charBuffer.append(buffer, offset, length);
+ }
}
/**
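The diff above introduces `textParsingActive` and guards `characters()` with it, but the code that switches the flag off is not part of the hunks shown. The sketch below is one possible way to drop the text inside ignorable elements using plain SAX. The depth counter, the class name and the helper method are assumptions and are not taken from `SpellChecker`.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of FOray.
final class IgnoredContentSketch extends DefaultHandler {

    /** Element names copied from the elementIgnoreList in the diff. */
    private static final List<String> IGNORED = Arrays.asList("Page", "MendOut", "Comment", "ToDo");

    /** Accumulates the text that should be spell-checked. */
    private final StringBuilder charBuffer = new StringBuilder();

    /** Assumption: count nesting depth inside ignored elements instead of using a boolean. */
    private int ignoredDepth;

    @Override
    public void startElement(final String uri, final String localName, final String qName,
            final Attributes attributes) {
        if (IGNORED.contains(localName)) {
            this.ignoredDepth++;
        }
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName) {
        if (IGNORED.contains(localName)) {
            this.ignoredDepth--;
        }
    }

    @Override
    public void characters(final char[] buffer, final int offset, final int length) {
        /* Text is accumulated only while outside every ignored element. */
        if (this.ignoredDepth == 0) {
            this.charBuffer.append(buffer, offset, length);
        }
    }

    /** Parses the XML and returns the accumulated text. */
    static String extractText(final String xml) {
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that localName is populated
            final IgnoredContentSketch handler = new IgnoredContentSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.charBuffer.toString();
        } catch (final Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(extractText("<doc>mis<Comment>check this</Comment>take</doc>"));
    }
}
```

With this approach a comment placed in the middle of a word neither splits the word nor leaks its own text into the buffer, which appears to be the intent of the log message.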
From: <vic...@us...> - 2021-11-09 12:16:20
Revision: 12019
http://sourceforge.net/p/foray/code/12019
Author: victormote
Date: 2021-11-09 12:16:18 +0000 (Tue, 09 Nov 2021)
Log Message:
-----------
Normal dictionary editing.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-09 03:05:48 UTC (rev 12018)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-09 12:16:18 UTC (rev 12019)
@@ -26569,9 +26569,9 @@
<w><t>chief-dom</t></w>
<w><t>chief-less</t></w>
<w><t>chief-ly</t></w>
-<w><t>chief-tain</t></w>
-<w><t>chief-tain-cy</t></w>
-<w><t>chief-tain-ship</t></w>
+<w><t>chief-tain</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>chief-tain-cy</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>chief-tain-ship</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>chield</t></w>
<w><t>Chieng-mai</t></w>
<w><t>Chieng-rai</t></w>
@@ -28801,7 +28801,7 @@
<w><t>climb-ing i-rons</t></w>
<w><t>climb-ing-fish</t></w>
<w><t>climb-ing-fish-es</t></w>
-<w><t>clime</t></w>
+<w><t>clime</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>cli-nah</t></w>
<w><t>clin-al</t></w>
<w><t>clin-al-ly</t></w>
@@ -40458,13 +40458,13 @@
<w><t>de-sen-si-tiz-ing</t></w>
<w><t>des-ert</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>de-sert</t><noun/><verb regular-root="true"/></w>
-<w><t>des-ert boots</t></w>
-<w><t>des-ert cool-er</t></w>
-<w><t>des-ert is-land</t></w>
-<w><t>des-ert lynx</t></w>
-<w><t>des-ert pea</t></w>
-<w><t>des-ert rat</t></w>
-<w><t>des-ert soil</t></w>
+<phrase><t>des-ert boots</t></phrase>
+<phrase><t>des-ert cool-er</t></phrase>
+<phrase><t>des-ert is-land</t></phrase>
+<phrase><t>des-ert lynx</t></phrase>
+<phrase><t>des-ert pea</t></phrase>
+<phrase><t>des-ert rat</t></phrase>
+<phrase><t>des-ert soil</t></phrase>
<w><t>de-sert-ed</t><adjective/></w>
<w><t>de-sert-ed-ly</t></w>
<w><t>de-sert-ed-ness</t></w>
@@ -40473,14 +40473,14 @@
<w><t>des-er-tic-o-lous</t></w>
<w><t>de-ser-tion</t></w>
<w><t>des-ert-like</t></w>
-<w><t>de-serve</t></w>
-<w><t>de-served</t></w>
-<w><t>de-serv-ed-ly</t></w>
+<w><t>de-serve</t><verb regular-root="true"/></w>
+<w><t>de-served</t><adjective/></w>
+<w><t>de-serv-ed-ly</t><adverb/></w>
<w><t>de-serv-ed-ness</t></w>
<w><t>de-serv-er</t></w>
-<w><t>de-serv-ing</t></w>
-<w><t>de-serv-ing-ly</t></w>
-<w><t>de-serv-ing-ness</t></w>
+<w><t>de-serv-ing</t><adjective/></w>
+<w><t>de-serv-ing-ly</t><adverb/></w>
+<w><t>de-serv-ing-ness</t><noun number="singular" convertible-to-possessive="false"/></w>
<w><t>de-sex</t></w>
<w><t>de-sex-u-al-ise</t></w>
<w><t>de-sex-u-al-ize</t></w>
@@ -47989,7 +47989,6 @@
<w><t>el-e-va-tor</t></w>
<w><t>e-lev-en</t><cardinal/></w>
<w><t>e-lev-en=plus</t></w>
-<w><t>e-lev-ens</t></w>
<w><t>e-lev-ens-es</t></w>
<w><t>e-lev-en-ses</t></w>
<w><t>e-lev-enth</t><ordinal/></w>
@@ -49389,7 +49388,7 @@
<w><t>en-joy-a-bly</t></w>
<w><t>en-joy-er</t></w>
<w><t>en-joy-ing-ly</t></w>
-<w><t>en-joy-ment</t></w>
+<w><t>en-joy-ment</t><noun number="pluralizable"/></w>
<w><t>en-keph-a-lin</t></w>
<w><t>En-ki</t></w>
<w><t>En-ki-du</t></w>
@@ -50805,7 +50804,7 @@
<w><t>er-ro-ne-ous</t></w>
<w><t>er-ro-ne-ous-ly</t></w>
<w><t>er-ro-ne-ous-ness</t></w>
-<w><t>er-ror</t></w>
+<w><t>er-ror</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>er-ror of clo-sure</t></w>
<w><t>er-ror-less</t></w>
<w><t>ers</t></w>
@@ -55342,25 +55341,7 @@
<w><t>fifth-ly</t></w>
<w><t>fif-ti-eth</t><ordinal/></w>
<w><t>fif-ty</t><cardinal/></w>
-<w><t>fif-ty=eight</t></w>
-<w><t>fif-ty=eighth</t></w>
-<w><t>fif-ty=fifth</t></w>
<w><t>fif-ty=fif-ty</t></w>
-<w><t>fif-ty=first</t></w>
-<w><t>fif-ty=five</t></w>
-<w><t>fif-ty=four</t></w>
-<w><t>fif-ty=fourth</t></w>
-<w><t>fif-ty=nine</t></w>
-<w><t>fif-ty=ninth</t></w>
-<w><t>fif-ty=one</t></w>
-<w><t>fif-ty=sec-ond</t></w>
-<w><t>fif-ty=sev-en</t></w>
-<w><t>fif-ty=sev-enth</t></w>
-<w><t>fif-ty=six</t></w>
-<w><t>fif-ty=sixth</t></w>
-<w><t>fif-ty=third</t></w>
-<w><t>fif-ty=three</t></w>
-<w><t>fif-ty=two</t></w>
<w><t>fif-ty-pen-ny</t></w>
<w><t>fifty-ty=fif-ty</t></w>
<w><t>fig</t></w>
@@ -56001,7 +55982,6 @@
<w><t>five-pen-ny</t></w>
<w><t>five-pins</t></w>
<w><t>fiv-er</t></w>
-<w><t>fives</t></w>
<w><t>fix</t></w>
<w><t>fix-a-ble</t></w>
<w><t>fix-ate</t></w>
@@ -57722,7 +57702,7 @@
<w><t>for-lorn-ly</t></w>
<w><t>for-lorn-ness</t></w>
<w><t>For-lì</t></w>
-<w><t>form</t></w>
+<w><t>form</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>Form</t></w>
<w><t>form ge-nus</t></w>
<w><t>form let-ter</t></w>
@@ -59244,13 +59224,9 @@
<w><t>Ful-bert</t></w>
<w><t>Ful-bright</t></w>
<w><t>ful-crum</t></w>
-<w><t>ful-fil</t></w>
-<w><t>ful-fill</t></w>
-<w><t>ful-filled</t></w>
+<w><t>ful-filled</t><adjective/></w>
<w><t>ful-fill-er</t></w>
-<w><t>ful-fil-ling</t></w>
-<w><t>ful-fill-ment</t></w>
-<w><t>ful-fil-ment</t></w>
+<w><t>ful-fill-ing</t><adjective/></w>
<w><t>Ful-gen-cio</t></w>
<w><t>ful-gent</t></w>
<w><t>ful-gent-ly</t></w>
@@ -67074,7 +67050,7 @@
<w><t>hard-pan</t></w>
<w><t>hards</t></w>
<w><t>hard-scrab-ble</t></w>
-<w><t>hard-ship</t></w>
+<w><t>hard-ship</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>hard-tack</t></w>
<w><t>hard-tail</t></w>
<w><t>hard-top</t></w>
@@ -78020,14 +77996,14 @@
<w><t>in-ter-ep-i-dem-ic</t></w>
<w><t>in-ter-ep-i-the-li-al</t></w>
<w><t>in-ter-e-qui-noc-tial</t></w>
-<w><t>in-ter-est</t></w>
-<w><t>in-ter-est-ed</t></w>
-<w><t>in-ter-est-ed-ly</t></w>
-<w><t>in-ter-est-ed-ness</t></w>
+<w><t>in-ter-est</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
+<w><t>in-ter-est-ed</t><adjective/></w>
+<w><t>in-ter-est-ed-ly</t><adverb/></w>
+<w><t>in-ter-est-ed-ness</t><noun number="singular"/></w>
<w><t>in-ter-es-ter-i-fi-ca-tion</t></w>
-<w><t>in-ter-est-ing</t></w>
-<w><t>in-ter-est-ing-ly</t></w>
-<w><t>in-ter-est-ing-ness</t></w>
+<w><t>in-ter-est-ing</t><adjective/></w>
+<w><t>in-ter-est-ing-ly</t><adverb/></w>
+<w><t>in-ter-est-ing-ness</t><noun number="singular"/></w>
<w><t>in-ter-es-tu-a-rine</t></w>
<w><t>in-ter-face</t></w>
<w><t>in-ter-faced</t></w>
@@ -87584,18 +87560,18 @@
<w><t>lieu-ten-ant gov-er-nor</t></w>
<w><t>lieve</t></w>
<w><t>Li-far</t></w>
-<w><t>life</t></w>
-<w><t>life as-sur-ance</t></w>
-<w><t>life cy-cle</t></w>
-<w><t>life es-tate</t></w>
-<w><t>life ex-pec-tan-cy</t></w>
-<w><t>life his-to-ry</t></w>
-<w><t>life in-stinct</t></w>
-<w><t>life in-ter-est</t></w>
-<w><t>life jack-et</t></w>
-<w><t>life pre-serv-er</t></w>
-<w><t>life sci-ence</t></w>
-<w><t>life ta-ble</t></w>
+<w><t>life</t><noun number="singular" convertible-to-possessive="true"/></w>
+<phrase><t>life as-sur-ance</t></phrase>
+<phrase><t>life cy-cle</t></phrase>
+<phrase><t>life es-tate</t></phrase>
+<phrase><t>life ex-pec-tan-cy</t></phrase>
+<phrase><t>life his-to-ry</t></phrase>
+<phrase><t>life in-stinct</t></phrase>
+<phrase><t>life in-ter-est</t></phrase>
+<phrase><t>life jack-et</t></phrase>
+<phrase><t>life pre-serv-er</t></phrase>
+<phrase><t>life sci-ence</t></phrase>
+<phrase><t>life ta-ble</t></phrase>
<w><t>life=giv-ing</t></w>
<w><t>life=sav-er</t></w>
<w><t>life=sup-port sys-tem</t></w>
@@ -88516,9 +88492,9 @@
<w><t>liv-a-bil-i-ty</t></w>
<w><t>liv-a-ble</t></w>
<w><t>liv-a-ble-ness</t></w>
-<w><t>live</t></w>
-<w><t>live cen-tre</t></w>
-<w><t>live to-geth-er</t></w>
+<w><t>live</t><verb regular-root="true"/><adjective extensible="false"/><adverb/></w>
+<phrase><t>live cen-tre</t></phrase>
+<phrase><t>live to-geth-er</t></phrase>
<w><t>live=bear-er</t></w>
<w><t>live=for-ev-er</t></w>
<w><t>live-a-bil-i-ty</t></w>
@@ -88538,11 +88514,11 @@
<w><t>live-ness</t></w>
<w><t>Li-ven-za</t></w>
<w><t>liv-er</t></w>
-<w><t>liv-er ex-tract</t></w>
-<w><t>liv-er fluke</t></w>
-<w><t>liv-er of sul-phur</t></w>
-<w><t>liv-er salts</t></w>
-<w><t>liv-er sau-sage</t></w>
+<phrase><t>liv-er ex-tract</t></phrase>
+<phrase><t>liv-er fluke</t></phrase>
+<phrase><t>liv-er of sul-phur</t></phrase>
+<phrase><t>liv-er salts</t></phrase>
+<phrase><t>liv-er sau-sage</t></phrase>
<w><t>liv-er=rot</t></w>
<w><t>liv-er-ber-ry</t></w>
<w><t>liv-er-ied</t></w>
@@ -88560,7 +88536,7 @@
<w><t>liv-er-y com-pa-ny</t></w>
<w><t>liv-er-y sta-ble</t></w>
<w><t>liv-er-y-man</t></w>
-<w><t>lives</t></w>
+<w><t>lives</t><noun number="plural"/></w>
<w><t>liv-est</t></w>
<w><t>live-stock</t></w>
<w><t>live-ware</t></w>
@@ -91180,10 +91156,10 @@
<w><t>Ma-kar-i-os III</t></w>
<w><t>Ma-kas-ar</t></w>
<w><t>Ma-kas-sar</t></w>
-<w><t>make</t></w>
-<w><t>make a-way</t></w>
-<w><t>make be-lieve</t></w>
-<w><t>make o-ver</t></w>
+<w><t>make</t><noun number="pluralizable" convertible-to-possessive="true"/><verb/></w>
+<phrase><t>make a-way</t></phrase>
+<phrase><t>make be-lieve</t></phrase>
+<phrase><t>make o-ver</t></phrase>
<w><t>make=be-lieve</t></w>
<w><t>make=read-y</t></w>
<w><t>make=up</t></w>
@@ -91193,6 +91169,7 @@
<w><t>make-less</t></w>
<w><t>Mak-er</t></w>
<w><t>mak-er</t></w>
+<w><t>makes</t><verb/></w>
<w><t>make-shift</t></w>
<w><t>make-shift-ness</t></w>
<w><t>make-shift-y</t></w>
@@ -93872,14 +93849,14 @@
<w><t>Me-los</t></w>
<w><t>Mel-pom-e-ne</t></w>
<w><t>Mel-rose</t></w>
-<w><t>melt</t></w>
+<w><t>melt</t><noun number="pluralizable"/><verb regular-root="true"/></w>
<w><t>melt-a-bil-i-ty</t></w>
-<w><t>melt-a-ble</t></w>
+<w><t>melt-a-ble</t><adjective/></w>
<w><t>melt-age</t></w>
-<w><t>melt-er</t></w>
-<w><t>melt-ing point</t></w>
-<w><t>melt-ing pot</t></w>
-<w><t>melt-ing-ly</t></w>
+<w><t>melt-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<phrase><t>melt-ing point</t></phrase>
+<phrase><t>melt-ing pot</t></phrase>
+<w><t>melt-ing-ly</t><adverb/></w>
<w><t>melt-ing-ness</t></w>
<w><t>mel-ton</t></w>
<w><t>melt-wa-ter</t></w>
@@ -96931,13 +96908,9 @@
<w><t>mod-at-ed</t></w>
<w><t>mod-at-ing</t></w>
<w><t>mode</t></w>
-<w><t>mod-el</t></w>
-<w><t>mod-eled</t></w>
-<w><t>mod-el-er</t></w>
-<w><t>mod-el-ing</t></w>
-<w><t>mod-elled</t></w>
-<w><t>mod-el-ler</t></w>
-<w><t>mod-el-ling</t></w>
+<w><t>mod-el</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>mod-el-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<!--<w><t>mod-el-ler</t></w>-->
<w><t>mo-dem</t></w>
<w><t>Mo-de-na</t></w>
<w><t>mod-er-ate</t></w>
@@ -133906,34 +133879,34 @@
<w><t>re-cy-cle</t></w>
<w><t>Red</t></w>
<w><t>red</t></w>
-<w><t>red ad-mi-ral</t></w>
-<w><t>red al-gae</t></w>
-<w><t>red bid-dy</t></w>
-<w><t>Red Bri-gades</t></w>
-<w><t>red car-pet</t></w>
-<w><t>red ce-dar</t></w>
-<w><t>Red Chi-na</t></w>
-<w><t>red clo-ver</t></w>
-<w><t>red cor-al</t></w>
-<w><t>red cor-pus-cle</t></w>
-<w><t>Red Cres-cent</t></w>
-<w><t>red dust-er</t></w>
-<w><t>Red En-sign</t></w>
-<w><t>red gi-ant</t></w>
-<w><t>red her-ring</t></w>
-<w><t>Red In-di-an</t></w>
-<w><t>red mul-let</t></w>
-<w><t>red o-chre</t></w>
-<w><t>red o-sier</t></w>
-<w><t>red pack-et</t></w>
-<w><t>red pep-per</t></w>
-<w><t>Red Plan-et</t></w>
-<w><t>Red Riv-er</t></w>
-<w><t>red salm-on</t></w>
-<w><t>red set-ter</t></w>
-<w><t>red snap-per</t></w>
-<w><t>red spi-der</t></w>
-<w><t>red squir-rel</t></w>
+<phrase><t>red ad-mi-ral</t></phrase>
+<phrase><t>red al-gae</t></phrase>
+<phrase><t>red bid-dy</t></phrase>
+<phrase><t>Red Bri-gades</t></phrase>
+<phrase><t>red car-pet</t></phrase>
+<phrase><t>red ce-dar</t></phrase>
+<phrase><t>Red Chi-na</t></phrase>
+<phrase><t>red clo-ver</t></phrase>
+<phrase><t>red cor-al</t></phrase>
+<phrase><t>red cor-pus-cle</t></phrase>
+<phrase><t>Red Cres-cent</t></phrase>
+<phrase><t>red dust-er</t></phrase>
+<phrase><t>Red En-sign</t></phrase>
+<phrase><t>red gi-ant</t></phrase>
+<phrase><t>red her-ring</t></phrase>
+<phrase><t>Red In-di-an</t></phrase>
+<phrase><t>red mul-let</t></phrase>
+<phrase><t>red o-chre</t></phrase>
+<phrase><t>red o-sier</t></phrase>
+<phrase><t>red pack-et</t></phrase>
+<phrase><t>red pep-per</t></phrase>
+<phrase><t>Red Plan-et</t></phrase>
+<phrase><t>Red Riv-er</t></phrase>
+<phrase><t>red salm-on</t></phrase>
+<phrase><t>red set-ter</t></phrase>
+<phrase><t>red snap-per</t></phrase>
+<phrase><t>red spi-der</t></phrase>
+<phrase><t>red squir-rel</t></phrase>
<w><t>red=al-der</t></w>
<w><t>red=blood-ed</t></w>
<w><t>red=blood-ed-ness</t></w>
@@ -134269,8 +134242,8 @@
<w><t>Red-stone</t></w>
<w><t>red-tap-ism</t></w>
<w><t>red-top</t></w>
-<w><t>re-duce</t></w>
-<w><t>re-duced</t></w>
+<w><t>re-duce</t><verb regular-root="true"/></w>
+<w><t>re-duced</t><adjective/></w>
<w><t>re-duced lev-el</t></w>
<w><t>re-ducer</t></w>
<w><t>re-duc-er</t></w>
@@ -141652,12 +141625,12 @@
<w><t>Sa-va</t></w>
<w><t>sav-a-ble</t></w>
<w><t>sav-a-ble-ness</t></w>
-<w><t>sav-age</t></w>
+<w><t>sav-age</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/><adjective extensible="false"/></w>
<w><t>Sav-age</t></w>
-<w><t>Sav-age Is-land</t></w>
-<w><t>sav-age-ly</t></w>
-<w><t>sav-age-ness</t></w>
-<w><t>sav-age-ry</t></w>
+<phrase><t>Sav-age Is-land</t></phrase>
+<w><t>sav-age-ly</t><adverb/></w>
+<w><t>sav-age-ness</t><noun number="singular"/></w>
+<w><t>sav-age-ry</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>sav-ag-ism</t></w>
<w><t>Sa-vai-i</t></w>
<w><t>Sa-van-na</t></w>
@@ -145866,7 +145839,6 @@
<w><t>sev-en-bark</t></w>
<w><t>Sev-en-er</t></w>
<w><t>sev-en-fold</t></w>
-<w><t>sev-ens</t></w>
<w><t>sev-en-teen</t><cardinal/></w>
<w><t>sev-en-teen=year lo-cust</t></w>
<w><t>sev-en-teenth</t><ordinal/></w>
@@ -146962,11 +146934,11 @@
<w><t>shot-ting</t></w>
<w><t>Shot-well</t></w>
<w><t>should</t></w>
-<w><t>shoul-der</t></w>
-<w><t>shoul-der blade</t></w>
-<w><t>shoul-der pad</t></w>
-<w><t>shoul-der patch</t></w>
-<w><t>shoul-der strap</t></w>
+<w><t>shoul-der</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
+<phrase><t>shoulder blade</t></phrase>
+<phrase><t>shoulder pad</t></phrase>
+<phrase><t>shoulder patch</t></phrase>
+<phrase><t>shoulder strap</t></phrase>
<w><t>should-est</t></w>
<w><t>should-n't</t></w>
<w><t>should-na</t></w>
@@ -148070,7 +148042,7 @@
<w><t>sis-sonne</t></w>
<w><t>sis-sy</t></w>
<w><t>sis-sy-ish</t></w>
-<w><t>sis-ter</t></w>
+<w><t>sis-ter</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>sis-ter=in=law</t></w>
<w><t>sis-ter-hood</t></w>
<w><t>sis-ter-less</t></w>
@@ -158337,20 +158309,20 @@
<w><t>sweep-stakes</t></w>
<w><t>sweep-y</t></w>
<w><t>sweer</t></w>
-<w><t>sweet</t></w>
+<w><t>sweet</t><adjective extensible="true"/></w>
<w><t>Sweet</t></w>
-<w><t>sweet a-lys-sum</t></w>
-<w><t>sweet bas-il</t></w>
-<w><t>sweet cher-ry</t></w>
-<w><t>sweet chest-nut</t></w>
-<w><t>sweet cic-e-ly</t></w>
-<w><t>sweet ci-der</t></w>
-<w><t>sweet clo-ver</t></w>
-<w><t>sweet mar-jo-ram</t></w>
-<w><t>sweet pep-per</t></w>
-<w><t>sweet po-ta-to</t></w>
-<w><t>sweet wil-liam</t></w>
-<w><t>sweet wood-ruff</t></w>
+<phrase><t>sweet a-lys-sum</t></phrase>
+<phrase><t>sweet bas-il</t></phrase>
+<phrase><t>sweet cher-ry</t></phrase>
+<phrase><t>sweet chest-nut</t></phrase>
+<phrase><t>sweet cic-e-ly</t></phrase>
+<phrase><t>sweet ci-der</t></phrase>
+<phrase><t>sweet clo-ver</t></phrase>
+<phrase><t>sweet mar-jo-ram</t></phrase>
+<phrase><t>sweet pep-per</t></phrase>
+<phrase><t>sweet po-ta-to</t></phrase>
+<phrase><t>sweet wil-liam</t></phrase>
+<phrase><t>sweet wood-ruff</t></phrase>
<w><t>sweet=scent-ed</t></w>
<w><t>sweet=tem-pered</t></w>
<w><t>sweet=tem-pered-ness</t></w>
@@ -158358,8 +158330,8 @@
<w><t>sweet-bread</t></w>
<w><t>sweet-bri-ar</t></w>
<w><t>sweet-bri-er</t></w>
-<w><t>sweet-en</t></w>
-<w><t>sweet-en-er</t></w>
+<w><t>sweet-en</t><verb regular-root="true"/></w>
+<w><t>sweet-en-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>sweet-en-ing</t></w>
<w><t>sweet-heart</t></w>
<w><t>sweet-heart a-gree-ment</t></w>
@@ -161933,7 +161905,7 @@
<w><t>Thim-phu</t></w>
<w><t>thin</t></w>
<w><t>thine</t></w>
-<w><t>thing</t></w>
+<w><t>thing</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>thing=in=it-self</t></w>
<w><t>thing-a-ma-bob</t></w>
<w><t>thing-a-ma-jig</t></w>
@@ -164997,7 +164969,7 @@
<w><t>trib-al-ist</t></w>
<w><t>trib-al-ly</t></w>
<w><t>tri-ba-sic</t></w>
-<w><t>tribe</t></w>
+<w><t>tribe</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>tribe-less</t></w>
<w><t>tribe-let</t></w>
<w><t>tribes-man</t></w>
@@ -166599,8 +166571,8 @@
<w><t>twit-ter-y</t></w>
<w><t>twixt</t></w>
<w><t>two</t><cardinal/></w>
-<w><t>Two Sic-i-lies</t></w>
-<w><t>Two=and=a=half In-ter-na-tion-al</t></w>
+<phrase><t>Two Sic-i-lies</t></phrase>
+<phrase><t>Two=and=a=half In-ter-na-tion-al</t></phrase>
<w><t>two=bit</t></w>
<w><t>two=col-or</t></w>
<w><t>two=cy-cle</t></w>
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml 2021-11-09 03:05:48 UTC (rev 12018)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml 2021-11-09 12:16:18 UTC (rev 12019)
@@ -16,7 +16,12 @@
-->
<w><t>co=la-bour-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>ful-fil</t><verb/></w>
+<w><t>ful-fils</t><verb number="singular"/></w>
+<w><t>ful-fil-ment</t><noun/></w>
<w><t>la-bour</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>la-boured</t><adjective/></w>
<w><t>la-bour-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>mod-elled</t><verb/></w>
+<w><t>mod-ell-ing</t><noun number="singular" convertible-to-possessive="true"/><verb/></w>
</axsl-dictionary>
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml 2021-11-09 03:05:48 UTC (rev 12018)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml 2021-11-09 12:16:18 UTC (rev 12019)
@@ -16,7 +16,11 @@
-->
<w><t>co=la-bor-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>ful-fill</t><verb regular-root="true"/></w>
+<w><t>ful-fill-ment</t><noun/></w>
<w><t>la-bor</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>la-bored</t><adjective/></w>
<w><t>la-bor-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>mod-eled</t><verb/></w>
+<w><t>mod-el-ing</t><noun number="singular" convertible-to-possessive="true"/><verb/></w>
</axsl-dictionary>
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
|
From: <vic...@us...> - 2021-11-09 03:05:51
|
Revision: 12018
http://sourceforge.net/p/foray/code/12018
Author: victormote
Date: 2021-11-09 03:05:48 +0000 (Tue, 09 Nov 2021)
Log Message:
-----------
Fix bug when qualifier is null.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java 2021-11-08 23:08:39 UTC (rev 12017)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java 2021-11-09 03:05:48 UTC (rev 12018)
@@ -207,7 +207,9 @@
* Indicates whether a given set of flags matches both a given part of speech and qualifier.
* @param flags The encoded flags to be tested.
* @param pos The part of speech being tested for.
+ * This cannot be null.
* @param qualifier The qualifier being tested for.
+ * This can be null, which implies that only the part-of-speech should be tested.
* @return True if and only if the flags match both the part of speech and the qualifier.
*/
public static boolean isOfQualifiedType(final char flags, final PartOfSpeech pos, final PosQualifier qualifier) {
@@ -215,6 +217,9 @@
if (isPosMatch == false) {
return false;
}
+ if (qualifier == null) {
+ return true;
+ }
final int index = computeQualifierIndex(pos, qualifier);
if (index < 0) {
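
Note on the change above: the added guard makes a null qualifier mean "match on the part of speech alone". The following is only a minimal sketch of that behaviour. The nested `Pos` and `Qualifier` enums and the bit layout are hypothetical stand-ins invented for illustration; they are not the aXSL `PartOfSpeech` and `PosQualifier` types, and the real flag encoding and `computeQualifierIndex` logic in `PosUtils` are not shown in this message.

```java
/** Illustrative only: the types and bit layout below are assumptions, not the FOray code. */
final class NullQualifierSketch {

    /** Hypothetical stand-in for a part of speech. */
    enum Pos { NOUN, VERB }

    /** Hypothetical stand-in for a part-of-speech qualifier. */
    enum Qualifier { REGULAR_ROOT }

    // Assumed encoding: bit 0 = noun, bit 1 = verb, bit 2 = verb has a regular root.
    static final char NOUN_BIT = 0x1;
    static final char VERB_BIT = 0x2;
    static final char VERB_REGULAR_ROOT_BIT = 0x4;

    /** Tests the part of speech only. */
    static boolean isOfType(final char flags, final Pos pos) {
        final char mask = (pos == Pos.NOUN) ? NOUN_BIT : VERB_BIT;
        return (flags & mask) != 0;
    }

    /** Tests the part of speech and, when one is given, the qualifier. */
    static boolean isOfQualifiedType(final char flags, final Pos pos, final Qualifier qualifier) {
        if (!isOfType(flags, pos)) {
            return false;
        }
        if (qualifier == null) {
            // No qualifier requested: the part-of-speech match is sufficient.
            return true;
        }
        // The only qualifier modelled in this sketch is a regular root on a verb.
        if (pos == Pos.VERB && qualifier == Qualifier.REGULAR_ROOT) {
            return (flags & VERB_REGULAR_ROOT_BIT) != 0;
        }
        return false;
    }
}
```

In this sketch, a caller that only cares about the part of speech can pass null for the qualifier and gets the same answer as `isOfType`.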
|
|
From: <vic...@us...> - 2021-11-08 23:08:40
|
Revision: 12017
http://sourceforge.net/p/foray/code/12017
Author: victormote
Date: 2021-11-08 23:08:39 +0000 (Mon, 08 Nov 2021)
Log Message:
-----------
Conform to aXSL change, adding "phrase" element to the axsl-dictionary.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-08 22:43:35 UTC (rev 12016)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-08 23:08:39 UTC (rev 12017)
@@ -456,7 +456,7 @@
<w><t>a-bou-dik-ro</t></w>
<w><t>a-bought</t></w>
<w><t>A-bou-kir</t></w>
-<w><t>A-bou-kir Bay</t></w>
+<phrase><t>Aboukir Bay</t></phrase>
<w><t>a-bou-li-a</t></w>
<w><t>a-bou-lic</t></w>
<w><t>a-bound</t></w>
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-08 22:43:35 UTC (rev 12016)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-08 23:08:39 UTC (rev 12017)
@@ -384,6 +384,7 @@
break;
}
case "axsl-dictionaries": break;
+ case "phrase": break;
default: {
throw new IllegalStateException("Unknown element started: " + localName);
}
@@ -481,6 +482,7 @@
break;
}
case "axsl-dictionaries": break;
+ case "phrase": break;
default: {
throw new IllegalStateException("Unknown element ended: " + localName);
}
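
Note on the parser change above: both switch statements now accept "phrase" and do nothing with it, so a dictionary containing `<phrase>` entries no longer triggers the "Unknown element" exception. The following is a rough sketch of that pattern with the JDK SAX parser. The class `PhraseElementSketch` and its `seen` list are hypothetical and exist only for illustration; the real `DictionaryParserXml` handles many more elements and is not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: a tiny handler that tolerates "phrase" elements. */
final class PhraseElementSketch extends DefaultHandler {

    /** Names of the entry elements ("w" or "phrase") in document order. */
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(final String uri, final String localName, final String qName,
            final Attributes attributes) {
        switch (localName) {
            case "axsl-dictionary": break;
            case "t": break;
            case "w": seen.add(localName); break;
            // Accepted so that parsing continues; this sketch records it but does nothing else.
            case "phrase": seen.add(localName); break;
            default: {
                throw new IllegalStateException("Unknown element started: " + localName);
            }
        }
    }

    /** Parses a dictionary fragment and returns the entry element names found. */
    static List<String> parse(final String xml) {
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            // Needed so that localName is populated.
            factory.setNamespaceAware(true);
            final PhraseElementSketch handler = new PhraseElementSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `PhraseElementSketch.parse` on a fragment such as `<axsl-dictionary><w><t>A-bou-kir</t></w><phrase><t>Aboukir Bay</t></phrase></axsl-dictionary>` should complete without an exception, whereas an element name outside the switch is rejected.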
|
|
From: <vic...@us...> - 2021-11-08 22:43:38
|
Revision: 12016
http://sourceforge.net/p/foray/code/12016
Author: victormote
Date: 2021-11-08 22:43:35 +0000 (Mon, 08 Nov 2021)
Log Message:
-----------
Conform to aXSL changes providing more options and flexibility for part-of-speech qualifiers.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativeRule.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/StringWord.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/WordWrapper.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/ConfigParser.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/PosUtilsTests.java
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-08 22:43:35 UTC (rev 12016)
@@ -4587,7 +4587,7 @@
<w><t>Al-tair</t></w>
<w><t>Al-ta-ir</t></w>
<w><t>Al-ta-mi-ra</t></w>
-<w><t>al-tar</t><noun regular-root="true"/></w>
+<w><t>al-tar</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>al-tar boy</t></w>
<w><t>al-tar-age</t></w>
<w><t>al-tar-piece</t></w>
@@ -6120,7 +6120,7 @@
<w><t>An-garsk</t></w>
<w><t>an-ga-ry</t></w>
<w><t>an-ge-kok</t></w>
-<w><t>an-gel</t><noun regular-root="true"/></w>
+<w><t>an-gel</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>An-gel</t></w>
<w><t>an-gel cake</t></w>
<w><t>An-gel Falls</t></w>
@@ -11845,7 +11845,7 @@
<w><t>a-wak-en-ing</t></w>
<w><t>a-wak-en-ing-ly</t></w>
<w><t>a-wak-ing</t></w>
-<w><t>a-ward</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>a-ward</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>a-ward wage</t></w>
<w><t>a-ward-a-ble</t></w>
<w><t>a-ward-er</t></w>
@@ -14247,7 +14247,7 @@
<w><t>beak-like</t></w>
<w><t>beak-y</t></w>
<w><t>Beal</t></w>
-<w><t>beam</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>beam</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>beam aer-i-al</t></w>
<w><t>beam com-pass</t></w>
<w><t>beam rid-ing</t></w>
@@ -19633,7 +19633,7 @@
<w><t>Brig-ham</t></w>
<w><t>Brig-house</t></w>
<w><t>Bright</t></w>
-<w><t>bright</t><adjective regular-root="true"/></w>
+<w><t>bright</t><adjective extensible="true"/></w>
<w><t>Bright's dis-ease</t></w>
<w><t>bright-en</t></w>
<w><t>bright-en-er</t></w>
@@ -20902,7 +20902,7 @@
<w><t>Bur-mese</t></w>
<w><t>Bur-mese cat</t></w>
<w><t>bur-mite</t></w>
-<w><t>burn</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>burn</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>burn=up</t></w>
<w><t>burn-a-ble</t></w>
<w><t>burned</t></w>
@@ -27280,7 +27280,7 @@
<w><t>chris-mon</t></w>
<w><t>chris-om</t></w>
<w><t>Chris-sie</t></w>
-<w><t>Christ</t><noun regular-root="true"/></w>
+<w><t>Christ</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>Chris-ta-bel</t></w>
<w><t>Chris-ta-del-phi-an</t></w>
<w><t>Christ-church</t></w>
@@ -30399,7 +30399,7 @@
<w><t>com-bi-na-to-ri-al</t></w>
<w><t>com-bi-na-to-ri-al a-nal-y-sis</t></w>
<w><t>com-bi-na-to-ry</t></w>
-<w><t>com-bine</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>com-bine</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>com-bine har-ves-ter</t></w>
<w><t>com-bined-ly</t></w>
<w><t>com-bined-ness</t></w>
@@ -38744,7 +38744,7 @@
<w><t>de-duc-tive-ly</t></w>
<w><t>Dee</t></w>
<w><t>dee</t></w>
-<w><t>deed</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>deed</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>deed-less</t></w>
<w><t>dee-jay</t></w>
<w><t>deek</t></w>
@@ -40456,7 +40456,7 @@
<w><t>de-sen-si-tized</t></w>
<w><t>de-sen-si-tiz-er</t></w>
<w><t>de-sen-si-tiz-ing</t></w>
-<w><t>des-ert</t><noun regular-root="true"/></w>
+<w><t>des-ert</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>de-sert</t><noun/><verb regular-root="true"/></w>
<w><t>des-ert boots</t></w>
<w><t>des-ert cool-er</t></w>
@@ -48706,7 +48706,7 @@
<w><t>emp-ti-ly</t></w>
<w><t>emp-ti-ness</t></w>
<w><t>emp-tor</t></w>
-<w><t>emp-ty</t><noun regular-root="true"/><verb regular-root="true"/><adjective/></w>
+<w><t>emp-ty</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/><adjective/></w>
<w><t>Emp-ty Quar-ter</t></w>
<w><t>emp-ty=hand-ed</t></w>
<w><t>emp-ty=head-ed</t></w>
@@ -61048,7 +61048,7 @@
<w><t>gen-er-ate</t></w>
<w><t>gen-er-at-ed</t></w>
<w><t>gen-er-at-ing</t></w>
-<w><t>gen-er-a-tion</t><noun regular-root="true"/></w>
+<w><t>gen-er-a-tion</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>gen-er-a-tion gap</t></w>
<w><t>gen-er-a-tive</t></w>
<w><t>gen-er-a-tive gram-mar</t></w>
@@ -61762,9 +61762,9 @@
<w><t>Gie-rek</t></w>
<w><t>Gie-se-king</t></w>
<w><t>Gies-sen</t></w>
-<w><t>gift</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>gift</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>gift=wrap-per</t></w>
-<w><t>gift-ed</t><adjective regular-root="false"/></w>
+<w><t>gift-ed</t><adjective extensible="false"/></w>
<w><t>gift-ed-ly</t></w>
<w><t>gift-ed-ness</t></w>
<w><t>gift-less</t></w>
@@ -62034,7 +62034,7 @@
<w><t>Giu-lio Ro-ma-no</t></w>
<w><t>Giu-sep-pe</t></w>
<w><t>giu-sto</t></w>
-<w><t>give</t><noun regular-root="false"/><verb regular-root="false"/></w>
+<w><t>give</t><noun number="singular" convertible-to-possessive="false"/><verb regular-root="false"/></w>
<w><t>give a-way</t></w>
<w><t>give on-to</t></w>
<w><t>give o-ver</t></w>
@@ -62519,7 +62519,7 @@
<w><t>Glov-er</t></w>
<w><t>Glov-ers-ville</t></w>
<w><t>glov-ing</t></w>
-<w><t>glow</t><noun regular-root="false"/><verb regular-root="true"/></w>
+<w><t>glow</t><noun number="singular"/><verb regular-root="true"/></w>
<w><t>glow dis-charge</t></w>
<w><t>glow-er</t></w>
<w><t>glow-er-ing-ly</t></w>
@@ -63541,7 +63541,7 @@
<w><t>grab-bler</t></w>
<w><t>gra-ben</t></w>
<w><t>Grac-chus</t></w>
-<w><t>grace</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>grace</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>Grace</t></w>
<w><t>grace=and=fa-vor</t></w>
<w><t>grace=and=fa-vour</t></w>
@@ -64012,7 +64012,7 @@
<w><t>grav</t></w>
<w><t>gra-va-men</t></w>
<w><t>gra-va-vam-i-na</t></w>
-<w><t>grave</t><noun regular-root="true"/><verb regular-root="false"/><adjective regular-root="true"/></w>
+<w><t>grave</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="false"/><adjective extensible="true"/></w>
<!-- Don't use. Too difficult to disambiguate. <w><t>gra-ve</t></w> -->
<w><t>grave-clothes</t></w>
<w><t>grave-dig-ger</t></w>
@@ -69570,7 +69570,7 @@
<w><t>hig-gle</t></w>
<w><t>hig-gle-dy=pig-gle-dy</t></w>
<w><t>hig-gler</t></w>
-<w><t>high</t><noun regular-root="true"/><adjective regular-root="true"/><adverb/></w>
+<w><t>high</t><noun number="pluralizable" convertible-to-possessive="true"/><adjective extensible="true"/><adverb/></w>
<w><t>high al-tar</t></w>
<w><t>high com-e-dy</t></w>
<w><t>high com-mand</t></w>
@@ -89067,7 +89067,7 @@
<w><t>lone-some-ness</t></w>
<w><t>Lo-ney</t></w>
<w><t>Long</t></w>
-<w><t>long</t><adjective regular-root="true"/></w>
+<w><t>long</t><adjective extensible="true"/></w>
<w><t>Long Ea-ton</t></w>
<w><t>long hun-dred-weight</t></w>
<w><t>Long Is-land</t></w>
@@ -89514,7 +89514,7 @@
<w><t>lov-age</t></w>
<w><t>lov-at</t></w>
<w><t>Love</t></w>
-<w><t>love</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>love</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>love af-fair</t></w>
<w><t>love ap-ple</t></w>
<w><t>love let-ter</t></w>
@@ -89562,7 +89562,7 @@
<w><t>lov-ing-ly</t></w>
<w><t>lov-ing-ness</t></w>
<w><t>Lov-ing-ton</t></w>
-<w><t>low</t><noun regular-root="true"/><verb regular-root="true"/><adjective regular-root="true"/><adverb/></w>
+<w><t>low</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/><adjective extensible="true"/><adverb/></w>
<w><t>Low</t></w>
<w><t>Low Ar-chi-pel-a-go</t></w>
<w><t>low com-e-dy</t></w>
@@ -92700,7 +92700,7 @@
<w><t>mas-ta-bah</t></w>
<w><t>mas-tax</t></w>
<w><t>mas-tec-to-my</t></w>
-<w><t>mas-ter</t><noun regular-root="true"/><verb regular-root="true"/><adjective regular-root="false"/></w>
+<w><t>mas-ter</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/><adjective extensible="false"/></w>
<w><t>Mas-ter</t></w>
<w><t>mas-ter build-er</t></w>
<w><t>mas-ter cyl-in-der</t></w>
@@ -94484,7 +94484,7 @@
<w><t>Mes-se-ni-a</t></w>
<w><t>Mes-ser-schmitt</t></w>
<w><t>Mes-siaen</t></w>
-<w><t>mes-si-ah</t><noun regular-root="true"/></w>
+<w><t>mes-si-ah</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>mes-si-an-ic</t></w>
<w><t>Mes-si-an-i-cal-ly</t></w>
<w><t>Mes-si-dor</t></w>
@@ -96560,7 +96560,7 @@
<w><t>mis-sil-ry</t></w>
<w><t>mis-sing</t></w>
<w><t>mis-sing link</t></w>
-<w><t>mis-sion</t><noun regular-root="true"/></w>
+<w><t>mis-sion</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>Mis-sion</t></w>
<w><t>mis-sion-ar-ies</t></w>
<w><t>mis-sion-ar-y</t></w>
@@ -100106,7 +100106,7 @@
<w><t>nathe-less</t></w>
<w><t>nath-less</t></w>
<w><t>Na-tick</t></w>
-<w><t>na-tion</t><noun regular-root="true"/></w>
+<w><t>na-tion</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>na-tion=state</t></w>
<w><t>Na-tion-al</t></w>
<w><t>na-tion-al</t></w>
@@ -123272,7 +123272,7 @@
<w><t>Prax-it-e-les</t></w>
<w><t>Prax-ith-e-a</t></w>
<w><t>pray</t></w>
-<w><t>prayer</t><noun regular-root="true"/></w>
+<w><t>prayer</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>prayer meet-ing</t></w>
<w><t>prayer-ful</t></w>
<w><t>prayer-ful-ly</t></w>
@@ -127853,13 +127853,13 @@
<w><t>pro-mis-cu-ous</t></w>
<w><t>pro-mis-cu-ous-ly</t></w>
<w><t>pro-mis-cu-ous-ness</t></w>
-<w><t>prom-ise</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>prom-ise</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>Prom-ised Land</t></w>
<w><t>prom-is-ee</t></w>
<w><t>prom-i-see</t></w>
<w><t>prom-ise-ful</t></w>
<w><t>prom-is-er</t></w>
-<w><t>prom-is-ing</t><adjective regular-root="false"/></w>
+<w><t>prom-is-ing</t><adjective extensible="false"/></w>
<w><t>prom-i-sor</t></w>
<w><t>prom-is-so-ri-ly</t></w>
<w><t>prom-is-so-ry</t></w>
@@ -130457,7 +130457,7 @@
<w><t>qual-i-ta-tive</t></w>
<w><t>qual-i-ta-tive a-nal-y-sis</t></w>
<w><t>qual-i-ta-tive-ly</t></w>
-<w><t>qual-i-ty</t><noun regular-root="true"/></w>
+<w><t>qual-i-ty</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>qual-i-ty con-trol</t></w>
<w><t>qual-i-ty-less</t></w>
<w><t>qualm</t></w>
@@ -132975,7 +132975,7 @@
<w><t>re-al-ist</t></w>
<w><t>re-al-is-tic</t></w>
<w><t>re-al-is-ti-cal-ly</t></w>
-<w><t>re-al-i-ty</t><noun regular-root="true"/></w>
+<w><t>re-al-i-ty</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>re-al-iz-a-bil-i-ty</t></w>
<w><t>re-al-iz-a-ble</t></w>
<w><t>re-al-iz-a-ble-ness</t></w>
@@ -133767,7 +133767,7 @@
<w><t>re-cop-ied</t></w>
<w><t>re-cop-y</t></w>
<w><t>re-cop-y-ing</t></w>
-<w><t>rec-ord</t><noun regular-root="true"/></w>
+<w><t>rec-ord</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>re-cord</t><verb regular-root="true"/></w>
<w><t>rec-ord=chang-er</t></w>
<w><t>rec-ord=play-er</t></w>
@@ -136991,7 +136991,7 @@
<w><t>re-sprung</t></w>
<w><t>re-squan-der</t></w>
<w><t>res-sen-ti-ment</t></w>
-<w><t>rest</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>rest</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>re-stab</t></w>
<w><t>re-stabbed</t></w>
<w><t>re-stab-bing</t></w>
@@ -140504,7 +140504,7 @@
<w><t>sail-plan-ing</t></w>
<w><t>sain</t></w>
<w><t>sain-foin</t></w>
-<w><t>saint</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>saint</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>Saint Ag-nes's Eve</t></w>
<w><t>Saint Al-bans</t></w>
<w><t>Saint An-tho-ny's Cross</t></w>
@@ -142749,7 +142749,7 @@
<w><t>scrip-tur-al</t></w>
<w><t>scrip-tur-al-ly</t></w>
<w><t>scrip-tur-al-ness</t></w>
-<w><t>scrip-ture</t><noun regular-root="true"/></w>
+<w><t>scrip-ture</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>Scrip-ture</t></w>
<w><t>script-writ-er</t></w>
<w><t>script-writ-ing</t></w>
@@ -143176,7 +143176,7 @@
<w><t>sec-re-tar-i-at</t></w>
<w><t>sec-re-tar-i-ate</t></w>
<w><t>sec-re-tar-ies=gen-er-al</t></w>
-<w><t>sec-re-tar-y</t><noun regular-root="true"/></w>
+<w><t>sec-re-tar-y</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>sec-re-tar-y bird</t></w>
<w><t>sec-re-tar-y of state</t></w>
<w><t>sec-re-tar-y=gen-er-al</t></w>
@@ -148729,7 +148729,7 @@
<w><t>sli-er</t></w>
<w><t>sliest</t></w>
<w><t>sli-est</t></w>
-<w><t>slight</t><adjective regular-root="true"/></w>
+<w><t>slight</t><adjective extensible="true"/></w>
<w><t>slight-er</t></w>
<w><t>slight-ing</t></w>
<w><t>slight-ing-ly</t></w>
@@ -149268,7 +149268,7 @@
<w><t>snarl-ing-ly</t></w>
<w><t>snarl-y</t></w>
<w><t>snash</t></w>
-<w><t>snatch</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>snatch</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>snatch-a-ble</t></w>
<w><t>snatch-er</t></w>
<w><t>snatch-i-er</t></w>
@@ -152575,7 +152575,7 @@
<w><t>staph-y-lot-o-my</t></w>
<w><t>sta-ple</t></w>
<w><t>sta-pler</t></w>
-<w><t>star</t><noun regular-root="true"/><verb regular-root="false"/></w>
+<w><t>star</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="false"/></w>
<w><t>Star Cham-ber</t></w>
<w><t>star con-nec-tion</t></w>
<w><t>Star of Beth-le-hem</t></w>
@@ -153657,7 +153657,7 @@
<w><t>stomp-er</t></w>
<w><t>stomp-ing-ly</t></w>
<w><t>ston-a-ble</t></w>
-<w><t>stone</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>stone</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>Stone</t></w>
<w><t>stone boil-ing</t></w>
<w><t>stone bram-ble</t></w>
@@ -157973,7 +157973,7 @@
<w><t>sur-plus-age</t></w>
<w><t>sur-print</t></w>
<w><t>sur-pris-al</t></w>
-<w><t>sur-prise</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>sur-prise</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>sur-prised</t><adjective/></w>
<w><t>sur-pris-ed-ly</t></w>
<w><t>sur-pris-er</t></w>
@@ -159439,7 +159439,7 @@
<w><t>Tak-a-mat-su</t></w>
<w><t>Ta-ka-ma-tsu</t></w>
<w><t>Ta-kao</t></w>
-<w><t>take</t><noun regular-root="true"/><verb regular-root="false"/></w>
+<w><t>take</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="false"/></w>
<w><t>take a-back</t></w>
<w><t>take af-ter</t></w>
<w><t>take a-part</t></w>
@@ -163572,7 +163572,7 @@
<w><t>Tor-bay</t></w>
<w><t>tor-bern-ite</t></w>
<w><t>torc</t></w>
-<w><t>torch</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>torch</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>torch-bear-er</t></w>
<w><t>torch-i-er</t></w>
<w><t>tor-chier</t></w>
@@ -164954,7 +164954,7 @@
<w><t>tri-ad-i-cal-ly</t></w>
<w><t>tri-ad-ism</t></w>
<w><t>tri-age</t></w>
-<w><t>tri-al</t><noun regular-root="true"/></w>
+<w><t>tri-al</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>tri-al and er-ror</t></w>
<w><t>tri-al bal-ance</t></w>
<w><t>tri-al bal-loon</t></w>
@@ -181042,7 +181042,7 @@
<w><t>Vi-et-nam-ese</t></w>
<w><t>Vi-et-nam-i-sa-tion</t></w>
<w><t>Vi-et-nam-i-za-tion</t></w>
-<w><t>view</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>view</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>view hal-loo</t></w>
<w><t>view-a-ble</t></w>
<w><t>view-er</t></w>
@@ -184885,7 +184885,7 @@
<w><t>Wil-ming-to-ni-an</t></w>
<w><t>Wil-more</t></w>
<w><t>Wil-no</t></w>
-<w><t>Wil-son</t><noun regular-root="true"/></w>
+<w><t>Wil-son</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>Wil-son cloud cham-ber</t></w>
<w><t>Wil-son's pet-rel</t></w>
<w><t>Wil-son's snipe</t></w>
@@ -185423,8 +185423,7 @@
<w><t>Wol-ver-hamp-ton</t></w>
<w><t>wol-ver-ine</t></w>
<w><t>wolves</t></w>
-<w><t>wom-an</t><noun regular-root="false"/></w>
-<w><t>wom-an’s</t><adjective regular-root="false"/></w>
+<w><t>wom-an</t><noun number="singular" convertible-to-possessive="true"/></w>
<w><t>wom-an=chas-er</t></w>
<w><t>wom-an=hat-er</t></w>
<w><t>wom-an-hood</t></w>
@@ -185457,7 +185456,7 @@
<w><t>won</t></w>
<w><t>won't</t></w>
<w><t>Won-der</t></w>
-<w><t>won-der</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>won-der</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>won-der=strick-en</t></w>
<w><t>won-der-ber-ry</t></w>
<w><t>won-der-er</t></w>
@@ -185708,7 +185707,7 @@
<w><t>work-wom-an</t></w>
<w><t>work-wom-en</t></w>
<w><t>Wor-land</t></w>
-<w><t>world</t><noun regular-root="true"/></w>
+<w><t>world</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>World Health Or-gan-i-za-tion</t></w>
<w><t>world lan-guage</t></w>
<w><t>world pow-er</t></w>
@@ -186324,7 +186323,7 @@
<w><t>yeal-ing</t></w>
<w><t>yean</t></w>
<w><t>yean-ling</t></w>
-<w><t>year</t><noun regular-root="true"/></w>
+<w><t>year</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
<w><t>year=a-round</t></w>
<w><t>year-book</t></w>
<w><t>year-ling</t></w>
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml 2021-11-08 22:43:35 UTC (rev 12016)
@@ -15,8 +15,8 @@
eng-999-Latn.dict.xml.
-->
-<w><t>co=la-bour-er</t><noun regular-root="true"/></w>
-<w><t>la-bour</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>co=la-bour-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>la-bour</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>la-boured</t><adjective/></w>
-<w><t>la-bour-er</t><noun regular-root="true"/></w>
+<w><t>la-bour-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
</axsl-dictionary>
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml 2021-11-08 22:43:35 UTC (rev 12016)
@@ -15,8 +15,8 @@
eng-999-Latn.dict.xml.
-->
-<w><t>co=la-bor-er</t><noun regular-root="true"/></w>
-<w><t>la-bor</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>co=la-bor-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
+<w><t>la-bor</t><noun number="pluralizable" convertible-to-possessive="true"/><verb regular-root="true"/></w>
<w><t>la-bored</t><adjective/></w>
-<w><t>la-bor-er</t><noun regular-root="true"/></w>
+<w><t>la-bor-er</t><noun number="pluralizable" convertible-to-possessive="true"/></w>
</axsl-dictionary>
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -85,7 +85,7 @@
}
for (int index = 0; index < this.alternatives.length; index ++) {
final T word = alternatives[index];
- if (word.isOfType(pos, null)) {
+ if (word.isOfType(pos)) {
return word;
}
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativeRule.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativeRule.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativeRule.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -30,7 +30,7 @@
import org.axsl.hyphen.DerivativeType;
import org.axsl.hyphen.PartOfSpeech;
-import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.PosQualifier;
import org.axsl.hyphen.Word;
import java.util.ArrayList;
@@ -45,16 +45,16 @@
/** The part of speech to which the root must belong. */
private PartOfSpeech rootPos;
- /** Indicates whether the {@link #rootPos} must be of regular form for this rule to apply. */
- private boolean isRegular;
+ /** The qualifier that must be true for {@link #rootPos} for this rule to apply. */
+ private PosQualifier qualifier;
/** The (unmodifiable) list of derivative types which this rule applies.
* In other words, if this rule applies, identifies the types of derivative that the derivative could be. */
private List<DerivativeType> types;
- public DerivativeRule(final PartOfSpeech rootPos, final boolean isRegular, final List<DerivativeType> types) {
+ public DerivativeRule(final PartOfSpeech rootPos, final PosQualifier qualifier, final List<DerivativeType> types) {
this.rootPos = rootPos;
- this.isRegular = isRegular;
+ this.qualifier = qualifier;
final List<DerivativeType> defensiveCopy = new ArrayList<DerivativeType>(types.size());
Collections.copy(types, defensiveCopy);
this.types = Collections.unmodifiableList(defensiveCopy);
@@ -66,8 +66,10 @@
* @return True if and only if {@code word} meets the criteria for this rule.
*/
public boolean matches(final Word word) {
- final PosRegularity regularity = this.isRegular ? PosRegularity.REGULAR : PosRegularity.IRREGULAR;
- return word.isOfType(this.rootPos, regularity);
+ if (! word.isOfType(this.rootPos)) {
+ return false;
+ }
+ return word.isOfQualifiedType(this.rootPos, this.qualifier);
}
/**
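Note on the unchanged context lines in the DerivativeRule constructor above: java.util.Collections.copy takes the destination list first and the source list second. As shown, Collections.copy(types, defensiveCopy) therefore copies from the newly created, empty list into the argument, which appears to leave the stored list empty. The following is a minimal sketch of one possible alternative, not the project's code; the class and method names (DefensiveCopySketch, unmodifiableCopy) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; not part of FOray. */
final class DefensiveCopySketch {

    /**
     * Returns an unmodifiable snapshot of {@code source}.
     * The ArrayList copy constructor copies the elements, so later changes
     * to {@code source} do not show through the returned list.
     */
    static <T> List<T> unmodifiableCopy(final List<T> source) {
        return Collections.unmodifiableList(new ArrayList<T>(source));
    }

    public static void main(final String[] args) {
        final List<String> types = new ArrayList<String>(Arrays.asList("PLURAL", "POSSESSIVE"));
        final List<String> copy = unmodifiableCopy(types);
        types.clear();
        /* The snapshot still holds both elements after the original is cleared. */
        System.out.println(copy.size());
    }
}
```

The copy constructor sizes and fills the new list in one step, so there is no destination/source ordering to get wrong.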
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/PosUtils.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -29,8 +29,10 @@
package org.foray.hyphen;
import org.axsl.hyphen.PartOfSpeech;
-import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.PosQualifier;
+import org.xml.sax.Attributes;
+
/**
* Utilities related to {@link PartOfSpeech}.
*/
@@ -38,36 +40,49 @@
/* Checkstyle: Allow Magic Numbers that are hard-coded data. */
- /** Index to the flag indicating whether the word is a regular noun. */
- private static final int REGULAR_NOUN_INDEX = 13;
+ /** Message format for unexpected XML attribute values. */
+// private static final String INVALID_QUALIFIER_VALUE_MESSAGE = "Unexpected value for \"%1\": %2";
- /** Index to the flag indicating whether the word is a regular verb. */
+ /** Return value indicating that the requested qualifier is a valid qualifier in this context, but that there is no
+ * index for it, i.e. we have chosen not to store it. */
+ private static final int VALID_QUALIFIER_WITH_NO_INDEX = -1;
+
+ /** Return value indicating that the requested qualifier is not valid in this context, i.e. does not apply to the
+ * corresponding part of speech. */
+ private static final int INVALID_QUALIFIER = -2;
+
+ /** Index to the flag indicating whether the noun is pluralizable. */
+ private static final int NOUN_PLURALIZABLE_INDEX = 12;
+
+ /** Index to the flag indicating whether the noun is convertible to a possessive. */
+ private static final int NOUN_CONVERTIBLE_TO_POSSESSIVE_INDEX = 13;
+
+ /** Index to the flag indicating whether the verb is a regular verb. */
private static final int REGULAR_VERB_INDEX = 14;
/** Index to the flag indicating whether the word is a regular adjective. */
- private static final int REGULAR_ADJECTIVE_INDEX = 15;
+ private static final int ADJ_EXTENSIBLE_INDEX = 15;
/** Mask suitable for accumulating multiple parts of speech in one char. */
private static final char[] MASKS = new char[16];
static {
- MASKS[PartOfSpeech.NOUN.getNumericValue()] = 0x0001; // Index 0. 1
- MASKS[PartOfSpeech.PRONOUN.getNumericValue()] = 0x0002; // Index 1. 2
- MASKS[PartOfSpeech.VERB.getNumericValue()] = 0x0004; // Index 2. 4
- MASKS[PartOfSpeech.ADJECTIVE.getNumericValue()] = 0x0008; // Index 3. 8
- MASKS[PartOfSpeech.ADVERB.getNumericValue()] = 0x0010; // Index 4. 16
- MASKS[PartOfSpeech.PREPOSITION.getNumericValue()] = 0x0020; // Index 5. 32
- MASKS[PartOfSpeech.CONJUNCTION.getNumericValue()] = 0x0040; // Index 6. 64
- MASKS[PartOfSpeech.ARTICLE.getNumericValue()] = 0x0080; // Index 7. 128
- MASKS[PartOfSpeech.INTERJECTION.getNumericValue()] = 0x0100; // Index 8. 256
- MASKS[PartOfSpeech.CARDINAL.getNumericValue()] = 0x0200; // Index 9. 512
- MASKS[PartOfSpeech.ORDINAL.getNumericValue()] = 0x0400; // Index 10. 1,024
+ MASKS[PartOfSpeech.NOUN.getIndex()] = 0x0001; // Index 0. 1
+ MASKS[PartOfSpeech.PRONOUN.getIndex()] = 0x0002; // Index 1. 2
+ MASKS[PartOfSpeech.VERB.getIndex()] = 0x0004; // Index 2. 4
+ MASKS[PartOfSpeech.ADJECTIVE.getIndex()] = 0x0008; // Index 3. 8
+ MASKS[PartOfSpeech.ADVERB.getIndex()] = 0x0010; // Index 4. 16
+ MASKS[PartOfSpeech.PREPOSITION.getIndex()] = 0x0020; // Index 5. 32
+ MASKS[PartOfSpeech.CONJUNCTION.getIndex()] = 0x0040; // Index 6. 64
+ MASKS[PartOfSpeech.DETERMINER.getIndex()] = 0x0080; // Index 7. 128
+ MASKS[PartOfSpeech.INTERJECTION.getIndex()] = 0x0100; // Index 8. 256
+ MASKS[PartOfSpeech.CARDINAL.getIndex()] = 0x0200; // Index 9. 512
+ MASKS[PartOfSpeech.ORDINAL.getIndex()] = 0x0400; // Index 10. 1,024
/* Leave some room in the middle for expansion from either end. */
MASKS[11] = 0x0800; // Index 11. 2,048
- MASKS[12] = 0x1000; // Index 12. 4,096
- MASKS[REGULAR_NOUN_INDEX] = 0x2000; // Index 13. 8,192
+ MASKS[NOUN_PLURALIZABLE_INDEX] = 0x1000; // Index 12. 4,096
+ MASKS[NOUN_CONVERTIBLE_TO_POSSESSIVE_INDEX] = 0x2000; // Index 13. 8,192
MASKS[REGULAR_VERB_INDEX] = 0x4000; // Index 14. 16,384
- MASKS[REGULAR_ADJECTIVE_INDEX] = 0x8000; // Index 15. 32,768
- /* */
+ MASKS[ADJ_EXTENSIBLE_INDEX] = 0x8000; // Index 15. 32,768
}
/* Checkstyle: Restart Magic Number checking. */
@@ -78,6 +93,69 @@
private PosUtils() { }
/**
+ * Computes the index into {@link #MASKS} for a given combination of a part of speech and a qualifier for that
+ * part of speech.
+ * @param pos The part of speech for which the mask index for a qualifier is needed.
+ * @param qualifier The qualifier for which the mask index is needed.
+ * @return The index into {@link #MASKS} that corresponds to the parameters, or -1 if the qualifier is one that we
+ * have chosen not to store, or {@value #INVALID_QUALIFIER} if the qualifier is not valid with the {@code pos}.
+ */
+ private static int computeQualifierIndex(final PartOfSpeech pos, final PosQualifier qualifier) {
+ switch(pos) {
+ case NOUN: {
+ switch(qualifier) {
+ case SINGULAR: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case PLURAL: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case PLURALIZABLE: return NOUN_PLURALIZABLE_INDEX;
+ case MASCULINE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case FEMININE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case NEUTER: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case CONVERTIBLE_TO_POSSESSIVE: return NOUN_CONVERTIBLE_TO_POSSESSIVE_INDEX;
+ default: return INVALID_QUALIFIER;
+ }
+ }
+ case PRONOUN: {
+ /* Unless there are words that can be both noun and pronoun, most or all of these qualifiers could be
+ * shared with noun. If we make that assumption, throw an Exception if that condition is violated, i.e. if
+ * a word tries to describe itself as both a noun and a pronoun. */
+ switch(qualifier) {
+ case SINGULAR: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case PLURAL: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case MASCULINE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case FEMININE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case NEUTER: return VALID_QUALIFIER_WITH_NO_INDEX;
+ default: return INVALID_QUALIFIER;
+ }
+ }
+ case VERB: {
+ switch(qualifier) {
+ case SINGULAR: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case PLURAL: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case MASCULINE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case FEMININE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case NEUTER: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case REGULAR_ROOT: return REGULAR_VERB_INDEX;
+ default: return INVALID_QUALIFIER;
+ }
+ }
+ case ADJECTIVE: {
+ switch(qualifier) {
+ case SINGULAR: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case PLURAL: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case MASCULINE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case FEMININE: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case NEUTER: return VALID_QUALIFIER_WITH_NO_INDEX;
+ case EXTENSIBLE: return ADJ_EXTENSIBLE_INDEX;
+ default: return INVALID_QUALIFIER;
+ }
+ }
+ default: {
+ return INVALID_QUALIFIER;
+ }
+ }
+ }
+
+ /**
* Adds the flag for a given part of speech to the existing flags.
* @param existing The existing flags.
* @param pos The part of speech to be added to existing flags.
@@ -84,50 +162,35 @@
* @return The new flags, consisting of all existing flags plus the one just added.
*/
public static char encodePosInfo(final char existing, final PartOfSpeech pos) {
- final int index = pos.getNumericValue();
+ final int index = pos.getIndex();
final char mask = MASKS[index];
return (char) (existing | mask);
}
/**
- * Adds the "regular noun" flag to the existing flags.
+ * Adds a qualifier to the existing flags.
* @param existing The existing flags.
+ * @param pos The part of speech to which the qualifier should be added.
+ * @param qualifier The qualifier which should be added.
* @return The new flags, consisting of all existing flags plus the one just added.
*/
- public static char encodeRegularNoun(final char existing) {
- if (! PosUtils.isPartOfSpeech(existing, PartOfSpeech.NOUN)) {
- throw new IllegalStateException("Cannot set \"regular noun\" flag unless already a noun.");
- }
- final char mask = MASKS[REGULAR_NOUN_INDEX];
- return (char) (existing | mask);
- }
+ public static char encodePosQualifier(final char existing, final PartOfSpeech pos, final PosQualifier qualifier) {
- /**
- * Adds the "regular verb" flag to the existing flags.
- * @param existing The existing flags.
- * @return The new flags, consisting of all existing flags plus the one just added.
- */
- public static char encodeRegularVerb(final char existing) {
- if (! PosUtils.isPartOfSpeech(existing, PartOfSpeech.VERB)) {
- throw new IllegalStateException("Cannot set \"regular verb\" flag unless already a verb.");
- }
- final char mask = MASKS[REGULAR_VERB_INDEX];
- return (char) (existing | mask);
- }
- /**
- * Adds the "regular adjective" flag to the existing flags.
- * @param existing The existing flags.
- * @return The new flags, consisting of all existing flags plus the one just added.
- */
- public static char encodeRegularAdjective(final char existing) {
- if (! PosUtils.isPartOfSpeech(existing, PartOfSpeech.ADJECTIVE)) {
- throw new IllegalStateException("Cannot set \"regular adjective\" flag unless already an adjective.");
+
+
+ if (! isPartOfSpeech(existing, pos)) {
+ throw new IllegalStateException("Part-of-speech " + pos + " has not been set.");
}
- final char mask = MASKS[REGULAR_ADJECTIVE_INDEX];
+ final int index = computeQualifierIndex(pos, qualifier);
+ if (index < 0) {
+ return existing;
+ }
+ final char mask = MASKS[index];
return (char) (existing | mask);
}
+
/**
* Indicates whether a given part of speech flag is set.
* @param flags The flags being tested.
@@ -135,78 +198,136 @@
* @return True if and only if the flag is set for {@code pos}.
*/
public static boolean isPartOfSpeech(final char flags, final PartOfSpeech pos) {
- final int index = pos.getNumericValue();
+ final int index = pos.getIndex();
final char mask = MASKS[index];
return (flags & mask) != 0;
}
/**
- * Indicates whether the "regular noun" flag is set.
- * @param flags The flags being tested.
- * @return True if and only if the "regular noun" flag is set.
+ * Indicates whether a given set of flags matches both a given part of speech and qualifier.
+ * @param flags The encoded flags to be tested.
+ * @param pos The part of speech being tested for.
+ * @param qualifier The qualifier being tested for.
+ * @return True if and only if the flags match both the part of speech and the qualifier.
*/
- public static boolean isRegularNoun(final char flags) {
- final char mask = MASKS[REGULAR_NOUN_INDEX];
+ public static boolean isOfQualifiedType(final char flags, final PartOfSpeech pos, final PosQualifier qualifier) {
+ final boolean isPosMatch = PosUtils.isPartOfSpeech(flags, pos);
+ if (isPosMatch == false) {
+ return false;
+ }
+
+ final int index = computeQualifierIndex(pos, qualifier);
+ if (index < 0) {
+ return false;
+ }
+ final char mask = MASKS[index];
return (flags & mask) != 0;
}
- /**
- * Indicates whether the "regular verb" flag is set.
- * @param flags The flags being tested.
- * @return True if and only if the "regular verb" flag is set.
- */
- public static boolean isRegularVerb(final char flags) {
- final char mask = MASKS[REGULAR_VERB_INDEX];
- return (flags & mask) != 0;
+ public static PosQualifier parseSingleQualifier(final Attributes attributes) {
+ PosQualifier qualifier = null;
+ for (int index = 0; index < attributes.getLength(); index ++) {
+ final String attributeName = attributes.getLocalName(index);
+ final String value = attributes.getValue(index);
+ final PosQualifier parsedQualifier = parseQualifier(attributeName, value);
+ if (parsedQualifier == null) {
+ continue;
+ }
+ if (qualifier == null) {
+ qualifier = parsedQualifier;
+ continue;
+ } else {
+ throw new IllegalArgumentException("Multiple qualifiers for element.");
+ }
+ }
+ return qualifier;
}
- /**
- * Indicates whether the "regular adjective" flag is set.
- * @param flags The flags being tested.
- * @return True if and only if the "regular adjective" flag is set.
- */
- public static boolean isRegularAdjective(final char flags) {
- final char mask = MASKS[REGULAR_ADJECTIVE_INDEX];
- return (flags & mask) != 0;
+ public static char parseAndEncodeQualifier(final char existing, final PartOfSpeech pos,
+ final Attributes attributes) {
+ char encoded = existing;
+ for (int index = 0; index < attributes.getLength(); index ++) {
+ final String attributeName = attributes.getLocalName(index);
+ final String value = attributes.getValue(index);
+ final PosQualifier qualifier = parseQualifier(attributeName, value);
+ if (qualifier == null) {
+ continue;
+ }
+ encoded = encodePosQualifier(encoded, pos, qualifier);
+ }
+ return encoded;
}
+// private static void invalidQualifierValue(final String attribute, final String value) {
+// String message = String.format(INVALID_QUALIFIER_VALUE_MESSAGE, attribute, value);
+// throw new IllegalArgumentException(message);
+// }
+
/**
- * Indicates whether a given set of flags matches both a given part of speech and regularity indicator.
- * @param flags The encoded flags to be tested.
- * @param pos The part of speech being tested for.
- * @param regularity The regularity being tested for.
- * @return True if and only if the flags match both the part of speech and regularity indicator.
+ * Parses one part-of-speech qualifier from an XML attribute.
+ * @param attribute The attribute name.
+ * @param value The attribute value.
+ * @return The qualifier signified by the parameters.
*/
- public static boolean isOfType(final char flags, final PartOfSpeech pos, final PosRegularity regularity) {
- final boolean isPosMatch = PosUtils.isPartOfSpeech(flags, pos);
- if (isPosMatch == false) {
- return false;
+ public static PosQualifier parseQualifier(final String attribute, final String value) {
+ if (attribute == null) {
+ return null;
}
-
- if (regularity == null
- || regularity == PosRegularity.IRREGULAR) {
- return true;
+ switch (attribute) {
+ case "number": {
+ switch (value) {
+ case "singular": return PosQualifier.SINGULAR;
+ case "plural": return PosQualifier.PLURAL;
+ case "pluralizable": return PosQualifier.PLURALIZABLE;
+ case "both": return PosQualifier.NUMBER_BOTH;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
-
- boolean isRegularityMatch = false;
- switch(pos) {
- case NOUN: {
- isRegularityMatch = PosUtils.isRegularNoun(flags);
- break;
+ case "gender": {
+ switch (value) {
+ case "masculine": return PosQualifier.MASCULINE;
+ case "feminine": return PosQualifier.FEMININE;
+ case "neuter": return PosQualifier.NEUTER;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
- case VERB: {
- isRegularityMatch = PosUtils.isRegularVerb(flags);
- break;
+ case "convertible-to-possessive": {
+ switch (value) {
+ case "true": return PosQualifier.CONVERTIBLE_TO_POSSESSIVE;
+ case "false": return null;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
- case ADJECTIVE: {
- isRegularityMatch = PosUtils.isRegularAdjective(flags);
- break;
+ case "regular-root": {
+ switch (value) {
+ case "true": return PosQualifier.REGULAR_ROOT;
+ case "false": return null;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
- default: {
- isRegularityMatch = true;
+ case "extensible": {
+ switch (value) {
+ case "true": return PosQualifier.EXTENSIBLE;
+ case "false": return null;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
+ case "possessive": {
+ switch (value) {
+ case "true": return PosQualifier.POSSESSIVE;
+ case "false": return null;
+// default: invalidQualifierValue(attribute, value);
+ default: return null;
+ }
}
- return isRegularityMatch;
+// default: throw new IllegalArgumentException("Unexpected attribute: " + attribute);
+ default: return null;
+ }
}
}
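The PosUtils changes above pack part-of-speech flags and qualifier flags into a single 16-bit char, using the low bits for parts of speech and bits 12 to 15 for qualifiers. The following is a simplified, self-contained sketch of that idea. The enums Pos and Qualifier are hypothetical stand-ins for org.axsl.hyphen.PartOfSpeech and org.axsl.hyphen.PosQualifier, the class name is invented, and the qualifier table is deliberately smaller than the one in the diff.

```java
/** Illustrative sketch only; Pos and Qualifier are stand-ins, not the aXSL enums. */
final class PosFlagsSketch {

    /** Hypothetical part-of-speech enum; ordinal() doubles as the bit index. */
    enum Pos { NOUN, PRONOUN, VERB, ADJECTIVE }

    /** Hypothetical qualifier enum (a subset of what the diff handles). */
    enum Qualifier { SINGULAR, PLURALIZABLE, CONVERTIBLE_TO_POSSESSIVE, REGULAR_ROOT, EXTENSIBLE }

    /** Qualifier is valid for the part of speech, but no bit is reserved for it. */
    static final int NOT_STORED = -1;

    /** Qualifier does not apply to the part of speech. */
    static final int INVALID = -2;

    /** Maps a (part of speech, qualifier) pair to a bit index, or to a negative sentinel. */
    static int qualifierBit(final Pos pos, final Qualifier qualifier) {
        switch (pos) {
        case NOUN:
            switch (qualifier) {
            case SINGULAR: return NOT_STORED;
            case PLURALIZABLE: return 12;
            case CONVERTIBLE_TO_POSSESSIVE: return 13;
            default: return INVALID;
            }
        case VERB:
            return qualifier == Qualifier.REGULAR_ROOT ? 14 : INVALID;
        case ADJECTIVE:
            return qualifier == Qualifier.EXTENSIBLE ? 15 : INVALID;
        default:
            return INVALID;
        }
    }

    static char encodePos(final char existing, final Pos pos) {
        return (char) (existing | (1 << pos.ordinal()));
    }

    static boolean isPos(final char flags, final Pos pos) {
        return (flags & (1 << pos.ordinal())) != 0;
    }

    /** Sets the qualifier bit; unstored or invalid qualifiers leave the flags unchanged. */
    static char encodeQualifier(final char existing, final Pos pos, final Qualifier qualifier) {
        if (!isPos(existing, pos)) {
            throw new IllegalStateException("Part-of-speech " + pos + " has not been set.");
        }
        final int bit = qualifierBit(pos, qualifier);
        return bit < 0 ? existing : (char) (existing | (1 << bit));
    }

    /** True only if both the part-of-speech bit and the qualifier bit are set. */
    static boolean isQualified(final char flags, final Pos pos, final Qualifier qualifier) {
        final int bit = qualifierBit(pos, qualifier);
        return isPos(flags, pos) && bit >= 0 && (flags & (1 << bit)) != 0;
    }

    public static void main(final String[] args) {
        char flags = encodePos((char) 0, Pos.NOUN);
        flags = encodeQualifier(flags, Pos.NOUN, Qualifier.PLURALIZABLE);
        System.out.println(Integer.toHexString(flags)); // prints 1001 (bit 0 plus bit 12)
    }
}
```

One consequence of this layout, visible in the sketch as well as in the diff, is that a qualifier with no reserved bit can be accepted on input but can never test as true afterwards.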
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -32,6 +32,7 @@
import org.axsl.hyphen.Dictionary;
import org.axsl.hyphen.PartOfSpeech;
+import org.axsl.hyphen.PosQualifier;
import java.util.Arrays;
import java.util.HashMap;
@@ -230,4 +231,10 @@
return 0;
}
+ @Override
+ public boolean supportsQualifiedType(final PartOfSpeech pos, final PosQualifier qualifier) {
+ /* TODO: Implement this. */
+ return false;
+ }
+
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -29,7 +29,7 @@
package org.foray.hyphen;
import org.axsl.hyphen.PartOfSpeech;
-import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.PosQualifier;
import org.axsl.hyphen.Word;
import org.axsl.hyphen.WordSegment;
@@ -100,8 +100,13 @@
}
@Override
- public Boolean isOfType(final PartOfSpeech pos, final PosRegularity regularity) {
- return PosUtils.isOfType(this.partsOfSpeech, pos, regularity);
+ public Boolean isOfType(final PartOfSpeech pos) {
+ return PosUtils.isPartOfSpeech(this.partsOfSpeech, pos);
}
+ @Override
+ public Boolean isOfQualifiedType(final PartOfSpeech pos, final PosQualifier qualifier) {
+ return PosUtils.isOfQualifiedType(this.partsOfSpeech, pos, qualifier);
+ }
+
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/StringWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/StringWord.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/StringWord.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -31,7 +31,7 @@
import org.foray.common.primitive.CharSequenceUtils;
import org.axsl.hyphen.PartOfSpeech;
-import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.PosQualifier;
/**
* A word implementation that wraps a set of {@link StringWordSegmentUtf16}, a thin wrapper around a {@link String}.
@@ -105,10 +105,15 @@
}
@Override
- public Boolean isOfType(final PartOfSpeech pos, final PosRegularity regularity) {
- return PosUtils.isOfType(this.partsOfSpeech, pos, regularity);
+ public Boolean isOfType(final PartOfSpeech pos) {
+ return PosUtils.isPartOfSpeech(this.partsOfSpeech, pos);
}
+ @Override
+ public Boolean isOfQualifiedType(final PartOfSpeech pos, final PosQualifier qualifier) {
+ return PosUtils.isOfQualifiedType(this.partsOfSpeech, pos, qualifier);
+ }
+
/**
* Returns the encoded parts of speech data for this word.
* @return The encoded parts of speech data for this word.
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/WordWrapper.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/WordWrapper.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/WordWrapper.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -29,7 +29,7 @@
package org.foray.hyphen;
import org.axsl.hyphen.PartOfSpeech;
-import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.PosQualifier;
import org.axsl.hyphen.Word;
import org.axsl.hyphen.WordSegment;
@@ -77,8 +77,15 @@
}
@Override
- public Boolean isOfType(final PartOfSpeech pos, final PosRegularity regularity) {
- return this.wrappedWord.isOfType(pos, regularity);
+ public Boolean isOfType(final PartOfSpeech pos) {
+ /* TODO: Fix this. */
+ return this.wrappedWord.isOfType(pos);
}
+ @Override
+ public Boolean isOfQualifiedType(final PartOfSpeech pos, final PosQualifier qualifier) {
+ /* TODO: Fix this. */
+ return this.wrappedWord.isOfQualifiedType(pos, qualifier);
+ }
+
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/ConfigParser.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/ConfigParser.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/ConfigParser.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -43,6 +43,7 @@
import org.foray.hyphen.HyphenationPatternsResource;
import org.foray.hyphen.HyphenationServer4a;
import org.foray.hyphen.OrthographyConfig4a;
+import org.foray.hyphen.PosUtils;
import org.foray.hyphen.WordBreaker;
import org.foray.hyphen.WordWrapperFactory;
@@ -49,6 +50,7 @@
import org.axsl.hyphen.DerivativeType;
import org.axsl.hyphen.HyphenationException;
import org.axsl.hyphen.PartOfSpeech;
+import org.axsl.hyphen.PosQualifier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -108,7 +110,7 @@
private PartOfSpeech currentPartOfSpeech;
/** Component of: derivative-rule. */
- private Boolean currentRegularity;
+ private PosQualifier currentQualifier;
/** Component of: derivative-rule. */
private List<DerivativeType> currentDerivativeTypeList;
@@ -259,7 +261,7 @@
}
case "derivative-rule": {
this.currentPartOfSpeech = null;
- this.currentRegularity = null;
+ this.currentQualifier = null;
this.currentDerivativeTypeList = new ArrayList<DerivativeType>();
return;
}
@@ -407,57 +409,57 @@
}
case "noun": {
this.currentPartOfSpeech = PartOfSpeech.NOUN;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "pronoun": {
this.currentPartOfSpeech = PartOfSpeech.PRONOUN;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "verb": {
this.currentPartOfSpeech = PartOfSpeech.VERB;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "adjective": {
this.currentPartOfSpeech = PartOfSpeech.ADJECTIVE;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "adverb": {
this.currentPartOfSpeech = PartOfSpeech.ADVERB;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "preposition": {
this.currentPartOfSpeech = PartOfSpeech.PREPOSITION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "conjunction": {
this.currentPartOfSpeech = PartOfSpeech.CONJUNCTION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "article": {
- this.currentPartOfSpeech = PartOfSpeech.ARTICLE;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentPartOfSpeech = PartOfSpeech.DETERMINER;
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "interjection": {
this.currentPartOfSpeech = PartOfSpeech.INTERJECTION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "cardinal": {
this.currentPartOfSpeech = PartOfSpeech.CARDINAL;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
case "ordinal": {
this.currentPartOfSpeech = PartOfSpeech.ORDINAL;
- this.currentRegularity = parseRegularRootAttribute(attributes);
+ this.currentQualifier = PosUtils.parseSingleQualifier(attributes);
return;
}
default: {
@@ -467,14 +469,6 @@
}
}
- private boolean parseRegularRootAttribute(final Attributes attributes) {
- final String value = attributes.getValue("regular-root");
- if (value == null) {
- return false;
- }
- return "true".equals(value);
- }
-
/**
* Parses the "orthography" element.
* @param attributes The raw parsed attributes.
@@ -589,11 +583,11 @@
return;
}
case "derivative-rule": {
- final DerivativeRule rule = new DerivativeRule(this.currentPartOfSpeech, this.currentRegularity,
+ final DerivativeRule rule = new DerivativeRule(this.currentPartOfSpeech, this.currentQualifier,
this.currentDerivativeTypeList);
this.currentDerivativeRuleList.add(rule);
this.currentPartOfSpeech = null;
- this.currentRegularity = null;
+ this.currentQualifier = null;
this.currentDerivativeTypeList = null;
return;
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -299,58 +299,68 @@
}
case "noun": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.NOUN);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularNoun(this.currentPartsOfSpeech);
- }
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.NOUN, attributes);
break;
}
case "pronoun": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.PRONOUN);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.PRONOUN, attributes);
break;
}
case "verb": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.VERB);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularVerb(this.currentPartsOfSpeech);
- }
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.VERB, attributes);
break;
}
case "adjective": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ADJECTIVE);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularAdjective(this.currentPartsOfSpeech);
- }
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.ADJECTIVE, attributes);
break;
}
case "adverb": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ADVERB);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.ADVERB, attributes);
break;
}
case "preposition": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.PREPOSITION);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.PREPOSITION, attributes);
break;
}
case "conjunction": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.CONJUNCTION);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.CONJUNCTION, attributes);
break;
}
- case "article": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ARTICLE);
+ case "determiner": {
+ this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.DETERMINER);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.DETERMINER, attributes);
break;
}
case "interjection": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.INTERJECTION);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.INTERJECTION, attributes);
break;
}
case "cardinal": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.CARDINAL);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.CARDINAL, attributes);
break;
}
case "ordinal": {
this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ORDINAL);
+ this.currentPartsOfSpeech =
+ PosUtils.parseAndEncodeQualifier(this.currentPartsOfSpeech, PartOfSpeech.ORDINAL, attributes);
break;
}
case "word-group": break;
@@ -459,7 +469,7 @@
case "adverb": break;
case "preposition": break;
case "conjunction": break;
- case "article": break;
+ case "determiner": break;
case "interjection": break;
case "cardinal": break;
case "ordinal": break;
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/PosUtilsTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/PosUtilsTests.java 2021-11-08 16:11:54 UTC (rev 12015)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/PosUtilsTests.java 2021-11-08 22:43:35 UTC (rev 12016)
@@ -29,6 +29,7 @@
package org.foray.hyphen;
import org.axsl.hyphen.PartOfSpeech;
+import org.axsl.hyphen.PosQualifier;
import org.junit.Assert;
import org.junit.Test;
@@ -58,24 +59,28 @@
Assert.assertEquals(63, running);
running = PosUtils.encodePosInfo(running, PartOfSpeech.CONJUNCTION);
Assert.assertEquals(127, running);
- running = PosUtils.encodePosInfo(running, PartOfSpeech.ARTICLE);
+ running = PosUtils.encodePosInfo(running, PartOfSpeech.DETERMINER);
Assert.assertEquals(255, running);
running = PosUtils.encodePosInfo(running, PartOfSpeech.INTERJECTION);
Assert.assertEquals(511, running);
- /* Index 9, if used, would add 512. */
- /* Index 10, if used, would add 1024. */
+ running = PosUtils.encodePosInfo(running, PartOfSpeech.CARDINAL);
+ Assert.assertEquals(1_023, running);
+ running = PosUtils.encodePosInfo(running, PartOfSpeech.ORDINAL);
+ Assert.assertEquals(2_047, ...
[truncated message content] |
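The assertions in PosUtilsTests above (63, 127, 255, 511, 1_023, 2_047) are consistent with each part of speech setting one bit of an int, so that each additional part of speech adds the next power of two. The following is a minimal sketch of that idea only. The class, the nested enum, and the order of the first six constants are hypothetical and are not taken from PosUtils or org.axsl.hyphen.PartOfSpeech.

```java
class PosFlagsSketch {

    /** Hypothetical stand-in for a part-of-speech enum; the order of the first six constants is assumed. */
    enum Pos {
        NOUN, PRONOUN, VERB, ADJECTIVE, ADVERB, PREPOSITION,
        CONJUNCTION, DETERMINER, INTERJECTION, CARDINAL, ORDINAL
    }

    /** Sets the bit whose position equals the ordinal of the constant. */
    static int encode(final int running, final Pos pos) {
        return running | (1 << pos.ordinal());
    }

    /** Tests whether the bit for the given constant is set. */
    static boolean has(final int encoded, final Pos pos) {
        return (encoded & (1 << pos.ordinal())) != 0;
    }

    public static void main(final String[] args) {
        int running = 0;
        for (final Pos pos : Pos.values()) {
            running = encode(running, pos);
            System.out.println(pos + " -> " + running);
        }
    }
}
```

Under these assumptions, index 9 contributes 512 and index 10 contributes 1024, which agrees with the comments removed in the diff.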
|
From: <vic...@us...> - 2021-11-08 16:11:57
|
Revision: 12015
http://sourceforge.net/p/foray/code/12015
Author: victormote
Date: 2021-11-08 16:11:54 +0000 (Mon, 08 Nov 2021)
Log Message:
-----------
Normal dictionary editing.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-07 18:48:34 UTC (rev 12014)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-08 16:11:54 UTC (rev 12015)
@@ -11839,13 +11839,13 @@
<w><t>a-wake</t></w>
<w><t>a-wake-a-ble</t></w>
<w><t>a-waked</t></w>
-<w><t>a-wak-en</t></w>
+<w><t>a-wak-en</t><verb regular-root="true"/></w>
<w><t>a-wak-en-a-ble</t></w>
<w><t>a-wak-en-er</t></w>
<w><t>a-wak-en-ing</t></w>
<w><t>a-wak-en-ing-ly</t></w>
<w><t>a-wak-ing</t></w>
-<w><t>a-ward</t></w>
+<w><t>a-ward</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>a-ward wage</t></w>
<w><t>a-ward-a-ble</t></w>
<w><t>a-ward-er</t></w>
@@ -30399,14 +30399,12 @@
<w><t>com-bi-na-to-ri-al</t></w>
<w><t>com-bi-na-to-ri-al a-nal-y-sis</t></w>
<w><t>com-bi-na-to-ry</t></w>
-<w><t>com-bine</t></w>
+<w><t>com-bine</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>com-bine har-ves-ter</t></w>
-<w><t>com-bined</t></w>
<w><t>com-bined-ly</t></w>
<w><t>com-bined-ness</t></w>
<w><t>com-bin-er</t></w>
<w><t>comb-ings</t></w>
-<w><t>com-bin-ing</t></w>
<w><t>com-bin-ing form</t></w>
<w><t>comb-less</t></w>
<w><t>comb-less-ness</t></w>
@@ -38746,7 +38744,7 @@
<w><t>de-duc-tive-ly</t></w>
<w><t>Dee</t></w>
<w><t>dee</t></w>
-<w><t>deed</t></w>
+<w><t>deed</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>deed-less</t></w>
<w><t>dee-jay</t></w>
<w><t>deek</t></w>
@@ -52673,7 +52671,7 @@
<w><t>ex-o-tox-in</t></w>
<w><t>ex-o-tro-pi-a</t></w>
<w><t>exp</t></w>
-<w><t>ex-pand</t></w>
+<w><t>ex-pand</t><verb regular-root="true"/></w>
<w><t>ex-pand-a-bil-i-ty</t></w>
<w><t>ex-pand-a-ble</t></w>
<w><t>ex-pand-ed</t></w>
@@ -61764,9 +61762,9 @@
<w><t>Gie-rek</t></w>
<w><t>Gie-se-king</t></w>
<w><t>Gies-sen</t></w>
-<w><t>gift</t></w>
+<w><t>gift</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>gift=wrap-per</t></w>
-<w><t>gift-ed</t></w>
+<w><t>gift-ed</t><adjective regular-root="false"/></w>
<w><t>gift-ed-ly</t></w>
<w><t>gift-ed-ness</t></w>
<w><t>gift-less</t></w>
@@ -62036,7 +62034,7 @@
<w><t>Giu-lio Ro-ma-no</t></w>
<w><t>Giu-sep-pe</t></w>
<w><t>giu-sto</t></w>
-<w><t>give</t></w>
+<w><t>give</t><noun regular-root="false"/><verb regular-root="false"/></w>
<w><t>give a-way</t></w>
<w><t>give on-to</t></w>
<w><t>give o-ver</t></w>
@@ -62045,6 +62043,7 @@
<w><t>giv-en</t></w>
<w><t>giv-en name</t></w>
<w><t>giv-er</t></w>
+<w><t>gives</t><verb regular-root="false"/></w>
<w><t>giv-ing</t></w>
<w><t>Gi-za</t></w>
<w><t>Gi-zeh</t></w>
@@ -63542,7 +63541,7 @@
<w><t>grab-bler</t></w>
<w><t>gra-ben</t></w>
<w><t>Grac-chus</t></w>
-<w><t>grace</t></w>
+<w><t>grace</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>Grace</t></w>
<w><t>grace=and=fa-vor</t></w>
<w><t>grace=and=fa-vour</t></w>
@@ -92701,7 +92700,7 @@
<w><t>mas-ta-bah</t></w>
<w><t>mas-tax</t></w>
<w><t>mas-tec-to-my</t></w>
-<w><t>mas-ter</t></w>
+<w><t>mas-ter</t><noun regular-root="true"/><verb regular-root="true"/><adjective regular-root="false"/></w>
<w><t>Mas-ter</t></w>
<w><t>mas-ter build-er</t></w>
<w><t>mas-ter cyl-in-der</t></w>
@@ -100107,7 +100106,7 @@
<w><t>nathe-less</t></w>
<w><t>nath-less</t></w>
<w><t>Na-tick</t></w>
-<w><t>na-tion</t></w>
+<w><t>na-tion</t><noun regular-root="true"/></w>
<w><t>na-tion=state</t></w>
<w><t>Na-tion-al</t></w>
<w><t>na-tion-al</t></w>
@@ -123273,7 +123272,7 @@
<w><t>Prax-it-e-les</t></w>
<w><t>Prax-ith-e-a</t></w>
<w><t>pray</t></w>
-<w><t>pray-er</t></w>
+<w><t>prayer</t><noun regular-root="true"/></w>
<w><t>prayer meet-ing</t></w>
<w><t>prayer-ful</t></w>
<w><t>prayer-ful-ly</t></w>
@@ -127854,14 +127853,13 @@
<w><t>pro-mis-cu-ous</t></w>
<w><t>pro-mis-cu-ous-ly</t></w>
<w><t>pro-mis-cu-ous-ness</t></w>
-<w><t>prom-ise</t></w>
-<w><t>prom-ised</t></w>
+<w><t>prom-ise</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>Prom-ised Land</t></w>
<w><t>prom-is-ee</t></w>
<w><t>prom-i-see</t></w>
<w><t>prom-ise-ful</t></w>
<w><t>prom-is-er</t></w>
-<w><t>prom-is-ing</t></w>
+<w><t>prom-is-ing</t><adjective regular-root="false"/></w>
<w><t>prom-i-sor</t></w>
<w><t>prom-is-so-ri-ly</t></w>
<w><t>prom-is-so-ry</t></w>
@@ -140506,7 +140504,7 @@
<w><t>sail-plan-ing</t></w>
<w><t>sain</t></w>
<w><t>sain-foin</t></w>
-<w><t>saint</t></w>
+<w><t>saint</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>Saint Ag-nes's Eve</t></w>
<w><t>Saint Al-bans</t></w>
<w><t>Saint An-tho-ny's Cross</t></w>
@@ -142751,7 +142749,7 @@
<w><t>scrip-tur-al</t></w>
<w><t>scrip-tur-al-ly</t></w>
<w><t>scrip-tur-al-ness</t></w>
-<w><t>scrip-ture</t></w>
+<w><t>scrip-ture</t><noun regular-root="true"/></w>
<w><t>Scrip-ture</t></w>
<w><t>script-writ-er</t></w>
<w><t>script-writ-ing</t></w>
@@ -152577,7 +152575,7 @@
<w><t>staph-y-lot-o-my</t></w>
<w><t>sta-ple</t></w>
<w><t>sta-pler</t></w>
-<w><t>star</t></w>
+<w><t>star</t><noun regular-root="true"/><verb regular-root="false"/></w>
<w><t>Star Cham-ber</t></w>
<w><t>star con-nec-tion</t></w>
<w><t>Star of Beth-le-hem</t></w>
@@ -152633,6 +152631,7 @@
<w><t>star-ring</t></w>
<w><t>star-ry</t></w>
<w><t>star-ry=eyed</t></w>
+<w><t>stars</t><verb regular-root="false"/></w>
<w><t>stars=of=Beth-le-hem</t></w>
<w><t>stars=of=Je-ru-sa-lem</t></w>
<w><t>start</t></w>
@@ -159440,7 +159439,7 @@
<w><t>Tak-a-mat-su</t></w>
<w><t>Ta-ka-ma-tsu</t></w>
<w><t>Ta-kao</t></w>
-<w><t>take</t></w>
+<w><t>take</t><noun regular-root="true"/><verb regular-root="false"/></w>
<w><t>take a-back</t></w>
<w><t>take af-ter</t></w>
<w><t>take a-part</t></w>
@@ -159455,6 +159454,7 @@
<w><t>take-o-ver</t></w>
<w><t>tak-er</t></w>
<w><t>tak-er=in</t></w>
+<w><t>takes</t><verb regular-root="false"/></w>
<w><t>tak-in</t></w>
<w><t>tak-ing</t></w>
<w><t>Ta-ko-ra-di</t></w>
@@ -164954,7 +164954,7 @@
<w><t>tri-ad-i-cal-ly</t></w>
<w><t>tri-ad-ism</t></w>
<w><t>tri-age</t></w>
-<w><t>tri-al</t></w>
+<w><t>tri-al</t><noun regular-root="true"/></w>
<w><t>tri-al and er-ror</t></w>
<w><t>tri-al bal-ance</t></w>
<w><t>tri-al bal-loon</t></w>
@@ -181042,7 +181042,7 @@
<w><t>Vi-et-nam-ese</t></w>
<w><t>Vi-et-nam-i-sa-tion</t></w>
<w><t>Vi-et-nam-i-za-tion</t></w>
-<w><t>view</t></w>
+<w><t>view</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>view hal-loo</t></w>
<w><t>view-a-ble</t></w>
<w><t>view-er</t></w>
@@ -185423,7 +185423,8 @@
<w><t>Wol-ver-hamp-ton</t></w>
<w><t>wol-ver-ine</t></w>
<w><t>wolves</t></w>
-<w><t>wom-an</t></w>
+<w><t>wom-an</t><noun regular-root="false"/></w>
+<w><t>wom-an’s</t><adjective regular-root="false"/></w>
<w><t>wom-an=chas-er</t></w>
<w><t>wom-an=hat-er</t></w>
<w><t>wom-an-hood</t></w>
@@ -185456,7 +185457,7 @@
<w><t>won</t></w>
<w><t>won't</t></w>
<w><t>Won-der</t></w>
-<w><t>won-der</t></w>
+<w><t>won-der</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>won-der=strick-en</t></w>
<w><t>won-der-ber-ry</t></w>
<w><t>won-der-er</t></w>
@@ -185707,7 +185708,7 @@
<w><t>work-wom-an</t></w>
<w><t>work-wom-en</t></w>
<w><t>Wor-land</t></w>
-<w><t>world</t></w>
+<w><t>world</t><noun regular-root="true"/></w>
<w><t>World Health Or-gan-i-za-tion</t></w>
<w><t>world lan-guage</t></w>
<w><t>world pow-er</t></w>
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
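Each dictionary entry in the diff above is a w element holding a t element (the word, with "-" marking hyphenation points) followed by optional part-of-speech elements that may carry a regular-root attribute. As a rough illustration of reading one such entry, here is a sketch that uses only the JDK DOM parser. It is not FOray's DictionaryParserXml; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DictEntrySketch {

    /** Returns the word with the "-" hyphenation markers removed. Other markers such as "=" are left alone. */
    static String plainWord(final String marked) {
        return marked.replace("-", "");
    }

    /** Describes the children of one w element: the term first, then each part-of-speech element. */
    static List<String> describe(final String entryXml) {
        final Element w;
        try {
            w = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(entryXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (final Exception e) {
            throw new IllegalArgumentException("Not a well-formed entry: " + entryXml, e);
        }
        final List<String> result = new ArrayList<>();
        final NodeList children = w.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            final Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            final Element element = (Element) child;
            if ("t".equals(element.getTagName())) {
                result.add("term=" + element.getTextContent());
            } else {
                /* getAttribute returns an empty string when the attribute is absent. */
                result.add(element.getTagName() + "(regular-root=" + element.getAttribute("regular-root") + ")");
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(describe("<w><t>a-ward</t><noun regular-root=\"true\"/><verb regular-root=\"true\"/></w>"));
        System.out.println(plainWord("a-wak-en"));
    }
}
```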
|
|
From: <vic...@us...> - 2021-11-07 18:48:37
|
Revision: 12014
http://sourceforge.net/p/foray/code/12014
Author: victormote
Date: 2021-11-07 18:48:34 +0000 (Sun, 07 Nov 2021)
Log Message:
-----------
Handle zero-length words.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-06 21:04:14 UTC (rev 12013)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-07 18:48:34 UTC (rev 12014)
@@ -263,6 +263,9 @@
@Override
public boolean isValidWord(final CharSequence wordChars, final PartOfSpeech pos,
final List<Dictionary> adhocDictionaries) {
+ if (wordChars.length() < 1) {
+ return false;
+ }
/* 1. Check exact matches in adhoc dictionaries. */
if (adhocDictionaries != null) {
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 21:04:14 UTC (rev 12013)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-07 18:48:34 UTC (rev 12014)
@@ -138,7 +138,7 @@
/** The input source to be spell-checked. */
private InputSource input;
- /** The output stream to which the pretty-printed output should be sent. */
+ /** The output stream to which the output should be sent. */
private PrintStream output;
/** The locator instance for identifying the document, line, and column
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java 2021-11-06 21:04:14 UTC (rev 12013)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java 2021-11-07 18:48:34 UTC (rev 12014)
@@ -78,7 +78,7 @@
/** The input stream containing the list of words being reported. */
private BufferedReader input;
- /** The output stream to which the pretty-printed output should be sent. */
+ /** The output stream to which the output should be sent. */
private PrintStream output;
/** The logger. */
@@ -140,6 +140,9 @@
}
public void processWord(final String word) {
+ if (word.length() < 1) {
+ return;
+ }
if (this.currentOrthographyConfig.isValidWord(word, null, this.currentDictionaries)) {
this.output.println("Found: " + word);
return;
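The guards added in this revision matter because a line-oriented reader returns an empty string for every blank line in a word list. The sketch below shows the same idea in isolation, assuming one word per line. The class and method names are hypothetical and nothing here calls FOray code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BlankLineSketch {

    /** Reads the text line by line and keeps only lines that contain at least one character. */
    static List<String> wordsToCheck(final String text) {
        final List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            while (line != null) {
                if (line.length() >= 1) {
                    words.add(line);
                }
                line = reader.readLine();
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(final String[] args) {
        System.out.println(wordsToCheck("give\n\ntakes\n"));
    }
}
```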
|
|
From: <vic...@us...> - 2021-11-06 21:04:16
|
Revision: 12013
http://sourceforge.net/p/foray/code/12013
Author: victormote
Date: 2021-11-06 21:04:14 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Add existing orthography config to code base.
Added Paths:
-----------
trunk/foray/foray-hyphen/src/main/data/orthographies/
trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
Added: trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml (rev 0)
+++ trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml 2021-11-06 21:04:14 UTC (rev 12013)
@@ -0,0 +1,177 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!DOCTYPE axsl-orthography-config
+ PUBLIC "-//aXSL//DTD Orthography Configuration V0.1//EN"
+ "http://www.axsl.org/dtds/0.1/en/axsl-orthography-config.dtd">
+
+<axsl-orthography-config>
+
+ <match-rule-list id="eng-999-match-rules">
+ <match desc="Arabic digits">^[0-9]+$</match>
+ <match desc="Uppercase Roman numerals">^[IVXLCDM]+$</match>
+ <match desc="Lowercase Roman numerals">^[ivxlcdm]+$</match>
+ <match desc="Currency">^[$£][0-9]+[0-9,\.]*$</match>
+ <match desc="English ordinal ending in 1">^[0-9]*1st$</match>
+ <match desc="English ordinal ending in 2">^[0-9]*2n?d$</match>
+ <match desc="English ordinal ending in 3">^[0-9]*3r?d$</match>
+ <match desc="English ordinal ending in 0 or 4 thru 9">^[0-9]*[04-9]th$</match>
+ <match desc="A single capital letter, such as a person's initial">^[A-Z]$</match>
+ </match-rule-list>
+
+ <derivative-pattern-list id="eng-999-derivative-patterns">
+ <derivative-pattern desc="ends with /’s/">
+ <match>^([a-zA-Z\-]+)’s$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <noun regular-root="true"/>
+ <derivative-type type="possessive"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-ies/, stem ends with /y/">
+ <match>^([a-zA-Z\-]+)ies$</match>
+ <replace>$1y</replace>
+ <derivative-rule>
+ <noun regular-root="true"/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="3rd person singular, present tense"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-ied/, stem ends with /y/">
+ <match>^([a-zA-Z\-]+)ied$</match>
+ <replace>$1y</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="past tense"/>
+ <derivative-type type="past-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-es/, stem ends with s-like sound">
+ <match>^([a-zA-Z\-]+)([sxz]|sh|ch)es$</match>
+ <replace>$1$2</replace>
+ <derivative-rule>
+ <noun regular-root="true"/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="3rd person singular present tense"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-s/">
+ <match>^([a-zA-Z\-]+)s$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <noun regular-root="true"/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
+ <derivative-rule>
+ <ordinal/>
+ <derivative-type type="plural"/>
+ </derivative-rule>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="3rd person singular present tense"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-ed/">
+ <match>^([a-zA-Z\-]+)ed$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="past tense"/>
+ <derivative-type type="past-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-d/, stem ends with silent /-e/">
+ <match>^([a-zA-Z\-]+e)d$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="verb-form" desc="past tense"/>
+ <derivative-type type="past-participle" desc="past tense"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-ing/, stem ends with silent /e/">
+ <match>^([a-zA-Z\-]+)ing$</match>
+ <replace>$1e</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="present-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-ing/">
+ <match>^([a-zA-Z\-]+)ing$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <verb regular-root="true"/>
+ <derivative-type type="present-participle"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-er/">
+ <match>^([a-zA-Z\-]+)er$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <adjective regular-root="true"/>
+ <derivative-type type="comparative" desc="single-syllable root"/>
+ </derivative-rule>
+ </derivative-pattern>
+ <derivative-pattern desc="ends with /-est/">
+ <match>^([a-zA-Z\-]+)est$</match>
+ <replace>$1</replace>
+ <derivative-rule>
+ <adjective regular-root="true"/>
+ <derivative-type type="superlative" desc="single-syllable root"/>
+ </derivative-rule>
+ </derivative-pattern>
+ </derivative-pattern-list>
+
+ <derivative-factory-list id="eng-999-derivatives">
+ <derivative-factory class="org.foray.hyphen.wrapper.LatinPlural1WordFactory"/>
+ <derivative-factory class="org.foray.hyphen.wrapper.LatinPlural2WordFactory"/>
+ <derivative-factory class="org.foray.hyphen.wrapper.LatinPossessive1WordFactory"/>
+ <derivative-factory class="org.foray.hyphen.wrapper.LatinPossessive2WordFactory"/>
+ </derivative-factory-list>
+
+ <dictionary-resource id="dictionary-eng-moby">
+ <parsed-resource>
+ <resource-location type="classpath">/resources/org/foray/dictionaries/en-moby.jbso</resource-location>
+ </parsed-resource>
+ <parsed-resource>
+ <resource-location type="url">file:///C:/vic/foray/trunk/foray/foray-hyphen/src/main/resources/resources/org/foray/dictionaries/eng-999-Latn.dict.jbso</resource-location>
+ </parsed-resource>
+ <unparsed-dictionary>
+ <dictionary-element>
+ <resource-location type="url">file:///C:/vic/foray/trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml</resource-location>
+ </dictionary-element>
+ </unparsed-dictionary>
+ </dictionary-resource>
+
+ <hyphenation-patterns-resource id="hyph-patterns-eng">
+ <parsed-resource>
+ <resource-location type="classpath">/resources/org/foray/hyphen/patterns/eng.jbso</resource-location>
+ </parsed-resource>
+ <parsed-resource>
+ <resource-location type="url">file:///C:/vic/foray/trunk/foray/foray-hyphen/src/main/resources/resources/org/foray/hyphen/patterns/eng.jbso</resource-location>
+ </parsed-resource>
+ <unparsed-hyphenation-patterns>
+ <resource-location type="url">file:///C:/vic/foray/trunk/foray/foray-hyphen/src/main/data/hyph-patterns/eng.xml</resource-location>
+ </unparsed-hyphenation-patterns>
+ </hyphenation-patterns-resource>
+
+ <configuration>
+ <match-rules reference="eng-999-match-rules"/>
+ <derivative-rules reference="eng-999-derivative-patterns"/>
+ <dictionary reference="dictionary-eng-moby"/>
+ <hyphenation-patterns reference="hyph-patterns-eng"/>
+ <derivative-factories reference="eng-999-derivatives"/>
+ <word-breaker class="org.foray.hyphen.WordBreakerLatin1"/>
+ <orthography language-iso-3char="eng" country-iso-3char="USA" script-iso-4char="Latn"/>
+ <orthography language-iso-3char="eng" country-iso-3char="USA" script-iso-4char="Zyyy"/>
+ <orthography language-iso-3char="eng" country-iso-3char="999" script-iso-4char="Latn"/>
+ <orthography language-iso-3char="eng" country-iso-3char="999" script-iso-4char="Zyyy"/>
+ </configuration>
+
+</axsl-orthography-config>
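The match-rule-list in this configuration is a set of regular expressions that accept tokens such as numbers, currency amounts, and ordinals without a dictionary lookup. The sketch below adapts those expressions to java.util.regex to show which tokens they accept. How FOray itself evaluates the rules is not shown in this message, so the first-match-wins loop and the class name are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MatchRulesSketch {

    /** Rule description mapped to its pattern, in the order used in the configuration file. */
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        RULES.put("Arabic digits", Pattern.compile("^[0-9]+$"));
        RULES.put("Uppercase Roman numerals", Pattern.compile("^[IVXLCDM]+$"));
        RULES.put("Lowercase Roman numerals", Pattern.compile("^[ivxlcdm]+$"));
        RULES.put("Currency", Pattern.compile("^[$\u00A3][0-9]+[0-9,\\.]*$"));
        RULES.put("English ordinal ending in 1", Pattern.compile("^[0-9]*1st$"));
        RULES.put("English ordinal ending in 2", Pattern.compile("^[0-9]*2n?d$"));
        RULES.put("English ordinal ending in 3", Pattern.compile("^[0-9]*3r?d$"));
        RULES.put("English ordinal ending in 0 or 4 thru 9", Pattern.compile("^[0-9]*[04-9]th$"));
        RULES.put("A single capital letter", Pattern.compile("^[A-Z]$"));
    }

    /** Returns the description of the first rule that accepts the token, or null if none does. */
    static String firstMatch(final String token) {
        for (final Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(token).find()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(final String[] args) {
        for (final String token : new String[] {"2021", "XIV", "$1,000.50", "23rd", "11th", "11st"}) {
            System.out.println(token + " -> " + firstMatch(token));
        }
    }
}
```

Read literally, these expressions accept "11st" but none of them accepts "11th", "12th", or "13th", because the final rule requires the digit before "th" to be 0 or 4 through 9.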
Property changes on: trunk/foray/foray-hyphen/src/main/data/orthographies/foray-orthography-config.xml
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Rev
\ No newline at end of property
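Each derivative-pattern in the configuration above pairs a match expression with a replace expression that rewrites an inflected word to a possible root ("stories" to "story", "combining" to "combine"). The sketch below transcribes those pairs into java.util.regex and collects every candidate root for a word. Whether FOray tries all patterns or stops at the first match is not shown in this message, so collecting all candidates is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DerivativePatternSketch {

    /** {match, replace} pairs transcribed from the derivative-pattern-list, in the same order. */
    private static final String[][] PATTERNS = {
        {"^([a-zA-Z\\-]+)\u2019s$", "$1"},
        {"^([a-zA-Z\\-]+)ies$", "$1y"},
        {"^([a-zA-Z\\-]+)ied$", "$1y"},
        {"^([a-zA-Z\\-]+)([sxz]|sh|ch)es$", "$1$2"},
        {"^([a-zA-Z\\-]+)s$", "$1"},
        {"^([a-zA-Z\\-]+)ed$", "$1"},
        {"^([a-zA-Z\\-]+e)d$", "$1"},
        {"^([a-zA-Z\\-]+)ing$", "$1e"},
        {"^([a-zA-Z\\-]+)ing$", "$1"},
        {"^([a-zA-Z\\-]+)er$", "$1"},
        {"^([a-zA-Z\\-]+)est$", "$1"},
    };

    /** Applies every pattern that matches the whole word and returns the distinct candidate roots. */
    static List<String> candidateRoots(final String word) {
        final List<String> roots = new ArrayList<>();
        for (final String[] pattern : PATTERNS) {
            final Matcher matcher = Pattern.compile(pattern[0]).matcher(word);
            if (matcher.matches()) {
                final String root = matcher.replaceFirst(pattern[1]);
                if (!roots.contains(root)) {
                    roots.add(root);
                }
            }
        }
        return roots;
    }

    public static void main(final String[] args) {
        for (final String word : new String[] {"combining", "promised", "stories", "churches"}) {
            System.out.println(word + " -> " + candidateRoots(word));
        }
    }
}
```

Several candidates are not real words ("combin", "promis"), so a caller would presumably still need to confirm each candidate against a dictionary entry with the matching part of speech and regular-root="true".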
|
|
From: <vic...@us...> - 2021-11-06 20:52:10
|
Revision: 12012
http://sourceforge.net/p/foray/code/12012
Author: victormote
Date: 2021-11-06 20:52:07 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Normal dictionary editing.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-06 20:51:46 UTC (rev 12011)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-06 20:52:07 UTC (rev 12012)
@@ -45,6 +45,12 @@
# case-insensitive alphabetical order (actually Unicode code point order):
# 1) alphabetical to assist human authors and users, and 2) case-insensitive to
# keep similar words together, to clarify the effects of these similar words.
+#
+# No effort has (yet) been made by FOray to systematically correct errors or
+# omissions in this dictionary.
+# Use with caution.
+# If you detect an error or omission, please either post a bug on the FOray
+# web site, or submit a patch request.
-->
<w><t>a</t></w>
@@ -4581,7 +4587,7 @@
<w><t>Al-tair</t></w>
<w><t>Al-ta-ir</t></w>
<w><t>Al-ta-mi-ra</t></w>
-<w><t>al-tar</t></w>
+<w><t>al-tar</t><noun regular-root="true"/></w>
<w><t>al-tar boy</t></w>
<w><t>al-tar-age</t></w>
<w><t>al-tar-piece</t></w>
@@ -6114,7 +6120,7 @@
<w><t>An-garsk</t></w>
<w><t>an-ga-ry</t></w>
<w><t>an-ge-kok</t></w>
-<w><t>an-gel</t></w>
+<w><t>an-gel</t><noun regular-root="true"/></w>
<w><t>An-gel</t></w>
<w><t>an-gel cake</t></w>
<w><t>An-gel Falls</t></w>
@@ -14241,7 +14247,7 @@
<w><t>beak-like</t></w>
<w><t>beak-y</t></w>
<w><t>Beal</t></w>
-<w><t>beam</t></w>
+<w><t>beam</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>beam aer-i-al</t></w>
<w><t>beam com-pass</t></w>
<w><t>beam rid-ing</t></w>
@@ -14400,7 +14406,7 @@
<w><t>Beb-ry-ces</t></w>
<w><t>be-calm</t></w>
<w><t>be-calmed</t></w>
-<w><t>be-came</t></w>
+<w><t>be-came</t><verb regular-root="false"/></w>
<w><t>be-cause</t></w>
<w><t>bec-ca-fi-co</t></w>
<w><t>be-chance</t></w>
@@ -14428,9 +14434,10 @@
<w><t>Beck-y</t></w>
<w><t>be-clasp</t></w>
<w><t>be-cloud</t></w>
-<w><t>be-come</t></w>
-<w><t>be-com-ing</t></w>
-<w><t>be-com-ing-ly</t></w>
+<w><t>be-come</t><verb regular-root="false"/></w>
+<w><t>be-comes</t><verb regular-root="false"/></w>
+<w><t>be-com-ing</t><verb regular-root="false"/><adjective/></w>
+<w><t>be-com-ing-ly</t><adverb/></w>
<w><t>be-com-ing-ness</t></w>
<w><t>Bec-que-rel</t></w>
<w><t>be-crawl</t></w>
@@ -20895,7 +20902,7 @@
<w><t>Bur-mese</t></w>
<w><t>Bur-mese cat</t></w>
<w><t>bur-mite</t></w>
-<w><t>burn</t></w>
+<w><t>burn</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>burn=up</t></w>
<w><t>burn-a-ble</t></w>
<w><t>burned</t></w>
@@ -40451,8 +40458,8 @@
<w><t>de-sen-si-tized</t></w>
<w><t>de-sen-si-tiz-er</t></w>
<w><t>de-sen-si-tiz-ing</t></w>
-<w><t>des-ert</t></w>
-<w><t>de-sert</t></w>
+<w><t>des-ert</t><noun regular-root="true"/></w>
+<w><t>de-sert</t><noun/><verb regular-root="true"/></w>
<w><t>des-ert boots</t></w>
<w><t>des-ert cool-er</t></w>
<w><t>des-ert is-land</t></w>
@@ -40460,7 +40467,7 @@
<w><t>des-ert pea</t></w>
<w><t>des-ert rat</t></w>
<w><t>des-ert soil</t></w>
-<w><t>de-sert-ed</t></w>
+<w><t>de-sert-ed</t><adjective/></w>
<w><t>de-sert-ed-ly</t></w>
<w><t>de-sert-ed-ness</t></w>
<w><t>de-sert-er</t></w>
@@ -61043,7 +61050,7 @@
<w><t>gen-er-ate</t></w>
<w><t>gen-er-at-ed</t></w>
<w><t>gen-er-at-ing</t></w>
-<w><t>gen-er-a-tion</t></w>
+<w><t>gen-er-a-tion</t><noun regular-root="true"/></w>
<w><t>gen-er-a-tion gap</t></w>
<w><t>gen-er-a-tive</t></w>
<w><t>gen-er-a-tive gram-mar</t></w>
@@ -62513,7 +62520,7 @@
<w><t>Glov-er</t></w>
<w><t>Glov-ers-ville</t></w>
<w><t>glov-ing</t></w>
-<w><t>glow</t></w>
+<w><t>glow</t><noun regular-root="false"/><verb regular-root="true"/></w>
<w><t>glow dis-charge</t></w>
<w><t>glow-er</t></w>
<w><t>glow-er-ing-ly</t></w>
@@ -94478,7 +94485,7 @@
<w><t>Mes-se-ni-a</t></w>
<w><t>Mes-ser-schmitt</t></w>
<w><t>Mes-siaen</t></w>
-<w><t>mes-si-ah</t></w>
+<w><t>mes-si-ah</t><noun regular-root="true"/></w>
<w><t>mes-si-an-ic</t></w>
<w><t>Mes-si-an-i-cal-ly</t></w>
<w><t>Mes-si-dor</t></w>
@@ -96554,7 +96561,7 @@
<w><t>mis-sil-ry</t></w>
<w><t>mis-sing</t></w>
<w><t>mis-sing link</t></w>
-<w><t>mis-sion</t></w>
+<w><t>mis-sion</t><noun regular-root="true"/></w>
<w><t>Mis-sion</t></w>
<w><t>mis-sion-ar-ies</t></w>
<w><t>mis-sion-ar-y</t></w>
@@ -127679,8 +127686,8 @@
<w><t>pro-gram-ming lan-guage</t></w>
<w><t>pro-grav-id</t></w>
<w><t>Pro-gre-so</t></w>
-<w><t>prog-ress</t></w>
-<w><t>pro-gress</t></w>
+<w><t>prog-ress</t><noun/></w>
+<w><t>pro-gress</t><verb regular-root="true"/></w>
<w><t>pro-gres-sion</t></w>
<w><t>pro-gres-sion-al</t></w>
<w><t>pro-gres-sion-al-ly</t></w>
@@ -136986,7 +136993,7 @@
<w><t>re-sprung</t></w>
<w><t>re-squan-der</t></w>
<w><t>res-sen-ti-ment</t></w>
-<w><t>rest</t></w>
+<w><t>rest</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>re-stab</t></w>
<w><t>re-stabbed</t></w>
<w><t>re-stab-bing</t></w>
@@ -149263,7 +149270,7 @@
<w><t>snarl-ing-ly</t></w>
<w><t>snarl-y</t></w>
<w><t>snash</t></w>
-<w><t>snatch</t></w>
+<w><t>snatch</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>snatch-a-ble</t></w>
<w><t>snatch-er</t></w>
<w><t>snatch-i-er</t></w>
@@ -153651,7 +153658,7 @@
<w><t>stomp-er</t></w>
<w><t>stomp-ing-ly</t></w>
<w><t>ston-a-ble</t></w>
-<w><t>stone</t></w>
+<w><t>stone</t><noun regular-root="true"/><verb regular-root="true"/></w>
<w><t>Stone</t></w>
<w><t>stone boil-ing</t></w>
<w><t>stone bram-ble</t></w>
@@ -157967,12 +157974,12 @@
<w><t>sur-plus-age</t></w>
<w><t>sur-print</t></w>
<w><t>sur-pris-al</t></w>
-<w><t>sur-prise</t></w>
-<w><t>sur-prised</t></w>
+<w><t>sur-prise</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>sur-prised</t><adjective/></w>
<w><t>sur-pris-ed-ly</t></w>
<w><t>sur-pris-er</t></w>
-<w><t>sur-pris-ing</t></w>
-<w><t>sur-pris-ing-ly</t></w>
+<w><t>sur-pris-ing</t><adjective/></w>
+<w><t>sur-pris-ing-ly</t><adverb/></w>
<w><t>sur-pris-ing-ness</t></w>
<w><t>sur-ra</t></w>
<w><t>sur-re-al</t></w>
@@ -186316,7 +186323,7 @@
<w><t>yeal-ing</t></w>
<w><t>yean</t></w>
<w><t>yean-ling</t></w>
-<w><t>year</t></w>
+<w><t>year</t><noun regular-root="true"/></w>
<w><t>year=a-round</t></w>
<w><t>year-book</t></w>
<w><t>year-ling</t></w>
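Entries such as des-ert and de-sert, or prog-ress and pro-gress, show why the part-of-speech elements matter for hyphenation: the same spelling breaks differently depending on its role. The sketch below looks up hyphenated forms by plain spelling and part of speech, using a small table copied from the entries in this diff. The class, the table layout, and the string-based part-of-speech keys are hypothetical and are not FOray's data structures.

```java
import java.util.ArrayList;
import java.util.List;

class HeteronymSketch {

    /** {hyphenated form, part of speech} pairs taken from the dictionary entries in the diff. */
    private static final String[][] ENTRIES = {
        {"des-ert", "noun"},
        {"de-sert", "noun"},
        {"de-sert", "verb"},
        {"prog-ress", "noun"},
        {"pro-gress", "verb"},
    };

    /** Returns every hyphenated form whose plain spelling and part of speech both match. */
    static List<String> hyphenations(final String word, final String pos) {
        final List<String> result = new ArrayList<>();
        for (final String[] entry : ENTRIES) {
            final boolean sameWord = entry[0].replace("-", "").equals(word);
            if (sameWord && entry[1].equals(pos) && !result.contains(entry[0])) {
                result.add(entry[0]);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(hyphenations("progress", "noun"));
        System.out.println(hyphenations("progress", "verb"));
        System.out.println(hyphenations("desert", "noun"));
    }
}
```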
|
|
From: <vic...@us...> - 2021-11-06 20:51:49
|
Revision: 12011
http://sourceforge.net/p/foray/code/12011
Author: victormote
Date: 2021-11-06 20:51:46 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Add utility class to check a list of words.
Added Paths:
-----------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java
Added: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java (rev 0)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java 2021-11-06 20:51:46 UTC (rev 12011)
@@ -0,0 +1,225 @@
+/*
+ * Copyright 2021 The FOray Project.
+ * http://www.foray.org
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This work is in part derived from the following work(s), used with the
+ * permission of the licensor:
+ * Apache FOP, licensed by the Apache Software Foundation
+ *
+ */
+
+/*
+ * $LastChangedRevision$
+ * $LastChangedDate$
+ * $LastChangedBy$
+ */
+
+package org.foray.hyphen.util;
+
+import org.foray.common.i18n.Orthography4a;
+import org.foray.hyphen.HyphenationServer4a;
+import org.foray.hyphen.HyphenationServerConfig;
+import org.foray.hyphen.OrthographyConfig4a;
+import org.foray.hyphen.SegmentDictionary;
+
+import org.axsl.hyphen.Dictionary;
+import org.axsl.hyphen.HyphenationException;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.DefaultParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+
+import java.io.BufferedReader;
+import java.io.FileReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.PrintStream;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.util.ArrayList;
+import java.util.List;
+
+import javax.xml.parsers.ParserConfigurationException;
+
+/**
+ * For a list of words, reports information about each word.
+ */
+public class WordChecker {
+
+ /** Command-line status constant indicating that the command line itself was not properly formed. */
+ public static final byte STATUS_COMMAND_LINE_ERROR = 1;
+
+ /** Command-line return status constant indicating that a file was not found. */
+ public static final byte STATUS_FILE_NOT_FOUND = 2;
+
+ /** Command-line return status constant indicating that there was an error parsing the input file. */
+ public static final byte STATUS_PARSING_ERROR = 3;
+
+ /** The input stream containing the list of words being reported. */
+ private BufferedReader input;
+
+ /** The output stream to which the pretty-printed output should be sent. */
+ private PrintStream output;
+
+ /** The logger. */
+// private Logger logger = LoggerFactory.getLogger(WordChecker.class);
+
+ /** The current orthography configuration. */
+ private OrthographyConfig4a currentOrthographyConfig;
+
+ /** The Hyphenation server. */
+ private HyphenationServer4a server;
+
+ /** The list of dictionaries that are currently active, i.e. that match the current orthography. */
+ private List<Dictionary> currentDictionaries = new ArrayList<Dictionary>();
+
+ /**
+ * Constructor.
+ * @param input The reader supplying the list of words to be checked, one word per line.
+ * @param output The output stream to which the result for each word should be written.
+ * @param orthographyConfigPath The path to the orthography configuration.
+ * @param adhocDictionaryPaths (optional) List of paths to ad-hoc dictionaries to be used by the spell-checker.
+ * This is useful for cases where a document has words that are not found in standard dictionaries.
+ * This can be null.
+ * @throws HyphenationException For errors during configuration of the hyphenation server.
+ * @throws IOException For input/output errors during parsing.
+ * @throws ParserConfigurationException For errors configuring the parser.
+ * @throws SAXException For errors found by the SAX parser.
+ */
+ public WordChecker(final BufferedReader input, final PrintStream output, final URL orthographyConfigPath,
+ final List<URL> adhocDictionaryPaths)
+ throws HyphenationException, IOException, SAXException, ParserConfigurationException {
+ this.input = input;
+ this.output = output;
+
+ final HyphenationServerConfig serverConfig = new HyphenationServerConfig();
+ serverConfig.setOrthographyConfigurationLocation(orthographyConfigPath);
+ this.server = new HyphenationServer4a(serverConfig);
+ /* Remove hard-coding. */
+ final Orthography4a orthography = Orthography4a.find("eng", "USA", "Latn");
+ this.currentOrthographyConfig = this.server.getOrthographyConfig(orthography);
+
+ if (adhocDictionaryPaths != null) {
+ for (URL adhocDictionaryPath : adhocDictionaryPaths) {
+ final DictionaryParserXml dictParser = new DictionaryParserXml();
+ dictParser.setLogDictionaryProblems(true);
+ final InputStream dictInput = adhocDictionaryPath.openStream();
+ final InputSource source = new InputSource(dictInput);
+ final List<SegmentDictionary> dictionaries = dictParser.parse(source, adhocDictionaryPath.toString());
+ this.currentDictionaries.addAll(dictionaries);
+ }
+ }
+ }
+
+ public void start() throws SAXException, ParserConfigurationException, IOException {
+ String line = this.input.readLine();
+ while (line != null) {
+ processWord(line);
+ line = this.input.readLine();
+ }
+ }
+
+ public void processWord(final String word) {
+ if (this.currentOrthographyConfig.isValidWord(word, null, this.currentDictionaries)) {
+ this.output.println("Found: " + word);
+ return;
+ } else {
+ this.output.println("Not Found: " + word);
+ }
+ }
+
+ /**
+ * Returns the command-line options for the {@link #main(String[])} method.
+ * @return Command-line options.
+ */
+ private static Options getCommandLineOptions() {
+ final Options clOptions = new Options();
+ final Option input = new Option("i", "input", true, "path to the input file");
+ input.setRequired(true);
+ final Option config = new Option("c", "config", true, "path to orthography configuration");
+ final Option dictionaries = new Option("d", "dictionary", true, "path(s) to ad-hoc dictionary(ies)");
+ dictionaries.setArgs(Option.UNLIMITED_VALUES);
+
+ clOptions.addOption(input);
+ clOptions.addOption(config);
+ clOptions.addOption(dictionaries);
+ return clOptions;
+ }
+
+ /**
+ * Command line interface.
+ * @param args The command-line arguments. There are three:
+ * <ol>
+ * <li>--input [input file path] (required)</li>
+ * <li>--config [URL of the orthography configuration]</li>
+ * <li>--dictionary [URL(s) of ad-hoc dictionary(ies)]</li>
+ * </ol>
+ */
+ public static void main(final String[] args) {
+ final Logger logger = LoggerFactory.getLogger(WordChecker.class);
+
+ final Options commandLineOptions = WordChecker.getCommandLineOptions();
+ final CommandLineParser commandLineParser = new DefaultParser();
+ CommandLine parsedCommandLine = null;
+ try {
+ parsedCommandLine = commandLineParser.parse(commandLineOptions, args);
+ } catch (final ParseException e) {
+ logger.error(e.getMessage(), e);
+ final HelpFormatter helpFormatter = new HelpFormatter();
+ helpFormatter.printHelp("java -cp $FORAY_CLASSPATH " + WordChecker.class.getName(), commandLineOptions,
+ true);
+ /* CheckStyle: Allow System.exit() in main method. */
+ System.exit(WordChecker.STATUS_COMMAND_LINE_ERROR);
+ }
+
+ final String input = parsedCommandLine.getOptionValue("input");
+ final String config = parsedCommandLine.getOptionValue("config");
+ final String[] dictionariesArray = parsedCommandLine.getOptionValues("dictionary");
+
+ List<URL> dictionaries = null;
+ try {
+ dictionaries = null;
+ if (dictionariesArray != null) {
+ dictionaries = new ArrayList<URL>();
+ for (String path : dictionariesArray) {
+ final URL dictionary = new URL(path);
+ dictionaries.add(dictionary);
+ }
+ }
+ } catch (final MalformedURLException e) {
+ /* CheckStyle: Allow System.exit() in main method. */
+ System.exit(WordChecker.STATUS_FILE_NOT_FOUND);
+ }
+
+ try (FileReader fis = new FileReader(input);
+ BufferedReader reader = new BufferedReader(fis)) {
+ final URL orthographyConfigPath = new URL(config);
+ final WordChecker checker = new WordChecker(reader, System.out, orthographyConfigPath, dictionaries);
+ checker.start();
+ } catch (IOException | HyphenationException | SAXException | ParserConfigurationException e) {
+ logger.error("File cannot be opened for input: " + input, e);
+ /* CheckStyle: Allow System.exit() in main method. */
+ System.exit(WordChecker.STATUS_FILE_NOT_FOUND);
+ }
+ }
+
+}
Property changes on: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/WordChecker.java
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Rev
\ No newline at end of property
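The read-and-report loop in WordChecker.start() and processWord() can be tried in isolation. The following is only a minimal sketch: the class name WordListSketch and the Predicate standing in for OrthographyConfig4a.isValidWord are assumptions made for illustration and are not part of FOray.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical, self-contained stand-in for the WordChecker loop. */
final class WordListSketch {

    /**
     * Reads one word per line and reports whether each word is accepted.
     * The predicate is an assumed replacement for the orthography lookup.
     */
    static void checkAll(final BufferedReader input, final PrintStream output,
            final Predicate<String> isValidWord) {
        try {
            String line = input.readLine();
            while (line != null) {
                final String prefix = isValidWord.test(line) ? "Found: " : "Not Found: ";
                output.println(prefix + line);
                line = input.readLine();
            }
        } catch (final IOException e) {
            /* Rethrow unchecked so that callers of the sketch need no throws clause. */
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final Set<String> known = Set.of("attention", "harmonious");
        final BufferedReader reader = new BufferedReader(new StringReader("attention\nbogus\n"));
        checkAll(reader, System.out, known::contains);
    }
}
```

Passing the lookup as a predicate keeps the loop independent of how validity is decided, which is the same separation the commit relies on when it delegates to the orthography configuration.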
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
|
From: <vic...@us...> - 2021-11-06 19:40:57
|
Revision: 12010
http://sourceforge.net/p/foray/code/12010
Author: victormote
Date: 2021-11-06 19:40:56 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Remove remaining extra-orthography logic from spell checker.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 18:35:50 UTC (rev 12009)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 19:40:56 UTC (rev 12010)
@@ -42,7 +42,6 @@
import org.axsl.common.i18n.Orthography;
import org.axsl.hyphen.Dictionary;
import org.axsl.hyphen.HyphenationException;
-import org.axsl.hyphen.Word;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
@@ -461,45 +460,10 @@
return;
}
- final SegmentDictionary dictionary = this.currentOrthographyConfig.getDictionary();
- if (! wordFound(dictionary, word)) {
- this.output.println("Not found: " + word + " " + locationString());
- this.notFoundCounter ++;
- }
+ this.output.println("Not found: " + word + " " + locationString());
+ this.notFoundCounter ++;
}
- /**
- * Indicates whether a given word should be marked as misspelled, i.e. cannot be found in the appropriate
- * dictionary(s) and cannot be accepted as a legitimate word any other way.
- * @param dictionary The dictionary in which {@code word} is being sought.
- * @param word The word being tested.
- * @return True if and only if {@code word} is found in {@code dict} or can be accepted as a legitimate word for
- * some other reason.
- */
- private boolean wordFound(final Dictionary dictionary, final CharSequence word) {
- /* TODO: This is all eng-us-lat specific. Need to think about how plurals and other almost-the-same cases should
- * be handled. */
-
- Word dictWord = null;
- final StringBuilder builder = new StringBuilder(word);
-
- /* If the last character is a lowercase "s" or "'s" (apostrophe "s"), this may be a plural noun, a
- * present-tense verb, or a possessive. Remove them and try again. */
- if (builder.charAt(builder.length() - 1) == 's') {
- builder.deleteCharAt(builder.length() - 1);
- if (builder.length() > 0
- && builder.charAt(builder.length() - 1) == '’') {
- builder.deleteCharAt(builder.length() - 1);
- }
- dictWord = dictionary.getWord(builder, 0);
- }
- if (dictWord != null) {
- return true;
- }
-
- return false;
- }
-
@Override
public void characters(final char[] buffer, final int offset, final int length) {
this.charBuffer.append(buffer, offset, length);
|
|
From: <vic...@us...> - 2021-11-06 18:35:53
|
Revision: 12009
http://sourceforge.net/p/foray/code/12009
Author: victormote
Date: 2021-11-06 18:35:50 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Remove capitalization adjustment. This is now handled in the orthography config.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 17:27:10 UTC (rev 12008)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 18:35:50 UTC (rev 12009)
@@ -483,15 +483,6 @@
Word dictWord = null;
final StringBuilder builder = new StringBuilder(word);
- /* If the first character is capitalized, convert to lowercase & check again. */
- if (Character.isUpperCase(builder.charAt(0))) {
- builder.setCharAt(0, Character.toLowerCase(builder.charAt(0)));
- dictWord = dictionary.getWord(builder, 0);
- }
- if (dictWord != null) {
- return true;
- }
-
/* If the last character is a lowercase "s" or "'s" (apostrophe "s"), this may be a plural noun, a
* present-tense verb, or a possessive. Remove them and try again. */
if (builder.charAt(builder.length() - 1) == 's') {
|
|
From: <vic...@us...> - 2021-11-06 17:27:12
|
Revision: 12008
http://sourceforge.net/p/foray/code/12008
Author: victormote
Date: 2021-11-06 17:27:10 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Handle the case of an ambiguous word.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java 2021-11-06 17:16:17 UTC (rev 12007)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java 2021-11-06 17:27:10 UTC (rev 12008)
@@ -134,12 +134,18 @@
if (root == null) {
return null;
}
- final Word word = dictionary.getWord(root, 0);
- if (word == null) {
- return null;
+
+ final int qtyAlternatives = dictionary.qtyAlternatives(root);
+ for (int index = 0; index < qtyAlternatives; index ++) {
+ final Word word = dictionary.getWord(root, index);
+ if (word != null) {
+ final DerivativeRule rule = findFirstApplicableRule(word);
+ if (rule != null) {
+ return rule;
+ }
+ }
}
- return findFirstApplicableRule(word);
-
+ return null;
}
}
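Why looping over every alternative matters can be shown with a toy dictionary. This is a sketch under stated assumptions: AlternativesSketch, its map of parts of speech, and the sample entries are hypothetical and only imitate the qtyAlternatives / getWord(root, index) pattern used in the change above.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary in which one spelling may have several alternatives. */
final class AlternativesSketch {

    /** Each spelling maps to the parts of speech of its alternatives (assumed data model). */
    private final Map<String, List<String>> entries;

    AlternativesSketch(final Map<String, List<String>> entries) {
        this.entries = entries;
    }

    /** Returns the number of alternatives recorded for a spelling, or 0 if it is unknown. */
    int qtyAlternatives(final String root) {
        return this.entries.getOrDefault(root, List.of()).size();
    }

    /** Returns the index of the first alternative with the required part of speech, or -1. */
    int findFirstApplicable(final String root, final String requiredPos) {
        final int qty = qtyAlternatives(root);
        for (int index = 0; index < qty; index++) {
            if (requiredPos.equals(this.entries.get(root).get(index))) {
                return index;
            }
        }
        return -1;
    }

    public static void main(final String[] args) {
        final AlternativesSketch dictionary = new AlternativesSketch(
                Map.of("record", List.of("noun", "verb"), "trust", List.of("verb")));
        /* Checking only index 0 would see the noun and miss the verb. */
        System.out.println(dictionary.findFirstApplicable("record", "verb"));
        System.out.println(dictionary.findFirstApplicable("bogus", "verb"));
    }
}
```

In this toy data the verb reading of "record" sits at index 1, so a lookup that stops at index 0 would wrongly conclude that no verb rule applies.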
|
|
From: <vic...@us...> - 2021-11-06 17:16:19
|
Revision: 12007
http://sourceforge.net/p/foray/code/12007
Author: victormote
Date: 2021-11-06 17:16:17 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Move some spell-checking logic from the orthography to the DerivativePattern.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/DerivativeRuleTests.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java 2021-11-06 15:40:34 UTC (rev 12006)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DerivativePattern.java 2021-11-06 17:16:17 UTC (rev 12007)
@@ -28,6 +28,7 @@
package org.foray.hyphen;
+import org.axsl.hyphen.Dictionary;
import org.axsl.hyphen.Word;
import java.util.List;
@@ -65,21 +66,50 @@
}
/**
+ * Indicates whether this pattern matches a given input.
+ * @param inputWord The word being tested.
+ * @return True if and only if this pattern matches {@code inputWord}.
+ */
+ public boolean doesPatternMatch(final CharSequence inputWord) {
+ return this.match.matcher(inputWord).matches();
+ }
+
+ /**
* Applies the match and replace patterns to an input word, and returns the computed root of that input word if the
* rule applies.
* @param inputWord The input word being tested.
* @return The root, if any, indicated by this rule, for {@code inputWord}, or null if there is no match.
*/
- public CharSequence applyRule(final CharSequence inputWord) {
- return inputWord.toString().replaceAll(this.match.pattern(), replace);
+ public CharSequence getRoot(final CharSequence inputWord) {
+ if (doesPatternMatch(inputWord)) {
+ return inputWord.toString().replaceAll(this.match.pattern(), replace);
+ }
+ return null;
}
/**
+ * Returns the number of derivative rules that are attached to this pattern.
+ * @return The number of derivative rules that are attached to this pattern.
+ */
+ public int qtyRules() {
+ return this.rules.size();
+ }
+
+ /**
+ * Returns the rule at a given index.
+ * @param index The index to the rule to be returned.
+ * @return The rule at index {@code index}.
+ */
+ public DerivativeRule getRule(final int index) {
+ return this.rules.get(index);
+ }
+
+ /**
* Checks the rules that are attached to this pattern, and returns the first one that matches.
* @param word The word to be tested.
* @return The first rule in the pattern that matches {@code word}, or null if none match.
*/
- DerivativeRule doesRulyApply(final Word word) {
+ public DerivativeRule findFirstApplicableRule(final Word word) {
for (int index = 0; index < this.rules.size(); index ++) {
final DerivativeRule rule = this.rules.get(index);
if (rule.matches(word)) {
@@ -89,4 +119,27 @@
return null;
}
+ /**
+ * Finds the first rule from this pattern, if any, that applies to a given word.
+ * @param wordChars The word to be tested.
+ * @param dictionary The dictionary in which the root of {@code wordChars} will be sought.
+ * @return The first rule from this pattern, that applies to {@code wordChars}, or null if the pattern does not
+ * match or the rules do not apply.
+ */
+ public DerivativeRule findFirstApplicableRule(final CharSequence wordChars, final Dictionary dictionary) {
+ if (! doesPatternMatch(wordChars)) {
+ return null;
+ }
+ final CharSequence root = getRoot(wordChars);
+ if (root == null) {
+ return null;
+ }
+ final Word word = dictionary.getWord(root, 0);
+ if (word == null) {
+ return null;
+ }
+ return findFirstApplicableRule(word);
+
+ }
+
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-06 15:40:34 UTC (rev 12006)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-06 17:16:17 UTC (rev 12007)
@@ -68,10 +68,10 @@
private HyphenationServer4a server;
/* TODO: Following orthography-specific config needs to be moved to XML or subclass. */
- /** Character marking a compound word. */
+ /** Character delimiting a compound word. NB: This variable may be orthography specific, and may therefore need to
+ * be moved to the orthography configuration. However, we have found no evidence yet for that need. */
private char compoundWordMarker = '-';
- /* TODO: Following orthography-specific config needs to be moved to XML or subclass. */
/** Regex pattern used to break compound words into their components. */
private Pattern compoundWordBreaker = Pattern.compile(Character.toString(compoundWordMarker));
@@ -339,12 +339,8 @@
final List<DerivativePattern> patternList = this.server.getDerivativePatterns(ruleListKey);
for (int patternIndex = 0; patternIndex < patternList.size(); patternIndex ++) {
final DerivativePattern pattern = patternList.get(patternIndex);
- final String root = pattern.applyRule(wordChars).toString();
- final Word word = dictionary.getWord(root, 0);
- if (word != null) {
- if (pattern.doesRulyApply(word) != null) {
- return true;
- }
+ if (pattern.findFirstApplicableRule(wordChars, dictionary) != null) {
+ return true;
}
}
}
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/DerivativeRuleTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/DerivativeRuleTests.java 2021-11-06 15:40:34 UTC (rev 12006)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/DerivativeRuleTests.java 2021-11-06 17:16:17 UTC (rev 12007)
@@ -40,13 +40,13 @@
public class DerivativeRuleTests {
/**
- * Test of {@link DerivativePattern#applyRule(CharSequence)}.
+ * Test of {@link DerivativePattern#getRoot(CharSequence)}.
*/
@Test
public void testApplyRule() {
final DerivativePattern out = new DerivativePattern(Pattern.compile("^([a-zA-Z\\-]+)ed$"), "$1",
Collections.<DerivativeRule>emptyList());
- Assert.assertEquals("trust", out.applyRule("trusted"));
+ Assert.assertEquals("trust", out.getRoot("trusted"));
}
}
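The match-then-replace behavior of getRoot can be reproduced with java.util.regex alone. The sketch below is an assumption-laden stand-in: RootSketch is a hypothetical class that keeps only the match pattern and replacement string, and it reuses the pattern from the test above.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the match/replace part of DerivativePattern. */
final class RootSketch {

    private final Pattern match;
    private final String replace;

    RootSketch(final Pattern match, final String replace) {
        this.match = match;
        this.replace = replace;
    }

    /** Returns the computed root, or null when the pattern does not match the whole input. */
    String getRoot(final CharSequence inputWord) {
        if (!this.match.matcher(inputWord).matches()) {
            return null;
        }
        return this.match.matcher(inputWord).replaceAll(this.replace);
    }

    public static void main(final String[] args) {
        final RootSketch sketch = new RootSketch(Pattern.compile("^([a-zA-Z\\-]+)ed$"), "$1");
        System.out.println(sketch.getRoot("trusted")); // prints "trust"
        System.out.println(sketch.getRoot("trust"));   // prints "null"
    }
}
```

Returning null for a non-match is what lets a caller distinguish "no root" from "root equal to the input", which a bare replaceAll cannot do because it returns the input unchanged when nothing matches.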
|
|
From: <vic...@us...> - 2021-11-06 15:40:37
|
Revision: 12006
http://sourceforge.net/p/foray/code/12006
Author: victormote
Date: 2021-11-06 15:40:34 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Conform to aXSL changes to Dictionary.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/HyphenationConsumer4a.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPast1WordFactory.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural1WordFactory.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural2WordFactory.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactory.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactory.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryWordTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPast1WordFactoryTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural1WordFactoryTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural2WordFactoryTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactoryTests.java
trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactoryTests.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -38,58 +38,58 @@
*/
public class AmbiguousWord<T extends Word4a> {
- /** The array of word possibilities. */
- private T[] possibilities;
+ /** The array of word alternatives. */
+ private T[] alternatives;
/**
* Constructor.
- * @param possibilities The words that are ambiguous.
- * @throws IllegalArgumentException For {@code possibilities} that is null, has less than 2 possibilities, or that
- * contains words that are not, in fact, ambiguous.
+ * @param alternatives The words that are ambiguous.
+ * @throws IllegalArgumentException For {@code alternatives} that is null, has fewer than 2 alternatives, or that
+ * contains alternatives that are not, in fact, ambiguous.
*/
- public AmbiguousWord(final T[] possibilities) {
- if (possibilities == null
- || possibilities.length < 2) {
- throw new IllegalArgumentException("AmbiguousWord must contain at least two possibilities.");
+ public AmbiguousWord(final T[] alternatives) {
+ if (alternatives == null
+ || alternatives.length < 2) {
+ throw new IllegalArgumentException("AmbiguousWord must contain at least two alternatives.");
}
- /* Ensure that the possibilities are, in fact, ambiguous. */
- final CharSequence base = possibilities[0].getActualContent();
- for (int index = 1; index < possibilities.length; index ++) {
- final T word = possibilities[index];
+ /* Ensure that the alternatives are, in fact, ambiguous. */
+ final CharSequence base = alternatives[0].getActualContent();
+ for (int index = 1; index < alternatives.length; index ++) {
+ final T word = alternatives[index];
if (! CharSequenceUtils.areEquivalent(base, word.getActualContent())) {
throw new IllegalArgumentException("Word not ambiguous at index: " + index);
}
}
- this.possibilities = possibilities;
+ this.alternatives = alternatives;
}
public int length() {
- return this.possibilities.length;
+ return this.alternatives.length;
}
- public T getPossibility(final int index) {
- return this.possibilities[index];
+ public T getAlternative(final int index) {
+ return this.alternatives[index];
}
/**
- * Returns the "best" choice of the possibilities, depending on how much information is given for disambiguation.
+ * Returns the "best" choice of the alternatives, depending on how much information is given for disambiguation.
* @param pos The part of speech of the desired word.
* This can be null.
- * @return The first possibility matching {@code pos}, or the first possibility if {@code pos} is null or not
+ * @return The first alternative matching {@code pos}, or the first alternative if {@code pos} is null or not
* matched.
*/
public T getBest(final PartOfSpeech pos) {
if (pos == null) {
- return this.possibilities[0];
+ return this.alternatives[0];
}
- for (int index = 0; index < this.possibilities.length; index ++) {
- final T word = possibilities[index];
+ for (int index = 0; index < this.alternatives.length; index ++) {
+ final T word = alternatives[index];
if (word.isOfType(pos, null)) {
return word;
}
}
- return this.possibilities[0];
+ return this.alternatives[0];
}
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/HyphenationConsumer4a.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/HyphenationConsumer4a.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/HyphenationConsumer4a.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -80,7 +80,7 @@
/* Look in the dictionary first, as it should be more accurate. */
final SegmentDictionary dictionary = orthographyConfig.getDictionary();
if (dictionary != null) {
- hyphenatedWord = dictionary.getWord(chars.toString().toLowerCase(), null);
+ hyphenatedWord = dictionary.getWord(chars.toString().toLowerCase(), 0);
if (hyphenatedWord == null) {
hyphenatedWord = orthographyConfig.findDerivatives(chars);
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -268,7 +268,7 @@
if (adhocDictionaries != null) {
for (int index = 0; index < adhocDictionaries.size(); index ++) {
final Dictionary adhocDictionary = adhocDictionaries.get(index);
- if (adhocDictionary.getWord(wordChars, null) != null) {
+ if (adhocDictionary.getWord(wordChars, 0) != null) {
return true;
}
}
@@ -277,7 +277,7 @@
/* 2. Check exact matches in standard dictionaries for the orthography. */
final Dictionary orthoDictionary = getDictionary();
if (orthoDictionary != null
- && orthoDictionary.getWord(wordChars, null) != null) {
+ && orthoDictionary.getWord(wordChars, 0) != null) {
return true;
}
@@ -340,7 +340,7 @@
for (int patternIndex = 0; patternIndex < patternList.size(); patternIndex ++) {
final DerivativePattern pattern = patternList.get(patternIndex);
final String root = pattern.applyRule(wordChars).toString();
- final Word word = dictionary.getWord(root, null);
+ final Word word = dictionary.getWord(root, 0);
if (word != null) {
if (pattern.doesRulyApply(word) != null) {
return true;
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -141,8 +141,23 @@
}
@Override
- public SegmentDictionaryWord getWord(final CharSequence rawWord, final PartOfSpeech pos) {
- return getExactWord(rawWord, pos);
+ public SegmentDictionaryWord getWord(final CharSequence rawWord, final int index) {
+ if (index == 0) {
+ final SegmentDictionaryWord dictWord = this.wordMap.get(rawWord);
+ if (dictWord != null) {
+ return dictWord;
+ }
+ }
+ final AmbiguousWord<SegmentDictionaryWord> ambWord = this.ambiguousWordMap.get(rawWord);
+ if (ambWord == null) {
+ return null;
+ } else {
+ if (index > -1
+ && index < ambWord.length()) {
+ return ambWord.getAlternative(index);
+ }
+ }
+ return null;
}
/**
@@ -202,4 +217,17 @@
ternaryTreeMap.optimize();
}
+ @Override
+ public int qtyAlternatives(final CharSequence wordChars) {
+ final SegmentDictionaryWord dictWord = this.wordMap.get(wordChars);
+ if (dictWord != null) {
+ return 1;
+ }
+ final AmbiguousWord<SegmentDictionaryWord> ambWord = this.ambiguousWordMap.get(wordChars);
+ if (ambWord != null) {
+ return ambWord.length();
+ }
+ return 0;
+ }
+
}
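The index-based lookup above consults two maps, one for unambiguous words and one for ambiguous ones. A minimal sketch of that arrangement follows; TwoMapDictionarySketch, its String-valued maps, and the sample hyphenations are assumptions for illustration, not the SegmentDictionary implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary that stores unambiguous and ambiguous spellings separately. */
final class TwoMapDictionarySketch {

    /** Spellings that have exactly one entry, mapped to an assumed hyphenated form. */
    private final Map<String, String> wordMap = new HashMap<>();

    /** Spellings that have two or more entries. */
    private final Map<String, List<String>> ambiguousWordMap = new HashMap<>();

    void addWord(final String spelling, final String hyphenated) {
        this.wordMap.put(spelling, hyphenated);
    }

    void addAmbiguous(final String spelling, final List<String> alternatives) {
        this.ambiguousWordMap.put(spelling, alternatives);
    }

    /** Returns the alternative at the given index, or null if there is none. */
    String getWord(final String rawWord, final int index) {
        if (index == 0) {
            final String word = this.wordMap.get(rawWord);
            if (word != null) {
                return word;
            }
        }
        final List<String> alternatives = this.ambiguousWordMap.get(rawWord);
        if (alternatives != null && index > -1 && index < alternatives.size()) {
            return alternatives.get(index);
        }
        return null;
    }

    /** Returns 1 for an unambiguous word, the number of alternatives for an ambiguous one, else 0. */
    int qtyAlternatives(final String rawWord) {
        if (this.wordMap.containsKey(rawWord)) {
            return 1;
        }
        final List<String> alternatives = this.ambiguousWordMap.get(rawWord);
        return alternatives == null ? 0 : alternatives.size();
    }

    public static void main(final String[] args) {
        final TwoMapDictionarySketch dictionary = new TwoMapDictionarySketch();
        dictionary.addWord("attention", "at-ten-tion");
        dictionary.addAmbiguous("record", List.of("rec-ord", "re-cord"));
        System.out.println(dictionary.getWord("attention", 0));
        System.out.println(dictionary.getWord("record", 1));
        System.out.println(dictionary.qtyAlternatives("bogus"));
    }
}
```

The sketch keys both maps by String. Whether a CharSequence argument such as a StringBuilder would find an entry depends on the actual map type, which is not shown in this diff.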
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -486,7 +486,7 @@
/* If the first character is capitalized, convert to lowercase & check again. */
if (Character.isUpperCase(builder.charAt(0))) {
builder.setCharAt(0, Character.toLowerCase(builder.charAt(0)));
- dictWord = dictionary.getWord(builder, null);
+ dictWord = dictionary.getWord(builder, 0);
}
if (dictWord != null) {
return true;
@@ -500,7 +500,7 @@
&& builder.charAt(builder.length() - 1) == '’') {
builder.deleteCharAt(builder.length() - 1);
}
- dictWord = dictionary.getWord(builder, null);
+ dictWord = dictionary.getWord(builder, 0);
}
if (dictWord != null) {
return true;
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPast1WordFactory.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPast1WordFactory.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPast1WordFactory.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -53,7 +53,7 @@
final int qtyToRemove = LatinPast1Word.SUFFIX.length();
final String baseWordChars = CharSequenceUtils.removeTrailing(chars, qtyToRemove).toString();
- final Word baseWord = dictionary.getWord(baseWordChars, null);
+ final Word baseWord = dictionary.getWord(baseWordChars, 0);
if (baseWord != null) {
return new LatinPast1Word(baseWord);
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural1WordFactory.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural1WordFactory.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural1WordFactory.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -53,7 +53,7 @@
final int qtyToRemove = LatinPlural1Word.Segment.WRAPPED_FORM.length();
final String baseWordChars = CharSequenceUtils.removeTrailing(chars, qtyToRemove).toString();
- final Word baseWord = dictionary.getWord(baseWordChars, null);
+ final Word baseWord = dictionary.getWord(baseWordChars, 0);
if (baseWord != null) {
return new LatinPlural1Word(baseWord);
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural2WordFactory.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural2WordFactory.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPlural2WordFactory.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -72,7 +72,7 @@
builder.append('y');
final String baseWordChars = builder.toString();
- final Word baseWord = dictionary.getWord(baseWordChars, null);
+ final Word baseWord = dictionary.getWord(baseWordChars, 0);
if (baseWord != null) {
return new LatinPlural2Word(baseWord);
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactory.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactory.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactory.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -53,7 +53,7 @@
final int qtyToRemove = LatinPossessive1Word.Segment.WRAPPED_FORM.length();
final String baseWordChars = CharSequenceUtils.removeTrailing(chars, qtyToRemove).toString();
- final Word baseWord = dictionary.getWord(baseWordChars, null);
+ final Word baseWord = dictionary.getWord(baseWordChars, 0);
if (baseWord != null) {
return new LatinPossessive1Word(baseWord);
}
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactory.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactory.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactory.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -53,7 +53,7 @@
final int qtyToRemove = LatinPossessive2Word.Segment.WRAPPED_FORM.length();
final String baseWordChars = CharSequenceUtils.removeTrailing(chars, qtyToRemove).toString();
- final Word baseWord = dictionary.getWord(baseWordChars, null);
+ final Word baseWord = dictionary.getWord(baseWordChars, 0);
if (baseWord != null) {
return new LatinPossessive2Word(baseWord);
}
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -107,12 +107,12 @@
/* This is the expected case. */
}
- final SegmentDictionaryWord dictWord = out.getWord("attention", null);
+ final SegmentDictionaryWord dictWord = out.getWord("attention", 0);
Assert.assertEquals("attention", dictWord.getActualContent());
Assert.assertEquals("at-ten-tion", dictWord.toString());
/* Make sure passing a bogus key returns null. */
- Assert.assertNull(out.getWord("test", null));
+ Assert.assertNull(out.getWord("test", 0));
}
/**
@@ -125,9 +125,9 @@
words.add(WORD_HARMONIOUS);
final SegmentDictionary dictionary = SegmentDictionary.make(words);
Assert.assertEquals(2, dictionary.getSize());
- Assert.assertEquals("at-ten-tion", dictionary.getWord("attention", null).toString());
- Assert.assertEquals("har-mo-ni-ous", dictionary.getWord("harmonious", null).toString());
- Assert.assertNull(dictionary.getWord("bogus", null));
+ Assert.assertEquals("at-ten-tion", dictionary.getWord("attention", 0).toString());
+ Assert.assertEquals("har-mo-ni-ous", dictionary.getWord("harmonious", 0).toString());
+ Assert.assertNull(dictionary.getWord("bogus", 0));
}
}
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryWordTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryWordTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/SegmentDictionaryWordTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -66,7 +66,7 @@
*/
@Test
public void toStringTests() {
- Assert.assertEquals("am-bi-tion", dictionary.getWord("ambition", null).toString());
+ Assert.assertEquals("am-bi-tion", dictionary.getWord("ambition", 0).toString());
}
/**
@@ -74,7 +74,7 @@
*/
@Test
public void getActualContentTests() {
- Assert.assertEquals("ambition", dictionary.getWord("ambition", null).getActualContent());
+ Assert.assertEquals("ambition", dictionary.getWord("ambition", 0).getActualContent());
}
/**
@@ -82,7 +82,7 @@
*/
@Test
public void getQtyParaNodeChildrenTests() {
- Assert.assertEquals(5, dictionary.getWord("ambition", null).getQtyParaNodeChildren());
+ Assert.assertEquals(5, dictionary.getWord("ambition", 0).getQtyParaNodeChildren());
}
/**
@@ -90,7 +90,7 @@
*/
@Test
public void getParaNodeChildTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals("am", word.getParaNodeChild(0).toString());
Assert.assertEquals("-", word.getParaNodeChild(1).toString());
Assert.assertEquals("bi", word.getParaNodeChild(2).toString());
@@ -117,7 +117,7 @@
*/
@Test
public void getParaConfigTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertNull(word.getParaConfig());
}
@@ -126,7 +126,7 @@
*/
@Test
public void getParaNodeTypeTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(ParaNode.Type.BRANCH, word.getParaNodeType());
}
@@ -135,7 +135,7 @@
*/
@Test
public void asParaLeafTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertNull(word.asParaLeaf());
}
@@ -144,7 +144,7 @@
*/
@Test
public void asParaBranchTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
/* Test for identity equality. */
Assert.assertTrue(word.asParaBranch() == word);
}
@@ -154,7 +154,7 @@
*/
@Test
public void lengthTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(8, word.length());
}
@@ -163,7 +163,7 @@
*/
@Test
public void charAtTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals('a', word.charAt(0));
Assert.assertEquals('m', word.charAt(1));
Assert.assertEquals('b', word.charAt(2));
@@ -195,7 +195,7 @@
@Ignore
/* @TODO: Fails in some Java 8 implementations. */
public void subSequenceTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals("mbi", word.subSequence(1, 4));
try {
word.subSequence(-1, 4);
@@ -225,7 +225,7 @@
*/
@Test
public void getNormalizedContentTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals("ambition", word.getNormalizedContent());
}
@@ -234,7 +234,7 @@
*/
@Test
public void getQtyHyphenationPointsTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(2, word.getQtyHyphenationPoints());
}
@@ -243,7 +243,7 @@
*/
@Test
public void getHyphenationPointOffsetTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(2, word.getHyphenationPointOffset(0));
Assert.assertEquals(4, word.getHyphenationPointOffset(1));
try {
@@ -267,7 +267,7 @@
*/
@Test
public void getHyphenationPointTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(Quality.ACCEPTABLE, word.getHyphenationPoint(0).getQuality());
Assert.assertEquals(Quality.ACCEPTABLE, word.getHyphenationPoint(1).getQuality());
try {
@@ -291,7 +291,7 @@
*/
@Test
public void getHeinousPointsTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertNull(word.getHeinousPoints());
}
@@ -300,7 +300,7 @@
*/
@Test
public void getQtyWordSegmentsTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(3, word.getQtyWordSegments());
}
@@ -311,7 +311,7 @@
@Ignore
/* @TODO: Fails in some Java 8 implementations. */
public void getWordSegmentTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals("am", word.getWordSegment(0).toString());
Assert.assertEquals("bi", word.getWordSegment(1).toString());
Assert.assertEquals("tion", word.getWordSegment(2).toString());
@@ -336,7 +336,7 @@
*/
@Test
public void getQtyWordComponentsTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals(5, word.getQtyWordComponents());
}
@@ -347,7 +347,7 @@
@Ignore
/* @TODO: Fails in some Java 8 implementations. */
public void getgetWordComponentTests() {
- final SegmentDictionaryWord word = dictionary.getWord("ambition", null);
+ final SegmentDictionaryWord word = dictionary.getWord("ambition", 0);
Assert.assertEquals("am", word.getWordComponent(0).toString());
Assert.assertEquals("-", word.getWordComponent(1).toString());
Assert.assertEquals("bi", word.getWordComponent(2).toString());
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPast1WordFactoryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPast1WordFactoryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPast1WordFactoryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -55,7 +55,7 @@
public void before() {
this.out = new LatinPast1WordFactory();
this.dictionary = Mockito.mock(Dictionary.class);
- Mockito.when(dictionary.getWord("astonish", null)).thenReturn(StringWordTests.WORD_ASTONISH);
+ Mockito.when(dictionary.getWord("astonish", 0)).thenReturn(StringWordTests.WORD_ASTONISH);
}
/**
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural1WordFactoryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural1WordFactoryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural1WordFactoryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -55,7 +55,7 @@
public void before() {
this.out = new LatinPlural1WordFactory();
this.dictionary = Mockito.mock(Dictionary.class);
- Mockito.when(dictionary.getWord("daughter", null)).thenReturn(StringWordTests.WORD_DAUGHTER);
+ Mockito.when(dictionary.getWord("daughter", 0)).thenReturn(StringWordTests.WORD_DAUGHTER);
}
/**
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural2WordFactoryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural2WordFactoryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPlural2WordFactoryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -55,7 +55,7 @@
public void before() {
this.out = new LatinPlural2WordFactory();
this.dictionary = Mockito.mock(Dictionary.class);
- Mockito.when(dictionary.getWord("company", null)).thenReturn(StringWordTests.WORD_COMPANY);
+ Mockito.when(dictionary.getWord("company", 0)).thenReturn(StringWordTests.WORD_COMPANY);
}
/**
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactoryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactoryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive1WordFactoryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -55,7 +55,7 @@
public void before() {
this.out = new LatinPossessive1WordFactory();
this.dictionary = Mockito.mock(Dictionary.class);
- Mockito.when(dictionary.getWord("daughter", null)).thenReturn(StringWordTests.WORD_DAUGHTER);
+ Mockito.when(dictionary.getWord("daughter", 0)).thenReturn(StringWordTests.WORD_DAUGHTER);
}
/**
Modified: trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactoryTests.java
===================================================================
--- trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactoryTests.java 2021-11-06 14:26:28 UTC (rev 12005)
+++ trunk/foray/foray-hyphen/src/test/java/org/foray/hyphen/wrapper/LatinPossessive2WordFactoryTests.java 2021-11-06 15:40:34 UTC (rev 12006)
@@ -55,7 +55,7 @@
public void before() {
this.out = new LatinPossessive2WordFactory();
this.dictionary = Mockito.mock(Dictionary.class);
- Mockito.when(dictionary.getWord("daughter", null)).thenReturn(StringWordTests.WORD_DAUGHTER);
+ Mockito.when(dictionary.getWord("daughter", 0)).thenReturn(StringWordTests.WORD_DAUGHTER);
}
/**
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <vic...@us...> - 2021-11-06 14:26:30
Revision: 12005
http://sourceforge.net/p/foray/code/12005
Author: victormote
Date: 2021-11-06 14:26:28 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Partial solution to handling ambiguous words.
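Note on the lookup order: judging from the diff in this message, a word is first looked up in the map of unambiguous words, then in the map of ambiguous words, where the part of speech (if any) picks among the possibilities and the first possibility is the fallback. The following is a minimal sketch of that order under simplifying assumptions, not the FOray code itself. `AmbiguousLookupSketch`, `Pos`, and `Entry` are hypothetical stand-ins for `SegmentDictionary`, `PartOfSpeech`, and `SegmentDictionaryWord`; the sample words come from the dictionary diff in this message.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical stand-ins, not the FOray classes. */
public class AmbiguousLookupSketch {

    /** Hypothetical stand-in for org.axsl.hyphen.PartOfSpeech. */
    public enum Pos { NOUN, VERB }

    /** Hypothetical stand-in for a dictionary word: hyphenated form plus one part of speech. */
    public static final class Entry {
        public final String hyphenated;
        public final Pos pos;

        public Entry(final String hyphenated, final Pos pos) {
            this.hyphenated = hyphenated;
            this.pos = pos;
        }
    }

    private final Map<String, Entry> wordMap = new HashMap<>();
    private final Map<String, List<Entry>> ambiguousWordMap = new HashMap<>();

    public void addWord(final String raw, final Entry entry) {
        this.wordMap.put(raw, entry);
    }

    public void addAmbiguousWord(final String raw, final List<Entry> possibilities) {
        this.ambiguousWordMap.put(raw, possibilities);
    }

    /** Unambiguous map first, then the ambiguous map; null if neither contains the word. */
    public Entry getWord(final String raw, final Pos pos) {
        final Entry exact = this.wordMap.get(raw);
        if (exact != null) {
            return exact;
        }
        final List<Entry> possibilities = this.ambiguousWordMap.get(raw);
        if (possibilities == null) {
            return null;
        }
        if (pos != null) {
            for (final Entry candidate : possibilities) {
                if (candidate.pos == pos) {
                    return candidate;
                }
            }
        }
        /* No hint given, or no possibility matched: fall back to the first one. */
        return possibilities.get(0);
    }

    /** Builds a tiny dictionary from entries that appear in the XML diff. */
    public static AmbiguousLookupSketch demo() {
        final AmbiguousLookupSketch dict = new AmbiguousLookupSketch();
        dict.addWord("gravel", new Entry("grav-el", null));
        dict.addAmbiguousWord("record", Arrays.asList(
                new Entry("rec-ord", Pos.NOUN),
                new Entry("re-cord", Pos.VERB)));
        return dict;
    }

    public static void main(final String[] args) {
        final AmbiguousLookupSketch dict = demo();
        System.out.println(dict.getWord("record", Pos.VERB).hyphenated);
        System.out.println(dict.getWord("record", null).hyphenated);
    }
}
```

One consequence of this order is that a part-of-speech hint only matters for words that were registered as ambiguous; an unambiguous hit is returned regardless of the hint.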
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
Added Paths:
-----------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-06 10:32:12 UTC (rev 12004)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-06 14:26:28 UTC (rev 12005)
@@ -64006,8 +64006,8 @@
<w><t>grav</t></w>
<w><t>gra-va-men</t></w>
<w><t>gra-va-vam-i-na</t></w>
-<w><t>grave</t></w>
-<w><t>gra-ve</t></w>
+<w><t>grave</t><noun regular-root="true"/><verb regular-root="false"/><adjective regular-root="true"/></w>
+<!-- Don't use. Too difficult to disambiguate. <w><t>gra-ve</t></w> -->
<w><t>grave-clothes</t></w>
<w><t>grave-dig-ger</t></w>
<w><t>grav-el</t></w>
@@ -133762,8 +133762,8 @@
<w><t>re-cop-ied</t></w>
<w><t>re-cop-y</t></w>
<w><t>re-cop-y-ing</t></w>
-<w><t>rec-ord</t></w>
-<w><t>re-cord</t></w>
+<w><t>rec-ord</t><noun regular-root="true"/></w>
+<w><t>re-cord</t><verb regular-root="true"/></w>
<w><t>rec-ord=chang-er</t></w>
<w><t>rec-ord=play-er</t></w>
<w><t>re-cord-a-ble</t></w>
@@ -143171,8 +143171,7 @@
<w><t>sec-re-tar-i-at</t></w>
<w><t>sec-re-tar-i-ate</t></w>
<w><t>sec-re-tar-ies=gen-er-al</t></w>
-<w><t>sec-re-tar-y</t></w>
-<w><t>sec-re-ta-ry</t></w>
+<w><t>sec-re-tar-y</t><noun regular-root="true"/></w>
<w><t>sec-re-tar-y bird</t></w>
<w><t>sec-re-tar-y of state</t></w>
<w><t>sec-re-tar-y=gen-er-al</t></w>
Added: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java (rev 0)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java 2021-11-06 14:26:28 UTC (rev 12005)
@@ -0,0 +1,95 @@
+/*
+ * Copyright 2021 The FOray Project.
+ * http://www.foray.org
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This work is in part derived from the following work(s), used with the
+ * permission of the licensor:
+ * Apache FOP, licensed by the Apache Software Foundation
+ *
+ */
+
+/*
+ * $LastChangedRevision$
+ * $LastChangedDate$
+ * $LastChangedBy$
+ */
+
+package org.foray.hyphen;
+
+import org.foray.common.primitive.CharSequenceUtils;
+
+import org.axsl.hyphen.PartOfSpeech;
+
+/**
+ * Container for words that are spelled the same, but that have different hyphenation, depending on part-of-speech.
+ * @param <T> The type of words being stored in this instance.
+ */
+public class AmbiguousWord<T extends Word4a> {
+
+ /** The array of word possibilities. */
+ private T[] possibilities;
+
+ /**
+ * Constructor.
+ * @param possibilities The words that are ambiguous.
+ * @throws IllegalArgumentException For {@code possibilities} that is null, has less than 2 possibilities, or that
+ * contains words that are not, in fact, ambiguous.
+ */
+ public AmbiguousWord(final T[] possibilities) {
+ if (possibilities == null
+ || possibilities.length < 2) {
+ throw new IllegalArgumentException("AmbiguousWord must contain at least two possibilities.");
+ }
+
+ /* Ensure that the possibilities are, in fact, ambiguous. */
+ final CharSequence base = possibilities[0].getActualContent();
+ for (int index = 1; index < possibilities.length; index ++) {
+ final T word = possibilities[index];
+ if (! CharSequenceUtils.areEquivalent(base, word.getActualContent())) {
+ throw new IllegalArgumentException("Word not ambiguous at index: " + index);
+ }
+ }
+ this.possibilities = possibilities;
+ }
+
+ public int length() {
+ return this.possibilities.length;
+ }
+
+ public T getPossibility(final int index) {
+ return this.possibilities[index];
+ }
+
+ /**
+ * Returns the "best" choice of the possibilities, depending on how much information is given for disambiguation.
+ * @param pos The part of speech of the desired word.
+ * This can be null.
+ * @return The first possibility matching {@code pos}, or the first possibility if {@code pos} is null or not
+ * matched.
+ */
+ public T getBest(final PartOfSpeech pos) {
+ if (pos == null) {
+ return this.possibilities[0];
+ }
+ for (int index = 0; index < this.possibilities.length; index ++) {
+ final T word = possibilities[index];
+ if (word.isOfType(pos, null)) {
+ return word;
+ }
+ }
+ return this.possibilities[0];
+ }
+
+}
Property changes on: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/AmbiguousWord.java
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Rev
\ No newline at end of property
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-06 10:32:12 UTC (rev 12004)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionary.java 2021-11-06 14:26:28 UTC (rev 12005)
@@ -34,6 +34,7 @@
import org.axsl.hyphen.PartOfSpeech;
import java.util.Arrays;
+import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
@@ -53,6 +54,10 @@
* The other tradeoff axis is speed, and we are not sure which implementation is faster. */
private Map<CharSequence, SegmentDictionaryWord> wordMap = new TernaryTreeMap<SegmentDictionaryWord>();
+ /** Map of the ambiguous words. These are expected to be few, so we worry less about memory and performance. */
+ private Map<CharSequence, AmbiguousWord<SegmentDictionaryWord>> ambiguousWordMap =
+ new HashMap<CharSequence, AmbiguousWord<SegmentDictionaryWord>>();
+
/** The array of word segments in this dictionary. */
private StringWordSegment[] wordSegments;
@@ -75,23 +80,38 @@
* @param word The parsed form of this word.
*/
public void addWord(final String rawWord, final StringWord word) {
- final char[] dictionarySegmentIndexes = new char[word.getQtyWordSegments()];
- for (int segmentIndex = 0; segmentIndex < word.getQtyWordSegments(); segmentIndex ++) {
- final StringWordSegmentUtf16 wordSegment = word.getWordSegment(segmentIndex);
- final int dictionarySegmentIndex = Arrays.binarySearch(this.wordSegments, wordSegment);
- if (dictionarySegmentIndex < 0) {
- throw new IllegalArgumentException(
- "Word segment not found in dictionary: \"" + wordSegment +
- "\" while adding word: \"" + rawWord + "\"");
- }
- /* Size of this.wordSegments was checked at construction time, so this cast should be safe. */
- dictionarySegmentIndexes[segmentIndex] = (char) dictionarySegmentIndex;
+ final SegmentDictionaryWord dictWord = new SegmentDictionaryWord(word.getPartsOfSpeech(), this, word);
+
+
+
+// final char[] dictionarySegmentIndexes = new char[word.getQtyWordSegments()];
+// for (int segmentIndex = 0; segmentIndex < word.getQtyWordSegments(); segmentIndex ++) {
+// final StringWordSegmentUtf16 wordSegment = word.getWordSegment(segmentIndex);
+// final int dictionarySegmentIndex = Arrays.binarySearch(this.wordSegments, wordSegment);
+// if (dictionarySegmentIndex < 0) {
+// throw new IllegalArgumentException(
+// "Word segment not found in dictionary: \"" + wordSegment +
+// "\" while adding word: \"" + rawWord + "\"");
+// }
+// /* Size of this.wordSegments was checked at construction time, so this cast should be safe. */
+// dictionarySegmentIndexes[segmentIndex] = (char) dictionarySegmentIndex;
+// }
+// final SegmentDictionaryWord dictWord = new SegmentDictionaryWord(word.getPartsOfSpeech(), this,
+// dictionarySegmentIndexes);
+
+
+
+
+ if (this.wordMap.containsKey(rawWord)) {
+ throw new IllegalArgumentException("Duplicate word. Consider using AmbiguousWord: " + rawWord);
}
- final SegmentDictionaryWord dictWord = new SegmentDictionaryWord(
- word.getPartsOfSpeech(), this, dictionarySegmentIndexes);
this.wordMap.put(rawWord, dictWord);
}
+ public void addAmbiguousWord(final String rawWord, final AmbiguousWord<SegmentDictionaryWord> word) {
+ this.ambiguousWordMap.put(rawWord, word);
+ }
+
/**
* Makes an instance of this class from a list of {@link StringWord} instances.
* This will probably require more memory than constructing an instance and adding words individually using the
@@ -122,17 +142,26 @@
@Override
public SegmentDictionaryWord getWord(final CharSequence rawWord, final PartOfSpeech pos) {
- /* TODO: Implement PartOfSpeech logic. */
- return getExactWord(rawWord);
+ return getExactWord(rawWord, pos);
}
/**
* Retrieves a word from this dictionary.
* @param rawWord The raw word whose word should be retrieved.
+ * @param pos The part of speech desired.
+ * Used to disambiguate ambiguous words.
* @return The word matching {@code rawWord}, or null if none is found.
*/
- public SegmentDictionaryWord getExactWord(final CharSequence rawWord) {
- return this.wordMap.get(rawWord);
+ public SegmentDictionaryWord getExactWord(final CharSequence rawWord, final PartOfSpeech pos) {
+ final SegmentDictionaryWord dictWord = this.wordMap.get(rawWord);
+ if (dictWord != null) {
+ return dictWord;
+ }
+ final AmbiguousWord<SegmentDictionaryWord> ambWord = this.ambiguousWordMap.get(rawWord);
+ if (ambWord != null) {
+ return ambWord.getBest(pos);
+ }
+ return null;
}
/**
@@ -153,6 +182,17 @@
}
/**
+ * Returns the index to a given word segment in this dictionary.
+ * @param segment The segment whose index is needed.
+ * @return The index of {@code segment}, if found in this dictionary.
+ * Otherwise, a negative number corresponding to the insertion point at which the segment would be found if it were
+ * in this dictionary.
+ */
+ public int getWordSegmentIndex(final CharSequence segment) {
+ return Arrays.binarySearch(this.wordSegments, segment);
+ }
+
+ /**
* After all items have been added to the dictionary, this method can be run to give the dictionary an opportunity
* to optimize itself.
*/
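Note on the return value of `getWordSegmentIndex`: the new method delegates to `java.util.Arrays.binarySearch`, which returns `-(insertionPoint) - 1` when the key is absent, so any negative result means "not found". The sketch below shows that convention on a plain sorted `String[]`. It is an illustration only: `SegmentIndexSketch` and its helper methods are hypothetical, and the real method searches an array of `StringWordSegment` rather than strings.

```java
import java.util.Arrays;

/** Illustrative only: shows the Arrays.binarySearch return convention. */
public class SegmentIndexSketch {

    /** Returns the index of key in the sorted array, or a negative value if absent. */
    public static int find(final String[] sortedSegments, final String key) {
        return Arrays.binarySearch(sortedSegments, key);
    }

    /** Decodes a negative binarySearch result into the insertion point. */
    public static int insertionPoint(final int searchResult) {
        return -(searchResult + 1);
    }

    public static void main(final String[] args) {
        /* Segments of "am-bi-tion", already in sorted order. */
        final String[] segments = {"am", "bi", "tion"};
        System.out.println(find(segments, "bi"));
        final int missing = find(segments, "ten");
        System.out.println(missing + " -> insertion point " + insertionPoint(missing));
    }
}
```

Callers therefore only need to test for `< 0`, which is what the new `SegmentDictionaryWord` constructor in this revision does before storing the index.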
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java 2021-11-06 10:32:12 UTC (rev 12004)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SegmentDictionaryWord.java 2021-11-06 14:26:28 UTC (rev 12005)
@@ -30,6 +30,7 @@
import org.axsl.hyphen.PartOfSpeech;
import org.axsl.hyphen.PosRegularity;
+import org.axsl.hyphen.Word;
import org.axsl.hyphen.WordSegment;
/**
@@ -62,6 +63,31 @@
this.segments = segments;
}
+ /**
+ * Constructor.
+ * @param partsOfSpeech The encoded part(s) of speech for this word.
+ * @param dictionary The parent dictionary that contains the character data.
+ * @param word Contains the segment information for the new word.
+ * @throws IllegalArgumentException If {@code word} contains segments that are not found in this dictionary.
+ */
+ public SegmentDictionaryWord(final int partsOfSpeech, final SegmentDictionary dictionary, final Word word) {
+ this.partsOfSpeech = (char) partsOfSpeech;
+ this.dictionary = dictionary;
+ final char[] segmentIndexes = new char[word.getQtyWordSegments()];
+ for (int segmentIndex = 0; segmentIndex < word.getQtyWordSegments(); segmentIndex ++) {
+ final WordSegment wordSegment = word.getWordSegment(segmentIndex);
+ final int dictionarySegmentIndex = dictionary.getWordSegmentIndex(wordSegment);
+ if (dictionarySegmentIndex < 0) {
+ throw new IllegalArgumentException(
+ "Word segment not found in dictionary: \"" + wordSegment +
+ "\" while adding word: \"" + word.getActualContent() + "\"");
+ }
+ /* Size of this.wordSegments was checked at construction time, so this cast should be safe. */
+ segmentIndexes[segmentIndex] = (char) dictionarySegmentIndex;
+ }
+ this.segments = segmentIndexes;
+ }
+
@Override
public int getQtyWordSegments() {
return this.segments.length;
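Note on the `(char)` cast in the new constructor: segment indexes are stored as `char`, which in Java is an unsigned 16-bit value (0 to 65535), and a narrowing cast silently wraps anything larger. The comment in the diff says the array size is checked elsewhere, so the cast is expected to be safe there. The sketch below only illustrates what such a guard could look like; `SegmentIndexCastSketch` and `toSegmentIndex` are hypothetical names, not part of FOray.

```java
/** Illustrative only: a guarded int-to-char narrowing for index storage. */
public class SegmentIndexCastSketch {

    /** Narrows index to a char, rejecting values outside the range 0..65535. */
    public static char toSegmentIndex(final int index) {
        if (index < Character.MIN_VALUE || index > Character.MAX_VALUE) {
            throw new IllegalArgumentException("Index does not fit in a char: " + index);
        }
        return (char) index;
    }

    public static void main(final String[] args) {
        /* An unguarded cast wraps around: 65536 becomes 0. */
        System.out.println((int) (char) 65536);
        /* The guarded version keeps valid values intact. */
        System.out.println((int) toSegmentIndex(65535));
    }
}
```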
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-06 10:32:12 UTC (rev 12004)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java 2021-11-06 14:26:28 UTC (rev 12005)
@@ -31,8 +31,10 @@
import org.foray.common.AxslDtdUtil;
import org.foray.common.i18n.Orthography4a;
import org.foray.common.primitive.StringUtils;
+import org.foray.hyphen.AmbiguousWord;
import org.foray.hyphen.PosUtils;
import org.foray.hyphen.SegmentDictionary;
+import org.foray.hyphen.SegmentDictionaryWord;
import org.foray.hyphen.StringWord;
import org.foray.hyphen.StringWordSegment;
import org.foray.hyphen.StringWordSegmentFactory;
@@ -131,6 +133,9 @@
/** The data structure containing the dictionary words. */
private Map<String, StringWord> wordMap = new HashMap<String, StringWord>();
+ /** The data structure containing ambiguous words. */
+ private Map<String, List<StringWord>> ambiguousWordMap = new HashMap<String, List<StringWord>>();
+
/** Reusable builder. */
private StringBuilder builder = new StringBuilder(MAX_EXPECTED_WORD_LENGTH);
@@ -170,9 +175,21 @@
segmentSet.toArray(uniqueWordSegments);
Arrays.sort(uniqueWordSegments);
final SegmentDictionary dictionary = new SegmentDictionary(uniqueWordSegments);
- for (Map.Entry<String, StringWord> entry : wordMap.entrySet()) {
+
+ for (Map.Entry<String, StringWord> entry : this.wordMap.entrySet()) {
dictionary.addWord(entry.getKey(), entry.getValue());
}
+ for (Map.Entry<String, List<StringWord>> entry : this.ambiguousWordMap.entrySet()) {
+ final List<StringWord> list = entry.getValue();
+ final SegmentDictionaryWord[] sdWords = new SegmentDictionaryWord[list.size()];
+ for (int index = 0; index < list.size(); index ++) {
+ final StringWord stringWord = list.get(index);
+ sdWords[index] = new SegmentDictionaryWord(stringWord.getPartsOfSpeech(), dictionary, stringWord);
+ }
+ final AmbiguousWord<SegmentDictionaryWord> ambWord = new AmbiguousWord<SegmentDictionaryWord>(sdWords);
+ dictionary.addAmbiguousWord(entry.getKey(), ambWord);
+ }
+
dictionary.optimize();
this.parsedDictionaries.add(dictionary);
return this.parsedDictionaries;
@@ -377,7 +394,23 @@
}
this.lastWord = actualContentLowercase;
}
- wordMap.put(actualContent, word);
+
+ /* Look in the ambiguous words first. */
+ if (this.ambiguousWordMap.containsKey(actualContent)) {
+ final List<StringWord> list = this.ambiguousWordMap.get(actualContent);
+ list.add(word);
+ } else {
+ /* See if this is a new ambiguous word. */
+ if (wordMap.containsKey(actualContent)) {
+ final StringWord existingMapEntry = wordMap.remove(actualContent);
+ final List<StringWord> list = new ArrayList<StringWord>();
+ list.add(existingMapEntry);
+ list.add(word);
+ this.ambiguousWordMap.put(actualContent, list);
+ } else {
+ wordMap.put(actualContent, word);
+ }
+ }
break;
}
case "t": {
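Note on how the parser groups duplicates: in the hunk above, the first entry for a spelling goes into the ordinary word map; a second entry with the same spelling moves the existing one out of that map and into a new list of ambiguous possibilities; later entries are appended to that list. The sketch below reproduces that flow with plain strings. It is a simplified illustration: `DuplicateGroupingSketch` is a hypothetical class, strings stand in for `StringWord`, and deriving the key by stripping hyphens is an assumption made for the example (the real parser computes the actual content from the parsed word).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: groups same-spelling entries the way the diff above does. */
public class DuplicateGroupingSketch {

    public final Map<String, String> wordMap = new HashMap<>();
    public final Map<String, List<String>> ambiguousWordMap = new HashMap<>();

    public void add(final String hyphenated) {
        /* Assumption for this sketch: the key is the entry with hyphens removed. */
        final String actualContent = hyphenated.replace("-", "");

        /* Look in the ambiguous words first. */
        final List<String> existing = this.ambiguousWordMap.get(actualContent);
        if (existing != null) {
            existing.add(hyphenated);
        } else if (this.wordMap.containsKey(actualContent)) {
            /* Second sighting: promote the earlier entry to an ambiguous list. */
            final List<String> list = new ArrayList<>();
            list.add(this.wordMap.remove(actualContent));
            list.add(hyphenated);
            this.ambiguousWordMap.put(actualContent, list);
        } else {
            this.wordMap.put(actualContent, hyphenated);
        }
    }

    public static void main(final String[] args) {
        final DuplicateGroupingSketch parser = new DuplicateGroupingSketch();
        parser.add("rec-ord");
        parser.add("re-cord");
        parser.add("grav-el");
        System.out.println(parser.wordMap);
        System.out.println(parser.ambiguousWordMap);
    }
}
```

Because a promoted word is removed from the ordinary map, each spelling lives in exactly one of the two maps, which matches the duplicate check added to `SegmentDictionary.addWord` in this revision.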
From: <vic...@us...> - 2021-11-06 10:32:16
Revision: 12004
http://sourceforge.net/p/foray/code/12004
Author: victormote
Date: 2021-11-06 10:32:12 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Move utility classes to .util directory.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryResource.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionarySerializer.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/HyphenationServer4a.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionarySorter.java
Added Paths:
-----------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/ConfigParser.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParser.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/DictionaryParserXml.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/util/SpellChecker.java
Removed Paths:
-------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/ConfigParser.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParser.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParserXml.java
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/SpellChecker.java
Deleted: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/ConfigParser.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/ConfigParser.java 2021-11-06 10:08:20 UTC (rev 12003)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/ConfigParser.java 2021-11-06 10:32:12 UTC (rev 12004)
@@ -1,761 +0,0 @@
-/*
- * Copyright 2019 The FOray Project.
- * http://www.foray.org
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- *
- * This work is in part derived from the following work(s), used with the
- * permission of the licensor:
- * Apache FOP, licensed by the Apache Software Foundation
- *
- */
-
-/*
- * $LastChangedRevision$
- * $LastChangedDate$
- * $LastChangedBy$
- */
-
-package org.foray.hyphen;
-
-import org.foray.common.AxslDtdUtil;
-import org.foray.common.i18n.Country4a;
-import org.foray.common.i18n.Language4a;
-import org.foray.common.i18n.Orthography4a;
-import org.foray.common.i18n.Script4a;
-import org.foray.common.primitive.StringUtils;
-import org.foray.common.resource.ResourceLocation;
-import org.foray.common.resource.ResourceLocationClasspath;
-import org.foray.common.resource.ResourceLocationUrl;
-
-import org.axsl.hyphen.DerivativeType;
-import org.axsl.hyphen.HyphenationException;
-import org.axsl.hyphen.PartOfSpeech;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-import org.xml.sax.Attributes;
-import org.xml.sax.EntityResolver;
-import org.xml.sax.InputSource;
-import org.xml.sax.Locator;
-import org.xml.sax.SAXException;
-import org.xml.sax.SAXNotRecognizedException;
-import org.xml.sax.SAXNotSupportedException;
-import org.xml.sax.XMLReader;
-import org.xml.sax.helpers.DefaultHandler;
-
-import java.io.IOException;
-import java.lang.reflect.Constructor;
-import java.lang.reflect.InvocationTargetException;
-import java.net.MalformedURLException;
-import java.net.URL;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.Stack;
-import java.util.regex.Pattern;
-
-import javax.xml.parsers.SAXParserFactory;
-
-/**
- * SAX2 Handler which retrieves the orthography configuration information and stores it in a {@link HyphenationServer4a}
- * instance.
- * Normally this class doesn't need to be accessed directly.
- */
-public class ConfigParser extends DefaultHandler {
-
- /** The logger. */
- private Logger logger = LoggerFactory.getLogger(this.getClass());
-
- /** Stateful variable. */
- private DictionaryResource currentDictionaryResource;
-
- /** Stateful variable. */
- private DictionaryResource.WordListElement currentWordListElement;
-
- /** Stateful variable. */
- private HyphenationPatternsResource currentHyphenationPatternsResource;
-
- /** Stateful variable. */
- private List<Pattern> currentMatchRuleList;
-
- /** Stateful variable. */
- private List<DerivativePattern> currentDerivativePatternList;
-
- /** Stateful variable. */
- private List<DerivativeRule> currentDerivativeRuleList;
-
- /** Component of: derivative-rule. */
- private PartOfSpeech currentPartOfSpeech;
-
- /** Component of: derivative-rule. */
- private Boolean currentRegularity;
-
- /** Component of: derivative-rule. */
- private List<DerivativeType> currentDerivativeTypeList;
-
- /** Stateful variable. */
- private Pattern currentDerivativeRuleMatch;
-
- /** Stateful variable. */
- private String currentDerivativeRuleReplace;
-
- /** Stateful variable. */
- private List<WordWrapperFactory<?>> currentDerivateFactoryList;
-
- /** Stateful variable. */
- private ResourceLocation.Type currentResourceLocationType;
-
- /** Stateful variable. */
- private ResourceLocation currentResourceLocation;
-
- /** Receives content of text nodes. */
- private StringBuilder textAccumulator = new StringBuilder();
-
- /** Stateful variable tracking the current orthography configuration. */
- private transient OrthographyConfig4a currentOrthographyConfig;
-
-// /** The map of match rule lists, keyed by id. */
-// private Map<String, List<Pattern>> matchRuleLists = new HashMap<String, List<Pattern>>();
-//
- /** The map of derivative factory lists, keyed by id. */
- private Map<String, List<WordWrapperFactory<?>>> derivativeLists =
- new HashMap<String, List<WordWrapperFactory<?>>>();
-
- /** The map of dictionary instances, keyed by id. */
- private Map<String, DictionaryResource> dictionaries = new HashMap<String, DictionaryResource>();
-
- /** The map of hyphenation pattern tree instances, keyed by id. */
- private Map<String, HyphenationPatternsResource> hyphenationPatterns =
- new HashMap<String, HyphenationPatternsResource>();
-
- /** The InputSource encapsulating the configuration file. */
- private InputSource filename;
-
- /** The hyphenation server receiving the parsed information. */
- private HyphenationServer4a hyphenationServer;
-
- /** The XML parser's Locator instance, used to indicate line and column numbers in user messages. */
- private Locator locator;
-
- /** The stack of elements currently being processed. */
- private Stack<String> elementStack = new Stack<String>();
-
- /**
- * Register the URLStreamHandler for classpath: URLs.
- * This has to be done only once, hence a static statement.
- */
- static {
- org.foray.common.url.classpath.Handler.register();
- }
-
- /**
- * Constructor.
- * @param server The hyphenation server which will capture the information from the parsed configuration.
- * @param filename The file which contains the configuration information
- * to be parsed.
- */
- public ConfigParser(final HyphenationServer4a server, final InputSource filename) {
- this.hyphenationServer = server;
- this.filename = filename;
- }
-
- /**
- * Parses the configuration file.
- * @throws HyphenationException For errors during parsing.
- */
- public void start() throws HyphenationException {
- final XMLReader parser = createParser();
- /* Turn on validation if it is available. */
- try {
- parser.setFeature("http://xml.org/sax/features/validation", true);
- } catch (final SAXNotRecognizedException e) {
- this.logger.warn("Parser does not recognize validation.");
- } catch (final SAXNotSupportedException e) {
- this.logger.warn("Parser does not support validation.");
- }
- parser.setContentHandler(this);
- final EntityResolver resolver = AxslDtdUtil.getEntityResolver();
- parser.setEntityResolver(resolver);
-
- try {
- parser.parse(this.filename);
- } catch (final SAXException e) {
- if (e.getException() instanceof HyphenationException) {
- throw (HyphenationException) e.getException();
- }
- throw new HyphenationException(e);
- } catch (final IOException e) {
- throw new HyphenationException(e);
- }
- }
-
- /**
- * Creates a SAX parser for parsing the configuration file.
- * @return The created SAX parser.
- * @throws HyphenationException For errors creating or configuring the parser.
- */
- private XMLReader createParser() throws HyphenationException {
- try {
- final SAXParserFactory spf =
- javax.xml.parsers.SAXParserFactory.newInstance();
- spf.setNamespaceAware(true);
- final XMLReader xmlReader = spf.newSAXParser().getXMLReader();
- final EntityResolver entityResolver = this.hyphenationServer.getEntityResolver();
- xmlReader.setEntityResolver(entityResolver);
- this.logger.debug("Orthography Configuration Parsing: Using {} as SAX2 Parser",
- xmlReader.getClass().getName());
- return xmlReader;
- } catch (final javax.xml.parsers.ParserConfigurationException e) {
- throw new HyphenationException(e);
- } catch (final org.xml.sax.SAXException e) {
- throw new HyphenationException(e);
- }
- }
-
- @Override
- public void startElement(final String uri, final String localName, final String qName,
- final Attributes attributes) throws SAXException {
- this.elementStack.push(localName);
- switch(localName) {
- case "axsl-orthography-config": {
- /* Nothing to do here. */
- return;
- }
- case "match-rule-list": {
- final String id = attributes.getValue("id");
- this.currentMatchRuleList = new ArrayList<Pattern>();
- this.hyphenationServer.registerMatchRules(id, currentMatchRuleList);
- return;
- }
- case "derivative-pattern-list": {
- final String id = attributes.getValue("id");
- this.currentDerivativePatternList = new ArrayList<DerivativePattern>();
- this.hyphenationServer.registerDerivativeRules(id, currentDerivativePatternList);
- return;
- }
- case "derivative-pattern": {
- this.currentDerivativeRuleList = new ArrayList<DerivativeRule>();
- return;
- }
- case "derivative-rule": {
- this.currentPartOfSpeech = null;
- this.currentRegularity = null;
- this.currentDerivativeTypeList = new ArrayList<DerivativeType>();
- return;
- }
- case "derivative-type": {
- final String typeString = attributes.getValue("type");
- final DerivativeType type = DerivativeType.fromToken(typeString);
- this.currentDerivativeTypeList.add(type);
- return;
- }
- case "match": {
- return;
- }
- case "replace": {
- return;
- }
- case "derivative-factory-list": {
- final String id = attributes.getValue("id");
- this.currentDerivateFactoryList = new ArrayList<WordWrapperFactory<?>>();
- this.derivativeLists.put(id, currentDerivateFactoryList);
- return;
- }
- case "derivative-factory": {
- final String factoryClassName = attributes.getValue("class");
- final WordWrapperFactory<?> factory = instantiate(factoryClassName, WordWrapperFactory.class);
- if (factory == null) {
- return;
- }
- this.currentDerivateFactoryList.add(factory);
- return;
- }
- case "word-breaker": {
- final String className = attributes.getValue("class");
- final WordBreaker breaker = instantiate(className, WordBreaker.class);
- if (breaker == null) {
- return;
- }
- this.currentOrthographyConfig.setWordBreaker(breaker);
- return;
- }
- case "exclusion": {
- final String regexPatternString = attributes.getValue("regex-pattern");
- final Pattern regexPattern = Pattern.compile(regexPatternString);
- this.currentWordListElement.addExclusionPattern(regexPattern);
- return;
- }
- case "dictionary": {
- final String reference = attributes.getValue("reference");
- final DictionaryResource resource = this.dictionaries.get(reference);
- if (resource == null) {
- this.logger.error("dictionary-resource not found: {}", reference);
- this.logger.error(getContextMessage());
- } else {
- this.currentOrthographyConfig.setDictionaryResource(resource);
- }
- return;
- }
- case "hyphenation-patterns": {
- final String reference = attributes.getValue("reference");
- final HyphenationPatternsResource resource = this.hyphenationPatterns.get(reference);
- if (resource == null) {
- this.logger.error("hyphenation-patterns-resource not found: {}", reference);
- this.logger.error(getContextMessage());
- } else {
- this.currentOrthographyConfig.setHyphenationPatternsResource(resource);
- }
- return;
- }
- case "match-rules": {
- final String reference = attributes.getValue("reference");
- final List<Pattern> patterns = this.hyphenationServer.getMatchRules(reference);
- if (patterns == null) {
- this.logger.error("match-rules not found: {}", reference);
- this.logger.error(getContextMessage());
- } else {
- this.currentOrthographyConfig.registerMatchRuleListId(reference);
- }
- return;
- }
- case "derivative-rules": {
- final String reference = attributes.getValue("reference");
- final List<DerivativePattern> rules = this.hyphenationServer.getDerivativePatterns(reference);
- if (rules == null) {
- this.logger.error("derivative-rules not found: {}", reference);
- this.logger.error(getContextMessage());
- } else {
- this.currentOrthographyConfig.registerDerivativeRuleListId(reference);
- }
- return;
- }
- case "derivative-factories": {
- final String reference = attributes.getValue("reference");
- final List<WordWrapperFactory<?>> factories = this.derivativeLists.get(reference);
- if (factories == null) {
- this.logger.error("derivative-factories not found: {}", reference);
- this.logger.error(getContextMessage());
- } else {
- this.currentOrthographyConfig.setWordWrapperFactories(factories);
- }
- return;
- }
- case "dictionary-resource": {
- final String id = attributes.getValue("id");
- this.currentDictionaryResource = new DictionaryResource(id);
- this.dictionaries.put(id, this.currentDictionaryResource);
- return;
- }
- case "hyphenation-patterns-resource": {
- final String id = attributes.getValue("id");
- this.currentHyphenationPatternsResource = new HyphenationPatternsResource(id);
- this.hyphenationPatterns.put(id, this.currentHyphenationPatternsResource);
- return;
- }
- case "parsed-resource": {
- /* All processing is done at endElement. */
- return;
- }
- case "resource-location": {
- final String typeString = attributes.getValue("type");
- this.currentResourceLocationType = ResourceLocation.Type.fromId(typeString);
- if (this.currentResourceLocationType == null) {
- throw new SAXException("Invalid resource location type: " + typeString);
- }
- return;
- }
- case "unparsed-dictionary": {
- /* All processing is done at endElement. */
- return;
- }
- case "dictionary-element": {
- this.currentWordListElement = this.currentDictionaryResource.new WordListElement();
- this.currentDictionaryResource.addWordListElement(this.currentWordListElement);
- return;
- }
- case "unparsed-hyphenation-patterns": {
- /* All processing is done at endElement. */
- return;
- }
- case "configuration": {
- this.currentOrthographyConfig = new OrthographyConfig4a(this.hyphenationServer);
- return;
- }
- case "orthography": {
- parseElementOrthography(attributes);
- return;
- }
- case "noun": {
- this.currentPartOfSpeech = PartOfSpeech.NOUN;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "pronoun": {
- this.currentPartOfSpeech = PartOfSpeech.PRONOUN;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "verb": {
- this.currentPartOfSpeech = PartOfSpeech.VERB;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "adjective": {
- this.currentPartOfSpeech = PartOfSpeech.ADJECTIVE;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "adverb": {
- this.currentPartOfSpeech = PartOfSpeech.ADVERB;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "preposition": {
- this.currentPartOfSpeech = PartOfSpeech.PREPOSITION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "conjunction": {
- this.currentPartOfSpeech = PartOfSpeech.CONJUNCTION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "article": {
- this.currentPartOfSpeech = PartOfSpeech.ARTICLE;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "interjection": {
- this.currentPartOfSpeech = PartOfSpeech.INTERJECTION;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "cardinal": {
- this.currentPartOfSpeech = PartOfSpeech.CARDINAL;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- case "ordinal": {
- this.currentPartOfSpeech = PartOfSpeech.ORDINAL;
- this.currentRegularity = parseRegularRootAttribute(attributes);
- return;
- }
- default: {
- /* Make sure user knows about unknown tag. */
- this.logger.error("Unknown tag in orthography configuration: {}", localName);
- }
- }
- }
-
- private boolean parseRegularRootAttribute(final Attributes attributes) {
- final String value = attributes.getValue("regular-root");
- if (value == null) {
- return false;
- }
- return "true".equals(value);
- }
-
- /**
- * Parses the "orthography" element.
- * @param attributes The raw parsed attributes.
- */
- private void parseElementOrthography(final Attributes attributes) {
- final String languageString = attributes.getValue("language-iso-3char");
- final String countryString = attributes.getValue("country-iso-3char");
- final String scriptString = attributes.getValue("script-iso-4char");
- final Language4a language = Language4a.findFrom3Char(languageString);
- if (language == null) {
- this.logger.error("Unable to find language for: {}", languageString);
- this.logger.error(getContextMessage());
- }
- final Country4a country = Country4a.findFrom3Char(countryString);
- if (country == null) {
- this.logger.error("Unable to find country for: {}", countryString);
- this.logger.error(getContextMessage());
- }
- final Script4a script = Script4a.findFromAlpha(scriptString);
- if (script == null) {
- this.logger.error("Unable to find script for: {}", scriptString);
- this.logger.error(getContextMessage());
- }
- final Orthography4a orthography = Orthography4a.find(language, country, script);
- if (orthography == null) {
- this.logger.error("Unable to find orthography for: {}_{}_{}", languageString, countryString, scriptString);
- this.logger.error(getContextMessage());
- }
- this.hyphenationServer.registerOrthographyConfig(orthography, this.currentOrthographyConfig);
- }
-
- /**
- * Instantiates an instance of a specified class using reflection, and ensures that it is a subtype of a given type.
- * @param className The name of the class that should be instantiated.
- * @param expectedType The expected superclass for {@code className}.
- * @param <T> The type of the superclass object that is being instantiated.
- * @return The new instance of {@code className}, or null if it could not be created.
- * @throws SAXException Wraps a number of exceptions that can be thrown during instantiation by reflection.
- */
- private <T extends Object> T instantiate(final String className, final Class<T> expectedType) throws SAXException {
- Class<?> theClass = null;
- try {
- theClass = Class.forName(className);
- } catch (final ClassNotFoundException e) {
- throw new SAXException(e);
- }
- if (! expectedType.isAssignableFrom(theClass)) {
- this.logger.warn("Class \"{}\" is not a {} class.", className, expectedType.getName());
- return null;
- }
-
- @SuppressWarnings("unchecked")
- final Class<T> factoryClass = (Class<T>) theClass;
- /* For now, use only the no-args constructor. */
- Constructor<T> constructor = null;
- try {
- constructor = factoryClass.getConstructor();
- } catch (final SecurityException e) {
- throw new SAXException(e);
- } catch (final NoSuchMethodException e) {
- throw new SAXException(e);
- }
-
- T newInstance = null;
- try {
- newInstance = constructor.newInstance();
- } catch (final IllegalArgumentException e) {
- throw new SAXException(e);
- } catch (final InstantiationException e) {
- throw new SAXException(e);
- } catch (final IllegalAccessException e) {
- throw new SAXException(e);
- } catch (final InvocationTargetException e) {
- throw new SAXException(e);
- }
- return newInstance;
- }
-
- @Override
- public void endElement(final String uri, final String localName, final String qName) {
- endElementInside(uri, localName, qName);
- this.elementStack.pop();
- }
-
- /**
- * Called by {@link #endElement(String, String, String)} so that we can make sure we get housekeeping done after
- * this method has run.
- * @param uri See {@link DefaultHandler#endElement(String, String, String)}.
- * @param localName See {@link DefaultHandler#endElement(String, String, String)}.
- * @param qName See {@link DefaultHandler#endElement(String, String, String)}.
- */
- private void endElementInside(final String uri, final String localName, final String qName) {
- switch(localName) {
- case "axsl-orthography-config": {
- return;
- }
- case "match-rule-list": {
- this.currentMatchRuleList = null;
- return;
- }
- case "derivative-pattern-list": {
- this.currentDerivativePatternList = null;
- return;
- }
- case "derivative-pattern": {
- final DerivativePattern pattern = new DerivativePattern(this.currentDerivativeRuleMatch,
- this.currentDerivativeRuleReplace, this.currentDerivativeRuleList);
- this.currentDerivativePatternList.add(pattern);
- this.currentDerivativeRuleList = null;
- this.currentDerivativeRuleMatch = null;
- this.currentDerivativeRuleReplace = null;
- return;
- }
- case "derivative-rule": {
- final DerivativeRule rule = new DerivativeRule(this.currentPartOfSpeech, this.currentRegularity,
- this.currentDerivativeTypeList);
- this.currentDerivativeRuleList.add(rule);
- this.currentPartOfSpeech = null;
- this.currentRegularity = null;
- this.currentDerivativeTypeList = null;
- return;
- }
- case "derivative-type": {
- return;
- }
- case "match": {
- final String matchString = this.textAccumulator.toString();
- StringUtils.clear(this.textAccumulator);
- final Pattern pattern = Pattern.compile(matchString);
- if (this.currentDerivativeRuleList != null) {
- this.currentDerivativeRuleMatch = pattern;
- } else {
- this.currentMatchRuleList.add(pattern);
- }
- return;
- }
- case "replace": {
- final String replaceString = this.textAccumulator.toString();
- StringUtils.clear(this.textAccumulator);
- this.currentDerivativeRuleReplace = replaceString;
- return;
- }
- case "derivative-factory-list": {
- this.currentDerivateFactoryList = null;
- return;
- }
- case "derivative-factory": {
- return;
- }
- case "word-breaker": {
- return;
- }
- case "exclusion": {
- return;
- }
- case "dictionary": {
- return;
- }
- case "hyphenation-patterns": {
- return;
- }
- case "match-rules": {
- return;
- }
- case "match-derivative": {
- return;
- }
- case "derivative-factories": {
- return;
- }
- case "dictionary-resource": {
- this.currentDictionaryResource = null;
- return;
- }
- case "hyphenation-patterns-resource": {
- this.currentHyphenationPatternsResource = null;
- return;
- }
- case "parsed-resource": {
- return;
- }
- case "resource-location": {
- final String content = this.textAccumulator.toString();
- StringUtils.clear(this.textAccumulator);
- switch (this.currentResourceLocationType) {
- case CLASSPATH_RESOURCE: {
- this.currentResourceLocation = new ResourceLocationClasspath(content);
- break;
- }
- case URL_RESOURCE: {
- this.currentResourceLocation = new ResourceLocationUrl(createUrl(content));
- break;
- }
- }
-
- if (this.currentWordListElement != null) {
- this.currentWordListElement.setLocation(this.currentResourceLocation);
- } else if (this.currentHyphenationPatternsResource != null) {
- final String parentElement = this.getParentElement();
- if ("unparsed-hyphenation-patterns".equals(parentElement)) {
- this.currentHyphenationPatternsResource.setUnparsedLocation(this.currentResourceLocation);
- } else if ("parsed-resource".equals(parentElement)) {
- this.currentHyphenationPatternsResource.addParsedResource(this.currentResourceLocation);
- } else {
- throw new IllegalStateException();
- }
- } else if (this.currentDictionaryResource != null) {
- this.currentDictionaryResource.addParsedResource(this.currentResourceLocation);
- } else {
- throw new IllegalStateException("Unexpected resource type.");
- }
- this.currentResourceLocation = null;
- return;
- }
- case "unparsed-dictionary": {
- return;
- }
- case "dictionary-element": {
- this.currentWordListElement = null;
- return;
- }
- case "unparsed-hyphenation-patterns": {
- return;
- }
- case "configuration": {
- this.currentOrthographyConfig = null;
- return;
- }
- case "orthography": {
- return;
- }
- }
- }
-
- /**
- * Sets the document locator for this parser.
- * @param locator The new locator.
- */
- public void setDocumentLocator(final Locator locator) {
- this.locator = locator;
- }
-
- @Override
- public void characters(final char[] chars, final int start, final int length) throws SAXException {
- this.textAccumulator.append(chars, start, length);
- }
-
- /**
- * Provides a formatted string showing the current locator context, which is useful in user messages to indicate
- * where in the document a condition arose.
- * @return The formatted context message.
- */
- private String getContextMessage() {
- if (this.locator == null) {
- return null;
- }
- return " Context: " + this.locator.getSystemId() + "\n"
- + " (Line " + this.locator.getLineNumber() + ", Column "
- + this.locator.getColumnNumber() + ")";
- }
-
- /**
- * Converts a string to a URL.
- * @param urlString The string to be converted.
- * @return The URL.
- */
- private URL createUrl(final String urlString) {
- try {
- return new URL(urlString);
- } catch (final MalformedURLException e) {
- this.logger.error("Invalid URL: {}", urlString);
- this.logger.error(getContextMessage());
- return null;
- }
- }
-
- /**
- * Returns the name of the element that is the parent of the current element.
- * @return The name of the element that is the parent of the current element.
- */
- private String getParentElement() {
- /* Stack is a subclass of Vector, so we can use its methods to do a double-peek. */
- /* This is the index to the current element. */
- final int lastIndex = this.elementStack.size() - 1;
- final int parentIndex = lastIndex - 1;
- if (parentIndex < 0) {
- return null;
- }
- return this.elementStack.get(parentIndex);
- }
-
-}
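For readers following the removed ConfigParser: its endElement handling of "resource-location" depends on knowing the parent element, which it gets by peeking one level below the top of an element stack. The following is only a minimal standalone sketch of that technique, not FOray code. The class name ParentElementSketch, the parse helper, and the sample file names are hypothetical, and a Deque is used here in place of the original java.util.Stack.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch (not part of FOray): tracks open elements so a handler can ask for its parent. */
public final class ParentElementSketch extends DefaultHandler {

    /** Open elements, most recently opened first. */
    private final Deque<String> elementStack = new ArrayDeque<>();

    /** Accumulates the text content of the element being read. */
    private final StringBuilder text = new StringBuilder();

    /** One "parent -> content" entry per resource-location element. */
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(final String uri, final String localName, final String qName,
            final Attributes attributes) {
        this.elementStack.push(localName);
        this.text.setLength(0);
    }

    @Override
    public void characters(final char[] chars, final int start, final int length) {
        this.text.append(chars, start, length);
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName) {
        if ("resource-location".equals(localName)) {
            this.seen.add(parentElement() + " -> " + this.text.toString().trim());
        }
        this.text.setLength(0);
        /* Pop only after the element has been handled, so the parent lookup still sees it. */
        this.elementStack.pop();
    }

    /** Returns the element one level below the top of the stack, or null at the root. */
    private String parentElement() {
        final Iterator<String> iterator = this.elementStack.iterator();
        iterator.next(); /* Skip the current element. */
        return iterator.hasNext() ? iterator.next() : null;
    }

    /** Parses the given XML text and returns the recorded "parent -> content" entries. */
    public static List<String> parse(final String xml) {
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            final ParentElementSketch handler = new ParentElementSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (final Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final String xml = "<hyphenation-patterns-resource>"
                + "<parsed-resource><resource-location>a.dat</resource-location></parsed-resource>"
                + "<unparsed-hyphenation-patterns><resource-location>b.xml</resource-location>"
                + "</unparsed-hyphenation-patterns></hyphenation-patterns-resource>";
        System.out.println(parse(xml));
    }
}
```

Popping the stack only after the element-specific work is done mirrors the endElement / endElementInside split in the removed class.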
Deleted: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParser.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParser.java 2021-11-06 10:08:20 UTC (rev 12003)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParser.java 2021-11-06 10:32:12 UTC (rev 12004)
@@ -1,178 +0,0 @@
-/*
- * Copyright 2019 The FOray Project.
- * http://www.foray.org
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- *
- * This work is in part derived from the following work(s), used with the
- * permission of the licensor:
- * Apache FOP, licensed by the Apache Software Foundation
- *
- */
-
-/*
- * $LastChangedRevision$
- * $LastChangedDate$
- * $LastChangedBy$
- */
-
-package org.foray.hyphen;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.InputStreamReader;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-
-/**
- * Parses a list of words into a SegmentDictionary.
- * @see DictionaryParserXml for a parser for similar data in XML format.
- */
-public class DictionaryParser {
-
- /** The logger. */
- private Logger logger = LoggerFactory.getLogger(this.getClass());
-
- /** The character in the input file that marks a soft hyphen. */
- private char inputSoftHyphenChar = '-';
-
- /** The character in the input file that marks a hard hyphen. */
- private char inputHardHyphenChar = '=';
-
- /** The character that should actually be used in the word content as the hard hyphen character. */
- private char actualHardHyphenChar = '-';
-
- /* TODO: The following List and contents are oriented toward English & Western European languages.
- * They should be moved to the orthography configuration. */
- /** The list of string factories that should be tried when building the strings. */
- private List<StringWordSegmentFactory<?>> stringFactories = new ArrayList<StringWordSegmentFactory<?>>();
- {
- stringFactories.add(new StringWordSegmentLatin1Factory());
- stringFactories.add(new StringWordSegmentUtf16Factory());
- }
-
- /**
- * Parses a given InputStream and places the parsed information into the dictionary.
- * @param inputStream The input stream to parse.
- * @param description Description of {@literal inputStream}, useful for user messages.
- * @throws IOException For IO errors during parsing.
- * @return The parsed dictionary.
- */
- public SegmentDictionary parse(final InputStream inputStream, final String description) throws IOException {
- logger.info("Begin dictionary word list parsing: " + description);
-
- final InputStreamReader isReader = new InputStreamReader(inputStream);
- final BufferedReader reader = new BufferedReader(isReader);
- final Set<StringWordSegment> segmentSet = new HashSet<StringWordSegment>();
- /** The data structure containing the dictionary words. */
- final Map<String, StringWord> wordMap = new HashMap<String, StringWord>();
-
- /* Reusable builder. */
- final StringBuilder builder = new StringBuilder(100);
- /* Reusable segment list. */
- final List<StringWordSegment> segmentList = new ArrayList<StringWordSegment>(100);
-
- int lineNumber = 0;
- String inputLine = reader.readLine();
- while (inputLine != null) {
- lineNumber ++;
- builder.delete(0, builder.length());
- segmentList.clear();
-
- if (inputLine.length() < 1) {
- inputLine = reader.readLine();
- continue;
- }
- if (inputLine.charAt(0) == '#') {
- inputLine = reader.readLine();
- continue;
- }
- final int charIndex = inputLine.indexOf('#');
- if (charIndex > -1) {
- inputLine = inputLine.substring(0, charIndex);
- }
- inputLine = inputLine.trim();
- int inputLineIndex = 0;
- while (inputLineIndex < inputLine.length()) {
- final char theChar = inputLine.charAt(inputLineIndex);
- if (theChar == this.inputSoftHyphenChar) {
- if (builder.length() < 1) {
- throw new IllegalStateException("0-length syllable on line: " + lineNumber);
- }
- final StringWordSegment wordSegment = createSegment(builder.toString());
- segmentList.add(wordSegment);
- segmentSet.add(wordSegment);
- builder.delete(0, builder.length());
- } else {
- if (theChar == this.inputHardHyphenChar) {
- builder.append(this.actualHardHyphenChar);
- } else {
- builder.append(theChar);
- }
- }
- inputLineIndex ++;
- }
- if (builder.length() > 0) {
- final StringWordSegment wordSegment = createSegment(builder.toString());
- segmentList.add(wordSegment);
- segmentSet.add(wordSegment);
- }
- if (segmentList.size() < 1) {
- throw new IllegalStateException("0-syllable word on line: " + lineNumber);
- }
- final StringWordSegment[] segments = new StringWordSegment[segmentList.size()];
- segmentList.toArray(segments);
- final StringWord word = new StringWord(0, segments);
- wordMap.put(word.getActualContent().toString(), word);
- inputLine = reader.readLine();
- }
- logger.info("End dictionary word list parsing.");
- logger.info("Qty of unique word segments parsed: " + segmentSet.size());
- logger.info("Qty of words parsed: " + wordMap.size());
- final StringWordSegment[] uniqueWordSegments = new StringWordSegment[segmentSet.size()];
- segmentSet.toArray(uniqueWordSegments);
- Arrays.sort(uniqueWordSegments);
- final SegmentDictionary dictionary = new SegmentDictionary(uniqueWordSegments);
- for (Map.Entry<String, StringWord> entry : wordMap.entrySet()) {
- dictionary.addWord(entry.getKey(), entry.getValue());
- }
- dictionary.optimize();
- return dictionary;
- }
-
- /**
- * Create a pseudo-string instance.
- * @param string The String instance to replace.
- * @return A new instance of StringWordSegment encapsulating the content in {@code string}.
- */
- private StringWordSegment createSegment(final String string) {
- for (int index = 0; index < this.stringFactories.size(); index ++) {
- final StringWordSegmentFactory<?> factory = this.stringFactories.get(index);
- final StringWordSegment newString = factory.makeInstance(string);
- if (newString != null) {
- return newString;
- }
- }
- return null;
- }
-
-}
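The removed DictionaryParser read a plain-text word list in which '-' marks a soft hyphen (a segment boundary), '=' marks a hard hyphen that is kept in the word as '-', and '#' starts a comment. The following is a minimal standalone sketch of that per-line splitting, assuming plain String segments in place of the project's StringWordSegment types. The class name WordListLineSketch and the method splitLine are hypothetical and are not FOray API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not part of FOray): splits one word-list line into its segments. */
public final class WordListLineSketch {

    /** Marks a soft hyphen (segment boundary) in the input. */
    private static final char INPUT_SOFT_HYPHEN = '-';

    /** Marks a hard hyphen in the input. */
    private static final char INPUT_HARD_HYPHEN = '=';

    /** The character actually stored in the word content for a hard hyphen. */
    private static final char ACTUAL_HARD_HYPHEN = '-';

    private WordListLineSketch() { }

    /**
     * Splits a line into segments.
     * @param rawLine One line of the word list.
     * @return The segments, or an empty list for a blank or comment-only line.
     */
    public static List<String> splitLine(final String rawLine) {
        String line = rawLine;
        /* Cut the line at the comment marker, if there is one, then trim. */
        final int commentIndex = line.indexOf('#');
        if (commentIndex > -1) {
            line = line.substring(0, commentIndex);
        }
        line = line.trim();

        final List<String> segments = new ArrayList<>();
        final StringBuilder builder = new StringBuilder();
        for (int index = 0; index < line.length(); index++) {
            final char theChar = line.charAt(index);
            if (theChar == INPUT_SOFT_HYPHEN) {
                if (builder.length() < 1) {
                    throw new IllegalStateException("0-length segment in: " + rawLine);
                }
                segments.add(builder.toString());
                builder.setLength(0);
            } else if (theChar == INPUT_HARD_HYPHEN) {
                builder.append(ACTUAL_HARD_HYPHEN);
            } else {
                builder.append(theChar);
            }
        }
        /* The final segment has no trailing soft hyphen. */
        if (builder.length() > 0) {
            segments.add(builder.toString());
        }
        return segments;
    }

    public static void main(final String[] args) {
        System.out.println(splitLine("hy-phen-a-tion # example")); // prints [hy, phen, a, tion]
        System.out.println(splitLine("well=be-ing"));              // prints [well-be, ing]
    }
}
```

The sketch returns the segments directly. The removed class additionally collected the unique segments into a set and built a SegmentDictionary from them.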
Deleted: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParserXml.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParserXml.java 2021-11-06 10:08:20 UTC (rev 12003)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryParserXml.java 2021-11-06 10:32:12 UTC (rev 12004)
@@ -1,473 +0,0 @@
-/*
- * Copyright 2021 The FOray Project.
- * http://www.foray.org
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- *
- * This work is in part derived from the following work(s), used with the
- * permission of the licensor:
- * Apache FOP, licensed by the Apache Software Foundation
- *
- */
-
-/*
- * $LastChangedRevision$
- * $LastChangedDate$
- * $LastChangedBy$
- */
-
-package org.foray.hyphen;
-
-import org.foray.common.AxslDtdUtil;
-import org.foray.common.i18n.Orthography4a;
-import org.foray.common.primitive.StringUtils;
-
-import org.axsl.hyphen.PartOfSpeech;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-import org.xml.sax.Attributes;
-import org.xml.sax.EntityResolver;
-import org.xml.sax.InputSource;
-import org.xml.sax.Locator;
-import org.xml.sax.SAXException;
-import org.xml.sax.SAXNotRecognizedException;
-import org.xml.sax.SAXNotSupportedException;
-import org.xml.sax.XMLReader;
-import org.xml.sax.ext.DefaultHandler2;
-
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-
-import javax.xml.parsers.ParserConfigurationException;
-import javax.xml.parsers.SAXParserFactory;
-
-/**
- * Parses an axsl-dictionary XML document into a SegmentDictionary.
- * @see DictionaryParser for a parser for similar data in simple text format.
- */
-public class DictionaryParserXml extends DefaultHandler2 {
-
- private class DictionaryElement {
-
- /** The orthography for this dictionary. */
- private Orthography4a orthography;
-
- /** The soft hyphen char for this dictionary. */
- private char softHyphenChar;
-
- /** The hard hyphen char for this dictionary. */
- private char hardHyphenChar;
- }
-
- /** Format string for formatting the current location. */
- private static final String LOCATION_FORMAT_STRING = "(%1$d:%2$d)";
-
- /** Constant used to initialize string builders. */
- private static final int MAX_EXPECTED_WORD_LENGTH = 100;
-
- /** Constant used to initialize segment collections. */
- private static final int MAX_EXPECTED_QTY_SEGMENTS = 100;
-
- /** The logger. */
- private Logger logger = LoggerFactory.getLogger(this.getClass());
-
- /** The character that should actually be used in the word content as the hard hyphen character. */
- private char actualHardHyphenChar = '-';
-
- /** The input source to be parsed. */
-// private InputSource input;
-
- /** The locator instance for identifying the document, line, and column number of specific elements. */
- private Locator locator;
-
- /** The list of dictionaries that have been parsed by this parser. */
- private List<SegmentDictionary> parsedDictionaries = new ArrayList<SegmentDictionary>();
-
- /** The current dictionary being parsed. */
- private DictionaryElement currentDictionary;
-
- /** The current word content being parsed. */
- private StringWordSegment[] currentSegments;
-
- /** The current parts of speech being parsed. */
- private char currentPartsOfSpeech;
-
- /* TODO: The following List and contents are oriented toward English & Western European languages.
- * They should be moved to the orthography configuration. */
- /** The list of string factories that should be tried when building the strings. */
- private List<StringWordSegmentFactory<?>> stringFactories = new ArrayList<StringWordSegmentFactory<?>>();
- {
- stringFactories.add(new StringWordSegmentLatin1Factory());
- stringFactories.add(new StringWordSegmentUtf16Factory());
- }
-
- /** The set of all segments that have been parsed by this parser. */
- private Set<StringWordSegment> segmentSet = new HashSet<StringWordSegment>();
-
- /** The data structure containing the dictionary words. */
- private Map<String, StringWord> wordMap = new HashMap<String, StringWord>();
-
- /** Reusable builder. */
- private StringBuilder builder = new StringBuilder(MAX_EXPECTED_WORD_LENGTH);
-
- /** Reusable segment list. */
- private List<StringWordSegment> segmentList = new ArrayList<StringWordSegment>(MAX_EXPECTED_QTY_SEGMENTS);
-
- /** Buffer in which to capture parsed element content. */
- private StringBuilder charBuffer = new StringBuilder(MAX_EXPECTED_WORD_LENGTH);
-
- /** Indicates whether this parser should log dictionary problems. */
- private boolean logDictionaryProblems = false;
-
- /** The last parsed word, used to verify alphabetical order. */
- private String lastWord = StringUtils.EMPTY_STRING;
-
- /**
- * Parses a given InputSource and places the parsed information into the dictionary.
- * @param inputSource The input source to parse.
- * @param description Description of {@code inputSource}, useful for user messages.
- * @throws IOException For IO errors during parsing.
- * @return The list of dictionaries parsed so far by this parser.
- * @throws ParserConfigurationException For errors during parser configuration.
- * @throws SAXException For errors found by the SAX parser.
- */
- public List<SegmentDictionary> parse(final InputSource inputSource, final String description)
- throws IOException, SAXException, ParserConfigurationException {
- logger.info("Begin dictionary word list parsing: " + description);
-
- final XMLReader parser = createParser();
- if (parser != null) {
- parser.setContentHandler(this);
- parser.parse(inputSource);
- }
- cleanup();
-
- final StringWordSegment[] uniqueWordSegments = new StringWordSegment[segmentSet.size()];
- segmentSet.toArray(uniqueWordSegments);
- Arrays.sort(uniqueWordSegments);
- final SegmentDictionary dictionary = new SegmentDictionary(uniqueWordSegments);
- for (Map.Entry<String, StringWord> entry : wordMap.entrySet()) {
- dictionary.addWord(entry.getKey(), entry.getValue());
- }
- dictionary.optimize();
- this.parsedDictionaries.add(dictionary);
- return this.parsedDictionaries;
- }
-
- private XMLReader createParser() throws SAXException, ParserConfigurationException {
- final SAXParserFactory spf = javax.xml.parsers.SAXParserFactory.newInstance();
- spf.setNamespaceAware(true);
- final XMLReader parser = spf.newSAXParser().getXMLReader();
-
- final EntityResolver resolver = AxslDtdUtil.getEntityResolver();
- parser.setEntityResolver(resolver);
-
- /* Bind the LexicalHandler to the XMLReader if possible. */
- try {
- parser.setProperty("http://xml.org/sax/properties/lexical-handler", this);
- } catch (final SAXNotSupportedException e1) {
- this.logger.error("Parser does not support LexicalHandler.");
- }
-
- /* Bind the DeclHandler to the XMLReader if possible. */
- try {
- parser.setProperty("http://xml.org/sax/properties/declaration-handler", this);
- } catch (final SAXNotSupportedException e) {
- this.logger.error("Parser does not support Declaration Handler.");
- }
-
- /* Turn on namespace-prefixes so that we get the namespace declarations
- * returned with other attributes and can therefore write them out
- * along with them. */
- try {
- parser.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
- } catch (final SAXNotRecognizedException e1) {
- this.logger.error("Parser does not recognize the \"namespace-prefixes\" feature.");
- } catch (final SAXNotSupportedException e1) {
- this.logger.error("Parser unable to supply namespace-prefixes.");
- }
-
- try {
- parser.setFeature("http://xml.org/sax/features/validation", true);
- } catch (final SAXNotRecognizedException e1) {
- this.logger.error("Parser does not recognize the \"validation\" feature.");
- } catch (final SAXNotSupportedException e1) {
- this.logger.error("Parser unable to validate.");
- }
-
- /* Turn on "notify-char-refs" feature.
- * Sadly, this only works with Xerces.
- * This feature, or something like it is very important.
- * Without it, character entities get transformed into characters
- * without notification.
- * When notified, we can (and do) ignore the transformed characters
- * and use the character entities instead.
- * We do NOT want to change the user's content. */
-// try {
-// parser.setFeature("http://apache.org/xml/features/scanner/notify-char-refs", true);
-// } catch (final SAXNotRecognizedException e) {
-// /* Make this a fatal error. */
-// this.logger.error("Parser cannot report character entities. Aborting.");
-// cleanup();
-// return null;
-// } catch (final SAXNotSupportedException e) {
-// /* Make this a fatal error. */
-// this.logger.error("Parser cannot report character entities. Aborting.");
-// cleanup();
-// return null;
-// }
- return parser;
- }
-
- /**
- * Finalize the processing.
- */
- private void cleanup() {
- }
-
- @Override
- public void startDocument() throws SAXException {
- }
-
-
- @Override
- public void endDocument() throws SAXException {
- }
-
-
- @Override
- public void setDocumentLocator(final Locator locator) {
- this.locator = locator;
- }
-
- @Override
- public void characters(final char ch[], final int start, final int length) throws SAXException {
- this.charBuffer.append(ch, start, length);
- }
-
- @Override
- public void startElement(final String uri, final String localName, final String qName, final Attributes attributes)
- throws SAXException {
- switch(localName) {
- case "w": {
- this.currentPartsOfSpeech = 0;
- break;
- }
- case "t": {
- break;
- }
- case "noun": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.NOUN);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularNoun(this.currentPartsOfSpeech);
- }
- break;
- }
- case "pronoun": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.PRONOUN);
- break;
- }
- case "verb": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.VERB);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularVerb(this.currentPartsOfSpeech);
- }
- break;
- }
- case "adjective": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ADJECTIVE);
- final String regularity = attributes.getValue("regular-root");
- if ("true".equals(regularity)) {
- this.currentPartsOfSpeech = PosUtils.encodeRegularAdjective(this.currentPartsOfSpeech);
- }
- break;
- }
- case "adverb": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ADVERB);
- break;
- }
- case "preposition": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.PREPOSITION);
- break;
- }
- case "conjunction": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.CONJUNCTION);
- break;
- }
- case "article": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ARTICLE);
- break;
- }
- case "interjection": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.INTERJECTION);
- break;
- }
- case "cardinal": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.CARDINAL);
- break;
- }
- case "ordinal": {
- this.currentPartsOfSpeech = PosUtils.encodePosInfo(this.currentPartsOfSpeech, PartOfSpeech.ORDINAL);
- break;
- }
- case "word-group": break;
- case "axsl-dictionary": {
- this.currentDictionary = new DictionaryElement();
- final String language = attributes.getValue(StringUtils.EMPTY_STRING, "language");
- final String country = attributes.getValue(StringUtils.EMPTY_STRING, "country");
- final String script = attributes.getValue(StringUtils.EMPTY_STRING, "script");
- this.currentDictionary.orthography = Orthography4a.find(language, country, script);
- logger.info("Begin dictionary word list parsing: " + this.currentDictionary.orthography.toString());
- final String soft = attributes.getValue(StringUtils.EMPTY_STRING, "soft-hyphen-char");
- if (soft.length() != 1) {
- throw new SAXException("Attribute soft-hyphen-char must have exactly one char.");
- }
- this.currentDictionary.softHyphenChar = soft.charAt(0);
- final String hard = attributes.getValue(StringUtils.EMPTY_STRING, "hard-hyphen-char");
- if (hard.length() != 1) {
- throw new SAXException("Attribute hard-hyphen-char must have exactly one char.");
- }
- this.currentDictionary.hardHyphenChar = hard.charAt(0);
- break;
- }
- case "axsl-dictionaries": break;
- default: {
- throw new IllegalStateException("Unknown element started: " + localName);
- }
- }
- }
-
-
- @Override
- public void endElement(final String uri, final String localName, final String qName) throws SAXException {
- switch(localName) {
- case "w": {
- final StringWord word = new StringWord(this.currentPartsOfSpeech, this.currentSegments);
- final String actualContent = word.getActualContent().toString();
- if (this.logDictionaryProblems) {
- final String actualContentLowercase = actualContent.toLowerCase();
- if (actualContentLowercase.compareTo(this.lastWord) < 0) {
- this.logger.warn("Out of alphabetical sequence: " + actualContent + " " + locationString());
- }
- this.lastWord = actualContentLowercase;
- }
- wordMap.put(actualContent, word);
- break;
- }
- case "t": {
- final String inputLine = this.charBuffer.toString().trim();
- StringUtils.clear(this.charBuffer);
-
- StringUtils.clear(this.builder);
- this.segmentList.clear();
-
- int inputLineIndex = 0;
- while (inputLineIndex < inputLine.length()) {
- final char theChar = inputLine.charAt(inputLineIndex);
- if (theChar == this.currentDictionary.softHyphenChar) {
- if (builder.length() < 1) {
- throw new SAXException("0-length syllable on line: " + locationString());
- }
- final StringWordSegment wordSegment = createSegment(builder.toString());
- segmentList.add(wordSegment);
- segmentSet.add(wordSegment);
- builder.delete(0, builder.length());
- } else {
- if (theChar == this.currentDictionary.hardHyphenChar) {
- builder.append(this.actualHardHyphenChar);
- } else {
- builder.append(theChar);
- }
- }
- inputLineIndex ++;
- }
- if (builder.length() > 0) {
- final StringWordSegment wordSegment = createSegment(builder.toString());
- segmentList.add(wordSegment);
- segmentSet.add(wordSegment);
- }
- if (segmentList.size() < 1) {
- throw new SAXException("0-syllable word: " + this.locationString());
- }
- this.currentSegments = new StringWordSegment[segmentList.size()];
- segmentList.toArray(this.currentSegments);
- break;
- }
- case "noun": break;
- case "pronoun": break;
- case "verb": break;
- case "adjective": break;
- case "adverb": break;
- case "preposition": break;
- case "conjunction": break;
- case "article": break;
- case "interjection": break;
- case "cardinal": break;
- case "ordinal": break;
- case "word-group": break;
- case "axsl-dictionary": {
- logger.info("End parsing for dictionary: " + this.currentDictionary.orthography.toString());
- logger.info("Qty of unique word segments parsed: " + segmentSet.size());
- logger.info("Qty of words parsed: " + wordMap.size());
- break;
- }
- case "axsl-dictionaries": break;
- default: {
- throw new IllegalStateException("Unknown element ended: " + localName);
- }
- }
- }
-
-
- /**
- * Returns the current location in the input document as a formatted string.
- * @return The current location in the input document as a formatted string.
- */
- private String locationString() {
- return String.format(LOCATION_FORMAT_STRING, this.locator.getLineNumber(), this.locator.getColumnNumber());
- }
-
- /**
- * Create a pseudo-string instance.
- * @param string The String instance to replace.
- * @return A new instance of StringWordSegment encapsulating the content in {@code string}.
- */
- private StringWordSegment createSegment(final String string) {
- for (int index = 0; index < this.stringFactories.size(); index ++) {
- final StringWordSegmentFactory<?> factory = this.stringFactories.get(index);
- final StringWordSegment newString = factory.makeInstance(string);
- if (newString != null) {
- return newString;
- }
- }
- return null;
- }
-
- /**
- * Sets flag that tells parser to log warnings about problems found in the dictionary input.
- * @param logDictionaryProblems The logDictionaryProblems to set.
- */
- public void setLogDictionaryProblems(final boolean logDictionaryProblems) {
- this.logDictionaryProblems = logDictionaryProblems;
- }
-
-}
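The SAX flow in the removed class (buffer character data, then act on it when the element ends) can be illustrated with a much smaller handler. This is only a sketch: the class name DictionarySaxSketch and the method readEntries are hypothetical and are not part of FOray, and DTD resolution, validation, and part-of-speech handling are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/* Hypothetical illustration only; not the FOray DictionaryParserXml. */
final class DictionarySaxSketch extends DefaultHandler {

    /* Collects character data, which SAX may deliver in several chunks. */
    private final StringBuilder charBuffer = new StringBuilder();

    /* The raw content of each <t> element, in document order. */
    private final List<String> entries = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("t".equals(localName)) {
            charBuffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        charBuffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("t".equals(localName)) {
            entries.add(charBuffer.toString().trim());
        }
    }

    /* Parses an in-memory document and returns the content of its <t> elements. */
    static List<String> readEntries(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            /* Namespace awareness is needed for localName to be populated. */
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            DictionarySaxSketch handler = new DictionarySaxSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.entries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Unable to parse dictionary sketch input", e);
        }
    }
}
```

Unlike this sketch, the removed class throws on unknown element names instead of ignoring them, which surfaces typos in the dictionary data early.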
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryResource.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryResource.java 2021-11-06 10:08:20 UTC (rev 12003)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionaryResource.java 2021-11-06 10:32:12 UTC (rev 12004)
@@ -30,6 +30,7 @@
import org.foray.common.resource.ResourceLocation;
import org.foray.common.resource.ResourceLocationUrl;
+import org.foray.hyphen.util.DictionaryParserXml;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/DictionarySerializer...
[truncated message content] |
|
From: <vic...@us...> - 2021-11-06 10:08:22
|
Revision: 12003
http://sourceforge.net/p/foray/code/12003
Author: victormote
Date: 2021-11-06 10:08:20 +0000 (Sat, 06 Nov 2021)
Log Message:
-----------
Normal dictionary updates.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
Added Paths:
-----------
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
Modified: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-05 22:18:02 UTC (rev 12002)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-999-Latn.dict.xml 2021-11-06 10:08:20 UTC (rev 12003)
@@ -67,7 +67,6 @@
<w><t>Aar-gau</t></w>
<w><t>Aar-hus</t></w>
<w><t>Aar-on</t></w>
-<w><t>Aa-ron</t></w>
<w><t>Aa-ron's beard</t></w>
<w><t>Aa-ron's rod</t></w>
<w><t>Aa-ron's=beard</t></w>
@@ -19627,7 +19626,7 @@
<w><t>Brig-ham</t></w>
<w><t>Brig-house</t></w>
<w><t>Bright</t></w>
-<w><t>bright</t></w>
+<w><t>bright</t><adjective regular-root="true"/></w>
<w><t>Bright's dis-ease</t></w>
<w><t>bright-en</t></w>
<w><t>bright-en-er</t></w>
@@ -21990,7 +21989,7 @@
<w><t>calk</t></w>
<w><t>calk-er</t></w>
<w><t>cal-kin</t></w>
-<w><t>call</t></w>
+<w><t>call</t><verb regular-root="true"/></w>
<w><t>call let-ters</t></w>
<w><t>call mon-ey</t></w>
<w><t>call num-ber</t></w>
@@ -32061,7 +32060,7 @@
<w><t>co-no-scen-ti</t></w>
<w><t>co-no-scope</t></w>
<w><t>co-no-scop-ic</t></w>
-<w><t>con-quer</t></w>
+<w><t>con-quer</t><verb regular-root="true"/></w>
<w><t>con-quer-a-ble</t></w>
<w><t>con-quer-a-ble-ness</t></w>
<w><t>con-quer-ing-ly</t></w>
@@ -74893,7 +74892,7 @@
<w><t>im-pre-scrip-ti-ble</t></w>
<w><t>im-pre-scrip-ti-bly</t></w>
<w><t>im-prese</t></w>
-<w><t>im-press</t></w>
+<w><t>im-press</t><verb regular-root="true"/></w>
<w><t>im-press-er</t></w>
<w><t>im-press-i-bil-i-ty</t></w>
<w><t>im-press-i-ble</t></w>
@@ -84704,7 +84703,6 @@
<w><t>la-bi-o-ve-lar-iz-ing</t></w>
<w><t>la-bi-um</t></w>
<w><t>lab-lab</t></w>
-<w><t>la-bor</t></w>
<w><t>La-bor Par-ty</t></w>
<w><t>la-bor un-ion</t></w>
<w><t>la-bor=sav-ing</t></w>
@@ -89063,7 +89061,7 @@
<w><t>lone-some-ness</t></w>
<w><t>Lo-ney</t></w>
<w><t>Long</t></w>
-<w><t>long</t></w>
+<w><t>long</t><adjective regular-root="true"/></w>
<w><t>Long Ea-ton</t></w>
<w><t>long hun-dred-weight</t></w>
<w><t>Long Is-land</t></w>
@@ -89103,7 +89101,6 @@
<w><t>long-cloth</t></w>
<w><t>longe</t></w>
<w><t>longe-ing</t></w>
-<w><t>long-er</t></w>
<w><t>lon-ge-ron</t></w>
<w><t>lon-gev-i-ty</t></w>
<w><t>lon-ge-vous</t></w>
@@ -91707,7 +91704,7 @@
<w><t>man-i-cure</t></w>
<w><t>man-i-cur-ist</t></w>
<w><t>man-i-fer</t></w>
-<w><t>man-i-fest</t></w>
+<w><t>man-i-fest</t><verb regular-root="true"/></w>
<w><t>Man-i-fest Des-ti-ny</t></w>
<w><t>man-i-fes-tant</t></w>
<w><t>man-i-fes-ta-tion</t></w>
@@ -98455,7 +98452,7 @@
<w><t>mov-a-ble</t></w>
<w><t>mov-a-ble-ness</t></w>
<w><t>mov-a-bly</t></w>
-<w><t>move</t></w>
+<w><t>move</t><verb regular-root="true"/></w>
<w><t>move-a-bil-i-ty</t></w>
<w><t>move-a-ble</t></w>
<w><t>move-a-ble-ness</t></w>
@@ -108194,7 +108191,7 @@
<w><t>O-ber-o-ster-reich</t></w>
<w><t>o-bese</t></w>
<w><t>o-bes-i-ty</t></w>
-<w><t>o-bey</t></w>
+<w><t>o-bey</t><verb regular-root="true"/></w>
<w><t>o-bey-a-ble</t></w>
<w><t>o-bey-er</t></w>
<w><t>o-bey-ing-ly</t></w>
@@ -128519,7 +128516,7 @@
<w><t>prov-a-bly</t></w>
<w><t>pro-vac-ci-na-tion</t></w>
<w><t>pro-vac-cine</t></w>
-<w><t>prove</t></w>
+<w><t>prove</t><verb regular-root="true"/></w>
<w><t>prov-en</t></w>
<w><t>prov-e-nance</t></w>
<w><t>Pro-ven-cal</t></w>
@@ -130455,7 +130452,7 @@
<w><t>qual-i-ta-tive</t></w>
<w><t>qual-i-ta-tive a-nal-y-sis</t></w>
<w><t>qual-i-ta-tive-ly</t></w>
-<w><t>qual-i-ty</t></w>
+<w><t>qual-i-ty</t><noun regular-root="true"/></w>
<w><t>qual-i-ty con-trol</t></w>
<w><t>qual-i-ty-less</t></w>
<w><t>qualm</t></w>
@@ -141666,7 +141663,7 @@
<w><t>sa-vants</t></w>
<w><t>sav-a-rin</t></w>
<w><t>sa-vate</t></w>
-<w><t>save</t></w>
+<w><t>save</t><verb regular-root="true"/></w>
<w><t>Save</t></w>
<w><t>save-a-ble</t></w>
<w><t>save-a-ble-ness</t></w>
@@ -148728,7 +148725,7 @@
<w><t>sli-er</t></w>
<w><t>sliest</t></w>
<w><t>sli-est</t></w>
-<w><t>slight</t></w>
+<w><t>slight</t><adjective regular-root="true"/></w>
<w><t>slight-er</t></w>
<w><t>slight-ing</t></w>
<w><t>slight-ing-ly</t></w>
@@ -160207,10 +160204,11 @@
<w><t>tea-ber-ry</t></w>
<w><t>tea-cake</t></w>
<w><t>tea-cart</t></w>
-<w><t>teach</t></w>
+<w><t>teach</t><verb regular-root="false"/></w>
<w><t>teach-er</t></w>
<w><t>teach-er-less</t></w>
<w><t>teach-er-ship</t></w>
+<w><t>teach-es</t><verb regular-root="false"/></w>
<w><t>teach-ing</t></w>
<w><t>teach-ing aid</t></w>
<w><t>teach-ing fel-low</t></w>
@@ -184684,7 +184682,7 @@
<w><t>width-wise</t></w>
<w><t>Wi-du-kind</t></w>
<w><t>Wie-land</t></w>
-<w><t>wield</t></w>
+<w><t>wield</t><verb regular-root="true"/></w>
<w><t>wield-a-ble</t></w>
<w><t>wield-er</t></w>
<w><t>wield-i-er</t></w>
@@ -184881,7 +184879,7 @@
<w><t>Wil-ming-to-ni-an</t></w>
<w><t>Wil-more</t></w>
<w><t>Wil-no</t></w>
-<w><t>Wil-son</t></w>
+<w><t>Wil-son</t><noun regular-root="true"/></w>
<w><t>Wil-son cloud cham-ber</t></w>
<w><t>Wil-son's pet-rel</t></w>
<w><t>Wil-son's snipe</t></w>
Added: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml (rev 0)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml 2021-11-06 10:08:20 UTC (rev 12003)
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!DOCTYPE axsl-dictionary
+ PUBLIC "-//aXSL//DTD Dictionary V0.1//EN"
+ "http://www.axsl.org/dtds/0.1/en/axsl-dictionary.dtd">
+
+<axsl-dictionary language="eng" script="Latn" hard-hyphen-char="="
+ soft-hyphen-char="-">
+
+<!--
+This dictionary contains the British spelling of English words whose spellings
+differ between American and British usage.
+See eng-USA-Latn.dict.xml for the American spellings.
+Words that are common to both dialects should be placed in
+eng-999-Latn.dict.xml.
+-->
+
+<w><t>co=la-bour-er</t><noun regular-root="true"/></w>
+<w><t>la-bour</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>la-boured</t><adjective/></w>
+<w><t>la-bour-er</t><noun regular-root="true"/></w>
+</axsl-dictionary>
Property changes on: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-GBR-Latn.dict.xml
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Rev
\ No newline at end of property
Added: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
===================================================================
--- trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml (rev 0)
+++ trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml 2021-11-06 10:08:20 UTC (rev 12003)
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!DOCTYPE axsl-dictionary
+ PUBLIC "-//aXSL//DTD Dictionary V0.1//EN"
+ "http://www.axsl.org/dtds/0.1/en/axsl-dictionary.dtd">
+
+<axsl-dictionary language="eng" script="Latn" hard-hyphen-char="="
+ soft-hyphen-char="-">
+
+<!--
+This dictionary contains the American spelling of English words whose spellings
+differ between American and British usage.
+See eng-GBR-Latn.dict.xml for the British spellings.
+Words that are common to both dialects should be placed in
+eng-999-Latn.dict.xml.
+-->
+
+<w><t>co=la-bor-er</t><noun regular-root="true"/></w>
+<w><t>la-bor</t><noun regular-root="true"/><verb regular-root="true"/></w>
+<w><t>la-bored</t><adjective/></w>
+<w><t>la-bor-er</t><noun regular-root="true"/></w>
+</axsl-dictionary>
Property changes on: trunk/foray/foray-hyphen/src/main/data/dictionaries/eng-USA-Latn.dict.xml
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Rev
\ No newline at end of property
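Both new dictionaries declare soft-hyphen-char="-" and hard-hyphen-char="=". The removed DictionaryParserXml in rev 12004 appears to split each t element at soft hyphens and to replace the hard-hyphen marker with a literal '-' inside the segment. A minimal sketch of that splitting, using a hypothetical class HyphenSegmentSketch and plain strings in place of FOray's StringWordSegment:

```java
import java.util.ArrayList;
import java.util.List;

/* Hypothetical illustration only; not part of FOray. */
final class HyphenSegmentSketch {

    /* Splits a dictionary entry into segments at each soft hyphen. A hard
     * hyphen stays inside its segment, written as a literal '-'. */
    static List<String> split(String entry, char softHyphen, char hardHyphen) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < entry.length(); i++) {
            char c = entry.charAt(i);
            if (c == softHyphen) {
                if (current.length() == 0) {
                    throw new IllegalArgumentException("0-length segment in: " + entry);
                }
                segments.add(current.toString());
                current.setLength(0);
            } else if (c == hardHyphen) {
                current.append('-');
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("0-segment entry");
        }
        return segments;
    }

    public static void main(String[] args) {
        /* Prints [co-la, bour, er] */
        System.out.println(split("co=la-bour-er", '-', '='));
    }
}
```

Because the hard hyphen is not a break point, "co=la-bour-er" yields three segments rather than four.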
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|
|
From: <vic...@us...> - 2021-11-05 22:18:04
|
Revision: 12002
http://sourceforge.net/p/foray/code/12002
Author: victormote
Date: 2021-11-05 22:18:02 +0000 (Fri, 05 Nov 2021)
Log Message:
-----------
If initial-cap words are not found in the dictionary(ies), convert the first char to lowercase and look again.
Modified Paths:
--------------
trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
Modified: trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java
===================================================================
--- trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-05 19:08:15 UTC (rev 12001)
+++ trunk/foray/foray-hyphen/src/main/java/org/foray/hyphen/OrthographyConfig4a.java 2021-11-05 22:18:02 UTC (rev 12002)
@@ -310,10 +310,26 @@
/* 6. Check derivative matches in standard dictionaries for the orthography. */
if (orthoDictionary != null) {
- return isDerivativeFound(orthoDictionary, wordChars);
+ if (isDerivativeFound(orthoDictionary, wordChars)) {
+ return true;
+ }
}
/* Not found in any dictionary. */
+ /* If the first character is uppercase, convert it to lowercase and try again. Discussion: For English at least,
+ * we do not want the opposite effect, i.e. converting the first char of a word that starts with lowercase to
+ * uppercase. If the word is in the dictionary as a proper noun, we should treat a failure to capitalize it
+ * as a spelling error. Also, we do not want to convert the entire word to lowercase in general, as capital
+ * letters in the middle of a word should normally be treated as a spelling error. For exceptions to this
+ * last rule, users should enter the oddly-capitalized word into a dictionary in that form.
+ * TODO: This capability should be included in the orthography configuration instead of being hard-coded
+ * here. */
+ if (Character.isUpperCase(wordChars.charAt(0))) {
+ final StringBuilder builder = new StringBuilder(wordChars);
+ builder.setCharAt(0, Character.toLowerCase(wordChars.charAt(0)));
+ return isValidWord(builder, pos, adhocDictionaries);
+ }
+
return false;
}
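The one-directional fallback described in the comment above can be shown in isolation. This is a sketch under simplifying assumptions: the class InitialCapFallbackSketch is hypothetical, the dictionary is a plain Set of strings, and the derivative matching and ad hoc dictionaries consulted by the real isValidWord are omitted.

```java
import java.util.Set;

/* Hypothetical illustration only; not the FOray OrthographyConfig4a. */
final class InitialCapFallbackSketch {

    /* Returns true if the word is in the dictionary as written or, when it
     * starts with an uppercase letter, with only that letter lowercased.
     * Lowercase input is never retried in capitalized form. */
    static boolean isKnown(String word, Set<String> dictionary) {
        if (word.isEmpty()) {
            return false;
        }
        if (dictionary.contains(word)) {
            return true;
        }
        char first = word.charAt(0);
        if (Character.isUpperCase(first)) {
            String lowered = Character.toLowerCase(first) + word.substring(1);
            return dictionary.contains(lowered);
        }
        return false;
    }
}
```

With a dictionary holding "bright" and "Wilson", this accepts "Bright" (for example at the start of a sentence) but rejects "wilson" and "BRight", matching the rule that a missing capital on a proper noun and a capital in the middle of a word are both treated as spelling errors.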
|