axsl-commit Mailing List for aXSL (Page 7)
An API for XSL-FO.
Status: Alpha
Brought to you by:
victormote
From: <vic...@us...> - 2023-09-21 12:38:43
Revision: 2733
http://sourceforge.net/p/axsl/code/2733
Author: victormote
Date: 2023-09-21 12:38:38 +0000 (Thu, 21 Sep 2023)
Log Message:
-----------
Add option for nouns to have a number of "number-any".

Modified Paths:
--------------
trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-parts-of-speech.dtd

Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-parts-of-speech.dtd
===================================================================
--- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-parts-of-speech.dtd 2023-09-20 11:35:59 UTC (rev 2732)
+++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-parts-of-speech.dtd 2023-09-21 12:38:38 UTC (rev 2733)
@@ -23,7 +23,7 @@
 A noun.
 -->
 <!ELEMENT noun (
-  (singular | plural | pluralizable?)?,
+  (singular | plural | pluralizable | number-any)?,
   ((masculine?, feminine?, neuter?) | gender-any)?,
   convertible-to-possessive?

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <vic...@us...> - 2023-09-20 11:36:01
Revision: 2732 http://sourceforge.net/p/axsl/code/2732 Author: victormote Date: 2023-09-20 11:35:59 +0000 (Wed, 20 Sep 2023) Log Message: ----------- Convert Lexer an Iterator. Track WritingMode for explicit words. Modified Paths: -------------- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-19 18:19:24 UTC (rev 2731) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-20 11:35:59 UTC (rev 2732) @@ -25,7 +25,7 @@ import org.axsl.i18n.WritingSystem; -import java.util.List; +import java.util.Iterator; /** * <p>Implementations know how to break a character sequence into words and interword content. @@ -32,25 +32,73 @@ * This interface is part of the "optional" package because it is quite possible to provide orthography information * without any implementations of this interface.</p> * - * <p>Implementations of this interface can be used in an axsl-orthography-config XML document "lexer" element, to - * specify for the orthography what class should be used to perform the lexing task. + * <p>The {@link Lexer} begins processing in an empty and unlocked state. + * Client code adds content using {@link #addUntokenized(CharSequence, WritingSystem)} and + * {@link #addWordToken(CharSequence, WritingSystem)} as needed until some logical break point. + * When all content has been added, client code calls the {@link #lock()} method, which puts the Lexer into the locked + * state. + * The results of the tokenization can then be iterated by client code using the {@link #hasNext()} and {@link #next()} + * methods. 
+ * When all results have been iterated, the client code calls {@link #clear()} to remove all content from the Lexer + * and reset it to the unlocked state, so that the sequence can be repeated.</p> * - * <p>The {@link Lexer} begins processing in an "empty" state. - * Content is then added to it using {@link #addUntokenized(CharSequence, WritingSystem)} and - * {@link #addWordToken(CharSequence)} as needed until some logical break point. - * The results of the tokenization are then returned by {@link #process()}, which also resets the Lexer to the empty - * state, and the cycle can be repeated. - * This design allows any pre-processing tokenization to be accepted as such.</p> - * * <p>The break point mentioned above to trigger processing should be some point that has an unambiguous "end." - * In other words, there should be no content after it that could affect the results of the content before it. </p> + * In other words, there should be no content after it that could affect the results of the content before it. + * A good candidate for this break point is the end of a sentence or paragraph.</p> */ -public interface Lexer { +public interface Lexer extends Iterator<Lexer.Token> { /** + * Enumeration of valid token types that can be returned by this Lexer. + */ + enum TokenType { + + /** Token is a word. */ + WORD, + + /** Token is inter-word whitespace. */ + WHITESPACE, + + /** Token is inter-word punctuation. */ + PUNCTUATION + } + + /** + * <p>One token resulting from the tokenization of this Lexer.</p> + * + * <p>Design note: It is tempting to try to use the existing subinterfaces of {@link org.axsl.orthography.TextToken} + * to combine the {@link #getText()} and {@link #getTokenType()} methods in this type. + * However, the purpose of those interfaces is higher-level than we want here, and using them would be klunky and + * confusing. 
+ * The purpose here is to identify chunks of text that can later be converted into or replaced by those higher-level + * concepts.</p> + */ + interface Token { + + /** + * Returns the text of the token. + * @return The text of the token. + */ + CharSequence getText(); + + /** + * Returns the type of the token. + * @return The type of the token. + */ + TokenType getTokenType(); + + /** + * Returns the writing system of the token. + * @return The writing system of the token. + */ + WritingSystem getWritingSystem(); + } + + /** * Adds a sequence of untokenized content. * @param sequence The untokenized sequence to be added. * @param writingSystem The writing system to be used to tokenize {@code sequence}. + * @throws IllegalStateException If the Lexer is in the "locked" state. */ void addUntokenized(CharSequence sequence, WritingSystem writingSystem); @@ -59,20 +107,37 @@ * For content that has no pre-processed tokens, add all content using * {@link #addUntokenized(CharSequence, WritingSystem)} instead. * @param sequence The word token to be added. + * @param writingSystem The writing system of {@code sequence}. + * Since this item has already been tokenized, the writing system is not needed for that purpose, but is retained + * for downstream processes, which may need it for other purposes, such as dictionary lookup. + * @throws IllegalStateException If the Lexer is in the "locked" state. */ - void addWordToken(CharSequence sequence); + void addWordToken(CharSequence sequence, WritingSystem writingSystem); /** - * Processes the content added in {@link #addUntokenized(CharSequence, WritingSystem)} and - * {@link #addWordToken(CharSequence)}, returns the word and interword content of that content, and resets the lexer - * to the "empty" state so that it can begin processing again. - * @return The list of word and interword content tokenized from - * {@link #addUntokenized(CharSequence, WritingSystem)} and {@link #addWordToken(CharSequence)}. 
- * Even-numbered elements in the list always contain a word, and odd-numbered indexes always contain interword - * content. - * In the case that the sequence actually starts with interword content (instead of the more normal case of starting - * with a word), the first element (at index 0) will be an empty sequence. + * Puts this Lexer into the "locked" state, preventing additional content from being added, and allowing the results + * to be iterated. */ - List<CharSequence> process(); + void lock(); + /** + * Clears the content of the lexer and unlocks it so that it can be reused. + */ + void clear(); + + /** + * Returns the next token reported by this lexer from the content added in + * {@link #addUntokenized(CharSequence, WritingSystem)} and + * {@link #addWordToken(CharSequence, WritingSystem)}. + * Note that the object returned by this method is <em>mutable,</em> and may be changed by this Lexer after + * processing resumes. + * Any values that need to be retained by client applications must be preserved by such applications immediately. + * Also, the {@link CharSequence} returned by {@link Token#getText()} may also be mutable, and, if its value needs + * to be retained by client applications, must be copied into different location. + * The easiest way to convert the value to an immutable object is {@link CharSequence#toString()}. + * @return The next token reported by this lexer. + * @throws IllegalStateException If the Lexer is <em>not</em> in the "locked" state. + */ + Token next(); + } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
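The lifecycle described in the Javadoc above (add content, lock, iterate, clear) can be sketched as follows. This is only an illustration under stated assumptions: `LexerLifecycleSketch` is a hypothetical stand-in, not the aXSL `Lexer` interface. It returns plain strings instead of `Lexer.Token` objects, omits the `WritingSystem` parameter, splits untokenized text naively on whitespace, and assumes that `hasNext()` simply returns false while unlocked.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical, simplified stand-in for the lock/iterate/clear lifecycle; not aXSL code. */
class LexerLifecycleSketch implements Iterator<String> {

    private final List<String> tokens = new ArrayList<>();
    private boolean locked;
    private int cursor;

    /** Adds untokenized content, split on whitespace (a deliberately naive rule). */
    void addUntokenized(CharSequence sequence) {
        requireUnlocked();
        for (String piece : sequence.toString().trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
    }

    /** Adds a pre-tokenized word unchanged, even if it contains spaces. */
    void addWordToken(CharSequence sequence) {
        requireUnlocked();
        tokens.add(sequence.toString());
    }

    /** Prevents further additions and allows iteration. */
    void lock() {
        locked = true;
    }

    /** Removes all content and returns to the unlocked state. */
    void clear() {
        tokens.clear();
        cursor = 0;
        locked = false;
    }

    @Override
    public boolean hasNext() {
        // Assumption: an unlocked lexer reports no tokens rather than throwing.
        return locked && cursor < tokens.size();
    }

    @Override
    public String next() {
        if (!locked) {
            throw new IllegalStateException("Lexer is not locked");
        }
        if (cursor >= tokens.size()) {
            throw new NoSuchElementException();
        }
        return tokens.get(cursor++);
    }

    private void requireUnlocked() {
        if (locked) {
            throw new IllegalStateException("Lexer is locked");
        }
    }

    /** Collects all remaining tokens from a locked lexer. */
    static List<String> drain(LexerLifecycleSketch lexer) {
        List<String> result = new ArrayList<>();
        while (lexer.hasNext()) {
            result.add(lexer.next());
        }
        return result;
    }

    public static void main(String[] args) {
        LexerLifecycleSketch lexer = new LexerLifecycleSketch();
        lexer.addUntokenized("The capital is");
        lexer.addWordToken("São Paulo");   // explicit word, kept as one token
        lexer.lock();
        System.out.println(drain(lexer));
        lexer.clear();                     // ready for the next sentence or paragraph
    }
}
```

Because the real `next()` is documented as returning a mutable token, client code working against the actual interface would presumably copy `getText()` with `toString()` before advancing, as the Javadoc recommends.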
From: <vic...@us...> - 2023-09-19 18:19:26
Revision: 2731 http://sourceforge.net/p/axsl/code/2731 Author: victormote Date: 2023-09-19 18:19:24 +0000 (Tue, 19 Sep 2023) Log Message: ----------- Add element "foreign". Remove ability of "text" elements to be nested. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-19 16:53:40 UTC (rev 2730) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-19 18:19:24 UTC (rev 2731) @@ -41,7 +41,7 @@ original document. This could be the line/column number, "98:24" for example, or perhaps an XPath. --> -<!ELEMENT text (#PCDATA | text | word)*> +<!ELEMENT text (#PCDATA | word | foreign)*> <!ATTLIST text xml:lang CDATA #IMPLIED location CDATA #IMPLIED @@ -64,11 +64,27 @@ Attributes: 1. "xml:lang" is used to determine which dictionary(ies) should be used for the spell-checking. +2. "location" stores an optional clue about where the element was located in the + original document. This could be the line/column number, "98:24" for example, + or perhaps an XPath. --> <!ELEMENT word (#PCDATA)> <!ATTLIST word xml:lang CDATA #IMPLIED + location CDATA #IMPLIED > +<!-- +Marks a sequence of text as having a different writing system than the +surrounding text. +Such content does not mark the end of a processing segment, but only an +interruption in it. +--> +<!ELEMENT foreign (#PCDATA | word)* > +<!ATTLIST foreign + xml:lang CDATA #IMPLIED + location CDATA #IMPLIED +> + <!-- Last Line of DTD --> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-09-19 16:53:43
Revision: 2730
http://sourceforge.net/p/axsl/code/2730
Author: victormote
Date: 2023-09-19 16:53:40 +0000 (Tue, 19 Sep 2023)
Log Message:
-----------
Remove lexer from the orthography configuration.

Modified Paths:
--------------
trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd

Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd
===================================================================
--- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-09-19 13:03:57 UTC (rev 2729)
+++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-09-19 16:53:40 UTC (rev 2730)
@@ -147,21 +147,7 @@
 >
 
 
-<!ELEMENT lexer EMPTY>
 <!--
-1. class: The fully-qualified class name of an implementation of
-   org.axsl.orthography.Lexer.
-   Such classes know how to break a string of text (a paragraph or block) into
-   words using the specific rules for a given orthography.
-   For example, English allows allows an apostrophe or closing single quotation
-   mark within a word to mark a contraction or possession.
--->
-<!ATTLIST lexer
-  class CDATA #REQUIRED
->
-
-
-<!--
 Describes patterns in a resource file that should be excluded when building the resource.
 -->
@@ -259,7 +245,7 @@
 <!ELEMENT unparsed-hyphenation-patterns (resource-location)>
 
 
-<!ELEMENT orthography (explicit-tokens*, lexer, match-rules*,
+<!ELEMENT orthography (explicit-tokens*, match-rules*,
   derivative-rules?, dictionary?, hyphenation-patterns?, derivative-factories?) >
From: <vic...@us...> - 2023-09-19 13:03:59
Revision: 2729
http://sourceforge.net/p/axsl/code/2729
Author: victormote
Date: 2023-09-19 13:03:57 +0000 (Tue, 19 Sep 2023)
Log Message:
-----------
Remove link between lexer and writing system in orthography config.

Modified Paths:
--------------
trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd

Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd
===================================================================
--- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-09-19 12:06:42 UTC (rev 2728)
+++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-09-19 13:03:57 UTC (rev 2729)
@@ -155,18 +155,9 @@
    words using the specific rules for a given orthography.
    For example, English allows allows an apostrophe or closing single quotation
    mark within a word to mark a contraction or possession.
-2. language-iso-3char: The 3-character ISO-639-2/T code for the language being
-   configured. For example, for English: "eng".
-3. script-iso-4char: The 4-character ISO-15924 code for the script being
-   configured. For example, for Latin: "Latn".
-4. country-iso-3char: The 3-character ISO-3166-1 code for the language being
-   configured. For example, for Canada: "CAN".
 -->
 <!ATTLIST lexer
   class CDATA #REQUIRED
-  language-iso-3char CDATA #IMPLIED
-  script-iso-4char CDATA #IMPLIED
-  country-iso-3char CDATA #IMPLIED
 >
From: <vic...@us...> - 2023-09-19 12:06:51
Revision: 2728 http://sourceforge.net/p/axsl/code/2728 Author: victormote Date: 2023-09-19 12:06:42 +0000 (Tue, 19 Sep 2023) Log Message: ----------- 1. Return Locale and word break iterator for WritingSystem. 2. Add WritingSystem to API for tokenizing a sequence of characters. Modified Paths: -------------- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java Modified: trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java =================================================================== --- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-09-18 21:04:07 UTC (rev 2727) +++ trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-09-19 12:06:42 UTC (rev 2728) @@ -23,6 +23,8 @@ package org.axsl.i18n; +import java.util.Locale; + /** * <p>A combination of language, script, and country that describes a unique writing system. * This interface is intended to capture the general information encoded in the {@code xml:lang} attribute. @@ -119,4 +121,17 @@ */ boolean satisfies(WritingSystem other); + /** + * Returns the Java Locale that is closest to this writing system. + * @return The Java Locale. + */ + Locale toLocale(); + + /** + * Returns the {@link java.text.BreakIterator} to be used for finding word breaks for this writing system. + * Because these iterators are expensive to create, implementations may want to cache them. + * @return The {@link java.text.BreakIterator} to be used for finding word breaks for this writing system. 
+ */ + java.text.BreakIterator getWordBreakIterator(); + } Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-18 21:04:07 UTC (rev 2727) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-19 12:06:42 UTC (rev 2728) @@ -23,6 +23,8 @@ package org.axsl.orthography.optional; +import org.axsl.i18n.WritingSystem; + import java.util.List; /** @@ -34,8 +36,8 @@ * specify for the orthography what class should be used to perform the lexing task. * * <p>The {@link Lexer} begins processing in an "empty" state. - * Content is then added to it using {@link #addUntokenized(CharSequence)} and {@link #addWordToken(CharSequence)} as - * needed until some logical break point. + * Content is then added to it using {@link #addUntokenized(CharSequence, WritingSystem)} and + * {@link #addWordToken(CharSequence)} as needed until some logical break point. * The results of the tokenization are then returned by {@link #process()}, which also resets the Lexer to the empty * state, and the cycle can be repeated. * This design allows any pre-processing tokenization to be accepted as such.</p> @@ -48,23 +50,24 @@ /** * Adds a sequence of untokenized content. * @param sequence The untokenized sequence to be added. + * @param writingSystem The writing system to be used to tokenize {@code sequence}. */ - void addUntokenized(CharSequence sequence); + void addUntokenized(CharSequence sequence, WritingSystem writingSystem); /** * Adds a token already known to be a word. - * For content that has no pre-processed tokens, add all content using {@link #addUntokenized(CharSequence)} - * instead. + * For content that has no pre-processed tokens, add all content using + * {@link #addUntokenized(CharSequence, WritingSystem)} instead. 
* @param sequence The word token to be added. */ void addWordToken(CharSequence sequence); /** - * Processes the content added in {@link #addUntokenized(CharSequence)} and {@link #addWordToken(CharSequence)}, - * returns the word and interword content of that content, and resets the lexer to the "empty" state so that it can - * begin processing again. - * @return The list of word and interword content tokenized from {@link #addUntokenized(CharSequence)} and - * {@link #addWordToken(CharSequence)}. + * Processes the content added in {@link #addUntokenized(CharSequence, WritingSystem)} and + * {@link #addWordToken(CharSequence)}, returns the word and interword content of that content, and resets the lexer + * to the "empty" state so that it can begin processing again. + * @return The list of word and interword content tokenized from + * {@link #addUntokenized(CharSequence, WritingSystem)} and {@link #addWordToken(CharSequence)}. * Even-numbered elements in the list always contain a word, and odd-numbered indexes always contain interword * content. * In the case that the sequence actually starts with interword content (instead of the more normal case of starting This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
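The `getWordBreakIterator()` method added in revision 2728 returns a standard `java.text.BreakIterator`, and its Javadoc suggests caching because these iterators are expensive to create. A minimal sketch of one way an implementation might do that is shown below. The class `WordBreakSketch`, its cache keyed by `Locale`, and the "starts with a letter or digit" test for a word are all assumptions made for illustration; only `BreakIterator`, `Locale`, and the collection classes are real JDK APIs.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper; not part of aXSL. */
class WordBreakSketch {

    /** One prototype iterator per locale; prototypes are never given text themselves. */
    private static final Map<Locale, BreakIterator> CACHE = new ConcurrentHashMap<>();

    /** Returns a private copy of the cached word-break iterator for the locale. */
    static BreakIterator wordBreakIterator(Locale locale) {
        BreakIterator prototype = CACHE.computeIfAbsent(locale, BreakIterator::getWordInstance);
        // BreakIterator is stateful, so each caller works on its own clone.
        return (BreakIterator) prototype.clone();
    }

    /** Returns the segments between word boundaries that begin with a letter or digit. */
    static List<String> words(String text, Locale locale) {
        BreakIterator breaks = wordBreakIterator(locale);
        breaks.setText(text);
        List<String> result = new ArrayList<>();
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            String segment = text.substring(start, end);
            // Assumption: whitespace and punctuation segments are not words.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.ENGLISH)); // prints [Hello, world]
    }
}
```

Cloning a cached prototype avoids repeated construction while keeping each caller's iteration state separate; whether an actual `WritingSystem` implementation clones or hands out a shared instance is not stated in the commit.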
From: <vic...@us...> - 2023-09-18 21:04:09
Revision: 2727 http://sourceforge.net/p/axsl/code/2727 Author: victormote Date: 2023-09-18 21:04:07 +0000 (Mon, 18 Sep 2023) Log Message: ----------- Change API for Lexer, to handle pre-tokenized word content. Modified Paths: -------------- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-16 00:51:28 UTC (rev 2726) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Lexer.java 2023-09-18 21:04:07 UTC (rev 2727) @@ -28,22 +28,48 @@ /** * <p>Implementations know how to break a character sequence into words and interword content. * This interface is part of the "optional" package because it is quite possible to provide orthography information - * without any implementations of it.</p> + * without any implementations of this interface.</p> * * <p>Implementations of this interface can be used in an axsl-orthography-config XML document "lexer" element, to * specify for the orthography what class should be used to perform the lexing task. + * + * <p>The {@link Lexer} begins processing in an "empty" state. + * Content is then added to it using {@link #addUntokenized(CharSequence)} and {@link #addWordToken(CharSequence)} as + * needed until some logical break point. + * The results of the tokenization are then returned by {@link #process()}, which also resets the Lexer to the empty + * state, and the cycle can be repeated. + * This design allows any pre-processing tokenization to be accepted as such.</p> + * + * <p>The break point mentioned above to trigger processing should be some point that has an unambiguous "end." + * In other words, there should be no content after it that could affect the results of the content before it. 
</p> */ public interface Lexer { /** - * Returns the word and interword content of a given character sequence. - * Even-numbered indexes in the array always contain a word, and odd-numbered indexes always contain interword + * Adds a sequence of untokenized content. + * @param sequence The untokenized sequence to be added. + */ + void addUntokenized(CharSequence sequence); + + /** + * Adds a token already known to be a word. + * For content that has no pre-processed tokens, add all content using {@link #addUntokenized(CharSequence)} + * instead. + * @param sequence The word token to be added. + */ + void addWordToken(CharSequence sequence); + + /** + * Processes the content added in {@link #addUntokenized(CharSequence)} and {@link #addWordToken(CharSequence)}, + * returns the word and interword content of that content, and resets the lexer to the "empty" state so that it can + * begin processing again. + * @return The list of word and interword content tokenized from {@link #addUntokenized(CharSequence)} and + * {@link #addWordToken(CharSequence)}. + * Even-numbered elements in the list always contain a word, and odd-numbered indexes always contain interword * content. * In the case that the sequence actually starts with interword content (instead of the more normal case of starting - * with a word), element 0 will contain an empty string. - * @param sequence The character sequence to be parsed. - * @return The list of word and interword content parsed from {@code sequence}. + * with a word), the first element (at index 0) will be an empty sequence. */ - List<CharSequence> tokenize(CharSequence sequence); + List<CharSequence> process(); } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-09-16 00:51:30
Revision: 2726 http://sourceforge.net/p/axsl/code/2726 Author: victormote Date: 2023-09-16 00:51:28 +0000 (Sat, 16 Sep 2023) Log Message: ----------- Add attribute to capture the original location. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-15 20:38:23 UTC (rev 2725) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-16 00:51:28 UTC (rev 2726) @@ -33,10 +33,18 @@ Whereas the "word" element defines an explicit word, other text in a "text" element is considered to be a series of implicit words and inter-word content that will be tokenized into those components. + +Attributes: +1. "xml:lang" is used to determine which dictionary(ies) should be used for the + spell-checking. +2. "location" stores an optional clue about where the element was located in the + original document. This could be the line/column number, "98:24" for example, + or perhaps an XPath. --> <!ELEMENT text (#PCDATA | text | word)*> <!ATTLIST text xml:lang CDATA #IMPLIED + location CDATA #IMPLIED > @@ -52,6 +60,10 @@ or "Paulo", but might have an entry for the capital of Brazil, "São Paulo". Enclosing such text in an explicit word allows the spell checker to treat it as a single word. + +Attributes: +1. "xml:lang" is used to determine which dictionary(ies) should be used for the + spell-checking. --> <!ELEMENT word (#PCDATA)> <!ATTLIST word This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-09-15 20:38:25
Revision: 2725 http://sourceforge.net/p/axsl/code/2725 Author: victormote Date: 2023-09-15 20:38:23 +0000 (Fri, 15 Sep 2023) Log Message: ----------- Add "word" element for pre-tokenized words. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-07 10:24:32 UTC (rev 2724) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-spell-check-input.dtd 2023-09-15 20:38:23 UTC (rev 2725) @@ -1,11 +1,12 @@ <?xml version="1.0" encoding="UTF-8"?> <!-- -Document Type Definition (DTD) for an XML document that is an intermediate stage between a semantic document and a -spell-checker. +Document Type Definition (DTD) for an XML document that is an intermediate stage +between a semantic document and a spell-checker. -The idea is that for any given semantic document DTD/schema, a stylesheet can be created that converts an instance of -that schema into this simpler format, thus hiding the intracacies of the schema from the spell-checking processor, which +The idea is that for any given semantic document DTD/schema, a stylesheet can be +created that converts an instance of that schema into this simpler format, thus +hiding the intracacies of the schema from the spell-checking processor, which only needs to know how to handle the simpler format. Use the following public and system IDs for this DTD: @@ -26,13 +27,36 @@ <!-- Element "text" optionally and recursively may contain child "text" elements. -Any of these may optionally declare an xml:lang attribute to define the orthography that should be used by its -descendants for purposes of spell-checking. +Any of these may optionally declare an xml:lang attribute to define the +orthography that should be used by its descendants for purposes of +spell-checking. 
+Whereas the "word" element defines an explicit word, other text in a "text" +element is considered to be a series of implicit words and inter-word content +that will be tokenized into those components. --> -<!ELEMENT text (#PCDATA | text)*> +<!ELEMENT text (#PCDATA | text | word)*> <!ATTLIST text xml:lang CDATA #IMPLIED > +<!-- +Marks a sequence of text as being pre-processed as an explicit word. +Any downstream lexing/tokenizing is required to treat the content as an explicit +word for purposes of spell-checking. +This is useful at least two cases: +1. Words may contain ambiguous characters. For example, in English, the +abbreviations "i.e." or "i. e." are tricky to tokenize because the periods might +be interepreted either as full stops or abbreviation markers. +2. Open compound words. An English dictionary might not have entries for "São" +or "Paulo", but might have an entry for the capital of Brazil, "São Paulo". +Enclosing such text in an explicit word allows the spell checker to treat it as +a single word. +--> +<!ELEMENT word (#PCDATA)> +<!ATTLIST word + xml:lang CDATA #IMPLIED +> + + <!-- Last Line of DTD --> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-09-07 11:31:15
Revision: 2723
http://sourceforge.net/p/axsl/code/2723
Author: victormote
Date: 2023-09-07 10:23:51 +0000 (Thu, 07 Sep 2023)
Log Message:
-----------
Change imported dictionary concept to allow multiple entries.

Modified Paths:
--------------
trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java

Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java
===================================================================
--- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-09-06 21:47:31 UTC (rev 2722)
+++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-09-07 10:23:51 UTC (rev 2723)
@@ -25,7 +25,6 @@
 
 import org.axsl.i18n.WritingSystem;
 import org.axsl.orthography.Orthography;
-import org.axsl.orthography.OrthographyServer;
 import org.axsl.orthography.Word;
 
 import java.io.Serializable;
@@ -98,10 +97,10 @@
     boolean isExcludedWord(CharSequence wordChars);
 
     /**
-     * Returns the parent dictionary, if any.
-     * @param server The orthography server that knows how to resolve the parent dictionary.
-     * @return The parent dictionary, or null if there is none.
+     * Returns the list of imported dictionary IDs.
+     * @return The list of imported dictionary IDs for this dictionary.
+     * This should never be null, but can be empty.
      */
-    Dictionary getParentDictionary(OrthographyServer server);
+    List<String> getImportedDictionaries();
 
 }
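Since `getImportedDictionaries()` returns IDs rather than dictionary objects, a caller has to resolve them itself. The sketch below shows one possible resolution strategy, assuming a hypothetical lookup table from dictionary ID to that dictionary's import list; `ImportResolutionSketch` and the sample IDs are invented for illustration, and the commit does not say how aXSL implementations actually resolve imports or handle cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; not part of aXSL. */
class ImportResolutionSketch {

    /**
     * Returns every dictionary id reachable from rootId through imports,
     * root first, each id exactly once, in depth-first order.
     */
    static List<String> resolve(String rootId, Map<String, List<String>> importsById) {
        Set<String> visited = new LinkedHashSet<>();
        visit(rootId, importsById, visited);
        return new ArrayList<>(visited);
    }

    private static void visit(String id, Map<String, List<String>> importsById, Set<String> visited) {
        if (!visited.add(id)) {
            return; // already seen: guards against duplicate and circular imports
        }
        // An id with no entry is treated as having an empty import list.
        for (String imported : importsById.getOrDefault(id, List.of())) {
            visit(imported, importsById, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> imports = Map.of(
                "document", List.of("medical", "legal"),
                "medical", List.of("latin-terms"),
                "legal", List.of("latin-terms", "document")); // deliberate cycle
        System.out.println(resolve("document", imports));
        // prints [document, medical, latin-terms, legal]
    }
}
```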
From: <vic...@us...> - 2023-09-07 10:24:35
Revision: 2724
http://sourceforge.net/p/axsl/code/2724
Author: victormote
Date: 2023-09-07 10:24:32 +0000 (Thu, 07 Sep 2023)
Log Message:
-----------
Remove isValid method and add isNotSpecified method.

Modified Paths:
--------------
trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/Country.java

Modified: trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/Country.java
===================================================================
--- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/Country.java 2023-09-07 10:23:51 UTC (rev 2723)
+++ trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/Country.java 2023-09-07 10:24:32 UTC (rev 2724)
@@ -58,12 +58,11 @@
     String getEnglishName();
 
     /**
-     * Indicates whether this country code is a valid country code or not.
-     * Implementations may wish to create placeholder country codes that indicate that they have tried to obtain a
-     * country code and failed.
-     * Such placeholder instances should return false.
-     * @return True if and only if this country instance is valid.
+     * Indicates whether this country code should be interpreted as "not specified" for language matching purposes.
+     * Implementations may wish to create placeholder country code(s) that indicate that no country has been specified.
+     * Such placeholder instances should return true.
+     * @return True if and only if this instance should be treated as "not specified".
      */
-    boolean isValid();
+    boolean isNotSpecified();
 
 }
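The placeholder idea behind `isNotSpecified()` can be illustrated with a small sketch. Everything here is an assumption for illustration: `SimpleCountry` is a hypothetical stand-in for a `Country` implementation, and the matching rule (a request with no country accepts any candidate, while a request with a country requires the same code) is one plausible reading of "not specified for language matching purposes", not behavior confirmed by the commit.

```java
/** Hypothetical sketch; not part of aXSL. */
class CountryMatchSketch {

    /** Minimal stand-in for a country, exposing only what this example needs. */
    static final class SimpleCountry {
        private final String code;
        private final boolean notSpecified;

        SimpleCountry(String code, boolean notSpecified) {
            this.code = code;
            this.notSpecified = notSpecified;
        }

        String getCode() {
            return code;
        }

        boolean isNotSpecified() {
            return notSpecified;
        }
    }

    /** Shared placeholder meaning "no country was given". */
    static final SimpleCountry NOT_SPECIFIED = new SimpleCountry("", true);

    /** Assumed rule: an unspecified request matches anything; otherwise codes must be equal. */
    static boolean countrySatisfies(SimpleCountry requested, SimpleCountry candidate) {
        if (requested.isNotSpecified()) {
            return true;
        }
        return !candidate.isNotSpecified() && requested.getCode().equals(candidate.getCode());
    }

    public static void main(String[] args) {
        SimpleCountry canada = new SimpleCountry("CAN", false);
        SimpleCountry usa = new SimpleCountry("USA", false);
        System.out.println(countrySatisfies(NOT_SPECIFIED, canada)); // true
        System.out.println(countrySatisfies(canada, usa));           // false
    }
}
```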
From: <vic...@us...> - 2023-09-06 21:47:32
|
Revision: 2722 http://sourceforge.net/p/axsl/code/2722 Author: victormote Date: 2023-09-06 21:47:31 +0000 (Wed, 06 Sep 2023) Log Message: ----------- Fix to axsl-dictionary model. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-09-06 21:24:41 UTC (rev 2721) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-09-06 21:47:31 UTC (rev 2722) @@ -110,7 +110,7 @@ This is suitable as a root element for files that need to handle only one orthography. --> -<!ELEMENT axsl-dictionary (import-dictionary*, (w | word-group | phrase))*> +<!ELEMENT axsl-dictionary (import-dictionary*, (w | word-group | phrase)*)> <!-- 1. id: Used to allow one dictionary to point to another. It is an error for more than one dictionary document to have the same id, although that must This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-09-06 21:24:44
|
Revision: 2721 http://sourceforge.net/p/axsl/code/2721 Author: victormote Date: 2023-09-06 21:24:41 +0000 (Wed, 06 Sep 2023) Log Message: ----------- Move override dictionary attribute to new element import-dictionary, and allow multiple instances. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-08-29 15:47:50 UTC (rev 2720) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-09-06 21:24:41 UTC (rev 2721) @@ -110,7 +110,7 @@ This is suitable as a root element for files that need to handle only one orthography. --> -<!ELEMENT axsl-dictionary (w | word-group | phrase)*> +<!ELEMENT axsl-dictionary (import-dictionary*, (w | word-group | phrase))*> <!-- 1. id: Used to allow one dictionary to point to another. It is an error for more than one dictionary document to have the same id, although that must @@ -159,7 +159,6 @@ --> <!ATTLIST axsl-dictionary id CDATA #REQUIRED - overrides CDATA #IMPLIED language CDATA #REQUIRED script CDATA #IMPLIED country CDATA #IMPLIED @@ -169,6 +168,18 @@ > <!-- +An optional ancillary dictionary, usually specialized in nature, that is to be effectively imported into this +dictionary. +This allows document-specific dictionaries to include such specialized items as (for example) medical or legal terms, +Biblical names, etc. +Implementations need not actually import these dictionaries, but should behave as if they have. +--> +<!ELEMENT import-dictionary EMPTY> +<!ATTLIST import-dictionary + dictionary-id CDATA #REQUIRED +> + +<!-- Optional element containing two or more words that have identical spelling, but that have different semantics. 
Currently the only semantic difference contemplated is a difference in
From: <vic...@us...> - 2023-08-29 15:48:00
|
Revision: 2720 http://sourceforge.net/p/axsl/code/2720 Author: victormote Date: 2023-08-29 15:47:50 +0000 (Tue, 29 Aug 2023) Log Message: ----------- Make explicit tokens available earlier, so they can be used when creating the lexer. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-08-28 09:46:44 UTC (rev 2719) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-08-29 15:47:50 UTC (rev 2720) @@ -268,7 +268,7 @@ <!ELEMENT unparsed-hyphenation-patterns (resource-location)> -<!ELEMENT orthography (lexer, explicit-tokens*, match-rules*, +<!ELEMENT orthography (explicit-tokens*, lexer, match-rules*, derivative-rules?, dictionary?, hyphenation-patterns?, derivative-factories?) > This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-28 09:46:47
|
Revision: 2719 http://sourceforge.net/p/axsl/code/2719 Author: victormote Date: 2023-08-28 09:46:44 +0000 (Mon, 28 Aug 2023) Log Message: ----------- DTD change to tie a WritingSystem directory to an orthography. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-08-28 09:27:59 UTC (rev 2718) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-orthography-config.dtd 2023-08-28 09:46:44 UTC (rev 2719) @@ -29,7 +29,7 @@ <!ELEMENT axsl-orthography-config (explicit-token-list*, match-rule-list*, derivative-pattern-list*, derivative-factory-list*, dictionary-resource*, - hyphenation-patterns-resource*, configuration+)> + hyphenation-patterns-resource*, orthography+)> <!-- @@ -268,12 +268,11 @@ <!ELEMENT unparsed-hyphenation-patterns (resource-location)> -<!ELEMENT configuration (lexer, explicit-tokens*, match-rules*, +<!ELEMENT orthography (lexer, explicit-tokens*, match-rules*, derivative-rules?, dictionary?, hyphenation-patterns?, - derivative-factories?, orthography*) > + derivative-factories?) > -<!ELEMENT orthography EMPTY> <!-- 1. language-iso-3char: The 3-character ISO-639-2/T code for the language being configured. For example, for English: "eng". This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-28 09:28:01
|
Revision: 2718 http://sourceforge.net/p/axsl/code/2718 Author: victormote Date: 2023-08-28 09:27:59 +0000 (Mon, 28 Aug 2023) Log Message: ----------- Add method indicating whether one WritingSystem can satisfy the requirements of another. Modified Paths: -------------- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java Modified: trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java =================================================================== --- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-27 03:07:06 UTC (rev 2717) +++ trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-28 09:27:59 UTC (rev 2718) @@ -109,4 +109,14 @@ */ String getPrivateUse(); + /** + * Indicates whether this writing system satisfies the requirements of some other writing system. + * In other words, can resources (dictonaries, for example) tagged with this writing system be properly used by + * tasks (spell-checking, for example) requiring the other writing system. + * @param other The writing system against which this writing system is being tested. + * @return True if and only if resources tagged with this writing system can be properly be used by tasks that + * require {@code other}. + */ + boolean satisfies(WritingSystem other); + } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
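Editorial sketch for rev 2718: the commit adds satisfies(WritingSystem) but, being an interface change, does not show matching rules. One plausible reading, based on the Javadoc from rev 2716 ("no country-specific information ... can be used for any country"), is sketched below. Tag is a hypothetical, simplified stand-in that uses plain code strings; the exact rules here are assumptions, and a real implementation would also need to resolve a null script to the language's default script before comparing.

```java
import java.util.Objects;

/** Sketch only: one possible matching rule, not the aXSL implementation. */
final class SatisfiesSketch {

    /** Hypothetical, simplified stand-in for WritingSystem. */
    static final class Tag {
        final String language; // never null
        final String script;   // null means "default script for the language"
        final String country;  // null means "not country-specific"

        Tag(String language, String script, String country) {
            this.language = Objects.requireNonNull(language);
            this.script = script;
            this.country = country;
        }

        /** True if resources tagged with this Tag may serve tasks requiring {@code other}. */
        boolean satisfies(Tag other) {
            if (!language.equals(other.language)) {
                return false;
            }
            // Assumption: scripts must match exactly (default-script resolution omitted).
            if (!Objects.equals(script, other.script)) {
                return false;
            }
            // Assumption: a resource with no country can serve any country of the language.
            return country == null || country.equals(other.country);
        }
    }

    public static void main(String[] args) {
        Tag generalEnglish = new Tag("eng", "Latn", null);
        Tag british = new Tag("eng", "Latn", "GBR");
        Tag american = new Tag("eng", "Latn", "USA");
        System.out.println(generalEnglish.satisfies(british)); // prints true
        System.out.println(british.satisfies(american));       // prints false
        System.out.println(british.satisfies(generalEnglish)); // prints false
    }
}
```

Note that the relation is deliberately asymmetric in this sketch: a country-neutral resource satisfies a British requirement, but a British resource does not satisfy a country-neutral one. Whether the second case should instead be accepted is a design question the commit leaves open.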
From: <vic...@us...> - 2023-08-27 03:07:08
|
Revision: 2717 http://sourceforge.net/p/axsl/code/2717 Author: victormote Date: 2023-08-27 03:07:06 +0000 (Sun, 27 Aug 2023) Log Message: ----------- Add methods supporting remaining components of the xml:lang attribute. Modified Paths: -------------- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java Modified: trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java =================================================================== --- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-27 02:08:33 UTC (rev 2716) +++ trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-27 03:07:06 UTC (rev 2717) @@ -24,21 +24,34 @@ package org.axsl.i18n; /** - * <p>A combination of language, country, and script that describes a unique writing system.</p> + * <p>A combination of language, script, and country that describes a unique writing system. + * This interface is intended to capture the general information encoded in the {@code xml:lang} attribute. + * The format of {@code xml:lang} is: language-extlang-script-region-variant-extension-privateuse. + * No support is provided in this interface for {@code extlang} because all {@code extlang} values, together with the + * appropriate language code map to a valid {@code language}. + * The combination of a (general) {@code language} code with an {@code extlang} should be normalized to the appropriate + * (specific) {@code language} code.</p> * + * <p>XSL-FO supports language, script, and country (region) attributes, and treats {@code xml:lang} as a shorthand for + * those attributes, but does not support the additional segments (variant, extension, privateuse). + * Applications that do not need the granularity of these additional segments should simply return null for the + * appropriate methods. 
+ * Note that {@code region} in {@code xml:lang} maps to {@code country} in XSL-FO.</p> + * * <p>Developer Notes: Neither {@link java.util.Locale} nor the ICU4J ULocale class were flexible enough for our * purposes for the following reasons:</p> * <ul> * <li>Both are oriented toward language variants, i.e. a combination of language, country, and variant. - * For orthographies, the variant is (we think) unimportant, but the script is very important. + * For orthographies, the variant is relatively unimportant, but the script is very important. * <li>Neither is as object-oriented nor as typesafe as our design requires. * Neither has an interface that can be expanded. * Our design requires that classes like formatting objects to be able to directly provide orthography information * without unnecessary object creation.</li> * </ul> - * * <p>Client classes that wish to use either of these APIs are may want to consider writing an adapter that implements * this interface and wraps the logic from those tools.</p> + * + * @see "https://www.w3.org/International/articles/language-tags/" */ public interface WritingSystem { @@ -46,23 +59,54 @@ * Returns the (spoken) language for this writing system. * @return The (spoken) language for this writing system. * This should never be null. + * @see "https://www.w3.org/International/articles/language-tags/#language" + * @see "https://www.w3.org/International/articles/language-tags/#extlang" */ Language getLanguage(); /** - * Returns the script used by this writing system. + * Returns the script used by this writing system, which is used, for example, to distinguish between Vietnamese + * written in ideographs and that written in Latin characters, or between Greek written with Greek characters and + * that transliterated into Latin characters. * @return The script used by this writing system. * If null, the implied value is the default script for {@link #getLanguage()}. 
+ * @see "https://www.w3.org/International/articles/language-tags/#script" * @see Language#getDefaultScript(Country) */ Script getScript(); /** - * Returns the country (region) for this writing system. + * Returns the country (region) for this writing system, which is used, for example, to distinguish between European + * French and Candian French, or between British English and American English. * @return The country for this writing system. * If null, the implication is that this writing system has no country-specific information in it, and can be used * for any country. + * @see "language-extlang-script-region-variant-extension-privateuse" */ Country getCountry(); + /** + * Returns the variant, if any, for this writing system. + * @return The variant for this writing system. + * This will normally be null except for specialized needs. + * @see "https://www.w3.org/International/articles/language-tags/#variants" + */ + String getVariant(); + + /** + * Returns the extension, if any, for this writing system. + * @return The extension for this writing system. + * This will normally be null except for specialized needs. + * @see "https://www.w3.org/International/articles/language-tags/#extension" + */ + String getExtension(); + + /** + * Returns the private use value, if any, for this writing system. + * @return The private use value for this writing system. + * This will normally be null except for specialized needs. + * @see "https://www.w3.org/International/articles/language-tags/#extension" + */ + String getPrivateUse(); + } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-27 02:08:36
|
Revision: 2716 http://sourceforge.net/p/axsl/code/2716 Author: victormote Date: 2023-08-27 02:08:33 +0000 (Sun, 27 Aug 2023) Log Message: ----------- Add method to Dictionary to return the WritingSystem which it supports. Modified Paths: -------------- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java Modified: trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java =================================================================== --- trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-25 10:44:28 UTC (rev 2715) +++ trunk/axsl/axsl-i18n/src/main/java/org/axsl/i18n/WritingSystem.java 2023-08-27 02:08:33 UTC (rev 2716) @@ -45,19 +45,24 @@ /** * Returns the (spoken) language for this writing system. * @return The (spoken) language for this writing system. + * This should never be null. */ Language getLanguage(); /** - * Returns the country for this writing system. - * @return The country for this writing system. - */ - Country getCountry(); - - /** * Returns the script used by this writing system. * @return The script used by this writing system. + * If null, the implied value is the default script for {@link #getLanguage()}. + * @see Language#getDefaultScript(Country) */ Script getScript(); + /** + * Returns the country (region) for this writing system. + * @return The country for this writing system. + * If null, the implication is that this writing system has no country-specific information in it, and can be used + * for any country. 
+ */ + Country getCountry(); + } Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-25 10:44:28 UTC (rev 2715) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-27 02:08:33 UTC (rev 2716) @@ -23,6 +23,7 @@ package org.axsl.orthography.optional; +import org.axsl.i18n.WritingSystem; import org.axsl.orthography.Orthography; import org.axsl.orthography.OrthographyServer; import org.axsl.orthography.Word; @@ -39,6 +40,13 @@ public interface Dictionary extends Serializable { /** + * Returns the writing system supported by this dictionary. + * @return The writing system supported by this dictionary. + * This should never be null. + */ + WritingSystem getWritingSystem(); + + /** * Returns the number of alternative ways a given word appears in this dictionary. * @param wordChars The chars whose word is being queried. * @return The number of alternatives for this word in this dictionary. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-25 10:44:31
|
Revision: 2715 http://sourceforge.net/p/axsl/code/2715 Author: victormote Date: 2023-08-25 10:44:28 +0000 (Fri, 25 Aug 2023) Log Message: ----------- Use OrthographyServer to help resolve parent dictionaries. Modified Paths: -------------- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/OrthographyServer.java trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/OrthographyServer.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/OrthographyServer.java 2023-08-25 00:21:09 UTC (rev 2714) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/OrthographyServer.java 2023-08-25 10:44:28 UTC (rev 2715) @@ -24,11 +24,10 @@ package org.axsl.orthography; import org.axsl.i18n.WritingSystem; +import org.axsl.orthography.optional.Dictionary; /** - * The main entry point to the hyphenation package. - * Provides methods for finding the start and size of a word in a text - * sequence, and for creating hyphenation information for a word. + * Knows how to find orthography resources. */ public interface OrthographyServer { @@ -39,4 +38,11 @@ */ Orthography getOrthography(WritingSystem writingSystem); + /** + * Returns the dictionary for a given dictionary ID registered with this server. + * @param dictionaryId The ID of the dictionary to be returned. + * @return The dictionary for {@code dictionaryId}. 
+ */ + Dictionary getDictionary(String dictionaryId); + } Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-25 00:21:09 UTC (rev 2714) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-25 10:44:28 UTC (rev 2715) @@ -24,6 +24,7 @@ package org.axsl.orthography.optional; import org.axsl.orthography.Orthography; +import org.axsl.orthography.OrthographyServer; import org.axsl.orthography.Word; import java.io.Serializable; @@ -88,4 +89,11 @@ */ boolean isExcludedWord(CharSequence wordChars); + /** + * Returns the parent dictionary, if any. + * @param server The orthography server that knows how to resolve the parent dictionary. + * @return The parent dictionary, or null if there is none. + */ + Dictionary getParentDictionary(OrthographyServer server); + } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-25 00:21:13
|
Revision: 2714 http://sourceforge.net/p/axsl/code/2714 Author: victormote Date: 2023-08-25 00:21:09 +0000 (Fri, 25 Aug 2023) Log Message: ----------- Rename element to axsl-dictionary-collection, for clarity. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-08-24 21:41:34 UTC (rev 2713) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-08-25 00:21:09 UTC (rev 2714) @@ -100,7 +100,7 @@ This is suitable as a root element for document-specific files that need to capture one-off words in more than one orthography. --> -<!ELEMENT axsl-dictionaries (axsl-dictionary*) > +<!ELEMENT axsl-dictionary-collection (axsl-dictionary*) > <!-- This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-24 21:41:36
|
Revision: 2713 http://sourceforge.net/p/axsl/code/2713 Author: victormote Date: 2023-08-24 21:41:34 +0000 (Thu, 24 Aug 2023) Log Message: ----------- Add method positively rejecting a given word as being valid for an orthography. Modified Paths: -------------- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java Modified: trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java =================================================================== --- trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-24 19:18:56 UTC (rev 2712) +++ trunk/axsl/axsl-orthography/src/main/java/org/axsl/orthography/optional/Dictionary.java 2023-08-24 21:41:34 UTC (rev 2713) @@ -75,4 +75,17 @@ */ boolean supportsQualifiedType(Word.PartOfSpeech pos, Word.PosQualifier qualifier); + /** + * Indicates whether a given sequence of characters should be excluded as a word for this orthography. + * This is useful for cases where one dictionary can override another. + * For example, the American English word "honor" is spelled "honour" in British English. + * If an American English dictionary uses a British English dictionary as its base, but overrides the spelling of + * this one word, then "honor" should be treated as correct, and "honour" should be treated as incorrect. + * Processors should check to see if a given sequence of characters is excluded before checking with the overridden + * dictionary for validity. + * @param wordChars The characters being tested. + * @return True if and only if {@code wordChars} is necessarily eliminated as a valid word in this orthography. + */ + boolean isExcludedWord(CharSequence wordChars); + } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
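Editorial sketch for rev 2713: the Javadoc asks processors to consult isExcludedWord before falling back to the overridden dictionary. The lookup order it describes might look roughly like the following. SimpleDictionary, containsWord, of, and isValid are hypothetical names introduced for illustration; only isExcludedWord(CharSequence) comes from the commit.

```java
import java.util.Set;

/** Sketch only: illustrates the lookup order described in the Javadoc for rev 2713. */
final class ExcludedWordSketch {

    /** Hypothetical stand-in; containsWord is not part of the real Dictionary interface. */
    interface SimpleDictionary {
        boolean containsWord(String word);
        boolean isExcludedWord(CharSequence wordChars);
    }

    /** Builds a simple in-memory dictionary from a word set and an exclusion set. */
    static SimpleDictionary of(Set<String> words, Set<String> excluded) {
        return new SimpleDictionary() {
            @Override
            public boolean containsWord(String word) {
                return words.contains(word);
            }

            @Override
            public boolean isExcludedWord(CharSequence wordChars) {
                return excluded.contains(wordChars.toString());
            }
        };
    }

    /** Checks the exclusion list first, then the overriding dictionary, then the base. */
    static boolean isValid(String word, SimpleDictionary overriding, SimpleDictionary base) {
        if (overriding.isExcludedWord(word)) {
            return false; // positively rejected; never ask the base dictionary
        }
        return overriding.containsWord(word) || base.containsWord(word);
    }

    public static void main(String[] args) {
        SimpleDictionary british = of(Set.of("honour", "tea"), Set.of());
        SimpleDictionary american = of(Set.of("honor"), Set.of("honour"));
        System.out.println(isValid("honor", american, british));  // prints true
        System.out.println(isValid("honour", american, british)); // prints false
        System.out.println(isValid("tea", american, british));    // prints true
    }
}
```

Without the exclusion check, "honour" would be accepted through the British base dictionary, which is exactly the case the new method is meant to prevent.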
From: <vic...@us...> - 2023-08-24 19:18:58
|
Revision: 2712 http://sourceforge.net/p/axsl/code/2712 Author: victormote Date: 2023-08-24 19:18:56 +0000 (Thu, 24 Aug 2023) Log Message: ----------- Add "epoch" attribute to provide a notion of orthography changing over time. Add "id" and "overrides" attributes, to allow one dictionary to wrap and override another. Modified Paths: -------------- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd Modified: trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd =================================================================== --- trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-08-23 10:47:45 UTC (rev 2711) +++ trunk/axsl/axsl-00-dev/doc/web/dtds/0.1/en/axsl-dictionary.dtd 2023-08-24 19:18:56 UTC (rev 2712) @@ -112,14 +112,34 @@ --> <!ELEMENT axsl-dictionary (w | word-group | phrase)*> <!-- -1. language: The 3-character ISO-639 code for the language to which the words +1. id: Used to allow one dictionary to point to another. It is an error for +more than one dictionary document to have the same id, although that must +be enforced at some higher level than an XML editor. +2. overrides: References the "id" attribute of some other axsl-dictionary, +allowing this dictionary to logically include the content of another, +overriding its content as needed. + +[The attributes language, script, country, and epoch are intended to match the +meaning of those terms as used in this article: +https://www.w3.org/International/articles/language-tags +except that "epoch" is a private-use subtag (see below).] + +3. language: The 3-character ISO-639 code for the language to which the words belong. -2. country: The 3-character ISO-3166 code for the country, if the words in this +4. script: The 4-character ISO-15924 code for the script being used. +5. country: The 3-character ISO-3166 code for the country, if the words in this dictionary are country-specific. 
If the intent is for this dictionary to contain words that are applicable to /any/ country in which "language" is spoken, do not set this attribute. -3. script: The 4-character ISO-15924 code for the script being used. -4. soft-hyphen-char: The character that is used in this dictionary to denote a +6. epoch: A private-use subtag that describes the time period in which the +dictionary applies. +This is intended to allow one dictionary that overrides another to designate +the approximate period in which its content would be considered valid. +For example, a dictionary valid through approximately 1920 might use an epoch +of "1920". +See the following for information about private-use subtags: +https://www.w3.org/International/articles/language-tags/#extension +7. soft-hyphen-char: The character that is used in this dictionary to denote a valid hyphenation point. Be sure to pick a character that will never occur in the actual spelling of any word in the dictionary, except, if that character is the hard hyphen character, @@ -127,7 +147,7 @@ other than the actual hard hyphen character. For languages where the hyphen character (both hard and soft) is "-", the recommended value for this attribute is "-". -5. hard-hyphen-char: The character that is used in this dictionary to denote a +8. hard-hyphen-char: The character that is used in this dictionary to denote a hard hyphenation point, i.e. where the hyphen is part of the word. For example, the English word "absent-minded" contains a hard hyphen. Word spellings in this dictionary will need to distinguish between hard and @@ -138,9 +158,12 @@ recommended value for this attribute is "=". 
--> <!ATTLIST axsl-dictionary - language CDATA #REQUIRED - country CDATA #IMPLIED - script CDATA #IMPLIED + id CDATA #REQUIRED + overrides CDATA #IMPLIED + language CDATA #REQUIRED + script CDATA #IMPLIED + country CDATA #IMPLIED + epoch CDATA #IMPLIED soft-hyphen-char CDATA #REQUIRED hard-hyphen-char CDATA #REQUIRED > This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-23 10:47:49
|
Revision: 2711 http://sourceforge.net/p/axsl/code/2711 Author: victormote Date: 2023-08-23 10:47:45 +0000 (Wed, 23 Aug 2023) Log Message: ----------- Fix build scripts to match current directory structure. Modified Paths: -------------- trunk/axsl/axsl-font/build.gradle trunk/axsl/axsl-galley/build.gradle trunk/axsl/axsl-orthography/build.gradle trunk/axsl/buildSrc/src/main/groovy/axsl.library-conventions.gradle Modified: trunk/axsl/axsl-font/build.gradle =================================================================== --- trunk/axsl/axsl-font/build.gradle 2023-08-22 23:28:04 UTC (rev 2710) +++ trunk/axsl/axsl-font/build.gradle 2023-08-23 10:47:45 UTC (rev 2711) @@ -13,7 +13,7 @@ /* We want the DTDs to live as part of the website files, so we need to copy them into the jar file here. */ jar { - from rootProject.projectDir.absolutePath + '/axsl-00-master/doc/web/dtds/0.1/en/', + from rootProject.projectDir.absolutePath + '/axsl-00-dev/doc/web/dtds/0.1/en/', { include 'axsl-font-config.dtd' into '/resources/org/axsl/dtds/' Modified: trunk/axsl/axsl-galley/build.gradle =================================================================== --- trunk/axsl/axsl-galley/build.gradle 2023-08-22 23:28:04 UTC (rev 2710) +++ trunk/axsl/axsl-galley/build.gradle 2023-08-23 10:47:45 UTC (rev 2711) @@ -23,7 +23,7 @@ } task copyAdditionalResources(type: Copy) { - from rootProject.projectDir.absolutePath + '/axsl-00-master/doc/web/dtds/0.1/en/' + from rootProject.projectDir.absolutePath + '/axsl-00-dev/doc/web/dtds/0.1/en/' include 'axsl-area-tree.dtd' into additionalResources.absolutePath + '/resources/org/axsl/dtds/' } Modified: trunk/axsl/axsl-orthography/build.gradle =================================================================== --- trunk/axsl/axsl-orthography/build.gradle 2023-08-22 23:28:04 UTC (rev 2710) +++ trunk/axsl/axsl-orthography/build.gradle 2023-08-23 10:47:45 UTC (rev 2711) @@ -16,7 +16,7 @@ /* We want the DTDs to live as part of the website files, so we need to 
copy them into the jar file here. */ jar { - from rootProject.projectDir.absolutePath + '/axsl-00-master/doc/web/dtds/0.1/en/', + from rootProject.projectDir.absolutePath + '/axsl-00-dev/doc/web/dtds/0.1/en/', { include 'axsl-area-tree.dtd' include 'axsl-dictionary.dtd' Modified: trunk/axsl/buildSrc/src/main/groovy/axsl.library-conventions.gradle =================================================================== --- trunk/axsl/buildSrc/src/main/groovy/axsl.library-conventions.gradle 2023-08-22 23:28:04 UTC (rev 2710) +++ trunk/axsl/buildSrc/src/main/groovy/axsl.library-conventions.gradle 2023-08-23 10:47:45 UTC (rev 2711) @@ -27,7 +27,7 @@ } checkstyle { - configFile = new File(rootProject.projectDir.absolutePath + '/axsl-00-master/config/checkstyle/checkstyle-config.xml') + configFile = new File(rootProject.projectDir.absolutePath + '/axsl-00-dev/config/checkstyle/checkstyle-config.xml') configProperties.put('axsl.root', rootProject.projectDir) toolVersion = versions.checkstyle } This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-22 23:28:07
|
Revision: 2710 http://sourceforge.net/p/axsl/code/2710 Author: victormote Date: 2023-08-22 23:28:04 +0000 (Tue, 22 Aug 2023) Log Message: ----------- 1. Restore axsl-00-dev as a project, to have better access to its resources in IDEs. 2. Fix checkstyle config for new directory name. Modified Paths: -------------- trunk/axsl/axsl-00-dev/config/checkstyle/checkstyle-config.xml trunk/axsl/settings.gradle Modified: trunk/axsl/axsl-00-dev/config/checkstyle/checkstyle-config.xml =================================================================== --- trunk/axsl/axsl-00-dev/config/checkstyle/checkstyle-config.xml 2023-08-22 23:22:54 UTC (rev 2709) +++ trunk/axsl/axsl-00-dev/config/checkstyle/checkstyle-config.xml 2023-08-22 23:28:04 UTC (rev 2710) @@ -13,7 +13,7 @@ <!-- Reference the file with suppression information. --> <module name="SuppressionFilter"> - <property name="file" value="${axsl.root}/axsl-00-master/config/checkstyle/checkstyle-suppressions.xml"/> + <property name="file" value="${axsl.root}/axsl-00-dev/config/checkstyle/checkstyle-suppressions.xml"/> </module> <module name="FileTabCharacter"> @@ -21,7 +21,7 @@ </module> <module name="RegexpHeader"> - <property name="headerFile" value="${axsl.root}/axsl-00-master/config/checkstyle/checkstyle-header-java.txt"/> + <property name="headerFile" value="${axsl.root}/axsl-00-dev/config/checkstyle/checkstyle-header-java.txt"/> <property name="fileExtensions" value="java"/> </module> Modified: trunk/axsl/settings.gradle =================================================================== --- trunk/axsl/settings.gradle 2023-08-22 23:22:54 UTC (rev 2709) +++ trunk/axsl/settings.gradle 2023-08-22 23:28:04 UTC (rev 2710) @@ -1,5 +1,6 @@ rootProject.name = 'axsl' +include 'axsl-00-dev' include 'axsl-areatree' include 'axsl-constants' include 'axsl-content' This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <vic...@us...> - 2023-08-22 23:22:55
|
Revision: 2709 http://sourceforge.net/p/axsl/code/2709 Author: victormote Date: 2023-08-22 23:22:54 +0000 (Tue, 22 Aug 2023) Log Message: ----------- Rename 00-dev to axsl-00-dev in preparation for restoring it as a gradle project. Added Paths: ----------- trunk/axsl/axsl-00-dev/ Removed Paths: ------------- trunk/axsl/00-dev/