Thread: [Libwpd-devel] submitting .wpd files?

[Libwpd-devel] submitting .wpd files?

From: Mark C. <mar...@gm...> - 2011-04-15 17:52:17

Hi;
Would it be at all useful to submit .wpd files that don't open
correctly with the latest version of libwpd?
I work in an organization that makes extensive use of WP 12 and can
get my hands on many such documents.

Re: [Libwpd-devel] submitting .wpd files?

From: Mark C. <mar...@gm...> - 2011-04-18 15:18:49

BTW, I have libwpd 0.9.1 and libwpg 0.2.0 from Arch Linux. Is that new
enough? Writerperfect I'll have to install myself (0.8.0).

Mark

On Mon, Apr 18, 2011 at 11:06 AM, Mark Coolen <mar...@gm...> wrote:
> I'll see what I can do. I'll make sure I have the latest version to
>  test on. I'd like to see our organization use LibreOffice as a
>  replacement for WP, so libwpd et al are an important part of this.
>
>  Mark
>
> On Fri, Apr 15, 2011 at 5:15 PM, Fridrich Strba
> <fri...@bl...> wrote:
>> Hello, Mark,
>>
>> On 15/04/2011 19:51, Mark Coolen wrote:
>>> Would it be at all useful to submit .wpd files that don't open
>>> correctly with the latest version of libwpd?
>>
>> Very useful indeed, provided that they don't open correctly with the
>> master branch of the libwpd/libwpg/writerperfect mix. I fixed some
>> document-loading problems in the last few weeks, so some problems might
>> be duplicates.
>>
>>> I work in an organization that makes extensive use of WP 12 and can
>>> get my hands on many such documents.
>>
>> Very nice! We normally like to add a sample document corresponding to a
>> problem we fixed to our regression testing suite
>> <http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd-regression;a=summary>
>> to avoid the same class of problems biting us ever again. So, if you are
>> submitting a file, it would be nice to specify whether we are legally
>> entitled to commit it there. We like to have the problematic document
>> public in order to lower the bus factor as much as possible.
>>
>> Thanks for your willingness to help.
>>
>> Cheers
>>
>> Fridrich
>>
>> ------------------------------------------------------------------------------
>> Benefiting from Server Virtualization: Beyond Initial Workload
>> Consolidation -- Increasing the use of server virtualization is a top
>> priority.Virtualization can reduce costs, simplify management, and improve
>> application availability and disaster protection. Learn more about boosting
>> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
>> _______________________________________________
>> Libwpd-devel mailing list
>> Lib...@li...
>> https://lists.sourceforge.net/lists/listinfo/libwpd-devel
>>
>
>
>
> --
>  ___________________________
> | Coolen Software Solutions
> | +1.519.652.9378
> | mar...@gm...
> |___________________________
>



-- 
 ___________________________
| Coolen Software Solutions
| +1.519.652.9378
| mar...@gm...
|___________________________

Re: [Libwpd-devel] submitting .wpd files?

From: Fridrich S. <fri...@bl...> - 2011-04-19 06:22:01

Hi, Mark,

On 18/04/11 17:18, Mark Coolen wrote:
> BTW, I have libwpd 0.9.1 and libwpg 0.2.0 from Arch Linux. Is that new
> enough? Writerperfect I'll have to install myself (0.8.0).

OK, a fix landed in libwpd master after 0.9.1 that solves some issues
with loading documents whose prefix is mildly corrupted, in the WP6
parser. It would be good to check whether that fix already solves the
issue before reporting, I guess.

F.

[Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-19 12:58:35

Hello,

Recently I was asked to help someone convert hundreds of WPMac files that include Japanese Kanji, files that were created on old Macs that had the Japanese Language Kits installed.

It seems - I could be wrong - that libwpd doesn't convert the characters in those files. The method I found for converting them was a bit roundabout:

Use a PowerPC Mac that runs OS 10.4 and "Classic" with the Japanese Language Kit installed. Open the WPMac files in WPMac in Classic. Copy the contents of the file to the clipboard. Paste the contents of the file from the Clipboard into OS X's TextEdit or any other unicode-aware Mac application. Save the resulting file as an RTF or DOC file. The resulting file opens correctly in LibreOffice, Pages, Word, etc.

This method obviously requires obsolete hardware and software. I would guess that it would require an enormous amount of effort to support double-byte CJK and other WorldScript-based scripts in libwpd, and that the potential need for it is far too small to justify the effort. But is this something that might someday be possible in the future?

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: William L. <wr...@gm...> - 2011-04-19 13:16:25

On Tue, Apr 19, 2011 at 8:58 AM, Edward Mendelson <em...@co...> wrote:

> Hello,
>
> Recently I was asked to help someone convert hundreds of WPMac files that
> include Japanese Kanji, files that were created on old Macs that had the
> Japanese Language Kits installed.
>
> It seems - I could be wrong - that libwpd doesn't convert the characters in
> those files. The method I found for converting them was a bit roundabout:
> ...
>
> Use a PowerPC Mac that runs OS 10.4 and "Classic" with the Japanese Language
> Kit installed. Open the WPMac files in WPMac in Classic. Copy the contents
> of the file to the clipboard. Paste the contents of the file from the
> Clipboard into OS X's TextEdit or any other unicode-aware Mac application.
> Save the resulting file as an RTF or DOC file. The resulting file opens
> correctly in LibreOffice, Pages, Word, etc.
>
> This method obviously requires obsolete hardware and software. I would
> guess that it would require an enormous amount of effort to support
> double-byte CJK and other WorldScript-based scripts in libwpd, and that the
> potential need for it is far too small to justify the effort. But is this
> something that might someday be possible in the future?
>

Actually, it's not really that difficult. Unless Japanese is dramatically
different from what we've seen so far, all we should need to make this
conversion work is a table mapping from WordPerfect extended characters to
their unicode equivalents. Over the years we've expanded support for
languages from only plain latin to relatively obscure ones like Tibetan
courtesy of mappings submitted by various people.

If you don't have the expertise to create such a mapping yourself, we could
probably derive one from (1) a WP document containing all the characters in
a Japanese script and (2) one converted to RTF/DOC. If you're interested in
producing something like this, let us know!
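
A minimal sketch of the two ideas above, written in Python rather than libwpd's C++: a lookup table keyed by (character set, character index), and a way to derive such a table by pairing the codes read from a WP document with the characters of its RTF/DOC conversion. The names `derive_mapping` and `to_unicode`, the sample codes, and the assumption that both inputs list the same characters in the same order are illustrative only; they are not libwpd's API or real WordPerfect data.

```python
from typing import Dict, Iterable, Tuple

WPChar = Tuple[int, int]  # (character set, index within the set)


def derive_mapping(wp_codes: Iterable[WPChar], converted_text: str) -> Dict[WPChar, str]:
    """Pair each WP code with the character at the same position in the converted text."""
    codes = list(wp_codes)
    if len(codes) != len(converted_text):
        raise ValueError("inputs must contain the same number of characters")
    mapping: Dict[WPChar, str] = {}
    for code, char in zip(codes, converted_text):
        # A given code must always map to the same character.
        if mapping.setdefault(code, char) != char:
            raise ValueError(f"conflicting mapping for {code}")
    return mapping


def to_unicode(mapping: Dict[WPChar, str], code: WPChar, fallback: str = "\ufffd") -> str:
    """Look up one WP code; unknown codes become U+FFFD REPLACEMENT CHARACTER."""
    return mapping.get(code, fallback)


# Hypothetical codes, for illustration only.
sample_codes = [(11, 0), (11, 1), (11, 0)]
table = derive_mapping(sample_codes, "\u3042\u3044\u3042")
```

The conflict check matters in practice: if the two documents drift out of alignment, the same code will soon be paired with two different characters, and the function fails instead of silently producing a wrong table.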

-- 
William Lachance
wr...@gm...

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-19 14:08:17

On 4/19/2011 9:16 AM, William Lachance wrote:
> On Tue, Apr 19, 2011 at 8:58 AM, Edward Mendelson <em...@co...
> <mailto:em...@co...>> wrote:
>
>     Hello,
>
>     Recently I was asked to help someone convert hundreds of WPMac files
>     that include Japanese Kanji, files that were created on old Macs
>     that had the Japanese Language Kits installed.
>
>     It seems - I could be wrong - that libwpd doesn't convert the
>     characters in those files. The method I found for converting them
>     was a bit roundabout:
>     ...
>
>     Use a PowerPC Mac that runs OS 10.4 and "Classic" with the Japanese
>     Language Kit installed. Open the WPMac files in WPMac in Classic.
>     Copy the contents of the file to the clipboard. Paste the contents
>     of the file from the Clipboard into OS X's TextEdit or any other
>     unicode-aware Mac application. Save the resulting file as an RTF or
>     DOC file. The resulting file opens correctly in LibreOffice, Pages,
>     Word, etc.
>
>     This method obviously requires obsolete hardware and software. I
>     would guess that it would require an enormous amount of effort to
>     support double-byte CJK and other WorldScript-based scripts in
>     libwpd, and that the potential need for it is far too small to
>     justify the effort. But is this something that might someday be
>     possible in the future?
>
>
> Actually, it's not really that difficult. Unless Japanese is
> dramatically different from what we've seen so far, all we should need
> to make this conversion work is a table mapping from WordPerfect
> extended characters to their unicode equivalents. Over the years we've
> expanded support for languages from only plain latin to relatively
> obscure ones like Tibetan courtesy of mappings submitted by various people.
>
> If you don't have the expertise to create such a mapping yourself, we
> could probably derive one from (1) a WP document containing all the
> characters in a Japanese script and (2) one converted to RTF/DOC. If
> you're interested in producing something like this, let us know!
>
> --
> William Lachance
> wr...@gm... <mailto:wr...@gm...>

I created a WPMac 3.5e file with a few lines of Japanese kanji. The text 
is nonsense - I simply typed in random characters because I know about 
ten words of Japanese and don't know how to type them. But it should 
give you an idea of how kanji is stored in WPMac files. The file is here:

http://dl.dropbox.com/u/271144/Kanji.wpmac

I'm not expert enough in the WPMac file format to learn anything from 
it, but perhaps it may be useful to someone who knows a lot more than I 
do. When I open it in Writer, it's blank except for a single letter "t".

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-19 14:28:53

Hello, Edward

On 19/04/11 16:07, Edward Mendelson wrote:
> I created a WPMac 3.5e file with a few lines of Japanese kanji. The text
> is nonsense - I simply typed in random characters because I know about
> ten words of Japanese and don't know how to type them. But it should
> give you an idea of how kanji is stored in WPMac files. The file is here:
> http://dl.dropbox.com/u/271144/Kanji.wpmac
> I'm not expert enough in the WPMac file format to learn anything from
> it, but perhaps it may be useful to someone who knows a lot more than I
> do. When I open it in Writer, it's blank except for a single letter "t".

OK, they are all in the C8 function. Now, if only we could find
documentation about the double-byte Mac Script character sets.

F.

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Smokey A. <alq...@ar...> - 2011-04-19 18:16:12

At 4:28 PM +0200 on  4/19/11, Fridrich Strba wrote:

>Hello, Edward
>
>On 19/04/11 16:07, Edward Mendelson wrote:
>>  I created a WPMac 3.5e file with a few lines of Japanese kanji. The text
>>  is nonsense - I simply typed in random characters because I know about
>>  ten words of Japanese and don't know how to type them. But it should
>>  give you an idea of how kanji is stored in WPMac files. The file is here:
>>  http://dl.dropbox.com/u/271144/Kanji.wpmac
>>  I'm not expert enough in the WPMac file format to learn anything from
>>  it, but perhaps it may be useful to someone who knows a lot more than I
>>  do. When I open it in Writer, it's blank except for a single letter "t".
>
>OK, they are all in the C8 function. Now, if only we could find
>documentation about the double-byte Mac Script character sets.

Fridrich, I think you want 
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT (and 
other files in http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/ for 
other Mac encodings).  (Someone else please commit this link to 
memory for me; I knew it existed but couldn't remember where to find 
it anymore :-( )
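
As a rough illustration of how such a mapping file could be consumed, here is a small Python parser. It assumes the layout is tab- or space-separated columns (Mac code in hex, Unicode value in hex, then a `#` comment) and that a Unicode field may be a `+`-joined sequence of code points. The sample lines are written for this example rather than copied from JAPANESE.TXT, and the real files may use further notations that this sketch does not handle.

```python
from typing import Dict


def parse_apple_mapping(text: str) -> Dict[int, str]:
    """Parse 'maccode<TAB>unicode<TAB># comment' lines into {mac_code: unicode_string}."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        mac_code = int(fields[0], 16)
        # Join multi-code-point sequences such as 0x0041+0x0301.
        table[mac_code] = "".join(chr(int(cp, 16)) for cp in fields[1].split("+"))
    return table


# Illustrative sample in the assumed format.
SAMPLE = (
    "# Column 1: Mac code, column 2: Unicode, column 3: comment\n"
    "0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n"
    "0x82A0\t0x3042\t# HIRAGANA LETTER A\n"
)
mac_japanese = parse_apple_mapping(SAMPLE)
```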

I thought I had remembered you telling me years ago that the 
WorldScript stuff was stored in the resource fork and was thus 
near-impossible for libwpd to use; I'm happy to see that appears not 
to be the case here :-) (maybe it was just the single-byte scripts?)

Best,
Smokey
(still here, very busy)
-- 
Smokey Ardisson
alq...@ar...
http://www.ardisson.org/
------------------------------------------
"He is a fool who has forgotten what became of his ancestry
seven generations before him and who does not care what will
become of his progeny seven generations after him."
           --Kazakh Proverb

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-19 23:53:44

Attachments: Kanji.wpmac.html

Hello, Smokey, my old buddy. Nice to know you alive and kicking :)

On 19/04/11 19:40, Smokey Ardisson wrote:
> http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT (and other
> files in http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/ for other Mac
> encodings). (Someone else please commit this link to memory for me; I
> knew it existed but couldn't remember where to find it anymore :-( )

This is exactly what is needed. Thanks for thinking about it. It is very 
helpful.

> I thought I had remembered you telling me years ago that the WorldScript
> stuff was stored in the resource fork and was thus near-impossible for
> libwpd to use; I'm happy to see that appears not to be the case here :-)
> (maybe it was just the single-byte scripts?)

Ah, if I remembered all the stupid things I ever said, I would need a
head the size of a 40-wheeler. There were other things in the resource
fork that we now try to read, like pictures.

Anyway, I had another try at this. After some hours of grep, sed, awk
and even a C++ generator, I added about 5000 lines of conversion tables
to libwpd_internal.cpp, and the document now looks like the attached
one.

Judge for yourselves.
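
For readers curious what a table generator of this kind might look like, here is a hedged Python sketch that turns a code-to-Unicode dictionary into C++ array source text. It is not the generator used for the commit, and the array name, element type and layout are assumptions; the actual tables in libwpd_internal.cpp may be organized differently.

```python
from typing import Dict


def emit_cpp_table(name: str, mapping: Dict[int, int], first: int, last: int) -> str:
    """Emit a dense C++ array covering codes first..last; gaps become U+FFFD."""
    rows = []
    for code in range(first, last + 1):
        value = mapping.get(code, 0xFFFD)
        if value > 0xFFFF:
            # This sketch stores 16-bit values only.
            raise ValueError("code point outside the Basic Multilingual Plane")
        rows.append("0x%04x" % value)
    body = ",\n    ".join(rows)
    return "static const unsigned short %s[] =\n{\n    %s\n};\n" % (name, body)


# Hypothetical table name and a tiny illustrative range.
source = emit_cpp_table("exampleMap", {0x8140: 0x3000, 0x8142: 0x3002}, 0x8140, 0x8142)
```

A dense array indexed by `code - first` keeps the C++ lookup trivial, at the cost of filler entries for unassigned codes.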

F.

Re: [Libwpd-devel] submitting .wpd files?

From: Fridrich S. <fri...@bl...> - 2011-04-15 21:15:50

Hello, Mark,

On 15/04/2011 19:51, Mark Coolen wrote:
> Would it be at all useful to submit .wpd files that don't open
> correctly with the latest version of libwpd?

Very useful indeed, provided that they don't open correctly with the
master branch of the libwpd/libwpg/writerperfect mix. I fixed some
document-loading problems in the last few weeks, so some problems might
be duplicates.

> I work in an organization that makes extensive use of WP 12 and can
> get my hands on many such documents.

Very nice! We normally like to add a sample document corresponding to a 
problem we fixed to our regression testing suite 
<http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd-regression;a=summary> 
to avoid the same class of problems biting us ever again. So, if you are 
submitting a file, it would be nice to specify whether we are legally 
entitled to commit it there. We like to have the problematic document 
public in order to lower the bus factor as much as possible.

Thanks for your willingness to help.

Cheers

Fridrich

Re: [Libwpd-devel] submitting .wpd files?

From: Mark a. N. <the...@gm...> - 2011-04-18 12:38:42

I'll see what I can do. I'll make sure I have the latest version to
test on. I'd like to see our organization use LibreOffice as a
replacement for WP, so libwpd et al are an important part of this.

Mark

On Fri, Apr 15, 2011 at 5:15 PM, Fridrich Strba
<fri...@bl...> wrote:
> Hello, Mark,
>
> On 15/04/2011 19:51, Mark Coolen wrote:
>> Would it be at all useful to submit .wpd files that don't open
>> correctly with the latest version of libwpd?
>
> Very useful indeed, provided that they don't open correctly with the
> master branch of the libwpd/libwpg/writerperfect mix. I fixed some
> document-loading problems in the last few weeks, so some problems might
> be duplicates.
>
>> I work in an organization that makes extensive use of WP 12 and can
>> get my hands on many such documents.
>
> Very nice! We normally like to add a sample document corresponding to a
> problem we fixed to our regression testing suite
> <http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd-regression;a=summary>
> to avoid the same class of problems biting us ever again. So, if you are
> submitting a file, it would be nice to specify whether we are legally
> entitled to commit it there. We like to have the problematic document
> public in order to lower the bus factor as much as possible.
>
> Thanks for your willingness to help.
>
> Cheers
>
> Fridrich
>

Re: [Libwpd-devel] submitting .wpd files?

From: Mark C. <mar...@gm...> - 2011-04-18 15:06:58

I'll see what I can do. I'll make sure I have the latest version to
 test on. I'd like to see our organization use LibreOffice as a
 replacement for WP, so libwpd et al are an important part of this.

 Mark

On Fri, Apr 15, 2011 at 5:15 PM, Fridrich Strba
<fri...@bl...> wrote:
> Hello, Mark,
>
> On 15/04/2011 19:51, Mark Coolen wrote:
>> Would it be at all useful to submit .wpd files that don't open
>> correctly with the latest version of libwpd?
>
> Very useful indeed, provided that they don't open correctly with the
> master branch of the libwpd/libwpg/writerperfect mix. I fixed some
> document-loading problems in the last few weeks, so some problems might
> be duplicates.
>
>> I work in an organization that makes extensive use of WP 12 and can
>> get my hands on many such documents.
>
> Very nice! We normally like to add a sample document corresponding to a
> problem we fixed to our regression testing suite
> <http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd-regression;a=summary>
> to avoid the same class of problems biting us ever again. So, if you are
> submitting a file, it would be nice to specify whether we are legally
> entitled to commit it there. We like to have the problematic document
> public in order to lower the bus factor as much as possible.
>
> Thanks for your willingness to help.
>
> Cheers
>
> Fridrich
>



-- 
 ___________________________
| Coolen Software Solutions
| +1.519.652.9378
| mar...@gm...
|___________________________

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-20 00:46:36

On 4/19/2011 7:53 PM, Fridrich Strba wrote:
>
> But, just to tell you all that I had another try on this. And after some
> hours of grep, sed, awk and even a c++ generator, I added about 5000
> lines of conversion tables to libwpd_internal.cpp and the document is
> now looking like the attached one.
>
> Judge by yourselves.

Fridrich,

Amazing!! Deeply impressed!! May I ask the person I've been working with 
for a few test files that I can send along?

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-20 00:48:23

On 4/19/2011 7:53 PM, Fridrich Strba wrote:
> But, just to tell you all that I had another try on this. And after some
> hours of grep, sed, awk and even a c++ generator, I added about 5000
> lines of conversion tables to libwpd_internal.cpp and the document is
> now looking like the attached one.

Not to make your life more difficult, but would it now be possible to 
add the CHINSIMP and CHINTRAD and KOREAN to the tables? They're all here:

http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/

Plus a few others!

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-20 07:41:43

Edward,

On 20/04/2011 02:48, Edward Mendelson wrote:
> Not to make your life more difficult, but would it now be possible to
> add the CHINSIMP and CHINTRAD and KOREAN to the tables?

They are all in already. All CJK two-byte codes are inside
http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd;a=commit;h=1814c4270a1fe13c6ff569985b62349d1707b6dd

;)

f.

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Smokey A. <alq...@ar...> - 2011-04-21 06:32:15

At 1:53 AM +0200 on 4/20/11, Fridrich Strba wrote:

>On 19/04/11 19:40, Smokey Ardisson wrote:
>>http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT (and other
>>files in http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/ for other Mac
>>encodings). (Someone else please commit this link to memory for me; I
>>knew it existed but couldn't remember where to find it anymore :-( )
>
>This is exactly what is needed. Thanks for thinking about it. It is 
>very helpful.

:-)  I'm glad I was able to remember/find it again.

>But, just to tell you all that I had another try on this. And after 
>some hours of grep, sed, awk and even a c++ generator, I added about 
>5000 lines of conversion tables to libwpd_internal.cpp and the 
>document is now looking like the attached one.
>
>Judge by yourselves.

Amazing work again!

At 9:41 AM +0200 on 4/20/11, Fridrich Strba wrote:

>Edward,
>On 20/04/2011 02:48, Edward Mendelson wrote:
>>  Not to make your life more difficult, but would it now be possible to
>>  add the CHINSIMP and CHINTRAD and KOREAN to the tables?
>
>They are all in already. All CJK two-byte codes are inside
>http://libwpd.git.sourceforge.net/git/gitweb.cgi?p=libwpd/libwpd;a=commit;h=1814c4270a1fe13c6ff569985b62349d1707b6dd

Just for my own information, your recent commits have only been for 
the double-byte languages; there's still no support for the 
single-byte ones in WP-Mac files, right?

Enjoy the rest of your vacation!

Smokey
-- 
Smokey Ardisson
alq...@ar...
http://www.ardisson.org/
------------------------------------------
"He is a fool who has forgotten what became of his ancestry
seven generations before him and who does not care what will
become of his progeny seven generations after him."
           --Kazakh Proverb

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-21 09:49:27

Smokey,

On 21/04/2011 08:20, Smokey Ardisson wrote:
> Just for my own information, your recent commits have only been for the
> double-byte languages; there's still no support for the single-byte ones
> in WP-Mac files, right?

I have a problem with this at the moment, because the single-byte codes
are basically in the same range for each charset, and I don't know how
they are placed in the script function. It is possible that each group
carries one byte indicating which group it is, followed by the code,
but I am really not sure. If that were the case, we could probably
convert them.
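
To make the ambiguity concrete, here is a small Python sketch of script-dependent decoding under the hypothesis described above, namely that a group byte identifies the script and is followed by the character code. The script IDs used (0 for Roman, 6 for Greek, 7 for Cyrillic) are the classic Mac OS Script Manager values; whether WPMac stores them this way is exactly the open question, so the function and its layout are assumptions.

```python
# Classic Mac OS script codes mapped to Python's built-in codecs.
# Whether WPMac files carry these IDs in this form is an assumption.
SCRIPT_CODECS = {
    0: "mac_roman",     # smRoman
    6: "mac_greek",     # smGreek
    7: "mac_cyrillic",  # smCyrillic
}


def decode_single_byte(char_code: int, script_id: int) -> str:
    """Decode one single-byte Mac character using the codec chosen by script_id."""
    codec = SCRIPT_CODECS.get(script_id)
    if codec is None:
        raise LookupError("no codec known for script id %d" % script_id)
    return bytes([char_code]).decode(codec)


# The same byte means different things depending on the script.
roman = decode_single_byte(0x80, 0)     # LATIN CAPITAL LETTER A WITH DIAERESIS
cyrillic = decode_single_byte(0x80, 7)  # CYRILLIC CAPITAL LETTER A
```

Without the script ID, the byte 0x80 alone cannot be converted reliably, which is the problem with the single-byte scripts.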

For the other "extended character function", we do the following. We
first read the one-byte Mac character code and interpret it using the
MacRoman table. If this character is not valid (there are special codes
for that), we read the two bytes of the WP5 charset/char pair and
interpret them the same way as we do for WP5 documents.
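
The fallback order just described can be sketched roughly as follows in Python. The sentinel value in `INVALID_MAC_CODES`, the function name, and the contents of the WP5 table are hypothetical placeholders; the message does not say which "special codes" mark an invalid Mac character, and libwpd's real implementation is in C++.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sentinel: the actual "special codes" are not given in this thread.
INVALID_MAC_CODES = frozenset({0xFF})


def decode_extended(mac_code: int,
                    wp5_pair: Tuple[int, int],
                    wp5_table: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Prefer the MacRoman byte; fall back to the WP5 (charset, char) pair."""
    if mac_code not in INVALID_MAC_CODES:
        return bytes([mac_code]).decode("mac_roman")
    # Mac code flagged as invalid: use the WP5 mapping, or None if unmapped.
    return wp5_table.get(wp5_pair)


# Illustrative WP5 table with a single made-up entry.
example_wp5 = {(8, 0): "\u0391"}
from_mac = decode_extended(0x8E, (0, 0), example_wp5)
from_wp5 = decode_extended(0xFF, (8, 0), example_wp5)
```

Swapping the two branches gives the opposite priority, WP5 pair first and Mac character second, which changes the result only when both sources yield a character and they disagree.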

I originally implemented this using only the WP5 part, but if I recall
correctly, we had problems with particular Mac characters that were
misinterpreted (probably a bug in WP3). So we reverted to giving
priority to the Mac character. It is nevertheless conceivable, although
I don't have empirical evidence for it, that if you write a document
using another system language, the Mac character will come from a
different set. There, we cannot do much unless we somehow find out which
charset those characters are from. That is also why I was saying that we
could use some additional information about what the WorldScript
encoding looks like.

So, unless we know which encoding the document uses for the Mac
character part of the 0xC0 functions, we are a bit stuck. I almost feel
like converting the WP5 pair first and using the Mac char only if the
WP5 pair gives no result, since that might cover more cases correctly
even though it might mess up some 2-3 acute vs. grave accents. But the
proper way would be to find out from the docs which encoding is used.
Anybody volunteering?

F.

-- 
Please avoid sending me Word, Excel or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-21 13:46:33

On 21 Apr 2011, at 9:37 AM, Fridrich Strba wrote:

> I see in a manual list of script IDs ranging from 0x00 for Roman to 0x20 
> for Symbol. Now I would like to see how an eventual arabic text looks in 
> a wp document. This could make us somehow understand how to extract this 
> information from the 0xC8 function.

I can try to enter some arbitrary Arabic characters into a WPMac file later today and post it to the group. Is that what would be useful?

Or do you want WPDOS 5.1 Arabic? That's ordinary character set 13/14 so I assume you already handle it.

Some sample Arabic WPDOS files that I use for testing are here:

http://www.un.org/popin/unpopcom/32ndsess/gass.htm

The PDF files made from the WP files (on the same page) are of course NOT Unicode-based Arabic, but simply arbitrary symbols.

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-21 14:56:33

Thanks, Edward,

On 21/04/11 15:46, Edward Mendelson wrote:
> Some sample Arabic WPDOS files that I use for testing are here:
> http://www.un.org/popin/unpopcom/32ndsess/gass.htm

Here I realized that we were not doing any conversion of Arabic for WP5.
Then I found this program
http://www.lionscribe.com/downloads/files/wp2rtf/wp2rtf.zip which has
some test files for Arabic and for Hebrew. I "implemented" Arabic for
WP5, though not properly yet.

Smokey, could you go through the rtf files, compare them with what
wpd2text makes out of the *WP files, and point out which glyphs are not
correct? Possibly map them to Unicode as well (but I can do it when I
have time)?

Also, could someone who is good at Hebrew please map the missing Hebrew
characters for me? I might try, but I am not sure how good it will be
because I will basically be comparing two pictures.

F.

-- 
Please avoid sending me Word, Excel or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-21 15:14:29

On 4/21/2011 10:56 AM, Fridrich Strba wrote:
>
> Also, someone that is good at hebrew, could you please map the missing
> hebrew characters for me? I might try, but not sure how good it will be
> because I will basically compare two pictures.

I don't know any Hebrew at all, but the character set was small enough 
for me to map them to their Unicode equivalents.

If you go to this page:

http://www.columbia.edu/~em36/wpdos/arabicandhebrew.html#hebrewpspdf

and download my WP51HEPS.EXE you will find an .ALL file named WP51HEPS.ALL.

There are character maps in that file that map the WP Hebrew characters 
to the glyphs in the PostScript fonts also included in the file. As 
you'll see, these are mapped either to their Adobe "afii" numbers 
(easily converted to unicode) or their unicode numbers.

There are, if I remember correctly, three characters that aren't 
directly matched. A Hebrew scholar told me that three characters in the 
WP Hebrew set are "ghosts" - that is, they were combinations of a 
consonant and vowel markings that in fact never occur in any real Hebrew 
text. I don't remember what they are, but I think I tried to find some 
way to create them by combining real unicode/Adobe characters, just to 
be complete.

Please let me know if I can clarify this further. It's been a few years 
since I did this.

I made a start on Arabic, but the task was too complex and there seemed 
to be too few users who cared.

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-21 15:53:51

On 4/21/2011 10:56 AM, Fridrich Strba wrote:
> Thanks, Edward,
>
> On 21/04/11 15:46, Edward Mendelson wrote:
>> Some sample Arabic WPDOS files that I use for testing are here:
>> http://www.un.org/popin/unpopcom/32ndsess/gass.htm
>
> Here I realized that we were not doing any conversion of arabic for WP5.
> Now. I found this program here
> http://www.lionscribe.com/downloads/files/wp2rtf/wp2rtf.zip that has
> some test files for arabic and for hebrew. I "implemented" arabic for
> WP5. Nevertheless, not properly.
>
> Smokey, could you go through the rtf files compared what wpd2text makes
> out of the *WP files and point me which glyphs are not correct?
> Eventually map them to unicode (but I can do it when I have time)?
>
> Also, someone that is good at hebrew, could you please map the missing
> hebrew characters for me? I might try, but not sure how good it will be
> because I will basically compare two pictures.
>

I have the LionScribe utility, and used it to convert the CHARACTR.DOC 
from Arabic WP into RTF format. I've posted it here:

http://dl.dropbox.com/u/271144/CHARACTR.DOC.rtf

I have no way of knowing how accurate the results are, but I'm sure 
they're better than anything anyone else has tried.

Edward Mendelson
Contributing Editor
PC Magazine

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Smokey A. <alq...@ar...> - 2011-04-23 06:19:10

At 11:23 PM +0200 on 4/21/11, Fridrich Strba wrote:

>On 21/04/11 21:52, Smokey Ardisson wrote:
>>  Do we have reason to believe that the mapping in WP5 is different from
>>  the ones I contributed for WP6?
>
>Yes, we have a strong reason to believe that not all are the same. If
>you take the arabic test in the zip I pointed to and convert with a
>fresh checkout of libwpd to html, you will see remarkable differences.
>The advantage though is that that test document is actually having the
>names of the characters near to them, which will make the janitorial
>work a bit easier.

I can't believe that a company would move around blocks of characters 
from one version of the software to the next, especially when those 
characters' codes are the canonical ways of identifying them :-P 
:sigh:  However, there were not too many differences/corrections 
between what's in libwpd_internal.cpp right now for set 13.

Can someone (Edward?) extract the full character sets 13 and 14 from 
"TEST.WP" from the wp2rtf zip and either run those through wp2rtf or 
generate PDFs that show the Arabic characters for me?  The "TEST_ARA" 
pair of documents didn't include the first dozen codepoints in set 13 
and don't include any of set 14, so I don't have a visual reference 
:-(

I have set 13 all fixed (with the exception of those first dozen that 
I don't know what they look like and which have useless names), 
within the confines of what's available in Unicode and with the 
limitation of a single-codepoint-to-single-codepoint mapping (wp2rtf 
produces more "accurate" mappings of some fancy WP codepoints that 
don't have single matching Unicode glyphs by using two characters).

I think new additions to Unicode since version 4.1 (when I did the 
WP6 mappings) will let us successfully map some of the additional 
random diacritical marks WP used to Unicode; if so, I can also fix 
the old WP6 mappings for those.

>>  Looking back through my files from that era, it looks like we ended up
>>  punting on "true" conversion for WP-Arabic and ended up mapping to
>>  Unicode presentation forms (except for the "stand-alone" forms of the
>>  letters, which got mapped to the normal, combining Unicode characters),
>>  so that the result was reverse-ordered, unconnected text (I think you
>>  were going to use Fribidi to try and reorder, but I remember vaguely
>>  that that effort had some problems and you intended to fix things
>>  elsewhere in another manner).  At some point in the future, we might
>>  want to revisit that and map all of the WP-Arabic codepoints to normal,
>>  combining forms where possible for WP5/WP6.
>
>Yeah, I tried to do that, but it is a bit too complicated and the
>reverse bidi algorithm is not even defined. Basically, you would have to
>have two marks like OOXML has, to tell that this and this span is part
>of the same run, because if not, you will not have a way to reorder
>spans with different character properties (bold, italic) but part of the
>same phrase. It was a huge mess to do and I really did not have the
>courage to dive into it.

Just pretend I never said that paragraph you replied to; I wasn't 
completely awake when I wrote that and I had forgotten the key bit of 
the problem: non-Mac WP required you to enter characters 
backwards/LTR to begin with (I was thinking for some reason that the 
characters were in the correct order in the file and would work 
properly in a bidi-aware word processor if only we'd mapped to the 
normal codepoints) :-P

Smokey
-- 
Smokey Ardisson
alq...@ar...
http://www.ardisson.org/
------------------------------------------
"He is a fool who has forgotten what became of his ancestry
seven generations before him and who does not care what will
become of his progeny seven generations after him."
           --Kazakh Proverb

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-23 11:23:42

On 23/04/11 08:18, Smokey Ardisson wrote:
> I can't believe that a company would move around blocks of characters 
> from one version of the software to the next, especially when those 
> characters' codes are the canonical ways of identifying them :-P 
> :sigh:  However, there were not too many differences/corrections 
> between what's in libwpd_internal.cpp right now for set 13.

Yeah, but I did see differences, and I noticed that in each one of them
they changed some characters.

> Can someone (Edward?) extract the full character sets 13 and 14 from 
> "TEST.WP" from the wp2rtf zip and either run those through wp2rtf or 
> generate PDFs that show the Arabic characters for me?  The "TEST_ARA" 
> pair of documents didn't include the first dozen codepoints in set 13 
> and don't include any of set 14, so I don't have a visual reference 
> :-(

This is what Edward posted some days ago:
http://dl.dropbox.com/u/271144/CHARACTR.DOC.rtf
It should have the visual references.

> I have set 13 all fixed (with the exception of those first dozen that 
> I don't know what they look like and which have useless names), 
> within the confines of what's available in Unicode and with the 
> limitation of a single-codepoint-to-single-codepoint mapping (wp2rtf 
> produces more "accurate" mappings of some fancy WP codepoints that 
> don't have single matching Unicode glyphs by using two characters).

It would actually be good to record somewhere the exact WP characters
(charset, char number) and their multi-character mappings. I will then do
what I do for the double-byte scripts: handle those where a
single-codepoint-to-single-codepoint mapping works, and add special
handling for those that need to be mapped using two or more chars. In the
single-codepoint table I will mark the latter as having a 0x0000
conversion, which will be treated as a special case. Nevertheless, I will
do this later.
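
To make the idea concrete, here is a rough sketch of a single-codepoint table with a 0x0000 marker that redirects to a side table of multi-character sequences. It is written in Python only for brevity and is not libwpd's actual code; the names `SINGLE_MAP`, `MULTI_MAP` and `map_wp_char`, and the (charset, char number) entries, are placeholders invented for illustration and are not real WP mappings.

```python
# Hypothetical sketch: table names and entries are placeholders,
# not the real contents of libwpd_internal.cpp.
MULTI_CHAR_MARKER = 0x0000  # "look in the multi-character table instead"

# (charset, char number) -> single Unicode codepoint
SINGLE_MAP = {
    (13, 0): 0x0628,             # placeholder entry: ARABIC LETTER BEH
    (13, 1): MULTI_CHAR_MARKER,  # placeholder entry: needs a sequence
}

# (charset, char number) -> sequence of Unicode codepoints
MULTI_MAP = {
    (13, 1): (0x0628, 0x0651),   # placeholder: BEH followed by SHADDA
}


def map_wp_char(charset, char):
    """Return the Unicode string for a WP character, or "" if unmapped."""
    code = SINGLE_MAP.get((charset, char))
    if code is None:
        return ""  # unknown character; returning "" is an arbitrary choice
    if code == MULTI_CHAR_MARKER:
        # Special case: the 1:1 table cannot express this character.
        return "".join(chr(c) for c in MULTI_MAP[(charset, char)])
    return chr(code)
```

With this layout the common 1:1 case stays a plain table lookup, and only the marked entries pay for the second lookup.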

> I think new additions to Unicode since version 4.1 (when I did the 
> WP6 mappings) will let us successfully map some of the additional 
> random diacritical marks WP used to Unicode; if so, I can also fix 
> the old WP6 mappings for those.

Feel free to fix whatever you want. Patches are always welcome :)

Cheers

F.

-- 
Please avoid sending me Word, Excel or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Edward M. <em...@co...> - 2011-04-23 21:01:21

On 23 Apr 2011, at 4:42 PM, Edward Mendelson wrote:

> 
> On 4/23/2011 4:19 PM, Fridrich Strba wrote:
>> Edward,
>> 
>> 
>> On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote:
>>> Also potentially useful: The character map document that shipped with 6.x for DOS is here:
>>> http://dl.dropbox.com/u/271144/CHARACT6.DOC
>>> It includes all 14 sets.
>> 
>> Thanks for this one. I ran it through wpd2text and it might be that we
>> actually do a good job here. Now, I could use some help here. Could you
>> people simply run those characters through wpd2text and see whether
>> all glyphs in all charsets are correctly mapped? If you find an error, just
>> note which charset and char number and what would be the correct Unicode
>> mapping. For characters that would correspond to a sequence of 2 or more
>> Unicode characters, please write that down, but give me also a close
>> approximation as a 1-1 mapping.
>> 
>> 
>>> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on.
>> 
>> The wp2rtf zip file contains a TEST.WP file that has all the charsets in
>> it. If you have the visual representation, just run it through wpd2text
>> and compare. Again, I would appreciate information about any wrongly
>> mapped glyphs.
>> 
>> BTW: I hope I actually fixed (maybe apart from about 10 chars) the WP5.1
>> Hebrew map yesterday. Please check.
>> 
>> Thanks for helping with this
> 
> Hello Fridrich,
> 
> I am happy to help. Let me make certain that I know what you are asking for.
> 
> 1. You want me to test the WPDOS 6.x character map file by running it 
> through wpd2text, and compare the output from wpd2text with the original 
> file, and report any changes.
> 
> 2. The TEST.WP file is (except for the first line) the same as the 
> CHARMAP.TST file that shipped with WP 5.1 Arabic and Hebrew. It shows a 
> very different character set from the WPDOS 6.x character map file, and 
> it is (I believe) in WP5 format. You want me to run it also through 
> wpd2text and compare the output with the original, and report any changes.
> 
> I will be away much of the evening (New York time) but will try to do 
> this some time this weekend.
> 
> Edward
> 

Just to clarify again:

I've run both files through wpd2text, and the results did not seem useful at all (see the linked file). However, if you want me to run them through wpd2odt instead, those results were very useful indeed. I'll check them later.

See the results here:

http://dl.dropbox.com/u/271144/CharTests.zip

And wpd2html produced some very impressive-looking results:

http://dl.dropbox.com/u/271144/testwp5.html

http://dl.dropbox.com/u/271144/character6.html

As I said, I'll have to check these later this weekend.

Edward

Re: [Libwpd-devel] WPMac files with WorldScript fonts?

From: Fridrich S. <fri...@bl...> - 2011-04-24 04:55:16

Hello, Edward,

On 23/04/2011 23:01, Edward Mendelson wrote:
> I've run both files through wpd2text - and the results did not seem useful at all (see the linked file). However, if you want me to run them through wpd2odt, then the results were very useful indeed. I'll check them later.
> See the results here:
> http://dl.dropbox.com/u/271144/CharTests.zip

Whichever workflow works best for you is the one to use :)

> And wpd2html produced some very impressive-looking results:
> http://dl.dropbox.com/u/271144/testwp5.html
> http://dl.dropbox.com/u/271144/character6.html

Thanks! It is the result of several nights of diving into the Unicode 
charts. First it was Ariya, then me with Smokey. I am so happy that it 
works so far.

> As I said, I'll have to check these later this weekend.

Ok, just to clarify what would be good:

1) Check what the file should look like against what it actually looks 
like. When you find a difference in glyphs, please note the WP charset 
and WP char number of the offending character. In case you want to go 
further, check the Unicode charts, both visually and by character name, 
and provide the right Unicode character. In case there is no possibility 
of a direct mapping (i.e. the character is composed of a combining 
diacritic and a base character), I would gladly have the multi-character 
mapping too, but also some close approximation (the character without 
diacritics, for instance) as a 1:1 mapping. I will extend the mechanism 
so that we can handle sequences of characters as results, but for the 
time being, I first want to see whether it is worth it.
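
As an illustration of what such a report could contain, here is a small Python sketch that records one mismatch with both the exact sequence and the 1:1 fallback, and prints the Unicode names so they can be checked against the charts. The `MappingReport` class and its field names are hypothetical, and the sample values are placeholders rather than real WP mappings; only the standard `unicodedata` module is assumed.

```python
import unicodedata
from dataclasses import dataclass


def _names(text):
    """List each character as 'U+XXXX NAME', joined with ' + '."""
    return " + ".join(
        f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text
    )


@dataclass
class MappingReport:
    """One wrongly mapped WP character (hypothetical report format)."""
    charset: int         # WP character set number
    char: int            # WP character number within the set
    exact: str           # best mapping, one or more Unicode characters
    approximation: str   # closest single-character fallback

    def __post_init__(self):
        # The fallback has to fit the existing 1:1 table.
        if len(self.approximation) != 1:
            raise ValueError("approximation must be exactly one character")

    def as_line(self):
        return (
            f"set {self.charset}, char {self.char}: "
            f"exact = {_names(self.exact)}; "
            f"1:1 = {_names(self.approximation)}"
        )
```

For example, `MappingReport(13, 1, "\u0628\u0651", "\u0628").as_line()` gives a single line naming both the two-character sequence and the single-character approximation.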

Thanks for your work.

Fridrich
