From: Smokey A. <alq...@ar...> - 2011-06-06 22:59:29
|
Hi Fridrich, Sorry you had to re-do the WP5-Arabic13 mapping fixes since I never got my patch to you in time (I was planning on fixing 13 and 14 both and then sending the patch, but I got pulled away on other things before I got very far into 14) :-( I have one small patch for the new composed characters (very nice!), though. I think we should swap the order of the lam-alif and the fathatan/faux-wasla in those pairs so that the latter display over/to the left of the alif part of the lam-alif, instead of before (to the right of) the entire lam-alif in wpd2text's output. Swapping the order appears to make no difference in browser rendering of the output of wpd2html (I checked in a recent release of both Gecko and WebKit), nor does it appear to make a difference in rendering of the output of wpd2odt. (This also matches the output of wp2rtf for lam-alif with fathatan.) Finally, this puts the characters of this "character group" into the file in the same order as if you were to physically type the "character group": lam-alif, then fathatan. It looks like a "stand-alone" wasla glyph didn't make the new pedagogical glyphs introduced in Unicode 6 (which is sad, since someone apparently wrote a proposal for that all the way back in 2002: http://lists.arabeyes.org/archives/general/2002/January/msg00000.html). I don't have any better ideas for a glyph than the circumflex that wp2rtf used; that at least will let people see and uniquely search for any cases they might have used that (or the lam-alif-wasla characters), but hopefully those are rare enough to never see real-world usage. Also, gcc 4.0.1 over here on Mac OS X 10.5.8 is unhappy about your comparison between signed and unsigned in the WP42DefineColumnsGroup.cpp and the build fails, so attached is the patch I used to fix that. Smokey |
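The ordering point above (base character first, then the mark that sits on it) can be sketched as a small check. This is only an illustration: the code points below are assumptions chosen for the example (U+FEFB for an isolated lam-alif ligature, U+064B for fathatan); the message does not say which code points libwpd actually emits for these composed characters.

```python
import unicodedata

# Assumed code points, for illustration only (not taken from libwpd's tables).
LAM_ALEF = "\uFEFB"   # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
FATHATAN = "\u064B"   # ARABIC FATHATAN, a combining mark


def marks_follow_a_base(seq: str) -> bool:
    """Return True if every combining mark in seq has a base character before it."""
    seen_base = False
    for ch in seq:
        if unicodedata.category(ch).startswith("M"):
            # A mark with nothing before it has no base to attach to.
            if not seen_base:
                return False
        else:
            seen_base = True
    return True


old_order = FATHATAN + LAM_ALEF   # mark emitted before the lam-alif
new_order = LAM_ALEF + FATHATAN   # lam-alif first, as when typing the group

print(marks_follow_a_base(old_order), marks_follow_a_base(new_order))
```

The check only looks at ordering; how a renderer positions the mark over the alif part is up to the font and shaping engine.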
From: Edward M. <em...@co...> - 2011-06-06 22:29:20
|
On 6/6/2011 3:44 PM, Edward Mendelson wrote: > I've posted two files that shipped with WPMac 3.5 Enhanced that cause a general error when I try to open them in LibreOffice 3.4.0 on the Mac. I haven't tested them under Windows or Linux. > > http://dl.dropbox.com/u/271144/ProblemWPFiles.zip > > I hope these are useful to someone! > > Thanks... > > It turns out that I only needed to add a .WPD extension to the filenames, and they opened perfectly. Is it possible that a future version of LibreOffice might check unknown filetypes for the WordPerfect file "signature" to prevent this apparent problem? Edward Mendelson Contributing Editor PC Magazine |
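A minimal sketch of the kind of signature check suggested above, assuming the file carries the usual WordPerfect prefix that starts with the four bytes FF 57 50 43 (0xFF followed by "WPC"). The function names are hypothetical, this is not how LibreOffice's type detection is implemented, and formats without that prefix (for example WP 4.2) would need other heuristics.

```python
# First four bytes of a WordPerfect file that carries the standard prefix.
WP_MAGIC = b"\xffWPC"


def has_wp_signature(head: bytes) -> bool:
    """Return True if the given leading bytes start with the WordPerfect magic."""
    return head[:4] == WP_MAGIC


def looks_like_wordperfect(path: str) -> bool:
    """Check a file on disk by content, ignoring its name and extension."""
    with open(path, "rb") as fh:
        return has_wp_signature(fh.read(4))
```

A check like this depends only on the file's content, so a file without a .WPD extension would still be recognised.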
From: Edward M. <EM...@co...> - 2011-06-06 19:44:30
|
I've posted two files that shipped with WPMac 3.5 Enhanced that cause a general error when I try to open them in LibreOffice 3.4.0 on the Mac. I haven't tested them under Windows or Linux. http://dl.dropbox.com/u/271144/ProblemWPFiles.zip I hope these are useful to someone! Thanks... Edward Mendelson Contributing Editor PC Magazine |
From: Edward M. <em...@co...> - 2011-05-15 19:40:57
|
Please ignore my previous message about unusable output from wpd2odt under OS X. The problem was user-stupidity. I replaced wpd2html with wpd2odt in an AppleScript, but I forgot that wpd2odt does not require ">" when creating an output file, as wpd2html does. So instead of writing a converted ODT file to disk, I was writing something different - which was whatever wpd2odt sends to standard output when used with a redirection switch. My apologies for wasting time and bandwidth. Edward Mendelson |
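The difference described above can be sketched in a small wrapper: wpd2html writes to standard output, so the caller redirects it, while wpd2odt is given the output file name. The helper names are hypothetical, and the argument order for wpd2odt (input file, then output file) is an assumption worth checking against the tool's own usage message.

```python
import subprocess


def html_command(src: str) -> list:
    """wpd2html prints the converted document on standard output."""
    return ["wpd2html", src]


def odt_command(src: str, dst: str) -> list:
    """wpd2odt takes the output file as an argument (assumed order: input, output)."""
    return ["wpd2odt", src, dst]


def convert_to_html(src: str, dst: str) -> None:
    # Equivalent of "wpd2html src > dst" in a shell.
    with open(dst, "wb") as out:
        subprocess.run(html_command(src), stdout=out, check=True)


def convert_to_odt(src: str, dst: str) -> None:
    # No redirection here: the tool writes dst itself.
    subprocess.run(odt_command(src, dst), check=True)
```

Keeping the two command builders separate makes it harder to swap one tool for the other and carry the redirection along by accident.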
From: Edward M. <EM...@co...> - 2011-05-15 17:18:39
|
Here's a curious problem that seems to have developed only in very recent versions of libwpd, though I'm not certain which ones. I've built wpd2odt from the latest source under OS X. When I use wpd2odt to convert any WP file, the resulting ODT file opens correctly in LibreOffice, but it opens as raw text in OS X's TextEdit. This problem did not occur with older versions of wpd2odt (as recently as a few weeks ago). The same problem occurs when I use OS X's textutil to convert the ODT file to rtf format; presumably OS X uses the same code to display the ODT file in TextEdit that it uses for textutil. I've put together a ZIP file with samples: http://dl.dropbox.com/u/271144/OSXproblemODT.zip Now, I understand that libwpd can't fix problems in OS X. But something seems to have changed recently in the way wpd2odt works that causes this problem in TextEdit and textutil. I can't guess what it might be, but I will be happy to test any files that anyone might want to send me. Thank you all. Edward Mendelson |
From: Edward M. <em...@co...> - 2011-05-05 19:31:07
|
Apologies for bothering the list with another beginner's question. After getting advice here that let me build a static wpd2odt under OS X, may I ask for advice on building it under Windows? I use Visual C++ 2008, and can easily build libwpd simply by using the defaults. But when I try to build writerperfect, it complains: Cannot open include file: 'libwpd/libwpd.h' I've tried various ways of modifying the build environment, but since I don't know what I'm doing, I haven't succeeded. Any help would be gratefully received. Thank you. Edward Mendelson |
From: Edward M. <em...@co...> - 2011-04-27 14:28:48
|
On 4/27/2011 9:31 AM, Edward Mendelson wrote: > > libwpd (through writerperfect) correctly decrypts WP 5.x password-protected files. But, at least in my limited tests, it doesn't recognize password-protected WP6.2 files as valid WP files. Is this the way it works at this stage of development, or am I doing something wrong? > > File attached is encrypted with the password Apologies for wasting bandwidth with my previous message. The November 2010 news bulletin clearly states that the latest version includes: Conversion of password protected WP1, WP3, WP42 and WP5 documents. That answers my question. (And tells me that I will need to add a routine to the AppleScript wrapper I'm writing for wpd2odt that will detect whether a password-protected file is a supported <= version 5 document or a non-supported >= version 6 document - an enjoyable challenge!) |
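One way such a detection routine might look, as a rough sketch only. It assumes the commonly described prefix layout: magic bytes FF "WPC" at offset 0, file type at byte 9 (0x0A for a WordPerfect document), major version at byte 10 (0x00 for WP5.x, 0x02 for WP6 and later), and a 16-bit encryption value at bytes 12-13 that is non-zero for password-protected files. These offsets and values are assumptions to verify against real files; WP1, WP3 and WP4.2 files are not covered here.

```python
import struct

WP_MAGIC = b"\xffWPC"


def classify_protected(header: bytes) -> str:
    """Classify a file from its first 14 bytes (layout is an assumption, see above)."""
    if len(header) < 14 or header[:4] != WP_MAGIC:
        return "unknown"
    file_type = header[9]
    major_version = header[10]
    (encryption,) = struct.unpack_from("<H", header, 12)
    if file_type != 0x0A:
        return "unknown"            # not a plain WordPerfect document
    if encryption == 0:
        return "not-encrypted"
    if major_version == 0x00:
        return "encrypted-wp5"      # password-protected, supported
    return "encrypted-wp6-or-later"  # password-protected, not supported
```

A wrapper script could call this on the first 14 bytes of the file and decide whether to pass the password along or to warn the user.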
From: Edward M. <em...@co...> - 2011-04-27 13:32:07
|
Hello, libwpd (through writerperfect) correctly decrypts WP 5.x password-protected files. But, at least in my limited tests, it doesn't recognize password-protected WP6.2 files as valid WP files. Is this the way it works at this stage of development, or am I doing something wrong? File attached is encrypted with the password passwd I'm testing under OS X with the latest code. Thanks! Edward |
From: Fridrich S. <fri...@bl...> - 2011-04-24 20:11:48
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, On 24/04/11 07:17, Fridrich Strba wrote: >> 4,100 seems to be 1D11E >> 4,101 seems to be 1D122 > > Don't know what to do with these ones though. We store the conversion > results as UCS2. Let me see what we can do. Ok, I just made libwpd use internally UCS4, so codes from ranges above FFFF can be now used freely. Cheers F. - -- Please avoid sending me Word, Excel or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk20g/wACgkQu9a1imXPdA9+3gCdEV1h8QsMi34LjArzWfIxPSK9 XrUAni89+4iO97fi2VRkZ+Qvr4wUrPgs =VKoD -----END PGP SIGNATURE----- |
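A short illustration of why the two proposed mappings needed the switch: U+1D11E and U+1D122 lie above U+FFFF, so they do not fit in a single 16-bit unit and need a surrogate pair in UTF-16, whereas a 32-bit (UCS4) value holds them directly. The helper names are hypothetical and say nothing about libwpd's internals.

```python
def fits_in_16_bits(code_point: int) -> bool:
    """True if the code point can be stored in one 16-bit unit."""
    return code_point <= 0xFFFF


def utf16_units(code_point: int) -> list:
    """Return the 16-bit code units UTF-16 uses for this code point."""
    data = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0x0361, 0x1D11E, 0x1D122):
    units = " ".join("%04X" % u for u in utf16_units(cp))
    print("U+%04X fits in 16 bits: %s, UTF-16 units: %s" % (cp, fits_in_16_bits(cp), units))
```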
From: Edward M. <em...@co...> - 2011-04-24 05:47:58
|
On 24 Apr 2011, at 1:17 AM, Fridrich Strba wrote: > Thanks, Edward, > > On 24/04/2011 07:05, Edward Mendelson wrote: >> 1. Quite a few WP characters have no unicode equivalents, and there is no way to fix that. > > Yeah, as they say in Swahili: "Maisha ndyvio, alivio" or as we say here > in the socialist Europe, "C'est la vie" :) Indeed, we cannot do much in > this apart approximate wherever it seems useful. > >> 2. In TEST.WP (the WP5.1 file), 6,56 through 6,234 didn't convert at all; but these characters are correctly converted in the WP6.x CHARACT6.DOC. You evidently have different tables for 5.x and 6+, and I think you can simply copy the 6,56 through 6,234 mappings from the 6+ table to the 5.x table. > > Will do that this week. It will need then one other comparison run > because there might be subtle differences we would like to catch. > >> 3. In the converted CHARACT6.DOC, I think it may be possible to add these: >> 2,44 seems to be 0361 >> 2,45 seems to be 035C >> 4,100 seems to be 1D11E >> 4,101 seems to be 1D122 > > Don't know what to do with these ones though. We store the conversion > results as UCS2. Let me see what we can do. > >> 6,83 seems to be 2A38 >> 9,83 seems to be 05AA > > Will correct this too. Do you mind to be marked as author of the > contribution when I am committing into git? > >> I'll have to check the Hebrew tomorrow, but since I don't know any Hebrew, I'll be guessing. > > I did similar with the WP5 charset. Comparing pictures :) At the end it > might be fun :) > >> I finally tested the Arabic WP 5.1 files from this page: >> http://www.un.org/popin/unpopcom/32ndsess/gass.htm >> wpd2odt says they are not WordPerfect files. Apparently libwpd doesn't handle documents created by Arabic WP5.1 or Hebrew WP5.1. > > Let me see those ones, I remember I was able to somehow read some of > them. Nevertheless, I have already seen on UN web-sites documents that > were declared to be WP document when in reality they were Word documents. 
> > Cheers > Glad to help, and I would be very proud to be listed as author of the contribution - though I didn't do much, and I assume you'll check my reports! The Arabic files at that UN site really and truly are WP documents - they're the ones I use to test my printer drivers and fonts for Arabic WP. I know that someone was writing a book a few years ago in Arabic WP, but I haven't heard from him for a while. And now for some sleep.... Edward Mendelson Contributing Editor PC Magazine |
From: Fridrich S. <fri...@bl...> - 2011-04-24 05:18:01
|
Thanks, Edward,
On 24/04/2011 07:05, Edward Mendelson wrote:
> 1. Quite a few WP characters have no unicode equivalents, and there is no way to fix that.
Yeah, as they say in Swahili: "Maisha ndyvio, alivio" or as we say here in the socialist Europe, "C'est la vie" :) Indeed, we cannot do much about this apart from approximating wherever it seems useful.
> 2. In TEST.WP (the WP5.1 file), 6,56 through 6,234 didn't convert at all; but these characters are correctly converted in the WP6.x CHARACT6.DOC. You evidently have different tables for 5.x and 6+, and I think you can simply copy the 6,56 through 6,234 mappings from the 6+ table to the 5.x table.
Will do that this week. It will then need one more comparison run because there might be subtle differences we would like to catch.
> 3. In the converted CHARACT6.DOC, I think it may be possible to add these:
> 2,44 seems to be 0361
> 2,45 seems to be 035C
> 4,100 seems to be 1D11E
> 4,101 seems to be 1D122
I don't know what to do with these ones though. We store the conversion results as UCS2. Let me see what we can do.
> 6,83 seems to be 2A38
> 9,83 seems to be 05AA
Will correct this too. Would you mind being marked as author of the contribution when I commit it into git?
> I'll have to check the Hebrew tomorrow, but since I don't know any Hebrew, I'll be guessing.
I did something similar with the WP5 charset. Comparing pictures :) In the end it might be fun :)
> I finally tested the Arabic WP 5.1 files from this page:
> http://www.un.org/popin/unpopcom/32ndsess/gass.htm
> wpd2odt says they are not WordPerfect files. Apparently libwpd doesn't handle documents created by Arabic WP5.1 or Hebrew WP5.1.
Let me see those ones; I remember I was able to somehow read some of them. Nevertheless, I have already seen documents on UN web sites that were declared to be WP documents when in reality they were Word documents.
Cheers F. |
From: Edward M. <em...@co...> - 2011-04-24 05:05:55
|
On 23 Apr 2011, at 4:19 PM, Fridrich Strba wrote: > Edward, > > > On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote: >> Also potentially useful: The character map document that shipped with 6.x for DOS is here: >> http://dl.dropbox.com/u/271144/CHARACT6.DOC >> It includes all 14 sets. > > Thanks for this one. I run it over wpd2text and it might be that we > actually do a good job here. Now, I could use some help here. If you > people could simply do the wpd2text of those characters and see whether > all glyphs in all charsets are correctly mapped. If you find error, just > note which charset and char number and what would be the correct unicode > mapping. For characters that would correspond to 2 or more unicode > character sequence, please write that down, but give me also a closer > approximation of 1-1 mapping. > > >> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on. > > The wp2rtf zip file contains a TEST.WP file that has all the charsets in > it. If you have the visual representation, just run it through wpd2text > and compare. I would appreciate again to have the information of wrongly > mapped glyphs. > Fridrich, Both these files (CHARACT6.DOC and TEST.WP - which is the same as the WP51 CHARACTR.DOC) produced very good results when run through wpd2html. Here are some quick notes: 1. Quite a few WP characters have no unicode equivalents, and there is no way to fix that. 2. In TEST.WP (the WP5.1 file), 6,56 through 6,234 didn't convert at all; but these characters are correctly converted in the WP6.x CHARACT6.DOC. You evidently have different tables for 5.x and 6+, and I think you can simply copy the 6,56 through 6,234 mappings from the 6+ table to the 5.x table. 3. 
In the converted CHARACT6.DOC, I think it may be possible to add these: 2,44 seems to be 0361 2,45 seems to be 035C 4,100 seems to be 1D11E 4,101 seems to be 1D122 6,83 seems to be 2A38 9,83 seems to be 05AA I'll have to check the Hebrew tomorrow, but since I don't know any Hebrew, I'll be guessing. Smokey, I think you know Arabic. Is there any chance you could check these? I finally tested the Arabic WP 5.1 files from this page: http://www.un.org/popin/unpopcom/32ndsess/gass.htm wpd2odt says they are not WordPerfect files. Apparently libwpd doesn't handle documents created by Arabic WP5.1 or Hebrew WP5.1. I hope these details help somewhat, and will try to report more tomorrow. Edward Mendelson Contributing Editor PC Magazine |
From: Fridrich S. <fri...@bl...> - 2011-04-24 04:55:16
|
Hello, Edward,
On 23/04/2011 23:01, Edward Mendelson wrote:
> I've run both files through wpd2text - and the results did not seem useful at all (see the linked file). However, if you want me to run them through wpd2odt, then the results were very useful indeed. I'll check them later.
> See the results here:
> http://dl.dropbox.com/u/271144/CharTests.zip
Whichever workflow works best for you is the best one :)
> And wpd2html produced some very impressive-looking results:
> http://dl.dropbox.com/u/271144/testwp5.html
> http://dl.dropbox.com/u/271144/character6.html
Thanks! It is the result of several nights of diving into the unicode charts. First it was Ariya, then me with Smokey. So happy that it works so far.
> As I said, I'll have to check these later this weekend.
Ok, just to clarify what would be good: compare what the file should look like with what it actually looks like. When you find a difference in glyphs, please note the WP charset and WP char number of the offending character. In case you want to go further, check the unicode charts, both visually and by the names of the characters, and provide the right unicode character. In case a direct mapping is not possible (i.e. the character is composed of a combining diacritic and a character), I would gladly have the multi-character mapping too, but also some close 1:1 approximation (the character without diacritics, for instance). I will extend the mechanism so that we can handle sequences of characters as results, but for the time being I first want to see whether it is worth it.
Thanks for your work. Fridrich |
From: Edward M. <em...@co...> - 2011-04-23 21:20:05
|
On 4/23/2011 2:18 AM, Smokey Ardisson wrote: > > Just pretend I never said that paragraph you replied to; I wasn't > completely awake when I wrote that and I had forgotten the key bit of > the problem: non-Mac WP required you to enter characters > backwards/LTR to begin with (I was thinking for some reason that the > characters were in the correct order in the file and would work > properly in a bidi-aware word processor if only we'd mapped to the > normal codepoints) :-P > Smokey, I haven't looked into the file structure of WP 5.1 Hebrew and Arabic, but both of those versions expected you to enter Hebrew and Arabic text from right to left. There are codes that switch direction. I haven't tested libwpd, but presumably it handles this? Edward |
From: Edward M. <em...@co...> - 2011-04-23 21:01:21
|
On 23 Apr 2011, at 4:42 PM, Edward Mendelson wrote: > > On 4/23/2011 4:19 PM, Fridrich Strba wrote: >> Edward, >> >> >> On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote: >>> Also potentially useful: The character map document that shipped with 6.x for DOS is here: >>> http://dl.dropbox.com/u/271144/CHARACT6.DOC >>> It includes all 14 sets. >> >> Thanks for this one. I run it over wpd2text and it might be that we >> actually do a good job here. Now, I could use some help here. If you >> people could simply do the wpd2text of those characters and see whether >> all glyphs in all charsets are correctly mapped. If you find error, just >> note which charset and char number and what would be the correct unicode >> mapping. For characters that would correspond to 2 or more unicode >> character sequence, please write that down, but give me also a closer >> approximation of 1-1 mapping. >> >> >>> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on. >> >> The wp2rtf zip file contains a TEST.WP file that has all the charsets in >> it. If you have the visual representation, just run it through wpd2text >> and compare. I would appreciate again to have the information of wrongly >> mapped glyphs. >> >> BTW: I hope I actually fixed (maybe apart about 10 chars) the WP5.1 >> hebrew map yesterday. Please check. >> >> Thanks for helping with this > > Hello Fridrich, > > I am happy to help. Let me make certain that I know what you asking for. > > 1. You want me to test the WPDOS 6.x character map file by running it > through wpd2text, and compare the output from wpd2text with the original > file, and report any changes. > > 2. The TEST.WP file is (except for the first line) the same as the > CHARMAP.TST file that shipped with WP 5.1 Arabic and Hebrew. It shows a > very different character set from the WPDOS 6.x character map file, and > it is (I believe) in WP5 format. 
You want me to run it also through > wpd2text and compare the output with the original, and report any changes. > > I will be away much of the evening (New York time) but will try to do > this some time this weekend. > > Edward > Just to clarify again: I've run both files through wpd2text - and the results did not seem useful at all (see the linked file). However, if you want me to run them through wpd2odt, then the results were very useful indeed. I'll check them later. See the results here: http://dl.dropbox.com/u/271144/CharTests.zip And wpd2html produced some very impressive-looking results: http://dl.dropbox.com/u/271144/testwp5.html http://dl.dropbox.com/u/271144/character6.html As I said, I'll have to check these later this weekend. Edward |
From: Edward M. <em...@co...> - 2011-04-23 20:43:02
|
On 4/23/2011 4:19 PM, Fridrich Strba wrote: > Edward, > > > On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote: >> Also potentially useful: The character map document that shipped with 6.x for DOS is here: >> http://dl.dropbox.com/u/271144/CHARACT6.DOC >> It includes all 14 sets. > > Thanks for this one. I run it over wpd2text and it might be that we > actually do a good job here. Now, I could use some help here. If you > people could simply do the wpd2text of those characters and see whether > all glyphs in all charsets are correctly mapped. If you find error, just > note which charset and char number and what would be the correct unicode > mapping. For characters that would correspond to 2 or more unicode > character sequence, please write that down, but give me also a closer > approximation of 1-1 mapping. > > >> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on. > > The wp2rtf zip file contains a TEST.WP file that has all the charsets in > it. If you have the visual representation, just run it through wpd2text > and compare. I would appreciate again to have the information of wrongly > mapped glyphs. > > BTW: I hope I actually fixed (maybe apart about 10 chars) the WP5.1 > hebrew map yesterday. Please check. > > Thanks for helping with this
Hello Fridrich, I am happy to help. Let me make certain that I know what you are asking for.
1. You want me to test the WPDOS 6.x character map file by running it through wpd2text, and compare the output from wpd2text with the original file, and report any changes.
2. The TEST.WP file is (except for the first line) the same as the CHARMAP.TST file that shipped with WP 5.1 Arabic and Hebrew. It shows a very different character set from the WPDOS 6.x character map file, and it is (I believe) in WP5 format. You want me to run it also through wpd2text and compare the output with the original, and report any changes.
I will be away much of the evening (New York time) but will try to do this some time this weekend. Edward |
From: Fridrich S. <fri...@bl...> - 2011-04-23 20:19:59
|
Edward,
On Sat, 2011-04-23 at 11:13 -0400, Edward Mendelson wrote:
> Also potentially useful: The character map document that shipped with 6.x for DOS is here:
> http://dl.dropbox.com/u/271144/CHARACT6.DOC
> It includes all 14 sets.
Thanks for this one. I ran it through wpd2text and it might be that we actually do a good job here. Now, I could use some help. If you people could simply run those characters through wpd2text and see whether all glyphs in all charsets are correctly mapped. If you find an error, just note the charset and char number and what the correct unicode mapping would be. For characters that would correspond to a sequence of 2 or more unicode characters, please write that down, but also give me the closest approximation as a 1-1 mapping.
> I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on.
The wp2rtf zip file contains a TEST.WP file that has all the charsets in it. If you have the visual representation, just run it through wpd2text and compare. Again, I would appreciate information about wrongly mapped glyphs.
BTW: I hope I actually fixed (apart from maybe about 10 chars) the WP5.1 hebrew map yesterday. Please check.
Thanks for helping with this F. |
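For the comparison described above, a small helper can list the code point and Unicode name of every non-ASCII character in converted text, which makes it easier to check a glyph against the charts. This is a sketch with a hypothetical function name; it assumes the wpd2text output has been saved to a file and decoded as UTF-8 before being passed in.

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """Return (position, 'U+XXXX', Unicode name) for each non-ASCII character."""
    rows = []
    for pos, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<no name>")
            rows.append((pos, "U+%04X" % ord(ch), name))
    return rows


# Example with an inline string; in practice the text would come from the converted file.
for row in describe_non_ascii("caf\u00e9 \u0627"):
    print(row)
```

Each row can then be noted next to the WP charset and character number when reporting a wrong mapping.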
From: Edward M. <em...@co...> - 2011-04-23 20:08:27
|
Fridrich, On 23 Apr 2011, at 10:38 AM, Fridrich Strba wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Smokey > > I wrote a quick generator and generated a wp document with all charsets > from 1 to 14. Would be nice to open it in a mac that is on a system > where all the WorldScript goodies are installed, so that we can see how > the missing mac characters will be regenerated by formater. Not sure > though whether it leads somewhere. > > Find attached I was able to install the System7/8 Hebrew and Arabic language kits in a SheepShaver setup running System 7.5.5 and WPMac 3.5e, and opened your file after setting its file type so that WPMac imported it as a WP6/7/8 file. I then applied the Times New Roman font (just in case a font change triggered anything at all) and saved it. Attached. As you can see, WPMac seems to have made no effort at all to match the Mac's WorldScript fonts to the DOS character sets. This corresponds to behavior I noticed before. WPMac seems to ignore any characters in an imported file that aren't part of the DOS symbol set. However, it doesn't change them either, I think. When I convert the attached file to .odt using writerperfect, the symbol sets seem to be preserved. Does this help at all? I'll try some other things later. Edward |
From: Edward M. <em...@co...> - 2011-04-23 19:29:37
|
On 23 Apr 2011, at 10:38 AM, Fridrich Strba wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Smokey > > I wrote a quick generator and generated a wp document with all charsets > from 1 to 14. Would be nice to open it in a mac that is on a system > where all the WorldScript goodies are installed, so that we can see how > the missing mac characters will be regenerated by formater. Not sure > though whether it leads somewhere. > One thing to keep in mind: As a result of what I wrote in my previous message, it's likely that a character-map document created in 6.x format will produce different results from one created in 5.x format. The WPMac importer presumably knows to interpret the numbers in different ways, depending on whether the 5.1 or 6.x mappings were used to create the file. Also potentially useful: The character map document that shipped with 6.x for DOS is here: http://dl.dropbox.com/u/271144/CHARACT6.DOC It includes all 14 sets. I'll find and post the CHARMAP.TST that shipped with 5.1 Hebrew and Arabic later on. Smokey, do you have a 10.4/Classic Mac set up with Hebrew/Arabic installed? If not, I can try to set one up later this weekend or next week. I never installed Hebrew or Arabic on a Mac because WPMac never supported right-to-left text entry. Edward |
From: Edward M. <em...@co...> - 2011-04-23 15:05:26
|
On 23 Apr 2011, at 2:18 AM, Smokey Ardisson wrote: > At 11:23 PM +0200 on 4/21/11, Fridrich Strba wrote: > >> On 21/04/11 21:52, Smokey Ardisson wrote: >>> Do we have reason to believe that the mapping in WP5 is different from >>> the ones I contributed for WP6? >> >> Yes, we have a strong reason to believe that not all are the same. If >> you take the arabic test in the zip I pointed to and convert with a >> fresh checkout of libwpd to html, you will see remarkable differences. >> The advantage though is that that test document is actually having the >> names of the characters near to them, which will make the janitorial >> work a bit easier. > > I can't believe that a company would move around blocks of characters > from one version of the software to the next, especially when those > characters' codes are the canonical ways of identifying them :-P > :sigh: However, there were not too many differences/corrections > between what's in libwpd_internal.cpp right now for set 13.
I won't be able to send more details until later today, or tomorrow, but I can confirm (what I assume you already know) that there are two states of the 5.1 character sets - one state for non-Hebrew/Arabic 5.1, another for Hebrew and Arabic 5.1 - and that the 6.x character sets are very different from the 5.1 sets.
About the two states of the 5.1 sets: until the Hebrew and Arabic versions were released, there were only 12 sets; the Hebrew/Arabic version had a much larger set 9 (Hebrew) and added Arabic sets 13 and 14.
I don't have a full list of the differences between 5.1 and 6.x in the other character sets, but briefly:
Set 1: 6.x adds some characters at the end
Set 2: 5.x and 6.x are completely different
Set 4: 6.x adds some characters at the end
Set 5: 5.x and 6.x are completely different
Set 6: 6.x adds some characters at the end
Set 8: 6.x adds some characters at the end
Set 9: 6.x is vastly larger (I haven't yet checked whether it's the same as 5.1 Hebrew)
Set 10: 6.x adds some characters at the end
Set 11: 6.x is completely different
Set 13/14: I think you discovered that these are different in 5.1 Arabic and 6.x?
You can find on my Arabic and Hebrew WP page full sets of printer drivers for Arabic and Hebrew 5.1, and these may help in mapping characters: http://www.columbia.edu/~em36/wpdos/arabicandhebrew.html
I'll get back to testing later today or tomorrow at the latest. Edward |
From: Fridrich S. <fri...@bl...> - 2011-04-23 14:38:33
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Smokey I wrote a quick generator and generated a wp document with all charsets from 1 to 14. It would be nice to open it on a Mac where all the WorldScript goodies are installed, so that we can see how the missing mac characters will be regenerated by the formatter. Not sure though whether it leads somewhere. Find attached F. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk2y5GAACgkQu9a1imXPdA8eaQCeOhfoETiETLHhVIWFd1BupE7v JdAAnA7ne8f5rYg58ae//H6SQfAXWgPF =iE77 -----END PGP SIGNATURE----- |
From: Fridrich S. <fri...@bl...> - 2011-04-23 11:23:42
|
On 23/04/11 08:18, Smokey Ardisson wrote:
> I can't believe that a company would move around blocks of characters
> from one version of the software to the next, especially when those
> characters' codes are the canonical ways of identifying them :-P
> :sigh: However, there were not too many differences/corrections
> between what's in libwpd_internal.cpp right now for set 13.

Yeah, but I saw differences, and in each one of them I noticed that they had changed some characters.

> Can someone (Edward?) extract the full character sets 13 and 14 from
> "TEST.WP" from the wp2rtf zip and either run those through wp2rtf or
> generate PDFs that show the Arabic characters for me? The "TEST_ARA"
> pair of documents didn't include the first dozen codepoints in set 13
> and don't include any of set 14, so I don't have a visual reference
> :-(

This is what Edward posted some days ago:
http://dl.dropbox.com/u/271144/CHARACTR.DOC.rtf
It should have the visual references.

> I have set 13 all fixed (with the exception of those first dozen that
> I don't know what they look like and which have useless names),
> within the confines of what's available in Unicode and with the
> limitation of a single-codepoint-to-single-codepoint mapping (wp2rtf
> produces more "accurate" mappings of some fancy WP codepoints that
> don't have single matching Unicode glyphs by using two characters).

It would actually be good to record somewhere the exact WP characters (charset, charnumber) and their multi-character mappings. I will then do what I do for the double-byte scripts: handle those where a single-codepoint-to-single-codepoint mapping works, and add special handling for those that need to be mapped using two or more characters. In the single-codepoint table I will mark the latter as having a 0x0000 conversion, which will be the special case. Nevertheless, I will do this later.

> I think new additions to Unicode since version 4.1 (when I did the
> WP6 mappings) will let us successfully map some of the additional
> random diacritical marks WP used to Unicode; if so, I can also fix
> the old WP6 mappings for those.

Feel free to fix whatever you want. Patches are always welcome :)

Cheers

F.

--
Please avoid sending me Word, Excel or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
|
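As a rough illustration of the table layout described in this message (a single-codepoint table in which 0x0000 flags entries that need a multi-character mapping), here is a minimal Python sketch. It is not libwpd code: the names SINGLE_MAP, MULTI_MAP and map_wp_char are hypothetical, the table entries are invented placeholders rather than real WordPerfect assignments, and returning U+FFFD for unknown characters is an assumption.

```python
# Sentinel stored in the single-codepoint table to say
# "look this character up in the multi-character table instead".
MULTI_SENTINEL = 0x0000

# (charset, charnumber) -> single Unicode codepoint.
# Placeholder entries for illustration only, NOT real WP mappings.
SINGLE_MAP = {
    (13, 1): 0x0627,          # maps directly to one codepoint
    (13, 2): MULTI_SENTINEL,  # needs two or more codepoints
}

# (charset, charnumber) -> sequence of Unicode codepoints.
# Placeholder entry for illustration only.
MULTI_MAP = {
    (13, 2): (0x0644, 0x0627),
}


def map_wp_char(charset, number):
    """Return the Unicode string for a WP (charset, number) pair."""
    key = (charset, number)
    codepoint = SINGLE_MAP.get(key)
    if codepoint is None:
        return "\ufffd"  # assumption: unknown characters become U+FFFD
    if codepoint == MULTI_SENTINEL:
        # Special case: fall back to the multi-character table.
        sequence = MULTI_MAP.get(key)
        if sequence is None:
            return "\ufffd"
        return "".join(chr(c) for c in sequence)
    return chr(codepoint)
```

Keeping the sentinel inside the existing table means the common single-codepoint path stays a plain lookup, and only flagged entries pay for the second lookup.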
From: Smokey A. <alq...@ar...> - 2011-04-23 06:19:10
|
At 11:23 PM +0200 on 4/21/11, Fridrich Strba wrote:

>On 21/04/11 21:52, Smokey Ardisson wrote:
>> Do we have reason to believe that the mapping in WP5 is different from
>> the ones I contributed for WP6?
>
>Yes, we have a strong reason to believe that not all are the same. If
>you take the arabic test in the zip I pointed to and convert with a
>fresh checkout of libwpd to html, you will see remarkable differences.
>The advantage though is that that test document is actually having the
>names of the characters near to them, which will make the janitorial
>work a bit easier.

I can't believe that a company would move around blocks of characters from one version of the software to the next, especially when those characters' codes are the canonical ways of identifying them :-P

:sigh: However, there were not too many differences/corrections between what's in libwpd_internal.cpp right now for set 13.

Can someone (Edward?) extract the full character sets 13 and 14 from "TEST.WP" from the wp2rtf zip and either run those through wp2rtf or generate PDFs that show the Arabic characters for me? The "TEST_ARA" pair of documents didn't include the first dozen codepoints in set 13 and don't include any of set 14, so I don't have a visual reference :-(

I have set 13 all fixed (with the exception of those first dozen that I don't know what they look like and which have useless names), within the confines of what's available in Unicode and with the limitation of a single-codepoint-to-single-codepoint mapping (wp2rtf produces more "accurate" mappings of some fancy WP codepoints that don't have single matching Unicode glyphs by using two characters).

I think new additions to Unicode since version 4.1 (when I did the WP6 mappings) will let us successfully map some of the additional random diacritical marks WP used to Unicode; if so, I can also fix the old WP6 mappings for those.

>> Looking back through my files from that era, it looks like we ended up
>> punting on "true" conversion for WP-Arabic and ended up mapping to
>> Unicode presentation forms (except for the "stand-alone" forms of the
>> letters, which got mapped to the normal, combining Unicode characters),
>> so that the result was reverse-ordered, unconnected text (I think you
>> were going to use Fribidi to try and reorder, but I remember vaguely
>> that that effort had some problems and you intended to fix things
>> elsewhere in another manner). At some point in the future, we might
>> want to revisit that and map all of the WP-Arabic codepoints to normal,
>> combining forms where possible for WP5/WP6.
>
>Yeah, I tried to do that, but it is a bit too complicated and the
>reverse bidi algorithm is not even defined. Basically, you would have to
>have two marks like OOXML has, to tell that this and this span is part
>of the same run, because if not, you will not have a way to reorder
>spans with different character properties (bold, italic) but part of the
>same phrase. It was a huge mess to do and I really did not have the
>courage to dive into it.

Just pretend I never said that paragraph you replied to; I wasn't completely awake when I wrote that and I had forgotten the key bit of the problem: non-Mac WP required you to enter characters backwards/LTR to begin with (I was thinking for some reason that the characters were in the correct order in the file and would work properly in a bidi-aware word processor if only we'd mapped to the normal codepoints) :-P

Smokey
--
Smokey Ardisson
alq...@ar...
http://www.ardisson.org/
------------------------------------------
"He is a fool who has forgotten what became of his ancestry seven generations before him and who does not care what will become of his progeny seven generations after him."
--Kazakh Proverb
|
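For reference, the idea in the quoted paragraph (turning presentation forms back into ordinary Arabic letters) could be sketched in Python as below. This is an assumption-laden illustration, not what libwpd does: it relies on Unicode NFKC normalization, which also rewrites unrelated compatibility characters, and logical_from_visual is a hypothetical helper that is only meaningful for a run made up purely of right-to-left letters stored in visual (left-to-right) order. Mixed-direction text and runs with differing character properties, the hard part discussed in this thread, are not handled.

```python
import unicodedata


def fold_presentation_forms(text):
    """Replace Arabic presentation forms with their ordinary letters.

    NFKC applies compatibility decompositions, so isolated, initial,
    medial and final forms collapse to the base letter. It also
    changes other compatibility characters, so it is a blunt tool.
    """
    return unicodedata.normalize("NFKC", text)


def logical_from_visual(run):
    """Turn a visually ordered, purely right-to-left run into logical order.

    The run is reversed BEFORE folding: a ligature such as U+FEFB
    decomposes into lam followed by alef, which is already logical
    order and must not be reversed afterwards.
    """
    return fold_presentation_forms(run[::-1])


if __name__ == "__main__":
    # Alef final form followed by beh initial form, as stored left to right.
    visual = "\ufe8e\ufe91"
    print([hex(ord(c)) for c in logical_from_visual(visual)])
```

Reversing whole runs is only safe when the run has a single direction and uniform formatting, which is exactly why the general reordering problem was left alone.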
From: Edward M. <em...@co...> - 2011-04-23 02:23:17
|
Fridrich

On 22 Apr 2011, at 5:12 PM, Fridrich Strba wrote:

> On 22/04/11 16:15, Edward Mendelson wrote:
>> However, I also tried the conversion with two real-world files that I was sent by the person who first asked about this. These real-world files are here:
>> http://dl.dropbox.com/u/271144/Attachments.zip
>
> They exposed a bug in libwpd that was assuming that the Text Font
> function's "font name" pascal string contains only ascii characters. I
> saw in the file, that if the character code is between 0x20 (space) and
> 0x7f, it is an ascii character, when it is bigger or equal to 0x80, it
> is basically the higher byte of a double byte script. I fixed the font
> name reading to work accordingly. Those documents have some font names
> encoded in CJK characters and the xml that we were producing by
> writerperfect was not a valid xml because the character strings were not
> a valid UTF-8 due to our misunderstanding that I mentioned above.
>
> It is now enough to pull libwpd again doing:
>
> git pull -r
>
> inside the libwpd directory and rebuild that one.
>
> BTW, to the question how to build a static everything, configure all
> (libwpd, libwpg and writerperfect) using --enable-static
> --disable-shared configure options. You can even change the installation
> location by using --prefix=/path/where/you/want/to/have/them.
> Note, that in that case, you will have to adapt your PKG_CONFIG_PATH
> variable accordingly.

Your instructions worked perfectly. I built a portable wpd2odt that runs on any other OS X 10.6 system. Thank you!

I've written a bash script that updates libwpd and writerperfect with git and then compiles and builds all three components, so I can update this whenever you update.

The conversions work extremely well. Thank you!

Question: is it possible to guess which version of LibreOffice will have the new version of libwpd?

Edward Mendelson
|
From: Fridrich S. <fri...@bl...> - 2011-04-22 21:12:23
|
Edward,

On 22/04/11 16:15, Edward Mendelson wrote:
> However, I also tried the conversion with two real-world files that I was sent by the person who first asked about this. These real-world files are here:
> http://dl.dropbox.com/u/271144/Attachments.zip

They exposed a bug in libwpd, which assumed that the "font name" Pascal string of the Text Font function contains only ASCII characters. Looking at the file, I saw that a character code between 0x20 (space) and 0x7f is an ASCII character, while a code greater than or equal to 0x80 is basically the high byte of a double-byte script. I fixed the font name reading to work accordingly. Those documents have some font names encoded in CJK characters, and the XML that writerperfect produced was not valid because the character strings were not valid UTF-8, due to the misunderstanding mentioned above.

It is now enough to pull libwpd again by running

    git pull -r

inside the libwpd directory and rebuilding it.

BTW, as to the question of how to build everything statically: configure all three (libwpd, libwpg and writerperfect) with the --enable-static --disable-shared configure options. You can also change the installation location with --prefix=/path/where/you/want/to/have/them. Note that in that case you will have to adapt your PKG_CONFIG_PATH variable accordingly.

Cheers

F.

--
Please avoid sending me Word, Excel or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
|
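The byte rule described in this message (0x20 to 0x7f is ASCII, 0x80 and above starts a double-byte character) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual libwpd fix: split_font_name is a hypothetical name, skipping bytes below 0x20 and dropping a truncated trailing pair are guesses the message does not cover, and the step that turns each double-byte pair into Unicode is deliberately left out because it depends on the script.

```python
def split_font_name(pascal):
    """Split a length-prefixed (Pascal) font name into tokens.

    Returns a list of ("ascii", char) and ("double", (high, low))
    tuples. Converting the double-byte pairs to Unicode is out of
    scope for this sketch.
    """
    if not pascal:
        return []
    length = pascal[0]              # first byte holds the string length
    data = pascal[1:1 + length]
    tokens = []
    i = 0
    while i < len(data):
        byte = data[i]
        if 0x20 <= byte <= 0x7F:
            # Plain ASCII character.
            tokens.append(("ascii", chr(byte)))
            i += 1
        elif byte >= 0x80:
            # High byte of a double-byte character; consume the next byte too.
            if i + 1 >= len(data):
                break               # assumption: drop a truncated pair
            tokens.append(("double", (byte, data[i + 1])))
            i += 2
        else:
            i += 1                  # assumption: skip control bytes below 0x20
    return tokens
```

Treating every byte as ASCII, as the old code did, would pass the high bytes straight through and yield strings that are not valid UTF-8, which matches the invalid XML described in this message.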