From: Klaus - G. L. <Le...@we...> - 2003-02-23 15:04:08
This is the second mail I promised in Bug Report 1.

I restored some pages from a 1.2.2 page dump to a virgin 1.3.4 wiki (nightly 20030131). Apart from the problems with nonalpha chars that I describe below, I noticed that these pages do not show up on the RecentChanges page. I'm not sure if this is a feature or a bug.

But now to the problem with nonalpha chars in page names. The page names were:

MESH+CreditUnions
MESH+Energy
MESH+EnergyMovement
MESH+Quarternary
MESH+Quintinary
MESH+Secondary
MESH+Tertiary
MESH=Energy
MESH+Tech

stored as:

page_data/MESH%2BCreditUnions
page_data/MESH%2BEnergy
page_data/MESH%2BEnergyMovement
page_data/MESH%2BQuarternary
page_data/MESH%2BQuintinary
page_data/MESH%2BSecondary
page_data/MESH%2BTertiary
page_data/MESH%3DEnergy
page_data/MESH%2BTech

Output during XHTML dump:

MESH CreditUnions ... saved as MESH%20CreditUnions.html ... Object
MESH Energy ... saved as MESH%20Energy.html ... Object
MESH EnergyMovement ... saved as MESH%20EnergyMovement.html ... Object
MESH Quarternary ... saved as MESH%20Quarternary.html ... Object
MESH Quintinary ... saved as MESH%20Quintinary.html ... Object
MESH Secondary ... saved as MESH%20Secondary.html ... Object
MESH Tertiary ... saved as MESH%20Tertiary.html ... Object
MESH=Energy ... saved as MESH%3DEnergy.html ... Object
MESH Tech ... saved as MESH%20Tech.html ... Object

Some of the resulting filenames:

MESH%20Tech.html
MESH%3DEnergy.html

Links in the page that links to the above pages:

<a href="MESH%252BEnergy.html" class="named-wiki" title="MESH+Energy">
<a href="MESH%252BTech.html" class="named-wiki" title="MESH+Tech">

Here the %2B of the original filename is escaped to %252B, since %25 is %.

If I do a zip dump, there are files missing. I assume the reason for this is the other errors I described in the other mail. If I open the archive with a text editor, there are error messages interspersed with the zip content.

Klaus Leiss
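One possible explanation (a guess on my part, not verified against PhpWiki's source) for the MESH%20... filenames in the dump output above is PHP's '+'-as-space behaviour in urldecode():

<?php
// Illustration only -- not taken from PhpWiki's code.
// urldecode() treats a literal '+' as an encoded space,
// while rawurldecode() leaves it alone.
echo urldecode('MESH+Energy'), "\n";      // MESH Energy
echo rawurldecode('MESH+Energy'), "\n";   // MESH+Energy (unchanged)
?>

So if the page name passes through urldecode() somewhere on the way to the dump, the '+' silently becomes a space, while the link generation keeps the original, encoded name.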
From: Klaus - G. L. <Le...@we...> - 2003-02-23 21:36:15
> Hi Klaus,
>
> Are you sure they're not there?  The pages are probably listed
> under their original edit date (i.e. when they were last edited
> on the 1.2.2 wiki).  Are you sure you looked far enough back in
> the RecentChanges?  (Try http://path.to.your/wiki/RecentChanges?days=-1)
>
> Jeff

You are partly right, some are there under their original edit date. I'm not sure if that is a bug or a feature. What would happen if the page already exists in the wiki and I restore the page then? Would that show in the RecentChanges of the day? If yes, it is a feature to me, else a bug.

But now to the missing pages in the RecentChanges. These are all pages that have a "+" in the page name. Since they did not show on the RecentChanges, I did a search. This is the output of FindPage (title search):

* ?MESH CreditUnions
* ?MESH Energy
* ?MESH EnergyMovement
* ?MESH Quarternary
* ?MESH Quintinary
* ?MESH Secondary
* ?MESH Tertiary
* MESH=Energy
* ?MESH Tech

It shows the same problem as the HTML dump: the "+" from the title is changed to a space. In the source of the page one gets

<span class="wikiunknown"><a href="MESH%20CreditUnions?action=edit"

The existing pages are not found, but I can follow the links from other pages in the wiki to them, so they are indeed in the wiki. I have nothing against mangling PageNames, but it should be consistent in all parts of the wiki.

Klaus Leiss
From: Jeff D. <da...@da...> - 2003-02-23 22:27:01
> You are partly right, some are there under their original edit date.
> I'm not sure if that is a bug or a feature. What would happen if
> the page already exists in the wiki and I restore the page then?

If the page already exists in the wiki, the modification time of the new version is adjusted to ensure that revision modification times are monotonic. (I can't remember off-hand whether the current time is used, or just the time of the most-recent revision.)

... At least, that's what (I think) is supposed to happen...

> This is the output of FindPage (title search):
>
> * ?MESH CreditUnions
> * ?MESH Energy
> * ?MESH EnergyMovement
> * ?MESH Quarternary
> * ?MESH Quintinary
> * ?MESH Secondary
> * ?MESH Tertiary
> * MESH=Energy
> * ?MESH Tech

I can't duplicate that here.  I created a MESH+CreditUnions page, and it works fine for me (I haven't tried an (X)HTML dump yet, but it zips fine, and shows up in a TitleSearch just fine.)

What platform & PHP version are you running on?
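A rough sketch of the adjustment Jeff describes (my guess at the logic, not PhpWiki's actual code; the function name is made up):

<?php
// Sketch: keep revision mtimes monotonic when restoring an old page
// over an existing one.  Whether the previous revision's time or the
// current time() is used is exactly the detail Jeff can't remember.
function adjust_restored_mtime($restored_mtime, $latest_existing_mtime) {
    if ($restored_mtime > $latest_existing_mtime)
        return $restored_mtime;             // already monotonic, keep it
    return $latest_existing_mtime + 1;      // or time(), see above
}
?>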
From: Klaus - G. L. <Le...@we...> - 2003-02-23 23:03:59
>
> I can't duplicate that here.  I created a MESH+CreditUnions page,
> and it works fine for me (I haven't tried an (X)HTML dump yet,
> but it zips fine, and shows up in a TitleSearch just fine.)

The normal functions, browsing, editing and so on, are OK. I didn't try a zip yet, since I am only importing from a 1.2.2 wiki. The zipped (X)HTML dump did give the difficulties.

> What platform & PHP version are you running on?

SuSE Linux 8.1
Apache 1.3.26-82
mod_php 4.2.2-82
file as database.

But if I remember correctly I also tried dba once, to verify that it is not purely related to the file database, and got the same problem: warnings in the output during the creation of the web pages. But I can't remember with which version, and I didn't look at the pages. I could try it again with dba if you want. Maybe it is a matter of having more than one such file. Try to import the wiki_sample.zip that I sent privately to you. If you have other suggestions I will try them tomorrow after work.

Klaus Leiss
From: Jeff D. <da...@da...> - 2003-02-24 02:12:02
> file as database.

Ahh. I thought you had said you were using dba, so that's what I was testing with. I hadn't tried the flat-file backend at all until now.

The problem (at least the TitleSearch problem) is specific to the flat-file backend.... I think I've fixed that problem now. That might have fixed some of the HTML dump problems too, but I think some problems will remain with filenames/urls... Anyhow, give it a try when you get the chance...

There's a problem with using urlencoding to generate filenames for HTML output. That's because: how do you link to filenames with '%'s in them? Well ... it depends on whether you're going through a webserver, or just getting the files off of local disk.

I think the answer is to switch to some other encoding scheme for filenames (probably of our own devising). That would get around the '/' in filenames problem too.

Anyhoo, I'm probably not going to get to looking at it for a bit. If someone else wants to take a crack at it, feel free.

(Also, the HTML dumps fail horribly if USE_PATH_INFO is false.)
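To make the '%' problem concrete, here is a tiny illustration (my own, not PhpWiki code): once a dumped file is literally named MESH%2BEnergy.html on disk, a link served through a webserver has to escape the '%' again, while a link followed straight off local disk may or may not be decoded by the browser, so no single href works everywhere.

<?php
// Illustration of the '%'-in-filenames problem described above.
$on_disk = 'MESH%2BEnergy.html';        // literal filename on disk

// Through a webserver the href must escape the '%' so the server
// decodes back to the on-disk name -- exactly the %252B hrefs seen
// in Klaus's dump:
echo rawurlencode($on_disk), "\n";      // MESH%252BEnergy.html

// Off local disk, whether the browser decodes the href at all varies,
// hence the suggestion to avoid '%' (and '/') in generated filenames.
?>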
From: Martin G. <gim...@gi...> - 2003-02-25 18:55:34
Jeff Dairiki <da...@da...> writes:

> There's a problem with using urlencoding to generate filenames for
> HTML output. That's because: how do you link to filenames with '%'s
> in them? Well ... it depends on whether you're going through a
> webserver, or just getting the files off of local disk.
>
> I think the answer is to switch to some other encoding scheme for
> filenames (probably of our own devising). That would get around the
> '/' in filenames problem too.

How about using quoted-printable? That's a standard form of encoding, but it's not used in URLs (it's used in MIME in mails), so the webserver won't decode it for us.

This function encodes everything except ASCII letters and spaces as quoted-printable and then uses the builtin PHP function quoted_printable_decode() to see if the input matches the output:

<?php
function quoted_printable_encode($str) {
    $output = '';
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        if (ereg('[a-zA-Z ]', $char)) {
            $output .= $char;
        } else {
            $output .= sprintf('=%02X', ord($char));
        }
    }

    return $output;
}

$input  = '!"#¤%&/()=?`|+^\'*æøåéèäñ';
$qp     = quoted_printable_encode($input);
$output = quoted_printable_decode($qp);

echo "Input:  $input\n";
echo "QP:     $qp\n";
echo "Output: $output\n";

if ($input === $output) {
    echo "Match\n";
} else {
    echo "Mismatch!\n";
}
?>

With the quoted-printable encoding you're allowed to encode everything, or to leave some printable characters untouched (that's what makes this encoding 'printable' --- it can be decoded by humans); see Section 6.7 in RFC 2045:

http://www.ietf.org/rfc/rfc2045.txt

> Anyhoo, I'm probably not going to get to looking at it for a bit.
> If someone else wants to take a crack at it, feel free.

It's important for me to be able to dump a WikiWikiWeb as XHTML, as I'm going to use PhpWiki as a site generation tool. So I'll probably have a look at it...

-- 
Martin Geisler

My GnuPG Key: 0xF7F6B57B

See http://gimpster.com/ and http://phpweather.net/ for:
PHP Weather => Shows the current weather on your webpage and
PHP Shell   => A telnet-connection (almost :-) in a PHP page.
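For what it's worth, applied to the page names from this thread the function above would produce filenames with no '%', '+' or '/' in them at all (expected output in the comments; this assumes the quoted_printable_encode() defined above):

<?php
// Uses the quoted_printable_encode() function defined above.
echo quoted_printable_encode('MESH+Energy'), "\n";   // MESH=2BEnergy
echo quoted_printable_encode('MESH=Energy'), "\n";   // MESH=3DEnergy
echo quoted_printable_encode('Sub/Page'), "\n";      // Sub=2FPage (no '/' in the filename)
?>

One small note: with the [a-zA-Z ] test above, digits are encoded too (e.g. 'Page2' becomes 'Page=32'), which is harmless but makes the generated names a little less readable.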