From: Peter M. <pet...@ma...> - 2002-01-06 13:19:26
|
Hi,

I have tried to use canonical_dn with UTF-8 and I think
it treats strings with UTF-8 encoded values incorrectly.

Characters with codes > 127 have UTF-8 encodings that
consist of 2 or more bytes, all of which have codes > 127.
Since these characters are legal in LDAPv3 DNs they should
not get escaped.

So line 310 of Net/LDAP/Util.pm should read

  $val =~ s/([\x00-\x1f])/sprintf("\\%02x",ord($1))/eg;

instead of the current version:

  $val =~ s/([\x00-\x1f\x7f-\xff])/sprintf("\\%02x",ord($1))/eg;


When changing canonical_dn() anyway, maybe splitting the
implementation into three functions would be helpful.
It would give users of Net::LDAP a standardized way of dealing with
DNs and parts of them (very helpful when moving entries, ...) without
having to reinvent the wheel themselves.

Here is my idea:


## split a DN string into its parts; code stolen from canonical_dn() ##
# Synopsis: @rdns = splitDN($dn, %optionHash)
# allowed options:
#  * lowercase: convert attribute names to lower case
#  * uppercase: convert attribute names to upper case
#  * sortRDN:   sort RDN values
#  * splitRDN:  split multi part RDNs into their parts
sub splitDN($%)
{
  my $dn = shift;
  my %opt = @_;
  my @dn;
  my @rdn;

  $dn = $dn->dn if ref($dn);

  while ($dn =~ /\G(?:
    \s*
    ([a-zA-Z][-a-zA-Z0-9]*|(?:[Oo][Ii][Dd]\.)?\d+(?:\.\d+)*)
    \s*
    =
    \s*
    (
      (?:[^\\",=+<>\#;]*[^\\",=+<>\#;\s]|\\(?:[\\ ",=+<>#;]|[0-9a-fA-F]{2}))*
      |
      \#(?:[0-9a-fA-F]{2})+
      |
      "(?:[^\\"]+|\\(?:[\\",=+<>#;]|[0-9a-fA-F]{2}))*"
    )
    \s*
    (?:([;,+])\s*(?=\S)|$)
    )\s*/gcx)
  {
    my ($type,$val,$sep) = ($1,$2,$3);

    $type =~ s/^oid\.(\d+(\.\d+)*)$/$1/i;
    $type = lc($type) if ($opt{lowercase});
    $type = uc($type) if ($opt{uppercase});

    if ($val !~ /^#/)
    {
      # strip surrounding quotes, undo existing escapes, then re-escape
      $val =~ s/^"(.*)"$/$1/;
      $val =~ s/\\([\\ ",=+<>#;]|[0-9a-fA-F]{2})
               /length($1)==1 ? $1 : chr(hex($1))
               /xeg;
      $val =~ s/([\\",=+<>#;])/\\$1/g;
      $val =~ s/([\x00-\x1F])/sprintf("\\%02x",ord($1))/eg;

      $val =~ s/(^\s+|\s+$)/"\\20" x length $1/ge;
    }

    push @rdn, "$type=$val";

    unless (defined $sep and $sep eq '+')
    {
      @rdn = sort(@rdn) if ($opt{sortRDN});
      push @dn, ($opt{splitRDN}) ?
                ((scalar(@rdn) > 1) ? [ @rdn ] : ($rdn[0] || '')) :
                join('+', @rdn);
      @rdn = ();
    }
  }

  # return the empty list unless the whole DN was consumed
  return((length($dn) != (pos($dn) || 0)) ? () : @dn);
}


## join RDNs and RDN parts into a DN string ##
# Synopsis: $dn = joinDN(@dnpartref, %optionhash)
sub joinDN(\@%)
{
  my @dnparts = @{+shift};
  my %opt = @_;
  my $dn = '';

  @dnparts = reverse(@dnparts) if ($opt{reversed});

  foreach my $part (@dnparts)
  {
    $dn .= (($opt{reversed}) ? "\000" : ',') if ($dn);

    if (ref($part))    # multi part RDN
    {
      my @partlist = ($opt{reversed}) ? reverse(@$part) : @$part;
      my $rdn;

      foreach my $rdnpart (@partlist)
      {
        return if (!$rdnpart);

        $rdn .= (($opt{reversed}) ? "\001" : '+') if ($rdn);
        $rdn .= $rdnpart;
      }
      $dn .= $rdn;
    }
    else               # single part RDN
    {
      return if (!$part);

      $dn .= $part;
    }
  }

  return($dn);
}


These two basic functions now allow canonical_dn() to be
implemented in only a few lines:

sub canonical_dn($;$) {
  my ($dn, $rev) = @_;

  $dn = $dn->dn if ref($dn);

  my @dnparts = splitDN($dn, uppercase => 1, splitRDN => 1, sortRDN => 1);

  joinDN(@dnparts, reversed => ($rev||0));
}


Yours
Peter

-- 
Peter Marschall     |   eMail: pet...@ma...
Scheffelstraße 15   |          pet...@is...
97072 Würzburg      |   Tel:   0931/14721
PGP:  D7 FF 20 FE E6 6B 31 74  D1 10 88 E0 3C FE 28 35
|
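A minimal sketch of the difference between the two substitutions, assuming the DN value is held as raw UTF-8 bytes. The helper names `escape_current` and `escape_proposed` are hypothetical wrappers for illustration and are not part of Net::LDAP:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution currently in Util.pm:
# escapes control characters and every byte from 0x7f to 0xff.
sub escape_current {
    my ($val) = @_;
    $val =~ s/([\x00-\x1f\x7f-\xff])/sprintf("\\%02x", ord($1))/eg;
    return $val;
}

# Hypothetical wrapper around the proposed substitution:
# escapes control characters only.
sub escape_proposed {
    my ($val) = @_;
    $val =~ s/([\x00-\x1f])/sprintf("\\%02x", ord($1))/eg;
    return $val;
}

# "Würzburg" as raw UTF-8 bytes: U+00FC is encoded as 0xC3 0xBC.
my $bytes = "W\xc3\xbcrzburg";

print escape_current($bytes), "\n";
print escape_proposed($bytes) eq $bytes ? "unchanged\n" : "changed\n";
```

With the current pattern both bytes of the "ü" are turned into hex escapes, while the proposed pattern leaves them alone. Note that the proposed pattern also stops escaping 0x7f (DEL), which may or may not be intended.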
From: Graham B. <gb...@po...> - 2002-01-08 16:28:01
|
On Sun, Jan 06, 2002 at 01:45:17PM +0100, Peter Marschall wrote:
> Hi,
>
> i have tried to use canonical_dn with UTF 8 and I think
> it treats strings with UTF8 encoded values wrong.

It probably does.

> Characters with codes > 127 have UTF8 encodings that
> consist of 2 or more bytes that have all codes > 127.
> Since these characters are legal in LDAPv3 DNs they should
> not get escaped.

True, assuming the DN given is UTF8, which it should be with
LDAPv3 but not v2

But the escaping should be done on the basis of the character being
printable. Net::LDAP makes a bad assumption here.

> So line 310 of Net/LDAP/Util.pm should read
>
>   $val =~ s/([\x00-\x1f])/sprintf("\\%02x",ord($1))/eg;
>
> instead of the current version:
>
>   $val =~ s/([\x00-\x1f\x7f-\xff])/sprintf("\\%02x",ord($1))/eg;
>
>
> When changing canonical_dn() anyway, maybe changing the
> implementation into three functions would be helpful.
> It would give users of Net::LDAP a standardized way of dealing with
> DNs and parts of it (very helpful when moving entries, ..) without having
> to reimplement the wheel themselves.

I have thought about this before, just never done it :)

> Here is my idea:

Looks good.

Graham.
|
From: Peter M. <pet...@ma...> - 2002-01-10 04:40:42
|
Hi,

On Tuesday 08 January 2002 17:27, you wrote:
> > Characters with codes > 127 have UTF8 encodings that
> > consist of 2 or more bytes that have all codes > 127.
> > Since these characters are legal in LDAPv3 DNs they should
> > not get escaped.

> True, assuming the DN given is UTF8, which it should be with
> LDAPv3 but not v2

In that case, adding an option called "version" to split_dn
(and eventually canonical_dn) might help to stay absolutely compatible
with the old behaviour.
Depending on "version", the escaping of special characters can be
done (version not given or version <= 2) or not done (version >= 3).
canonical_dn simply has to pass the option on to split_dn.

But maybe this is making things too complicated ;-))

> But the escaping should be done on the basis of the character being
> printable. Net::LDAP makes a bad assumption here.

Hmmm, ...
"Printability" may depend heavily on your terminal settings
or your application.
On (almost) any 8-bit computer, UTF-8 encoded characters are
printable, since UTF-8 was designed to convert Unicode characters into
variable-length sequences of 8-bit chunks.
So I would suggest, at least for LDAPv3, leaving out the
escaping of characters in the range 128 - 255.

Yours
Peter

-- 
Peter Marschall     |   eMail: pet...@ma...
Scheffelstraße 15   |          pet...@is...
97072 Würzburg      |   Tel:   0931/14721
PGP:  D7 FF 20 FE E6 6B 31 74  D1 10 88 E0 3C FE 28 35
|
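The "version" idea could look roughly like the sketch below. It assumes the option is passed as a key/value pair; `escape_dn_value` and its `version` option are hypothetical names for illustration only, not an existing Net::LDAP interface:

```perl
use strict;
use warnings;

# Hypothetical helper: choose the escaping rule from a "version" option.
sub escape_dn_value {
    my ($val, %opt) = @_;
    my $version = $opt{version} || 2;    # no version given: old behaviour

    if ($version >= 3) {
        # LDAPv3: values are UTF-8, so bytes >= 0x80 are left untouched
        $val =~ s/([\x00-\x1f])/sprintf("\\%02x", ord($1))/eg;
    }
    else {
        # LDAPv2 or unspecified: escape every byte outside printable ASCII
        $val =~ s/([\x00-\x1f\x7f-\xff])/sprintf("\\%02x", ord($1))/eg;
    }
    return $val;
}

my $value = "W\xc3\xbcrzburg";    # raw UTF-8 bytes
print escape_dn_value($value), "\n";
print escape_dn_value($value, version => 3) eq $value ? "kept\n" : "escaped\n";
```

Defaulting to the old rule when no version is given keeps existing callers unaffected, which is the compatibility argument made above.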
From: John B. <joh...@ne...> - 2002-01-10 15:05:51
|
Hi folks.

Sorry not to comment on the schema stuff. New changes look great. I'd
like to ask about the following though:

Peter Marschall wrote:
> > I have two questions about the format of schema entries that may be a
> > little off topic, but may also affect perl-ldap:
> >
> > * Is ' (single quote) legal inside qdstrings in schema definitions
> >   RFC2252 does not explicitly deny it. It simply gives the
> >   following definitions
> >
> >      dstring = 1*utf8
> >      qdstring = whsp "'" dstring "'" whsp
> >
> >   without defining utf8 further
> >   Since a single quote is a utf8 character, this could mean
> >   that single quotes inside qdstrings are really allowed.
> >   So
> >      DESC 'New Object's FS Rights'
> >   might be a legal description.

Yep. But the question for me is whether ( and ) and whitespace are
permitted (which they presumably are although I must confess to not having
checked utf8 :-) because that makes the whole thing technically unparsable
unless I am missing something.

What about:

attributeTypes: ( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
  DESC 'Standard Attribute' SYNTAX 2.16.840.1.113719.1.1.5.1.15{64512}
  X-NDS_NAME 'New Object' s FS Rights - correct this time :-)'
  X-NDS_NOT_SCHED_SYNC_IMMEDIATE '1' )

? How can you parse that sensibly? (Even in perl...)

The '1*utf8' seems to make a nonsense of any possibility of parsing.

Peter Marschall wrote:
> > I do not ask this just for fun.
> > I find the assumptions made by Net::LDAP::Schema reasonable
> > (since the things above make parsing the schema quite hard).

Graham Barr wrote:
> Right. It's not so much that it is hard, but it will be slower.

I don't see how it is possible at all, without defining an additional
escape mechanism or list of disallowed characters (the previous approach).

regards,

jb
|
From: Graham B. <gb...@po...> - 2002-01-10 15:29:34
|
On Thu, Jan 10, 2002 at 03:05:20PM +0000, John Berthels wrote:
> Graham Barr wrote:
>
> > Right. Its not so much that it is hard, but it will be slower.
>
> I don't see how it is possible at all, without defining an additional
> escape mechanism or list of disallowed characters (the previous approach).

Anything is possible, it's just how long you want to take to do it.

Looking at existing code to see what others do/did is also helpful.

Graham.
|
From: Chris R. <chr...@me...> - 2002-01-10 16:10:03
|
John Berthels <joh...@ne...> wrote:
> Hi folks.
>
> Sorry not to comment on the schema stuff. New changes look great. I'd
> like to ask about the following though:
>
>
> Peter Marschall wrote:
>
>> > I have two questions about the frmat of schema entries that may be a
>> > little off topic, but bay also affect perl-ldap:
>> >
>> > * Is ' (single quote) legal inside qdstrings in schema definitions
>> >   RFC2252 does not explicitely deny it. It simply gives the
>> >   following definitions
>> >
>> >      dstring = 1*utf8
>> >      qdstring = whsp "'" dstring "'" whsp
>> >
>> >   without defining utf8 further
>> >   Since a single quote is a utf8 character, this could mean
>> >   that single quotes inside qdstrings are really allowed.
>> >   So
>> >      DESC 'New Object's FS Rights'
>> >   might be a legal description.
>
> Yep. But the question for me is whether ( and ) and whitespace are
> permitted (which they presumably are although I must confess to not having
> checked utf8 :-) because that makes the whole thing technically unparsable
> unless I am missing something.
>
> What about:
>
> attributeTypes: ( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
>   DESC 'Standard Attribute' SYNTAX 2.16.840.1.113719.1.1.5.1.15{64512}
>   X-NDS_NAME 'New Object' s FS Rights - correct this time :-)'
>   X-NDS_NOT_SCHED_SYNC_IMMEDIATE '1' )
>
> ? How can you parse that sensibly? (Even in perl...)
>
> The '1*utf8' seems to make a nonsense of any possibility of parsing.

Nod. IIRC there is some IETF work going on to fix that ABNF. It actually
cannot be parsed correctly for more reasons - whsp is defined as optional
space.

For example this monstrosity is legal:

( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
  DESCNAMESUPSYNTAXBORGCOLLECTIVE SUP anotherAttribute )

Cheers,

Chris
|
From: Graham B. <gb...@po...> - 2002-01-10 16:20:44
|
On Thu, Jan 10, 2002 at 04:09:46PM -0000, Chris Ridd wrote:
> > The '1*utf8' seems to make a nonsense of any possibility of parsing.
>
> Nod. IIRC there is some IETF work going on to fix that ABNF. It actually
> cannot be parsed correctly for more reasons - whsp is defined as optional
> space.

Darn, I did not notice that. Well, I treat the whitespace as non-optional
after a closing ' if you want a ' inside a qdstring :)

> For example this monstrosity is legal:
>
> ( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
>   DESCNAMESUPSYNTAXBORGCOLLECTIVE SUP anotherAttribute )

Well, all I can say is that if you write that, you deserve all you get,
or don't, as the case may be.

Graham.
|
From: Peter M. <pet...@ma...> - 2002-01-11 04:40:46
|
Hi,

On Thursday 10 January 2002 17:09, you wrote:
> For example this monstrosity is legal:
>
> ( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
>   DESCNAMESUPSYNTAXBORGCOLLECTIVE SUP anotherAttribute )

Shouldn't DESC be followed by a qdstring ?
I'm missing the quotes.

Yours
Peter

-- 
Peter Marschall     |   eMail: pet...@ma...
Scheffelstraße 15   |          pet...@is...
97072 Würzburg      |   Tel:   0931/14721
PGP:  D7 FF 20 FE E6 6B 31 74  D1 10 88 E0 3C FE 28 35
|
From: Chris R. <chr...@me...> - 2002-01-11 08:35:03
|
Peter Marschall <pet...@ma...> wrote:
> Hi,
>
> On Thursday 10 January 2002 17:09, you wrote:
>> For example this monstrosity is legal:
>>
>> ( 2.16.840.1.113719.1.56.4.1.1 NAME 'newObjectSFSRights'
>>   DESCNAMESUPSYNTAXBORGCOLLECTIVE SUP anotherAttribute )
> Shouldn't DESC be followed by a qdstring ?
> I'm missing the quotes.

You're right, and maybe that means that anything followed by a quoted
string is OK (until the quoted string contains ["()].)

Anything that contains a 'woid' argument, however, is tricky because:

  oid = descr / numericoid
  descr = keystring
  numericoid = numericstring *( "." numericstring )
  woid = whsp oid whsp

Is EQUALITYNOORDERINGBORGCOLLECTIVE legal? I think it is, and could be
parsed as:

  EQUALITY NO ORDERING BORG COLLECTIVE
  EQUALITY NO ORDERING BORGCOLLECTIVE
  EQUALITY NOORDERINGBORG COLLECTIVE
  EQUALITY NOORDERINGBORGCOLLECTIVE

All of which are legal interpretations.

Cheers,

Chris
|
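The ambiguity can be made concrete with a small enumerator. This is only a sketch under a simplified, assumed fragment of the grammar, `"EQUALITY" descr [ "ORDERING" descr ] [ "COLLECTIVE" ]`, with all whitespace optional and `descr` reduced to one or more characters; the function names `readings` and `tails` are made up for illustration:

```perl
use strict;
use warnings;

# All ways to read $rest as [ "ORDERING" descr ] [ "COLLECTIVE" ].
sub tails {
    my ($rest) = @_;
    my @tails;
    push @tails, ''            if $rest eq '';
    push @tails, ' COLLECTIVE' if $rest eq 'COLLECTIVE';
    if ($rest =~ /^ORDERING(.+)$/) {
        my $ord = $1;
        push @tails, " ORDERING $ord";
        if ($ord =~ /^(.+)COLLECTIVE$/) {
            push @tails, " ORDERING $1 COLLECTIVE";
        }
    }
    return @tails;
}

# Every reading of a whitespace-free string under the simplified grammar.
sub readings {
    my ($text) = @_;
    my @found;
    return @found unless $text =~ s/^EQUALITY//;

    # Try every non-empty prefix of the remainder as the EQUALITY rule name.
    for my $i (1 .. length $text) {
        my $eq   = substr($text, 0, $i);
        my $rest = substr($text, $i);
        push @found, "EQUALITY $eq$_" for tails($rest);
    }
    return @found;
}

print "$_\n" for readings('EQUALITYNOORDERINGBORGCOLLECTIVE');
```

Under these assumptions the enumerator finds four readings for the example string, the same four listed above, so a parser has no grammatical basis for preferring one over another.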
From: Peter M. <pet...@ma...> - 2002-01-11 04:41:03
|
Hi,

On Thursday 10 January 2002 16:05, you wrote:
> > > RFC2252 does not explicitely deny it. It simply gives the
> > > following definitions
> > >    dstring = 1*utf8
> > >    qdstring = whsp "'" dstring "'" whsp
> > > without defining utf8 further
> > > Since a single quote is a utf8 character, this could mean
> > > that single quotes inside qdstrings are really allowed.

> Yep. But the question for me is whether ( and ) and whitespace are
> permitted (which they presumably are although I must confess to not having
> checked utf8 :-) because that makes the whole thing technically unparsable
> unless I am missing something.

> ? How can you parse that sensibly? (Even in perl...)
>
> The '1*utf8' seems to make a nonsense of any possibility of parsing.

If you take "utf8" as the definition of a UTF-8 encoded Unicode character,
parsing still stays possible (if it is impossible, it is for other reasons).
UTF-8 encodes 7-bit ASCII as 7-bit ASCII and any character beyond \x7f
as a sequence of bytes in the range [\x80-\xFF].
Since all separator characters used in RFC 2252 are 7-bit ASCII,
parsing stays possible without having to think about those non-ASCII
characters (just leave them where they are ;-)

> I don't see how it is possible at all, without defining an additional
> escape mechanism or list of disallowed characters (the previous approach).

My first idea was not only to check for allowed characters but also to
check for those special words defined in RFC 2252 (DESC, MUST, MAY, ..)
But I was too lazy to do it that way ;-))

Yours
Peter

-- 
Peter Marschall     |   eMail: pet...@ma...
Scheffelstraße 15   |          pet...@is...
97072 Würzburg      |   Tel:   0931/14721
PGP:  D7 FF 20 FE E6 6B 31 74  D1 10 88 E0 3C FE 28 35
|
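The byte-range claim can be checked with the core Encode module. The sketch below assumes a Perl that ships Encode; `only_high_bytes` is a hypothetical helper name and the sample code points are arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode);

# True if the UTF-8 encoding of $char contains no byte in 0x00-0x7f,
# i.e. none of its bytes can be mistaken for an ASCII separator.
sub only_high_bytes {
    my ($char) = @_;
    my $bytes = encode('UTF-8', $char);
    return $bytes !~ /[\x00-\x7f]/;
}

# Show the encoded bytes for a few non-ASCII code points.
for my $cp (0xDF, 0xFC, 0x20AC) {
    my $bytes = encode('UTF-8', chr($cp));
    printf "U+%04X -> %s (%s)\n", $cp,
        join(' ', map { sprintf '%02x', ord } split //, $bytes),
        only_high_bytes(chr($cp)) ? 'high bytes only' : 'contains ASCII';
}
```

This only shows that non-ASCII characters cannot collide with the ASCII separators; it says nothing about ASCII quotes or parentheses that appear inside a qdstring.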
From: John B. <joh...@ne...> - 2002-01-11 09:29:41
|
> Since all separation characters used in RFC 2252 are 7bit ASCII,
> parsing stays possible without having to think about those non-ASCII
> characters (just let them where they are ;-)

Isn't that the problem? The qdstrings are allowed to contain all the
separation characters in any combination they like. How then to detect the
end of the qdstring?

> > I don't see how it is possible at all, without defining an additional
> > escape mechanism or list of disallowed characters (the previous approach).

> My first idea was not only to check for allowed characters but to check for
> those special words defined in RFC 2252 (DESC, MUST, MAY, ..)
> But I was to lazy to do it that way ;-))

And those special words may also exist inside a qdstring.

I am sure that we can define a reasonable approach by adding restrictions
on qdstrings. However, I am really missing something if there is a way of
parsing this 'as is'.

regards,

jb
|
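A minimal sketch of what "adding restrictions on qdstrings" might mean in practice. Both rules below are assumptions for illustration, not what Net::LDAP::Schema actually does, and the function names are made up:

```perl
use strict;
use warnings;

# Naive rule: a qdstring ends at the next single quote.
sub qd_naive {
    my ($text) = @_;
    return $text =~ /'([^']*)'/ ? $1 : undef;
}

# Restricted rule (assumed): a qdstring ends at a single quote that is
# followed by whitespace, ")" or the end of the input.
sub qd_restricted {
    my ($text) = @_;
    return $text =~ /'(.*?)'(?=\s|\)|\z)/ ? $1 : undef;
}

my $desc = q{DESC 'New Object's FS Rights' )};
my $nds  = q{X-NDS_NAME 'New Object' s FS Rights - correct this time :-)' )};

print "naive:      ", qd_naive($desc),      "\n";
print "restricted: ", qd_restricted($desc), "\n";
print "restricted: ", qd_restricted($nds),  "\n";
```

The restricted rule recovers the full description in the first example, but on the second one it stops at the quote followed by a space and returns only `New Object`. Any such rule therefore rejects or misreads some strings that the grammar as written allows.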