From: Thomas B. <Tho...@gm...> - 2007-12-20 14:20:30
Attachments:
query-like.patch
|
Yesterday I needed LIKE queries for properties, so I added it to SMW (patch attached). It was surprisingly simple.

This patch adds a new comparator, %, to the already existing <, > and !. So you can say [[has capital::%A%]] and it will return all pages that have a property "has capital" with a value starting with "A".

Notes:
* The first % tells SMW that what follows is to be interpreted as a parameter to LIKE. So if you want to get all pages with capitals that have an A in them, you'd have to say [[has capital::%%A%]]
* NOT LIKE would be trivial to add, I just didn't need it and couldn't decide on the character to use (maybe § or &?)
* Probably it should be possible to disable LIKE queries, as they could be quite expensive.
* I saw that there is some (planned?) support for LIKE queries already, but it looked like it was not applicable to {{#ask}}, so I didn't reuse it.

Thomas
|
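A minimal sketch of the prefix rule described in this message: the first character selects the comparator, and everything after it is the operand (for %, the LIKE pattern). The helper name `split_comparator` is hypothetical and is not taken from the attached patch or from SMW.

```python
def split_comparator(value, comparators=("<", ">", "!", "%")):
    """Split a query value into (comparator, operand).

    Hypothetical helper for illustration only. An empty comparator
    stands for plain equality.
    """
    for comp in comparators:
        if value.startswith(comp):
            # Only the first character is consumed; a second % stays
            # in the operand and acts as an SQL LIKE wildcard.
            return comp, value[len(comp):]
    return "", value


print(split_comparator("%A%"))     # ('%', 'A%')   -> LIKE 'A%'  (starts with A)
print(split_comparator("%%A%"))    # ('%', '%A%')  -> LIKE '%A%' (contains A)
print(split_comparator("Berlin"))  # ('', 'Berlin')
```

This is why "contains A" needs the doubled %%: one % is spent on selecting the comparator.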
From: Asheesh L. <as...@cr...> - 2007-12-20 15:10:14
|
On Thu, 20 Dec 2007, Thomas Bleher wrote:
> Yesterday I needed LIKE queries for properties, so I added it to SMW
> (patch attached). It was surprisingly simple.

This would be LIKE TOTALLY AWESOME to get in to Semantic MediaWiki.

It would be great if later SMW could have Valgol support <http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html>.

-- Asheesh.

P.S. In all total like seriousness, queries with LIKE support are a good idea....

--
The star of riches is shining upon you.
|
From: Markus <ma...@ai...> - 2007-12-28 07:37:17
|
Thanks. I have applied the patch, and added a way of configuring this feature: the parameter $smwgQComparators gives a (|-separated) list of supported comparators, and can be used to enable or disable any of <, >, !, and %. By default its value is '<|>|!|%'.

In this way one can also disable ! or even <, > if these are considered to be problematic.

I wonder whether one should use another character instead of "%" as a wildcard inside the pattern string, so that no double-% confusion can arise. Would * be an alternative, or would it be too confusing w.r.t. the old <ask> print requests? What about +? Corresponding examples (preprocessing would in each case ensure full compatibility with SQL):

- %%substring%
- %*substring*
- %+substring+

Cheers

Markus

--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
ma...@ai... www http://korrekt.org
|
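A rough sketch of how a |-separated setting such as the default '<|>|!|%' could gate comparator parsing. The function names are hypothetical, and treating a disabled comparator as part of a literal value is an assumption made for illustration, not a statement about what SMW actually does.

```python
DEFAULT_SETTING = "<|>|!|%"  # default value quoted in the message above


def enabled_comparators(setting):
    """Turn the |-separated setting string into a list of comparators."""
    return [c for c in setting.split("|") if c]


def parse_value(value, setting=DEFAULT_SETTING):
    """Return (comparator, operand), honouring only enabled comparators.

    Assumption: with a comparator disabled, the whole value is treated
    as a literal to be matched by equality.
    """
    for comp in enabled_comparators(setting):
        if value.startswith(comp):
            return comp, value[len(comp):]
    return "", value


print(parse_value("%A%"))           # ('%', 'A%')
print(parse_value("%A%", "<|>|!"))  # ('', '%A%')  LIKE disabled
```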
From: Yaron K. <ya...@gm...> - 2007-12-28 15:18:18
|
How about ~%substring% instead? The "~" is the symbol for pattern matching in Perl and some UNIX languages, and it might be a clearer indicator of function than "%".

-Yaron
|
From: Markus <ma...@ai...> - 2007-12-28 17:37:02
|
On Friday, 28 December 2007, Yaron Koren wrote:
> How about ~%substring% instead? The "~" is the symbol for pattern matching
> in Perl and some UNIX languages, and it might be a clearer indicator of
> function than "%".

I would immediately use that, but IIRC the Halo extension has a similar syntax for a custom editing-distance database function (requires a modified MySQL version, and probably also has significant performance issues).

So the question is whether we want to override that (assuming that this particular Halo function is not used widely), or is there another idea for doing it? Other imaginable operators on my keyboard would be #, &, ?, @ -- none really as nice as ~ ...

Markus
|
From: DanTMan <dan...@te...> - 2007-12-29 07:25:12
|
A lot of people are accustomed to the ? (single-character match) and * (multi-character match) format. It would be easy to escape the '_'s and '%'s in a match and then replace ? with _ and * with %. (A little preg and \ could still easily escape those.)

I don't know about ~ though; in the languages I've used I recall ~ having something to do with regex. I'd rather save that character in case we want to be able to use REGEXP matching inside of SQL.

From what I remember, I think most people with only a little insight into technical stuff would adjust most easily to this set:
=   Equals
>   Greater than
>=  Greater than or equal to
<   Less than
<=  Less than or equal to
!   Not
*   Multi-character match
?   Single-character match
~   Regex

But I did have a thought about the @... It's not used anywhere afaik. I did make a suggestion on using a pattern to separate the comparators from the match value. It was using [[Property::comparator::match]], but as I now remember SMW lets you use :: to specify multiple properties. However, it may be a good idea if the separator was one which wouldn't conflict with other things. @ is not commonly used and does provide a little bit of a way for people to understand its use. Or, if you want something a little farther from what can actually be used in a title (to avoid clashing with things), the # is always invalid. Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has value::!@Value]] or [[Has value::!#Value]].

I'm probably droning on now... But what about finding a good separator and allowing textual names, i.e.: EQ[=], NOT/NEQ[!] (!= could be thought of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc...

There is also the possibility of using brackets to enclose a comparator instead of a separator. I can hardly think of many places which would use (NOT) at the start of a title ([[Has value::(NOT) Title]]), or we also have the {} and [] type brackets. [] is used by external links, but {} is only used in multiples, as a template or variable bit, and never singly. Templates and values will have already been parsed out, so only the singles remain, and as a bonus, { and } are illegal in titles. So [[Has value::{NOT} Title]] is guaranteed to never clash with a legal title or match you can make. If you're worried about templates and parsing issues, those can't occur when you're using something like {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash. The only potential clash is if someone wants to use {{{comparator|EQ}}} to specify the comparator. In that case, we could easily make { EQ } valid (trim spaces), so "{ {{{comparator|EQ}}} }" would work.

But... now I'm droning a bit much...

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
|
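The wildcard translation suggested in this message (escape literal % and _, then map * to % and ? to _, with \ as the user-facing escape) can be sketched as follows. The function name `wildcard_to_like` is hypothetical, and the sketch assumes the resulting pattern is used with backslash as the LIKE escape character (MySQL's default); it is not SMW's implementation.

```python
def wildcard_to_like(pattern):
    """Translate a */? wildcard pattern into an SQL LIKE pattern.

    Assumptions for illustration: backslash is both the user-facing
    escape (\\*, \\?, \\\\) and the LIKE escape character.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?\\":
            # Escaped wildcard or backslash: emit it as a literal.
            lit = pattern[i + 1]
            out.append("\\\\" if lit == "\\" else lit)
            i += 2
            continue
        if ch == "*":
            out.append("%")        # multi-character match
        elif ch == "?":
            out.append("_")        # single-character match
        elif ch in "%_\\":
            out.append("\\" + ch)  # literal %, _ or \ must be escaped for LIKE
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(wildcard_to_like("*sub_string?"))  # %sub\_string_
print(wildcard_to_like("50%*"))          # 50\%%
print(wildcard_to_like(r"what\?"))       # what?
```

The result should still be passed to the database as a bound parameter; the translation only handles LIKE metacharacters, not SQL quoting.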
From: Markus <ma...@ai...> - 2007-12-29 15:12:37
|
On Saturday, 29 December 2007, DanTMan wrote:
> A lot of people are accustomed to the ? (single-character match) and *
> (multi-character match) format. It would be easy to escape the '_'s and
> '%'s in a match and then replace ? with _ and * with %. (A little
> preg and \ could still easily escape those.)

Yes, I agree with that. I think, if nobody objects, this fixes the pattern syntax. So it remains to find a good symbol for the comparator.

> I don't know about ~ though; in the languages I've used I recall ~
> having something to do with regex. I'd rather save that character in
> case we want to be able to use REGEXP matching inside of SQL.
>
> From what I remember, I think most people with only a little insight
> into technical stuff would adjust most easily to this set:
> =   Equals
> >   Greater than
> >=  Greater than or equal to
> <   Less than
> <=  Less than or equal to
> !   Not
> *   Multi-character match
> ?   Single-character match
> ~   Regex

As a note: "=" is not available in the parser function #ask, since it has a special meaning as parameter assignment, as e.g. in "format=table". The query is distinguished from the other parameters and print requests in #ask since it has no = symbol and does not start with ?.

> But I did have a thought about the @... It's not used anywhere afaik.
> I did make a suggestion on using a pattern to separate the comparators
> from the match value. It was using [[Property::comparator::match]], but
> as I now remember SMW lets you use :: to specify multiple properties.
> However, it may be a good idea if the separator was one which wouldn't
> conflict with other things.

Maybe I should remark that the comparator we choose will never block any symbol from being used in values. You can always escape the initial comparator by inserting an initial space (which is ignored in all values). For instance, to look for pages with property value "<strange value>", one could write

[[some property:: <strange value>]]

whereas [[some property::<strange value>]] would be equivalent to

[[some property::< strange value>]]

which matches all values (alphabetically) smaller than "strange value>". So we can pick any comparator character without conflicts.

> @ is not commonly used and
> does provide a little bit of a way for people to understand its use. Or,
> if you want something a little farther from what can actually be used in
> a title (to avoid clashing with things), the # is always invalid.
> Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has
> value::!@Value]] or [[Has value::!#Value]].

Basically, spaces already play the role of your proposed @ or #.

> I'm probably droning on now... But what about finding a good separator
> and allowing textual names, i.e.: EQ[=], NOT/NEQ[!] (!= could be thought
> of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc...

Not sure whether that would be better internationally. "<" seems to be more universally understood than "LT".

Another remark: "::!" stands for inequality (NEQ), not for negation (NOT). It looks for pages that have some property value unequal to the one that was given, and it does not matter whether or not they also have some value that is equal. So a page that is annotated with [[property::1]] and [[property::2]] would match a query atom [[property::!1]].

> There is also the possibility of using brackets to enclose a comparator
> instead of a separator. I can hardly think of many places which would
> use (NOT) at the start of a title ([[Has value::(NOT) Title]]), or we
> also have the {} and [] type brackets. [] is used by external links, but
> {} is only used in multiples, as a template or variable bit, and never
> singly. Templates and values will have already been parsed out, so only
> the singles remain, and as a bonus, { and } are illegal in titles. So
> [[Has value::{NOT} Title]] is guaranteed to never clash with a legal
> title or match you can make. If you're worried about templates and
> parsing issues, those can't occur when you're using something like
> {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash.
> The only potential clash is if someone wants to use {{{comparator|EQ}}}
> to specify the comparator. In that case, we could easily make { EQ }
> valid (trim spaces), so "{ {{{comparator|EQ}}} }" would work.

Yes, that would work too. But I am happy with our spaces (the fact that initial and trailing spaces are ignored in all property values is the key to making that work, and I think there is no harm in assuming that).

There is, in principle, no problem with having multi-char sequences for comparators, but I would prefer something that does not require internationalisation. So, given that we use * and ? instead of % and _, there are the following options:

1- [[property::%*substring*]]
2- [[property::#*substring*]]
3- [[property::~*substring*]] (clashes with Halo)
4- [[property::@*substring*]]
5- maybe more ...

My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the Halo issue. Further ideas and arguments are still welcome until Sunday evening, when we hope to release SMW 1.0.

Cheers,

Markus
|
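Two points from this reply lend themselves to a small sketch: the leading-space escape (the comparator is looked for before the value is trimmed) and the NEQ semantics of "::!" (a page matches if some value differs). Both functions are hypothetical illustrations under those stated rules, not SMW code.

```python
def parse_atom(raw, comparators=("<", ">", "!", "%")):
    """Return (comparator, operand) for the text after '::'.

    The comparator check uses the untrimmed text, so a leading space
    hides a would-be comparator; the operand is trimmed afterwards.
    """
    for comp in comparators:
        if raw.startswith(comp):
            return comp, raw[len(comp):].strip()
    return "", raw.strip()


def matches_neq(page_values, operand):
    """NEQ, not NOT: true if at least one value differs from operand."""
    return any(v != operand for v in page_values)


print(parse_atom(" <strange value>"))  # ('', '<strange value>')
print(parse_atom("<strange value>"))   # ('<', 'strange value>')
print(parse_atom("< strange value>"))  # ('<', 'strange value>')
print(matches_neq(["1", "2"], "1"))    # True
print(matches_neq(["1"], "1"))         # False
```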
From: DanTMan <dan...@te...> - 2007-12-30 02:54:58
|
^_^ ok, I thought we escaped with a \, which isn't something that normal users would find easy to use. But a leading-space escape is ok.

I still would pick ~ as the best thing for REGEX and prefer a different operator for wild cards. I guess % is probably best for the wild card operator. Which brings me to the thought of:
EQ: [[property::value]]
NEQ: [[property::!value]]
GT: [[property::>value]]
LT: [[property::<value]]
WILD: [[property::%value]] (using ? and *)

Also, I propose a few more additions, since they will probably have some good use too.
GTEQ: [[property::>=value]]
LTEQ: [[property::<=value]]
NWILD: [[property::!%value]] (negated wild card)
REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could of course be replaced with !, [], etc... anything valid in preg)
NGT: [[property::#>value]] (natural order greater than)
NLT: [[property::#<value]] (natural order less than)
NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
NLTEQ: [[property::#<=value]] (natural order less than or equal to)

Of course, the REGEX one is provided that we can fix the issue of colliding with Halo.

A note on that negated wild card: I added it for one primary reason. Unlike the other comparators, a wild card cannot be negated using any other format. (> can be negated with <=, EQ with !, and a regex can negate things inside itself. But you can't negate a wild card.) Also, remember to escape things so that we can use \* and \? to mean those characters literally; I could draft all the replaces needed, but I have to go do something first.

As for the natural order ones, if you don't know what those are for, it's things like values of "1.2.3" and "1.12.3". A normal > thinks that "1.2.3" is greater than "1.12.3", because the third character is a 2 and the third character in the other is a 1. But a natural order properly reads the second number as 12. PHP has functions for these built in, and they would be nice to use.
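The natural-order comparison described above can be sketched with a sort key that splits digit runs out of the string and compares them as integers. The name `natural_key` is hypothetical; this is a simplified illustration of the idea, not a port of PHP's built-in natural-order functions.

```python
import re


def natural_key(value):
    """Build a key that compares digit runs numerically.

    re.split with a capturing group alternates text and digit runs,
    so odd positions are always the digit runs.
    """
    parts = re.split(r"(\d+)", value)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


# Plain string comparison: '2' > '1' at the third character.
print("1.2.3" > "1.12.3")                            # True
# Natural order: the second component is 2 versus 12.
print(natural_key("1.2.3") > natural_key("1.12.3"))  # False
print(sorted(["1.12.3", "1.2.3", "1.10"], key=natural_key))
# ['1.2.3', '1.10', '1.12.3']
```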
~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com) Markus Krötzsch wrote: > On Samstag, 29. Dezember 2007, DanTMan wrote: > >> A lot of people are accustomed to the ? (single-character match) and * >> (multi-character match) format. It would be easy to escape the '_'s and >> '%'s in a match and then do a replace of ? to _ and * to %. (A little >> preg and \ could still easily escape those.) >> > > Yes, I agree to that. I think, if nobody objects, this fixes the pattern > syntax. So it remains to find a good symbol for the comparator. > > >> I don't know about ~ though, in the languages I've used I recall ~ >> having something to do with regex. I'd rather save that character for in >> case we want to be able to use the REGEXP matching inside of SQL. >> >> From what I remember, I think most people with only a little insight >> into technical stuff, would adjust easiest to using this set: >> = Equals >> >> > Greater than >> >= Greater than or equal to >> >> < Less than or equal to >> ! Not >> * Multi-character match >> ? Single-character match >> ~ regex >> > > As a note: "=" is not available in parser function #ask, since it has a > special meaning as parameter assignment, as e.g. in "format=table". The query > is distinguished from the other parameters and print requests in #ask since > it has no = symbol and does not start on ?. > > >> But I did have a thought about the @... It's not used anywhere afaik. >> I did make a suggestion on using a pattern to separate the comparators >> from the match value. It was using [[Property::comparitor::match]], but >> as I now remember SMW lets you use :: to specify multiple properties. >> However it may be a good idea if the separator was one which wouldn't >> cause conflicting issues with other things. 
>> > > Maybe I should remark that the comparator we chose will never block any symbol > from being used in values. You can always escape the initial comparator by > inserting an initial space (which is ignored in all values). For instance, to > look for pages with property value "<strange value>", one could write > > [[some property:: <strange value>]] > > whereas [[some property::<strange value>]] would be equivalent to > > [[some property::< strange value>]] > > which matches all values (alphabetically) smaller than "strange value>". So we > can pick any comparator letter without conflicts. > > >> @ is not commonly used and >> does provide a little bit of a way for people to understand it's use. Or >> if you want a little farther from what can actually be used in a title >> (To avoid clashing with things) the # is always invalid. >> Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not [[Has >> value::!@Value]] or [[Has value::!#Value]]. >> > > Basically, spaces already play the role of your proposed @ or #. > > >> I'm probably droning on now... But what about finding a good separator >> and allowing textual names ie: EQ[=], NOT/NEQ/[!] (!= could be thought >> of),LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc... >> > > Not sure whether that would be better internationally. "<" seems to be more > universally understood than "LT". > > Another remark: "::!" stands for inequality (NEQ), not for negation (NOT). It > looks for pages that have some property value unequal to the one that was > given, and it does not matter whether or not they also have some value that > is equal. So a page that is annotated with [[property::1]] and > [[property::2]] would match a query atom [[property::!1]]. > > >> There also is the possibility of instead of a separator, using brackets >> to encompass a comparator. 
I can hardly think of many places which would >> use (NOT) at the start of a title ([[Has value::(NOT) Title]]) or, we >> also have the {} and [] type brackets. [] is used by external links, but >> {} is only used in multiples as a template or variable bit and is never >> used singly; templates and values will have already been parsed out >> so only the singles remain, and as a bonus, { and } are illegal in >> titles. So [[Has value::{NOT} Title]] is guaranteed to never clash with >> a legal title or match you can make. If you're worried about templates >> and parsing issues, those can't occur when you're using something like >> {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]) so there's no clash. >> The only potential clash is if someone wants to use {{{comparator|EQ}}} >> to specify the comparator. In that case, we could easily make { EQ } >> valid (trim spaces), so "{ {{{comparator|EQ}}} }" would work. >> > > Yes, that would work too. But I am happy with our spaces (the fact that > initial and trailing spaces are ignored in all property values is the key to > making that work, and I think there is no harm in assuming that). > > There is, in principle, no problem with having multi-char sequences for > comparators, but I would prefer something that does not require > internationalisation. So, given that we use * and ? instead of % and _, there > are the following options: > > 1- [[property::%*substring*]] > 2- [[property::#*substring*]] > 3- [[property::~*substring*]] (clashes with Halo) > 4- [[property::@*substring*]] > 5- maybe more ... > > My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the Halo > issue. Further ideas and arguments are still welcome until Sunday evening, > when we hope to release SMW 1.0. > > Cheers, > > Markus > > >> But... now I'm droning a bit much...
>> >> ~Daniel Friesen(Dantman) >> >> Markus Krötzsch wrote: >> >>> On Friday, 28 December 2007, Yaron Koren wrote: >>> >>>> How about ~%substring% instead? The "~" is the symbol for pattern >>>> matching in Perl and some UNIX languages, and it might be a clearer >>>> indicator of function than "%". >>>> >>> I would immediately use that, but IIRC the Halo extension has a similar >>> syntax for a custom editing-distance database function (requires modified >>> MySQL version, and probably also has significant performance issues). >>> >>> So the question is whether we want to overwrite that (assuming that this >>> particular Halo function is not used widely), or is there another idea >>> for doing it? Other imaginable operators on my keyboard would be #, &, ?, >>> @ -- none really as nice as ~ ... >>> >>> Markus |
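The two mechanisms discussed in the message above (a leading space that keeps the first character from being read as a comparator, and * / ? wildcards that are rewritten to SQL's % / _ after escaping) can be sketched roughly as follows. This is only an illustrative Python sketch, not SMW's PHP code: the helper names split_comparator and wildcard_to_like are hypothetical, the comparator list simply reuses the '<|>|!|%' default quoted earlier in the thread, and backslash is assumed to be the LIKE escape character.

```python
# Same shape as the $smwgQComparators default quoted in the thread.
DEFAULT_COMPARATORS = "<|>|!|%"


def split_comparator(raw, comparators=DEFAULT_COMPARATORS):
    """Split a raw query value into (comparator, value).

    Hypothetical helper. A value that starts with a space is never read
    as a comparator, and surrounding spaces are dropped from the value.
    """
    enabled = [c for c in comparators.split("|") if c]
    if raw and raw[0] in enabled:
        return raw[0], raw[1:].strip()
    return "", raw.strip()


def wildcard_to_like(pattern):
    """Translate a * / ? wildcard pattern into a SQL LIKE pattern.

    Hypothetical helper. Assumes backslash is the LIKE escape character.
    A backslash before *, ? or another backslash makes that character
    literal; literal % and _ are escaped so LIKE does not treat them as
    wildcards.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?\\":
            ch = pattern[i + 1]  # the escaped character, taken literally
            literal = True
            i += 2
        else:
            literal = False
            i += 1
        if not literal and ch == "*":
            out.append("%")  # multi-character wildcard
        elif not literal and ch == "?":
            out.append("_")  # single-character wildcard
        elif ch in "%_\\":
            out.append("\\" + ch)  # characters that are special to LIKE
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, split_comparator(" <strange value>") yields no comparator at all, while split_comparator("<strange value>") yields the comparator "<" and the value "strange value>", which mirrors the behaviour described in the message. The resulting LIKE pattern would still have to be passed to the database as a bound parameter.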
From: Yaron K. <ya...@gm...> - 2007-12-30 16:46:54
|
Dan - I doubt that there will ever be both a "regex" and a "wildcard" option in SMW's query language - that seems like overkill, and somewhat bad design. A single such option is enough, and if it happens, behind the scenes, to use both SQL's and PHP's pattern-matching capabilities at different times, that should be hidden from the user. So I doubt that there'll be a need for two different symbols (Markus, or anyone else, correct me if I'm wrong). So, let me argue in favor of the "~" symbol - hopefully it's not too late before the Sunday evening deadline. :) The Halo extension is a helpful one, but it's a spinoff of SMW, and thus there's no reason why it should hamper design decisions in SMW. That goes for all extensions that use Semantic MediaWiki - I know, for my own part, that the extensions I've created have to do all sorts of work to be compatible with the different versions of SMW. That's as it should be - the spinoffs work around the main application. From what I understand, Halo is currently not compatible with the most recent versions of SMW anyway, so it needs to be modified anyway - there's no need to try to ensure backwards compatibility. And, as you point out, that functionality in Halo might not be getting used at all - though even if it were, that shouldn't affect how SMW is designed. -Yaron On Dec 29, 2007 9:54 PM, DanTMan < dan...@te...> wrote: > ^_^ ok, I thought we escaped with a \, which isn't something that normal > users would find easy to use. But a starting space escape is ok. > > I still would pick ~ as the best thing for use of REGEX and prefer a > different operator for wild cards. > I guess the % is probably best for the wild card operator. Which brings > me to the thought of: > > EQ: [[property::value]] > NEQ: [[property::!value]] > GT: [[property::>value]] > LT: [[property::<value]] > WILD: [[property::%value]] (Using ? and *) > > Also, I propose a few more additions since they will probably have some > good use too.
> > GTEQ: [[property::>=value]] > LTEQ: [[property::<=value]] > NWILD: [[property::!%value]] (Negated wild card) > REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could > of course be replaced with !, [], etc... anything valid in preg). > NGT: [[property::#>value]] (Natural order greater than) > NLT: [[property::#<value]] (Natural order less than) > NGTEQ: [[property::#>=value]] (Natural order greater than or equal to) > NLTEQ: [[property::#<=value]] (Natural order less than or equal to) > > Of course, the REGEX one is provided that we can fix the issue of > colliding with Halo. > But on the note of that negated wild card: I added that one for one primary > reason. Unlike any of the other things, you cannot negate a wild card > with any other format. (> can be negated with <=, eq with !, and regex > can negate things inside of it. But you can't negate a wild card.) Also, > remember to escape things so that we can use \* and \? to match those > literally (I could draft all the replaces needed, but I've got to go do > something first). > As for the natural order ones, if you don't know what those are for, > it's things like values of "1.2.3" and "1.12.3". Using a normal ">", it > thinks that "1.2.3" is greater than "1.12.3" because the third character > is a 2 and the third character in the other is a 1. But a natural > order properly distinguishes the second number as 12. PHP has functions > for these built in, and they would be nice to use. > > ~Daniel Friesen(Dantman) of: > -The Gaiapedia (http://gaia.wikia.com) > -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) > -and Wiki-Tools.com (http://wiki-tools.com) |
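The natural-order comparison described in the quoted proposal above can be illustrated with a small sketch. PHP does ship built-in functions of this kind (strnatcmp() is one); the Python below is only an illustration of the idea, with a hypothetical helper natural_key that is not taken from SMW or from any message in this thread.

```python
import re


def natural_key(value):
    """Build a sort key that compares runs of ASCII digits as numbers.

    Hypothetical helper. re.split with one capturing group returns text
    chunks at even positions and digit runs at odd positions, so keys for
    two values always hold the same type at the same position.
    """
    parts = re.split(r"([0-9]+)", value)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


versions = ["1.12.3", "1.2.3", "1.10"]

# Plain string comparison looks at one character at a time.
plain_order = sorted(versions)

# Natural order treats "12" and "10" as whole numbers.
natural_order = sorted(versions, key=natural_key)
```

With a plain comparison "1.2.3" sorts after "1.12.3", because the character "2" is compared against the character "1"; with the key above, the second component is compared as 2 against 12, so "1.2.3" comes first.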
From: Markus <ma...@ai...> - 2007-12-31 15:00:47
|
On Sunday, 30 December 2007, Yaron Koren wrote: > Dan - I doubt that there will ever be both a "regex" and a "wildcard" > option in SMW's query language - that seems like overkill, and somewhat bad > design. A single such option is enough, and if it happens, behind the > scenes, to use both SQL's and PHP's pattern-matching capabilities at > different times, that should be hidden from the user. So I doubt that > there'll be a need for two different symbols (Markus, or anyone else, > correct me if I'm wrong). > > So, let me argue in favor of the "~" symbol - hopefully it's not too late > before the Sunday evening deadline. :) There was a drastic change in the parser of MediaWiki 1.12 that has caused some delay. So the deadline has moved to today ;-) > The Halo extension is a helpful one, > but it's a spinoff of SMW, and thus there's no reason why it should hamper > design decisions in SMW. That goes for all extensions that use Semantic > MediaWiki - I know, for my own part, that the extensions I've created have > to do all sorts of work to be compatible with the different versions of > SMW. That's as it should be - the spinoffs work around the main > application. From what I understand, Halo is currently not compatible with > the most recent versions of SMW anyway, so it needs to be modified anyway - > there's no need to try to ensure backwards compatibility. > > And, as you point out, that functionality in Halo might not be getting used > at all - though even if it were, that shouldn't affect how SMW is designed. OK, I am convinced. Done. Markus
-- 
Markus Krötzsch
Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
ma...@ai... www http://korrekt.org |
From: Markus <ma...@ai...> - 2007-12-30 21:10:10
|
OK, my conclusion now was to support the following syntax:

[[property% *subs?r*]]

where ? and * represent _ and % in SQL. Some more remarks inline below.

On Sunday, 30 December 2007, DanTMan wrote:
> ^_^ ok, I thought we escaped with a \, which isn't something that normal
> users would find easy to use. But a starting space escape is ok.
>
> I still would pick ~ as the best thing for use of REGEX and prefer a
> different operator for wild cards.
> I guess the % is probably best for the wild card operator. Which brings
> me the thought of:
>
> EQ: [[property::value]]
> NEQ: [[property::!value]]
> GT: [[property::>value]]
> LT: [[property::<value]]
> WILD: [[property::%value]] (Using ? and *)
>
> Also, I propose a few more additions since they will probably have some
> good use too.
>
> GTEQ: [[property::>=value]]
> LTEQ: [[property::<=value]]

Rarely needed, but already possible by using disjunctions.

> NWILD: [[property::!%value]] (Negated wild card)

Maybe in the future if someone really needs it.

> REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could
> of course be replaced with !, [], etc... any valid in preg.

Unlikely. Regexps are very complex and not at all efficient on a DB scale
(even LIKE is a problem here, which is why it is disabled by default).

> NGT: [[property::#>value]] (Natural order greater than)
> NLT: [[property::#<value]] (Natural order less than)
> NGTEQ: [[property::#>=value]] (Natural order greater than or equal to)
> NLTEQ: [[property::#<=value]] (Natural order less than or equal to)
>
> Of course, the REGEX one is provided that we can fix the issue of
> colliding with Halo.
> But on note of that negated wild card. I added that one for one primary
> reason. Unlike any of the other things, you cannot negate a wild card
> with any other format. (> can be negated with <=, eq with !, and regex
> can negate things inside of it. But you can't negate a wild card) Also,
> remember to escape things so that we can use (\* and \? to use those
> literally; I could draft all the replaces needed, but I got to go do
> something first)
> As for the Natural order ones, if you don't know what those are for,
> it's things like values of "1.2.3" and "1.12.3". Using a normal > it
> thinks that "1.2.3" is greater than "1.12.3" because the third character
> is a two and the third character in the other is a 1. But a natural
> order properly distinguishes the second number as 12. PHP has functions
> for these built in and would be nice for use.

On "natural orders": datatypes in SMW usually come with their own "natural
order" and all comparators refer to those already. Things like the version
numbers you mentioned are problematic unless restricted, since they are
again hard to implement on a DB level efficiently (this is of course
different if the format is somehow restricted). Note that PHP
implementations of comparators do not help us, since the DB must do all the
comparisons.

Markus

>
> ~Daniel Friesen(Dantman) of:
> -The Gaiapedia (http://gaia.wikia.com)
> -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
> -and Wiki-Tools.com (http://wiki-tools.com)
>
> Markus Krötzsch wrote:
> > On Saturday, 29 December 2007, DanTMan wrote:
> >> A lot of people are accustomed to the ? (single-character match) and *
> >> (multi-character match) format. It would be easy to escape the '_'s and
> >> '%'s in a match and then do a replace of ? to _ and * to %. (A little
> >> preg and \ could still easily escape those.)
> >
> > Yes, I agree to that. I think, if nobody objects, this fixes the pattern
> > syntax. So it remains to find a good symbol for the comparator.
> >
> >> I don't know about ~ though, in the languages I've used I recall ~
> >> having something to do with regex. I'd rather save that character for in
> >> case we want to be able to use the REGEXP matching inside of SQL.
> >>
> >> From what I remember, I think most people with only a little insight
> >> into technical stuff, would adjust easiest to using this set:
> >> = Equals
> >> > Greater than
> >> >= Greater than or equal to
> >> < Less than or equal to
> >> ! Not
> >> * Multi-character match
> >> ? Single-character match
> >> ~ regex
> >
> > As a note: "=" is not available in parser function #ask, since it has a
> > special meaning as parameter assignment, as e.g. in "format=table". The
> > query is distinguished from the other parameters and print requests in
> > #ask since it has no = symbol and does not start on ?.
> >
> >> But I did have a thought about the @... It's not used anywhere afaik.
> >> I did make a suggestion on using a pattern to separate the comparators
> >> from the match value. It was using [[Property::comparitor::match]], but
> >> as I now remember SMW lets you use :: to specify multiple properties.
> >> However it may be a good idea if the separator was one which wouldn't
> >> cause conflicting issues with other things.
> >
> > Maybe I should remark that the comparator we chose will never block any
> > symbol from being used in values. You can always escape the initial
> > comparator by inserting an initial space (which is ignored in all
> > values). For instance, to look for pages with property value "<strange
> > value>", one could write
> >
> > [[some property:: <strange value>]]
> >
> > whereas [[some property::<strange value>]] would be equivalent to
> >
> > [[some property::< strange value>]]
> >
> > which matches all values (alphabetically) smaller than "strange value>".
> > So we can pick any comparator letter without conflicts.
> >
> >> @ is not commonly used and
> >> does provide a little bit of a way for people to understand its use. Or
> >> if you want a little farther from what can actually be used in a title
> >> (To avoid clashing with things) the # is always invalid.
> >> Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not
> >> [[Has value::!@Value]] or [[Has value::!#Value]].
> >
> > Basically, spaces already play the role of your proposed @ or #.
> >
> >> I'm probably droning on now... But what about finding a good separator
> >> and allowing textual names ie: EQ[=], NOT/NEQ/[!] (!= could be thought
> >> of),LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc...
> >
> > Not sure whether that would be better internationally. "<" seems to be
> > more universally understood than "LT".
> >
> > Another remark: "::!" stands for inequality (NEQ), not for negation
> > (NOT). It looks for pages that have some property value unequal to the
> > one that was given, and it does not matter whether or not they also have
> > some value that is equal. So a page that is annotated with
> > [[property::1]] and [[property::2]] would match a query atom
> > [[property::!1]].
> >
> >> There also is the possibility of instead of a separator, using brackets
> >> to encompass a comparator. I can hardly think of many places which would
> >> use (NOT) at the start of a title ([[Has value::(NOT) Title]]) or, we
> >> also have the {} and [] type brackets. [] is used by external links, but
> >> {} is only used in multiples as a template or variable bit but never has
> >> use singularly, templates and values will have already been parsed out
> >> so only the singles remain, and as a bonus, { and } are illegal in
> >> titles. So [[Has value::{NOT} Title]] is guaranteed to never clash with
> >> a legal title or match you can make. If you're worried about templates
> >> and parsing issues, those can't occur when you're using something like
> >> {{{1}}} as the title ([[Has value:{NOT} {{{1}}}]]) so there's no clash.
> >> The only potential clash is if someone wants to use {{{comparator|EQ}}}
> >> to specify the comparator. In that case, we could easily make { EQ }
> >> valid (trim spaces), so "{ {{{comparator|EQ}}} }" would work.
> >
> > Yes, that would work too. But I am happy with our spaces (the fact that
> > initial and trailing spaces are ignored in all property values is the key
> > to make that work, and I think there is no harm in assuming that).
> >
> > There is, in principle, no problem with having multi-char sequences for
> > comparators, but I would prefer something that does not require
> > internationalisation. So, given that we use * and ? instead of % and _,
> > there are the following options:
> >
> > 1- [[property::%*substring*]]
> > 2- [[property::#*substring*]]
> > 3- [[property::~*substring*]] (clashes with Halo)
> > 4- [[property::@*substring*]]
> > 5- maybe more ...
> >
> > My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the
> > Halo issue. Further ideas and arguments are still welcome until Sunday
> > evening, when we hope to release SMW 1.0.
> >
> > Cheers,
> >
> > Markus
> >
> >> But... now I'm droning a bit much...
> >>
> >> ~Daniel Friesen(Dantman) of:
> >> -The Gaiapedia (http://gaia.wikia.com)
> >> -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
> >> -and Wiki-Tools.com (http://wiki-tools.com)
> >>
> >> Markus Krötzsch wrote:
> >>> On Friday, 28 December 2007, Yaron Koren wrote:
> >>>> How about ~%substring% instead? The "~" is the symbol for pattern
> >>>> matching in Perl and some UNIX languages, and it might be a clearer
> >>>> indicator of function than "%".
> >>>
> >>> I would immediately use that, but IIRC the Halo extension has a similar
> >>> syntax for a custom editing-distance database function (requires
> >>> modified MySQL version, and probably also has significant performance
> >>> issues).
> >>>
> >>> So the question is whether we want to overwrite that (assuming that
> >>> this particular Halo function is not used widely), or is there another
> >>> idea for doing it? Other imaginable operators on my keyboard would be
> >>> #, &, ?, @ -- none really as nice as ~ ...
> >>>
> >>> Markus
> >>>
> >>>> On Dec 27, 2007 2:16 PM, Markus Krötzsch <mak@aifb.uni-karlsruhe.de>
> >>>> wrote:
> >>>>> Thanks. I have applied the patch, and added a way of configuring this
> >>>>> feature: the parameter $smwgQComparators gives a (|-separated) list
> >>>>> of supported comparators, and can be used to enable or disable any of
> >>>>> <, >, !, and %. By default its value is '<|>|!|%'.
> >>>>>
> >>>>> In this way one can also disable ! or even <, > if these are
> >>>>> considered to be problematic.
> >>>>>
> >>>>> I wonder whether one should use another character instead of "%" as a
> >>>>> wildcard inside the pattern string, so that no double-% confusion can
> >>>>> arise. Would * be an alternative or would it be too confusing w.r.t.
> >>>>> the old <ask> print requests? What about +? According examples
> >>>>> (preprocessing would in each case ensure full compatibility with
> >>>>> SQL):
> >>>>>
> >>>>> - %%substring%
> >>>>> - %*substring*
> >>>>> - %+substring+
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> Markus
> >>>>>
> >>>>> On Thursday, 20 December 2007, Asheesh Laroia wrote:
> >>>>>> On Thu, 20 Dec 2007, Thomas Bleher wrote:
> >>>>>>> Yesterday I needed LIKE queries for properties, so I added it to
> >>>>>>> SMW (patch attached). It was surprisingly simple.
> >>>>>>
> >>>>>> This would be LIKE TOTALLY AWESOME to get in to Semantic MediaWiki.
> >>>>>>
> >>>>>> It would be great if later SMW could have Valgol support
> >>>>>> <http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html>.
> >>>>>>
> >>>>>> -- Asheesh.
> >>>>>>
> >>>>>> P.S. In all total like seriousness, queries with LIKE support are a
> >>>>>> good idea....
> >>>>>>
> >>>>>> --
> >>>>>> The star of riches is shining upon you.

-- 
Markus Krötzsch
Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
ma...@ai... www http://korrekt.org |
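For illustration, the comparator handling discussed in this message (a |-separated list of enabled comparators, and a leading space that "escapes" a value starting with a comparator character) can be sketched as below. This is a Python sketch, not SMW's actual PHP code: the function name split_comparator and the "EQ" label for plain equality are invented for the example; only the '<|>|!|%' list format and the leading-space rule are taken from the thread.

```python
def split_comparator(raw_value, enabled="<|>|!|%"):
    """Split a query value into (comparator, value).

    `enabled` mimics the |-separated format of $smwgQComparators.
    "EQ" is a hypothetical label for plain equality.
    """
    comparators = [c for c in enabled.split("|") if c]
    for comparator in comparators:
        # Only the very first character counts, so a leading space
        # "escapes" a value that happens to start with a comparator.
        if raw_value.startswith(comparator):
            return comparator, raw_value[len(comparator):].strip()
    # No enabled comparator at the start: treat the whole value literally.
    return "EQ", raw_value.strip()


print(split_comparator("<strange value>"))    # comparator "<"
print(split_comparator(" <strange value>"))   # escaped, plain equality
print(split_comparator("%*subs?r*", enabled="<|>|!"))  # "%" disabled
```

With "%" removed from the enabled list, a value such as %*subs?r* falls through to plain equality, which is one way to read "disable LIKE queries" in the configuration discussion.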
From: Thomas B. <Tho...@gm...> - 2007-12-30 22:09:51
|
* Markus Krötzsch <ma...@ai...> [2007-12-30 22:10]:
> OK, my conclusion now was to support the following syntax:
>
> [[property% *subs?r*]]
>
> where ? and * represent _ and % in SQL.

I think this is fine generally, but now you cannot query for a literal * or
? anymore, AFAIK.

Not a huge deal, but before, "a_b" searched for "a, followed by any char,
followed by b", while "a\_b" searched for "exactly a_b".

Properly escaping everything gets messy rather quickly, as \ can also be
escaped to query for a literal \, so you need translations like:

?    => _
\?   => ?
\\?  => \\_
\\\? => \\?

The following regular expressions work fine for me, but unfortunately they
are quite ugly:

$value = str_replace(array('%', '_'), array('\%', '\_'), $value); // escape % and _
$value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value); // if there's an even number of \, change * to %
$value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value); // ditto for ? and _
$value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last \ is removed
$value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value); // ditto for ?

I think these should be added to SMW, so all characters can be queried.

Regards,
Thomas |
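The translation table in the message above can also be produced by a single left-to-right scan instead of a chain of regular expressions. The following is a minimal Python sketch of that alternative, not the patch itself: the function name wildcard_to_like is invented here, and it assumes the database treats backslash as the LIKE escape character (the MySQL default).

```python
def wildcard_to_like(pattern: str) -> str:
    """Translate a */? wildcard pattern into an SQL LIKE pattern.

    Assumes backslash is both the user-facing escape character and the
    LIKE escape character of the database.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # A backslash followed by *, ? or \ escapes that character.
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?\\":
            nxt = pattern[i + 1]
            # An escaped backslash stays escaped for LIKE; * and ? are
            # ordinary characters in LIKE and need no escape there.
            out.append("\\\\" if nxt == "\\" else nxt)
            i += 2
            continue
        if ch == "*":
            out.append("%")          # multi-character wildcard
        elif ch == "?":
            out.append("_")          # single-character wildcard
        elif ch in "%_\\":
            out.append("\\" + ch)    # literal %, _ or stray backslash
        else:
            out.append(ch)
        i += 1
    return "".join(out)


for user_pattern in ["?", "\\?", "\\\\?", "\\\\\\?", "*subs?r*", "a_b"]:
    print(user_pattern, "=>", wildcard_to_like(user_pattern))
```

Scanning once avoids the even/odd backslash counting that the regular expressions need, at the cost of a few more lines of code.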
From: Markus <ma...@ai...> - 2008-01-02 07:37:14
|
On Sunday, 30 December 2007, Thomas Bleher wrote:
> * Markus Krötzsch <ma...@ai...> [2007-12-30 22:10]:
> > OK, my conclusion now was to support the following syntax:
> >
> > [[property% *subs?r*]]
> >
> > where ? and * represent _ and % in SQL.
>
> I think this is fine generally, but now you cannot query for a literal * or
> ? anymore, AFAIK.

I would not consider this to be a major issue, given that those characters
are not too common in typical application strings, and given the fact that
using "?" still queries for "some symbol" in that place -- it seems to be
very unlikely that two strings differ only in one position where the query
string has a "?". So in most cases it will have the same hits anyway (yes,
there are some cases that could be problematic [1] ;).

Anyway, I will leave this issue at rest until any user actually complains
about this limitation.

Regards,

Markus

[1] http://de.wikipedia.org/wiki/Die_drei_%3F%3F%3F

>
> Not a huge deal, but before, "a_b" searched for "a, followed by any char,
> followed by b", while "a\_b" searched for "exactly a_b".
>
> Properly escaping everything gets messy rather quickly, as \ can also be
> escaped to query for a literal \, so you need translations like:
>
> ?    => _
> \?   => ?
> \\?  => \\_
> \\\? => \\?
>
> The following regular expressions work fine for me, but unfortunately they
> are quite ugly:
>
> $value = str_replace(array('%', '_'), array('\%', '\_'), $value); // escape % and _
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value); // if there's an even number of \, change * to %
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value); // ditto for ? and _
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last \ is removed
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value); // ditto for ?
>
> I think these should be added to SMW, so all characters can be queried.
>
> Regards,
> Thomas

-- 
Markus Krötzsch
Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
ma...@ai... www http://korrekt.org |
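The point made in the reply above, that a "?" in the query still matches a literal "?" but also any other single character, can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module purely as a stand-in database (SMW itself ran on MySQL at the time; the meaning of _ in LIKE is standard SQL). The helper name like_matches and the sample titles are illustrative only.

```python
import sqlite3


def like_matches(pattern, candidates):
    """Return the candidates that match an SQL LIKE pattern."""
    conn = sqlite3.connect(":memory:")
    try:
        return [
            candidate
            for candidate in candidates
            if conn.execute("SELECT ? LIKE ?", (candidate, pattern)).fetchone()[0]
        ]
    finally:
        conn.close()


# A query for "Die drei ???" becomes the LIKE pattern "Die drei ___",
# because each ? is translated to the single-character wildcard _.
titles = ["Die drei ???", "Die drei !!!", "Die drei"]
print(like_matches("Die drei ___", titles))
```

The intended title is found, but so is every other title that differs only in those three positions, which is the kind of problematic case footnote [1] hints at.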
From: Thomas B. <Tho...@gm...> - 2008-01-02 10:51:45
|
* Markus Krötzsch <ma...@ai...> [2008-01-02 08:37]:
> On Sunday, 30 December 2007, Thomas Bleher wrote:
> > * Markus Krötzsch <ma...@ai...> [2007-12-30 22:10]:
> > > OK, my conclusion now was to support the following syntax:
> > >
> > > [[property% *subs?r*]]
> > >
> > > where ? and * represent _ and % in SQL.
> >
> > I think this is fine generally, but now you cannot query for a literal * or
> > ? anymore, AFAIK.
>
> I would not consider this to be a major issue, given that those characters
> are not too common in typical application strings, and given the fact that
> using "?" still queries for "some symbol" in that place -- it seems to be
> very unlikely that two strings differ only in one position where the query
> string has a "?". So in most cases it will have the same hits anyway (yes,
> there are some cases that could be problematic [1] ;).

Agreed.

> Anyway, I will leave this issue at rest until any user actually complains
> about this limitation.

Here I have to respectfully disagree.
It seems unwise to wait until someone complains, when there is already a
patch resolving the issue. Why spend more time later on when the issue
can just be fixed right now?

OK, the regexes were not very readable, but it doesn't really make the
code more complicated.

FWIW, the regexes were so ugly only because backslashes have to be escaped
twice for PHP's preg_replace (so a single \ becomes \\\\).

If we used ! as an escape sequence instead of \, the regexes would look
like this (untested):

$value = str_replace(array('%', '_'), array('!%', '!_'), $value);
$value = preg_replace('/(?<!!)((?:!!)*)\*/', '$1%', $value); // if there's an even number of !, change * to %
$value = preg_replace('/(?<!!)((?:!!)*)\?/', '$1_', $value); // ditto for ? and _
$value = preg_replace('/(?<!!)((?:!!)*)!\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last ! is removed
$value = preg_replace('/(?<!!)((?:!!)*)!\?/', '$1?', $value); // ditto for ?

(?: ) is a subexpression for grouping, not capturing,
(?<! ) is zero-width negative look-behind (i.e. we make sure that the
character before our match is not !).

Regards,
Thomas

> [1] http://de.wikipedia.org/wiki/Die_drei_%3F%3F%3F
>
> >
> > Not a huge deal, but before, "a_b" searched for "a, followed by any char,
> > followed by b", while "a\_b" searched for "exactly a_b".
> >
> > Properly escaping everything gets messy rather quickly, as \ can also be
> > escaped to query for a literal \, so you need translations like:
> >
> > ?    => _
> > \?   => ?
> > \\?  => \\_
> > \\\? => \\?
> >
> > The following regular expressions work fine for me, but unfortunately they
> > are quite ugly:
> >
> > $value = str_replace(array('%', '_'), array('\%', '\_'), $value); // escape % and _
> > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value); // if there's an even number of \, change * to %
> > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value); // ditto for ? and _
> > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last \ is removed
> > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value); // ditto for ?
> >
> > I think these should be added to SMW, so all characters can be queried.
> >
> > Regards,
> > Thomas
>
> --
> Markus Krötzsch
> Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
> phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
> ma...@ai... www http://korrekt.org |
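Since the "!"-based expressions in the message above are marked as untested, here is a line-by-line transliteration into Python's re module that makes the even/odd counting easy to try out. It is a sketch for experimentation only, not SMW code: the function name bang_wildcard_to_like is invented, and the resulting pattern would presumably have to be used with an explicit ESCAPE '!' clause in the SQL query.

```python
import re


def bang_wildcard_to_like(value: str) -> str:
    """Python transliteration of the five "!"-escape replacement steps."""
    # Step 1: protect literal % and _ with the escape character.
    value = value.replace("%", "!%").replace("_", "!_")
    # Steps 2-3: an even number of ! (possibly zero) before * or ?
    # means the wildcard is unescaped, so translate it.
    value = re.sub(r"(?<!!)((?:!!)*)\*", r"\1%", value)
    value = re.sub(r"(?<!!)((?:!!)*)\?", r"\1_", value)
    # Steps 4-5: an odd number of ! means the wildcard was escaped;
    # keep it literally and drop the last !.
    value = re.sub(r"(?<!!)((?:!!)*)!\*", r"\1*", value)
    value = re.sub(r"(?<!!)((?:!!)*)!\?", r"\1?", value)
    return value


for user_pattern in ["a*b", "a!*b", "a!!*b", "a!!!*b", "a_b?", "50%"]:
    print(user_pattern, "=>", bang_wildcard_to_like(user_pattern))
```

One caveat visible from step 1: a "!" that the user typed directly in front of a literal % or _ ends up doubled, so such inputs may need separate handling.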
From: Markus <ma...@ai...> - 2008-01-07 16:43:05
|
On Wednesday, 2 January 2008, Thomas Bleher wrote:
> * Markus Krötzsch <ma...@ai...> [2008-01-02 08:37]:
> > On Sunday, 30 December 2007, Thomas Bleher wrote:
> > > * Markus Krötzsch <ma...@ai...> [2007-12-30 22:10]:
> > > > OK, my conclusion now was to support the following syntax:
> > > >
> > > > [[property% *subs?r*]]
> > > >
> > > > where ? and * represent _ and % in SQL.
> > >
> > > I think this is fine generally, but now you cannot query for a literal
> > > * or ? anymore, AFAIK.
> >
> > I would not consider this to be a major issue, given that those
> > characters are not too common in typical application strings, and given
> > the fact that using "?" still queries for "some symbol" in that place --
> > it seems to be very unlikely that two strings differ only in one position
> > where the query string has a "?". So in most cases it will have the same
> > hits anyway (yes, there are some cases that could be problematic [1] ;).
>
> Agreed.
>
> > Anyway, I will leave this issue at rest until any user actually complains
> > about this limitation.
>
> Here I have to respectfully disagree.
> It seems unwise to wait until someone complains, when there is already a
> patch resolving the issue.

OK, I give in. I will see to it as soon as I find the time.

-- Markus

> Why spend more time later on when the issue
> can just be fixed right now?
>
> OK, the regexes were not very readable, but it doesn't really make the
> code more complicated.
>
> FWIW, the regexes were so ugly only because backslashes have to be escaped
> twice for PHP's preg_replace (so a single \ becomes \\\\).
>
> If we used ! as an escape sequence instead of \, the regexes would look
> like this (untested):
>
> $value = str_replace(array('%', '_'), array('!%', '!_'), $value);
> $value = preg_replace('/(?<!!)((?:!!)*)\*/', '$1%', $value); // if there's an even number of !, change * to %
> $value = preg_replace('/(?<!!)((?:!!)*)\?/', '$1_', $value); // ditto for ? and _
> $value = preg_replace('/(?<!!)((?:!!)*)!\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last ! is removed
> $value = preg_replace('/(?<!!)((?:!!)*)!\?/', '$1?', $value); // ditto for ?
>
> (?: ) is a subexpression for grouping, not capturing,
> (?<! ) is zero-width negative look-behind (i.e. we make sure that the
> character before our match is not !).
>
> Regards,
> Thomas
>
> > [1] http://de.wikipedia.org/wiki/Die_drei_%3F%3F%3F
> >
> > > Not a huge deal, but before, "a_b" searched for "a, followed by any
> > > char, followed by b", while "a\_b" searched for "exactly a_b".
> > >
> > > Properly escaping everything gets messy rather quickly, as \ can also
> > > be escaped to query for a literal \, so you need translations like:
> > >
> > > ?    => _
> > > \?   => ?
> > > \\?  => \\_
> > > \\\? => \\?
> > >
> > > The following regular expressions work fine for me, but unfortunately
> > > they are quite ugly:
> > >
> > > $value = str_replace(array('%', '_'), array('\%', '\_'), $value); // escape % and _
> > > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value); // if there's an even number of \, change * to %
> > > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value); // ditto for ? and _
> > > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last \ is removed
> > > $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value); // ditto for ?
> > >
> > > I think these should be added to SMW, so all characters can be queried.
> > >
> > > Regards,
> > > Thomas
> >
> > --
> > Markus Krötzsch
> > Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
> > phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
> > ma...@ai... www http://korrekt.org

-- 
Markus Krötzsch
Institut AIFB, Universät Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
ma...@ai... www http://korrekt.org |