DrPython / Discussion / Developers: Find and Autocomplete

liyinghui - 2004-04-20

I've looked at the code, and I found the following:

1. You can type anything, including punctuation. That does not seem very good. I think it would be better to allow only something reasonable, such as letters.

2. The searched word

Say the line is:

This is a example.

Then I enter a 'T' after 'a', like:

This is a T example

When I run the "find and complete" menu item, the searched word is not 'T', as I expected, but 'This is a T', because you match white space from the beginning of the line. (There may be a bug in the regular expression pattern: it may be '\s', not '\S'.) I think the searched word should be 'T': it should be found by searching backward from the current position (just behind the character 'T') until white space is reached.

3. How to determine a reasonable match

Should we detect word boundaries with white space only, or with some more complex strategy that also looks at, for example, punctuation? Take "word.test": if we want to match 'w', we could get "word.test" or "word". Which one should we use? Maybe both are acceptable. So we need to decide on the matching scheme. Any thoughts?
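
The backward scan described in point 2 can be sketched as follows. This is only a minimal illustration, assuming the editor can hand over the current line as a string and the cursor column as an integer; `word_before_cursor` is a hypothetical helper, not a function from drpython.py.

```python
def word_before_cursor(line, col):
    """Return the run of non-whitespace characters that ends at col."""
    start = col
    # Walk backward from the cursor until white space or the line start.
    while start > 0 and not line[start - 1].isspace():
        start -= 1
    return line[start:col]


line = "This is a T example"
col = line.index("T e") + 1  # cursor sits just behind the 'T'
print(word_before_cursor(line, col))  # T
```

Stopping at white space only is the simplest choice; whether punctuation should also stop the scan is the open question in point 3.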

- Daniel Pozmanter - 2004-04-21
  
  Heh. When trying to replicate your bug, I got a completely new bug! I am not sure what is going on here, and will take another look at the code.
  
  I actually wanted to include \b at the beginning and end of the re string, but python did not give me the expected results.
  
  I could use some help here, frankly.
  
- Daniel Pozmanter - 2004-04-21
  
  I can see what you are talking about. You mean the actual match (not what the user sees) maps out to the whole line. I have replicated this bug, and I am working on it.
  
- Daniel Pozmanter - 2004-04-21
  
  This should now be fixed in cvs.
  
- Daniel Pozmanter - 2004-04-21
  
  Some other stuff is not working properly.
  
  Ignore case does not seem to be uniform, and a more messed up file, like:
  ********
  This is an example This
  
  t this ttt
  
  this
  
  This
  
  T
  ********
  does not work if you use find and complete after the final T. Make it a lowercase t, and it will work (although it will ignore the case).
  
  wtf?
  
- Daniel Pozmanter - 2004-04-21
  
  Making Find And Complete Case sensitive fixes all of this.
  
  Plus, it makes more sense (as the python interpreter is also case sensitive).
  
- Chris Wilson - 2004-04-21
  
  Having the find and complete case sensitive is fine (and desirable) for Python code, but as DrPython is now capable of multi-language development, it should be fixed for those coding in other languages. A preferences option for case sensitivity should be added.
  
  I'd also suggest including punctuation in the string being searched as it allows more precise selection when using namespaces. For example, if I had an object "stone" and a list of objects "stones", I would want to enter "stone." just to get references to the object; otherwise, if it used only the "stone" part for the find, the results list would also include "stones" and its references.
  
- liyinghui - 2004-04-21
  
  I think there is also a bug. For example, I have some text like:
  
  the help There ssTdd
  This is aThis exampTle. Th
  
  1. Only one character is used to search for matches. If I enter 'Th' because I want to find all the words beginning with 'Th', the matches should be 'There' and 'This'. But the result is 'his', 'here', 'help', 'he', which is not what I want. I think you had better scan back to white space or punctuation to get the search string. Because there is a blank in front of 'Th', we can determine that the search string is 'Th'. If we enter 'Th' after '.', we can also determine that the search string is 'Th', because '.' is punctuation.
  
  2. If I put the cursor after the letter 'T' in 'exampTle.', the candidate strings may be 'Tle.', 'This', 'There', 'Th', 'Tdd'. Note that 'Tdd' and 'Tle.' are also selected as candidates, although they do not start at the beginning of a word. I think we should search from the beginning of a word, not from the middle.
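
The behaviour asked for in this post, with candidates that start at the beginning of a word and match the whole typed prefix, can be sketched with a word-boundary pattern. This is a rough illustration on the sample text above, not the code in drpython.py; the variable names are made up for the example.

```python
import re

text = "the help There ssTdd\nThis is aThis exampTle. Th"
word = "Th"

# \b anchors the match at the start of a word, so "aThis" and "exampTle"
# are skipped; the search is case sensitive, so "the" is skipped too.
# \w+ requires at least one more character, which drops the bare "Th".
pattern = re.compile(r"\b" + re.escape(word) + r"\w+")
candidates = sorted(set(pattern.findall(text)))
print(candidates)  # ['There', 'This']
```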
  
- Daniel Pozmanter - 2004-04-22
  
  If you guys could play with this, I am wondering why
  
  "\b" + word + "\S*\b" does not work with Find And Complete,
  but "\btheword\S*\b" works using the find dialog.
  
  I will work at this some more, and see what I can do.
  
  In terms of case sensitive:
  
  Python: case sensitive
  C/C++: case sensitive
  HTML?
  
  Beyond that, I am actually not inclined to do this, since at the moment, without making it case sensitive, it does not work properly. I will work at this too though, and see if it can be muddled with.
  
- Daniel Pozmanter - 2004-04-22
  
  Try it now (cvs).
  I think I fixed limodou's bug.
  
- Daniel Pozmanter - 2004-04-22
  
  Chris, what are you saying here:
  '''I'd also suggest including punctuation in the string being searched as it allows more precise selection when using namespaces. For example, if I had an object "stone" and a list of objects "stones", I would want to enter "stone." just to get references to the object; otherwise, if it used only the "stone" part for the find, the results list would also include "stones" and its references.'''?
  
  Could you rephrase? Thanks.
  
- liyinghui - 2004-04-22
  
  Maybe you should use 'r' mode, like r'\btheword\S*\b'. Try it.
  
- Daniel Pozmanter - 2004-04-22
  
  it works! wooo! What is the magical 'r' mode?
  
- liyinghui - 2004-04-22
  
  See the documentation of the re module. It says:
  
  Regular expressions use the backslash character ("\") to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal.
  
  The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with "r". So r"\n" is a two-character string containing "\" and "n", while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
  
  I also noticed that '\b' is an escape sequence, listed in section 2.4.1, String literals, of the Python Reference Manual. So without 'r', Python treats '\b' as ASCII Backspace (BS), not as the two-character raw string '\b'. So I think that if there is such a conflict, you need to use 'r'. If there is no conflict, you don't need 'r', which is why '\S' works well. I think so.
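
The difference is easy to check in an interpreter. The snippet below is a small illustration with made-up sample text; it builds the same pattern once with a normal "\b" and once with a raw r"\b". ('\S' is not a recognized string escape, so Python leaves its backslash in place, although recent versions warn about it; the example writes it as a raw string to stay warning-free.)

```python
import re

text = "theword thewords other"
word = "theword"

# In a normal string literal "\b" is one character: ASCII backspace (0x08).
cooked = "\b" + re.escape(word) + r"\S*" + "\b"
# In a raw string the backslash survives, so re sees the \b word boundary.
raw = r"\b" + re.escape(word) + r"\S*\b"

print(len("\b"), len(r"\b"))     # 1 2
print(re.findall(cooked, text))  # []
print(re.findall(raw, text))     # ['theword', 'thewords']
```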
  
- liyinghui - 2004-04-22
  
  I also remembered another thing: whether we should apply this only to ASCII characters. In some languages, such as Chinese, there may be no white space between words, so '\b' will not work well. So I advise that the user can run the 'Find and Complete' menu item only when the current character is in the alphabet. The search string should contain only letters, excluding punctuation and other things such as Unicode characters. Would that be a bit better?
  
- Daniel Pozmanter - 2004-04-22
  
  So are you saying find and complete only works with English-like languages, or that we need to modify stuff (take out the \b tags, which I could do (or make an option)) to make find and complete work with languages like Chinese?
  
- liyinghui - 2004-04-22
  
  Yes. Because Chinese doesn't need white space within a paragraph, find and complete will not work properly. So I don't advise applying this function to Chinese and other languages such as Japanese and Korean. If a character in the search string is not in the alphabet, you can simply disable the function. That's much easier.
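
One way to read this suggestion is as a guard that runs before the search starts. The sketch below is an assumption about how such a check might look; `can_complete` is a hypothetical name, and the check uses an explicit ASCII set because `str.isalpha()` also returns True for Chinese characters.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)


def can_complete(line, col):
    """True only if the character just before the cursor is an ASCII letter."""
    return col > 0 and line[col - 1] in ASCII_LETTERS


print(can_complete("setTitle(fi", 11))  # True
print(can_complete("标题", 2))  # False
```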
  
- Chris Wilson - 2004-04-22
  
  Dan:
  
  You asked me to rephrase my post earlier.
  
  Limodou reported that the "searched for" word was in fact the entirety of the current line before the current cursor position, and that it should be from the cursor position back to the first encountered non-letter character. This I agree with entirely. However, the suggestion was to also ignore punctuation, and I disagree with this in part, as it is very important that the full stop (".") and several other non-alphanumeric characters (e.g. "-" and "_") be included as part of the searched for word. Hence my example. All other forms of punctuation I can think of (brackets, commas etc.) should not be treated as part of the word.
  
  I'll use an example to make it clearer what I mean:
  
  If I type "count += txtDocument." and press Ctrl+Enter, I would expect find and autocomplete to search for occurrences of "txtDocument." (any properties of txtDocument) and not the entire line I had just typed. This requires that the search string be the word immediately prior to the current cursor position, and that the word be allowed to include full stops.
  
  If the word was not allowed to contain punctuation, then no search would be started as there had been nothing typed after the ".". The word could potentially contain multiple "."s where nested namespaces exist, e.g. "wx.lib.dialogs", and "-" and "_" characters, which often occur in variable names.
  
  A couple more quick examples:
  
  "myfunc(one,two,thee,fo" should search for "fo"
  "my_variable_n" should search for "my_variable_n"
  "myobject.my_vari" should search for "myobject.my_vari"
  "result = object.func(" should search for "object.func("
  
  This definitely needs to be addressed, as selecting the entirety of the current line for a search, as happens at the moment, is not correct.
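
The rule proposed in this post (letters, digits, ".", "-" and "_" belong to the word; brackets, commas and white space do not) can be sketched as a single pattern anchored at the cursor. This is an illustration under that reading, not existing DrPython code; `search_word` is a hypothetical helper and `typed` stands for the text of the line up to the cursor.

```python
import re

# Word characters as proposed above: letters, digits, "_", "." and "-".
WORD_AT_END = re.compile(r"[A-Za-z0-9_.\-]+\Z")


def search_word(typed):
    """Return the word ending at the cursor, or '' if there is none."""
    match = WORD_AT_END.search(typed)
    return match.group(0) if match else ""


for typed in ("count += txtDocument.", "myfunc(one,two,thee,fo",
              "my_variable_n", "myobject.my_vari", "result = object.func("):
    print(repr(typed), "->", repr(search_word(typed)))
```

With this character set the first four inputs give "txtDocument.", "fo", "my_variable_n" and "myobject.my_vari". The last input ends in "(", so nothing is searched for, which is the correction made in the follow-up post.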
  
- Chris Wilson - 2004-04-22
  
  Oops. My last quick example should not have had the opening bracket at the end of both the typed and searched for strings. In this case, the find and autocomplete would not search for anything based upon my suggested approach.
  
  Looking at it now, I think that it would be handy to include all punctuation in the searched for word: as in the last example, searching for "object.func(" would be a handy way of recalling previously typed function call parameters.
  
  Can we just say the word is everything from the current cursor position to the previous whitespace character. Should be easier to code anyway.
  
- liyinghui - 2004-04-22
  
  I agree in part. Excluding all punctuation may not match how people actually work. So we can include some punctuation in the set of searchable characters, for example '_' and '.'. I don't think all punctuation should be included. Maybe we should only consider things that look like variable names, such as a, a.b, _a, etc.
  
  "Can we just say the word is everything from the current cursor position to the previous whitespace character. Should be easier to code anyway. "
  
  This may not be right. For example:
  
  filename = 'Untitled'
  setTitle(fi
  
  If I put the cursor after 'fi' and then run find and complete, the searched for string will be 'setTitle(fi', not 'fi'. But 'fi' may be what I want. If someone likes to type white space after each word, your plan will work well. But if someone likes to omit white space, they will not get the right thing.
  
- Chris Wilson - 2004-04-22
  
  Hmmm. I see what you mean. I suppose it's a double edged sword.
  
  In my example, I think it'd be handy to be able to recall the full parameter list for the last call of a function, but your example is equally valid and useful. Both are valid, and I suspect will be so in an equal number of scenarios. I suppose the ideal would be to have a preferences option to allow invalid characters to be specified.
  
  I've actually had a quick look at the code, and to get things working as I'd suggested is as simple as changing the regular expression on line 1883 of drpython.py (v2.4.5) from "\S" to "\S+\Z".
  
  If we're not going to allow certain punctuation, then it should be changed to "[^\s()]+\Z", which would not allow "(" and ")" characters. I'd suggest leaving all other punctuation as valid in the searched for word, but any other punctuation can be specified as invalid in a word by just adding it inside the square brackets.
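
The two patterns quoted in this post can be compared directly on the examples from the thread. The snippet is a standalone sketch; `tail` is a hypothetical helper and does not reproduce the surrounding code in drpython.py.

```python
import re


def tail(pattern, typed):
    """Return the part of typed that pattern matches at its end, or ''."""
    match = re.search(pattern, typed)
    return match.group(0) if match else ""


for typed in ("setTitle(fi", "result = object.func("):
    # First column: back to the previous white space. Second: "(" and ")" stop too.
    print(repr(tail(r"\S+\Z", typed)), repr(tail(r"[^\s()]+\Z", typed)))
```

For "setTitle(fi" the first pattern yields 'setTitle(fi' and the second 'fi'; for "result = object.func(" the first yields 'object.func(' and the second finds nothing.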
  
- liyinghui - 2004-04-22
  
  We can just define a character set that includes all the allowed characters, like '_.A-Za-z0-9'. Anything else?
  
- Chris Wilson - 2004-04-22
  
  I think we're pretty much in agreement on what needs to be done, and it's very easy to do; it's just a matter of deciding on the valid characters. In this case I think a preferences option would be the way to go.
  
  We can assume all alpha-numeric characters would be valid, so we could just have a preferences field for the non-alphanumerics which could be appended to the standard regex string you've suggested.
  
  Guess it'd be best to leave it up to Dan to decide.
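
The preferences idea could look roughly like this. It is a sketch under assumptions: `build_word_pattern` and its `extra_chars` argument are hypothetical names, with `extra_chars` standing in for the value of the proposed preferences field.

```python
import re


def build_word_pattern(extra_chars):
    """Compile a pattern for the word ending at the cursor.

    extra_chars is the hypothetical preference value: the non-alphanumeric
    characters that should count as part of a word, for example "_.".
    """
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    char_class = "A-Za-z0-9" + re.escape(extra_chars)
    return re.compile("[" + char_class + "]+" + r"\Z")


pattern = build_word_pattern("_.")
print(pattern.search("setTitle(self.fi").group(0))  # self.fi
```

Escaping the user-supplied characters matters here, because a raw "]" or "^" typed into the preferences field would otherwise change the meaning of the character class.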
  
- Daniel Pozmanter - 2004-04-22
  
  I don't think a prefs option will fix our problem.
  
  Take these two strings:
  
  "self.OnFunction("
  "self.OnFunction(fi"
  
  Whether or not you stop at the '(' depends on the context.
  
  Now scintilla itself (via scite) simply ignores '()'.
  
  However, I think it would be good to play with the idea of thinking of ways to figure out context, and then do the appropriate thing.
  
  For example:
  
  If the cursor is right after a '(', then don't stop at the parenthesis. If the item immediately preceding the cursor is alphanumeric, then stop at the parenthesis.
  
  Can you guys think of any other examples of:
  1. Characters we should stop at
  2. contexts where we should not stop at those characters.
  
  I think we could code for this, that it would not be a performance hit, and would allow us to create a context sensitive find and complete, which would be rather cool.
  
  PS Chris, your suggestion of including a '\Z' when locating the current word already exists in cvs. However, it is "\S*\Z".
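
The context rule from this post (do not stop at the parenthesis when the cursor is right after a '(', otherwise stop at it) can be sketched by choosing between two patterns. This is one possible reading, not the code in cvs; `context_word` is a hypothetical helper and `typed` is the line text up to the cursor.

```python
import re

STOP_AT_PAREN = re.compile(r"[^\s()]+\Z")  # "(" and ")" end the word
TO_WHITESPACE = re.compile(r"\S+\Z")       # only white space ends the word


def context_word(typed):
    """Pick the search word using the character just before the cursor."""
    if not typed:
        return ""
    # Right after "(": keep the call prefix, e.g. to recall earlier arguments.
    pattern = TO_WHITESPACE if typed[-1] == "(" else STOP_AT_PAREN
    match = pattern.search(typed)
    return match.group(0) if match else ""


print(context_word("self.OnFunction("))    # self.OnFunction(
print(context_word("self.OnFunction(fi"))  # fi
```

Further stop characters and their exceptions could be added as extra branches, since the choice is made once per invocation.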
  
- Chris Wilson - 2004-04-22
  
  You're right. Doing this properly and having a context sensitive find and autocomplete is the only real way forward. I think we were originally trying to find a quick fix that doesn't exist.
  
  As the code to identify the word to be searched for will only be executed once before the search begins, performance shouldn't be an issue. That said, if we get to a situation where we are automatically popping up context sensitive suggestions (code completion as opposed to find and autocomplete) as happens in other IDEs, it would quickly become an issue.
  
  I'll give some thought to stop characters and the situations in which they should be ignored. My plan is to look through some old code and find situations where I personally would use an autocomplete feature when coding. However, I'll more than likely leave this for a couple of days, as I want to finish off working on the project manager.
  
  Is anyone still looking at the codemarks feature? If not, I've an idea how this could be implemented quite quickly, but still allowing it to be extended and used as a "to-do" list and notes feature. Unless anyone else has already started, I'll most likely do this later this week or at the weekend.
  
  Before releasing code (project manager) I tend to like to pick up and finish another small project (codemarks) then go back and check over the first set of code. It's amazing how often you pick up minor bugs, or come up with a better approach to some aspect of the coding.
  