I could understand a variant of autocompletion lists that allowed arbitrary orderings and worked by eliminating non-matching choices but this retains non-matching items so the context provided by the list is not really useful.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
If there are a lot of correct matches (ex. this situation), certain ones should be placed first (because they are more likely to be used), however, the other items cannot necessarily always be eliminated.
(For pruning the choices, Code::Blocks rebuilds the list every x number of characters typed.)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
There are some performance concerns here: a potentially expensive sort is being performed where it wasn't before and clients have no way to turn this off. Unnecessary work is being performed: Sorter::operator() is stripping off image IDs when they are not returned from AutoComplete::GetValue.
AutoComplete::GetValue is retrieving values from the GUI layer's list box implementation which may be expensive, depending on platform. It creates extra std::string objects which is fine for returning a single item when selected by the user but is not so great when in the middle of a comparison function.
It would allow client choice to add another setting like SCI_AUTOCSETSORT with an enumerated value indicating if it is to be assumed sorted or not. Could be extended in the future to automatically sort the values. May also be faster and minimize platform-dependencies to perform the sort on the raw data instead of retrieving it from the platform layer.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Okay; I will rework the patch to attempt to address these points.
(Earlier, I was have some issues when I did not strip the image IDs, but now that I tested removing them again, the problem does not seem to have returned... ?!)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This patch utilizes the raw data for sorting. It also adds a member variable autoSort to determine if it should assume presorted, sort internally (list remains in original order), or actually sort the list. (I was not able to make sense of how to hook this up to Scintilla's settings structure, so that still needs to be implemented.)
This isn't portable:
+ int len = strlen(list);
+ char sortedList[len + 1];
../src/AutoComplete.cxx:164:25: warning: ISO C++ forbids variable length array 'sortedList' [-Wvla]
You can use a vector<char> or similar.
There needs to be an explanation of what this mode does that can go in the documentation. "sort internally (do not change the actual order)" will just cause support questions.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
An explanation should cover all the effects of the choice in full sentences, not as a short snippet. This is for people that who are coming to this new.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The code assumes that there is a terminating separator when there may not be. For example, the call may be SCI_AUTOCSHOW(1, "aardvark\nant"), not SCI_AUTOCSHOW(1, "aardvark\nant\n"). This often results in an extra garbage value in the list. It also means that the code that builds the sorted string may merge two values so "ant\naardvark" will produce a list containing an item "aardvarkant".
If possible the "Check for a logically earlier match" should be moved out of the binary search. Yes, I know there is a similar case for SC_CASEINSENSITIVEBEHAVIOUR_RESPECTCASE. At some point I'd like to replace the binary search code with a call to std::binary_search.
A patch is attached with the changes to other files required for this. Different orders have symbolic names that should be used instead of numeric literals. SC_ORDER_PERFORMSORT and SC_ORDER_UNSORTED have swapped values since it makes more sense as a narrative. SC_ORDER_UNSORTED is a little purposeless: something like SC_ORDER_PRIORITY may be better.
Fixed the terminating separator issue, incorporated the changes from your patch, and renamed SC_ORDER_UNSORTED to SC_ORDER_CUSTOM. I did not move the earlier match check because it would, in my opinion, be less clear.
The separator issue still causes problems: its not just that it still displays a garbage value but it also reads into any garbage after the argument. The \0 at the end of the argument is reached by the third ++i inside Sorter::Sorter (line 119) and then the for loop's ++i moves i into post \0 garbage.
The problem with the extra match code is that it is inside the binary search loop. This raises questions like: Can it be executed more than once? What are the effects of executing it more than once? Can it prevent the binary search from terminating? Moving it outside the binary search makes it much easier to analyze its behaviour.
Last edit: Neil Hodgson 2013-03-17
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That is working better but there are still some problems. There are two parallel data structures here: the visual list and the sorted index and they need to be synchronized. One problem occurs if there is an empty entry in the list - the data structures aren't synchronized and it asserts out inside std::vector. Try this call in mode SC_ORDER_CUSTOM
and (with Visual Studio 2012) there'll be an assertion:
---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Assertion Failed!
Program: C:\Windows\system32\MSVCP110D.dll
File: C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\vector
Line: 1140
Expression: vector subscript out of range
Looking for problems with desynchronization can be made simplaer by adding another assertion to check that they are, at least, the same length. Just after the call to SetList for SC_ORDER_CUSTOM:
Modifying other platform layers to remove empty entries to match wxScintilla would be more work than changing wxScintilla. Its likely SC_ORDER_CUSTOM will lead to more formatting including non-selectable headers and blank lines.
It should be possible to change wxScintilla to be the same as other platforms using wxTOKEN_RET_EMPTY in the tokenizer constructor.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Although, keeping empty entries in the sorted index can be done by switching the line: while (list[i] == ac->GetSeparator())
to: if (list[i] == ac->GetSeparator())
I could understand a variant of autocompletion lists that allowed arbitrary orderings and worked by eliminating non-matching choices but this retains non-matching items so the context provided by the list is not really useful.
If there are a lot of correct matches (ex. this situation), certain ones should be placed first (because they are more likely to be used), however, the other items cannot necessarily always be eliminated.
(For pruning the choices, Code::Blocks rebuilds the list every x number of characters typed.)
..\src\AutoComplete.cxx(127) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
cmp = lenA - lenB;
There are some performance concerns here: a potentially expensive sort is being performed where it wasn't before and clients have no way to turn this off. Unnecessary work is being performed: Sorter::operator() is stripping off image IDs when they are not returned from AutoComplete::GetValue.
AutoComplete::GetValue is retrieving values from the GUI layer's list box implementation which may be expensive, depending on platform. It creates extra std::string objects which is fine for returning a single item when selected by the user but is not so great when in the middle of a comparison function.
It would allow client choice to add another setting like SCI_AUTOCSETSORT with an enumerated value indicating if it is to be assumed sorted or not. Could be extended in the future to automatically sort the values. May also be faster and minimize platform-dependencies to perform the sort on the raw data instead of retrieving it from the platform layer.
Okay; I will rework the patch to attempt to address these points.
(Earlier, I was have some issues when I did not strip the image IDs, but now that I tested removing them again, the problem does not seem to have returned... ?!)
This patch utilizes the raw data for sorting. It also adds a member variable
autoSort
to determine if it should assume presorted, sort internally (list remains in original order), or actually sort the list. (I was not able to make sense of how to hook this up to Scintilla's settings structure, so that still needs to be implemented.)Minor optimizations of last patch.
This isn't portable:
+ int len = strlen(list);
+ char sortedList[len + 1];
../src/AutoComplete.cxx:164:25: warning: ISO C++ forbids variable length array 'sortedList' [-Wvla]
You can use a vector<char> or similar.
There needs to be an explanation of what this mode does that can go in the documentation. "sort internally (do not change the actual order)" will just cause support questions.
I am using
-Wall
and-Wextra
; what other warning flags do you use?g++ -DNDEBUG -Os -Wall -Wno-missing-braces -Wno-char-subscripts -pedantic -I ../include -I ../src -I../lexlib -fno-rtti -c ../src/AutoComplete.cxx
../src/AutoComplete.cxx: In member function 'void AutoComplete::SetList(const char*)':
../src/AutoComplete.cxx:164:25: warning: ISO C++ forbids variable length array 'sortedList' [-Wvla]
Switched to
char
array tostd::string
.An explanation should cover all the effects of the choice in full sentences, not as a short snippet. This is for people that who are coming to this new.
Are these comments descriptive enough?
I also switched the constructor to
autoSort(0)
so the default behaviour will remain the same as previously.OK. There's a large feature landing soon so this won't be committed until after that. May be a week or two.
Understood.
The code assumes that there is a terminating separator when there may not be. For example, the call may be SCI_AUTOCSHOW(1, "aardvark\nant"), not SCI_AUTOCSHOW(1, "aardvark\nant\n"). This often results in an extra garbage value in the list. It also means that the code that builds the sorted string may merge two values so "ant\naardvark" will produce a list containing an item "aardvarkant".
If possible the "Check for a logically earlier match" should be moved out of the binary search. Yes, I know there is a similar case for SC_CASEINSENSITIVEBEHAVIOUR_RESPECTCASE. At some point I'd like to replace the binary search code with a call to std::binary_search.
A patch is attached with the changes to other files required for this. Different orders have symbolic names that should be used instead of numeric literals. SC_ORDER_PERFORMSORT and SC_ORDER_UNSORTED have swapped values since it makes more sense as a narrative. SC_ORDER_UNSORTED is a little purposeless: something like SC_ORDER_PRIORITY may be better.
Fixed the terminating separator issue, incorporated the changes from your patch, and renamed
SC_ORDER_UNSORTED
toSC_ORDER_CUSTOM
. I did not move the earlier match check because it would, in my opinion, be less clear.The separator issue still causes problems: its not just that it still displays a garbage value but it also reads into any garbage after the argument. The \0 at the end of the argument is reached by the third ++i inside Sorter::Sorter (line 119) and then the for loop's ++i moves i into post \0 garbage.
The problem with the extra match code is that it is inside the binary search loop. This raises questions like: Can it be executed more than once? What are the effects of executing it more than once? Can it prevent the binary search from terminating? Moving it outside the binary search makes it much easier to analyze its behaviour.
Last edit: Neil Hodgson 2013-03-17
I see. Although I have been unable to reproduce this, I believe this patch should resolve it.
That is working better but there are still some problems. There are two parallel data structures here: the visual list and the sorted index and they need to be synchronized. One problem occurs if there is an empty entry in the list - the data structures aren't synchronized and it asserts out inside std::vector. Try this call in mode SC_ORDER_CUSTOM
and (with Visual Studio 2012) there'll be an assertion:
Looking for problems with desynchronization can be made simplaer by adding another assertion to check that they are, at least, the same length. Just after the call to SetList for SC_ORDER_CUSTOM:
That is a problem because the wxScintilla/wxSTC implementation discards empty entries:
How should Scintilla handle empty entries?
Modifying other platform layers to remove empty entries to match wxScintilla would be more work than changing wxScintilla. Its likely SC_ORDER_CUSTOM will lead to more formatting including non-selectable headers and blank lines.
It should be possible to change wxScintilla to be the same as other platforms using wxTOKEN_RET_EMPTY in the tokenizer constructor.
Although, keeping empty entries in the sorted index can be done by switching the line:
while (list[i] == ac->GetSeparator())
to:
if (list[i] == ac->GetSeparator())
Makes no difference. The first \n was consumed earlier in the loop so changing from while to if does not preserve the second \n.