Scaffold Hunter should provide an option to generate random subsets of a user-defined size (preferably in the context menu of an existing subset and in the global menu).
implemented in r2176; patch submitted by Andrew
There are some things to fix before release:
From the devel mailing list:
Performance:
The current implementation has quadratic runtime. The individual operations are fast, but generating a subset can take several seconds for a larger dataset. This is caused by removing elements from the ArrayList, which takes O(n) (linear) time in the worst and average case. You can avoid this overhead by swapping the element to be removed with the last element in the list and then removing the last element, which takes constant time (this is fine because you do not depend on the ordering of the list). For this purpose I have already implemented a SwappingArrayList and just committed it to the trunk (see the util package), so you can simply call its swapAndRemove method.
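A minimal sketch of the swap-and-remove idea applied to drawing a random subset. The `SwappingArrayList` below is a hypothetical stand-in written for illustration, not the class committed to the util package, and `randomSubset` is likewise a hypothetical helper; only the `swapAndRemove` name comes from the comment above. Each draw picks a random index in the remaining pool and removes it in O(1), so selecting k elements out of n costs O(n + k) (the initial copy dominates) instead of O(n·k).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

/** Sketch of random-subset selection using O(1) swap-and-remove. */
final class RandomSubsetSketch {

    private RandomSubsetSketch() {}

    /** Hypothetical stand-in for the SwappingArrayList in the util package. */
    static final class SwappingArrayList<E> extends ArrayList<E> {
        SwappingArrayList(Collection<? extends E> items) {
            super(items);
        }

        /**
         * Removes and returns the element at index in O(1): the last element is
         * moved into the freed slot, so the list order is NOT preserved.
         */
        E swapAndRemove(int index) {
            int last = size() - 1;
            E removed = get(index);   // also performs the bounds check
            set(index, get(last));
            remove(last);             // removing the tail shifts no elements
            return removed;
        }
    }

    /** Returns `size` elements drawn without replacement from `parent`, uniformly at random. */
    static <E> List<E> randomSubset(List<E> parent, int size, Random rnd) {
        if (size < 0 || size > parent.size()) {
            throw new IllegalArgumentException("size must be between 0 and " + parent.size());
        }
        SwappingArrayList<E> pool = new SwappingArrayList<>(parent); // O(n) copy
        List<E> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            // Pick any remaining element; each pick costs O(1).
            result.add(pool.swapAndRemove(rnd.nextInt(pool.size())));
        }
        return result;
    }
}
```

Because the freed slot is filled with the last element, the pool's order changes after every draw; that is harmless here, since every remaining element is equally likely to be picked next. This is essentially a partial Fisher-Yates shuffle.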
The user interface:
In my opinion we should not use two dialogs that pop up one after another; two dialogs fragment the workflow. Furthermore, I cannot read the complete window title in my window manager because there is not enough space, so the purpose of the text field has to be guessed (and there are window managers that show no title at all).
-> We should put a label in front of the text field that explains the dialog (something like "Please enter the size of the random subset"). I know the already implemented rename dialog was not the best example here.
-> We should remove the second dialog and either integrate the naming directly into the first dialog OR just create the subset with a fixed name (the user can rename it afterwards if the name does not fit). Together with the suggestion about default naming below, this seems to be a solution that is no less usable than the first one; you can decide here.
-> The size field should only accept numbers (we no longer need the explanation inside the field once we have a label).
-> We should give a more meaningful default name, e.g. random(parent name, size), so the user can directly see that this subset was created randomly from the parent subset with a specific size.
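A minimal sketch of two of the points above, assuming the dialog is built with Swing: a `DocumentFilter` that keeps the size field numeric, and a helper that builds the proposed default name. The names `RandomSubsetDialogHelpers`, `DigitsOnlyFilter` and `defaultName` are hypothetical; the filter only uses the standard `javax.swing.text` API.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;

/** Hypothetical helpers for a single "create random subset" dialog. */
final class RandomSubsetDialogHelpers {

    private RandomSubsetDialogHelpers() {}

    /** Rejects any typed or pasted text that contains a character other than 0-9. */
    static final class DigitsOnlyFilter extends DocumentFilter {
        @Override
        public void insertString(FilterBypass fb, int offset, String text, AttributeSet attr)
                throws BadLocationException {
            if (text != null && isAsciiDigits(text)) {
                super.insertString(fb, offset, text, attr);
            }
        }

        @Override
        public void replace(FilterBypass fb, int offset, int length, String text, AttributeSet attrs)
                throws BadLocationException {
            // A null or empty replacement is a pure deletion and is always allowed.
            if (text == null || isAsciiDigits(text)) {
                super.replace(fb, offset, length, text, attrs);
            }
        }

        private static boolean isAsciiDigits(String s) {
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            return true;
        }
    }

    /** Builds the suggested default name, e.g. "random(All molecules, 50)". */
    static String defaultName(String parentName, int size) {
        return "random(" + parentName + ", " + size + ")";
    }
}
```

Installing the filter on a plain text field could look like `((AbstractDocument) sizeField.getDocument()).setDocumentFilter(new RandomSubsetDialogHelpers.DigitsOnlyFilter());`. A `JSpinner` with a `SpinnerNumberModel` may be worth considering instead, since it also bounds the value to the size of the parent subset.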
Patch was sent in by Andrew "Anjenson" Zhilka and applied in rev 2179.