#1 Feature request

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2013-01-06
Created: 2009-09-01
Creator: Xiaopang
Private: No

Hi there,

thanks for your great tool. I've been using it every single day for months. I'd like to request some features. Sorry if this is the wrong section, but it would clearly be out of place in the bugs section.

Please add a feature that returns the number of hits per line during a search, e.g. if I searched for "\" in a list of pathnames, then the result for "C:\temp\folder" should be 2. You already implemented a similar feature: the replace command reports a hit count. However, to get one for each line, the user would have to trace each line via a batch file, pass every single one to sfk, record the verbose output and then filter it to get rid of the unwanted extra information.
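
To make the request concrete, here is a minimal Python sketch of the desired behaviour. It is not sfk syntax, and the function name hits_per_line is made up for illustration; it counts non-overlapping occurrences of the search text in each line.

```python
def hits_per_line(lines, needle):
    """Return (line, count) pairs, where count is the number of
    non-overlapping occurrences of `needle` in that line."""
    return [(line, line.count(needle)) for line in lines]


if __name__ == "__main__":
    paths = ["C:\\temp\\folder", "readme.txt"]
    for line, count in hits_per_line(paths, "\\"):
        # Print the hit count followed by the line it belongs to.
        print(count, line)
```

For "C:\temp\folder" and the search text "\" this yields a count of 2, matching the example above.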

Please add a feature that extracts filenames from lines in a text file.

Another great feature would be the ability to process and extract data between two markers and not just between two lines.

Please add a feature that extracts the text surrounding a single character. The cutoff should be determined by user-specified delimiters (binary and ASCII, with multiple delimiters allowed), for example:

character: @
delimiters: 0D0A (binary)

This would extract email addresses that each sit on a separate line of a text file. Another example:

character: .
delimiters: \

This enables the user to extract a file name from a text file.

Anyway, thanks for your great tool and let me know what you think about my suggestions.

Best Regards,

Xiaopang

Discussion

  • stahlworks
    2009-09-01

    Just a note, I read your request but as I'm currently busy I will reply in detail within one or two weeks. until then, vince

     
  • Xiaopang
    2009-09-01

    thanks for your quick reply. i'm looking forward to your answer once you have some spare time :)

     
  • stahlworks
    2009-09-08

    > Please add a feature that returns the number of hits per line during a
    > search, e.g. if I searched for "\" in a list of pathnames, then the result
    > for "C:\temp\folder" should be 2.

    I'll put it on my list and check if it can be done with little effort.
    but if so, it will take time.

    > Please add a feature that allows to extract filenames
    > from lines in a text file.

    SFK can use filenames from a text file already, like

    sfk filter my-file-list.txt +run "copy $qfile c:\tmp"

    the conversion from text to filenames may also be enforced
    by a chain command "+texttofilenames" or "+ttf" like

    sfk filter my-file-list.txt +ttf +run "copy $qfile c:\tmp"

    but in most cases this conversion is done implicitly.
    If you need something else, please give an example.

    > Another great feature would be the ability to process and extract data
    > between two markers and not just between two lines.

    there is an internal, experimental feature already, filter -within.
    for example,

    sfk filt mycsv.txt -ssep "\t" -form "°$col1°\t$col2\t$col3"
    -within "°*°" -replace _o_x_

    would replace "o" by "x" only within the first column of tab separated data.
    the feature isn't tested enough yet, therefore it's not listed in the help.
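
    the effect described above (a replacement limited to the first tab-separated column) can be sketched in a few lines of Python. this only illustrates the outcome, not how sfk implements -within; the function name replace_in_first_column is hypothetical.

```python
def replace_in_first_column(line, old, new, sep="\t"):
    """Replace `old` by `new` only in the first `sep`-separated column.

    Everything after the first separator is passed through unchanged.
    """
    first, found_sep, rest = line.partition(sep)
    return first.replace(old, new) + found_sep + rest


if __name__ == "__main__":
    # Only the "o" characters in the first column become "x".
    print(replace_in_first_column("foo\tbox\tcow", "o", "x"))
```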

    > Please add a feature that allows to extract text portions that surround a
    > single character.

    please give a detailed example:

    - of some text input
    - of the syntax, how the command may look like
    - of some text output.

     
  • Xiaopang
    2009-09-14

    thanks a lot for answering so thoroughly :)

    > SFK can use filenames from a text file already

    this requires a valid list of filenames and that was exactly what i tried to create with sfk from a file that contained a lot more data per line than just filenames ;)

    > the conversion from text to filenames may also be enforced
    > by a chain command "+texttofilenames" or "+ttf"

    i tried it, but it didn't work at all. in my specific case i tried to filter an iss-installer file and to extract all filenames from it. it uses the following format in each line:

    "Source: C:\test\testfile.txt; DestName: testfile.txt; components: General"
    "Source: C:\anotherfolder\executable.exe; DestName: executable.exe; components: General"
    .
    .
    .

    I hoped to extract testfile.txt, executable.exe and all following filenames (thousands in that case) from the file, but the following syntax only outputted the same lines as posted above:

    sfk filter "testrun.iss" +ttf +run $qfile>files.txt

    > there is an internal, experimental feature already, filter -within.

    thanks for pointing that out. this seems to be an excellent feature. its only downside is that it seems to be limited to the replace command. i tried to extract the above filenames with it, but i had to use replace, even though i didn't need to replace anything. a simple display of the content between both characters would have been enough, e.g.

    sfk155 filt "testrun.iss" -within "DestName:*;" >filenames.txt

    maybe i'm missing something here. in any case thanks for this option. i deem it to be the most useful for me. in the past i had to break up text artificially with sfk and insert marker text to allow processing of only special portions. it would be great if you extended this feature with a simple "display text between within parameters" switch :)

    >> Please add a feature that allows to extract text portions that surround a
    >> single character.

    > please give a detailed example:

    well, i had the same task as above in mind: extracting filenames. sometimes valid data is not placed between the same characters every time, so the within command wouldn't work. it might be useful to just define a marker character around which sfk outputs the surrounding text. all that would be needed are delimiter characters that tell sfk where to stop processing, e.g.:

    sfk filt "testrun.iss" -around "." -delim _/_\_:_;_}_ >files.txt

    the fictional command line above defines the dot as the center marker. sfk would then select all text around that marker, up to the closest delimiter from -delim on either side.

    in my example above the following line would be processed as such:

    "Source: C:\test\testfile.txt; DestName: testfile.txt; components: General"

    sfk looks for a dot and finds it twice in the line, each time within "testfile.txt".
    for each dot, it parses the line backwards from the dot until it finds a character that matches any specified delimiter, then parses forwards from the dot to the next delimiter. the text between the two delimiters is displayed or passed on to other commands.

    result: testfile.txt testfile.txt
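
    the procedure above can be sketched in Python as follows. this is only an illustration of the proposed -around/-delim idea, which does not exist in sfk; the function name around is made up. one assumption beyond the description: surrounding whitespace is stripped from each hit, because the space after "DestName:" is not in the delimiter list and would otherwise end up in the result.

```python
def around(line, marker, delims):
    """For every occurrence of `marker` in `line`, return the text around it,
    bounded by the nearest delimiter character on each side.

    `delims` is a string whose characters are all treated as delimiters.
    Assumption: surrounding whitespace is stripped from each hit.
    """
    if not marker:
        raise ValueError("marker must not be empty")
    results = []
    pos = line.find(marker)
    while pos != -1:
        # Walk backwards from the marker to the closest delimiter.
        start = pos
        while start > 0 and line[start - 1] not in delims:
            start -= 1
        # Walk forwards from the end of the marker to the closest delimiter.
        end = pos + len(marker)
        while end < len(line) and line[end] not in delims:
            end += 1
        results.append(line[start:end].strip())
        pos = line.find(marker, pos + len(marker))
    return results


if __name__ == "__main__":
    sample = "Source: C:\\test\\testfile.txt; DestName: testfile.txt; components: General"
    print(around(sample, ".", "/\\:;}"))
```

    for the sample line this returns testfile.txt twice, once per dot. the same function with "@" as marker and a space as delimiter would pick an email address out of a sentence.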

    the same procedure could also be used to extract email addresses from binary files, such as the storage containers that email clients use for mail correspondence. i'm sure there are many more uses for such a feature. i hope i explained everything in an understandable way. otherwise we could also continue in german ;)

    anyway, thanks again for your reply. hope to hear from you.

     
  • stahlworks
    2009-09-16

    > I hoped to extract testfile.txt, executable.exe and all
    > following filenames (thousands in that case) from the file

    I see, so it should rather work this way:

    sfk filt testrun.iss -rep "_*destname: __" -rep "_; components:*__"

    the principle is to cut text to the left and right side
    of the text you're interested in.
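
    the same cut-left, cut-right principle, sketched in Python for comparison. this is not sfk code and the function name cut_between is hypothetical. matching is case-insensitive here as an assumption, so that "destname: " also finds "DestName: " as in the command above.

```python
import re


def cut_between(line, left, right):
    """Return the text between the first `left` and the next `right`,
    or None if the line does not contain both.

    Assumption: matching ignores case.
    """
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    match = re.search(pattern, line, flags=re.IGNORECASE)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "Source: C:\\test\\testfile.txt; DestName: testfile.txt; components: General"
    print(cut_between(sample, "destname: ", "; components:"))
```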

    > "display text between within parameters" switch

    good idea. the syntax could look like

    -use "DestName: *; components"

    > sfk filt "testrun.iss" -around "." -delim _/_\_:_;_}_

    good idea as well. the -delim however might be specified
    differently, like -delim "_/\:;}_".

    thinking this further, there could be

    sfk filt "testrun.iss" -from "DestName: " -delim "_;_"

    or

    sfk filt "testrun.iss" -after "DestName: " -until ";"

    but for now, try the above -rep example.
    if you need to extract both Source and Destname,
    for example separated by a TAB character, it should work like

    sfk filt testrun.iss -spat -rep "_*Source: __"
    -rep "_; DestName: _\t_" -rep "_; components: *__"

     
  • Xiaopang
    2009-09-17

    > the principle is to cut text to the left and right side
    > of the text you're interested in.

    exactly :) i guess i overcomplicated my explanation lol

    > thinking this further, there could be

    > sfk filt "testrun.iss" -from "DestName: " -delim "_;_"

    > or

    > sfk filt "testrun.iss" -after "DestName: " -until ";"

    it all looks logical, but don't forget support for multiple delimiters ;) this is most crucial for flexible text processing. also, it would be great if you supported binary values for delimiters. i ran into problems with replace in the past when i wanted to use command-line-unfriendly characters such as ":" with it. it confused sfk and i had to resort to binary values to overcome this problem. binary values would also be great if you want to include line breaks as delimiters.

    > but for now, try the above -rep example.
    > if you need to extract both Source and Destname,
    > for example separated by a TAB character,

    thanks for the suggestion, but this is where my problems kick in and where my idea for this feature originated. i only need the filenames, not their paths. since the paths constantly change in length, value and depth, there is no way sfk could solve this problem on its own. that's why i wrote the batch file to work around it: i insert marker text in front of the filenames with sfk, then trace every line to that marker and pass the marker with the filename to sfk to remove the marker again. it's very slow, but it works. it took me 10 hours to process a 10MB iss-file, but better to wait that long than to do it by hand ;)
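
    for illustration, this is the step in question (dropping a path of varying length and depth from each Source entry) as a minimal Python sketch. it is not an sfk feature; the function name source_filenames and the regular expression are assumptions based on the line format shown earlier in this thread.

```python
import ntpath
import re


def source_filenames(lines):
    """Yield the bare filename of every 'Source: <path>;' entry.

    ntpath handles Windows-style paths on any platform, so the depth
    of the path does not matter.
    """
    for line in lines:
        match = re.search(r"Source:\s*([^;]+);", line)
        if match:
            yield ntpath.basename(match.group(1).strip())


if __name__ == "__main__":
    sample = [
        '"Source: C:\\test\\testfile.txt; DestName: testfile.txt; components: General"',
        '"Source: C:\\anotherfolder\\executable.exe; DestName: executable.exe; components: General"',
    ]
    for name in source_filenames(sample):
        print(name)
```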

    anyway, i think that sfk could do that in seconds. i'm glad you liked my idea. i was surprised to find out that sfk doesn't already support such a feature. i read through the whole documentation several times, just because i thought i had overlooked it.