How to sort

write
2013-05-19
2014-01-26
  • write

    write - 2013-05-19

    I'm very new to Python scripting.

    I'd like to ask how I can write a Python script in Notepad++ that does a sort-unique on the third element of each line (the elements of each line are separated by TAB).

    For example:

    ------------> original file

    aaaa bbbb AAAAA
    ccc dddd VVVVV
    ddd jjjj FFFFF
    ggg bbb AAAAA
    hhh hhh BBB
    jjjj jjj FFFFF
    yyy yyyy BBB

    ------------ > the result is

    aaaa bbbb AAAAA
    hhh hhh BBB
    ddd jjjj FFFFF
    ccc dddd VVVVV

    If that is too complex to do in a single step, I'd also appreciate help
    splitting the script into these steps:

    1) reverse the order of the elements in each line

    AAAAA bbbb aaaa
    VVVVV dddd ccc
    FFFFF jjjj ddd
    AAAAA bbb ggg
    BBB hhh hhh
    FFFFF jjj jjjj
    BBB yyyy yyy

    2) sort the lines (already available)

    AAAAA bbbb aaaa
    AAAAA bbb ggg
    BBB hhh hhh
    BBB yyyy yyy
    FFFFF jjjj ddd
    FFFFF jjj jjjj
    VVVVV dddd ccc

    3) keep only the first line for each distinct first element

    AAAAA bbbb aaaa
    BBB hhh hhh
    FFFFF jjjj ddd
    VVVVV dddd ccc

    4) reverse the order of the elements again

    aaaa bbbb AAAAA
    hhh hhh BBB
    ddd jjjj FFFFF
    ccc dddd VVVVV

    Thanks in advance to everyone who helps me.

     

    Last edit: write 2013-05-19
  • Dave Brotherstone

    There's probably a more efficient way to do it, but ...

    You could add all the lines to a dictionary, with the third element as the key. Then output the dictionary values in order of the sorted keys.

    import re
    
    regex = re.compile(r'([^\t]*)\t([^\t]*)\t(.*)$')
    
    def get_element(line):
        m = regex.match(line)
        # If the line matches our regex, then return a tuple with the
        # element we want, and the whole line.
        # For example: line = 'aaa\tbbb\tccc'
        #   We return: ('ccc', 'aaa\tbbb\tccc')
        if m:
            return (m.group(3), line)
        else:
            # If the line doesn't match our pattern, 
            # then we just give the line back as it is, with itself as the key
            return (line, line)
    
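    # Put all the lines in a dictionary, with the relevant element
    # as the key.  This effectively "uniques" the file by the 3rd element.
    # setdefault keeps the first line seen for each key, as in the example.
    lines = {}
    for key, line in map(get_element, editor):
        lines.setdefault(key, line)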
    
    # Create a new file for the output
    notepad.new()
    # Output the lines sorted by the key
    for key in sorted(lines):
        editor.write(lines[key])
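
For anyone who wants to try the same dictionary idea outside Notepad++, here is a minimal standalone sketch in plain Python. It is not the Python Script plugin code above: the function name `sort_unique_by_field` and its parameters are made up for illustration, and it assumes TAB-separated fields and that the first line seen for each third element should be kept, as in the requested result.

```python
def sort_unique_by_field(lines, index=2, sep="\t"):
    """Keep the first line seen for each value of one field, sorted by that value.

    Hypothetical helper for illustration; `index` is zero-based,
    so index=2 means the third element.
    """
    first_seen = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        fields = line.split(sep)
        # Lines with too few fields are keyed by their own text.
        key = fields[index] if len(fields) > index else line
        # setdefault only stores the line if the key is not there yet.
        first_seen.setdefault(key, line)
    return [first_seen[key] for key in sorted(first_seen)]


sample = [
    "aaaa\tbbbb\tAAAAA",
    "ccc\tdddd\tVVVVV",
    "ddd\tjjjj\tFFFFF",
    "ggg\tbbb\tAAAAA",
    "hhh\thhh\tBBB",
    "jjjj\tjjj\tFFFFF",
    "yyy\tyyyy\tBBB",
]

result = sort_unique_by_field(sample)
for line in result:
    print(line)
```

With the sample from the first post this prints the four lines of the requested result. To run it on a real file, the list could be replaced by an open file object, since the function only iterates over lines.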
    
     
  • write

    write - 2013-05-21

    Thank you very much for your answer, you have been very kind.
    I tested it and it works quite well, but I realize, thanks to you, that Python in Notepad++ is not the right solution.

    Let me ask for your advice:

    The real file that I have to process this way is getting quite big (nearly 150000 lines).
    Up to now I have been using Excel (I'm quite good with formulas), but now my poor laptop is struggling: the job takes a long time and the CPU temperature gets high.

    So I am looking for a good solution to handle this need:

    When I add a batch of new lines to the page (in the future probably to a database), I need to exclude the new lines whose third element is already present in the page.

    I was also thinking of using VBA in Excel, but I have no idea whether I would run into the same time and temperature problems that I have now with formulas.

    Do you think I have to use a database (e.g. Microsoft SQL Server), or can I use another solution, such as writing a program in BASIC and running it with an interpreter or compiling it?

    I'd appreciate your suggestions.

    Thanks in advance

     

    Last edit: write 2013-05-22
  • Heinz

    Heinz - 2014-01-26

    Do you think I have to use a database (e.g. Microsoft SQL Server)

    It's an old thread, but maybe you are still looking for a solution.
    In that case you can have a look at the SQL plugin, which allows you to run SQL queries against the text:
    http://www.scout-soft.com/sql/

    First you would need to replace all separators with , to get CSV-style text.
    You also need to add a header line, so your text would look like this:

    field1,field2,field3
    aaaa,bbbb,AAAAA
    ccc,dddd,VVVVV
    ddd,jjjj,FFFFF
    ggg,bbb,AAAAA
    hhh,hhh,BBB
    jjjj,jjj,FFFFF
    yyy,yyyy,BBB

    The query "select * from data group by field3" will return something like this (one row per field3 value, though not necessarily the first one in the file):

    ggg,bbb,aaaaa
    yyy,yyyy,bbb
    jjjj,jjj,fffff
    ccc,dddd,vvvvv

    hth
    Heinz
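
If a database turns out to be the way to go, the "skip new lines whose third element is already present" requirement can be sketched with a uniqueness constraint. The sketch below uses SQLite through Python's built-in `sqlite3` module, which is an assumption of this example and not the SQL plugin or Microsoft SQL Server mentioned above. The table and column names follow the CSV header from the previous post; the two rows in `new_rows` are invented for the demonstration.

```python
import sqlite3

rows = [
    ("aaaa", "bbbb", "AAAAA"),
    ("ccc", "dddd", "VVVVV"),
    ("ddd", "jjjj", "FFFFF"),
    ("ggg", "bbb", "AAAAA"),
    ("hhh", "hhh", "BBB"),
    ("jjjj", "jjj", "FFFFF"),
    ("yyy", "yyyy", "BBB"),
]

# In-memory database for the demo; a file path would make it persistent.
conn = sqlite3.connect(":memory:")

# UNIQUE on field3 makes the database itself reject duplicate third elements.
conn.execute(
    "CREATE TABLE data (field1 TEXT, field2 TEXT, field3 TEXT UNIQUE)"
)

# INSERT OR IGNORE silently skips rows that would violate the constraint,
# so the first row inserted for each field3 value is the one that stays.
conn.executemany("INSERT OR IGNORE INTO data VALUES (?, ?, ?)", rows)

# A later batch (invented data): BBB already exists, ZZZ is new.
new_rows = [("zzz", "zzz", "BBB"), ("kkk", "kkk", "ZZZ")]
conn.executemany("INSERT OR IGNORE INTO data VALUES (?, ?, ?)", new_rows)
conn.commit()

kept = conn.execute(
    "SELECT field1, field2, field3 FROM data ORDER BY field3"
).fetchall()
for row in kept:
    print("\t".join(row))

conn.close()
```

The design choice here is to let the constraint do the duplicate check at insert time, so no separate "unique" pass over the whole table is needed each time new lines arrive.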