Notepad++ / Discussion / [READ ONLY] Scripting: How to sort

write - 2013-05-19

I'm very new to Python scripting.

I'd like to ask how I can write a Python script in Notepad++ that does a sort-unique on the third element of each line (where the elements of each line are separated by TAB).

Just as an example:

------------> original file

aaaa bbbb AAAAA
ccc dddd VVVVV
ddd jjjj FFFFF
ggg bbb AAAAA
hhh hhh BBB
jjjj jjj FFFFF
yyy yyyy BBB

------------ > the result is

aaaa bbbb AAAAA
hhh hhh BBB
ddd jjjj FFFFF
ccc dddd VVVVV

If it's too complex to run in a single step, I'd also appreciate help
splitting the script into these steps:

1) reverse the order of the elements in each line

AAAAA bbbb aaaa
VVVVV dddd ccc
FFFFF jjjj ddd
AAAAA bbb ggg
BBB hhh hhh
FFFFF jjj jjjj
BBB yyyy yyy

2) sort the lines (already available)

AAAAA bbbb aaaa
AAAAA bbb ggg
BBB hhh hhh
BBB yyyy yyy
FFFFF jjjj ddd
FFFFF jjj jjjj
VVVVV dddd ccc

3) keep only one line for each unique first element

AAAAA bbbb aaaa
BBB hhh hhh
FFFFF jjjj ddd
VVVVV dddd ccc

4) reverse the elements again

aaaa bbbb AAAAA
hhh hhh BBB
ddd jjjj FFFFF
ccc dddd VVVVV
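
The four steps above can be sketched in plain Python as follows. This is only an illustration that runs outside Notepad++, assuming TAB-separated lines; the function name sort_unique_by_last_field is made up for this example. The sort compares only the leading element and Python's sort is stable, so for each key the line that came first in the original file is the one that survives, which matches the result shown above.

```python
def sort_unique_by_last_field(lines, sep="\t"):
    """Sort-unique a list of lines on their last (here: third) element."""
    # Step 1: reverse the elements of each line so the key comes first.
    rows = [line.split(sep)[::-1] for line in lines]

    # Step 2: sort on the leading element only. sorted() is stable, so
    # rows with the same key keep their original relative order.
    rows = sorted(rows, key=lambda row: row[0])

    # Step 3: keep only the first row for each distinct leading element.
    unique_rows = []
    previous_key = None
    for row in rows:
        if row[0] != previous_key:
            unique_rows.append(row)
            previous_key = row[0]

    # Step 4: reverse the elements again and rebuild the lines.
    return [sep.join(row[::-1]) for row in unique_rows]


if __name__ == "__main__":
    sample = [
        "aaaa\tbbbb\tAAAAA",
        "ccc\tdddd\tVVVVV",
        "ddd\tjjjj\tFFFFF",
        "ggg\tbbb\tAAAAA",
        "hhh\thhh\tBBB",
        "jjjj\tjjj\tFFFFF",
        "yyy\tyyyy\tBBB",
    ]
    for line in sort_unique_by_last_field(sample):
        print(line)
```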

Thanks in advance to all of you for helping me.

Last edit: write 2013-05-19

There's probably a more efficient way to do it, but ...

You could add all the lines to a dictionary, with the third element as the key. Then output the dictionary values in the order of the sorted keys.

import re

regex = re.compile(r'([^\t]*)\t([^\t]*)\t(.*)$')

def get_element(line):
    m = regex.match(line)
    # If the line matches our regex, then return a tuple with the
    # element we want, and the whole line.
    # For example: line = 'aaa\tbbb\tccc'
    #   We return: ('ccc', 'aaa\tbbb\tccc')
    if m:
        return (m.group(3), line)
    else:
        # If the line doesn't match our pattern, 
        # then we just give the line back as it is, with itself as the key
        return (line, line)

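# Put all the lines in a dictionary, with the relevant element
# as the key.  This effectively "uniques" the file by the 3rd element.
# Later lines overwrite earlier ones, so the last line seen for each
# key is the one that is kept.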
lines = { k:v for k,v in map(get_element, editor) }

# Create a new file for the output
notepad.new()
# Output the lines sorted by the key
for key in sorted(lines):
    editor.write(lines[key])
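
A dictionary comprehension like the one above keeps the last line seen for each key, whereas the result requested in the question keeps the first one. As a hedged variant, here is a minimal standalone sketch (plain Python, no Notepad++ objects) that keeps the first occurrence instead. The function name unique_by_third_field is made up for illustration, and the input is assumed to be a list of TAB-separated strings.

```python
def unique_by_third_field(lines):
    """Return one line per distinct third element, sorted by that element.

    The first line seen for each key wins. A line with fewer than three
    TAB-separated elements is used as its own key.
    """
    first_seen = {}
    for line in lines:
        text = line.rstrip("\r\n")
        parts = text.split("\t", 2)  # at most three parts; the rest stays in parts[2]
        key = parts[2] if len(parts) == 3 else text
        # setdefault() stores the line only if the key is not there yet.
        first_seen.setdefault(key, line)
    return [first_seen[key] for key in sorted(first_seen)]
```

Inside Notepad++ the same loop could be fed with the lines of the current document and the result written to a new file, as in the script above.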

write - 2013-05-21

Thank you very much for your answer, you have been very kind.
I tested it and it works quite well, but I realize, thanks to you, that Python in Notepad++ is not the right solution.

Let me ask for your advice:

The real file that I have to manage for the need described above is getting quite big (nearly 150000 lines).
Up to now I have been using Excel (I'm quite good with formulas), but now my poor laptop is struggling: the job takes a long time and the CPU temperature gets high.

So I am looking for a good solution to handle this need:

When I add a batch of new lines to the page (or, in the future, to the DB), I need to exclude the new lines whose third element is already present in the page.
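
The need described here can be sketched in plain Python with a set of the third elements already present. This is only an illustration under assumptions: the lines are TAB-separated, and the names third_field and filter_new_lines are made up for this example. Set membership tests take roughly constant time on average, so the work grows with the number of lines rather than with the product of old and new lines.

```python
def third_field(line):
    """Return the third TAB-separated element, or the whole line if it has fewer than three."""
    text = line.rstrip("\r\n")
    parts = text.split("\t", 2)
    return parts[2] if len(parts) == 3 else text


def filter_new_lines(existing_lines, new_lines):
    """Return the new lines whose third element is not already present."""
    seen = set(third_field(line) for line in existing_lines)
    accepted = []
    for line in new_lines:
        key = third_field(line)
        if key not in seen:
            # Remember the key so duplicates inside the new batch are dropped too.
            seen.add(key)
            accepted.append(line)
    return accepted
```

For example, with existing lines ending in AAAAA and BBB, a new batch ending in AAAAA, CCC, CCC would yield only the first CCC line.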

I was also thinking of using VBA in Excel, but I have no idea whether I would have the same problem (time and temperature) that I have now with formulas in Excel.

Do you think that I have to use a database (e.g. Microsoft SQL Server), or can I apply other solutions, like writing a program in BASIC and running it with the interpreter or compiling it?
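
As one possible illustration of the database route, here is a minimal sketch that uses SQLite through Python's standard sqlite3 module instead of a full database server. That choice is an assumption made for this example, not something suggested in the thread. The table name data, the column names field1 to field3, and the helper names create_table and add_rows are also made up. A UNIQUE constraint on the third column combined with INSERT OR IGNORE lets the database skip rows whose third element is already stored.

```python
import sqlite3


def create_table(conn):
    """Create the example table; field3 must be unique."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "field1 TEXT, field2 TEXT, field3 TEXT UNIQUE)"
    )


def add_rows(conn, rows):
    """Insert (field1, field2, field3) tuples, silently skipping any row
    whose field3 value is already in the table."""
    conn.executemany(
        "INSERT OR IGNORE INTO data (field1, field2, field3) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file name to keep the data on disk
    create_table(conn)
    add_rows(conn, [("aaaa", "bbbb", "AAAAA"), ("ccc", "dddd", "VVVVV")])
    add_rows(conn, [("ggg", "bbb", "AAAAA"), ("hhh", "hhh", "BBB")])
    for row in conn.execute("SELECT field1, field2, field3 FROM data ORDER BY field3"):
        print("\t".join(row))
    conn.close()
```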

I would appreciate your suggestions.

Thanks in advance.

Last edit: write 2013-05-22

Heinz - 2014-01-26

Do you think that I have to use a database (e.g. Microsoft SQL Server)

It's an old thread, but maybe you are still looking for a solution.
In that case you can have a look at the SQL plugin, which allows you to run SQL queries against the text:
http://www.scout-soft.com/sql/

First you would need to replace all spaces with commas to get CSV-style text.
You also need to add a header line, so your text would look like this:

field1,field2,field3
aaaa,bbbb,AAAAA
ccc,dddd,VVVVV
ddd,jjjj,FFFFF
ggg,bbb,AAAAA
hhh,hhh,BBB
jjjj,jjj,FFFFF
yyy,yyyy,BBB

The query "select * from data group by field3" will result in something like this:

ggg,bbb,aaaaa
yyy,yyyy,bbb
jjjj,jjj,fffff
ccc,dddd,vvvvv
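
With a plain "select * ... group by field3", which row of each group is returned depends on the SQL engine, and some engines reject such a query altogether. As a rough sketch, assuming SQLite through Python's standard sqlite3 module rather than the plugin above, the following selects the first inserted row for each field3 value explicitly. The function name first_row_per_field3 is made up for illustration; the table and column names follow the example above.

```python
import sqlite3


def first_row_per_field3(rows):
    """Load (field1, field2, field3) tuples into an in-memory SQLite table
    and return the first inserted row for each field3 value, ordered by field3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (field1 TEXT, field2 TEXT, field3 TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    # rowid increases in insertion order here, so MIN(rowid) identifies
    # the earliest row of each group.
    query = """
        SELECT field1, field2, field3
        FROM data
        WHERE rowid IN (SELECT MIN(rowid) FROM data GROUP BY field3)
        ORDER BY field3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Called with the seven example rows above, this returns the aaaa, hhh, ddd and ccc rows, in that order.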

hth
Heinz
