I'd like to ask how I can write a Python script in Notepad++ that does a "sort unique" on the third element of each line (where the elements of each line are separated by TABs).
There's probably a more efficient way to do it, but ...
You could add all the lines to a dictionary, with the third element as the key. Then output the dictionary values in order of the sorted keys.
```python
import re

regex = re.compile(r'([^\t]*)\t([^\t]*)\t(.*)$')

def get_element(line):
    m = regex.match(line)
    # If the line matches our regex, then return a tuple with the
    # element we want, and the whole line.
    # For example: line = 'aaa bbb ccc'
    # We return: ('ccc', 'aaa bbb ccc')
    if m:
        return (m.group(3), line)
    else:
        # If the line doesn't match our pattern,
        # then we just give the line back as it is, with itself as the key
        return (line, line)

# Put all the lines in a dictionary, with the relevant element
# as the key. This effectively "uniques" the file by the 3rd element
lines = {k: v for k, v in map(get_element, editor)}

# Create a new file for the output
notepad.new()

# Output the lines sorted by the key
for key in sorted(lines):
    editor.write(lines[key])
```
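Outside Notepad++, the same dictionary idea can be sketched in plain Python. This is only an illustration, not the script above: the function name `sort_unique_by_third` is made up, and it assumes Python 3 and TAB-separated lines. Unlike the dictionary built above, where a later line with the same key replaces an earlier one, this sketch keeps the first line seen for each key.

```python
def sort_unique_by_third(lines):
    """Keep the first line seen for each third TAB-separated field,
    then return the kept lines sorted by that field."""
    first_seen = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: a line with fewer than three fields uses the whole
        # line as its key, mirroring the fallback in the script above.
        key = fields[2] if len(fields) >= 3 else line
        # setdefault only stores the line if the key is not present yet.
        first_seen.setdefault(key, line)
    return [first_seen[key] for key in sorted(first_seen)]


sample = [
    "aaaa\tbbbb\tAAAAA\n",
    "ccc\tdddd\tVVVVV\n",
    "ddd\tjjjj\tFFFFF\n",
    "ggg\tbbb\tAAAAA\n",
    "hhh\thhh\tBBB\n",
    "jjjj\tjjj\tFFFFF\n",
    "yyy\tyyyy\tBBB\n",
]
result = sort_unique_by_third(sample)
print("".join(result), end="")
```

Reading the lines from a file instead of a list should work the same way, since a file object also yields one line at a time.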
I'm very new to Python scripting.
Just as an example:
------------> original file
aaaa bbbb AAAAA
ccc dddd VVVVV
ddd jjjj FFFFF
ggg bbb AAAAA
hhh hhh BBB
jjjj jjj FFFFF
yyy yyyy BBB
------------> the result is
aaaa bbbb AAAAA
hhh hhh BBB
ddd jjjj FFFFF
ccc dddd VVVVV
If it's too complex to run in a single step, I'd also appreciate help
splitting the script into these steps:
1) reverse the order of the elements in each line
AAAAA bbbb aaaa
VVVVV dddd ccc
FFFFF jjjj ddd
AAAAA bbb ggg
BBB hhh hhh
FFFFF jjj jjjj
BBB yyyy yyy
2) sort the lines (already available)
AAAAA bbbb aaaa
AAAAA bbb ggg
BBB hhh hhh
BBB yyyy yyy
FFFFF jjjj ddd
FFFFF jjj jjjj
VVVVV dddd ccc
3) keep only the first line for each distinct first element
AAAAA bbbb aaaa
BBB hhh hhh
FFFFF jjjj ddd
VVVVV dddd ccc
4) reverse the elements again
aaaa bbbb AAAAA
hhh hhh BBB
ddd jjjj FFFFF
ccc dddd VVVVV
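The four steps above can be sketched in plain Python roughly as follows. This is an illustration under assumptions, not a tested Notepad++ script: the helper names are made up, the lines are assumed to be TAB-separated without trailing line endings, and the sort looks only at the first element so that lines sharing it keep their original order, as in the listing for step 2.

```python
def reverse_fields(line):
    """Reverse the order of the TAB-separated fields in one line."""
    return "\t".join(reversed(line.split("\t")))


def sort_unique_in_steps(lines):
    # Step 1: reverse each line so the third element comes first.
    reversed_lines = [reverse_fields(line) for line in lines]
    # Step 2: sort on the (new) first element only. sorted() is stable,
    # so lines sharing that element keep their original relative order.
    reversed_lines = sorted(reversed_lines, key=lambda l: l.split("\t")[0])
    # Step 3: keep only the first line for each distinct first element.
    kept, seen = [], set()
    for line in reversed_lines:
        key = line.split("\t")[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    # Step 4: reverse the fields again to restore the original layout.
    return [reverse_fields(line) for line in kept]


sample = [
    "aaaa\tbbbb\tAAAAA",
    "ccc\tdddd\tVVVVV",
    "ddd\tjjjj\tFFFFF",
    "ggg\tbbb\tAAAAA",
    "hhh\thhh\tBBB",
    "jjjj\tjjj\tFFFFF",
    "yyy\tyyyy\tBBB",
]
result = sort_unique_in_steps(sample)
print("\n".join(result))
```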
Thanks in advance to all of you helping me.
Last edit: write 2013-05-19
Thank you very much for your answer, you have been very kind ...
I tested it and it works quite well, but I realize, thanks to you, that Python in Notepad++ is not the right solution ..
Let me ask for your advice:
The real file that I have to manage for the need described above is starting to get quite big (nearly 150000 lines).
Up to now I have been using Excel (I'm quite good with formulas), but now my poor laptop is starting to struggle: it takes a long time to do the job and the CPU temperature gets high.
So I am looking for a good solution to handle this need:
When I inject a bunch of new lines into the page (or, probably in the future, into the DB), I need to exclude the new lines whose third element is already present in the page.
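As a rough illustration of that requirement, here is a minimal Python sketch that filters a batch of new lines against the third elements already present. The function names are invented for the example, the lines are assumed to be TAB-separated, and skipping lines with fewer than three fields is an assumption rather than something stated in the thread.

```python
def third_field(line):
    """Return the third TAB-separated field, or None if there are fewer than three."""
    fields = line.rstrip("\r\n").split("\t")
    return fields[2] if len(fields) >= 3 else None


def filter_new_lines(existing_lines, new_lines):
    """Return the new lines whose third field is not already present."""
    seen = {third_field(line) for line in existing_lines}
    accepted = []
    for line in new_lines:
        key = third_field(line)
        if key is None or key in seen:
            continue  # malformed (assumption: skip it) or already present
        seen.add(key)  # also rejects duplicates inside the new batch
        accepted.append(line)
    return accepted


existing = ["aaaa\tbbbb\tAAAAA", "hhh\thhh\tBBB"]
incoming = ["ggg\tbbb\tAAAAA", "ddd\tjjjj\tFFFFF", "jjjj\tjjj\tFFFFF"]
accepted = filter_new_lines(existing, incoming)
print(accepted)
```

A set lookup takes roughly constant time on average, so a file of around 150000 lines should be well within reach of this approach on an ordinary laptop.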
I was also thinking of using VBA in Excel, but I have no idea whether I would have the same (time and temperature) problems that I have now with Excel formulas ...
Do you think that I have to use a database (e.g. Microsoft SQL Server), or can I apply other solutions, like writing a program in BASIC, either interpreted or compiled?
I'd appreciate your suggestions ...
Thanks in advance.
Last edit: write 2013-05-22
It's an old thread, but maybe you are still looking for a solution.
In this case you can have a look at the SQL plugin, which allows you to run SQL queries against the text:
http://www.scout-soft.com/sql/
First you would need to replace all spaces with commas to get CSV-style text.
And you need to add a header line, so your text would look like this:
field1,field2,field3
aaaa,bbbb,AAAAA
ccc,dddd,VVVVV
ddd,jjjj,FFFFF
ggg,bbb,AAAAA
hhh,hhh,BBB
jjjj,jjj,FFFFF
yyy,yyyy,BBB
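If doing that conversion by hand is tedious, it could be scripted. The following is only a sketch: it assumes the original lines are TAB-separated (as in the first post), the function name `tsv_to_csv` is made up, and the header names are simply the ones used above.

```python
import csv
import io


def tsv_to_csv(tsv_text, header=("field1", "field2", "field3")):
    """Convert TAB-separated text to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for line in tsv_text.splitlines():
        if line:  # skip empty lines
            writer.writerow(line.split("\t"))
    return out.getvalue()


print(tsv_to_csv("aaaa\tbbbb\tAAAAA\nccc\tdddd\tVVVVV\n"), end="")
```

Using the `csv` module rather than a plain text replace means a field that itself contains a comma gets quoted instead of silently splitting into two columns.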
The query "select * from data group by field3" will return something like this:
ggg,bbb,aaaaa
yyy,yyyy,bbb
jjjj,jjj,fffff
ccc,dddd,vvvvv
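For comparison, a similar query can be tried without the plugin. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so it says nothing about how the Notepad++ SQL plugin behaves. Which row a bare "group by" returns for each group depends on the SQL engine, so this version picks the first inserted row of each group explicitly through `MIN(rowid)`. The function name is invented for the example.

```python
import csv
import io
import sqlite3

CSV_TEXT = """field1,field2,field3
aaaa,bbbb,AAAAA
ccc,dddd,VVVVV
ddd,jjjj,FFFFF
ggg,bbb,AAAAA
hhh,hhh,BBB
jjjj,jjj,FFFFF
yyy,yyyy,BBB
"""


def one_row_per_field3(csv_text):
    """Load CSV text into an in-memory SQLite table and return the first
    inserted row for each distinct field3, ordered by field3."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    data = rows[1:]  # skip the header line
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (field1 TEXT, field2 TEXT, field3 TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", data)
    query = """
        SELECT field1, field2, field3 FROM data
        WHERE rowid IN (SELECT MIN(rowid) FROM data GROUP BY field3)
        ORDER BY field3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


result = one_row_per_field3(CSV_TEXT)
for row in result:
    print(",".join(row))
```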
hth
Heinz