Hello
I need to extract tens of thousands of reference numbers from a massive file that can be opened in Notepad++ (it is actually a Translation Memory file).
My search formula may need to be refined, but for the moment it is
"<([^[:space:]]+_)+[^[:space:]]+>"
This was given to me by a colleague and I'm hoping it works in Notepad++.
But once it has found all occurrences of the reference numbers I want
(e.g. ABC_123_DEFG_456 and so on),
I don't know how to 'extract' them into another .txt file.
I don't want anything else except the found reference numbers.
Is there a way to do this, please?
Thanks
You could do a Find/Replace that matches the regular expression together with everything surrounding it on the line, and replaces all of that with only the captured match.
Find:
^.*?(yourregexgoeshereinbetweenparentheses).*$
Replace by:
\1
This keeps only one match per line (the first one, because the leading .*? is lazy). Any other occurrence on the same line will be lost, I guess.
My suggestion is to use Text Crawler for extracting matches.
http://www.digitalvolcano.co.uk/content/textcrawler
Last edit: Fool4UAnyway 2013-05-27
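For anyone who wants to see what the Find/Replace above does to a single line, here is a minimal Python sketch. It is not Notepad++ itself, and it rests on assumptions: Python's re module has no POSIX classes such as [[:space:]], so REF below is a hypothetical stand-in pattern for IDs shaped like ABC_123_DEFG_456, not the colleague's regex. The names REF, LINE and keep_first_match are made up for illustration.

```python
import re

# ASSUMPTION: hypothetical pattern for IDs such as ABC_123_DEFG_456
# (alphanumeric runs joined by underscores). Adjust to your real data.
REF = r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+"

# Same shape as the Find field: lazy prefix, captured match, rest of line.
LINE = re.compile(r"^.*?(" + REF + r").*$")


def keep_first_match(line):
    """Replace the whole line with group 1, like 'Replace by: \\1'."""
    return LINE.sub(r"\1", line)


# Two IDs on one line: only the first survives.
print(keep_first_match("see ABC_123_DEFG_456 and XYZ_789 here"))
# A line without any ID does not match, so it is left unchanged.
print(keep_first_match("no reference on this line"))
```

The second call shows a side effect worth knowing about: lines without a match stay in the document as they are, so the result is not yet a clean list of reference numbers.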
Take a look at the linefilter2 plugin. I use it a lot when digging through log files. You give it a search pattern or regular expression and it pulls all the matching lines into a new tab.
Last edit: Justin Dailey 2013-05-28
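To make the difference from a pure extraction clear, here is a rough Python sketch of the line-filter idea. This is only an illustration of the concept, not how the plugin is implemented; REF is a hypothetical pattern for IDs like ABC_123_DEFG_456, and matching_lines is a made-up name.

```python
import re

# ASSUMPTION: hypothetical ID pattern, not the regex from the first post.
REF = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+")


def matching_lines(lines):
    """Return every line that contains at least one match, unchanged."""
    return [line for line in lines if REF.search(line)]


sample = ["header", "id ABC_123 ok", "nothing", "two: A_1 and B_2"]
print(matching_lines(sample))
```

Note that whole lines come back, including the text around each reference number, so a second step would still be needed to get the numbers on their own.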
Anonymous - 2013-06-01
Hello
Thanks dail8859. Someone gave me a link to a similar program called Text Crawler, and that seems to have worked after a bit of effort to respect its simplified regular expression syntax. I'm now stuck on the second part of my operation, but that will be another post.
I'll check out the plugin though.
Anonymous - 2013-06-01
Oops
Just realised that it was Fool4 on this forum who gave me the Text Crawler link.
Thanks
You're welcome, SafeTex.
By the way, you can simply use the Extract button to extract just the reference numbers, provided you search with the regular expression for the numbers themselves (justyourregex, without anything before or after it on the same line).
In the Scratch Pad window that appears, you can Save the list of found matches.
That way you don't have to replace anything in any document.
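If a script is an option, the same "every match, nothing else" extraction can be sketched in a few lines of Python. This is an assumption-laden example, not a description of what Text Crawler does internally: REF is a hypothetical pattern for IDs like ABC_123_DEFG_456, the function names and the src/dst paths are made up, and the UTF-8 default is a guess (Translation Memory files are often saved in other encodings, so pass the right one).

```python
import re

# ASSUMPTION: hypothetical ID pattern; replace with one that fits your file.
REF = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+")


def extract_refs(text):
    """Return every match in the text, in order of appearance."""
    return REF.findall(text)


def save_refs(src, dst, encoding="utf-8"):
    """Write one match per line to dst and return how many were written."""
    count = 0
    # Read line by line so a very large file is never fully in memory.
    with open(src, encoding=encoding) as fin, \
            open(dst, "w", encoding=encoding) as fout:
        for line in fin:
            for ref in extract_refs(line):
                fout.write(ref + "\n")
                count += 1
    return count


print(extract_refs("a ABC_123_DEFG_456 b XYZ_789 c"))
```

Unlike the one-match-per-line Find/Replace, this keeps several matches from the same line, and lines without a match contribute nothing to the output file.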