Hi Craig
crgrep is really handy for looking at Word files, which sadly I have to do a lot. But I notice that it seems to treat a non-breaking space (ctrl-alt-spacebar in Word) as if it were not there, which causes words to run together and then not show up in the results. For example, if I write 34 mm (as in millimetres), I don't want the mm separated from the 34, so I put a nbsp in between. But if I grep for "34 mm" on the command line, with a normal space typed as part of the expression, I get no results. If I grep for "34mm" with no space, I get a result. I tried looking at Unicode values and so on, but I'm not smart enough to work it out. It would be nice to be able to grep for a phrase (1) with only a nbsp, (2) with only a regular space, or (3) with either. Of course, (2) is already possible. Anyway, many thanks, and if this is already possible, I apologise.
Darren
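The three matching modes asked for above can be expressed as ordinary regular expressions once the non-breaking space is identified as U+00A0. Below is a minimal Python sketch using Python's `re` module, not crgrep syntax. The sample lines and the `matching` helper are illustrative assumptions, and whether crgrep accepts an equivalent pattern is not established here.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the character Word inserts as a non-breaking space

# Illustrative text: one line uses a nbsp between 34 and mm, the other a normal space.
lines = ["34" + NBSP + "mm nbsp", "34 mm space"]

# Python `re` patterns for the three requested modes (not crgrep syntax).
patterns = {
    "nbsp only": re.compile(r"34\u00a0mm"),
    "space only": re.compile(r"34 mm"),
    "either": re.compile(r"34[ \u00a0]mm"),
}

def matching(pattern, text_lines):
    """Return the lines in which the pattern occurs anywhere."""
    return [line for line in text_lines if pattern.search(line)]

for name, pattern in patterns.items():
    # repr() makes the invisible nbsp show up as \xa0
    print(name, [repr(line) for line in matching(pattern, lines)])
```

In Python's `re`, `\s` in a str pattern also matches U+00A0, so `34\smm` covers case (3) as well. Other regex engines treat `\s` differently, so that shortcut should not be assumed to carry over to crgrep.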
Hi Darren, I could look at handling nbsp the same as space. Leave that with me.
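Handling a nbsp the same as a space amounts to normalizing the extracted text before the pattern is applied. Below is a minimal Python sketch. It assumes the text has already been pulled out of the .docx, and `normalize_spaces` and `grep_lines` are hypothetical names, not part of crgrep.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def normalize_spaces(text: str) -> str:
    """Map every non-breaking space to an ordinary space before matching."""
    return text.replace(NBSP, " ")

def grep_lines(pattern: str, lines: list[str]) -> list[str]:
    """Return the original (unnormalized) lines whose normalized text matches."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(normalize_spaces(line))]

lines = ["34" + NBSP + "mm nbsp", "34 mm space"]
for hit in grep_lines("34 mm", lines):
    print(repr(hit))
```

Returning the original line keeps the nbsp in the output. The trade-off is that case (1), matching only a nbsp, is no longer possible once both characters are folded together, so this might be better offered as an option than as the default.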
In the meantime, I created a Word doc with one line using a nbsp (a non-breaking space character between 34 and mm) and another line using a normal space character. The crgrep call below matched both for me, using a wildcard in the pattern:
Data (the first line will display as '34mm' on the command line):
34 mm nbsp
34 mm space
$ crgrep '34*mm' nbspace.docx
nbspace.docx:P:34mm nbsp
nbspace.docx:P:34 mm space
Hope that helps as a workaround.
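The '34 mm space' match suggests that crgrep's `*` stands for any run of characters on the line. Assuming that, '34*mm' behaves like the regex `34.*mm`, which explains why the workaround catches both lines and also where it can overshoot. Below is a Python sketch that models this with a hypothetical `wildcard_to_regex` helper (an illustration, not crgrep's code). It compares that pattern with a tighter regex that allows only an optional space or nbsp between 34 and mm.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: '*' matches any run of characters, the rest is literal."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))

# Illustrative lines; the nbsp line is shown as crgrep printed it above ('34mm').
samples = ["34mm nbsp", "34 mm space", "34 bolts and a 12mm spacer"]

loose = wildcard_to_regex("34*mm")       # compiles to 34.*mm
tight = re.compile(r"34[ \u00a0]?mm")    # 34, then at most one space or nbsp, then mm

print([s for s in samples if loose.search(s)])
print([s for s in samples if tight.search(s)])
```

The third sample is made up. It shows the loose pattern also matching unrelated text between 34 and mm, which the tighter pattern rejects.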
Anonymous - 2017-11-10
Yep, that's great in some circumstances and I'll use it. I use crgrep on an almost daily basis.
Thanks again