lesspipe.sh, a preprocessor for less
====================================
Version: 1.82
Author : Wolfgang Friebel, DESY (Wolfgang.Friebel AT desy.de)
License: GPL
Latest version available from:
http://freshmeat.net/projects/lesspipe/ or
http://sourceforge.net/projects/lesspipe/
The development version can be checked out using svn:
svn co svn://svn.code.sf.net/p/lesspipe/code/trunk
To report bugs or make proposals to improve lesspipe please contact
the author by email.
Contents
========
1. Introduction
2. Usage
3. Required programs
4. Supported file formats
4.1 Supported compression methods
4.2 List of preprocessed file types
4.3 Conversion of files with alternate character encoding
5. Syntax highlighting
5.1.1 List of supported languages
5.1.2 Syntax highlighting alternatives
5.2 Colored directory listing
5.3 Colored listing of tar file contents
6. Displaying files with special characters in the file name
7. Examples
8. Other documentation about lesspipe
9. External links
9.1 URLs to some utilities
9.2 References
10. Contributors
1. Introduction
===============
To browse files under UNIX the excellent viewer less [1] can be used. By
setting the environment variable LESSOPEN, less can be enhanced by external
filters to become even more powerful. Most Linux distributions come already
with a "lesspipe.sh" that covers the most common situations.
The input filter for less described here is called "lesspipe.sh". It is able
to process a wide variety of file formats. It enables users to deeply inspect
archives and to display the contents of files in archives without having to
unpack them before. That means file contents can be properly interpreted even
if the files are compressed and contained in a hierarchy of archives (often
found in RPM or DEB archives containing source tarballs). The filter is easily
extensible for new formats.
The input filter which is also called "lesspipe.sh" is written in a ksh
compatible language (ksh, bash, zsh) as one of these is nearly always installed
on UNIX systems and uses comparably few resources. Otherwise an implementation
in perl for example would have been somewhat simpler to code. The code looks
less clean than it could as it was tried to make the script compatible with
a number of old shells and applications especially found on non Linux systems.
The filter does different things depending on the file format. In most cases
it is determined on the output of the "file" command [2], [6], that recognizes
lots of formats. Only in a few cases the file suffix is used to determine what
to display. Up to date file descriptions are included in the "file" package.
Maintaining a list of file formats is therefore only a matter of keeping that
package up to date.
2. Usage
========
(see also the man page lesspipe.1)
To activate lesspipe.sh the environment variable LESSOPEN has to be defined
in the following way:
LESSOPEN="|lesspipe.sh %s"; export LESSOPEN (sh like shells)
setenv LESSOPEN "|lesspipe.sh %s" (csh, tcsh)
If lesspipe.sh is not in the UNIX search path or if the wrong lesspipe.sh is
found in the search path, then the full path to lesspipe.sh should be given
in the above commands.
As lesspipe.sh is accepting only a single argument, a hierarchical list of file
names has to be separated by a non blank character. A colon is rarely found
in file names, therefore it has been chosen as the separator character. If a
file name does however contain at least one isolated colon, the equal sign =
can be used as an alternate separator character. In that case the = character
has to reoccur as the last character of the argument. At each stage in
extracting files from such a hierarchy the file type is determined. This
guarantees a correct processing and display at each stage of the filtering.
To view files in multifile archives the following command can be used:
less archive_file:contained_file or
less archive_file=contained_file= (with = as separator)
This can be used to extract single files from a multifile archive:
less archive_file:contained_file > extracted_file
For extracting files less is not required, that can be done also using:
lesspipe.sh archive_file:contained_file > extracted_file
Even a file in a multifile archive that itself is contained in yet
another archive can be viewed this way:
less super_archive:archive_file:contained_file
The script is able to extract files up to a depth of 6 where applying a
decompression algorithm counts as a separate level. In a few rare cases the
file command does not recognize the correct format (especially with nroff).
In such cases the filtering can be suppressed by a trailing colon on the file
name.
Display the last file in the file1:..:fileN chain in raw format:
Suppress input filtering: less file1:..:fileN: (append 1 colon)
Suppress decompression: less file1:..:fileN:: (append 2 colons)
Suppress syntax highlighting: less file1:..:fileN: (append 1 colon)
Syntax highlighting (see below) is only tried if less is called with -r or -R
and highlighting support was requested when generating lesspipe.sh !!!
Several environment variables can influence the behavior of lesspipe.sh.
LESSQUIET will suppress additional output not belonging to the file contents
if set to a non empty value.
LESS can be used to switch on colored less output (has to contain -r or -R).
LESSCOLORIZER can contain a syntax highlighting program different from
the code2color used by default (currently only pygmentize allowed).
Code for using LESS_ADVANCED_PREPROCESSOR is optionally generated (configure).
LESS_ADVANCED_PREPROCESSOR will switch on filtering methods for html, rtf, ps
files and files with alternate character encoding, if this variable is set.
Filtering these formats is also done if there is no LESS_ADVANCED_PREPROCESSOR
support (then this string is not contained in lesspipe.sh). Otherwise these
types of files will be shown unmodified.
3. Required programs
====================
bash or zsh or ksh (also pdksh, tested with version 5.2). Configure puts an
appropriate first line in the script
file (a version with an up to date magic file) (GNU file 4.xx recommended)
perl (for configure, code2color and tarcolor, lesspipe.sh works without it)
Standard UNIX programs like ar, cat, cut, dd, egrep, gzip, ln, ls, mkdir,
rm, sed, strings, tar, tput and further programs for special formats.
4. Supported file formats
=========================
Currently lesspipe.sh [3],[4] supports the following compression methods
and file types (i.e. the file contents gets transformed by lesspipe.sh):
4.1 Supported compression methods
---------------------------------
gzip, compress, pack requires gzip
bzip2 requires bzip2
zip requires unzip
rar requires rar or unrar
7-zip requires 7za
lzip requires lzip
lzma requires lzma (limited support only)
xz requires xz
4.2 List of preprocessed file types
-----------------------------------
tar requires GNU tar and optionally tarcolor for coloring
nroff(mandoc) requires groff
ar library requires ar
jar archive requires fastjar or unzip
rar archive requires unrar or rar
7-zip archive requires 7za
lzip archive requires lzip
shared library requires nm
executable requires strings
directory displayed using ls -lA
RPM requires GNU cpio and rpm2cpio or rpmunpack, optionally rpm
Microsoft Word requires antiword or catdoc
MS Powerpoint requires ppthtml
MS Excel requires xlhtml
Debian requires ar, gzip and tar, shows more info if dpkg is installed
html requires html2text or elinks or links or lynx or w3m
pdf requires pdftotext or pdftohtml
perl requires pod2text
rtf requires unrtf
dvi requires dvi2tty
djvu requires djvutxt
ps requires pstotext or ps2ascii (from the gs package)
mp3 requires id3v2 or mp3info or mp3info2
jpg, png, gif requires identify
iso images requires isoinfo
MacOSX archive requires lsbom
MacOS X bom requires lsbom
MacOS X plist requires plutil
cab requires cabextract (version 1.0 or above)
gpg encrypted requires gpg
perl storable requires perl (and the perl modules Storable and Data::Dumper)
perl pod requires perldoc
OASIS Opendocument text documents (used for Openoffice, Libreoffice)
requires unzip and o3tohtml or sxw2txt (distributed together
with lesspipe)
4.3 Conversion of files with alternate character encoding
---------------------------------------------------------
If the file utility reports text containing ISO-8859, UTF-8 or UTF-16 encoded
characters then the text will be transformed using iconv into the default
encoding. This does assume iconv has the right default which can be wrong
in some situations. It is checked if iconv would fail. Then the text is
displayed unmodified.
4.4 File formats currently not supported
----------------------------------------
(code contributed but commented out)
jpeg and pbm graphics files to be displayed in ASCII art. The ASCII art
library works with overprinting that does not work properly within less.
Therefore the resulting quality of the converted picture is not satisfactory.
Display of video streams using mplayer with -aadriver (again ASCII art) is
considered abuse of less and also commented out.
looking at contents of DOS formatted disks by accessing the proper device file
5. Colorizing the output
========================
ATTENTION: Syntax highlighting and other methods of colorizing the output
is only activated if the environment variable LESS is existing and contains
the option -R or -r or less is called with one of these options.
This guarantees, that instead of literal escape sequences colors are
displayed. The detection of the -r/-R presence at run time is rather
dependent on the operating system and may not work in all cases.
Putting the option in the LESS environment variable is guaranteed to
work. By installing the perl module Proc::ProcessTable the OS dependence
can be reduced as well.
The display of wrapped long lines and moving backward in a file using the
options -r/-R can give weird output. For an explanation see
http://www.greenwoodsoftware.com/less/faq.html#dashr
5.1 Syntax highlighting
-----------------------
Experimental support for syntax highlighting was added through a perl
script 'code2color' which is derived from code2html [5].
As syntax highlighting is rather resource intense it can be switched off by
appending a colon after the file name if the output was colorful. If the
wrong language was chosen for syntax highlighting then another one can be
forced by appending a colon and a suffix to the file name as follows (assuming
this is a file with perl syntax):
less config_file:.pl
That works as well to force the call of code2color for a given language.
5.1.1 List of supported languages (code2color)
----------------------------------------------
Text files for the following languages can be highlighted using code2color:
ada, asm, awk, c, c++, groff, html, xml, java, javascript, lisp, m4,
make, pascal, patch, perl, povray, python, ruby, shellscript, sql
The corresponding suffixes recognized by code2color are:
.ada .asm .inc .awk .c .h .cpp .cxx .groff .html .php .xml .java .js .lsp .m4
Makefile .pas .patch .diff .pm .pl .pod .pov .py .rb .sh .sql
5.1.2 Syntax highlighting alternatives
--------------------------------------
The enabling of syntax highlighting contains OS dependent code and is not
guaranteed to work (it was tested on Linux, Solaris, IRIX, HPUX, AIX,
MacOS X, Cygwin and FreeBSD). It is deactivated by default and not
recommended by me. It can be activated using "configure" or "make MODE=ask".
The function code2color contains code to guarantee that color codes are only
sent if less is called with one of the options -r or -R. To ensure that these
checks are always performed, alternate syntax colorizers will be called
from within code2color by setting the environment variable LESSCOLORIZER
to the name of another program. Currently only pygmentize (and code2color as
the default) is allowed. This can be changed in the first lines of code2color.
Much better syntax highlighting is obtained using the less emulation of vim:
The editor vim comes with a file less.sh, in my case located in
/usr/share/vim/vim73/macros. Assuming that file location
a function lessc (bash, zsh, ksh users)
lessc () { /usr/share/vim/vim73/macros/less.sh "$@"}
or an alias lessc (csh, tcsh users)
alias lessc /usr/share/vim/vim73/macros/less.sh
is defined and "lessc filename" is used to view the colorful file contents.
5.2 Colored Directory listing
-----------------------------
The conditions to display a colored listing are described above. Depending
on the operating system ls is then called with appropriate options to
produce colored output.
5.3 Colored listing of tar file contents
----------------------------------------
As above less has to be called with -r or -R. If also the executable tarcolor
(contained in the lesspipe tar file, see also [7]) is installed, then the
listing of tar file contents is colored in a similar fashion as directory
contents.
6. Displaying files with special characters in the file name
============================================================
Shell meta characters in file names: space (frequently used in windows
file names), the characters | & ; ( ) ` < > " ' # ~ = $ * ? [ ] or \
must be escaped by a \ when used in the shell, e.g. less a\ b.tar.gz:a\"b
will display the file a"b contained in the gzipped tar archive a b.tar.gz.
Files within an archive that do have an isolated colon in the name cannot
be displayed using the
archive_name:contained_file_name
notation. These files can be displayed using a notation with the alternate
separator character = as follows:
archive_name=contained_file_name=
Please note the trailing = which is required.
7. Examples
===========
As a typical usage case it is shown how one could display the man page
"file.man" found in the Fedora10 RPM source archive file-4.26-3.fc10.src.rpm
The less command enhanced with the lesspipe.sh filter
less file-4.26-3.fc10.src.rpm
yields the following output
...
-rw-r--r-- 1 root root 584803 Sep 15 16:29 file-4.26.tar.gz
-rw-r--r-- 1 root root 17124 Oct 16 13:01 file.spec
Then the command
less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz
produces the output
...
-rw-rw-r-- 10080/10080 16027 2008-08-30 12:01:41 file-4.26/doc/Makefile.in
-rw-rw-r-- 10080/10080 16097 2008-03-07 16:00:07 file-4.26/doc/file.man
-rw-rw-r-- 10080/10080 16943 2008-08-30 11:50:20 file-4.26/doc/magic.man
...
The desired man page can finally be viewed with
less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man
The subcomponents of the argument to less were easily obtained by cut and
paste using information contained in the previous lines of output.
Care has been taken to display the subcomponents already in the way
required by lesspipe, so that in most cases double clicking will select it.
If the nroff sources should have been displayed instead, appending
another colon at the end of the argument would have done the job:
less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man:
If the man page was compressed (e.g. as file.man.gz) it would have been
uncompressed anyway. To also disallow uncompressing the source file.man.gz
a second colon would have to be appended to the argument.
Even extracting single files from an archive is possible, like with
less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/src/file.c > file.c
Files with binary contents can be extracted as well:
less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:: > file-4.26.tar.gz
Here the two colons at the end of the argument are required to suppress the
unzipping of the resulting file and to extract the tar file instead of
interpreting it.
Another interesting example is to get the dominating colors of a picture,
that contains a diagram with a few colors only. The command
less diagram.png
does produce a lot of information, among others
...
Histogram:
720: ( 0, 0,127) #00007F
3032: (127,127,127) grey50
18935: ( 0, 0,255) blue
21480: ( 0,255, 0) lime
21041: ( 0,255,255) cyan
8719: (255, 0, 0) red
14476: (255, 0,255) magenta
8822: (255,183, 0) #FFB700
13608: (255,255, 0) yellow
49167: (255,255,255) white
...
Other interesting examples are the inspection of Java's .jar files or Debian
package contents without unpacking the files and even without having java
installed or without working necessarily on a Debian system.
The contrib directory does contain a less wrapper (a bash/zsh function) that
can be used to display URLs using less. This allows to pass the URL contents
through lesspipe as if it would be a local file.
8. Other documentation about lesspipe
=====================================
http://ref.cern.ch/CERN/CNL/2002/001/unix-less/
http://www.linux-magazine.com/issue/21/lesspipe.pdf
in bash cookbook (Ch. 8.15) by Carl Albing, Cameron Newham, J. P. Vossen
http://carloscosta.org/2008/07/05/how-to-get-more-from-less/
Documentation in german:
german.txt (distributed with lesspipe, not updated)
http://www.linux-magazin.de/Heft-Abo/Ausgaben/2001/01/Bessere-Sicht
http://www.linux-user.de/ausgabe/2002/04/060-ootb/lesspipe-1.html
9. External links
=================
(last checked: Jan 10 2013):
9.1 URLs to some utilities
--------------------------
antiword http://www.winfield.demon.nl/
html2text http://www.mbayer.de/html2text/
cabextract http://www.cabextract.org.uk/
7za https://sourceforge.net/projects/p7zip/
lzip http://download.savannah.gnu.org/releases/lzip/
dvi2tty http://www.ctan.org/tex-archive/dviware/dvi2tty/
unrtf http://ftp.gnu.org/gnu/unrtf/
id3v2 http://id3v2.sourceforge.net/
9.2 References
--------------
[1] http://www.greenwoodsoftware.com/less/ (less)
[2] ftp://ftp.astron.com/pub/file/ (file)
[3] http://freshmeat.net/projects/lesspipe/
[4] http://sourceforge.net/projects/lesspipe/
[5] http://www.palfrader.org/code2html/ (code2html)
[6] http://www.darwinsys.com/file/ (file)
[7] https://github.com/msabramo/tarcolor (tarcolor)
10. Contributors
================
The script lesspipe.sh is constantly enhanced thanks to suggestions from
users. Among the additions to lesspipe.sh is the code to browse the ASCII
contents of Word or Openoffice files, to show characteristics of mp3 files
or to decode MacOS X formats.
Thanks to (in alphabetical order):
Marc Abramowitz: allow for color output when ls or tar commands are used
James Ahlborn: do not interpret .xml files as html
Sören Andersen: PPD files colorization requested
Andrew Barnert: shell syntax fix
Peter D. Barnes, Jr.: plist files for Mac OS X
Eduard Bloch: proposed support for ISO images
Mathieu Bouillaguet: add support for xz compression
Florian Cramer: MS Word, Openoffice support (o3read), ASCIIart, DjVu support
Philippe Defert: unattended installation
Antonio Diaz Diaz: proposed support for lzip
Bastian Fuchs: Issues using bash vs. sh
Carl Greco: enhanced output for .deb files
Stephan Hegel: suggested better 7za support
Michel Hermier: support for DESTDIR in the Makefile
Tobias Hoffmann: bug fix introduced in v 1.72
Christian Höltje: suggested to use LESSCOLORIZER like in the Gentoo distro
Jürgen Kahnert: display debian files without dpkg
Sebastian Kayser: suggested a less wrapper to display URLs
Ben Kibbey: works on FreeBSD
Peter Kostka: mktemp on MacOS fixes
Heinrich Kuettler: formatting, html via lynx
Vincent Lefèvre: runtime checks for shell, many enhancements, provided sxw2txt,
determine options for 'file' at runtime, display text in the proper encoding
David Leverton: detect helper programs at runtime
Jay Levitt: suggested to use enscript for highlighting support (see 4.2 above)
Vladimir Linek: inspired me to add ps and dvi support
Istvan Marko: speedup of the procedure
Markus Meyer: improved mp3 handling
Remi Mommsen: Mac OS X support
Derek B. Noonburg: PDF files support
Martin Otte: mktemp on MacOS fixes
Jim Pryor: many enhancements, bug fixes, restructuring of code, tar detection
Slaven Rezic: Cygwin support, bug fixes
Daniel Risacher: gpg support
Jens Schleusener: ksh syntax fixes
Ken Teague?: support more versions of file command
Paul Townsend: improved zip support for Solaris, bug fixes in configure
Petr Uzel: detect helper programs at runtime
Chelban Vasile: trap command not working under /bin/sh
Götz Waschk: suggested lzma support
Michael Wiedmann: Debian packages support
Dale Wijnand: Proposed the suppression of informal output