ts_extract is a tool that helps you recover lost .ts (MPEG-2 transport stream)
files. It does so by scanning the files, partitions or devices you specify on
the command line and detecting valid .ts file fragments.
It then analyzes these fragments to see whether they can be recombined into
complete files, by looking at the stream PIDs (and the PAT/PMT directories),
the continuity fields in the packets and the PCR.
This can be useful to recover lost files from a hard disk after a disk crash or
after deleting files by mistake. (For a crashed disk, first create an image
using a tool such as dd_rescue.)
The program works in three steps:
1. It reads all the input files/partitions/disks/images/... and creates a list
   of extents that represent TS file fragments. (This step typically takes the
   longest; its results can be stored in a log file that can be reread later
   to skip this step.)
2. It then uses the analyzed metadata to reconnect the fragments into
   (hopefully) complete TS streams. (This takes some computation time: tens
   of minutes for tens of thousands of fragments.)
3. The resulting streams can then optionally be copied into files.
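
The first step can be pictured with a small Python sketch. It is not the code
ts_extract uses, only an illustration of the general idea: a TS fragment shows
up as a run of sync bytes (0x47) spaced exactly 188 bytes apart. The function
name find_fragments and the min_packets threshold are made up for this
example.

```python
SYNC = 0x47   # every TS packet starts with this sync byte
PKT = 188     # TS packet size in bytes


def find_fragments(data: bytes, min_packets: int = 8):
    """Return a list of (offset, packet_count) runs of back-to-back TS packets.

    A run starts at a sync byte and continues as long as another sync byte
    is found exactly 188 bytes later. Runs shorter than min_packets
    (a hypothetical threshold) are treated as false positives.
    """
    runs = []
    pos = 0
    n = len(data)
    while pos + PKT <= n:
        if data[pos] != SYNC:
            pos += 1
            continue
        # Count consecutive complete packets that start with a sync byte.
        count = 0
        p = pos
        while p + PKT <= n and data[p] == SYNC:
            count += 1
            p += PKT
        if count >= min_packets:
            runs.append((pos, count))
            pos = p      # continue scanning after the run
        else:
            pos += 1     # isolated 0x47 byte, keep scanning
    return runs
```

A real scanner would read the input in chunks rather than from one bytes
object, and would record more metadata per extent (PIDs, counters, PCR) for
the second step.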
The tool looks for a number of features in TS files to detect whether
fragments belong together:
- A TS packet of 188 bytes does NOT fit evenly into a 4k filesystem block;
  this is actually an advantage for us, as it gives us a test for consecutive
  blocks with an error probability of only 1/(188/4) = 1/47.
- The TS packets have a per-PID continuity counter (4 bits) that is
  incremented from packet to packet; this gives us a test with a 1/16 error
  rate. Since a block typically contains several PIDs, this becomes even
  more useful.
- The TS packets belong to a program that is described by a PAT and PMT;
  we assume that only ONE program is recorded in a .ts file (which is the
  case for e.g. E2-based VDRs); a change in the PID set or the PAT or
  PMT indicates that we have a new stream. (This also applies to the TSID.)
- One of the PIDs of a program typically carries a timestamp (PCR). This
  needs to be continuous and allows us to determine how to reconnect the
  fragments.
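
The block alignment, continuity counter and PCR checks above can be sketched
in Python as follows. This assumes the standard MPEG-2 TS packet layout and is
only an illustration, not the code ts_extract uses; the names block_phases,
parse_header, parse_pcr_base and cc_follows are made up for this example.

```python
SYNC = 0x47
PKT = 188      # TS packet size in bytes
BLOCK = 4096   # assumed filesystem block size


def block_phases(packet: int = PKT, block: int = BLOCK):
    """Positions inside a packet at which a block boundary can fall.

    For a packet-aligned file, block i starts (i * block) % packet bytes
    into a packet. With 188 and 4096 only 47 distinct values occur.
    """
    return {(i * block) % packet for i in range(packet)}


def parse_header(pkt: bytes):
    """Return (pid, continuity_counter, has_payload) of one TS packet."""
    if len(pkt) != PKT or pkt[0] != SYNC:
        raise ValueError("not a TS packet")
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    afc = (pkt[3] >> 4) & 0x3          # adaptation_field_control
    cc = pkt[3] & 0x0F                 # 4-bit continuity counter
    return pid, cc, bool(afc & 0x1)


def parse_pcr_base(pkt: bytes):
    """Return the 33-bit PCR base (90 kHz units), or None if there is none."""
    if len(pkt) != PKT or pkt[0] != SYNC:
        raise ValueError("not a TS packet")
    afc = (pkt[3] >> 4) & 0x3
    # Need an adaptation field that is long enough and has the PCR flag set.
    if not (afc & 0x2) or pkt[4] < 7 or not (pkt[5] & 0x10):
        return None
    b = pkt[6:11]
    return (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)


def cc_follows(prev_cc: int, next_cc: int) -> bool:
    """True if next_cc is the successor of prev_cc (modulo 16)."""
    return next_cc == (prev_cc + 1) % 16
```

Two fragments are candidates for joining when the packet phase at the block
boundary matches, the counters of the PIDs they share continue without a gap,
and the PCR of the second fragment continues that of the first. Note that in
the standard the counter only advances for packets that carry payload, which
is why parse_header also returns has_payload.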
You can filter for transport stream IDs (TSID) and program IDs. If you are
looking for only a few files or for a series of files, this is an effective way
to limit the (CPU) time needed to reconnect the fragments and to limit the
number of files and the storage required to output the results.
The state can be saved and reloaded again, which is a good time saver.
The recommended approach to rescuing files is NOT to filter in the first
step, but to save the state (option -l). You can then use that as input
(option -L) and skip the first step.
Options -v (verbose) and -d allow you to get more information and
observe the inner workings of the program. You can specify these options
multiple times to increase the level of verbosity.
There's no man page (yet), only the -h (help) summary and this little
document.
If you have deleted files by mistake, make sure you cut off write operations
to that filesystem as soon (or as much) as possible.
If you have a crashed disk, do the analysis/extraction on an image (created
with e.g. dd_rescue).
Use options -l and -L to save time. Use -l without filtering (options -t
and -g) initially and then use -L together with filtering.
The program currently does not parse tables (PAT, PMT) larger than one packet.
The reconnection routine only works reliably for files with up to
13 hours of stream data (this is due to an optimization).
The program only deals with streams that contain ONE program. (The PAT
may reference multiple programs, that's fine.)
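
To illustrate the single-packet restriction on tables, here is a Python sketch
of reading a PAT that fits into one TS packet. It follows the standard PAT
section layout, skips the CRC check, and is not taken from ts_extract; the
function name parse_pat is made up for this example.

```python
def parse_pat(pkt: bytes):
    """Return (tsid, {program_number: pmt_pid}) for a single-packet PAT.

    Returns None if the packet is not the start of a PAT or if the
    section does not fit into this one packet. The CRC is not verified.
    """
    if len(pkt) != 188 or pkt[0] != 0x47:
        return None
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    starts_section = bool(pkt[1] & 0x40)   # payload_unit_start_indicator
    afc = (pkt[3] >> 4) & 0x3              # adaptation_field_control
    if pid != 0 or not starts_section or not (afc & 0x1):
        return None
    pos = 4
    if afc & 0x2:
        pos += 1 + pkt[4]                  # skip the adaptation field
    if pos >= 188:
        return None
    pos += 1 + pkt[pos]                    # skip the pointer_field
    if pos + 8 > 188 or pkt[pos] != 0x00:  # table_id 0x00 means PAT
        return None
    section_length = ((pkt[pos + 1] & 0x0F) << 8) | pkt[pos + 2]
    end = pos + 3 + section_length
    if end > 188:
        return None                        # continues in another packet
    tsid = (pkt[pos + 3] << 8) | pkt[pos + 4]
    programs = {}
    # 4-byte entries follow the 8-byte section header; the last 4 bytes are CRC.
    for i in range(pos + 8, end - 4, 4):
        number = (pkt[i] << 8) | pkt[i + 1]
        if number != 0:                    # program 0 points to the NIT
            programs[number] = ((pkt[i + 2] & 0x1F) << 8) | pkt[i + 3]
    return tsid, programs
```

The `end > 188` test is where a table spanning several packets is rejected;
handling it would require collecting the following packets of the same PID.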
I release this program under the terms of the GNU General Public License (GPL)
version 2 or version 3 (at your option).