Name                     Modified     Size      Downloads / Week
README.md                2013-10-31   4.2 kB
ts_extract-0.9.tar.bz2   2013-10-31   16.7 kB
Totals: 2 items                       20.8 kB   2

Documentation for ts_extract

What is this?

ts_extract is a tool that helps you to recover lost .ts (MPEG2 transport stream) files. It does so by scanning the files/partitions/devices you specify at the command line and detecting valid .ts file fragments.

It then analyzes these fragments and checks whether they can be recombined into complete files by looking at the stream PIDs (and the PAT/PMT directories), the continuity fields in the packets, and the PCR.

This can be useful to recover lost files from a hard disk after a disk crash or after deleting files by mistake. (For a crashed disk, first create an image using a tool such as dd_rescue.)

Approach

The program works in three steps:

  1. It reads all the input files/partitions/disks/images/... and creates a list of extents that represent TS file fragments. (This step typically takes the longest; its results can be stored in a log file and reread later, which skips this step.)

  2. It then uses the analyzed metadata to reconnect the fragments into (hopefully) complete TS streams. (This takes some computation time: tens of minutes for tens of thousands of fragments.)

  3. The resulting streams can then optionally be copied into files.
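
To make step 2 more concrete, here is a minimal sketch of how fragments could be chained once step 1 has produced a list of extents. It is NOT the algorithm ts_extract uses; the `Fragment` fields, the `chain_fragments` function and the `max_gap` parameter are hypothetical names chosen for illustration. It assumes each fragment carries its PID set and its first and last PCR, and it ignores PCR wrap-around and the continuity counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One extent found in step 1 (field names are illustrative only)."""
    offset: int        # byte offset of the extent on the scanned device
    length: int        # extent length in bytes
    pids: frozenset    # set of PIDs seen in the extent
    first_pcr: int     # first PCR value found in the extent
    last_pcr: int      # last PCR value found in the extent


def chain_fragments(fragments, max_gap):
    """Greedily link each fragment to the nearest later fragment that has
    the same PID set and starts within max_gap PCR ticks of its end."""
    remaining = sorted(fragments, key=lambda f: f.first_pcr)
    streams = []
    while remaining:
        chain = [remaining.pop(0)]
        while True:
            tail = chain[-1]
            candidates = [
                f for f in remaining
                if f.pids == tail.pids
                and 0 < f.first_pcr - tail.last_pcr <= max_gap
            ]
            if not candidates:
                break
            nxt = min(candidates, key=lambda f: f.first_pcr)
            remaining.remove(nxt)
            chain.append(nxt)
        streams.append(chain)
    return streams


# Example: two fragments of one recording, stored out of order on disk,
# plus one unrelated fragment with a different PID set.
a = Fragment(8192, 4096, frozenset({1, 2}), 0, 100)
b = Fragment(0, 4096, frozenset({1, 2}), 105, 200)
c = Fragment(4096, 4096, frozenset({3}), 50, 80)
streams = chain_fragments([a, b, c], max_gap=10)
```

Each resulting chain lists the extents in stream order, which is what step 3 would need in order to copy them into a file.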

TS analysis

The tool looks for a number of features in TS files to detect whether they belong together:

  • A TS packet is 188 bytes long and does NOT fit evenly into a 4k filesystem block; this is actually an advantage for us, as the expected packet alignment gives us a test for consecutive blocks with an error probability of only 1/(188/4) = 1/47.

  • The TS packets have a per-PID continuity counter (4 bits) that is incremented with each packet of that PID; this gives us a test with a 1/16 error rate. Since you typically find several PIDs in a block, this becomes even more useful.

  • The TS packets belong to a program that is described by a PAT and PMT; we assume that only ONE program is recorded in a .ts file (which is the case for e.g. E2 based VDRs); a change in the PID set or the PAT or PMT indicates that we have a new stream. (This also applies to the TSID.)

  • One of the PIDs of a program typically carries a timestamp (PCR). This needs to be continuous and allows us to determine how to reconnect the fragments.
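
The checks listed above can be sketched in a few lines of Python. This is an illustration based on the standard MPEG-2 TS packet layout, not code taken from ts_extract; all function names are made up for this example, and the phase calculation assumes a file whose first packet starts at offset 0 of its first 4096-byte block.

```python
SYNC_BYTE = 0x47
PACKET_SIZE = 188
BLOCK_SIZE = 4096


def block_phase(block_index):
    """Offset of the first packet start inside the given block of a file.
    Only 188/4 = 47 distinct values occur, hence the 1/47 error rate."""
    return (-block_index * BLOCK_SIZE) % PACKET_SIZE


def parse_header(pkt):
    """Return (pid, adaptation_field_control, continuity_counter),
    or None if pkt is not a plausible TS packet."""
    if len(pkt) != PACKET_SIZE or pkt[0] != SYNC_BYTE:
        return None
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
    afc = (pkt[3] >> 4) & 0x3
    cc = pkt[3] & 0x0F
    return pid, afc, cc


def continuity_ok(prev_cc, header):
    """Check the 4-bit counter against the previous packet of the same PID.
    The counter only advances on packets that carry a payload (afc bit 0).
    Duplicate packets, which the standard permits, are not handled here."""
    _pid, afc, cc = header
    expected = (prev_cc + 1) & 0x0F if afc & 0x1 else prev_cc
    return cc == expected


def parse_pcr(pkt):
    """Return the PCR in 27 MHz ticks, or None if the packet has none."""
    afc = (pkt[3] >> 4) & 0x3
    if not (afc & 0x2) or pkt[4] < 7 or not (pkt[5] & 0x10):
        return None
    b = pkt[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext


# Hand-built test packets: one payload-only packet on PID 0x100 with
# counter 5, and one packet with an adaptation field carrying a PCR.
payload_pkt = bytes([0x47, 0x01, 0x00, 0x15]) + bytes(184)
pcr_pkt = bytes([0x47, 0x01, 0x00, 0x30, 7, 0x10,
                 0, 0, 0, 0, 0x80, 0x01]) + bytes(176)
```

A block whose first sync byte is not at `block_phase(k)` cannot be block `k` of an aligned file, and a counter mismatch on any PID rules out two blocks being neighbours.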

Features

  • You can filter for transport stream IDs (TSID) and program IDs. If you look for a few files only or for a series of files, this is an effective way to limit the amount of (CPU) time needed to reconnect the fragments and to limit the number of files and the storage required to output the results.

  • The state can be saved and reloaded again, which is a good time saver. The recommended approach to rescue files is to NOT filter in the first step, but to save the state (option -l). You can then use that as input (option -L) and skip over the first step.

  • Options -v (verbose) and -d allow you to get more information and observe the inner workings of the program. You can specify these options multiple times to increase the level of verbosity.

  • There's no man page (yet), only the -h (help) summary and this little document.

Hints

  • If you have deleted files by mistake, make sure you cut off write operations to that filesystem as soon (or as much) as possible.

  • If you have a crashed disk, do the analysis/extraction on an image (created with e.g. dd_rescue).

  • Use options -l and -L to save time. Use -l without filtering (options -t and -g) initially and then use -L together with filtering.

Limitations

  • The program currently does not parse tables (PAT, PMT) larger than one packet.

  • The reconnection routine only works reliably for files with up to 13hrs of stream data (this is due to an optimization).

  • The program only deals with streams that contain ONE program. (The PAT may reference multiple programs, that's fine.)
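
The README does not say which optimization causes the 13-hour limit. One possible reading, offered here purely as an assumption, is a wrap-aware comparison of PCR values: the PCR base is a 33-bit counter running at 90 kHz, so it wraps after about 26.5 hours, and a signed difference between two values is only unambiguous within half of that. The sketch below shows the arithmetic; `signed_pcr_delta` is a hypothetical helper, not a function from ts_extract.

```python
PCR_BASE_BITS = 33        # width of the PCR base field
PCR_BASE_HZ = 90_000      # the PCR base counts at 90 kHz

wrap_hours = 2 ** PCR_BASE_BITS / PCR_BASE_HZ / 3600
half_wrap_hours = wrap_hours / 2


def signed_pcr_delta(a, b):
    """Difference b - a of two PCR base values, treating the counter as
    circular. The result is only meaningful if the true distance is less
    than half the wrap period."""
    mod = 1 << PCR_BASE_BITS
    d = (b - a) % mod
    return d - mod if d >= mod // 2 else d


print(f"wrap after {wrap_hours:.2f} h, unambiguous range {half_wrap_hours:.2f} h")
```

With this kind of comparison, two fragments more than roughly 13 hours apart in the same stream could be ordered the wrong way round, which would be consistent with the limit stated above.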

License

I release the program under the terms of the GNU General Public License (GPL) version 2 or version 3 (at your option).

More information

Source: README.md, updated 2013-10-31