
...

```shell
tail -c +17185 recovery2_failing.tar > recovery2_working.tar
```

This command copies everything from recovery2_failing.tar into recovery2_working.tar, starting at byte 17185 (tail -c +K counts bytes from 1, so the first 17184 bytes are skipped).
Great, now we have a "recovery2_working.tar" tar file, which WORKS!
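The offset 17185 only fits this particular archive. The following is a hedged sketch of one way to locate a candidate offset automatically. It assumes GNU grep and an archive in POSIX ustar or GNU tar format, both of which put the magic string "ustar" 257 bytes into every 512-byte header (old V7-style headers have no magic and will not be found). The helper name cut_to_first_ustar_header is hypothetical. A file body that happens to contain "ustar" gives a false positive, so always check the result with tar -tvf.

```shell
# Hypothetical helper: cut_to_first_ustar_header DAMAGED OUTPUT
# Copies DAMAGED into OUTPUT starting at the first byte that looks like
# the beginning of a POSIX/GNU tar header. Assumes GNU grep (-a, -b, -o).
cut_to_first_ustar_header() {
    damaged=$1
    output=$2

    # 0-based byte offset of the first "ustar" magic string that lies far
    # enough into the file to belong to a complete 512-byte header.
    magic_off=$(LC_ALL=C grep -aob 'ustar' "$damaged" |
        cut -d: -f1 | awk '$1 >= 257 { print; exit }')
    if [ -z "$magic_off" ]; then
        echo "no ustar header found in $damaged" >&2
        return 1
    fi

    # The magic field sits 257 bytes into the header: step back to its start.
    header_off=$((magic_off - 257))

    # tail -c +K starts output at byte K counting from 1, hence the +1.
    tail -c +"$((header_off + 1))" "$damaged" > "$output"
}

# Usage with the file names from this page:
# cut_to_first_ustar_header recovery2_failing.tar recovery2_working.tar
# tar -tvf recovery2_working.tar
```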

...

Well, there you go, you did it.
What can we learn from it?
First, thou shalt not use gzip-compressed archives for critical stuff:
a gzip file is one continuous compressed stream, so if it ever gets corrupted, everything after the damaged spot is, in practice, just lost. Sad story, huh?
Second, bzip2-compressed tar archives cope quite well with data corruption, at least
much better than gzipped ones: bzip2 compresses the data in independent blocks, and tar puts a header in front of every file, so extraction can resume after the damage.
Here, we could restore everything from a .tar.bz2 file EXCEPT what was
within the corrupted bzip2 block, and everything up to the first clean tar header
after the corrupted block. To sum it up: we lost one block, plus any file with
either its header or part of its body in that block.
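As a hedged sketch, the intact blocks can be pulled out with bzip2recover, the recovery tool that ships with bzip2. It writes each block of a damaged file to its own rec00001<name>, rec00002<name>, ... file next to the original. This is one possible workflow, not necessarily the exact one used earlier on this page; the helper salvage_bz2_blocks and the part-file naming are hypothetical. Because the tar data after a lost block no longer lines up with what came before, the sketch starts a new output part after every corrupted block. Each later part still needs its leading partial file trimmed, as with the tail command above.

```shell
# Hypothetical helper: salvage_bz2_blocks DAMAGED.bz2 PREFIX
# Decompresses every intact bzip2 block of DAMAGED.bz2 into PREFIX1.tar,
# PREFIX2.tar, ...; a new part is started after each corrupted block.
salvage_bz2_blocks() {
    damaged=$1
    prefix=$2
    dir=$(dirname "$damaged")
    base=$(basename "$damaged")

    # bzip2recover writes the blocks to rec00001$base, rec00002$base, ...
    # in the same directory as the damaged file.
    bzip2recover "$damaged" >/dev/null 2>&1 || return 1

    part=1
    : > "${prefix}${part}.tar"
    for block in "$dir"/rec[0-9]*"$base"; do
        if bzip2 -t "$block" 2>/dev/null; then
            # Intact block: append its decompressed bytes to the current part.
            bzip2 -dc "$block" >> "${prefix}${part}.tar"
        else
            # Corrupted block: skip it. The tar data that follows no longer
            # lines up with what came before, so write it to a new part.
            echo "skipping corrupted block: $block" >&2
            part=$((part + 1))
            : > "${prefix}${part}.tar"
        fi
    done
}
```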
If you are saving critical stuff, you can tell bzip2 to use a 100 kB block size (bzip2 -1).
If your archive gets corrupted, you then lose a multiple of 100 kB,
against a multiple of 900 kB with the default 900 kB block size (bzip2 -9),
which can actually make a BIG difference! The price is a slightly lower compression ratio.
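A minimal sketch of the 100 kB setting, assuming tar output is piped into bzip2; both helper names are hypothetical. The fourth byte of a .bz2 stream is the block-size digit that follows the "BZh" signature, which makes it easy to check which setting an existing archive uses.

```shell
# Hypothetical helper: archive DIR into OUT using 100 kB bzip2 blocks.
# bzip2 -1 selects 100k blocks; the default, -9, selects 900k.
make_small_block_archive() {
    tar -cf - "$1" | bzip2 -1 > "$2"
}

# Hypothetical helper: print the block size of a .bz2 file in kB.
# A bzip2 stream starts with "BZh" followed by one digit '1'..'9',
# the block size in units of 100 kB.
bz2_block_size_kb() {
    level=$(head -c 4 "$1" | tail -c 1)
    case $level in
        [1-9]) echo $((level * 100)) ;;
        *) echo "not a bzip2 file: $1" >&2; return 1 ;;
    esac
}
```

Note that the pipeline's exit status is the one of bzip2, so a tar error (for example an unreadable file) does not make make_small_block_archive fail.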

Addendum: Expected Minimal Data Loss
Best case (minimal loss): no file has its header inside a corrupted block and its data in other blocks, so only the corrupted blocks themselves are lost.
Worst case (maximal loss): each corrupted block contains the header of a big file. The whole block is lost, plus that entire file (in theory there is no upper limit: it could be a 100 GB file...).

| Block size | Minimal loss for N corrupted blocks |
|---|---|
| 100 kB | 100 × N kB |
| 200 kB | 200 × N kB |
| 300 kB | 300 × N kB |
| 400 kB | 400 × N kB |
| 500 kB | 500 × N kB |
| 600 kB | 600 × N kB |
| 700 kB | 700 × N kB |
| 800 kB | 800 × N kB |
| 900 kB | 900 × N kB |

Please note that, statistically, with a large number N of corrupted blocks and an average file size of M kB, the expected data loss is roughly (block size + M) × N kB, as if each corrupted block also took about one average-sized file down with it.
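The minimal and expected estimates can be compared with plain shell arithmetic. The function names are hypothetical, all sizes are in kB, and the expected-loss formula is the rough statistical estimate above, not an exact bound.

```shell
# Hypothetical helpers; all sizes in kB.
# Minimal loss: only the N corrupted blocks themselves are lost.
min_loss_kb() {        # usage: min_loss_kb BLOCK_KB N
    echo $(( $1 * $2 ))
}

# Rough expected loss with many corrupted blocks: each one also takes
# about one average-sized file (M kB) with it.
expected_loss_kb() {   # usage: expected_loss_kb BLOCK_KB M N
    echo $(( ($1 + $2) * $3 ))
}

# Example: 3 corrupted blocks, average file size 50 kB.
expected_loss_kb 100 50 3   # prints 450
expected_loss_kb 900 50 3   # prints 2850
```

With three corrupted blocks and 50 kB files on average, 100 kB blocks cost about 450 kB while the default 900 kB blocks cost about 2850 kB.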