That is definitely a good question. To get the answer, bzip2 has a neat option: -t.
bzip2 -t archive.tar.bz2
This will tell you if your bzipped archive is fine, or not.
If it's fine, well, enjoy your day. Otherwise, read on, we'll recover it.
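If you'd rather script the check than read the message, here is a minimal sketch that relies on bzip2's exit status (0 when the data is intact, non-zero otherwise). The helper name check_archive is made up for this example.

```shell
# Sketch: report archive health based on the exit status of "bzip2 -t".
# check_archive is a hypothetical helper name, not part of bzip2.
check_archive() {
    # -t only tests the data in memory; nothing is written to disk.
    if bzip2 -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupted"
    fi
}

# Usage (not run here): check_archive archive.tar.bz2
```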
cd into recovery.
Here, we'll use the magic bzip2recover command. Hey, but what's that bzip2recover command? Hmmm.
Bzip2 compressed files are divided into blocks (each block holding 100k, 200k, ..., 900k bytes of uncompressed data,
depending on what compression options you used - default is 900k).
What bzip2recover does is split a bzip2 archive into many smaller
bzip2 archives (one per block, actually). That's why it generates soooo many small files.
So, here we go:
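A minimal sketch of the splitting step, assuming the damaged archive is the archive.tar.bz2 tested earlier. The split_blocks wrapper is a made-up name for this example. It only runs bzip2recover and counts the pieces it left next to the input file.

```shell
# Sketch: split a damaged archive into one small .bz2 file per block.
# split_blocks is a hypothetical wrapper; bzip2recover does the real work.
split_blocks() {
    # bzip2recover writes rec00001<name>, rec00002<name>, ... next to its input.
    bzip2recover "$1" || return 1
    dir=$(dirname "$1")
    base=$(basename "$1")
    # Count the pieces by expanding the glob into the positional parameters.
    set -- "$dir"/rec*"$base"
    echo "$# piece(s)"
}

# Usage (not run here): split_blocks archive.tar.bz2
```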
No, I'm not. I'm not the one with a corrupted archive <evil laugh>.
Seriously, now that we have divided the archive into smaller parts, we'll be able to "isolate" the corrupted parts.
To do so, we'll use bzip2 -t, as we did before, but this time on every small archive file.
Here we go:
bzip2 -tv rec*.bz2 > testoutput.log 2>&1
Ok, now, we will search for any corrupted small archive through the log file.
grep -v 'ok$' testoutput.log
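If you'd rather not parse the log at all, here is a small sketch that asks bzip2 directly and prints only the names of the pieces that fail the test. The list_bad_pieces name is invented for this example.

```shell
# Sketch: print the name of every piece that fails bzip2's integrity test.
# list_bad_pieces is a hypothetical helper name.
list_bad_pieces() {
    for f in "$@"; do
        # bzip2 -t exits non-zero for a damaged file; its messages are discarded.
        bzip2 -t "$f" 2>/dev/null || echo "$f"
    done
}

# Usage (not run here): list_bad_pieces rec*.bz2
```

The zero-padded numbers in the rec*.bz2 names keep the pieces in their original order, so the first name printed is the first corrupted block.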
Ok, cd into recovery1.
Here, we have the beginning of a tar file, nothing's corrupted, but the tar file is not complete.
Right. That makes things easy.
We will just bunzip all the small archives into one recovery1.tar file:
bzip2 -dc rec*.bz2 > recovery1.tar
Let's have a look at the resulting .tar file:
tar tf recovery1.tar
Wow! We're getting a list of files, and an error. Not perfect, but better than nothing!
We have here all the files that were in the original archive.tar.bz2 up to the first corrupted block.
We're done with recovery1!
Now, recovery2!! cd ../recovery2
Hmmmm, trying the same method as above fails. Why is that? Because tar sux. Yes, it does.
It does not manage to find a correct header right at the start of the file, and so, fails.
Creepy, huh? But we are smarter than tar, and there's not much that a little Perl magic can't solve.
First, let's have our small bzip2 archives bunzipped into a "failing" tar.
bzip2 -dc rec*.bz2 > recovery2_failing.tar
As I told you right before, a tar tf recovery2_failing.tar would... fail.
What we need to fix it is to have our recovery2_failing.tar
start from the beginning of a clean header block.
A simple but efficient Perl script will help us make our way out: *find the find_tar_headerheaders.pl.bz2 attached*
Yeah, bunzip2 it, then chmod +x it.
Now, to find the first clean tar header in recovery2_failing.tar, do the following:
./findtarheader.pl recovery2_failing.tar
This will generate quite a bunch of output. The only interesting line here is the first result, so you can do:
./findtarheader.pl recovery2_failing.tar | head -n 1
Now we need to cut away everything before that header. To do so, do the following, using the offset you just found:
tail -c +17185 recovery2_failing.tar > recovery2_working.tar
This command copies everything from recovery2_failing.tar, starting at byte 17185 (tail -c +N counts bytes from 1), into recovery2_working.tar.
Great, now we have a "recovery2_working.tar" tar file, which WORKS!
tar tf recovery2_working.tar
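In case the attached Perl script is not at hand, here is a rough stand-in for the header search. It is not the author's script. It is a sketch that assumes GNU grep (for -a, -b and -o) and a tar file whose headers carry the "ustar" magic string, which sits 257 bytes into each header. A file inside the archive that happens to contain the text "ustar" can produce a false hit, so double-check the result with tar tf. The first_tar_header name is made up.

```shell
# Sketch: print the 1-based byte position of the first plausible tar header.
# Hypothetical stand-in for the attached Perl script; needs GNU grep.
first_tar_header() {
    # -a: treat binary as text, -b: print byte offsets, -o: one line per match.
    LC_ALL=C grep -a -b -o 'ustar' "$1" | while IFS=: read -r pos rest; do
        # A match in the first 257 bytes cannot belong to a complete header.
        [ "$pos" -ge 257 ] || continue
        # Step back to the header start, then add 1 for tail's 1-based count.
        echo $((pos - 257 + 1))
        break
    done
}

# Usage (not run here):
#   start=$(first_tar_header recovery2_failing.tar)
#   tail -c "+$start" recovery2_failing.tar > recovery2_working.tar
```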