
...

As mentioned above, running tar tf recovery2_failing.tar would fail.
To fix this, we need recovery2_failing.tar to start at the beginning
of a clean header block.
A simple but effective Perl script will help us find it:

find_tar_header.pl

#!/usr/bin/perl -w
use strict;

# 99.9% of all credits for this script go
# to Tore Skjellnes <torsk@elkraft.ntnu.no>
# who is the originator.

my $tarfile;
my $c;
my $hit;
my $header;

# If you don't get any results, comment out the first @src line below,
# uncomment the second one, and retry.
# GNU tar magic: "ustar" followed by two spaces and a NUL.
my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),ord(" "), ord(" "),0);
# POSIX ustar magic: "ustar", a NUL, then the version "00".
#my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),0,ord('0'),ord('0'));

die "No tar file given on command line" if $#ARGV != 0;

$tarfile = $ARGV[0];

# Three-argument open keeps odd file names safe; binmode avoids any newline translation.
open(IN,'<',$tarfile) or die "Could not open `$tarfile': $!";
binmode(IN);

$hit = 0;
$| = 1;
seek(IN,257,0) or die "Could not seek forward 257 characters in `$tarfile': $!";
while (read(IN,$c,1) == 1){
    # Advance through the magic string; start over on any mismatch.
    ($hit = 0, next) unless (ord($c) == $src[$hit]);
    $hit = $hit + 1;
    next unless $hit > $#src;
    $hit = 0;

    # We have a probable header starting 265 bytes before the current
    # position: the magic sits at offset 257 and we just read its 8 bytes.
    my $pos = tell(IN) - 265;
    # warn() needs its own parentheses here, otherwise "next" runs before the warning is printed.
    seek(IN,$pos,0)
        or (warn("Could not seek to position $pos in `$tarfile': $!"), next);
    (read(IN,$header,512) == 512)
        or (warn("Could not read 512 byte header at position $pos in `$tarfile': $!"), seek(IN,$pos+265,0), next);
    my ($name, $mode, $uid, $gid, $size, $mtime, $chksum, $typeflag,
        $linkname, $magic, $version, $uname, $gname,
        $devmajor, $devminor, $prefix)
        = unpack ("Z100a8a8a8Z12a12a8a1a100a6a2a32a32a8a8Z155", $header);
    $size = oct $size;    # the size field holds octal ASCII digits
    printf("%s:%s:%s:%s\n",$tarfile,($pos+1),$name,$size);
}

close(IN) or warn "Error closing `$tarfile': $!";

Copy and paste the script and save it as find_tar_header.pl, or bunzip2 the attached find_tar_header.pl.bz2.
Then chmod +x it.
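
The script reports a probable header wherever the magic bytes appear, so ordinary file data that happens to contain them could produce a false hit. As a minimal sketch (not part of find_tar_header.pl), the Perl below builds a synthetic header and shows where the 265 comes from (magic at offset 257, plus the 8 bytes just read), how the octal size field decodes, and how the header checksum could confirm a hit. make_header and checksum_ok are hypothetical helper names, and the file name and size are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a synthetic 512-byte POSIX ustar header.
sub make_header {
    my ($name, $size) = @_;
    my $h = pack("a100 a8 a8 a8 a12 a12 a8 a1 a100 a6 a2 a32 a32 a8 a8 a155 a12",
        $name,                      # name
        "0000644\0",                # mode
        "0000000\0",                # uid
        "0000000\0",                # gid
        sprintf("%011o\0", $size),  # size, octal ASCII
        sprintf("%011o\0", 0),      # mtime
        " " x 8,                    # checksum field counts as spaces while summing
        "0",                        # typeflag: regular file
        "",                         # linkname
        "ustar\0",                  # magic, at offset 257
        "00",                       # version
        "", "", "", "", "",         # uname, gname, devmajor, devminor, prefix
        "");                        # padding up to 512 bytes
    my $sum = unpack("%32C*", $h);  # sum of all bytes as unsigned values
    substr($h, 148, 8) = sprintf("%06o\0 ", $sum);
    return $h;
}

# Hypothetical helper: true when the stored checksum matches the recomputed one.
sub checksum_ok {
    my ($h) = @_;                   # works on a private copy of the header
    return 0 unless length($h) == 512;
    my ($digits) = substr($h, 148, 8) =~ /([0-7]+)/;
    return 0 unless defined $digits;
    substr($h, 148, 8) = " " x 8;
    return unpack("%32C*", $h) == oct($digits) ? 1 : 0;
}

my $h        = make_header("hello.txt", 1000);
my $magic_at = index($h, "ustar");            # offset of the magic inside the header
my $size     = oct(unpack("x124 Z12", $h));   # size field starts at offset 124

printf("magic at offset %d, size %d bytes, checksum %s\n",
    $magic_at, $size, checksum_ok($h) ? "ok" : "bad");
```

A hit whose checksum does not match is most likely file data rather than a real header, so it can be skipped when picking the first clean header.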
Now, to find the first clean tar header on recovery2_failing.tar, do the following:

...

Addendum: Expected Minimal Data Loss
Best case (minimal loss): no file has its header inside a corrupted block and its data in other blocks.
Worst case (maximal loss): each corrupted block contains the header of a big file. The whole block is lost, plus that file. (In theory the loss is unbounded: it could be a 100 GB file.)

...

|| Block Size || Minimal loss / N corrupted blocks ||
| 100 kB | 100 x N kB |
| 200 kB | 200 x N kB |
| 300 kB | 300 x N kB |
| 400 kB | 400 x N kB |
| 500 kB | 500 x N kB |
| 600 kB | 600 x N kB |
| 700 kB | 700 x N kB |
| 800 kB | 800 x N kB |
| 900 kB | 900 x N kB |

Note that, statistically, with a block size of B kB, a large number N of corrupted blocks, and an average
file size of M kB, the expected data loss is around (block size + half the average file size) x (number of corrupted blocks):

Estimated average data loss due to corruption: (B + (M+1)/2) x N kB

For a tar file whose average file size is 200 kB, compressed with bzip2 using 900 kB blocks, with 10 faulty blocks,
the data loss is around (900 + 101) x 10 = 10,010 kB, or about 10.01 MB.
With 100 kB blocks instead: (100 + 101) x 10 = 2,010 kB, or about 2.01 MB.
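
As a rough sketch of the estimate above, the following Perl evaluates (B + (M+1)/2) x N for each bzip2 block size from 100 kB to 900 kB (bzip2 -1 to -9). expected_loss_kb is a hypothetical helper name, and the 200 kB average file size and 10 corrupted blocks are simply the figures from the example. The results differ slightly from the hand calculation because (M+1)/2 is kept at 100.5 here instead of being rounded up to 101.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: estimated average loss in kB, (B + (M+1)/2) x N.
sub expected_loss_kb {
    my ($block_kb, $avg_file_kb, $corrupt_blocks) = @_;
    return ($block_kb + ($avg_file_kb + 1) / 2) * $corrupt_blocks;
}

my $avg_file_kb    = 200;   # M, taken from the example
my $corrupt_blocks = 10;    # N, taken from the example

for my $level (1 .. 9) {
    my $block_kb = 100 * $level;   # bzip2 -1 .. -9 use 100 kB .. 900 kB blocks
    printf("bzip2 -%d (%d kB blocks): about %.0f kB lost\n",
        $level, $block_kb,
        expected_loss_kb($block_kb, $avg_file_kb, $corrupt_blocks));
}
```

The estimate grows linearly with the block size, so the block-size term dominates once blocks are much larger than the average file.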

Keep this in mind when building a bzip2-compressed archive: the smaller the block size, the faster the compression, the worse the compression ratio, and the less data is lost in case of corruption.

