How to recover data from a corrupted tar.bz2 file?

This page will show you how to recover data from a corrupted .tar.bz2 file, and NOT from a .tar.gz file, since gzip can't do anything with corrupted archives: it will just leave you desperate, sad, and lonely.

We'll assume you have an archive called "archive.tar.bz2" for which you want:
* to make sure it's not corrupted;
* if it is corrupted, to get almost everything that is still valid out of it.

Is my archive corrupted?

That is definitely a good question. To get the answer, bzip2 has a neat option: -t (test).

{code}bzip2 -t archive.tar.bz2{code}

This will tell you whether your bzipped archive is fine or not. If it is, bzip2 prints nothing and exits with status 0. Otherwise, it prints an error message. If it's fine, well, enjoy your day! Otherwise, read on: we'll recover it.
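
bzip2 -t reports its verdict through its exit status, so the check is easy to script. Below is a minimal sketch for a POSIX shell. check_archive is a hypothetical helper name; it is not part of bzip2.

```shell
#!/bin/sh
# Hypothetical helper: report whether a .tar.bz2 archive passes bzip2's
# integrity test. bzip2 -t exits with status 0 only if the whole file
# decompresses cleanly, every block CRC included.
check_archive() {
  if [ ! -r "$1" ]; then
    echo "$1: cannot read"
    return 2
  fi
  if bzip2 -t "$1" 2>/dev/null; then
    echo "$1: ok"
  else
    echo "$1: corrupted"
    return 1
  fi
}

# Check every archive named on the command line.
for f in "$@"; do
  check_archive "$f"
done
```

Save it as, say, check.sh and run it as sh check.sh archive.tar.bz2. A missing or unreadable file is reported separately, so it isn't mistaken for a corrupted one.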

Hey, my archive is corrupted. Should I go for the rope or the gun?

None. You can't fix archives with ropes, nor with guns. Instead, we'll unleash the power of bzip2 and its built-in blockwise CRC checks.

General Information.

Note: the next step will generate A LOT of files. You might want to create a folder dedicated to that purpose and copy your archive.tar.bz2 into it. Let's assume you did that, and that you called that folder recovery/.

Getting the data blocks out of the bzip2 archive.

cd into recovery/. Here, we'll use the magic bzip2recover command.

Hey, but what's that bzip2recover command?

Hmmm. Bzip2-compressed files are divided into blocks. Each block holds up to 100k, 200k, ..., 900k bytes of uncompressed data, depending on the compression option you used (-1 to -9; the default, -9, means 900k). bzip2recover splits a bzip2 archive into many smaller bzip2 archives, one per block. That's why it generates soooo many small files.

So, here we go:

{code}bzip2recover archive.tar.bz2{code}

You'll end up with all these rec00XXXarchive.tar.bz2 files. Lovely, huh?
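
The first four bytes of the archive record which block size it was made with. They are the magic "BZh" followed by a digit from 1 to 9, which is the block size in units of 100k. Here is a small sketch to read it. It assumes a head that supports -c (GNU and BSD do), and bz2_block_size is a hypothetical name.

```shell
# Hypothetical helper: print the block size recorded in a bzip2 header.
# Bytes 1-3 are the magic "BZh"; byte 4 is '1'..'9' (the -1..-9 level),
# i.e. the block size in units of 100k.
bz2_block_size() {
  hdr=$(head -c 4 "$1")
  case $hdr in
    BZh[1-9]) echo "${hdr#BZh}00k" ;;
    *) echo "$1: not a bzip2 file" >&2; return 1 ;;
  esac
}
```

For example, bz2_block_size archive.tar.bz2 prints 900k for an archive created with the default settings.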

Hey, now I have 9k smaller archives, but it didn't fix my problem, you guys are so useless!

No, I'm not. I'm not the one with a corrupted archive <evil laugh>.

Seriously, now that we have divided the archive into smaller parts, we'll be able to "isolate" the corrupted ones. To do so, we'll use bzip2 -t as we did before, but this time on every small archive file (-v makes it print one status line per file). Here we go:

{code}bzip2 -tv rec*.bz2 > testoutput.log 2>&1{code}

The 2>&1 part redirects stderr to stdout. bzip2 writes its test report to stderr, so without it the log would stay empty and we wouldn't see which file is corrupted. Mean, huh?

OK, now we will search the log file for any corrupted small archive:

{code}grep -v ': ok$' testoutput.log{code}

(This parses the output of bzip2 -tv to keep only the lines that don't end with a candid "ok". Guess what: corrupted files don't generate this kind of candid output.)
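
If you'd rather not rely on the wording of the log, the exit status of bzip2 -t gives the same information file by file. Here is a minimal sketch; list_bad_blocks is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every block file that fails
# bzip2's integrity test, relying on the exit status of bzip2 -t rather
# than on the wording of its verbose report.
list_bad_blocks() {
  for f in "$@"; do
    bzip2 -t "$f" 2>/dev/null || echo "$f"
  done
}
```

Call it as list_bad_blocks rec*.bz2. It prints the corrupted block files in block order, because the zero-padded names make the glob expand in that order.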

Ouch, I've got corrupted blocks. What should I do with that?

Note: here we are restoring from a file with only 1 corrupted block. The process would be the same with 15 corrupted blocks, but you would have to restore several ranges. First from the beginning of the file to the first corrupted block, then from the 1st corrupted block to the 2nd, then from the 2nd to the 3rd, ..., and finally from the 15th to the end of the file. Did you get it? I'm sure you did. Let's focus on the recovery process now.

OK, let's say you have one corrupted block, which is block no. 10 ("rec00010archive.tar.bz2").

So, you have blocks 1-9 fine, then a dumb corrupted block, then everything is fine again.

We'll have to work on these 2 parts separately.

Let's create 2 directories; we'll call them "recovery1" and "recovery2".

Copy all blocks from block 1 to block 9 (our "clean" blocks up to the corrupted block) into recovery1.

Copy all blocks from block 11 to the last block into recovery2.
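
Copying the blocks by hand gets tedious with thousands of files. The sketch below automates it. It assumes the rec<number>archive.tar.bz2 names from our example, and split_around_bad_blocks is a hypothetical helper. It takes the bad block numbers as arguments. Following the note above about several corrupted blocks, it puts each run of clean blocks into its own directory: recovery1/, recovery2/, and so on.

```shell
# Hypothetical helper: sort bzip2recover's block files into recovery1/,
# recovery2/, ... around the corrupted blocks whose numbers are given as
# arguments (e.g. 10 for rec00010archive.tar.bz2). Assumes the
# rec<number>archive.tar.bz2 names of our example; run it inside recovery/.
split_around_bad_blocks() {
  for f in rec*archive.tar.bz2; do
    [ -e "$f" ] || continue            # glob matched nothing
    n=${f#rec}
    n=${n%archive.tar.bz2}             # e.g. 00010
    n=${n#"${n%%[!0]*}"}               # strip leading zeros: 00010 -> 10
    seg=1
    skip=0
    for bad in "$@"; do
      if [ "$n" -eq "$bad" ]; then
        skip=1                         # corrupted block: leave it out
      elif [ "$n" -gt "$bad" ]; then
        seg=$((seg + 1))               # we are past one more bad block
      fi
    done
    if [ "$skip" -eq 0 ]; then
      mkdir -p "recovery$seg"
      cp "$f" "recovery$seg/"
    fi
  done
}
```

Running split_around_bad_blocks 10 reproduces the example. Blocks 1-9 land in recovery1/, blocks 11 and up land in recovery2/, and block 10 is left out.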

Let's tackle everything up to the first corrupted block!

OK, cd into recovery1. Here, we have the beginning of a tar file: nothing's corrupted, but the tar file is not complete. Right. That makes things easy. We will just bunzip all the small archives into one recovery1.tar file. The zero-padded numbers in the file names make rec*.bz2 expand in block order, so the blocks are decompressed in the right sequence:

{code}bzip2 -dc rec*.bz2 > recovery1.tar{code}

Let's have a look at the resulting .tar file:

{code}tar tf recovery1.tar{code}

Wow! We're getting a list of files, and an error, because tar hits the end of the data in the middle of the archive. Not perfect, but better than nothing! Here we have all the files that were in the original archive.tar.bz2 up to the first corrupted block. The last one listed may be truncated, since its data ran into the corrupted block. We're done with recovery1!
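
Listing is nice, but you'll want the files back. Below is a sketch for extracting what survived. It assumes a tar that restores the entries it managed to read before giving up; GNU tar does this and reports "Unexpected EOF in archive". extract_partial is a hypothetical name.

```shell
# Hypothetical helper: extract whatever survived from a truncated tar
# file ($1) into its own directory ($2). tar gives up where the data ends;
# the entries read before that point are restored, the last one possibly
# cut short.
extract_partial() {
  mkdir -p "$2"
  if ! tar -xvf "$1" -C "$2"; then
    echo "tar stopped early: the last file listed may be incomplete" >&2
  fi
}
```

For example: extract_partial recovery1.tar restored1.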

And now, last but not least... recovery2!! cd into ../recovery2.

Hmmmm... trying the same method as above fails. Why is that? Because tar sucks. Yes, it does. It doesn't find a correct header right at the start of the file, and so it fails. The data recovered from block 11 almost certainly starts somewhere in the middle of a file's contents, not on a 512-byte tar header. Creepy, huh? But we are smarter than tar: there's not much that a little Perl magic can't solve.

First, let's have our small bzip2 archives bunzipped into a "failing" tar:

{code}bzip2 -dc rec*.bz2 > recovery2_failing.tar{code}

As I told you right before, a tar tf recovery2_failing.tar would... fail.

To fix it, we need recovery2_failing.tar to start at the beginning of a clean tar header block. A simple but efficient Perl script will help us make our way out:

*findtarheader.pl*

{code}#!/usr/bin/perl -w
use strict;

# 99.9% of all credits for this script go
# to Tore Skjellnes <torsk@elkraft.ntnu.no>
# who is the originator.
#
# Scans a (possibly damaged) tar file for the magic string that every tar
# header carries at offset 257, and prints one line per candidate header:
#   tarfile:offset:name:size
# where offset is 1-based, ready to be used with tail -c +offset.

my $tarfile;
my $c;
my $hit;
my $header;

# GNU tar magic: "ustar", two spaces, NUL. If you don't get any results,
# comment out the line below, uncomment the one after it
# (POSIX magic: "ustar", NUL, "00") and retry.
my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),ord(" "),ord(" "),0);
#my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),0,ord('0'),ord('0'));

die "No tar file given on command line" if $#ARGV != 0;

$tarfile = $ARGV[0];

open(IN, '<', $tarfile) or die "Could not open `$tarfile': $!";
binmode(IN);

$hit = 0;
$| = 1;
seek(IN,257,0) or die "Could not seek forward 257 characters in `$tarfile': $!";
while (read(IN,$c,1) == 1)
{
    if (ord($c) != $src[$hit]) {
        # Mismatch: start over, but this byte may itself begin a new match.
        $hit = (ord($c) == $src[0]) ? 1 : 0;
        next;
    }
    $hit = $hit + 1;
    next unless $hit > $#src;   # magic not complete yet

    # The 8 magic+version bytes end at offset 265 of a header,
    # so we have a probable header at (pos - 265)!
    my $pos = tell(IN) - 265;
    $hit = 0;
    seek(IN,$pos,0)
        or (warn "Could not seek to position $pos in `$tarfile': $!", next);

    (read(IN,$header,512) == 512)
        or (warn "Could not read 512 byte header at position $pos in `$tarfile': $!", seek(IN,$pos+265,0), next);

    my ($name, $mode, $uid, $gid, $size, $mtime, $chksum, $typeflag,
        $linkname, $magic, $version, $uname, $gname,
        $devmajor, $devminor, $prefix)
        = unpack("Z100a8a8a8Z12a12a8a1a100a6a2a32a32a8a8Z155", $header);
    $size = oct $size;          # tar stores the size as an octal string
    printf("%s:%s:%s:%s\n", $tarfile, ($pos+1), $name, $size);
}

close(IN) or warn "Error closing `$tarfile': $!";
{code}

Yeah, copy/paste it, save it as findtarheader.pl, and chmod +x it.

Now, to find the first clean tar header in recovery2_failing.tar, do the following:

{code}./findtarheader.pl recovery2_failing.tar{code}

This will generate quite a bunch of output. The only interesting line here is the first result. You can then do:

{code}./findtarheader.pl recovery2_failing.tar | head -n 1{code}

to get only the first occurrence of a tar header. You get something like:

recovery2_failing.tar:17185:naked_girls/pamela.jpg:157106

Besides the file name "naked_girls/pamela.jpg", which obviously shows that the tar file is the backup of Bharathy's home directory, have a look at the second field of the output: 17185. This is the offset of the clean tar header from the beginning of the file. It is counted from 1 (the script prints the 0-based position plus one), which is exactly what tail -c +N expects.
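
The off-by-one is easy to get wrong, so here is a toy illustration of the two conventions. The string abcdef stands in for the tar file.

```shell
# Toy illustration of the offset conventions: findtarheader.pl prints pos+1,
# where pos is the 0-based offset of the header, and tail -c +N starts its
# output at byte N counting from 1, so both designate the same byte.
pos=2                                          # 'c' sits at 0-based offset 2
printf 'abcdef' | tail -c +"$((pos + 1))"      # prints: cdef
```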

Good! Now we have the offset of a clean tar header!! We will be able to recover everything in the tar file starting from that file! Wooohhhoooooo. To do so, do the following:

{code}tail -c +17185 recovery2_failing.tar > recovery2_working.tar{code}
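
If Perl isn't available, roughly the same cut can be done in plain shell. This is a swapped-in technique, not the script above. It assumes GNU grep, whose -a -b -o options print the 0-based byte offset of every "ustar" match; subtracting 257 gives the start of the header. cut_at_first_header is a hypothetical name. Like the Perl script, it can be fooled by file contents that happen to contain "ustar", so check the result with tar tf.

```shell
# Hypothetical helper (alternative to findtarheader.pl, not the same
# script): cut $1 so that it starts at the first tar header and write the
# result to $2. Assumes GNU grep: -a reads binary data as text, -b -o print
# the 0-based byte offset of each "ustar" match. A header starts 257 bytes
# before its magic, so matches below offset 257 are ignored.
cut_at_first_header() {
  in=$1
  out=$2
  magic=$(grep -abo 'ustar' "$in" | awk -F: '$1 >= 257 { print $1; exit }')
  if [ -z "$magic" ]; then
    echo "no tar header found in $in" >&2
    return 1
  fi
  start=$((magic - 257 + 1))   # 1-based byte position for tail -c +N
  echo "first header at byte $start" >&2
  tail -c +"$start" "$in" > "$out"
}
```

For example: cut_at_first_header recovery2_failing.tar recovery2_working.tar.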

Great, now we have a "recovery2_working.tar" tar file, which WORKS!

{code}tar tf recovery2_working.tar{code}

Yoodelihoo! You did it!!
Food for thought.

Well, right, you did it.
But there are lessons to draw from it.
For instance, thou shalt not use gzip-compressed archives for relatively critical stuff, because if one ever gets corrupted, well, it's just lost. Sad story, huh?
Second, tar archives cope quite well with data corruption; at least, they fare better than gzipped files.
Here, we could restore everything from a .tar.bz2 file EXCEPT what was within the corrupted bzip2 block and everything up to the first clean header after it. To sum up: we lost one block, plus any file with either its header or part of its body in that block.
If you are saving critical stuff, you could tell bzip2 to use a 100k block size (option -1). If your archive gets corrupted, you then lose a multiple of 100k instead of a multiple of 900k with the default block size, which can make a BIG difference! The price is slightly worse compression.
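
To create such an archive, pipe tar into bzip2 -1. Here is a minimal sketch. make_small_block_archive is a hypothetical name, and mydir and backup.tar.bz2 in the usage line are placeholders.

```shell
# Hypothetical helper: archive directory $1 into $2 using bzip2's smallest
# block size (-1, i.e. 100k) instead of the default (-9, i.e. 900k).
make_small_block_archive() {
  tar -cf - "$1" | bzip2 -1 > "$2"
}
```

Run make_small_block_archive mydir backup.tar.bz2. Afterwards, head -c 4 backup.tar.bz2 should show BZh1.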
