How to recover data from a corrupted .tar.bz2 file?

This page shows how to recover data from a corrupted .tar.bz2 file, and NOT from a .tar.gz file: gzip can't do anything with corrupted archives, it will just leave you in despair, sad, and lonely.

We'll assume you have an archive called "archive.tar.bz2", and that you want:

  • to make sure it's not corrupted;
  • if it is corrupted, to get almost everything that is still valid out of it.

Is my archive corrupted?

That is definitely a good question. To answer it, bzip2 has a neat option: -t.

{code}bzip2 -t archive.tar.bz2{code}

This will tell you whether your bzipped archive is fine or not (no output and a zero exit status mean it is fine).
If it's fine, well, enjoy your day :D Otherwise, read on, we'll recover it.
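
If you'd rather have a script decide, here is a minimal sketch that relies only on the exit status of bzip2 -t. The function name archive_is_intact is made up for this example, it is not part of bzip2.

```shell
# Succeeds (exit status 0) only if bzip2 can read the whole archive.
# archive_is_intact is a hypothetical helper name, not a bzip2 command.
archive_is_intact() {
    bzip2 -t "$1" 2>/dev/null
}

# usage:
#   if archive_is_intact archive.tar.bz2; then
#       echo "fine, enjoy your day"
#   else
#       echo "corrupted, read on"
#   fi
```

The error message is thrown away here on purpose; drop the 2>/dev/null if you want to see what bzip2 complains about.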

Hey, my archive is corrupted. Should I go for the rope or the gun?

General Information.

None. You can't fix archives with ropes, nor with guns. Instead, we'll unleash the power of bzip2 and its built-in blockwise CRC checks.

Note: The next step will generate A LOT of files. You might want to create a folder dedicated to that purpose, and copy your archive.tar.bz2 into it. Let's assume you did that, and that you called that folder recovery/.
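
The preparation step can be sketched as follows. The function name prepare_recovery_dir is only an illustration; the folder name recovery/ is the one assumed on this page.

```shell
# Create recovery/ and put a COPY of the archive in it,
# so the original file is never touched by the steps that follow.
# prepare_recovery_dir is a hypothetical helper name.
prepare_recovery_dir() {
    mkdir -p recovery && cp -- "$1" recovery/
}

# usage:
#   prepare_recovery_dir archive.tar.bz2
#   cd recovery
```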

Getting the data blocks out of the bzip2 archive.

cd into recovery.
Here, we'll use the magic bzip2recover command.

Hey, but what's that bzip2recover command? Hmmm. bzip2-compressed files are divided into blocks (each block holding 100k, 200k, ..., 900k bytes of uncompressed data, depending on the compression option you used; the default is 900k). What bzip2recover does is split a bzip2 archive into many smaller bzip2 archives (one per block, actually). That's why it generates soooo many small files.

So, here we go:

{code}bzip2recover archive.tar.bz2{code}

You'll end up with all these rec00XXXarchive.tar.bz2 files. Lovely, huh?

Hey, now I have 9k smaller archives, but it didn't fix my problem, you guys are so useless!

No, I'm not. I'm not the one with a corrupted archive <evil laugh>.
Seriously, now that we have divided the archive into smaller parts, we'll be able to "isolate" the corrupted parts.
To do so, we'll use bzip2 -t, as we did before, but this time on every small archive file.
Here we go:

{code}bzip2 -tv rec*.bz2 > testoutput.log 2>&1{code}

The 2>&1 stuff redirects the stderr output to stdout (bzip2 writes its test results to stderr, so otherwise the log file would not show which file is corrupted. Mean, huh?)

OK, now we will search the log file for any corrupted small archive.

{code}grep '[^ok]$' testoutput.log{code}

(This actually parses the output of bzip2 -tv to extract only the lines which don't end with a candid "ok". Guess what, corrupted files don't generate this kind of candid output :D )
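
If you only want the names of the bad block files, a slightly stricter filter can be sketched like this. It assumes that bzip2 -tv writes one line per file of the form "  name: ok" or "  name: some error message"; that layout is an assumption about your bzip2 version, so check your own testoutput.log first. The function name corrupted_blocks is made up for the example.

```shell
# Print the names of the block files that were NOT reported as "ok".
# $1 = log file written by: bzip2 -tv rec*.bz2 > testoutput.log 2>&1
# Assumed line layout: "  rec00010archive.tar.bz2: <ok or an error message>"
corrupted_blocks() {
    grep '^ *rec.*\.bz2: ' "$1" |   # keep only the per-file result lines
        grep -v ': *ok$' |          # drop the candid ones
        sed 's/^ *//; s/: .*$//'    # keep just the file name
}

# usage:
#   corrupted_blocks testoutput.log
```

Unlike the plain grep, this ignores the hint about bzip2recover that bzip2 may print at the end of the log.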

Ouch, I've got corrupted blocks. What should I do with that?

Note: Here we are restoring from a file with only 1 corrupted block. The process would be the same if you had 15 corrupted blocks, but you would have to restore from the beginning of the file to the first corrupted block, then from the 1st corrupted block to the 2nd, then from the 2nd to the 3rd, ..., and finally from the 15th to the end of the file. Did you get it? I'm sure you did. Let's focus on the recovery process now.
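
For the many-corrupted-blocks case, the runs of clean blocks to restore one by one can be worked out with a small sketch like the one below. The function name clean_ranges and its argument order are invented for this example.

```shell
# Print each run of clean block numbers as "first-last", one per line.
# $1 = total number of blocks; the other arguments are the corrupted
# block numbers, in ascending order. clean_ranges is a hypothetical helper.
clean_ranges() {
    total=$1
    shift
    start=1
    for bad in "$@"; do
        if [ "$bad" -gt "$start" ]; then
            echo "$start-$((bad - 1))"
        fi
        start=$((bad + 1))
    done
    if [ "$start" -le "$total" ]; then
        echo "$start-$total"
    fi
}

# usage: 20 blocks, block 10 corrupted
#   clean_ranges 20 10
```

Each printed range is one part that gets its own recovery directory and its own pass through the steps described on this page.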

OK, let's say you have one corrupted block, which is block no. 10 ("rec00010archive.tar.bz2").
So, blocks 1-9 are fine, then comes a dumb corrupted block, then everything is fine again.
We'll have to work on these 2 parts separately.
Let's create 2 directories; we'll call them "recovery1" and "recovery2".
Copy all blocks from block 1 to block 9 (our "clean" blocks up to the corrupted block) into recovery1.
Copy all blocks from block 11 to the last block into recovery2.
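
Copying the blocks by hand gets boring fast, so here is a sketch of the same split as a loop. It assumes the block files are named rec<5 digits>archive.tar.bz2, as in the example above, and that you run it inside recovery/; the function name split_around_bad_block is made up.

```shell
# Copy the blocks before the corrupted one into recovery1/ and the blocks
# after it into recovery2/. $1 = number of the corrupted block (e.g. 10).
# Assumes file names like rec00010archive.tar.bz2 in the current directory.
split_around_bad_block() {
    bad=$1
    mkdir -p recovery1 recovery2
    for f in rec*archive.tar.bz2; do
        [ -e "$f" ] || continue
        n=${f#rec}                        # drop the "rec" prefix
        n=${n%archive.tar.bz2}            # drop the suffix, e.g. "00010"
        n=$(echo "$n" | sed 's/^0*//')    # drop leading zeros, e.g. "10"
        if [ "$n" -lt "$bad" ]; then
            cp -- "$f" recovery1/
        elif [ "$n" -gt "$bad" ]; then
            cp -- "$f" recovery2/
        fi
    done
}

# usage:
#   split_around_bad_block 10
```

The corrupted block itself is copied nowhere, which is exactly the point.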

Let's tackle everything up to the first corrupted block!

OK, cd into recovery1.
Here, we have the beginning of a tar file: nothing's corrupted, but the tar file is not complete. Right. That makes things easy.
We will just bunzip2 all the small archives into one recovery1.tar file:

{code}bzip2 -dc rec*.bz2 > recovery1.tar{code}

Let's have a look at the resulting .tar file:

{code}tar tf recovery1.tar{code}
Wow! We're getting a list of files, and an error. Not perfect, but better than nothing!
We have here all the files which were in the original archive.tar.bz2 up to the first corrupted block.
We're done with recovery1!
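
If the error at the end gets in the way of a script, the listing can be wrapped as in the sketch below. It simply hides tar's complaint about the truncated end and keeps whatever names tar managed to print; the function name list_readable_members is invented for the example.

```shell
# Print the member names tar can still read from a truncated tar file,
# ignoring the error tar raises when it hits the cut-off end.
# list_readable_members is a hypothetical helper name.
list_readable_members() {
    tar tf "$1" 2>/dev/null || true
}

# usage:
#   list_readable_members recovery1.tar
```

Keep in mind that the last name listed may belong to a file whose body is cut off.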

And now, last but not least... recovery2!

cd ../recovery2


Hmmmm, trying the same method as above fails. Why is that? Because tar sux. Yes, it does. It does not manage to find a correct header right at the start of the file, and so it fails. Creepy, huh? But we are smarter than tar, and there's not much that a little Perl magic can't solve.
First, let's bunzip2 our small bzip2 archives into a "failing" tar.

{code}bzip2 -dc rec*.bz2 > recovery2_failing.tar{code}

As I told you just before, a tar tf recovery2_failing.tar would.... fail :D
What we need to fix that is to have our recovery2_failing.tar start at the beginning of a clean header block.
A simple but efficient Perl script will help us find our way out: *findtarheader.pl*
{code}
#!/usr/bin/perl -w
use strict;

# 99.9% of all credits for this script go
# to Tore Skjellnes <torsk@elkraft.ntnu.no>
# who is the originator.

my $tarfile;
my $c;
my $hit;
my $header;

# Magic bytes that mark a tar header. If you don't get any results,
# comment out the first line below, uncomment the second one and retry.
my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),ord(" "), ord(" "),0);
#my @src = (ord('u'),ord('s'),ord('t'),ord('a'),ord('r'),0,ord('0'),ord('0'));

die "No tar file given on command line" if $#ARGV != 0;

$tarfile = $ARGV[0];

open(IN, '<', $tarfile) or die "Could not open `$tarfile': $!";
binmode(IN);

$hit = 0;
$| = 1;
# the magic bytes sit 257 bytes into a header, so nothing before that can match
seek(IN,257,0) or die "Could not seek forward 257 characters in `$tarfile': $!";
while (read(IN,$c,1) == 1)
{
    ($hit = 0, next) unless (ord($c) == $src[$hit]);
    $hit = $hit + 1;
    next unless $hit > $#src;

    # all magic bytes matched: we have a probable header at (pos - 265)!
    $hit = 0;
    my $pos = tell(IN) - 265;
    unless (seek(IN,$pos,0)) {
        warn "Could not seek to position $pos in `$tarfile': $!";
        next;
    }

    unless (read(IN,$header,512) == 512) {
        warn "Could not read 512 byte header at position $pos in `$tarfile': $!";
        seek(IN,$pos+265,0);
        next;
    }

    my ($name, $mode, $uid, $gid, $size, $mtime, $chksum, $typeflag,
        $linkname, $magic, $version, $uname, $gname,
        $devmajor, $devminor, $prefix)
        = unpack ("Z100a8a8a8Z12a12a8a1a100a6a2a32a32a8a8Z155", $header);
    $size = oct($size);   # tar stores the size as an octal string
    # print the offset counted from 1, ready for "tail -c +N"
    printf("%s:%s:%s:%s\n",$tarfile,($pos+1),$name,$size);
}

close(IN) or warn "Error closing `$tarfile': $!";
{code}

Yeah, copy/paste it and save it as findtarheader.pl, then chmod +x it.
Now, to find the first clean tar header in recovery2_failing.tar, do the following:

{code}./findtarheader.pl recovery2_failing.tar{code}

This will generate quite a bunch of output. The only interesting line here is the first result. You can then do:

{code}./findtarheader.pl recovery2_failing.tar | head -n 1{code}

to get only the first occurrence of a tar header.
You get something like:

recovery2_failing.tar:17185:naked_girls/pamela.jpg:157106

Besides the file name "naked_girls/pamela.jpg", which obviously shows that the tar file is the backup of Bharathy's home directory, have a look at the second field of the output:
17185: this is the offset of the clean tar header from the beginning of the file (counted from 1, which is what tail -c expects).
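
If Perl is not at hand, the same offset can be approximated with grep. This is a different technique from findtarheader.pl, not a port of it: it only looks for the string "ustar" (which sits 257 bytes into a tar header) and reads nothing else from the header, so treat the result as a guess to be checked with tar tf. It assumes GNU grep for the -a, -b and -o options; the function name first_tar_header_offset is made up.

```shell
# Print the offset (counted from 1) of the first probable tar header in $1.
# Assumes GNU grep: -a treats binary data as text, -b prints byte offsets,
# -o makes the offset refer to the match itself.
first_tar_header_offset() {
    LC_ALL=C grep -abo 'ustar' "$1" |
        awk -F: '$1 >= 257 { print $1 - 257 + 1; exit }'
}

# usage:
#   first_tar_header_offset recovery2_failing.tar
```

A file whose contents happen to include the word "ustar" will fool it, which is one more reason to verify the result.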


Good! Now we have the offset of a clean tar header!! We will be able to recover everything in the tar file starting from that file! Wooohhhoooooo.
To do so, do the following:

{code}tail -c +17185 recovery2_failing.tar > recovery2_working.tar{code}
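
In case the +N form of tail looks odd: it means "start output at byte number N, counting from 1". A tiny sketch to convince yourself, with a made-up function name strip_before_offset:

```shell
# Copy file $1 to $3, dropping everything before byte number $2
# (bytes are counted from 1, so +1 would copy the whole file).
# strip_before_offset is a hypothetical helper name.
strip_before_offset() {
    tail -c +"$2" "$1" > "$3"
}

# usage:
#   strip_before_offset recovery2_failing.tar 17185 recovery2_working.tar
```

With a file containing "abcdefgh" and an offset of 4, the output starts at the 4th byte, "d".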

Great, now we have a "recovery2_working.tar" tar file, which WORKS!

{code}tar tf recovery2_working.tar{code}

Yoodelihoo! You did it!!

Food for thought.

Well, right, you did it.
We can learn something from it.
For instance, thou shalt not use gzip-compressed archives for relatively critical stuff, because if one ever gets corrupted, well, it's just lost. Sad story, huh?
Second thing, tar archives cope quite well with data corruption; at least, they cope better than gzipped files.
Here, we could restore everything from a .tar.bz2 file, EXCEPT what was within the corrupted bzip2 block, and everything up to the first clean header after the corrupted block. To sum it up: we lost one block, and any file with either its header or a part of its body in that block.
If you are saving critical stuff, you could tell bzip2 to use a 100k block size. If your archive gets corrupted, you lose a multiple of 100k, against a multiple of 900k if you use a 900k block size, which could actually make a BIG difference!
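
The block size is chosen with the bzip2 options -1 (100k) to -9 (900k). A small sketch that maps a block size to its option; the function name bzip2_block_flag and the directory name in the usage line are invented for the example.

```shell
# Print the bzip2 option for a block size given in k (100, 200, ..., 900).
# bzip2_block_flag is a hypothetical helper name.
bzip2_block_flag() {
    case $1 in
        100|200|300|400|500|600|700|800|900)
            echo "-$(( $1 / 100 ))" ;;
        *)
            echo "unsupported block size: $1" >&2
            return 1 ;;
    esac
}

# usage (my_critical_stuff is a placeholder directory name):
#   tar cf - my_critical_stuff | bzip2 "$(bzip2_block_flag 100)" > archive.tar.bz2
```

Smaller blocks usually compress a little worse, so this trades some disk space for a smaller hole when something goes wrong.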