How to make a drive into a file in linux

A friend of mine asked me to help him make a copy of two drives. These drives work together in a RAID 0 array. With RAID 0, you get approximately twice the speed, but also roughly twice the chance of failure, since losing either drive loses the whole array.

So, understandably, they would like to have a backup of the system. But they don’t want something that has to be installed, and then configured, and all that stuff. They are not “linux people”. They want something where they can yank out the old drives and insert two new drives and have it “just work”.

Here’s the overall procedure:

  • Make a bootable Knoppix USB drive.
  • Connect a new drive to the system. This new drive will store the two source disk images.
  • Boot Knoppix and Ctrl-Alt-F1 into command-line mode.
  • Make a file from the entire 1st disk and put it on the new disk.
  • If the other drive is not connected, shut the system down and connect the other drive.
  • Make a file from the entire 2nd disk and put it on the new disk.
  • Disconnect the new drive & the USB drive, and reconnect the originals.
  • Boot it up again and watch it to make sure all is well.

For the linux guru, here are the commands I used:

fdisk -l

fdisk is usually the tool you use to set up partitions from the command-line. Using it like this with the -l switch gives you a list of all the partitions on all connected drives, whether they are mounted or not. Very useful.

dd if=/dev/sdc | gzip > drive1_250g_2011-08-19.img.gz

This is the magic linux command used to actually make the file from one whole disk. dd is a command which simply routes data. Every device in linux is a file, so it’s easy to read an entire disk. In the above case, dd is getting its data (if = input file) from the device which represents the entire disk: /dev/sdc. sdc is the third disk detected by the SCSI disk driver, which also handles SATA and USB drives. On your system, it may be sda or sdd.

By default, dd outputs data to STDOUT, so here I’m piping it into the gzip command, which compresses data. If you don’t compress the data, the file will be exactly as large as the whole disk, because dd copies every sector, including the unused space, and not just the files stored on it.

Finally, I’m redirecting the output from gzip into a file, and I’m making sure that it has an extension which describes what kind of file it is. But you can call it whatever you want.
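If you want to try the pipeline before pointing it at a real disk, here is a minimal sketch that uses a small scratch file as a stand-in for /dev/sdc. The scratch file, the temp directory and the variable names are all made up for illustration. It images the file, restores it to a second file, and compares the two byte for byte.

```shell
# Stand-in for a real disk: a 4 MiB scratch file, mostly zeros like a half-empty drive.
workdir=$(mktemp -d)
src="$workdir/fake_disk.img"
dd if=/dev/zero of="$src" bs=1024 count=4096 2>/dev/null
printf 'some data' | dd of="$src" bs=1 seek=1000 conv=notrunc 2>/dev/null

# Same pipeline as above, with the scratch file in place of /dev/sdc.
dd if="$src" 2>/dev/null | gzip > "$workdir/fake_disk.img.gz"

# Restore to a second scratch file (gzip -dc does the same job as zcat).
gzip -dc "$workdir/fake_disk.img.gz" | dd of="$workdir/restored.img" 2>/dev/null

# Compare the original and the restored copy byte for byte.
if cmp -s "$src" "$workdir/restored.img"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Nothing here touches a real device, so it is safe to run as a normal user.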

The first time I tried this command, I used bzip2 because it has better data compression routines and therefore makes smaller files. But better compression also means more CPU computations, and that alone slowed down performance to an unacceptable level. Why an unacceptable level? This server that I was working on had to be fully online by 7am. I took it offline and started work around midnight. This gave me about 6 hours to copy the drives, plus an extra hour to debug and get it put back together and installed where it was in the rack, which should have been plenty of time.

Using the above dd command with bzip2, I was able to copy about 9-11 MB / sec. This put the total time for one drive at about 6-8 hours. I didn’t have that much time.

Using the above dd command with gzip, I was able to copy about 40-45 MB / sec. This put the total time for one drive at about 90-105 minutes. Now I had enough time in one night to copy the drives. Additionally, the files were very small with gzip, so bzip2 would not have made a significant difference but would have taken much, much longer.
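The time estimates are just size divided by rate. Here is a rough sketch of that arithmetic in shell, assuming a 250 GB drive (going by the "250g" in the filename) and decimal units. The function name estimate_minutes is my own invention, and the integer math rounds down.

```shell
# Estimate copy time in whole minutes for a drive of SIZE_GB at RATE_MB_S.
# Assumes decimal units (1 GB = 1000 MB); integer math, so results round down.
estimate_minutes() {
    size_gb=$1
    rate_mb_s=$2
    echo $(( size_gb * 1000 / rate_mb_s / 60 ))
}

estimate_minutes 250 10   # bzip2 at about 10 MB/sec: prints 416 (just under 7 hours)
estimate_minutes 250 40   # gzip, slow end: prints 104
estimate_minutes 250 45   # gzip, fast end: prints 92
```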

How did I know the rate of copy using dd?

kill -USR1 (pid of dd)

First, find out what the Process ID is for the running dd command. To do this, Ctrl-Alt-F2 into the second virtual terminal and run:

ps aux | grep "dd if"

The PID is the first number on the dd line (the second column, right after the user name). Ignore the line for the grep command itself, which also matches. Once you’ve got the PID, use the above kill -USR1 command and dd will output several lines of useful information (in the virtual terminal where it’s running), including how much data has been transferred and the rate.

I set it up so that it would do this automatically every 5 minutes like this:

while kill -USR1 (pid of dd) ; do sleep 300 ; done

This while loop runs the stated command until that command fails, which happens once dd has exited and its PID no longer exists. The body of the while loop simply sleeps for 5 minutes. After starting this loop, Ctrl-Alt-F1 back into the first virtual terminal to watch dd’s progress.
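Here is the same loop wrapped up as a small function, as a sketch only. The name poke_dd is made up, and the usage line assumes that pgrep is installed and that exactly one dd is running. It also assumes GNU dd, which prints its statistics on SIGUSR1; other dd implementations may treat that signal differently.

```shell
# Send SIGUSR1 to the given PID every INTERVAL seconds until the process is gone,
# then report how many signals were delivered.
poke_dd() {
    pid=$1
    interval=$2
    sent=0
    # kill fails (and the loop ends) once no process with this PID exists.
    while kill -USR1 "$pid" 2>/dev/null; do
        sent=$((sent + 1))
        sleep "$interval"
    done
    echo "dd (pid $pid) has exited after $sent progress signals"
}

# Usage, assuming exactly one dd is running:
#   poke_dd "$(pgrep -x dd)" 300
```

Looking the PID up with pgrep -x saves the trip through ps and grep, but check that it returns a single number before trusting it.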

I was fully successful in copying the two drives to files and had the server back in operation before 5am.

Once I got home, I ordered up two new drives exactly like the old ones. Well, mostly exactly like the old ones. Once they arrived, I connected one to my SATA-to-USB drive adapter and ran this:

zcat (filename) | dd of=/dev/sdx

MAKE SURE YOU KNOW WHAT /dev/sdx is! Running this command on the wrong device will wipe out whatever is on that device, including your main linux filesystem!

How do you find out which device you just plugged in?

dmesg | tail

From the output of the above command, you should be able to figure it out.
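As an extra sanity check before writing, you could make sure the target is not mounted anywhere. This is a minimal sketch, not a guarantee: the is_mounted helper and its optional mounts_file argument are hypothetical names of mine, /dev/sdx is still a placeholder, and an unmounted disk can still be the wrong disk.

```shell
# Succeeds if DEVICE or one of its partitions (e.g. /dev/sdx1) is listed
# in the mounts file, which defaults to /proc/mounts.
is_mounted() {
    device=$1
    mounts_file=${2:-/proc/mounts}
    grep -q "^$device" "$mounts_file" 2>/dev/null
}

target=/dev/sdx   # placeholder; substitute the device that dmesg reported
if is_mounted "$target"; then
    echo "refusing to write: $target is mounted"
else
    echo "$target is not mounted; still double-check it is the right disk"
fi
```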

Decompressing on the fly and writing to a drive like this via a SATA-to-USB interface will be very slow. The max rate I achieved was 4 MB / sec. More commonly, I achieved 3.8 MB / sec, which is a little more than 1 GB every 5 minutes. That puts the total time at around 18 hours per drive. But hey, I was in no hurry. The first drive finished up overnight and then I did the other one. It was the same command but with a different filename.

Hope this helps.
