These are notes describing briefly how to deal with `btrfs`, a modern Linux file system that offers RAID support without additional third-party software tools or RAID cards. Whilst not as fast as hardware RAID, `btrfs` file systems are reliable and easy to manage. Remote management is also easy. StitchIt has been tested on platter-drive `btrfs` RAID 1+0. IO-heavy functions read and write in parallel to leverage the advantages of the file system.
Use `df -Th` to list disk usage and also the file system type.
Western Digital Gold or Black disks have a long warranty and seem to be reliable. 4 TB or 6 TB disks should be adequate. Use four drives at a minimum; six if you anticipate very heavy workloads. Unless you routinely generate samples in excess of 1 TB, about 12 TB of storage is likely adequate. Figure that each sample will transiently occupy about four times the final stitched data size: raw data, plus compressed raw data, plus original stitched images, plus cropped stitched images. Perhaps you keep 10 to 20 samples on there at any one time whilst they await processing and transfer to the server. On a multi-user system, disk usage will always expand to fill the available space. It can become a headache to manage a large server with dozens of samples (especially if there is a disk failure), so err on the side of less space.
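To make the arithmetic above concrete, here is a rough capacity estimate as a small script. All figures are hypothetical; plug in your own final stitched size and sample count.

```shell
#!/bin/sh
# Rough capacity estimate based on the rule of thumb above: each sample
# transiently occupies ~4x its final stitched size (raw + compressed raw
# + original stitched + cropped stitched). Figures are hypothetical.
FINAL_GB=250    # final stitched size per sample, in GB (assumed)
SAMPLES=10      # samples resident at any one time (assumed)

PER_SAMPLE=$((FINAL_GB * 4))
TOTAL_GB=$((PER_SAMPLE * SAMPLES))

echo "Transient footprint per sample: ${PER_SAMPLE} GB"
echo "Total working space needed:     ${TOTAL_GB} GB (~$((TOTAL_GB / 1000)) TB)"
```

With these (made-up) numbers the estimate lands near the ~12 TB figure quoted above, which is why that amount is usually adequate.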
Not much, as long as it's working well. However, heavily used platter drives start to fail after about 3 years and you'll need to keep an eye out for this. The larger your RAID array, the more disks you have and so the greater the likelihood of a disk failure. You will, at minimum, need to know how to identify whether a problem exists, which disk is at fault, and how to replace it.
The following are remarks about `btrfs` on Ubuntu 16.04:

- `mkfs.btrfs -L data /dev/<drive1> /dev/<drive2>...` (where `<drive1>`, `<drive2>`, etc. stand for the whole drives used in the RAID pool) makes a RAID0 volume, not a redundant RAID1 or RAID10. Use `mkfs.btrfs -L data -d raid1 /dev/<drive1> /dev/<drive2>...` to get RAID1 for data at creation time. Use `-m raid1` to get RAID1 for the metadata too (this seems to be the default, actually).
- To check the RAID type and usage, the command `btrfs filesystem usage /mnt/data` gives a nice summary.
- To change the RAID type after creation, use a balance operation: `btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data` (here converting data and metadata to RAID1). There is no need to unmount `/mnt/data`. It can take quite some time, so running the command in a `tmux` session or in the background is in general a good idea.
- btrfs RAID1 just ensures that data is duplicated (two copies), which is different from conventional RAID1 (which keeps N copies, N >= 2). Apart from that, it is quite extensible: more drives can be added to a mounted partition, and you can use the machine whilst the new drive is being integrated into the RAID array.
- RAID10 necessitates at least four drives; whether it is worthwhile over RAID1 will depend on the benchmarks.
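Since `mkfs.btrfs` destroys whatever is on the listed devices, it can be worth building the command in a small script and reviewing it before running anything. This is only a sketch: the device names are placeholders, and the script echoes the command rather than executing it.

```shell
#!/bin/sh
# Build the RAID10 creation command from a drive list (placeholder names).
# Nothing is executed: the command is echoed so the device list can be
# double-checked before it is run by hand with sudo.
DRIVES="/dev/sda /dev/sdb /dev/sdc /dev/sdd"   # hypothetical whole drives
LABEL=data

# -d raid10 / -m raid10: RAID10 for both data and metadata (needs >= 4 drives)
CMD="mkfs.btrfs -L $LABEL -d raid10 -m raid10 $DRIVES"
echo "$CMD"
# When satisfied:  sudo $CMD
```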
Often the first indication that something is wrong is that IO becomes very slow. It is probably a good idea to keep an eye out for trouble before this point, but if you do notice slow IO, check for a RAID problem. First of all look in the kernel log:

```
$ dmesg | grep -i 'btrfs'
```

That will bring up any errors and also indicate on which drives the errors are happening.
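To illustrate what that filter surfaces, the snippet below runs the same greps over fabricated kernel-log lines; the message text is made up but follows the usual shape of BTRFS errors.

```shell
#!/bin/sh
# Fabricated dmesg-style lines, to show how the grep narrows things down.
LOG="[1000.1] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 12, flush 0
[1000.2] usb 1-1: new high-speed USB device
[1000.3] BTRFS info (device sdb): read error corrected: ino 1 off 4096"

# Keep btrfs lines, then count those mentioning an error; the matching
# lines also name the offending device (here /dev/sdb).
ERRORS=$(printf '%s\n' "$LOG" | grep -i 'btrfs' | grep -ci 'error')
echo "btrfs error lines: $ERRORS"
```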
You can find the serial number of a drive (say, `/dev/sda`) as follows:

```
udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
```

With the serial number you can physically identify the drive in the machine.
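`udevadm` prints `KEY=VALUE` pairs, so the serial can be pulled out for scripting. The sample output below is fabricated; only the field names match what `udevadm` really prints.

```shell
#!/bin/sh
# Extract the short serial from captured udevadm output.
# The values here are fabricated samples for illustration.
UDEV_OUT="E: ID_SERIAL=WDC_WD4005FZBX-00K5WB0_VBH12345
E: ID_SERIAL_SHORT=VBH12345"

SERIAL=$(printf '%s\n' "$UDEV_OUT" | sed -n 's/^E: ID_SERIAL_SHORT=//p')
echo "Serial to look for on the drive label: $SERIAL"
```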
But what to do if there is a problem?
- First start a scrub with `sudo btrfs scrub start <mount point>` and wait a long time (e.g. about 7 hours for 4x4 TB drives).
- Check the scrub status with `sudo btrfs scrub status <mount point>` to see the number of unrecoverable errors.
- Find affected files in the dmesg messages: `dmesg | grep BTRFS | grep path`
- Check drive health with smartmontools (`sudo apt install smartmontools`): `sudo smartctl -t short <dev path>` or `sudo smartctl -t long <dev path>` starts a short or long self-test in the background; `sudo smartctl -a <dev path>` or `sudo smartctl -x <dev path>` then gives a brief or detailed report on the drive and the test outcomes.
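The replace-or-keep decision discussed below (unrecoverable errors, drive age) can be read straight out of the scrub and SMART reports. Both report texts in this sketch are fabricated samples; real field layouts vary a little between btrfs-progs and smartmontools versions.

```shell
#!/bin/sh
# Pull the two figures that matter out of captured report text.
# Both reports below are fabricated samples pasted in for illustration.
SCRUB="total bytes scrubbed: 7.21TiB with 3 errors
corrected errors: 3, uncorrectable errors: 0, unverified errors: 0"
SMART="  9 Power_On_Hours  0x0032   064   064   000  Old_age  Always  -  27000"

UNCORR=$(printf '%s\n' "$SCRUB" | sed -n 's/.*uncorrectable errors: \([0-9]*\).*/\1/p')
HOURS=$(printf '%s\n' "$SMART" | awk '/Power_On_Hours/ {print $NF}')
YEARS=$((HOURS / 8766))   # 8766 hours is roughly one year

echo "uncorrectable errors: $UNCORR; drive age: ~${YEARS} years"
```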
If the errors cannot be recovered, if the above tests indicate a sick disk, or if the drive is older than about three years (check with `smartctl --all /dev/sdc | grep Power_On_Hours`), then you should likely change the disk. Ideally you want your PC to have an empty hot-swap SATA bay, into which you can plug the new drive without powering down. Then:
- If needed, you can wipe file system information from the new drive using `sudo wipefs -a <dev path of new drive>` (CAREFUL!)
- Use the replace command, `sudo btrfs replace start <ID> <dev new> <mount point>`, where `<ID>` is the btrfs number of the device to replace (it can be obtained using `sudo btrfs device usage <mount point>`, for example).
- Do not use `btrfs device delete` to remove the problematic drive! `btrfs` will try to re-duplicate the data elsewhere; it will take ages, may not succeed depending on the actual remaining space, and is not interruptible.
- Re-balance data across the RAID volume using `sudo btrfs balance start <mount point>` (use the `-dusage` option to avoid a full balance, which can take a very long time) and use `sudo btrfs balance status <mount point>` to monitor it.
- More information in the link above.
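The replacement steps above can be strung together as a checklist script. Nothing here is executed: each command is echoed for review and should be run by hand with `sudo`. The mount point, device ID, and new device path are placeholders to be filled in from `btrfs device usage`.

```shell
#!/bin/sh
# Dry-run checklist for swapping out a failing drive. All values are
# placeholders; the script only echoes each step for manual review.
MNT=/mnt/data
BAD_ID=3            # btrfs ID of the sick drive, from: btrfs device usage
NEW_DEV=/dev/sde    # the freshly inserted replacement drive

echo "wipefs -a $NEW_DEV                    # wipe old fs signatures (CAREFUL)"
echo "btrfs replace start $BAD_ID $NEW_DEV $MNT"
echo "btrfs replace status $MNT             # watch progress"
echo "btrfs balance start -dusage=75 $MNT   # partial re-balance afterwards"
```

The `-dusage=75` filter in the last step is one illustrative choice: it balances only block groups under 75% full, avoiding the full balance warned about above.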