
Hard Disk Drives (HDDs)

Anatomy of a Disk

Disks are made up of circular surfaces called “platters.” The surface of each platter contains a large number of concentric tracks, each divided into sectors separated by gaps. A sector is the smallest unit of data that can be read from or written to a disk. A cylinder is a collection of tracks on multiple surfaces located at the same radius. There are two read/write heads per platter, one per surface.

Early HDDs needed the physical address of a sector to be specified in the cylinder-head-sector (CHS) scheme. To a modern operating system, the HDD is a collection of logical sectors.

  • E.g., if we have a 300 GB drive with sectors numbered 0 through 585,939,499 (585,939,500 sectors in total), each sector would be 512 B

These use the Logical Block Address (LBA) scheme:

$$\text{LBA} = (C \times H_{pc} + H) \times S_{pt} + (S - 1)$$

where $H_{pc}$ is the number of heads per cylinder, $S_{pt}$ is the number of sectors per track, and $S$ is 1-based in the CHS scheme. The disk controller calculates the corresponding $(C, H, S)$ tuple from the LBA.
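As a sanity check, here is a minimal Python sketch of the conversion in both directions (the 16-heads/63-sectors geometry is just an assumed example):

```python
def chs_to_lba(c: int, h: int, s: int, heads_per_cyl: int, sectors_per_track: int) -> int:
    """CHS -> LBA; sector numbers (s) are 1-based in the CHS scheme."""
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads_per_cyl: int, sectors_per_track: int) -> tuple[int, int, int]:
    """LBA -> (cylinder, head, sector), inverting the formula above."""
    c, rem = divmod(lba, heads_per_cyl * sectors_per_track)
    h, s = divmod(rem, sectors_per_track)
    return c, h, s + 1  # sectors are 1-based


# Round trip with an assumed geometry: 16 heads/cylinder, 63 sectors/track.
assert lba_to_chs(chs_to_lba(5, 3, 42, 16, 63), 16, 63) == (5, 3, 42)
```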

Zone Bit Recording

The length (circumference) of a track is proportional to its radius. The angular velocity of all the tracks is the same (think back to high school physics). Therefore, the linear velocity at an outer track is greater than at an inner track (again, back to high school physics). This means there can be more sectors on an outer track than on an inner track.

In reality, we don’t pick a different sector count for every track; instead, we fix the number of sectors per track for a group of adjacent tracks. This grouping is called a zone.

Disk throughput is constant within a zone, and decreases from outer to inner tracks.
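A rough Python illustration of why throughput falls from outer to inner zones (the zone sizes, RPM, and sector size below are made up):

```python
RPM = 7200          # assumed spindle speed
SECTOR_BYTES = 512  # assumed sector size

# Hypothetical zones: sectors per track; outer zones pack in more sectors.
zones = {"outer": 600, "middle": 450, "inner": 300}

for name, sectors_per_track in zones.items():
    # One revolution sweeps one full track past the head.
    bytes_per_second = sectors_per_track * SECTOR_BYTES * (RPM / 60)
    print(f"{name}: {bytes_per_second / 1e6:.1f} MB/s")
```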

Capacity

There are a few parameters to keep track of when determining the capacity of an HDD:

Recording Density: Bits per inch (BPI) - the number of bits on a 1” (inch) circumferential segment of a track.

Track Density: Tracks per inch (TPI) - the number of tracks on a 1” (inch) radial segment of the disk.

Areal Density: Bits per square inch - recording density × track density.

Capacity Formula

$$\text{Capacity} = \frac{\text{bytes}}{\text{sector}} \times \frac{\text{avg. sectors}}{\text{track}} \times \frac{\text{tracks}}{\text{surface}} \times \frac{\text{surfaces}}{\text{platter}} \times \frac{\text{platters}}{\text{disk}}$$

Example

Five double-sided platters; 512 B sectors; 20,000 tracks/surface; 300 sectors/track on average

What is the capacity?

Make sure you use gigabytes (GB, $10^9$ bytes), not gibibytes (GiB, $2^{30}$ bytes).
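Working the example through the formula:

$$512\,\tfrac{\text{B}}{\text{sector}} \times 300\,\tfrac{\text{sectors}}{\text{track}} \times 20{,}000\,\tfrac{\text{tracks}}{\text{surface}} \times 2\,\tfrac{\text{surfaces}}{\text{platter}} \times 5\,\text{platters} = 30{,}720{,}000{,}000\ \text{B} = 30.72\ \text{GB}$$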

Access Time

Components of Access Time

Access time = seek time + rotational latency + transfer time.

Seek time: the time taken to move the arm to the track containing the desired sector (overhead). This depends on the previous position of the head.

Rotational latency: the time taken for the target sector to rotate under the head (overhead).

Transfer time: the time taken to read/write the contents of the sector (productive).

Access Time Formulas

$$T_{\text{access}} = T_{\text{avg seek}} + T_{\text{avg rotation}} + T_{\text{avg transfer}}$$

$$T_{\text{avg rotation}} = \frac{1}{2} \times \frac{1}{\text{RPM}} \times \frac{60\,\text{s}}{1\,\text{min}}$$

$$T_{\text{avg transfer}} = \frac{1}{\text{RPM}} \times \frac{1}{\text{avg. sectors/track}} \times \frac{60\,\text{s}}{1\,\text{min}}$$

Example

Disk parameters: 512 B sectors, 400 sectors/track on average; 7,200 rpm; average seek time of 9 ms.

$$T_{\text{avg rotation}} = \frac{1}{2} \times \frac{60}{7{,}200}\,\text{s} \approx 4.17\,\text{ms}$$

$$T_{\text{avg transfer}} = \frac{60}{7{,}200} \times \frac{1}{400}\,\text{s} \approx 0.02\,\text{ms}$$

$$T_{\text{access}} \approx 9\,\text{ms} + 4.17\,\text{ms} + 0.02\,\text{ms} \approx 13.2\,\text{ms}$$

Average Head Seek Distance

For a disk of $N$ tracks, the average seek distance is approximately $N/3$ tracks.
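A quick sketch of where the $N/3$ comes from: treating the current track $x$ and the target track $y$ as independent and uniform over $[0, N]$,

$$E\big[\,|x - y|\,\big] = \frac{1}{N^2}\int_0^N\!\!\int_0^N |x - y|\,dx\,dy = \frac{1}{N^2}\cdot\frac{N^3}{3} = \frac{N}{3}$$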

Reliability

Mean Time To Failure (MTTF): usually in hundreds of thousands or millions of hours

Sectors contain Error Correction Codes (ECC) - think Hamming codes, for example.

Dual porting for host failure: two machines can be connected to the same disk, so one can take over if the other fails.

Bad Sectors: The disk often comes with bad sectors, and the system has to account for them when setting up the disk by remapping them to spare sectors. As a result, the interface masks them from the OS.

Contiguous Sectors

Physically contiguous sectors usually cannot be read back to back in a single rotation: after reading one sector, there is data transfer overhead before the head is ready to read the next.

For this reason, logically contiguous sectors are usually physically spaced apart (e.g., every other or every third physical sector), a technique called interleaving.

Organizing data such that it can be read sequentially is key to performance in disk systems.
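A minimal Python sketch of that kind of interleaved layout (the track size and interleave factor are assumed values):

```python
def interleave(sectors_per_track: int, factor: int) -> list[int]:
    """Place logical sectors into physical slots so that consecutive
    logical sectors sit `factor` physical slots apart."""
    layout = [-1] * sectors_per_track
    slot = 0
    for logical in range(sectors_per_track):
        while layout[slot] != -1:  # skip slots that are already taken
            slot = (slot + 1) % sectors_per_track
        layout[slot] = logical
        slot = (slot + factor) % sectors_per_track
    return layout


# 2:1 interleave on an 8-sector track: every other physical slot.
print(interleave(8, 2))  # [0, 4, 1, 5, 2, 6, 3, 7]
```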

Disk Head Scheduling

Assuming there are several requests pending, there are a few schemes for scheduling how the disk head moves (a sketch comparing two of them follows the list):

  • First Come First Serve (FCFS)
  • Shortest Seek First (SSF): go to the nearest track first
  • SCAN: Start from track 0, handle requests in order until the outermost track, then restart from 0 again (essentially a loop from inner to outer). Not amazing: it keeps going to the end of the disk even when there are no more requests in that direction.
  • LOOK: Same as SCAN, but stops once there are no more requests toward the outer edge.
  • Elevator: Same as SCAN, but reverses direction after reaching the last requested track, and starts at the first requested track.
  • Priority Based (very rare): Different priorities, used in things that need real time requirements
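A minimal sketch comparing FCFS and SSF on a made-up request queue (the track numbers and starting head position are arbitrary):

```python
def fcfs(requests: list[int], head: int) -> list[int]:
    """Serve requests strictly in arrival order."""
    return list(requests)


def ssf(requests: list[int], head: int) -> list[int]:
    """Shortest Seek First: repeatedly serve the nearest pending track."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def total_movement(order: list[int], head: int) -> int:
    """Total number of tracks the head travels to serve `order`."""
    return sum(abs(t - prev) for prev, t in zip([head] + order, order))


requests, head = [98, 183, 37, 122, 14, 124, 65, 67], 53
for scheduler in (fcfs, ssf):
    order = scheduler(requests, head)
    print(scheduler.__name__, order, total_movement(order, head))
# SSF cuts head movement dramatically here (236 tracks vs. 640 for FCFS),
# but it can starve requests far from the head.
```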

Redundant Array of Inexpensive Disks (RAID)

Single RAID controller: externally, it presents a single disk interface; internally, it performs parallel data transfers across multiple disks with redundancy built in.

Multiple paradigms and levels for RAID systems with a single RAID controller:

  • Level 0: Data striping across disks, parallel seeks and reads. Put data across disks.
    • Now there is a problem: if a single disk fails and our data is spread across disks, how can we prevent failure and data loss?
  • Level 1: Disk Mirroring (solves for the problem in level 0 with reliability). Completely mirror the data on the disks.
    • New problem here: the overhead is extremely high, and capacity is limited to that of a single disk.
  • Level 01 (0 + 1): Data replication within disk pairs, data striping across the pairs. This way, we can mirror some disks and stripe across others.
  • Level 2: Bit Striping. Don’t use. Never used. Makes no sense. A lot of XORs.
  • Level 3: Byte Striping. Still don’t use; slightly more useful than bit striping, but still nonsensical in most cases.
  • Level 4: Blocks are read and written independently, with one parity disk for reliability. Very nice: parallel reads, individual reads, AND a parity disk for error checking and correction (see the parity sketch after this list). Writes are not exactly parallel, though, since the parity must be updated too.
    • Problem: the parity disk is written to on every single write, so it becomes a bottleneck.
  • Level 5: Same as Level 4 but with the parity spread across disks (parity blocks rotated among the disks). This evens out the write load across all disks, and there is no parity-disk bottleneck, because the parity is spread out. Very popular configuration.
  • Level 6: Same as Level 5 but with two blocks of parity across disks. Uses Reed-Solomon encoding to recover from failure. This is the de-facto standard.
  • Level Z: Blocks are read and written independently. Used by the ZFS file system (used on Linux, macOS, Solaris, etc.). RAID-Z1, Z2, and Z3 provide one, two, or three parity blocks, respectively. Uses checksums to protect against data corruption, and supports dynamically adjustable block sizes.
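A minimal sketch of the XOR parity behind levels 4 and 5 (the block contents are made up):

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together; this yields the parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Three data blocks striped across disks, plus one parity block.
data = [b"disk0data", b"disk1data", b"disk2data"]
parity = xor_blocks(data)

# If disk 1 fails, XORing the surviving blocks with the parity
# reconstructs the lost block.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```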

However, file systems are designed with this in mind: they benefit from being able to write to logically contiguous sectors with minimal latency.

We want our logically contiguous sectors to be physically far enough apart that the controller has time to process one sector before the next arrives under the head, but close enough together that reading them does not incur additional rotational overhead.