[PLUG] Disk IO in Linux?

Steve Bonds 1s7k8uhcd001 at sneakemail.com
Thu Oct 10 01:24:46 UTC 2002


Disclaimer:  Some of this is based on general UNIX architecture and not
specifically on Linux.  I look to the others on the list to provide
gentle correction where needed.  ;-)

On Wed, 9 Oct 2002, Alex Daniloff wrote:

> Linux kernel uses unbuffered IO file access 
> if a program (e.g. DB engine) reads and writes 
> to the raw partition/device.

> I assume it's because DB engine handles its
> data transaction on raw device/partition 
> without going through the kernel IO calls.

No, the DB will still need to use kernel I/O calls (i.e., the read() and
write() system calls).  Without those calls the DB would need to know how
to talk directly to the hardware, and that would defeat much of the
purpose of the OS.  ;-)  In other words, your database would need to know
your disk's SCSI ID, drive type, SCSI card, and so on.
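
As a rough illustration of that point (a sketch only, not what any real
database engine does verbatim): Python's os module exposes thin wrappers
around the same open(), write() and read() system calls.  The helper names
and the temporary file standing in for a real device are my own
assumptions, chosen so the example runs without root.

```python
import os
import tempfile


def write_then_read(path, payload):
    """Write payload via write(2), then read it back via read(2)."""
    # os.open / os.write / os.read map directly onto the kernel system calls;
    # the program never talks to the disk hardware itself.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = 0
        while written < len(payload):
            # write() may accept fewer bytes than requested, so loop.
            written += os.write(fd, payload[written:])
        os.fsync(fd)  # ask the kernel to flush its cached copy to the device
        os.lseek(fd, 0, os.SEEK_SET)

        chunks = []
        remaining = len(payload)
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def demo():
    """Round-trip a small record through a temporary file (hypothetical stand-in)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return write_then_read(path, b"row 1: hello\n")
    finally:
        os.unlink(path)
```

The same calls would be used whether the path names a regular file, a
block device or a raw device; only what the kernel does underneath changes.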

> Does all above apply to the case if drives support DMA?

This happens below the level at which a database interfaces with the
OS.  Databases have no knowledge of whether your drives use DMA or
not.  You can have both raw and block devices on any type of hard drive,
DMA or no DMA.

Here's a description of the data flow from your database to your disk, on
a sample disk write.  For you nitpickers, keep in mind this is abbreviated
for clarity and is not 100% technically perfect.  ;-)

1) database
2) system call (e.g. "write()")
3) OS kernel system call interface
4) OS filesystem driver (unless DB is on raw/block disk, then skip this)
5) OS buffer cache (unless DB is on raw disk, then skip this)
6) OS block device driver [not sure if this is before or after RAID/LVM]
7) LVM driver (if used)
8) RAID device driver (if used)
9) SCSI/IDE driver
10) SCSI/IDE hardware (this is where DMA comes in)
11) hard drive firmware
12) bits on a platter
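
The list above can be restated as a small sketch that shows which layers
drop out for each kind of storage.  This is only a model of the simplified
list, not of the real kernel: the function name and the "filesystem",
"block" and "raw" labels are made up for illustration, and the position of
the block device driver relative to LVM/RAID is as uncertain here as it is
in step 6.

```python
def write_path(storage, lvm=False, raid=False):
    """Return the layers a write crosses, following the simplified list.

    storage is one of "filesystem", "block" or "raw" (hypothetical labels).
    """
    if storage not in ("filesystem", "block", "raw"):
        raise ValueError(f"unknown storage type: {storage!r}")

    layers = ["database", "system call", "kernel system call interface"]
    if storage == "filesystem":
        # Step 4: skipped when the DB sits directly on a block or raw device.
        layers.append("filesystem driver")
    if storage != "raw":
        # Step 5: skipped only for raw devices.
        layers.append("buffer cache")
    layers.append("block device driver")
    if lvm:
        layers.append("LVM driver")
    if raid:
        layers.append("RAID device driver")
    layers += [
        "SCSI/IDE driver",
        "SCSI/IDE hardware",  # this is where DMA comes in
        "hard drive firmware",
        "bits on a platter",
    ]
    return layers
```

For example, write_path("raw") contains neither the filesystem driver nor
the buffer cache, while write_path("block") drops only the filesystem
driver.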

> Why it's nessesary to bind raw devices to block devices
> ( bind /dev/raw/raw1 to /dev/hdb1 )
> if a database engine reads and writes to the raw partition
> without such binding?

Linux doesn't bind a particular raw device to a particular block
device by default, so you have to set up the binding yourself.  Most
other unixes use something like /dev/dsk/<block> + /dev/rdsk/<char>,
where the <block> and <char> device names are the same and the pairing
is implicit.
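
If you want to check which kind of device node a given path is, something
like the following should work.  It is a minimal sketch; the function
names are my own, and it relies on raw devices showing up as character
device nodes.

```python
import os
import stat


def classify_mode(mode):
    """Classify a st_mode value as "block", "character" or "other"."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "other"


def classify_path(path):
    """Classify the device node (or ordinary file) at path."""
    return classify_mode(os.stat(path).st_mode)
```

Calling classify_path() on the two device names from a binding should
report "character" for the raw side and "block" for the disk partition.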

I think the actual command used is "raw /dev/raw/raw1 /dev/hdb1" based on
your example.

> Will this binding improve performance of the DB engine or
> this needs to be done only in order to read from Linux what
> has been written on a raw device?

This is needed to tell Linux unambiguously where to find the data for that
raw device.  ;-)  Look at "man raw" for more info.  The binding does not
appear to be optional, so it's not a performance question.

My take on the raw interface is that it's kind of klugy.  Linux was built
with block devices from the beginning, and working around them is likely
to result in finding more bugs than you might find otherwise.  ;-)

  -- Steve




