Unix for Busy People - File systems, files, directories and devices
Unix sees devices as either stream-oriented or block-oriented. A stream device (Unix calls these character devices) carries a sequential stream of data between the CPU and the device, in one direction or in both. A block device, on the other hand, can be accessed randomly in any order and transfers data in chunks referred to as blocks. Blocks are the smallest addressable unit of the device. Typical examples of a block device include the hard disk, a DVD or any kind of disk storage device, including solid state. A stream device would be the network, a serial or printer port, or a USB port when not used for data storage.
Regardless of orientation, all devices are identified in the kernel by a major and a minor number, although in newer versions of both Linux and Solaris this identification scheme is now largely symbolic.
Devices are bound to the kernel via software called a device driver, which is loaded into the kernel. They are accessed logically through the device driver. The device driver then communicates directly (through the CPU, the bus and sometimes DMA, or Direct Memory Access) with the device.
All devices appear as files in the file system (we will explain file systems in a moment) and are usually rooted in the /dev directory. An example of a SCSI disk in Linux might be /dev/sdb1 and in Solaris /dev/dsk/c0t0d0s1. Normally, as a user, you needn't worry much about devices, only be aware that they exist. As a non-root user it would normally be impossible to write to or otherwise do any damage to a device (if the system is configured properly). Access to devices is always through several layers of software (utility, API, file system, kernel, device driver, etc.) and then through several layers of hardware (buffer, cache, CPU, bus and finally the device itself and any remotely attached devices, as in SCSI LUNs).
A fairly decent (albeit technical) explanation of devices and device drivers (if you want to know the gory details) is located at [].
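If you are curious about the major and minor numbers mentioned above, they can be read from any device file. The following is only a minimal sketch in Python; the helper name describe_device is made up for this illustration, and the numbers you see depend on your system.

```python
import os
import stat

def describe_device(path):
    """Return (kind, major, minor) for a device file, or None if path is not a device.

    describe_device is a hypothetical helper written for this example.
    """
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "character (stream)"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        return None
    # st_rdev packs the major and minor numbers of the device together.
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)

if __name__ == "__main__":
    # /dev/null exists on practically every Unix system and is a stream device.
    print(describe_device("/dev/null"))
    # An ordinary directory is not a device, so this prints None.
    print(describe_device("/"))
```

Reading this information needs no special privileges, which is why it is safe to try as a normal user.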
/dev/null -- Also known as the "bit bucket" or "black hole", this virtual file discards all contents written to it. This is typically used to throw away unwanted data streams, such as log files.
/dev/random -- This is a virtual file which supplies random numbers (subject to the limitations of random number generators in computing). It uses system noise to generate random numbers and traditionally blocks if not enough entropy is available in the noise. It is commonly used by programs that absolutely need high-quality random data (such as SSH when generating an encryption key).
/dev/urandom -- Same as /dev/random, except it always returns random numbers, even if there is not enough entropy in the system noise available. In the latter case, pseudorandom numbers are generated, which are based on an algorithm, depending on the type of Unix system.
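The special files above can be opened like any other file. Here is a small sketch in Python of throwing data away into /dev/null and reading bytes from /dev/urandom; the function names are invented for this example.

```python
import os

def discard(data: bytes) -> int:
    """Write data to the null device and return how many bytes were accepted."""
    # os.devnull is "/dev/null" on Unix systems.
    with open(os.devnull, "wb") as sink:
        return sink.write(data)

def random_bytes(count: int) -> bytes:
    """Read count bytes from /dev/urandom, which does not wait for entropy."""
    with open("/dev/urandom", "rb") as source:
        return source.read(count)

if __name__ == "__main__":
    print(discard(b"unwanted output\n"), "bytes thrown away")
    print(random_bytes(8).hex())
```

In real programs you would more likely call os.urandom(count), which asks the operating system for random bytes without opening the device file yourself.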
The Unix terminal is a simple device that acts much like a file. Since hardwired terminals are rarely used today, terminal emulation is used instead by telnet, ssh and xterm. Use tty to print the name of the current terminal device. Try cat /etc/motd > terminal-device-name, or any other Unix command, to read from or write to the device. Unlike other devices (particularly stream devices), terminal drivers perform a lot of additional processing to be more adaptable to humans, such as buffering, terminal addressing using escape codes, line disciplines (including XON/XOFF flow control), modem control, and terminal characteristics such as local echo and synchronous and asynchronous operation.
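What the tty command reports can also be found from inside a program. This is a rough sketch, assuming standard input is file descriptor 0 as is conventional on Unix; current_terminal is a name made up for the example.

```python
import os

def current_terminal(fd=0):
    """Return the terminal device name behind a file descriptor.

    Returns None when the descriptor is not attached to a terminal,
    for example when input is redirected from a file or a pipe.
    """
    if not os.isatty(fd):
        return None
    return os.ttyname(fd)

if __name__ == "__main__":
    name = current_terminal()
    if name is None:
        print("standard input is not a terminal")
    else:
        # Writing to the device file shows up on your own screen.
        with open(name, "w") as terminal:
            terminal.write("hello from " + name + "\n")
```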
Simply put, storage is anything the computer can use to maintain state. This state can be permanent or semi-permanent. Semi-permanent storage would include random-access memory. Permanent storage is what we will be talking about here, and it is typically used for storing logical chunks of data known as files. A file is a collection of data that is the smallest unit usable by humans. Think of a file as the papers in a file folder. They are all referenced, bound with and accessed from that file folder. So it is with computer files. There are a number of technologies in Unix for storing files. These can include tape, disk, CD-ROM, DVD and solid state devices such as USB drives or SD and Compact Flash cards. Some drives are logical and are accessed across the network, such as NAS (Network Attached Storage), or are virtual, such as a virtual disk file in some virtual host environments. There are also larger-scale attached drives like SCSI and SAN (Storage Area Networks). HMH deploys almost all of these technologies.
Disks, also known as HDDs or hard disk drives (as opposed to floppy drives in ye olden days), are non-volatile (essentially permanent) storage devices that store digitally encoded data on rapidly rotating rigid (i.e. hard) platters with magnetic surfaces. Strictly speaking, "drive" refers to the motorized mechanical aspect that is distinct from its medium, such as a tape drive and its tape, or a floppy disk drive and its floppy disk. Early HDDs had removable media; however, an HDD today is typically a sealed unit (except for a filtered vent hole to equalize air pressure) with fixed media.
Disk geometry and characteristics
HDDs record data by magnetizing ferromagnetic material directionally, to represent either a 0 or a 1 binary digit (state). They read the data back by detecting the magnetization of the material. A typical HDD design consists of a spindle that holds one or more flat circular disks called platters, onto which the data is recorded. The platters are made from a non-magnetic material, usually aluminum alloy or glass, and are coated with a thin layer of magnetic material.
The platters are spun at very high speeds. Information is written to a platter as it rotates past devices called read-and-write heads that operate very close over the magnetic surface. The read-and-write head is used to detect and modify the magnetization of the material immediately under it. There is one head for each magnetic platter surface on the spindle, mounted on a common arm. An actuator arm (or access arm) moves the heads on an arc (roughly radially) across the platters as they spin, allowing each head to access almost the entire surface of the platter as it spins. The arm is moved using a voice coil actuator or in some older designs a stepper motor.
Cylinder-head-sector, also known as CHS, was an early method of addressing data on a disk by its geometric coordinates (cylinder, head and sector) on the disk's surface. It has since been replaced by logical block addressing (LBA). Though CHS values no longer have a direct physical relationship to the data stored on disks, pseudo CHS values (which can be translated by disk electronics or software) are still used by many utility programs.
Logical block addressing
Data on single disks is now addressed using LBA, or logical block addressing, in which each block is identified by a single number counted from the start of the disk.
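The translation between CHS coordinates and a logical block address is simple arithmetic. The sketch below uses the conventional formula, where sectors are numbered from 1 while cylinders and heads are numbered from 0. The geometry of 16 heads and 63 sectors per track is only an illustrative pseudo geometry, not that of any particular drive.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a logical block address."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector numbers run from 1 to sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head numbers run from 0 to heads_per_cylinder - 1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1

if __name__ == "__main__":
    # Illustrative geometry: 16 heads, 63 sectors per track.
    print(chs_to_lba(1, 0, 1, 16, 63))   # first sector of cylinder 1 -> 1008
    print(lba_to_chs(1008, 16, 63))      # -> (1, 0, 1)
```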
Zone Bit Recording
I am not a disk "geek" and I don't want to get too deep into this. However, note that current disk drives use Zone Bit Recording, where the number of sectors per track depends on the track number. The disk drive will still report an SPT (number of sectors per track) value so that these calculations can be made, but it has little to do with the disk drive's true geometry.
The spindle of a hard disk is the spinning axle on which the platters are mounted.
In modern computing it is advantageous to group storage disks into collections known as volumes. These are known as "logical volumes" and usually appear to the operating system as one single disk, but deep down at the hardware level there are multiple spindles, read-write heads, cylinders and sectors all appearing as one logical drive. This ability exists largely because it helps boost performance. In ye olden days it was common for database administrators (particularly Oracle) to try to separate database storage onto separate spindles for performance gains. This is no longer required thanks to volumes and volume management.
Volumes also paved the way for RAID, or redundant array of inexpensive disks, a technology that allows computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components by arranging the devices into arrays for redundancy. "RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple hard disk drives. The different schemes/architectures are named by the word RAID followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs involve two key design goals: increased data reliability and/or increased input/output performance. When multiple physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across multiple disks, but like a disk volume the array is seen by the computer user and operating system as one single disk. RAID can be set up to serve several different purposes.
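To make the idea of dividing and replicating data concrete, here is a toy sketch of the two simplest schemes: RAID 0 (striping, for performance) and RAID 1 (mirroring, for reliability). It models each disk as a plain byte string and the chunk size of 4 bytes is chosen only to keep the example readable; real RAID implementations work on blocks at the driver or controller level.

```python
def stripe(data: bytes, disk_count: int, chunk_size: int = 4):
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    disks = [bytearray() for _ in range(disk_count)]
    for offset in range(0, len(data), chunk_size):
        chunk_number = offset // chunk_size
        disks[chunk_number % disk_count] += data[offset:offset + chunk_size]
    return [bytes(disk) for disk in disks]

def mirror(data: bytes, disk_count: int):
    """RAID 1 style: every disk holds a complete copy of the data."""
    return [bytes(data) for _ in range(disk_count)]

if __name__ == "__main__":
    payload = b"ABCDEFGHIJKL"
    # Striping spreads the chunks out; losing either disk loses data.
    print(stripe(payload, 2))   # [b'ABCDIJKL', b'EFGH']
    # Mirroring duplicates everything; either disk alone is enough.
    print(mirror(payload, 2))   # [b'ABCDEFGHIJKL', b'ABCDEFGHIJKL']
```

In both cases the operating system would still present the array as one single disk, which is the point made above.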
Like disks on MS-DOS based systems (Windows), Unix disks are divided into logical groups called partitions. Partitions may use a portion of the disk or the entire disk. In Unix-based and Unix-like operating systems such as Linux and Mac OS X, it is possible to create multiple partitions (also known in the Solaris operating system and the BSD based operating systems as "slices") on a disk device. Each partition can be used for a file system or as a swap partition.
Multiple partitions allow directories such as /tmp, /usr, /var, or home directory space to be allocated their own file system. Such a scheme has a number of potential advantages: if one file system gets corrupted, the rest of the data (the other file systems) stays intact, minimizing data loss; specific file systems can be mounted read-only, or with the execution of setuid files disabled (thus enhancing security); performance may be enhanced due to less disk head travel. However, the disadvantage of subdividing the drive into fixed-size partitions is that a file system in one partition may become full, even though other file systems still have plenty of usable space.
A good partitioning scheme requires the user to predict how much space each partition will need, which may be a difficult task, especially for new users. Logical Volume Management, often used in servers, increases flexibility by allowing data in volumes to expand onto separate physical disks (which can be added when needed); another option is to resize existing partitions when necessary.
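Because each file system fills up independently, it helps to look at usage per mount point. The sketch below is one way to do that with Python's standard library; the list of mount points is just an assumption for illustration, and directories that do not exist on your system are skipped.

```python
import shutil

def usage_report(mount_points):
    """Return {mount_point: percent_used} for each path that exists."""
    report = {}
    for mount_point in mount_points:
        try:
            usage = shutil.disk_usage(mount_point)
        except OSError:
            # The path does not exist or cannot be examined; skip it.
            continue
        if usage.total == 0:
            # Virtual file systems may report no capacity at all.
            report[mount_point] = 0.0
        else:
            report[mount_point] = round(100 * usage.used / usage.total, 1)
    return report

if __name__ == "__main__":
    for path, percent in usage_report(["/", "/tmp", "/usr", "/var", "/home"]).items():
        print(f"{path}: {percent}% used")
```

If several of these paths live in the same partition they will report the same figure, since the numbers belong to the file system and not to the directory.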
The Unix file system (often also written as filesystem) is a method of storing and organizing computer files and the data they contain to make it easy to find and access them. Unix file systems usually use a data storage device such as a hard disk or CD-ROM and maintain the physical location of the files on it. They might instead provide access to data on a file server by acting as clients for a network protocol (e.g., NFS or SMB), or they may be virtual and exist only as an access method for virtual data (e.g., procfs). A file system is distinct from a directory service or a registry. It is the file system's job to remember where you stored your files and to retrieve them for you on demand. At HMH there are several file systems in use, including NAS or Network Attached Storage, which uses a protocol called NFS or Network File System.
Each file system is stored in a separate whole disk partition.
In Unix-like operating systems, the Unix directory structure is a convention of organization within a file system.
To use the example of a physical file cabinet: if the separate drawers in the file cabinet represent the highest level of sub-directories in the file system, then the room the file cabinet is in represents the root directory.
The directory structure is hierarchical; it begins with the root file system and extends downward using the forward slash "/" as the delimiter in the path name. The further down you go, the more slashes are used.
Directories can contain files or other directories called sub-directories.
Sub-directories are directories under the root (/) directory or under other directories below that level. In the root file system, directories can be created or renamed only by the system administrator. Normal non-privileged users cannot create directories directly under the root directory but are usually assigned to a lower directory in the structure such as /home or /export/home. A directory will then be created under one of those directories, usually with your login name. For example, a person named Andy Johnson would be assigned the username johnsona and given a "home" directory of /home/johnsona or /export/home/johnsona. This is considered your "home" directory and is where you land whenever you log in. It is normally where all the files you create are stored and is also where your environment files live.
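The hierarchy is easy to see by splitting a path name at its slashes. The following short sketch uses the example home directory from the text above; path_levels is a name invented for the illustration.

```python
import os
from pathlib import PurePosixPath

def path_levels(path):
    """Split a Unix path into its levels, starting at the root directory."""
    return list(PurePosixPath(path).parts)

if __name__ == "__main__":
    print(path_levels("/home/johnsona"))          # ['/', 'home', 'johnsona']
    print(path_levels("/export/home/johnsona"))   # ['/', 'export', 'home', 'johnsona']
    # The directory one level up from a home directory:
    print(PurePosixPath("/export/home/johnsona").parent)
    # Your own home directory, as the shell would expand "~":
    print(os.path.expanduser("~"))
```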
Root directory or root file system
The root file system is the primary file system on a Unix system. As the name implies, it is the file system on which the operating system is stored and which it uses for file storage. There is a special command called chroot which can change the root for a given login (job) session.
The root directory is the directory on Unix-like operating systems that contains all other directories and files on the system and which is designated by a forward slash ( / ).
The use of the word root in this context derives from the fact that this directory is at the very top of the directory tree diagram (which resembles an inverted tree) that is commonly used to represent a filesystem. Strictly speaking, there is only one root directory in your system, which is denoted by / (forward slash). It is the root of your entire file system and cannot be renamed or deleted.
On Linux (and some AT&T derived Unixes), there is also a directory which is named /root. Confusingly, it is not a root directory in the sense of this article, but rather the home directory of the Superuser login "root". We will talk about the root login in a later class.
Linux directory structure
/bin - Stands for "binaries"; Contains some fundamental utilities needed by a system administrator. As a failsafe, these were placed in a separate directory so that they could be placed on a separate disk or disk partition in case the main drive failed.
/sbin - System binaries, historically often statically linked, also originally meant to be on a separate partition.
/usr - Holds executables, libraries, and shared resources that are not system critical: X11, KDE, Perl, etc. (The name "Unix System Resources" is a post hoc backronym.)
/boot - Usually a separate partition which contains the bootstrap files needed at boot time.
/dev - short for devices. Contains file representations of every peripheral device attached to the system.
/etc - Contains configuration files and some system databases.
/home - contains the home directories for the users. On Solaris this is usually in /export/home.
/lib - This is the depository of all integral UNIX system libraries.
/lost+found - Each partition has its own lost+found directory. Its purpose, as its name implies, is to be a storage bin for files that become lost from their original directory. Only the system administrator needs to worry about this directory.
/mnt - Temporarily mounted filesystems.
/media - Mount points for removable media such as CD-ROMs and USB pen drives.
/var - Short for "variable." A place for files that may change often, such as the contents of a database, log files (usually stored in /var/log), email stored on a server, etc.
/opt - This originally meant optional software applications but has really come to mean any software that did not come with your Linux distribution, installed here so as to avoid contention with file names or software patches being applied in the root file system. There are several schools of thought on this directory, and some system administrators (myself included) and some distributions use /usr/local for the same purpose.
/proc - This is a special directory used by the kernel. Actually, /proc is just a virtual directory, because it doesn't really exist on disk. It contains information about the kernel itself. There are a number of numbered entries that correspond to the processes running on the system, and there are also named entries that permit access to the current configuration of the system. Many of these entries can be viewed as text files.
/root - The home directory for the superuser root.
/sys - Modern Linux distributions include a /sys directory as a virtual filesystem (Sysfs, comparable to /proc, which is a Procfs), which stores and allows modification of the devices connected to the system.
/tmp - A place for temporary files. Most Unix systems clear this directory upon start up.
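Which of these directories sit on their own file system varies from machine to machine. As a rough sketch, the Python below checks a handful of the directories listed above and reports whether each is a mount point (the top of a separate file system) or just a plain directory; the selection of directories is only an example.

```python
import os

# A sample of the directories described above; adjust to taste.
STANDARD_DIRS = ["/", "/boot", "/home", "/proc", "/sys", "/tmp", "/usr", "/var"]

def mount_status(directories=STANDARD_DIRS):
    """Classify each directory as 'mount point', 'plain directory' or 'missing'."""
    status = {}
    for directory in directories:
        if not os.path.isdir(directory):
            status[directory] = "missing"
        elif os.path.ismount(directory):
            status[directory] = "mount point"
        else:
            status[directory] = "plain directory"
    return status

if __name__ == "__main__":
    for directory, kind in mount_status().items():
        print(f"{directory:8} {kind}")
```

On a typical Linux system the virtual file systems /proc and /sys show up as mount points even though no disk partition stands behind them.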
Solaris directory structure
Much like Linux above, but Solaris also contains some additional pseudo directories. These include /net, which is used by the Automounter, an NFS-based network file mounting system not unlike Microsoft Windows UNC drive linking. In Solaris /sys is replaced by /system, and /platform holds a hardware-specific set of system libraries supporting certain hardware architectures.
Every item in a UNIX file system can be defined as belonging to one of four possible types:
Ordinary files can contain text, data, or program information. An ordinary file cannot contain another file, or directory. An ordinary file can be thought of as a one-dimensional array of bytes.
As previously mentioned directories are containers that can hold files, and other directories. A directory is actually implemented as a file that has one line for each item contained within the directory. Each line in a directory file contains only the name of the item, and a numerical reference to the location of the item.
Special files represent input/output (i/o) devices, like a tty (terminal), a disk drive, or a printer. As mentioned before Unix treats such devices like files.
A link is a pointer to another file. Think of links as aliases, or another name by which to locate a file. Without getting too deep into this, there are two types of links in a Unix file system: symbolic and hard. A hard link appears, for all purposes, to be the same as the file that it references. Hard links can sometimes be difficult to locate because they share the same identification (inode) as the file they reference, and for that reason they can be dangerous and should generally be avoided. A file's data is not actually removed while it is still being referenced, and this includes references through hard links. Because they share an inode, hard links can only exist on the same file system as the file they reference. Symbolic or "soft" links (also known as "symlinks"), on the other hand, are merely pointers to another file and can be easily located since their file type appears as an "l" in directory listings. Since symlinks are pointers (or guide posts), they can point to files in different file systems or partitions. Symlinks are handy when an application expects a directory to exist at a certain path but, because of disk space limitations, the directory had to be moved to a new file system or relocated in some other way. Think of a symlink as a logical detour sign.
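The difference between the two kinds of link is easiest to see by experiment. The sketch below works in a throwaway temporary directory, creates one hard link and one symbolic link to the same file, and then deletes the original name. The file names are arbitrary and the function link_demo exists only for this illustration.

```python
import os
import tempfile

def link_demo():
    """Create a file with a hard link and a symlink, then remove the original."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        hard = os.path.join(workdir, "hard.txt")
        soft = os.path.join(workdir, "soft.txt")

        with open(original, "w") as handle:
            handle.write("hello\n")
        os.link(original, hard)      # hard link: a second name for the same inode
        os.symlink(original, soft)   # symbolic link: a pointer to the path

        result = {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(original).st_nlink,
            "hard_is_symlink": os.path.islink(hard),
            "soft_is_symlink": os.path.islink(soft),
        }

        # Remove the original name. The data survives through the hard link,
        # while the symlink is left pointing at a path that no longer exists.
        os.remove(original)
        with open(hard) as handle:
            result["hard_after_delete"] = handle.read()
        result["soft_target_exists"] = os.path.exists(soft)
        return result

if __name__ == "__main__":
    for key, value in link_demo().items():
        print(f"{key}: {value!r}")
```

The hard link is indistinguishable from the original name, which is exactly why hard links are hard to track down, whereas the symlink is clearly marked as a link and simply breaks when its target goes away.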