Unix for Busy People - File and disk management and locating files

From Public wiki of Kevin P. Inscoe

File and disk management and locating files

Disks and mount points

Last week we talked about devices and the basics of disks, file systems and files.

If you recall, all devices on Unix systems are accessed as files in the directory structure, just as ordinary files are.

Mounting a disk is an ancient computing term dating from the mainframe days, when disks were large removable packs roughly the size of small tires. A pack was placed on a hub in a disk cabinet and then "mounted" on the system for use, much as you would mount a tape in a tape drive. The act of mounting a disk makes it available to the system for general use. On some operating systems a mounted disk can be restricted to a limited set of users or even a single user; in Unix, however, mounted file systems appear as directories, so all users can reach them (subject to permissions).

Notes: http://en.wikipedia.org/wiki/Mount_%28computing%29#Mount_point

Access to a file system can be controlled by permissions (which we will talk about in more detail in Class 6). If you recall, disks are generally useless in their raw format, but once a file system has been applied on top of them they can store files and directories. But how do we get from the "raw" disk device file name to a usable file system? That is where mounting comes in. In Unix-like systems, the mount point is the location in the operating system's directory structure where a mounted file system appears. Normally the mount point is in the root file system, but it does not have to be.

[Figure: second disk mount point]

An example of mount points in the root file system:

[Figure: example mount points in the root file system]

How can we tell whether a directory we are working with is the mount point of a file system?

On Linux we can use the mountpoint command:

$ mountpoint /boot
/boot is a mountpoint
$ mountpoint /home
/home is not a mountpoint
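mountpoint only answers yes or no. To see which file system a directory actually lives on, df accepts a path and reports just the file system containing it. A minimal sketch, assuming a POSIX df; the -P flag asks for the POSIX output format with one line per file system and the mount point in the sixth column (a mount point containing spaces would break this simple awk split):

```shell
dir=.   # the directory to check (here, the current directory)
# Line 1 of df -P output is the header; line 2 describes the file system holding $dir.
info=$(df -P "$dir" | awk 'NR == 2 { print "file system: " $1; print "mounted on:  " $6 }')
echo "$info"
```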


In Unix, all data are stored in repositories called files. You could also use files to store any reports that you write or to save the e-mail that you receive.

Like in the "real" world, it is not a good idea to have lots of files "lying around" in a disorganized manner. Unix allows you to organize your files into directories. A "directory" is a location where files are kept in a list. For instance, you could create a directory to store all your files for the first lab and call it Lab1. You could create another directory called Lab2 to store your files for the second lab. If you are already familiar with either Apple Macintosh computers or Windows File Manager, just think of Unix directories as being the same as folders.

Unix has commands that you can use to create and delete files and directories within your home directory. Unix also gives you commands to change from your home directory to other directories. The directory that you're in at any given point in time is called your current working directory.

The command pwd (print working directory) displays your current directory, and the command cd (change directory) lets you change your current working directory, or just current directory.

You can use the mkdir command to create new directories. In the home directory, let us create a directory called test1:

$ mkdir test1

You can use the rmdir command to remove directories. So let's now remove the directory we just created:

$ rmdir test1
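One detail worth knowing: rmdir only removes empty directories. A small sketch of that behavior, working in a throwaway directory created with mktemp (an assumption: mktemp is present on Linux and BSD systems but is not strictly POSIX) so nothing in your home directory is touched:

```shell
work=$(mktemp -d)           # scratch directory for practice
cd "$work"
mkdir test1                 # create an empty directory
touch test1/notes.txt       # put a file in it
if rmdir test1 2>/dev/null; then
    echo "removed test1"
else
    echo "rmdir refused: test1 is not empty"
fi
rm test1/notes.txt          # empty the directory first...
rmdir test1                 # ...and now rmdir succeeds
[ -d test1 ] || echo "test1 is gone"
```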

New commands: pwd, cd, mkdir and rmdir

Home directory

When you log in to a Unix system, the system puts you in your home directory. Your home directory is the directory assigned to you to store all of your files. It is not unlike the home page that your browser always starts up with.

As mentioned, your home directory contains a number of customization files that exist only for your use, such as .login, .profile or .bashrc, depending on the shell you're using. We will talk more about shells in Class 11.

Relative directories

There are several shortcut notations for directory navigation. A single dot (.) indicates the current directory and two dots (..) indicate the parent directory, the one above the directory we are in. You can navigate using this notation. For instance, in our mount point example above, let's say I was in the /home/kinscoe/book1 directory but needed to move over to the /home/kinscoe/tools directory. I could simply do:

$ pwd
/home/kinscoe/book1
$ cd ../tools
$ pwd
/home/kinscoe/tools

~ means your home directory, so cd ~ will move you back to your home directory. Just typing a plain cd will also bring you back to your home directory.
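To try this without a real /home/kinscoe, here is a sketch that recreates the book1/tools layout in a scratch directory; the directory names are only for illustration, and mktemp is assumed to be available:

```shell
top=$(mktemp -d)
mkdir -p "$top/book1" "$top/tools"
cd "$top/book1"
pwd                 # prints .../book1
cd ../tools         # up one level to the parent, then down into tools
pwd                 # prints .../tools
here=$(pwd)
cd ~                # ~ is your home directory
cd                  # a bare cd also takes you home
pwd
```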


Notes: http://www.cyberciti.biz/faq/linuxunix-rules-for-naming-file-and-directory-names/

Notes: http://freeengineer.org/learnUNIXin10minutes.html#Listing

As mentioned last week, there are three types of files. Review: what are the three file types?

When naming files, you can use uppercase and lowercase letters, numbers, and certain special characters. It's a really good idea to stick with letters, numbers, and the dash, dot, and underscore characters to avoid trouble and confusion. In particular, avoid spaces in Unix file names. Spaces, even multiple spaces, are legal in a file name, but the shell uses spaces to separate arguments, so such names must be quoted every time you type them. Similarly, some non-alphanumeric characters, such as backslashes (\) or question marks (?), are interpreted by the shell, which causes confusion when you try to name or refer to the file. In general, avoid the following characters: asterisks (*), backslashes (\), spaces, the pipe character (|), brackets of any kind, and question marks. In short, use only dots, dashes and underscores; personally, I recommend using only underscores to separate words. Never use a dash as the first character of a file name, because commands will interpret the name as an option flag.
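If you do end up with an awkward name, there are standard ways to handle it. This sketch, run in a scratch directory (file names are made up), quotes a name containing a space and uses two common escapes for a leading dash: the -- marker, which most commands treat as "end of options", and a ./ prefix so the name no longer starts with a dash:

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Spaces are legal, but the name must be quoted (or escaped) every time.
touch "my report.txt"
ls -l "my report.txt"

# Written plainly, touch -oops.txt would be parsed as a bundle of options.
touch -- -oops.txt      # -- ends option parsing, so -oops.txt is a file name
rm ./-oops.txt          # ./ prefix: the argument no longer begins with a dash
```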

Files starting with a dot are hidden files. They behave just like any other file, except that the ls (list files) command will not display them unless you explicitly request it to do so. Your .profile file in your home directory is an example of a hidden file.
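The "explicit request" is the -a flag to ls. A quick sketch in a scratch directory (the file names are hypothetical):

```shell
demo=$(mktemp -d)
touch "$demo/.profile_demo" "$demo/visible.txt"
plain=$(ls "$demo")        # names starting with a dot are skipped
all=$(ls -a "$demo")       # -a shows everything, including . and ..
echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"
```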

Also remember that Linux file names are case sensitive, which can be difficult to get used to if you have a DOS background. Linux lets you have files named goodstuff, GOODSTUFF, and GoodStuff in the same directory, because they are all different file names.
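You can see this for yourself with a sketch like the one below. It assumes a case-sensitive file system, which is the norm on Linux; on a case-insensitive one (FAT, or a default macOS volume) the three touch calls would all refer to the same file and the count would be 1:

```shell
cs=$(mktemp -d)
touch "$cs/goodstuff" "$cs/GOODSTUFF" "$cs/GoodStuff"
count=$(ls "$cs" | wc -l | tr -d ' ')   # tr strips the padding some wc versions add
echo "files in directory: $count"
```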

It's best to always use lowercase in Unix unless you can think of a good reason to use uppercase or mixed case. Most Unix people use lowercase almost exclusively, but aside from this cultural point, there's another good reason to use lowercase. If you're sharing or accessing an MS-DOS file system with Unix, MS-DOS will not be able to see the files that have uppercase or mixed-case file names.

Unlike in MS-DOS, the dot character (.) has no special meaning inside a Unix file name (apart from a leading dot, which marks a hidden file). You're not limited to the eight-dot-three (xxxxxxxx.yyy) style of naming because Unix treats the dot just like any other character; you can name a file Some.Yummy.CHEESECAKE.Recipes if you're so inclined.

Along these lines, Unix executable program files do not need or use a special extension such as .exe or .bat. Unix will happily run a program file named zippity just as readily as one named DOODAH.EXE.
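What makes a file runnable is its execute permission, not its name. A sketch that creates a tiny script called zippity with no extension and runs it; it assumes the scratch directory is not on a file system mounted with noexec:

```shell
bin=$(mktemp -d)
cat > "$bin/zippity" <<'EOF'
#!/bin/sh
echo "zippity doo dah"
EOF
chmod +x "$bin/zippity"    # the execute bit, not an extension, makes it runnable
out=$("$bin/zippity")
echo "$out"
```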

File commands

The touch command updates the date of a file to the current date and time, and creates an empty file if the file does not already exist. This is most useful when a file is used for its time stamp in a sequence of events, or as a flag that another event is supposed to occur. Most often you see the touch command used with the make command when compiling or building software. The make command is outside the scope of this class, but you can find more about it in the book Projects with make.

$ ls -l archive.zip
-rw-r--r-- 1 kinscoe kinscoe 50220007 2010-02-15 18:38 archive.zip
$ touch --date 2007-05-01 archive.zip
$ ls -l archive.zip                  
-rw-r--r-- 1 kinscoe kinscoe 50220007 2007-05-01 00:00 archive.zip
$ touch archive.zip
$ ls -l archive.zip
-rw-r--r-- 1 kinscoe kinscoe 50220007 2010-02-15 18:39 archive.zip

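On some versions of Unix (most notably those using the GNU version of the touch command, as on Linux) you can use --date to set a file date to an arbitrary value. The portable POSIX way to do the same thing is the -t option, which takes a [[CC]YY]MMDDhhmm[.SS] time stamp.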

$ touch --date 2009-10-09 archive.zip
$ ls -l archive.zip                  
-rw-r--r-- 1 kinscoe kinscoe 50220007 2009-10-09 00:00 archive.zip
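The --date option is a GNU extension; POSIX touch uses -t with a [[CC]YY]MMDDhhmm[.SS] time stamp instead, which should also work on Solaris and the BSDs. A sketch using a scratch file in place of archive.zip; the check at the end uses date -r, which is assumed to be the GNU (Linux) date:

```shell
f=$(mktemp)                          # scratch file standing in for archive.zip
touch -t 200910090000 "$f"           # 2009-10-09 00:00, local time
ls -l "$f"
stamp=$(date -r "$f" +%Y%m%d%H%M)    # GNU date -r prints the file's modification time
echo "$stamp"
```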

ls lets you list the files in your current directory, or in some other directory if you specify it on the command line.

$ ls -l
total 507408
drwxr-xr-x 5 kinscoe kinscoe      4096 2008-11-01 13:21 20060601
drwxr-xr-x 2 kinscoe kinscoe      4096 2008-11-01 13:21 abook
drwxr-xr-x 3 kinscoe kinscoe      4096 2008-11-01 13:21 archives
drwxr-xr-x 3 kinscoe kinscoe      4096 2008-12-02 17:47 bin
drwxr-xr-x 6 kinscoe kinscoe      4096 2008-11-01 13:21 bnc
drwxr-xr-x 9 kinscoe kinscoe      4096 2008-11-01 12:52 burn
-rw-r--r-- 1 kinscoe kinscoe       173 2008-11-01 16:54 cron.txt
drwxr-xr-x 3 kinscoe kinscoe      4096 2008-11-01 12:52 cvs
drwxr-xr-x 4 kinscoe kinscoe      4096 2008-11-01 12:52 dev
drwxr-xr-x 2 kinscoe kinscoe      4096 2008-11-01 13:28 downloads
-rw-r--r-- 1 kinscoe kinscoe    114960 2004-06-27 19:58 FldDay.adi
-rw-r--r-- 1 kinscoe kinscoe  20465973 2006-06-02 11:35 june1win.tar.gz
-rw-r--r-- 1 kinscoe kinscoe   4403200 2009-02-07 08:55 kb2kskype-0.3.8.tar
-rw-r--r-- 1 kinscoe kinscoe    323477 2006-06-02 11:35 kevin_20060601.tar.gz
-rw-r--r-- 1 kinscoe kinscoe  10349819 2006-06-02 14:09 kevin_bat.zip
-rw-r--r-- 1 kinscoe kinscoe     30698 2004-06-27 20:01 LogBckUp.dat
drwxr-xr-x 2 kinscoe kinscoe      4096 2008-11-01 13:29 logs

The cp command copies files from one location to another, or to a different name.

$ touch test.cc
$ mkdir temp
$ cp test.cc temp/.

The mv command works like the cp command, except that it moves the file instead of copying it, so the file no longer exists under its old name or location. Think of mv as rename (like in MS-DOS). With most modern Unix systems you can also mv files across different file systems.

$ mv temp/test.cc .
$ mv test.cc file1.cc
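To see the difference between copying and moving side by side, here is a sketch that repeats the steps above in a scratch directory (backup.cc and moved.cc are made-up names):

```shell
play=$(mktemp -d)
cd "$play"
touch test.cc
mkdir temp
cp test.cc temp/            # copy: test.cc now exists in both places
cp test.cc backup.cc        # copy to a different name in the same directory
mv temp/test.cc moved.cc    # move and rename in one step; temp/ is left empty
ls -R
```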

The rm command removes a file from the disk (so be careful with this command!).

$ rm file1.cc

Most modern versions of rm allow a confirmation to be requested when deleting files which is a good idea if you are unsure of what you are doing (and even if you are sure!).

$ rm -i file1.cc
rm: remove regular empty file `file1.cc'? y

The shred command is part of GNU coreutils, so it is standard on Linux but usually not installed on other Unix systems. It overwrites a file's contents so that a sensitive file cannot easily be recovered after it is deleted. Think of shred as you would shredding a bill or personal documents you don't want prying eyes to see.

$ echo "top secret" > file1.cc
$ cat file1.cc                
top secret
$ od -c file1.cc | head -n 1  
0000000   t   o   p       s   e   c   r   e   t  \n
$ shred -v file1.cc
shred: file1.cc: pass 1/3 (random)...
shred: file1.cc: pass 2/3 (random)...
shred: file1.cc: pass 3/3 (random)...
$ od -c file1.cc | head -n 1
0000000   &   +   I 030  \n   ®   @   b 230   «   ¬   » 004   ÿ 020   ²

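Because shred overwrote file1.cc with garbage characters, some of which can have unintended effects on a terminal screen, I used the od (octal dump) command with -c to display the file's characters, with non-printing bytes shown as escape codes instead of being sent to the terminal. You then remove the file with the rm command, or use shred -u, which overwrites the file and then removes it. Either way the contents have been overwritten with garbage three times, enough to obscure the original from all but the very expensive recovery equipment that spy agencies might use. Be aware, though, that the shred manual page warns that it relies on the file system overwriting data in place, which not every file system (journaled or snapshotting ones, for example) does.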
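The whole sequence, including removal, can be sketched as below. It assumes GNU coreutils shred; -n sets the number of overwrite passes and -u removes the file after overwriting it. The scratch file stands in for file1.cc:

```shell
secret=$(mktemp)
echo "top secret" > "$secret"
od -c "$secret" | head -n 1      # readable text before shredding
shred -v -n 3 "$secret"          # overwrite 3 times, keep the file
od -c "$secret" | head -n 1      # now random bytes
shred -u "$secret"               # overwrite again, then remove the file
```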

The ln command, as we mentioned in the previous class on file types, creates a link (or alias): an additional name for an existing file. By default ln makes a hard link; with the -s flag it makes a symbolic (soft) link. Since we covered links in class 3 I won't go into more detail here.

Further notes on the ln command: http://en.wikipedia.org/wiki/Ln_%28Unix%29
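As a quick refresher, here is a sketch in a scratch directory showing both kinds of link: a plain ln makes a hard link, ln -s makes a symbolic link (the file names are made up). In the ls -li output, the hard link shares the original's inode number, while the symbolic link shows an arrow to the name it points at:

```shell
lnk=$(mktemp -d)
cd "$lnk"
echo "original text" > original.txt
ln original.txt hard.txt         # hard link: a second name for the same file
ln -s original.txt soft.txt      # symbolic (soft) link: points at the name original.txt
ls -li                           # -i adds inode numbers to the long listing
```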

The file command displays a file's type based on what is referred to as 'file magic'. To make a long story short, file magic is a small database of what are called 'magic numbers': byte patterns at known positions, usually near the beginning of a file, that identify certain file types. The file can be either binary or ASCII in nature. Typically a file such as /etc/magic holds this database (its location varies between systems). In shell scripts, the #! ("shebang") in the first line of the script also acts as a special magic number.

Looking at a C-shell script:

$ file bin/get_where.sh
bin/get_where.sh: C shell script text executable
$ head -n 1 bin/get_where.sh
#!/bin/csh -f

Looking at a shell script that has commands but no shebang:

$ file test_script.sh
test_script.sh: ASCII C program text
$ head -n 1 test_script.sh
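#ident  "@(#)profile    1.18    98/10/03 SMI"   /* SVr4.0 1.3   */

Without a shebang, file has to guess from the contents; here the /* ... */ comment on the first line most likely made it guess C source code.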

Looking at a binary file that is meant to be run as a command:

$ file /usr/bin/cc
/usr/bin/cc: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
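The idea behind file magic can be illustrated with a toy check that reads only the first two bytes of a file and compares them to the "#!" magic number. This is a minimal sketch; has_shebang is a hypothetical helper, and the real file command consults a much larger database:

```shell
# Hypothetical helper: succeed if the file starts with the "#!" magic number.
has_shebang() {
    magic=$(dd if="$1" bs=2 count=1 2>/dev/null)   # read exactly the first two bytes
    [ "$magic" = '#!' ]
}

demo=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$demo/with.sh"
printf 'echo hi\n' > "$demo/without.sh"
for f in "$demo/with.sh" "$demo/without.sh"; do
    if has_shebang "$f"; then echo "$f: has a #! line"; else echo "$f: no #! line"; fi
done
```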

Notes on Unix file magic: http://en.wikipedia.org/wiki/File_format#Magic_number

The locate command finds file names in a pre-built database, which is created by the slocate command (or updatedb on some Unix systems), typically each night via a cron job. Unfortunately, many Unix installations do not run slocate or updatedb nightly out of the box. Unless the system administrator knows about the command and configures it to run on a regular basis, the locate command won't be of much use to you, the user. However, if you find it is available and kept up to date, all the better for you! Basically, the locate command does a text search of the database and returns all matching results. Depending on how unique your search string is, you could get back a lot of results, so it's usually best to filter them with a command such as grep. We will talk more about grep in a future class.

$ locate gcc | grep /bin

If the file name you are searching for cannot be located you simply get back nothing in return:

$ locate supercalifragilisticexpialidocious
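When the locate database is missing or stale, find (covered below) can do a live search instead: slower, because it walks the directory tree rather than reading a database, but always current. A sketch against a small made-up tree; on a real system you might search from / and discard permission errors with 2>/dev/null:

```shell
tree=$(mktemp -d)
mkdir -p "$tree/usr/bin" "$tree/usr/share/doc"
touch "$tree/usr/bin/gcc-demo" "$tree/usr/share/doc/gcc-demo.txt"
# Roughly the live-search equivalent of: locate gcc | grep /bin
hits=$(find "$tree" -name '*gcc*' 2>/dev/null | grep /bin)
echo "$hits"
```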

Disk space

Disk storage, like everything else in the universe, is a finite resource, and much like that bedroom closet it can over time become disorganized and, quite frankly, a cluttered mess. So every now and then we must clean up by removing files we no longer need before we go asking for more storage (this has been a plea from your friendly neighborhood storage group). Let's look at some tools or commands we can use to help us with this task:

First we need to know how much space we may be using.

The df (disk free) command displays how much free space each mounted file system has:

$ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
                    41545728 3736996 35461338    10%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                 2247336     328 2247008     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
                    39198334 3736996 35461338    10%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                 2247080      72 2247008     1%    /tmp
swap                 2247056      48 2247008     1%    /var/run
rpool/export         41545728      22 35461338     1%    /export
rpool/export/home    41545728      21 35461338     1%    /export/home
                    41545728     849 35461338     1%    /export/home/kinscoe
rpool                41545728      81 35461338     1%    /rpool
                    74770560 21155368 49817028    30%    /export/class

We should also be aware of how much disk space our account is consuming:

$ cd $HOME; du -sk .
282808  .

The shell variable $HOME holds the path of our home directory, and the dot refers to our current directory; the -s flag asks du for a single summary total. Many commands that display disk space information accept -k to report sizes in kilobytes, and GNU versions (as on Linux) also accept -h for human-readable sizes. So from the above we know we are consuming 282,808 KB, or almost 283 MB, of disk space.
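A total is a start, but to find out where the space is going, a common idiom is to list per-directory totals and sort them numerically so the biggest directories end up at the bottom. A sketch on a scratch tree; the 1 MB file is filled from /dev/urandom so that a compressing file system cannot shrink it:

```shell
space=$(mktemp -d)
mkdir "$space/small" "$space/big"
echo "hello" > "$space/small/note.txt"
dd if=/dev/urandom of="$space/big/blob" bs=1024 count=1024 2>/dev/null   # 1 MB
# Per-directory totals in KB, smallest first, so the space hogs come last.
du -k "$space" | sort -n
big_kb=$(du -sk "$space/big" | cut -f1)
```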

Finding files

Another good command to use when trying to track down space hogs is the find command. The find command locates files within a directory tree based on criteria such as file size, permissions, owner or group membership, etc. Find has many flags (a command flag is an option passed to a command, preceded by a single or double dash), and its capabilities differ somewhat between Linux and Solaris, but its core use remains the same.

Some find examples:

Notes: http://www.athabascau.ca/html/depts/compserv/webunit/HOWTO/find.htm

To find all files over 10 MB in size (the M suffix is an extension found in GNU find, not part of POSIX):

$ find . -size +10M
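On a find without the M suffix, POSIX -size counts 512-byte blocks, so 10 MB (10 x 1024 x 1024 bytes) is 10 x 2048 = 20480 blocks. A sketch on a scratch directory with an 11 MB file and a tiny one; the -type f test, which limits the results to regular files, is an addition to the example above:

```shell
sz=$(mktemp -d)
dd if=/dev/zero of="$sz/big.dat" bs=1048576 count=11 2>/dev/null   # 11 MB
echo "tiny" > "$sz/small.txt"
# Portable form: more than 20480 512-byte blocks, i.e. larger than 10 MB.
found=$(find "$sz" -type f -size +20480)
echo "$found"
# GNU find accepts the unit suffix directly:  find "$sz" -type f -size +10M
```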

Finding out more about find

Most Unix commands (all the standard ones, though not always home-made commands or ones you compile yourself) have an entry in the manual pages. We will go into manual pages in a later class, but for now know that you can type man followed by a command name to get more information on its use.

$ man find
FIND(1)                                                            FIND(1)
      find - search for files in a directory hierarchy
      find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]
      This  manual  page  documents  the  GNU  version of find.  GNU find
      searches the directory tree rooted at each given file name by eval-
      uating  the  given  expression from left to right, according to the
      rules of precedence (see section OPERATORS), until the  outcome  is
      known  (the  left  hand  side is false for and operations, true for
      or), at which point find moves on to the next file name.


Many Unix commands also have a help option, usually specified as -h or, on GNU commands (mostly Linux), --help. Not every command accepts both: GNU find, for example, rejects -h but accepts --help:

$ find -h
find: unknown predicate `-h'
$ find --help
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
default path is the current directory; default expression is -print
expression may consist of: operators, options, tests, and actions:
operators (decreasing precedence; -and is implicit where no others are given):
     ( EXPR )   ! EXPR   -not EXPR   EXPR1 -a EXPR2   EXPR1 -and EXPR2
     EXPR1 -o EXPR2   EXPR1 -or EXPR2   EXPR1 , EXPR2
positional options (always true): -daystart -follow -regextype
normal options (always true, specified before other expressions):
     -depth --help -maxdepth LEVELS -mindepth LEVELS -mount -noleaf
     --version -xdev -ignore_readdir_race -noignore_readdir_race
tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
     -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group NAME
     -ilname PATTERN -iname PATTERN -inum N -iwholename PATTERN -iregex PATTERN
     -links N -lname PATTERN -mmin N -mtime N -name PATTERN -newer FILE
     -nouser -nogroup -path PATTERN -perm [+-]MODE -regex PATTERN
     -readable -writable -executable
     -wholename PATTERN -size N[bcwkMG] -true -type [bcdpflsD] -uid N
     -used N -user NAME -xtype [bcdpfls]
actions: -delete -print0 -printf FORMAT -fprintf FILE FORMAT -print 
     -fprint0 FILE -fprint FILE -ls -fls FILE -prune -quit
     -exec COMMAND ; -exec COMMAND {} + -ok COMMAND ;
     -execdir COMMAND ; -execdir COMMAND {} + -okdir COMMAND ;
Report (and track progress on fixing) bugs via the findutils bug-reporting
page at http://savannah.gnu.org/ or, if you have no web access, by sending
email to <bug-findutils@gnu.org>.

Some find examples

To find all files last modified more than 30 days ago:

$ find . -mtime +30 -ls
 271    1 drwxr-xr-x   3 root     sys           512 Nov 16  2008 .
 272    1 drwxr-xr-x   2 root     sys           512 Nov 16  2008 ./amd64
 676   15 -rwxr-xr-x   1 root     sys         15184 Aug 14  2007 ./amd64/javaexec
 677   16 -rwxr-xr-x   1 root     sys         15572 Aug 14  2007 ./javaexec

To find all files that have the word 'Micron' somewhere in the name:

$ find . -name '*Micron*' -ls
1459935    4 drwxr-xr-x  21 kinscoe  kinscoe      4096 Nov  1  2008 ./burn/inscoe200311/Kevin/downloads/Micron_LT
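Tests can be combined, and find joins them with an implicit "and". The sketch below uses touch -t to back-date one of two scratch files (both names are made up), then looks for regular files not modified in more than 30 days, and finally narrows that to .log files with a detailed -ls listing:

```shell
age=$(mktemp -d)
touch "$age/fresh.log"
touch -t 200901010000 "$age/stale.log"     # pretend it was last modified in 2009
old=$(find "$age" -type f -mtime +30)      # regular files older than 30 days
echo "$old"
find "$age" -type f -name '*.log' -mtime +30 -ls
```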