Linux: Looking for Large Folders with du
-
One of the most common tasks in system administration is figuring out what is using up the disk space on a machine. You might look at a filesystem with df, find that it is using more space than you expected, and want to track down where that space has gone.
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       114G   18G   91G  17% /
```
Using the du command, combined with a few simple command-line tools, we can quickly and manually explore the filesystem looking for the big space consumers. We start with a summary du on the root of the filesystem in question, which in this case is the root mount point /.
```
# du -smx --exclude=proc /* | sort -n | tail -n 5
705   /opt
1050  /root
2755  /var
5572  /usr
7135  /home
```
Wow, that seems like a long command. Let's break it down to understand what we just did. First, the du portion. We start with the -smx flags: the s summarizes each directory that we encounter, the m displays the output in megabytes, and the x limits the recursive file discovery to the current filesystem (any other filesystem mounted under one of those locations will be skipped.) The --exclude=proc portion tells du not to read the /proc directory, as that is not an on-disk filesystem and reading it would cause unnecessary errors and delays. The /* argument means read everything (the * wildcard) under the root / mount point.

The output of that statement is then piped (see our lesson on BASH redirection and pipes) into the sort command, where the -n option makes it sort numerically instead of alphabetically. Finally we pipe that output into the tail command, where we limit the output to the final (and therefore largest) five items discovered by the initial du command. It is because of the sorting that we need to use megabytes instead of human-readable form in the initial command.

That might seem like a lot at first, but once you know the simple building blocks of du, sort, and tail, along with BASH command structures, it is quite simple and straightforward, and similar to many tasks that we will do as system administrators using standard tools.
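As an aside, newer GNU coreutils versions add a matching human-readable sort option, sort -h, which removes the need to force megabytes just for the sake of sorting. A sketch of that variant (assuming GNU du and sort; the 2>/dev/null is an addition here to hide permission-denied noise, not part of the original command):

```shell
# Human-readable variant of the same pipeline: du -h sizes sorted by
# GNU sort -h, which understands K/M/G suffixes. On systems without
# sort -h, stick with du -m and sort -n as shown above.
du -shx --exclude=proc /* 2>/dev/null | sort -h | tail -n 5
```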
Now, given the output of the command that we just saw, we can delve deeper into the directory structure to narrow down where the culprits may be. One reason that we often do this task manually is that it is quick and easy and does not require a more complicated tool; another is that we can easily massage the data to take into account things that we know about the system, such as knowing that the /home directory contains things that we cannot delete, so investigating it would be a waste of time (that's just an example and would not normally be true.) In this case, we will assume that /var is using more space than we feel is appropriate, and we will look there to see what is taking it up. We change directory into the folder in question and run the original command again (removing the absolute path starting point to make it generic, so that we can run it again and again.)
```
# cd /var
# du -smx --exclude=proc * | sort -n | tail -n 5
5     log
14    backups
176   tmp
297   lib
2265  cache
```
From this we now see that cache is the big user of space within the /var directory. We can learn more about what is using space within it by repeating our steps from above.

```
# cd cache
# du -smx --exclude=proc * | sort -n | tail -n 5
2     man
7     cups
7     debconf
87    apt-xapian-index
2163  apt
```
And now we see that the apt directory (its absolute path at this point is /var/cache/apt) is what is using nearly all of the space, not only of cache but of /var above it.

```
# cd apt
# du -smx --exclude=proc * | sort -n | tail -n 5
45    pkgcache.bin
45    srcpkgcache.bin
2074  archives
```
Going down into apt, we see that archives holds nearly all of the space used within apt. We are learning a lot from a single, simple exercise. One more level down and we will find out what is going on:

```
# cd archives
# du -smx --exclude=proc * | sort -n | tail -n 5
62    chromium-browser_48.0.2564.116-0ubuntu0.14.04.1.1111_amd64.deb
66    chromium-browser_49.0.2623.108-0ubuntu0.14.04.1.1113_amd64.deb
66    chromium-browser_49.0.2623.87-0ubuntu0.14.04.1.1112_amd64.deb
79    duck_4.9.2.19773_amd64.deb
82    duck_4.7.5.18825_amd64.deb
# pwd
/var/cache/apt/archives
```
At this bottom-most level our command turns up individual files of roughly the same size; this tells us that the final directory that we have arrived at (as shown by the pwd command) contains a large number of small files that together add up to the large amount of space that we observed. We could verify this using either the ls or du commands, but we already know it to be true. We can also do a quick count of the files in the directory to understand the scope:

```
# ls | wc -l
1251
```
That is a lot of files; no wonder that, even though each one is generally pretty small, together they take up so much space.
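A caveat on the count: ls | wc -l counts every directory entry at the top level only, and a filename containing a newline would be counted twice. A more robust sketch using find (the -printf trick assumes GNU find, which is standard on Linux):

```shell
# Count regular files under the current directory tree: print one dot
# per file, then count bytes, so odd filenames cannot skew the result.
find . -type f -printf '.' | wc -c
```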
I recommend doing this as an exercise on your own system. Use du to delve into the filesystem and see what is taking up a large amount of space in different areas.
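When you repeat the exercise, note that GNU du can also report several directory levels in a single pass with --max-depth (or -d), which can replace a few rounds of cd. This is a sketch of an alternative approach, not part of the walkthrough above:

```shell
# One-shot view: per-directory totals up to two levels below /var,
# in megabytes, largest last. -d 2 is shorthand for --max-depth=2
# (GNU du); 2>/dev/null hides permission-denied noise.
du -xm -d 2 /var 2>/dev/null | sort -n | tail -n 10
```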
Part of a series on Linux Systems Administration by Scott Alan Miller
-
I've always just used -h for human readable. I never realized -m would give you the MB size.

-
@johnhooks said in Linux: Looking for Large Folders with du:

> I've always just used -h for human readable. I never realized -m would give you the MB size.

I have the "advantage" of having learned this stuff before the human readable flag was added.