Yesterday, at work, we had the typical case where df says there is (almost) no space left on some device, while du doesn’t see as much data as you would expect in that situation. This happens when you delete a file that another process has opened (and, obviously, not yet closed).
In typical UNIX filesystems, files are actually only entries in a directory, pointing (linking) to the real information about the content, the inode.
The inode contains the information about how many such links exist on the filesystem, the link count. When you create a hard link (ln without -s), you create another file entry in some directory, linking to the same inode as the “original” file. You also increase the link count for the inode.
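To make the link count a bit more concrete, here is a minimal Python sketch that watches st_nlink while a second directory entry is added and the first one removed. The function and file names are made up for the illustration, and it assumes the temporary directory lives on a filesystem that supports hard links.

```python
import os
import tempfile


def link_count_walkthrough():
    """Return the link counts seen after create, link and unlink."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")   # hypothetical names
        alias = os.path.join(d, "alias")
        with open(original, "w") as f:
            f.write("hello\n")

        counts = [os.stat(original).st_nlink]    # a single directory entry
        os.link(original, alias)                 # like `ln original alias`
        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        counts.append(os.stat(original).st_nlink)
        os.unlink(original)                      # like `rm original`
        counts.append(os.stat(alias).st_nlink)   # content still reachable via alias
        return counts, same_inode
```

Both names point to the same inode, so removing one of them only decreases the count; the content stays reachable through the other name.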
Likewise, when removing a file, the entry in the directory is removed (though most of the time, really only skipped, but that’s another story), and the link count is decreased. When the link count reaches zero, the inode is usually marked as deleted.
Except when the usage count is not zero.
When a process opens a file, the kernel keeps a usage count for the corresponding inode in memory. When some process is reading from a file, it doesn’t really expect it to disappear suddenly. So, as long as the usage count is not zero, even when the link count in the inode is zero, the content is kept on the disk and still takes space on the filesystem.
On the other hand, since there is no entry left in any directory linking to the inode, the size for this content can’t be added to the total du computes.
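The situation is easy to reproduce from a single process. The following is a small sketch, assuming a local POSIX filesystem for the temporary file; the function name is mine.

```python
import os
import tempfile


def unlink_while_open(size=4096):
    """Delete a file that is still open and inspect it through the descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        os.unlink(path)                   # removes the only directory entry
        st = os.fstat(fd)                 # the inode is still reachable via fd
        data = os.pread(fd, size, 0)      # and so is the content
        return st.st_nlink, st.st_size, len(data), os.path.exists(path)
    finally:
        os.close(fd)                      # last user gone: space can be freed
```

On a local Linux filesystem I would expect this to return (0, 4096, 4096, False): no link left and no name to walk to, yet the data is still there until the descriptor is closed.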
Back to our problem: it started when someone had to free some space on a 1GB filesystem, and thought a good idea would be to delete that 860MB log file that nobody cares about. Except that deleting it didn’t really free the space, and he didn’t really check.
Later, the “filesystem full” problem came back to haunt someone else, who came to ask me which files from a small list he could remove. But the files were pretty small, and removing them wouldn’t have freed enough space. That gave me the feeling that we were probably in the typical case I introduced this post with, which du -sk confirmed: 970MB used on the filesystem according to df, but only 110MB worth of data…
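For what it’s worth, the comparison between the two numbers can be sketched in Python. This only approximates what df and du do: it assumes st_blocks counts 512-byte units (true on Linux, not guaranteed elsewhere), and the function names are mine.

```python
import os
import stat


def df_used_bytes(mount_point):
    """Space used according to the filesystem itself, roughly what df reports."""
    vfs = os.statvfs(mount_point)
    return (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize


def du_used_bytes(top):
    """Space reachable through directory entries, roughly what du sums up."""
    top_dev = os.lstat(top).st_dev
    seen = set()                          # count each inode once (hard links)
    total = 0
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.lstat(path)
        except OSError:
            continue                      # vanished or unreadable entry
        key = (st.st_dev, st.st_ino)
        if st.st_dev != top_dev or key in seen:
            continue                      # other filesystem, or already counted
        seen.add(key)
        total += st.st_blocks * 512       # assumption: 512-byte blocks
        if stat.S_ISDIR(st.st_mode):
            try:
                stack.extend(os.path.join(path, n) for n in os.listdir(path))
            except OSError:
                pass
    return total
```

A large gap between df_used_bytes(mount_point) and du_used_bytes(mount_point) is a hint that something is holding deleted files open, though filesystem metadata and reserved blocks account for some difference too.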
Just in case you need to find the pid of the process still holding the deleted file open, or even better, get access to the file itself, you can use the following command:
find -L /proc/*/fd -type f -links 0
(this works on Linux ; remove
-L on recent Solaris ; on other OSes, you can find the pid with
Each path this command returns can be opened with a program such as cat, which gives access to the deleted content.
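The same search can be written in a few lines of Python. This is a Linux-only sketch that relies on /proc/<pid>/fd entries resolving to the open file; the function name is an invention for this example.

```python
import glob
import os
import stat


def deleted_open_files(pid="*"):
    """Yield /proc/<pid>/fd/<n> paths whose target is a regular file with no link left."""
    for fd_path in glob.glob(f"/proc/{pid}/fd/*"):
        try:
            st = os.stat(fd_path)         # follows the link, like find -L
        except OSError:
            continue                      # process exited, or permission denied
        if stat.S_ISREG(st.st_mode) and st.st_nlink == 0:
            yield fd_path
```

Each result has the form /proc/PID/fd/FD, so the pid and the file descriptor number can be read straight from the path. Without root privileges, only your own processes will show up.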
I already addressed how to re-link such a file, which somehow works under Linux, but in my case, all that mattered was to really remove the file, this time. But we didn’t know if it was safe to stop the process still holding the file, nor how to properly restart it. We were left without a real resolution, but still needed to come up with something before the filesystem got really full, while waiting to be able to deal with the root of the problem.
The first crazy idea I had was to attach a debugger to the process, and use it to close the file descriptor and open a new file instead (I think you can find some examples with Google). But there was no debugger installed.
So, I had this other crazy idea: would
truncate() work on these
You know what? It does work. So I bought us some time by running:
perl -e 'truncate("/proc/$pid/fd/$fd", 0);'
(somehow, there is no standard executable to do a
truncate(), so I always resort to
Afterwards, I also verified the same works under Linux (where it wasn’t obvious what would happen, since these files show up as symbolic links to somewhere that doesn’t exist).
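If Python is at hand rather than perl, the same truncation can be sketched as follows. The helper name is mine, and fd_path is meant to be one of the /proc/PID/fd/FD paths found earlier, although any path to a regular file works.

```python
import os


def reclaim_space(fd_path):
    """Truncate the file behind fd_path to zero bytes; return the size given up."""
    before = os.stat(fd_path).st_size
    os.truncate(fd_path, 0)               # mirrors perl's truncate(path, 0)
    return before - os.stat(fd_path).st_size
```

One thing to keep in mind: if the process holding the file is not writing in append mode, its next write lands at its old offset, so the file may look big again while mostly being a hole.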
The even simpler following command works, too.
open() with O_WRONLY | O_CREAT | O_TRUNC, and
close() right after (to simplify), which has the same effect.
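Spelled out in Python, that open() and close() sequence would look roughly like this. It is a sketch with a helper name of my own, and the 0o644 mode only matters if the path does not exist yet.

```python
import os


def truncate_by_open(path):
    """Empty `path` by opening it with O_TRUNC and closing it right away."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.close(fd)
```

Because of O_CREAT, calling this on a path that doesn’t exist creates an empty file instead of failing, which is worth remembering before pointing it at a mistyped name.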
Good to know, isn’t it?