Find the Largest Files

The other day I was working on a server and needed to find the LARGEST files in a directory, including its subdirectories.

As it turns out, this is a simple task, because the find tool can filter its output by file size.

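The -size test sets the bounds of your output: a leading - means “less than”, a leading + means “more than”, and the M suffix counts in mebibytes (1,048,576 bytes). Let’s say you want to find the files smaller than 50 MB on your server: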

find / -type f -size -50M
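One subtlety: GNU find rounds each file’s size up to a whole unit before comparing, so -size -50M actually matches files of at most 49 MiB rather than everything under 50 MiB. The sketch below shows this on a scratch directory instead of /. It assumes GNU find and GNU coreutils (mktemp, truncate), and the file names are made up for the demo.

```shell
#!/bin/sh
# Scratch directory so the demo never touches the real filesystem.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sparse files: truncate sets the apparent size without writing any data.
truncate -s 10M      "$dir/small"      # 10 MiB
truncate -s 49M      "$dir/at-49M"     # exactly 49 MiB
truncate -s 51904512 "$dir/at-49.5M"   # 49.5 MiB in bytes, rounds up to 50

# Only files whose size rounds up to less than 50 MiB are listed.
matches=$(find "$dir" -type f -size -50M -exec basename {} \; | LC_ALL=C sort)
printf '%s\n' "$matches"
```

The 49.5 MiB file is left out. If you need an exact byte limit, the c (bytes) unit avoids the rounding, e.g. -size -52428800c for “under 50 MiB”.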

This prints the full path of each file, but not its size, so you can’t tell how big each one is. To improve this, we can run ls on each result:

find / -type f -size -50M -exec ls -lh {} \;

In this command, find replaces {} with each path it finds, and the ; is mandatory because it tells find where the -exec command ends. It is escaped with a backslash (\;) so that your shell passes it on to find instead of treating it as its own command separator.
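With \; find starts a separate command for every single file. POSIX find also accepts + as the terminator, in which case {} is replaced by as many paths as fit on one command line, much like xargs. The sketch below counts how often the command is launched with each terminator; the scratch directory and file names are assumptions for the demo.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/a" "$dir/b" "$dir/c"

# \; : the command runs once per file, so "run" is printed three times.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# + : {} expands to many paths at once, so here the command runs only once.
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "semicolon: $per_file launches, plus: $batched launches"

# The ls listing, batched into a single ls call:
find "$dir" -type f -exec ls -lh {} +
```

On a large tree, + saves forking one ls per file; \; is still needed when the command must see exactly one path at a time.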

This looks good, but we can improve it further by printing every size in the same unit (let’s say, megabytes). The problem is that although ls can print sizes in a given block size (GNU ls has --block-size for this), it rounds each size up to a whole multiple of that block size. So if the block size is 1 MB and a file is 900 KB, ls shows it as 1 MB.
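To see the rounding for yourself, the sketch below creates a 900 KiB file and compares its exact size with what ls reports using a 1 MiB block size. It assumes GNU coreutils: --block-size for ls and -c %s (size in bytes) for stat are GNU options.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
truncate -s 900K "$dir/file"   # 921600 bytes, a bit under 1 MiB

exact=$(stat -c %s "$dir/file")                                  # exact size in bytes
rounded=$(ls -l --block-size=M "$dir/file" | awk '{print $5}')   # ls rounds up

echo "exact: $exact bytes, ls --block-size=M: $rounded"
```

ls reports a full megabyte for a file that is only 921,600 bytes long, which is exactly the inaccuracy the awk division avoids.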

That isn’t very accurate, but we can work around it by letting awk do the arithmetic for us. By default ls -l prints sizes in bytes, so dividing by 1048576 (1024 * 1024) gives the exact size in megabytes. The line below prints the ls -l listing with sizes in megabytes:

ls -l | awk '{print $1 " " $2 " " $3 " " $4 " " $5/1048576 " " $6 " " $7 " " $8 " " $9}'

We only need the 5th and 9th columns, which hold the size and the file name respectively, so the command below will suffice:

ls -l | awk '{ print $5/1048576 " " $9 }'
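Run against a whole directory, ls -l also prints a “total” line first, which the awk above turns into a stray row. A small tweak, our own addition rather than part of the original command, is to skip the first record with NR > 1. The sketch assumes GNU coreutils and sets LC_ALL=C so the date takes up three columns and the file name stays in field 9; the file names are invented for the demo.

```shell
#!/bin/sh
export LC_ALL=C   # keep ls's date in the 3-column form, so the name is field 9
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
truncate -s 3M    "$dir/three"
truncate -s 1536K "$dir/one-and-a-half"   # 1.5 MiB

# NR > 1 skips the "total ..." header; $5 is bytes, 1048576 bytes = 1 MiB.
listing=$(ls -l "$dir" | awk 'NR > 1 { print $5/1048576 " " $9 }')
printf '%s\n' "$listing"
```

When ls -l is given a single file, as in the find commands that follow, there is no “total” line, so the plain version works there.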

Notice that this relies on a pipe, which brings us to another problem: -exec runs a single command directly, without a shell, so it can’t contain a pipe. The workaround is to have -exec start a shell instance (sh -c) and pass it the whole ls | awk line, pipe included, so that the new shell handles it for us.

find / -type f -size -50M -size +20M -exec sh -c "ls -l '{}'|awk '{print \$5/1048576 \" MB: \" \$9}'" \;

Let’s have a look at the command above. We narrowed the limits further, matching only files smaller than 50 MB and larger than 20 MB. We also passed the whole ls | awk line to a shell instance, with find substituting each path for {}. We wrapped {} in single quotes ('{}') so that a file name containing spaces reaches ls as a single argument (awk’s $9 will still show only the first word of such a name). We then piped the output to awk, divided the bytes, and added the string " MB: " right before printing the 9th column, which is the file path. Don’t forget to escape $5 and $9 with a backslash: inside double quotes, your own shell would otherwise expand them before find even runs, and awk would never see them. For the same reason, any double quotes inside the awk program must be written as \" so they don’t end the sh -c string early.
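Embedding '{}' inside the sh -c string works for ordinary names, but it breaks on a name that itself contains a single quote, and awk’s $9 keeps only the first word of a name with spaces. If you are on GNU find, one alternative (a different technique from the ls | awk approach, not the original command) is to let find print the size and path itself with -printf, which needs neither ls nor a nested shell. A sketch, assuming GNU find and coreutils, with made-up file names:

```shell
#!/bin/sh
export LC_ALL=C
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
truncate -s 30M "$dir/name with spaces"
truncate -s 22M "$dir/it's quoted"

# GNU find -printf: %s is the size in bytes, %p is the path.
# awk saves the size, strips it from the line, and prints the full remaining path.
# sort -nr -k1 puts the largest files first.
out=$(find "$dir" -type f -size -50M -size +20M -printf '%s %p\n' |
      awk '{ size = $1; sub(/^[0-9]+ /, ""); print size/1048576 " MB: " $0 }' |
      sort -nr -k1)
printf '%s\n' "$out"
```

For a real search, replace "$dir" with / and keep whatever -prune clauses you need.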

Now we have the output we need, but it isn’t in order! So let’s make things prettier: sort the results by size, largest first, and print each “MB:” label in bold for some eye candy.

find / -type f -size -50M -size +20M -exec sh -c "ls -l '{}'|awk '{print \$5/1048576 \" \033[1mMB:\033[0;0m \" \$9}'" \; | sort -nr -k1

Note that we pipe the output of the whole find command into sort, not the output of the shell instance started by -exec; that’s why the pipe comes right after \;.
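Here is the sorted, bold-label command run against a scratch directory instead of /, so you can check the ordering without scanning the whole disk. The directory and file names are assumptions for the demo. GNU find is assumed, because it substitutes {} even when it is embedded inside a longer argument, which POSIX leaves implementation-defined.

```shell
#!/bin/sh
export LC_ALL=C   # predictable ls date columns, so the path stays in field 9
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
truncate -s 21M "$dir/a"
truncate -s 45M "$dir/b"
truncate -s 30M "$dir/c"

# Same command as above, with / replaced by the scratch directory.
out=$(find "$dir" -type f -size -50M -size +20M -exec sh -c "ls -l '{}'|awk '{print \$5/1048576 \" \033[1mMB:\033[0;0m \" \$9}'" \; | sort -nr -k1)
printf '%s\n' "$out"

# The first field is the size, so the order is easy to check (largest first).
order=$(printf '%s\n' "$out" | awk '{print $1}' | paste -s -d ' ' -)
echo "$order"
```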

This command also searches /proc, a virtual filesystem whose entries are created and destroyed as processes start and exit, so find prints annoying errors such as “No such file or directory” while it runs. To avoid that, let’s tell find NOT to descend into /proc by using -prune:

find / -path '/proc' -prune -o -type f -size -50M -size +20M -exec sh -c "ls -l '{}'|awk '{print \$5/1048576 \" \033[1mMB:\033[0;0m \" \$9}'" \; | sort -nr -k1
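A detail that trips people up: -prune does not suppress find’s implicit -print, so if the right side of -o has only tests, find still prints the pruned directory’s own path. The command above avoids this because -exec is an action; when you only want to list paths, add an explicit -print. A sketch on a scratch directory, with hypothetical skip and keep subdirectories:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/skip" "$dir/keep"
truncate -s 30M "$dir/skip/big"
truncate -s 30M "$dir/keep/big"

# Without -print, the implicit -print also lists "$dir/skip" itself.
without=$(find "$dir" -path "$dir/skip" -prune -o -type f -size +20M)
# With an explicit -print, only matching files outside the pruned directory appear.
with=$(find "$dir" -path "$dir/skip" -prune -o -type f -size +20M -print)

printf 'without -print:\n%s\nwith -print:\n%s\n' "$without" "$with"
```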

You can skip more directories by adding another -path '/new/directory/to/prune' -prune -o clause before -type f.
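Chaining works because the implicit -a between tests binds more tightly than -o, so each “-path X -prune” pair is its own alternative. An equivalent, more compact form groups the paths in escaped parentheses. The sketch below, on a scratch directory with stand-in subdirectories named proc, sys and home, checks that both forms give the same result:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for d in proc sys home; do
    mkdir "$dir/$d"
    truncate -s 30M "$dir/$d/big"
done

# One "-path ... -prune -o" clause per directory.
chained=$(find "$dir" -path "$dir/proc" -prune -o -path "$dir/sys" -prune -o \
    -type f -size +20M -print)

# Same effect with the skipped paths grouped in escaped parentheses.
grouped=$(find "$dir" \( -path "$dir/proc" -o -path "$dir/sys" \) -prune -o \
    -type f -size +20M -print)

printf 'chained: %s\ngrouped: %s\n' "$chained" "$grouped"
```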

Hope this helps.
