Thursday, March 1, 2012

Getting folder sizes with PowerShell

Today was a good day.  I had an excuse to do some scripting.  One of the Systems Administration rules to live by is "Automate Everything".  Code is reusable; time spent clicking buttons in a GUI to get information is just that, time spent.  Time spent writing a script that gets the information for you in a repeatable way is time invested.  It may seem like the same amount of time the first time around, but it pays dividends the next time, when you don't have to sit at the GUI clicking.  You can also easily capture the information for further analysis.  Another nice thing about scripting is that you can schedule the collection of information and have it delivered to you.

PowerShell is an interesting environment.  It reminds me somewhat of a Linux shell and it can be scripted like Bash and Perl.  That's what the PowerShell developers were going for, I know, but it does make it much more likeable and usable than batch files and VBScript.  The other thing that I like about PowerShell is that it can easily give me access to .NET namespaces and their properties and methods.  I'm no C# programmer, but I've played with it a bit and it seems that PowerShell is the scripting version of C#.

So my task for today was to get the sizes of a bunch of folders on one of my file servers.  I wanted to know which ones would free up the most space if they were moved to another virtual disk on that virtual machine.  I had one hard disk that was getting full, and I would rather create another disk and move folders to an "archive" location than grow the disk or add another disk under a mount point.  Luckily for me the folders I am dealing with are already sorted by year, so it's just a matter of going back far enough to get the space I want without moving newer files, which would inconvenience my users.

So I put together a little script, tested it, and when I was satisfied that it would do what I wanted, I set it running and went to lunch.  When I got back I had the data that I wanted.  The primary function in the script was something that I came across a couple of years ago and tucked away in a code snippet file.  I honestly can't remember where I found it, otherwise I'd give credit where credit is due.  Here is the script:

#==========================================================================================
# PowerShell Script
#
# Name:    CalculateFolderSize.ps1
# Purpose: calculate the size of a folder and its subfolders
#          and return data in csv format
# Author:  Matthew Sanaker, matthew.sanaker.com
# Date:    3/1/2012
#
#==========================================================================================
#
#    USAGE:  CalculateFolderSize.ps1 calculates the size of a folder and its subfolders
#            the root folder and output file are passed as command-line arguments
#            data is returned in two comma delimited fields: folder name, size in GB
#            the output is one row per subfolder 

#
#==========================================================================================

param (
[parameter(Mandatory = $true)][system.IO.DirectoryInfo]$folder,
[parameter(Mandatory = $true)][string]$outFile,

[parameter(Mandatory = $false)][switch]$help
)

$showHelp = " `
    USAGE:  CalculateFolderSize.ps1 calculates the size of a folder and its subfolders
            the root folder and output file are passed as command-line arguments
            data is returned in two comma delimited fields:  folder name, size in GB
            the output is one row per subfolder
           
            example:  ./CalculateFolderSize.ps1 -folder C:\Data -outFile C:\dataSize.txt"
           
if ($help)
{
    $showHelp;
    # return exits the script cleanly; break is meant for leaving loops
    return;
}

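function Get-DirSize {
    param ([system.IO.DirectoryInfo] $dir)
    [decimal] $size = 0;

    # add up the lengths of the files directly inside this folder
    $files = $dir.GetFiles();
    foreach ($file in $files)
    {
        $size += $file.Length;
    }

    # recurse into each subfolder and add its total to the running size
    $dirs = $dir.GetDirectories();
    foreach ($d in $dirs)
    {
        $size += Get-DirSize $d;
    }
    return $size;
}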


try
{
    $subDirectories = $folder.GetDirectories()

    foreach ($dir in $subDirectories)
        {
        $size = Get-DirSize $dir;
        $GB = $size / 1GB;
        $foldername = $dir.FullName;
        $foldername + "," + $GB | Out-File -FilePath $outFile -Append -NoClobber;
        }
}
catch
{
    "Something does not compute, please check your input"
    "PowerShell Error Message: `
    "
    $error[0];
    $showHelp;
    return;
}

I'll now go over this a bit to explain what I did and why, so that if it doesn't suit your particular needs you will have a good idea of where to start taking it apart and changing it.

I always start my scripts out from a template with a header, and I like to set parameters from the command line instead of hard-coding things.  Setting "[parameter(Mandatory = $true)]" will cause PowerShell to prompt the user for parameters if they are not given when the script is called.  The first parameter, "[system.IO.DirectoryInfo]$folder", is the parent folder that you want to start searching from.  I used the .NET class as the object type because it seemed more straightforward than getting input as a string and converting it later to the object type that I want to work with.  The second parameter is "[string]$outFile", which I will use as the name of the output file.  The next bit is "[switch]$help", which I like to use to pass a friendly help message.
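To see what a typed parameter like this does, here is a minimal sketch.  The function name "Show-FolderParameter" is hypothetical and not part of the script; it only demonstrates that PowerShell converts a plain string argument into a DirectoryInfo object before the body runs.  The system temp folder is used as input simply because it should exist on most machines.

```powershell
# Hypothetical helper, not part of CalculateFolderSize.ps1.
# It shows how a [System.IO.DirectoryInfo] parameter receives a string argument.
function Show-FolderParameter {
    param (
        [Parameter(Mandatory = $true)][System.IO.DirectoryInfo]$folder
    )
    # By this point the string has already been converted to a DirectoryInfo.
    # The conversion itself does not check that the folder exists,
    # so the Exists property is reported alongside the full path.
    "{0} (exists: {1})" -f $folder.FullName, $folder.Exists
}

# Pass an ordinary string; PowerShell performs the conversion.
Show-FolderParameter -folder ([System.IO.Path]::GetTempPath())
```

Because the conversion does not validate the path, a mistyped folder name would only surface later, when a method such as GetDirectories() is called on it.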

The function that does all of the work is "Get-DirSize", which uses the "system.IO.DirectoryInfo" .NET class to work with the filesystem.  For some reason the "Get-ChildItem" cmdlet doesn't give you folder sizes in an intuitive way, so using the .NET class is actually more direct.  First each folder is entered and all of the file lengths are added up, then each subfolder is entered and the function is run recursively, adding the file lengths to the overall size that is kept in the variable "$size", which is finally returned as the value of the function.
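For comparison, here is a rough sketch of the same calculation done with "Get-ChildItem" and "Measure-Object" instead of the .NET class.  The function name "Get-DirSizeWithChildItem" is made up for this illustration and is not part of the script.  It assumes you want hidden files counted too, which is why "-Force" is used.

```powershell
# Hypothetical alternative to Get-DirSize, shown only for comparison.
function Get-DirSizeWithChildItem {
    param ([string]$path)

    # -Recurse walks every subfolder; -Force includes hidden and system files.
    # Filtering on PSIsContainer keeps files only, since folders have no Length.
    $files = @(Get-ChildItem -Path $path -Recurse -Force |
        Where-Object { -not $_.PSIsContainer })

    # Nothing to measure in an empty folder tree, so report zero bytes.
    if ($files.Count -eq 0) { return 0 }

    # Measure-Object adds up the Length property of every file.
    return ($files | Measure-Object -Property Length -Sum).Sum
}
```

Calling it as "Get-DirSizeWithChildItem -path C:\Data" should return a total in bytes, just as Get-DirSize does.  It takes a pipeline and a filter to get there, which is the sort of indirection the recursive function avoids.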

In the main part of the script I first call the "system.IO.DirectoryInfo.GetDirectories()" method on our "$folder" parameter to get the list of subfolders, which I keep in the variable "$subDirectories".  I then run a "foreach" loop over the array of sub-directory objects, which passes each folder to the Get-DirSize function to return its size.  Before going on to the next sub-directory object I convert the size, which is returned in bytes, to something more useful, which for me today was gigabytes, by saying "$GB = $size / 1GB".  Next I pull the name of the folder including the path using the "system.IO.DirectoryInfo.FullName" property of the sub-directory object.  Finally I concatenate the folder name, a comma and the folder size and pipe them to "Out-File", passing it the "$outFile" parameter that I specified on the command line along with the "-Append" and "-NoClobber" switches, so that when my script is done I have a nice little csv file.

The last thing that I want to point out is the error handling.  It's just a simple "try" and "catch" block, which could give you the opportunity to attempt to do something other than fail with an error.  In this case I display the error, show the help message and exit the script.  For a small script like this it's probably overkill, but I keep it in my template as a reminder for when I want to do something more complicated.

Now I can open that file in Excel to run auto-sum on all of the folder sizes, strip off meaningless decimal places or otherwise manipulate my data to get the answer that I'm looking for.
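If you would rather trim the decimal places before the data reaches Excel, the conversion and output steps described above could be varied along these lines.  This is only a sketch under a few assumptions: the function name "New-FolderSizeRow", the column names, the sample folder names and byte counts, and the output path in the temp folder are all invented for illustration.  In the real script the values would come from "$dir.FullName" and "Get-DirSize".

```powershell
# Hypothetical helper: builds one output row with the size rounded to 2 places.
function New-FolderSizeRow {
    param ([string]$folderName, [decimal]$sizeInBytes)

    # 1GB is a built-in PowerShell constant equal to 1073741824 bytes.
    $sizeGB = [math]::Round($sizeInBytes / 1GB, 2)

    # Select-Object fixes the column order, which a hashtable does not guarantee.
    New-Object PSObject -Property @{ FolderName = $folderName; SizeGB = $sizeGB } |
        Select-Object FolderName, SizeGB
}

# Sample values for illustration only.
$rows = @(
    New-FolderSizeRow -folderName 'C:\Data\2009' -sizeInBytes 1610612736
    New-FolderSizeRow -folderName 'C:\Data\2010' -sizeInBytes 5368709120
)

# Assumed output location; the script itself takes this from -outFile.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'folderSizes.csv'

# Export-Csv writes a header row followed by one line per object.
$rows | Export-Csv -Path $outFile -NoTypeInformation
```

The trade-off is that Export-Csv adds a header row and quotes each field, so the file looks a little different from the bare "name,size" lines that the script writes with Out-File.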
