Saturday, March 29, 2014

Stuff you didn’t know about ForEach in PowerShell

ForEach is a very useful tool.  I don’t have many scripts that don’t use it.  I’ve used it for years in hundreds of scripts.  But there is always more to learn.  I recently discovered some new things about it, and found that a lot of the information is not very discoverable and not well known.  So I thought I should share what I know about it.

This article is a mix of basic information, advanced techniques, and esoteric weirdness you’ll never need to know.  So take what’s useful to you and ignore the rest.

The first fun fact about ForEach is that it is actually two different things with the same name.  This was by design, as they intended for the two things to provide related behavior in different circumstances, and this was easier than trying to make one thing do everything.

ForEach #1 is an alias for the cmdlet ForEach-Object.  If you run Get-Help ForEach, you get the help file for ForEach-Object.

ForEach #2 is a language command (a statement built into PowerShell itself), as opposed to a cmdlet.  (Don’t ask me.  I didn’t come up with these stupid terms.  I normally refuse to use “cmdlet,” but the distinction needs to be made here.)  To get the help file for this one, you have to run Get-Help About_ForEach.

The help file for #2 discusses both of the ForEach’s, but the help file for #1 doesn’t mention #2.  This is unfortunate, as help file #1 is the one you get when you ask for help in the most logical way.  (And some of the information about them doesn’t really appear anywhere.  Hence this article.)
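If you want to check which ForEach is which on your own machine, here is a quick sketch.  It only assumes the default aliases have not been removed or redefined in your session.

```powershell
# ForEach #1: the alias.  Both ForEach and % resolve to the cmdlet ForEach-Object.
$AliasTarget   = ( Get-Alias -Name ForEach ).Definition
$PercentTarget = ( Get-Alias -Name '%' ).Definition

"ForEach is an alias for $AliasTarget"
"% is an alias for $PercentTarget"

# ForEach #2: the language command.  There is no alias or cmdlet behind it,
# so its documentation lives in a conceptual help topic instead.
# (Uncomment to read it; it needs the help files to be installed.)
# Get-Help about_ForEach
```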

Mostly you don’t have to care about which is which.  It’s like nouns in English.  You generally don’t worry about whether you are using a noun as the subject of a sentence or a direct object, it’s the same word either way.  But once in a while, you have to know the difference between “who” and “whom”.

The ForEach’s are both used for running the same bit of code for each item in a collection.  The one is generally used in the middle of a sentence, and the other at the beginning of a sentence or paragraph.

ForEach #1 (short for ForEach-Object, or long for % ) is normally used within a pipeline.

Well, the syntax is available to put it at the beginning, like this:

ForEach-Object -InputObject $SQLServices -Process { $_.Stop() }

(If you do this, you can’t use the alias ForEach; you have to spell out ForEach-Object to avoid confusion with the ForEach that normally goes at the beginning of sentences.)

But don’t do that.

Normally you will see it like this:

$SQLServices = Get-Service SQL* | Where-Object { $_.Status -eq "Running" }
$SQLServices | ForEach { $_.Stop() }

ForEach in this context just means that if there is a bunch of stuff coming down the line, send the items one at a time through the script block (the part between the curly braces).  Within the curly braces, you refer to the item as $_

ForEach #2 goes at the beginning of a ForEach loop.

ForEach ( $Service in $SQLServices ) { $Service.Stop() }

This usage makes it more understandable by giving us a variable name we can use for the item being processed, instead of using $_.  Plus, $_ can be used in several circumstances in PowerShell, and if you nest two of those circumstances, you may not be referencing the $_ you think you are.
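Here is a minimal sketch of that nesting problem, using made-up server and database names.  Inside the inner pipeline, $_ is the inner item, so if you still need the outer item you have to save it to a variable first.  The ForEach loop version never has that problem.

```powershell
# Hypothetical sample data, just for illustration
$Servers   = 'ServerA', 'ServerB'
$Databases = 'DB1', 'DB2'

# Nested pipelines: inside the inner script block, $_ is the database, not the server
$Pipeline = $Servers | ForEach-Object {
    $Server = $_    # Save the outer $_ before the inner pipeline replaces it
    $Databases | ForEach-Object { "$_ on $Server" }
    }

# Nested ForEach loops: every item has its own name, so there is nothing to mix up
$Loop = ForEach ( $Server in $Servers )
    {
    ForEach ( $Database in $Databases )
        {
        "$Database on $Server"
        }
    }

$Pipeline
$Loop
```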

ForEach loops are not confined to a single line, and in fact are usually only used for bigger loops.

For example:

ForEach ( $Service in $SQLServices )
    {
    If ( $Service.Status -eq "Running" )
        {
        "Stopping service $($Service.Name)."
        $Service.Stop()
        }
    Else
        {
        "Service $($Service.Name) already not running."
        }
    }

Note that because this ForEach is NOT short for “ForEach-Object”, if you try to use “ForEach-Object” here, you will break it.
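To see that for yourself without breaking a script, you can ask the PowerShell parser to check both versions as plain text.  This is only a sketch.  It assumes PowerShell 3.0 or later (for the Parser class), and nothing here actually runs, so $SQLServices is never touched.

```powershell
# The two lines to check, held as plain strings so nothing is executed
$LoopText   = 'ForEach ( $Service in $SQLServices ) { $Service.Stop() }'
$CmdletText = 'ForEach-Object ( $Service in $SQLServices ) { $Service.Stop() }'

$Tokens       = $null
$LoopErrors   = $null
$CmdletErrors = $null

# ParseInput only parses; it reports any syntax errors through the last [ref] parameter
$null = [System.Management.Automation.Language.Parser]::ParseInput( $LoopText,   [ref]$Tokens, [ref]$LoopErrors )
$null = [System.Management.Automation.Language.Parser]::ParseInput( $CmdletText, [ref]$Tokens, [ref]$CmdletErrors )

"ForEach loop syntax errors:   $( $LoopErrors.Count )"
"ForEach-Object syntax errors: $( $CmdletErrors.Count )"
```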

Now let’s get complicated.

Use ForEach loops instead of For loops

There are also things called For loops.  The syntax for For loops is ugly and non-intuitive and ugly.  Fortunately, you almost never need to use them.

For loops are mostly used for going through a loop a certain number of times.

The For loop for doing something ten times looks like this:

For ( $i=0; $i -lt 10 ; $i++ )
    {
    "How about now?"
    Sleep -Seconds 10
    }


Yikes.  If you didn’t know what that did, you would never know what that did.

So don’t do that.  Do this instead:

ForEach ( $i in 1..10 )
    {
    "How about now?"
    Sleep -Seconds 10
    }

If you haven’t seen it before, 1..10 is a wonderful PowerShell syntax which represents an array of the integers from 1 to 10.  Need to count backwards?  Simply use 10..1

You can also use calculated values by putting them in parentheses, like 0..( $Files.Count - 1 )
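A short sketch of those ranges in action.  The file names are hypothetical stand-ins for a real file list.  Note the last part: a calculated range built from an empty collection is not empty, because 0..-1 counts backwards.

```powershell
$Files = 'a.txt', 'b.txt', 'c.txt'    # Hypothetical stand-in for a real file list

$Up      = 1..5                        # 1, 2, 3, 4, 5
$Down    = 5..1                        # 5, 4, 3, 2, 1
$Indexes = 0..( $Files.Count - 1 )     # 0, 1, 2

ForEach ( $Index in $Indexes )
    {
    "$Index : $( $Files[$Index] )"
    }

# Careful with empty collections: 0..-1 counts backwards, it is not an empty range
$Empty    = @()
$Surprise = 0..( $Empty.Count - 1 )    # 0, -1
```

If the collection might be empty, check its Count before looping over a calculated range.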

If you need to go through each item and compare it to the one before, you can’t do that in a pipeline.  But we can reference the previous item in a ForEach loop by iterating through index numbers instead of through the items themselves.

In this example, the index numbers of the items in the $WeatherStats array are 0 through one less than the number of items in the array.  We use ForEach to loop through all of them, with the variable $Day holding the number we are currently on.  We are starting with 1 instead of 0 because we are looking at two-day totals.

"Two day precipitation running totals"
ForEach ( $Day in 1..( $WeatherStats.Count - 1 ) )
    {
    $WeatherStats[$Day].Date.ToString() + "`t" + ( $WeatherStats[$Day].Rainfall + $WeatherStats[$Day-1].Rainfall )
    }
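If you want something you can paste in and run, here is a self-contained sketch of the same idea.  The dates and rainfall numbers are made up, and it assumes PowerShell 3.0 or later for [pscustomobject].

```powershell
# Hypothetical sample data; Rainfall is in whole millimetres to keep the arithmetic obvious
$WeatherStats = @(
    [pscustomobject]@{ Date = [datetime]'2014-03-01'; Rainfall = 2 }
    [pscustomobject]@{ Date = [datetime]'2014-03-02'; Rainfall = 0 }
    [pscustomobject]@{ Date = [datetime]'2014-03-03'; Rainfall = 5 }
    [pscustomobject]@{ Date = [datetime]'2014-03-04'; Rainfall = 1 }
    )

# Start at index 1 so that $Day - 1 is always a valid index
$RunningTotals = ForEach ( $Day in 1..( $WeatherStats.Count - 1 ) )
    {
    $Today     = $WeatherStats[$Day]
    $Yesterday = $WeatherStats[$Day - 1]

    [pscustomobject]@{
        Date        = $Today.Date
        TwoDayTotal = $Today.Rainfall + $Yesterday.Rainfall
        }
    }

$RunningTotals
```

Four days of data give three two-day totals, because the first day has no previous day to add to.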

-Begin and -End

The script block in a standard pipeline ForEach is actually just one of three you can have.  The one we normally use by itself, which is processed once for each of the items in the pipeline, is called the -Process script block.  You can also add two others.

The -Begin script block is run just once, before the -Process script block starts processing the pipeline.  The -End script block runs just once, after all of the -Process script block iterations are finished.  $_ is not used because it’s empty within the -Begin and -End script blocks, as they are not processing individual items.

$Computers | ForEach -Begin { "Computer inventory" } -Process { $_.HostName } -End { "Total: " + $Computers.Count }

It also works to leave out the parameter names.

$Computers | ForEach { "Computer inventory" } { $_.HostName } { "Total: " + $Computers.Count }

Not that there is much need to ever use these.  Usually it’s much easier to just do that stuff before and after that line, instead of doing it in the middle of the pipeline.  I mostly put this section in because it comes up in the next one.
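For the rare case where you do want them, here is a sketch with made-up computer names.  The three script blocks run in the same scope, so a counter set up in -Begin can be incremented in -Process and reported in -End.  That is handy when the items are streaming in from a command and there is no collection to ask for a Count.

```powershell
$Computers = 'PC1', 'PC2', 'PC3'    # Hypothetical names standing in for real inventory objects

$Report = $Computers | ForEach-Object -Begin {
        $Count = 0                  # Runs once, before the first item
        'Computer inventory'
    } -Process {
        $Count++                    # Runs once per item; $_ is the current item
        $_
    } -End {
        "Total: $Count"             # Runs once, after the last item
    }

$Report
```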

-RemainingScripts

If you are looking at the help file reading about -Begin and -End, you will see there is a fourth parameter that takes a script block, -RemainingScripts.  You will also see there is no useful explanation for what it does, and no examples that use it.  Googling isn’t very helpful either.  What is it and what can I use it for?

Well, you don’t use it.  It’s not there for you to use intentionally.  It’s there as a programming cheat to compensate for a syntax problem you probably didn’t notice in my example above.

When you leave out parameter names in PowerShell, it makes assumptions about what you meant based on the order of the parameters.

For example, when you say

Add-ADGroupMember $FinanceGroup $PurchasingGroup

PowerShell assumes you want the second group added to the membership of the first group.

And when you say

$Computers | ForEach { $_.HostName }

PowerShell assumes the script block is a -Process script block.

The problem is when you want to say

$Computers | ForEach { "Computer inventory" } { $_.HostName } { "Total: " + $Computers.Count }

PowerShell again assumes that the first script block is the -Process script block.  But we want the -Begin block to be first, because that makes more sense.  So how do the -Begin and -End parameters know which positional parameters to take?

They don’t.  In fact, they don’t take any of the positional parameters.  -Process takes the first unspecified script block.  -RemainingScripts takes any additional script blocks.  If -RemainingScripts gets a block, some shuffling occurs:  The block -Process picked up gets moved to -Begin.  The first block -RemainingScripts picked up gets moved to -Process.  And if -RemainingScripts got a second block, it moves it to -End.

It’s kind of a neat solution, but it’s a little messy because it’s not completely invisible and not explained, leading me to spend a wasted hour trying to figure out what -RemainingScripts does.

It also leads to unexpected results in specific circumstances.

For example, this DOES NOT WORK as expected:

$Computers | ForEach { "Computer inventory" } -Process { $_.HostName }

Because -Process gets its named script block, the first block goes to -RemainingScripts.  That causes the -Process block to be moved to -Begin, and the block we wanted to go to -Begin goes to -Process.

So don’t mix named and positional parameters in ForEach.  Name them all or leave them all anonymous.
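Here is a small sketch you can run to compare the three styles side by side.  It uses the numbers 1 through 3 as stand-in pipeline input, and the strings are only labels so you can see which block ran when.

```powershell
# All positional: shuffled into -Begin, -Process, -End as described above
$Positional = 1..3 | ForEach-Object { 'begin' } { "item $_" } { 'end' }

# All named: no shuffling needed
$Named = 1..3 | ForEach-Object -Begin { 'begin' } -Process { "item $_" } -End { 'end' }

# Mixed: the unnamed block ends up running once per item instead of once at the start
$Mixed = 1..3 | ForEach-Object { 'begin' } -Process { "item $_" }

"Positional: $( $Positional -join ', ' )"
"Named:      $( $Named -join ', ' )"
"Mixed:      $( $Mixed -join ', ' )"
```

The first two lines of output should match each other.  The third should not.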

Slowing the flow; Workflow and ForEach -Parallel -ThrottleLimit

ForEach loops have a couple extra, interesting, very poorly documented parameters when they are used within a workflow.

PowerShell 3.0 introduced PowerShell workflows.  One of the benefits of a workflow is the ability to run processes in parallel, and one of the places you can do so is in a ForEach loop.

Simply add on a -Parallel parameter, and the ForEach doesn’t just run once for each item, it launches additional threads and runs all of the iterations simultaneously.

Workflow Test1
    {
    $Reports = Get-ChildItem D:\Reports
    ForEach -Parallel ( $Report in $Reports)
        {
        Process-Report $Report
        }
    }

This workflow will launch a separate thread for each $Report and process them all simultaneously.

Very efficient.  Until some idiot dumps 3000 reports in there.  That might be a problem.

If you are still on PowerShell 3.0, the problem (and the solution) are “fixed” for you, but not very well.  In PowerShell 3.0 (more accurately, in the version of Windows Workflow Foundation that PowerShell 3.0 calls), there is a hard coded thread limit of 5.

So in PowerShell 3.0, it doesn’t really run all of them at the same time, it runs them in batches of 5.  If that’s too many or too few, tough.  It’s 5.  And it won’t start on report number 6 until all 5 in the batch are finished.

You can resort to code tricks to adjust the behavior.  If you want just two threads at a time, run two non-parallel ForEach loops in parallel, each with half the load.  To run more than 5 threads, you can run parallel processes within parallel processes.

But the better solution is to upgrade to 4.0.  There is no reason not to.  Do it.  Do it now.  If some IT manager won’t let you, politely walk up to them and kick them in the shin.  Do it.  Do it now.

In 4.0 workflows, they opened the floodgates and they let the work flow.  Fortunately, they also gave us a control to slow the flow.  The -ThrottleLimit parameter specifies the maximum number of threads the ForEach loop can use.  They also made it smart enough to put each thread to work on a new iteration as soon as the thread becomes available, instead of waiting for entire batches.

Workflow Test1
    {
    $Reports = Get-ChildItem D:\Reports
    ForEach -Parallel -ThrottleLimit 50 ( $Report in $Reports)
        {
        Process-Report $Report
        }
    }

Twin Cities PowerShell User Group

ForEach ( $Meeting in $SecondTuesdayOfTheMonth )
    {
    If ( $You -in @( "Minneapolis area" ) )
        {
        $Information = Start-Process http://tcposhug.com
        $Please -join $Us
        }
    }

3 comments:

  1. How sure are you this is improved in PowerShell v4?

    I wrote a workflow, and it appears to me that it is limited to 5 threads. The script is for executing a function against all DBs on each SQL Server in a list. I have two nested ForEach -Parallel loops in the workflow. It should execute simultaneously on all SQL Servers, but limit itself to 10 DBs per server at a time. Take this snippet:

    ForEach -Parallel -ThrottleLimit $ServerList.Count ( $Server in $ServerList )
    {
    Write-Verbose "Starting $Server"
    # some logic
    ForEach -Parallel -ThrottleLimit 10 ( $database in $databaseServers )
    {
    Write-Verbose "Starting $database on $Server"
    Start-Sleep -seconds 10
    }
    }

    With the above script, I would get:

    Starting ServerA
    Starting ServerB
    Starting ServerC
    ....
    Starting ServerD

    Then I would Get

    Starting DB1 on Server1
    Starting DB2 on Server1
    Starting DB3 on Server1
    Starting DB4 on Server1
    Starting DB5 on Server1

    Then, my output would pause for the 10 second sleep, and then continue:

    Starting DB6 on Server1
    Starting DB7 on Server1
    ....
    Starting DB10 on Server1

    Then stop.

    This is on v4.0, and it clearly indicates that it is only doing up to 5 threads at a time.

  2. I (somewhat) retract my previous comment.

    It turns out the behavior only occurs if there is an InlineScript{} block in the ForEach. But, if there is, it does limit the execution to 5 threads.

  3. Mr. Safety,

    Your script works for me. I get ten threads per server between each 10 seconds pause.

    Tim Curwick
    MadWithPowerShell
