Saturday, March 29, 2014

Stuff you didn’t know about ForEach in PowerShell

ForEach is a very useful tool.  I don’t have many scripts that don’t use it.  I’ve used it for years in hundreds of scripts.  But there is always more to learn.  I recently discovered some new things about it, and found that a lot of the information is not very discoverable and not well known.  So I thought I should share what I know about it.

This article is a mix of basic information, advanced techniques, and esoteric weirdness you’ll never need to know.  So take what’s useful to you and ignore the rest.

The first fun fact about ForEach is that it is actually two different things with the same name.  This was by design, as they intended for the two things to provide related behavior in different circumstances, and this was easier than trying to make one thing do everything.

ForEach #1 is an alias for the cmdlet ForEach-Object.  If you run Get-Help ForEach, you get the help file for ForEach-Object.

ForEach #2 is a language statement (a keyword built into PowerShell itself), as opposed to a cmdlet.  (Don’t ask me.  I didn’t come up with these stupid terms.  I normally refuse to use “cmdlet,” but the distinction needs to be made here.)  To get the help file for this one, you have to run Get-Help About_ForEach.

The help file for #2 discusses both of the ForEach’s, but the help file for #1 doesn’t mention #2.  This is unfortunate, as help file #1 is the one you get when you ask for help in the most logical way.  (And some of the information about them doesn’t really appear anywhere.  Hence this article.)

Mostly you don’t have to care about which is which.  It’s like nouns in English.  You generally don’t worry about whether you are using a noun as the subject of a sentence or as a direct object; it’s the same word either way.  But once in a while, you have to know the difference between “who” and “whom”.

The ForEach’s are both used for running the same bit of code for each item in a collection.  One is generally used in the middle of a sentence, and the other at the beginning of a sentence or paragraph.

ForEach #1 (short for ForEach-Object, or long for % ) is normally used within a pipeline.

Well, the syntax does let you put it at the beginning, like this:

ForEach-Object -InputObject $SQLServices -Process { $_.Stop() }

(If you do this, you can’t use the alias ForEach; you have to spell out ForEach-Object to avoid confusion with the ForEach that normally goes at the beginning of sentences.)

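But don’t do that.  When you hand a collection to -InputObject, ForEach-Object doesn’t unroll it: the script block runs just once, with $_ holding the entire collection.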
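Here is a small test you can paste into a console to see what -InputObject does with a collection. The $Numbers array is just sample data.

```powershell
$Numbers = 1, 2, 3

# -InputObject does not unroll the collection: the block runs once,
# and $_ is the whole array, so $_.Count is 3.
$ViaInputObject = ForEach-Object -InputObject $Numbers -Process { $_.Count }

# Piping unrolls it: the block runs three times, once per number.
$ViaPipeline = $Numbers | ForEach-Object -Process { $_ * 10 }

$ViaInputObject   # 3
$ViaPipeline      # 10, 20, 30
```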

Normally you will see it like this:

$SQLServices = Get-Service SQL* | Where-Object { $_.Status -eq "Running" }
$SQLServices | ForEach { $_.Stop() }

ForEach in this context just means that if there is a bunch of stuff coming down the line, send the items one at a time through the script block (the part between the curly braces).  Within the curly braces, you refer to the current item as $_.
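A minimal, self-contained version of the same pattern, using a made-up list of names instead of services so you can run it anywhere:

```powershell
# Sample data; any collection works the same way.
$Names = 'alpha', 'beta', 'gamma'

# Each name comes down the pipeline one at a time and is available as $_.
$Descriptions = $Names | ForEach-Object { "$_ has $($_.Length) letters" }

$Descriptions
# alpha has 5 letters
# beta has 4 letters
# gamma has 5 letters
```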

ForEach #2 goes at the beginning of a ForEach loop.

ForEach ( $Service in $SQLServices ) { $Service.Stop() }

This usage makes it more understandable by giving us a variable name we can use for the item being processed, instead of using $_.  Plus, $_ can be used in several circumstances in PowerShell, and if you nest two of those circumstances, you may not be referencing the $_ you think you are.
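To see the difference, here is the same nested job written both ways, with made-up sample arrays. In the pipeline version, the inner $_ hides the outer one, so the outer item has to be stashed in a variable first. In the ForEach loop version, each level already has its own name.

```powershell
# Sample data for illustration only.
$Letters = 'a', 'b'
$Numbers = 1, 2

# Pipeline version: inside the inner block, $_ is the number, so the
# letter has to be saved in a named variable before the inner pipeline runs.
$FromPipeline = $Letters | ForEach-Object {
    $Letter = $_
    $Numbers | ForEach-Object { "$Letter$_" }
}

# ForEach loop version: every level has its own name; nothing is hidden.
$FromLoops = ForEach ( $OuterLetter in $Letters )
    {
    ForEach ( $Number in $Numbers )
        {
        "$OuterLetter$Number"
        }
    }

$FromPipeline -join ', '   # a1, a2, b1, b2
$FromLoops -join ', '      # a1, a2, b1, b2
```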

ForEach loops are not confined to a single line, and in fact are usually only used for bigger loops.

For example:

ForEach ( $Service in $SQLServices )
    {
    If ( $Service.Status -eq "Running" )
        {
        "Stopping service $($Service.Name)."
        $Service.Stop()
        }
    Else
        {
        "Service $($Service.Name) already not running."
        }
    }

Note that because this ForEach is NOT short for “ForEach-Object”, if you try to use “ForEach-Object” here, you will break it.

Now let’s get complicated.

Use ForEach loops instead of For loops

There are also things called For loops.  The syntax for For loops is ugly and non-intuitive and ugly.  Fortunately, you almost never need to use them.

For loops are mostly used for going through a loop a certain number of times.

The For loop for doing something ten times looks like this:

For ( $i=0; $i -lt 10 ; $i++ )
    {
    "How about now?"
    Sleep -Seconds 10
    }


Yikes.  If you didn’t know what that did, you would never know what that did.

So don’t do that.  Do this instead:

ForEach ( $i in 1..10 )
    {
    "How about now?"
    Sleep -Seconds 10
    }

If you haven’t seen it before, 1..10 is a wonderful PowerShell syntax (the range operator) which represents an array of the integers from 1 to 10.  Need to count backwards?  Simply use 10..1.

You can also use calculated values by putting them in parentheses, like 0..( $Files.Count - 1 )
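A few quick checks of the range operator, including one gotcha worth knowing about: with an empty collection, the calculated range counts down instead of producing nothing. ($Files here is a stand-in array of names, not real files.)

```powershell
$Up   = 1..10      # 1, 2, 3 ... 10
$Down = 10..1      # 10, 9, 8 ... 1

# A calculated range: every valid index of a (hypothetical) file list.
$Files   = 'a.txt', 'b.txt', 'c.txt'
$Indexes = 0..( $Files.Count - 1 )    # 0, 1, 2

# The gotcha: for an empty list this becomes 0..-1, which counts DOWN
# and gives you 0 and -1 instead of nothing. Check the count first.
$NoFiles = @()
$Oops    = 0..( $NoFiles.Count - 1 )  # 0, -1
```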

If you need to go through each item and compare it to the one before, you can’t do that in a pipeline.  But we can reference the previous item in a ForEach loop by iterating through index numbers instead of through the items themselves.

In this example, the index numbers of the items in the $WeatherStats array are 0 through one less than the number of items in the array.  We use ForEach to loop through all of them, with the variable $Day holding the number we are currently on.  We are starting with 1 instead of 0 because we are looking at two-day totals.

"Two day precipitation running totals"
ForEach ( $Day in 1..( $WeatherStats.Count - 1 ) )
    {
    $WeatherStats[$Day].Date.ToString() + "`t" + ( $WeatherStats[$Day].Rainfall + $WeatherStats[$Day-1].Rainfall )
    }
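Here is a runnable version of that loop with some made-up sample data, so you can see the indexing at work. It assumes each day is an object with Date and Rainfall properties (my assumption about the shape of $WeatherStats), and it returns objects instead of strings, which makes the results easier to sort, filter, or export.

```powershell
# Made-up sample data: one object per day, with Date and Rainfall properties.
$WeatherStats = @(
    [pscustomobject]@{ Date = [datetime]'2014-03-01'; Rainfall = 0.5 }
    [pscustomobject]@{ Date = [datetime]'2014-03-02'; Rainfall = 1.0 }
    [pscustomobject]@{ Date = [datetime]'2014-03-03'; Rainfall = 0.25 }
)

# Start at index 1 so that $Day - 1 always points at a real previous day.
# (Assumes at least two days of data; with only one, 1..0 counts down.)
$RunningTotals = ForEach ( $Day in 1..( $WeatherStats.Count - 1 ) )
    {
    [pscustomobject]@{
        Date        = $WeatherStats[$Day].Date
        TwoDayTotal = $WeatherStats[$Day].Rainfall + $WeatherStats[$Day - 1].Rainfall
        }
    }

$RunningTotals   # TwoDayTotal values: 1.5, 1.25
```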

-Begin and -End

The script block in a standard pipeline ForEach is actually just one of three you can have.  The one we normally use by itself, which is processed once for each of the items in the pipeline, is called the -Process script block.  You can also add two others.

The -Begin script block is run just once, before the -Process script block starts processing the pipeline.  The -End script block runs just once, after all of the -Process script block iterations are finished.  $_ is not used because it’s empty within the -Begin and -End script blocks, as they are not processing individual items.

$Computers | ForEach -Begin { "Computer inventory" } -Process { $_.HostName } -End { "Total: " + $Computers.Count }

It also works to leave out the parameter names.

$Computers | ForEach { "Computer inventory" } { $_.HostName } { "Total: " + $Computers.Count }

Not that there is much need to ever use these.  Usually it’s much easier to just do that stuff before and after that line, instead of doing it in the middle of the pipeline.  I mostly put this section in because it comes up in the next one.
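If you want to try it anyway, here is a runnable version with a hypothetical $Computers list. The backticks at the ends of the lines are line continuations, used here just to line up the three blocks.

```powershell
# Hypothetical inventory: three objects with a HostName property.
$Computers = @(
    [pscustomobject]@{ HostName = 'WEB01' }
    [pscustomobject]@{ HostName = 'WEB02' }
    [pscustomobject]@{ HostName = 'SQL01' }
)

$Inventory = $Computers | ForEach-Object `
    -Begin   { "Computer inventory" } `
    -Process { $_.HostName } `
    -End     { "Total: " + $Computers.Count }

$Inventory   # Computer inventory, WEB01, WEB02, SQL01, Total: 3
```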

-RemainingScripts

If you are looking at the help file reading about -Begin and -End, you will see there is a fourth parameter that takes a script block, -RemainingScripts.  You will also see there is no useful explanation for what it does, and no examples that use it.  Googling isn’t very helpful either.  What is it and what can I use it for?

Well, you don’t use it.  It’s not there for you to use intentionally.  It’s there as a programming cheat to compensate for a syntax problem you probably didn’t notice in my example above.

When you leave out parameter names in PowerShell, it makes assumptions about what you meant based on the order of the parameters.

For example, when you say

Add-ADGroupMember $FinanceGroup $PurchasingGroup

PowerShell assumes you want the second group added to the membership of the first group.

And when you say

$Computers | ForEach { $_.HostName }

PowerShell assumes the script block is a -Process script block.

The problem is when you want to say

$Computers | ForEach { "Computer inventory" } { $_.HostName } { "Total: " + $Computers.Count }

PowerShell again assumes that the first script block is the -Process script block.  But we want the -Begin block to be first, because that makes more sense.  So how do the -Begin and -End parameters know which positional parameters to take?

They don’t.  In fact, they don’t take any of the positional parameters.  -Process takes the first unspecified script block.  -RemainingScripts takes any additional script blocks.  If -RemainingScripts gets a block, some shuffling occurs:  The block -Process picked up gets moved to -Begin.  The first block -RemainingScripts picked up gets moved to -Process.  And if -RemainingScripts got a second block, it moves it to -End.
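You can watch the shuffle happen with a quick comparison: three unnamed blocks give exactly the same output as the fully named version.

```powershell
# Three unnamed blocks: the first binds to -Process, the other two to
# -RemainingScripts, and ForEach-Object then shuffles them into
# Begin, Process, and End.
$Positional = 1..3 | ForEach-Object { "start" } { "item $_" } { "end" }

# The fully named version, for comparison.
$Named = 1..3 | ForEach-Object -Begin { "start" } -Process { "item $_" } -End { "end" }

$Positional -join ', '   # start, item 1, item 2, item 3, end
$Named -join ', '        # start, item 1, item 2, item 3, end
```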

It’s kind of a neat solution, but it’s a little messy because it’s not completely invisible and not explained, leading me to spend a wasted hour trying to figure out what -RemainingScripts does.

It also leads to unexpected results in specific circumstances.

For example, this DOES NOT WORK as expected:

$Computers | ForEach { "Computer inventory" } -Process { $_.HostName }

Because -Process gets its named script block, the unnamed block goes to -RemainingScripts.  That causes the -Process block to be moved to -Begin, and the block we wanted in -Begin goes to -Process.

So don’t mix named and positional parameters in ForEach.  Name them all or leave them all anonymous.
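Here is that trap in miniature, next to the fix, using a simple 1..2 range instead of real computers.

```powershell
# Mixed: the named -Process block is bound first, so the unnamed block
# lands in -RemainingScripts and the two trade places.
$Mixed = 1..2 | ForEach-Object { "Header" } -Process { "Item $_" }
# Result: "Item ..." runs once as the -Begin block (where $_ is empty),
# then "Header" runs once per item as the -Process block.

# All named: does what it says.
$AllNamed = 1..2 | ForEach-Object -Begin { "Header" } -Process { "Item $_" }
# Result: Header, Item 1, Item 2
```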

Slowing the flow; Workflow and ForEach -Parallel -ThrottleLimit

ForEach loops have a couple of extra, interesting, very poorly documented parameters when they are used within a workflow.

PowerShell 3.0 introduced PowerShell workflows.  One of the benefits of a workflow is the ability to run processes in parallel, and one of the places you can do so is in a ForEach loop.

Simply add a -Parallel parameter, and the ForEach doesn’t just run once for each item; it launches additional threads and runs all of the iterations simultaneously.

Workflow Test1
    {
    $Reports = Get-ChildItem D:\Reports
    ForEach -Parallel ( $Report in $Reports)
        {
        Process-Report $Report
        }
    }

This workflow will launch a separate thread for each $Report and process them all simultaneously.

Very efficient.  Until some idiot dumps 3000 reports in there.  That might be a problem.

If you are still on PowerShell 3.0, the problem and its solution are both “fixed” for you, but not very well.  In PowerShell 3.0 (more accurately, in the version of Windows Workflow Foundation that PowerShell 3.0 calls), there is a hard-coded thread limit of 5.

So in PowerShell 3.0, it doesn’t really run all of them at the same time, it runs them in batches of 5.  If that’s too many or too few, tough.  It’s 5.  And it won’t start on report number 6 until all 5 in the batch are finished.

You can resort to code tricks to adjust the behavior.  If you want just two threads at a time, run two non-parallel ForEach loops in parallel, each with half the load.  To run more than 5 threads, you can run parallel processes within parallel processes.

But the better solution is to upgrade to 4.0.  There is no reason not to.  Do it.  Do it now.  If some IT manager won’t let you, politely walk up to them and kick them in the shin.  Do it.  Do it now.

In 4.0 workflows, they opened the floodgates and they let the work flow.  Fortunately, they also gave us a control to slow the flow.  The -ThrottleLimit parameter specifies the maximum number of threads the ForEach loop can use.  They also made it smart enough to start a new iteration as soon as a thread becomes free, instead of waiting for an entire batch to finish.

Workflow Test1
    {
    $Reports = Get-ChildItem D:\Reports
    ForEach -Parallel -ThrottleLimit 50 ( $Report in $Reports)
        {
        Process-Report $Report
        }
    }
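One footnote for readers on newer versions: workflows did not make the jump to PowerShell 6 and later. On PowerShell 7 or later, ForEach-Object itself has -Parallel and -ThrottleLimit parameters that cover the same ground (its -ThrottleLimit defaults to 5). A minimal sketch, with made-up report names standing in for real files and a string standing in for the real processing:

```powershell
# Made-up report names stand in for Get-ChildItem D:\Reports.
$Reports = 1..12 | ForEach-Object { 'Report{0}.txt' -f $_ }

# At most 4 iterations run at once; a new one starts as soon as one finishes.
$Processed = $Reports | ForEach-Object -Parallel {
    # Each iteration runs in its own runspace. Variables from the calling
    # script are not visible here unless you reference them as $using:Name.
    "Processed $_"
} -ThrottleLimit 4

# Parallel results can come back in any order.
$Processed = $Processed | Sort-Object
```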

Twin Cities PowerShell User Group

ForEach ( $Meeting in $SecondTuesdayOfTheMonth )
    {
    If ( $You -in @( "Minneapolis area" ) )
        {
        $Information = Start-Process http://tcposhug.com
        $Please -join $Us
        }
    }

Saturday, March 15, 2014

Ensuring your AD commands run against a local DC in PowerShell

When they finally, finally upgraded our workstations from Windows XP to Windows 7, I immediately installed all of the RSAT tools (Remote Server Administration Tools), including the PowerShell components, and started playing with them.

It was nice to have native AD commands instead of having to install a third-party tool to fill the gap (so that we don't have to type the Q anymore, I guess).  But it was slow.  Slow, slow, slow.

But it wasn't the commands, it was my network.  I was working on a new network we were setting up, with slow WAN connections, and DNS had not yet been configured to preferentially return site-based results.  So my queries were sometimes going to domain controllers in Europe or Asia, or even the deep South.

I had to find a PowerShell trick to keep it local.

The trick is to look for a domain controller first.  The Get-ADDomainController command has a -Discover switch, which uses the DC Locator process instead of just DNS and is more intelligent about finding the closest DC.  I can't guarantee it will work in every environment, but for me it reliably returned a local DC.

So, query for a local DC, and then specify the DC in your subsequent AD queries.  As we had multiple domains, I also specified which one we wanted.

$DC = ( Get-ADDomainController -Discover -Domain Contoso.local ).HostName[0]
Get-ADUser -Filter { Name -like "Tim*" } -Server $DC
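If you do this in a lot of scripts, it might be worth wrapping in a small helper. This is a sketch only: the function name Get-LocalDomainController is my own invention, and it needs the ActiveDirectory module from RSAT.

```powershell
# Sketch only: the function name is hypothetical. Needs the ActiveDirectory
# module (RSAT) for Get-ADDomainController.
function Get-LocalDomainController
    {
    param ( [Parameter( Mandatory )] [string] $DomainName )

    # -Discover asks DC Locator for a nearby DC rather than taking whatever
    # DNS hands back first. -ErrorAction Stop turns a failed lookup into a
    # terminating error instead of silently returning nothing.
    $Found = Get-ADDomainController -Discover -DomainName $DomainName -ErrorAction Stop

    # With -Discover, HostName is a collection; return the first name.
    $Found.HostName[0]
    }

# Usage: pick the DC once, then pass it to every related AD command.
#   $DC = Get-LocalDomainController -DomainName Contoso.local
#   Get-ADUser -Filter { Name -like "Tim*" } -Server $DC
```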

This trick came in handy recently to simplify a different problem.

I have been working with Microsoft Service Management Automation, which is PowerShell workflow based.  There are many interesting challenges to scripting in that environment.  One is that different parts of your script might run in different workflow or PowerShell instances (sometimes even on different “runbook worker” servers).  That means different AD commands may run against different domain controllers, which adds the complication of AD replication latency.  The simple solution is to use the command above to pick a single domain controller to run all of your related commands against.

The purpose of scripting: Using why we script to inform how we script in PowerShell

To properly understand how to write a given script, we have to understand why we are writing it.

That's easy, you say.  I'm writing this script to collect data about my servers.  Or to perform daily maintenance tasks.  Or to build new servers.  Or to automate adding new users to our systems.

But those answers are incomplete.  They don't address the big picture.

I'm writing this script to collect data about my servers so that I don't have to do it manually.  Or to perform daily maintenance tasks more reliably and at a time of day when my staff is asleep.  Or to build new servers quickly, consistently, and cheaply.  Or to automate adding new users to our systems, directly empowering my HR staff and freeing up IT support resources for more challenging tasks.

That can be generalized into a standard definition of why we do what we do.

The purpose of scripting is to assist in performing some function by optimizing the resources of the system in which the function is performed.

That sentence is a little dense, but it is important to let it all sink in, because this is what we all do professionally, and it is what everything else here flows from.  So let me say that again a little slower.

The purpose of scripting
is to assist
in performing some function
by optimizing the resources
of the system in which the function is performed.

This allows us to analyze our purpose, and to develop a way of thinking about our scripting that will help us make better, easier decisions about our scripts.

The key to understanding what that means, and the implications it has for us and our scripts, is the fact that the system is not just the box.  It isn't just the box on our desk or the big box of a datacenter with lots of smaller boxes within.  The system includes the people.  It includes the people for whom the function is performed.  It includes the people responsible for performing the function.  And it includes the people who create and maintain the script.  It includes us.

So when I sometimes say that scripters are lazy, that is just playful shorthand for saying we optimize the resources of the system.

Software developers do this as well, but they balance the requirements differently than scripters, resulting in different design decisions and coding styles.

Here are the four and a half guiding principles supporting the purpose of scripting.

1. The script needs to reliably perform the desired function.
2. The script needs to be easily writeable.
3. The script needs to be easily readable, understandable, and maintainable.
4. The script needs to optimize the resources it uses to run.

These are the requirements you need to balance when designing and writing your script.  You can't do them all perfectly.  You focus on just one at your peril.  You can't balance them the same way on every script; each script is different, each function is different, and each system is different.

But once you understand the principles, and understand that they each only exist to serve the purpose, the reason we are scripting in the first place, it becomes much easier to make the many decisions and trade-offs we make with every script.

1. The script needs to reliably perform the desired function.

The script needs to work.  It needs to work well.  It needs to work reliably.  That's obvious.

But how well, how thoroughly, and how reliably?  That's different for every script, as the balance with the other requirements is different in every script.

If a script is going to run unattended in the middle of the night on a customer-facing production system, it needs to be well-tested and have lots of built-in error handling.  But if I'm sitting at my desk watching it run, with time to tweak it and run it again if it fails, it doesn't need to be as rigorous.

How much time I spend making the output look pretty varies greatly between output that gets automatically emailed to the CEO and output that I'm personally reading off the screen.

In some cases, the script does not need to perform the entire function.  For an ad hoc report, it might be faster to just have the script give me the raw data, and I can pretty it up more effectively and efficiently in Excel.

2. The script needs to be easily writeable.

The whole point of a script is to automate something, to make it less work for me.  If writing the script is a lot of work, that defeats the whole purpose of the script.

Jeffrey Snover and his team and their successors did a wonderful job creating PowerShell with scripters in mind, making it powerful but easy to use, comprehensive but discoverable.  Paradoxically, one of the ways they made it easier for us was by making it possible to do things the hard way.  This allows us on the ground to decide when and where it is appropriate and necessary to put in the extra work or extra complexity, and where we can do things more simply.

Commands and snippets that are easy to remember, easy to type, forgiving of variations of syntax or runtime environment, and easy to troubleshoot are preferable to those that are not.  Readability is not just about the future reader of the script (as in the next section).  It's also about working with your script while you are writing it.  Effective use of white space can help considerably when you are looking for an errant parenthesis.

Using aliases on the PowerShell command line can be a wonderful time saver.  Writing elegant, dense one-liners is uniquely satisfying.   But when writing scripts, we save more time by not using most aliases, by writing out the commands in more human-friendly language, by adding white space and line continuations that make it easier to work on the script.
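A small illustration of the difference, using sample numbers so it runs anywhere. Both pipelines return the squares of the multiples of 3 up to 20; only the second is pleasant to come back to next year.

```powershell
$Numbers = 1..20   # sample data

# At the prompt: aliases and one dense line are fine.
$Terse = $Numbers | ? { $_ % 3 -eq 0 } | % { $_ * $_ }

# In a script: full command names, one step per line. A pipe at the end
# of a line tells PowerShell the command continues on the next line.
$Readable = $Numbers |
    Where-Object { $_ % 3 -eq 0 } |
    ForEach-Object { $_ * $_ }

$Readable -join ', '   # 9, 36, 81, 144, 225, 324
```

A trailing pipe is often a tidier line continuation than a backtick, because it cannot be broken by invisible trailing whitespace.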

Always using the right variable types makes your scripts go faster.  But always using a variable type that is easy to remember, easy to type, and easy to read, makes me go faster.  Which is more valuable and which is the better resource to expend on this project:  200 milliseconds of CPU at runtime or 20 minutes of my time?

In some cases, it is better to spend the 20 minutes shaving 200 milliseconds off of a subroutine.  For software developers, this is usually the case.  For scripters, this is almost never the case.

3. The script needs to be easily readable, understandable, and maintainable.

The system doesn't just include the people writing and using the script.  It also includes the person that has to pull up the script next week or next year and figure out what it does or why it doesn't work anymore or how to adapt it to new requirements or how to borrow chunks of it to adapt for other scripts.

Everything you do while writing your script needs to balance the needs of whoever might someday read or modify your script.

If you are working on the command line or working on an ad hoc script that is going to be deleted as soon as you are done with it, feel free to write it in whatever way is easiest for you to write.  That said, I think you will find it easier in the long run to use a single writing style for everything.  My only concessions in those cases are fewer comments, less consistent capitalization, and simpler variable names.

If you are writing a script that will be read again--even if it's just by you picking up the next day where you left off--you need to write for the reader.

Use comments.  A competent, experienced scripter can look at a line of code and figure out what it does.  But that takes time and effort.  It takes less time and effort to add that information to the script while you are writing it.  What is the script assuming about the environment it is running in or its inputs or how it will be used or what the results will be used for?  Add comments explaining all of that.  You will save hours or days of troubleshooting time when the assumptions are no longer valid next year.

Use white space.  Vertical and horizontal.  Use line continuations to break up dense commands.  Make it easy to see the sub clauses of complex statements, to see what goes together in chunks, to see the hierarchy of your code.

Use human-style words and syntax wherever possible.  Use variable names that not only describe their use well, but also help make complete sentences in your commands.  Spell out commands and parameter names instead of using aliases, except for those rare cases where the alias is easier to read and understand than the full command.

Given a choice and all else being equal, use the command or syntax that is more intuitive to the person modifying it later.  You already have a comment stating what you are doing; if it isn't obvious how you are accomplishing that, explain it in the comment.  If you made an unusual choice for a non-obvious reason, explain it in a comment, so that there are not unintended consequences later when someone "fixes" it.  If the need for a section of code or chosen method will become obsolete in the future and can or should be changed when circumstances change, put it in a comment.

If your comments are so numerous or wordy or redundant that they start to interfere with reading and understanding the script, cut them back.  Everything in balance.

(That's the one I'm counting as one and a half, if you're wondering.  The three needs covered are too small and overlapping to be discussed or counted separately, but together they are too big to be just one.)

4. The script needs to optimize the resources it uses to run.

Don't use up all of the resources on the computer.  Don't use too much memory, disk, or CPU.  Use what you need from what's available, but don't be a pig about it.  Don't use so many resources that you are interfering with other things happening on the server or virtual host or SAN or network.  Rarely is your script the most important thing happening in the system.

But don't be shy about using the resources you have to do the job you need to do.  Moore's Law is forever giving us more power.  Use it.  The whole point of the script is to use those resources instead of more expensive resources.

The server is a resource.  The user is a resource.  You are a resource.  The future is a resource.  Write your script in such a way that you are optimizing your function across all of the resources of the system.

I'll talk about specific examples and scripting styles and decisions in future articles.