Saturday, September 16, 2017

It’s always 5 o’clock somewhere: Using .Net and PowerShell’s Extended Type System to find out where

Last Friday, as work-related conversation was slowing down on the PowerShell Slack, someone wondered aloud if it was too early to start drinking. (I assume they were talking about tea.) Someone else answered with the cliché, “It’s always 5 o’clock somewhere,” but these being PowerShell geeks, instead of saying it literally, they said so with a PowerShell script.

That first draft had some bugs and inefficiencies, so I spent too much time tweaking and refining and optimizing it. As Tom Scott can tell you, perfect accuracy when working with time zones and daylight saving time was historically almost impossible in a full application, much less a one-liner, but we have the advantage of being able to rely on functionality built into Windows and .Net.

The goal is to display the areas where it is currently between 5 PM and 6 PM.

The .Net class [System.TimeZoneInfo] has a static method called ::GetSystemTimeZones(). This method queries the registry of the local computer. It doesn’t just give us a list of the 24+ time zones in the world; it gives us a list of the (as of today) 134 regions with unique time zone offset and daylight saving time rules. As Microsoft has operating systems running in each of the 134 regions, they regularly update the list whenever local TZ or DST laws change, keeping it a very accurate information source.

[System.TimeZoneInfo]::GetSystemTimeZones()

But working with the results wasn’t as simple as I thought it should be. A [TimeZoneInfo] object can be used to calculate the current time in a particular time zone, but shouldn’t it be able to do the calculation itself without us having to do all the work?

A [TimeZoneInfo] object tells us both the standard name and daylight saving time name for the time zone, and it can be used to calculate whether it is currently in daylight saving time, but, again, shouldn’t it be able to tell us which name is the current name without us having to do the calculation?
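
For comparison, here is a minimal sketch of doing both calculations by hand for a single time zone. It uses [TimeZoneInfo]::Local so that it runs on any machine; the variable names are only illustrative.

```powershell
# Use the local time zone so the sketch runs anywhere
$Zone = [System.TimeZoneInfo]::Local

# Current time in that zone, converted from UTC
$Now = [System.TimeZoneInfo]::ConvertTimeFromUtc( [System.DateTime]::UtcNow, $Zone )

# Pick whichever name applies right now
If ( $Zone.IsDaylightSavingTime( $Now ) ) { $CurrentName = $Zone.DaylightName }
Else                                      { $CurrentName = $Zone.StandardName }

[pscustomobject]@{ Name = $CurrentName; Time = $Now }
```

That is three steps for every zone we want to look at, which is the work we would rather have the object do for us.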

We can use PowerShell to add those calculations to the [TimeZoneInfo] class.

When the PowerShell team was first designing PowerShell on top of .Net, .Net was new and still kind of sucked. So they created the Extended Type System, which allowed them to enhance .Net objects. And it allows us to enhance .Net objects. Cool.
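
As a small taste of what the Extended Type System allows, here is a sketch that bolts a script property onto [datetime]. The IsWeekend property is a made-up example for illustration; it is not part of .Net or of the script in this post.

```powershell
# Add a hypothetical IsWeekend script property to every [datetime] object
Update-TypeData `
    -TypeName   System.DateTime `
    -MemberName IsWeekend `
    -MemberType ScriptProperty `
    -Value { $This.DayOfWeek -in [System.DayOfWeek]::Saturday, [System.DayOfWeek]::Sunday } `
    -Force

# September 16, 2017 was a Saturday
$Date = [datetime]::new( 2017, 9, 16 )
$Date.IsWeekend    # True
```

The property is calculated each time it is read, so it stays correct for whatever date the object holds.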

So let’s enhance [TimeZoneInfo].

The [datetime] class has a static property ::UtcNow which gives us the current UTC datetime. The [TimeZoneInfo] class has a static method ::ConvertTimeBySystemTimeZoneId() that can convert a datetime to the local time in a given time zone. So let’s define a new script property in [TimeZoneInfo] called .Time which gets the current UTC time and converts it to the local time as defined in the [TimeZoneInfo] object.

We do that using Update-TypeData. The parameters we’re using below are fairly self-explanatory. Within the script block of a script property, $This is how we tell an object to look at itself.

Update-TypeData `
    -TypeName   System.TimeZoneInfo `
    -MemberName Time `
    -MemberType ScriptProperty `
    -Value { [System.TimeZoneInfo]::ConvertTimeBySystemTimeZoneId( [System.DateTime]::UtcNow, $This.Id ) }  `
    -Force

Now let’s define another script property called .Name to give us the appropriate current choice between the value of the .StandardName and the .DaylightName properties.

To do that, we first build an array with the two options, with the .StandardName in position 0 and the .DaylightName in position 1. Then we use the .IsDaylightSavingTime() method of the [TimeZoneInfo] object to tell us if the value in .Time (which we defined above) is in daylight saving time, giving us a $True or $False, which will be dynamically converted to a 1 or a 0, which we use to index into the array and give us the correct option.

Update-TypeData `
    -TypeName System.TimeZoneInfo `
    -MemberName Name `
    -MemberType ScriptProperty `
    -Value { ( $This.StandardName, $This.DaylightName )[($This.IsDaylightSavingTime( $This.Time ))] } `
    -Force
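
The boolean-as-index trick used in .Name can be tried on its own. This sketch uses placeholder strings rather than real time zone names.

```powershell
# Position 0 is the "false" choice, position 1 is the "true" choice
$Names = 'Standard name', 'Daylight name'

$Names[$False]    # Standard name
$Names[$True]     # Daylight name
```

It is compact, but it only works because the options are listed in false-then-true order.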

Now when we call ::GetSystemTimeZones(), it gives us all of the information we need, with no further calculation necessary. So we simply filter on those time zones where the hour is 17 (time between 5 PM and 6 PM), and select the desired properties to display.

[System.TimeZoneInfo]::GetSystemTimeZones() |
    Where-Object { $_.Time.Hour -eq 17 } |
    Select-Object -Property Name, DisplayName, Time

Here it is all together.

Update-TypeData `
    -TypeName   System.TimeZoneInfo `
    -MemberName Time `
    -MemberType ScriptProperty `
    -Value { [System.TimeZoneInfo]::ConvertTimeBySystemTimeZoneId( [System.DateTime]::UtcNow, $This.Id ) } `
    -Force

Update-TypeData `
    -TypeName System.TimeZoneInfo `
    -MemberName Name `
    -MemberType ScriptProperty `
    -Value { ( $This.StandardName, $This.DaylightName )[($This.IsDaylightSavingTime( $This.Time ))] } `
    -Force

[System.TimeZoneInfo]::GetSystemTimeZones() |
    Where-Object { $_.Time.Hour -eq 17 } |
    Select-Object -Property Name, DisplayName, Time

And here is what the results look like when run at 11:01 AM in September in Minnesota in the US.

Name                            DisplayName                                   Time
----                            -----------                                   ----
Morocco Daylight Time           (UTC+00:00) Casablanca                        9/16/2017 5:01:40 PM
GMT Daylight Time               (UTC+00:00) Dublin, Edinburgh, Lisbon, London 9/16/2017 5:01:40 PM
W. Central Africa Standard Time (UTC+01:00) West Central Africa               9/16/2017 5:01:40 PM

Friday, September 1, 2017

Remove comments and whitespace from PowerShell scripts

A PowerShell.Slack.com user asked if it was possible to easily remove the comments and whitespace from a scriptblock to reduce the size. I was intrigued by the challenge, and came up with this function.

I also immediately put it to use. I manage a PowerShell GUI application that I wrap in an executable using Sapien PowerShell Studio. Comments and whitespace don’t serve any function in the final wrapped package, so my build script now runs this function on the code before running the Sapien build. The resulting executable is 31% smaller than it used to be.

If you use this function, test the results thoroughly. I am reasonably sure that this will work fine with most code, but I can’t guarantee that it won’t break your code. This will break any comment-based help if it uses multiple #comments instead of <#multiline comments#>, as only the section headings will be recognized and left untouched.


I realized we can use the ::Tokenize() method of the PowerShell parser to split up the scriptblock into identified chunks. This also effectively strips out all horizontal whitespace, as the parser just ignores it. Then we can take all of the tokens except the non-functional comments, and put them back together again into a leaner scriptblock. It’s slightly more complicated than that, but not much.
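
To get a feel for what the tokenizer hands back, here is a minimal sketch that tokenizes a one-line script. The sample script and the $Errors variable are just for illustration.

```powershell
# A short script with extra spaces and a trailing comment
$Script = 'Get-Item -Path $X   # note'

# Tokenize() wants a reference variable to receive any parse errors
$Errors = $Null
$Tokens = [System.Management.Automation.PSParser]::Tokenize( $Script, [ref]$Errors )

# Each token knows its type, its content, and where it sat in the original text
$Tokens | Select-Object -Property Type, Content, Start, Length
```

Note that there are no tokens for the runs of spaces; they are simply not reported.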

The ::Tokenize() method can work with a scriptblock, a string, or an array of strings, so we will similarly accept pretty much anything as input for our function. It would be convenient to be able to use the function in a pipeline, so let’s turn that on.

function Remove-CommentsAndWhiteSpace
    {
    # We are not restricting scriptblock type as Tokenize() can take several types
    Param (
        [parameter( ValueFromPipeline = $True )]
        $Scriptblock
        )

We want to accept pipeline input, but we need to process the script as a whole, not as individual lines. So we just use the Process block to collect all of the input in a single collection.

    Begin
        {
        # Initialize collection
        $Items = @()
        }

    Process
        {
        # Collect all of the inputs together
        $Items += $Scriptblock
        }

    End
        {
        ## Process the script as a single unit
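
The collect-then-process pattern is easier to see in a stripped-down function. This is a sketch only; Join-PipelineInput is a hypothetical helper, not part of the function we are building.

```powershell
function Join-PipelineInput    # hypothetical helper for illustration
    {
    Param (
        [parameter( ValueFromPipeline = $True )]
        $InputItem
        )

    # Runs once, before any pipeline input arrives
    Begin   { $Items = @() }

    # Runs once per pipeline item
    Process { $Items += $InputItem }

    # Runs once, after all input has been collected
    End     { $Items -join [environment]::NewLine }
    }

$Joined = 'Get-Date', 'Get-Item .' | Join-PipelineInput
$Joined
```

Two strings go in through the pipeline, and one multi-line string comes out.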

And then, despite ::Tokenize()’s ability to handle almost anything, we’re going to turn whatever comes in into a single string anyway, so that we can come back to it later to grab parts of it. We use a new variable, leaving the collected input untouched, so we can later base our output type on the input type. The -join operator converts each collected item to a string if needed, and then concatenates them with line breaks in between.

        # Convert input to a single string if needed
        $OldScript = $Items -join [environment]::NewLine

If the input is just white space, there is nothing to do.

        # If no work to do
        # We're done
        If ( -not $OldScript.Trim( " `n`r`t" ) ) { return }

We use ::Tokenize() to parse the script and turn it into “tokens”, identified as commands, comments, strings, variables, etc. The method requires a reference variable for dumping parsing errors. We don’t need those, so we give it the odd construction [ref]$Null to tell it to send those nowhere.

        # Use the PowerShell tokenizer to break the script into identified tokens
        $Tokens = [System.Management.Automation.PSParser]::Tokenize( $OldScript, [ref]$Null )
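
If you do want to see the parsing errors, pass a real variable instead of $Null. This sketch assumes a deliberately broken one-line script; the variable names are illustrative.

```powershell
# A string that is never closed
$Broken = '"unterminated string'

# This time we keep the errors instead of throwing them away
$Errors = $Null
$Tokens = [System.Management.Automation.PSParser]::Tokenize( $Broken, [ref]$Errors )

# Each error carries a message describing what the parser tripped over
$Errors | Select-Object -Property Message
```

That can be a useful sanity check before stripping a script you did not write yourself.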

The resulting $Tokens do not contain any horizontal whitespace, as the parser just ignores it.

We don’t want any comments, so we strip those out. But not quite all of them. Comment-based help and #requires statements need to stay in to keep that functionality. We’ll identify comments to keep by looking at the first word in the comment, so we define a list of words that identify allowed comments.

        # Define useful, allowed comments
        $AllowedComments = @(
            'requires'
            '.SYNOPSIS'
            '.DESCRIPTION'
            '.PARAMETER'
            '.EXAMPLE'
            '.INPUTS'
            '.OUTPUTS'
            '.NOTES'
            '.LINK'
            '.COMPONENT'
            '.ROLE'
            '.FUNCTIONALITY'
            '.FORWARDHELPCATEGORY'
            '.REMOTEHELPRUNSPACE'
            '.EXTERNALHELP' )

If a token is not a comment, we pass it through to keep. If a token is a comment, we parse the .Content, again leveraging the smarts of ::Tokenize(), to find the first word in the comment. If it’s in the allowed list, we pass it through to keep.

        # Strip out the Comments, but not useful comments
        # (Bug: This will break comment-based help that uses leading # instead of multiline <#,
        # because only the headings will be left behind.)

        $Tokens = $Tokens.ForEach{
            If ( $_.Type -ne 'Comment' )
                {
                $_
                }
            Else
                {
                $CommentText = $_.Content.Substring( $_.Content.IndexOf( '#' ) + 1 )
                $FirstInnerToken = [System.Management.Automation.PSParser]::Tokenize( $CommentText, [ref]$Null ) |
                    Where-Object { $_.Type -ne 'NewLine' } |
                    Select-Object -First 1
                If ( $FirstInnerToken.Content -in $AllowedComments )
                    {
                    $_
                    }
                } }
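
Here is the first-word check in isolation, as a sketch against two sample comments. Test-AllowedComment is a hypothetical helper, and the allowed list is shortened for the example.

```powershell
function Test-AllowedComment    # hypothetical helper for illustration
    {
    Param ( [string]$Comment )

    # Shortened list, for the example only
    $AllowedComments = @( 'requires', '.SYNOPSIS' )

    # Drop everything up to and including the first #
    $CommentText = $Comment.Substring( $Comment.IndexOf( '#' ) + 1 )

    # Let the tokenizer find the first word for us
    $Errors = $Null
    $FirstInnerToken = [System.Management.Automation.PSParser]::Tokenize( $CommentText, [ref]$Errors ) |
        Where-Object { $_.Type -ne 'NewLine' } |
        Select-Object -First 1

    # -in is case-insensitive, so #Requires and #requires both match
    $FirstInnerToken.Content -in $AllowedComments
    }

Test-AllowedComment -Comment '#Requires -Version 5'    # True
Test-AllowedComment -Comment '# Just a note'           # False
```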

Our new version of the script starts as an empty string.

        # Initialize script string
        $NewScriptText = ''
        $SkipNext = $False

Then we loop through each token except for the last one. We are looping through index numbers instead of the tokens themselves so that we can more easily reference the following token when making decisions. We’ll save the last token for later, as it won’t have a following token to reference, and it is most efficient to handle it separately.

        # If there are at least 2 tokens to process...
        If ( $Tokens.Count -gt 1 )
            {
            # For each token (except the last one)...
            ForEach ( $i in ( 0..($Tokens.Count-2) ) )
                {
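
The index-based loop, and the reason for the guard around it, can be seen with a plain array. This is a sketch with placeholder values.

```powershell
$Letters = 'a', 'b', 'c'

# Stop one short of the end so that $i+1 is always a valid index
ForEach ( $i in ( 0..($Letters.Count-2) ) )
    {
    '{0} is followed by {1}' -f $Letters[$i], $Letters[$i+1]
    }

# With only one item the range would be 0..-1, which counts
# down and yields 0 and -1 instead of yielding nothing
$Range = 0..-1
$Range.Count    # 2
```

That descending range is why the loop is wrapped in a check for at least two tokens.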

If we decided on the previous loop that we should skip this token and not include it in the script, we do so. If the token is a line continuation, we are going to skip it and not include it in the new script. If this token is a new line or a semicolon and the following token is a new line or a semicolon or a close parenthesis or a close curly brace, we are going to skip this one as redundant.

                # If token is not a line continuation and not a repeated new line or semicolon...
                If (    -not $SkipNext -and
                        $Tokens[$i  ].Type -ne 'LineContinuation' -and (
                        $Tokens[$i  ].Type -notin ( 'NewLine', 'StatementSeparator' ) -or
                        $Tokens[$i+1].Type -notin ( 'NewLine', 'StatementSeparator', 'GroupEnd' ) ) )
                    {

Then we add the token to the new script. For most tokens, we just use the .Content of the $Token object, but for variables and strings, we go back to the old script and pull them out of there. The token content does not include $ for variables, because the $ is just an indicator that what follows is a variable name, not part of the variable name itself. And the token content does not include the quotes for strings for a similar reason. For variables, we could simply put the $ back in manually, but for strings we don’t know whether to use single quotes, double quotes, and/or here-string quotes, so we just grab the original and don’t have to think about it.

                    # Add Token to new script
                    # For string and variable, reference old script to include $ and quotes
                    If ( $Tokens[$i].Type -in ( 'String', 'Variable' ) )
                        {
                        $NewScriptText += $OldScript.Substring( $Tokens[$i].Start, $Tokens[$i].Length )
                        }
                    Else
                        {
                        $NewScriptText += $Tokens[$i].Content
                        }
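
To see the difference between a token’s .Content and the original text, here is a sketch that lists both side by side for a one-line script. The sample script and the Original column are illustrative.

```powershell
$OldScript = '$Name = "Hi"'

$Errors = $Null
$Tokens = [System.Management.Automation.PSParser]::Tokenize( $OldScript, [ref]$Errors )

# Compare what the token reports with what was actually typed
$Rows = $Tokens | ForEach-Object {
    [pscustomobject]@{
        Type     = $_.Type
        Content  = $_.Content
        Original = $OldScript.Substring( $_.Start, $_.Length )
        } }

$Rows
```

For the variable, Content is Name while Original is $Name; for the string, Content is Hi while Original still has its double quotes.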

And then we have to do some serious thinking about what to put between this token and the next one. Some code will break if you add a space ( $X.Name -> $X .Name ). Other code will break if you take a space out ( Get-Item -Path -> Get-Item-Path ). So we look at the original and see if there was white space (or comments) between them before. If so, we put in a single space.

…Unless we are before or after a NewLine or a semicolon, or inside of and next to a parenthesis or curly brace, in which case a space is not needed, and we leave it out.

                    # If this token is not of a type that never needs a trailing space
                    # And the next token is not of a type that never needs a leading space
                    # And this token and the next are on the same line
                    # And this token and the next had white space between them in the original...
                    If (    $Tokens[$i  ].Type -notin ( 'NewLine', 'GroupStart', 'StatementSeparator' ) -and
                            $Tokens[$i+1].Type -notin ( 'NewLine', 'GroupEnd', 'StatementSeparator' ) -and
                            $Tokens[$i].EndLine -eq $Tokens[$i+1].StartLine -and
                            $Tokens[$i+1].StartColumn - $Tokens[$i].EndColumn -gt 0 )
                        {
                        # Add a space to new script
                        $NewScriptText += ' '
                        }
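
The column arithmetic is easy to check against the two examples above. This sketch uses a hypothetical Get-TokenGap helper that only looks at the first two tokens of a script.

```powershell
function Get-TokenGap    # hypothetical helper for illustration
    {
    Param ( [string]$Script )

    $Errors = $Null
    $Tokens = [System.Management.Automation.PSParser]::Tokenize( $Script, [ref]$Errors )

    # Columns between the end of the first token and the start of the second
    $Tokens[1].StartColumn - $Tokens[0].EndColumn
    }

Get-TokenGap -Script '$X.Name'             # nothing between $X and .
Get-TokenGap -Script 'Get-Item -Path .'    # white space between Get-Item and -Path
```

A gap of zero means the tokens were touching and must stay that way; anything larger collapses to a single space.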

We check to see if the next token should be skipped based on this token. Specifically, if this token is an open parenthesis or an open curly brace and the next token is a new line or a semicolon, we skip the next token. Or if the current token was skipped for the same reason, we check if the next token should also be skipped.

                    # If the next token is a new line or semicolon following
                    # an open parenthesis or curly brace, skip it
                    $SkipNext = $Tokens[$i].Type -eq 'GroupStart' -and $Tokens[$i+1].Type -in ( 'NewLine', 'StatementSeparator' )
                    }

                # Else (Token is a line continuation or a repeated new line or semicolon)...
                Else
                    {
                    # [Do not include it in the new script]

                    # If the next token is a new line or semicolon following
                    # an open parenthesis or curly brace, skip it
                    $SkipNext = $SkipNext -and $Tokens[$i+1].Type -in ( 'NewLine', 'StatementSeparator' )
                    }
                }
            }

Add the last token to the new script, again referencing the old script for a variable or string.

        # If there is a last token to process...
        If ( $Tokens )
            {
            # Add last token to new script
            # For string and variable, reference old script to include $ and quotes
            If ( $Tokens[-1].Type -in ( 'String', 'Variable' ) )
                {
                $NewScriptText += $OldScript.Substring( $Tokens[-1].Start, $Tokens[-1].Length )
                }
            Else
                {
                $NewScriptText += $Tokens[-1].Content
                }
            }

If we ended up with a NewLine or StatementSeparator at the beginning, trim it off. (If we ended up with one at the end, we’ll leave it in as best practice.)

        # Trim any leading new lines from the new script
        $NewScriptText = $NewScriptText.TrimStart( "`n`r;" )

And then we return the result in the same format as it came in.
If it came in as a scriptblock, convert the new script string to a scriptblock and return.

        # Return the new script as the same type as the input
        If ( $Items.Count -eq 1 )
            {
            If ( $Items[0] -is [scriptblock] )
                {
                # Return single scriptblock
                return [scriptblock]::Create( $NewScriptText )
                }
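
Turning a string back into a scriptblock is a one-liner. Here is a sketch with a trivial script string standing in for the rebuilt code.

```powershell
# Stand-in for the rebuilt script text
$NewScriptText = '1 + 2'

# Compile the string into a scriptblock
$NewScriptblock = [scriptblock]::Create( $NewScriptText )

# Run it with the call operator
& $NewScriptblock    # 3
```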

If it came in as a single string (or something we converted to a string), return a single string.

            Else
                {
                # Return single string
                return $NewScriptText
                }
            }

Otherwise, it was an array of strings (or an array of things we converted to strings). Split it at the line breaks and return.

        Else
            {
            # Return array of strings
            return $NewScriptText.Split( "`n`r", [System.StringSplitOptions]::RemoveEmptyEntries )
            }
        }
    }
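
The final split can also be tried on its own. In this sketch the separator is cast to [char[]], which is my own assumption and not what the function above does; the cast makes it explicit that we are splitting on either character rather than on the two-character sequence.

```powershell
# Stand-in for the rebuilt script text, with a Windows-style line break
$NewScriptText = "Get-Date`r`nGet-Item ."

# Split on carriage return or line feed, dropping the empty piece between them
$Lines = $NewScriptText.Split( [char[]]"`n`r", [System.StringSplitOptions]::RemoveEmptyEntries )

$Lines.Count    # 2
```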


Full function

Here it is all together.

function Remove-CommentsAndWhiteSpace
    {
    # We are not restricting scriptblock type as Tokenize() can take several types
    Param (
        [parameter( ValueFromPipeline = $True )]
        $Scriptblock
        )

    Begin
        {
        # Initialize collection
        $Items = @()
        }

    Process
        {
        # Collect all of the inputs together
        $Items += $Scriptblock
        }

    End
        {
        ## Process the script as a single unit

        # Convert input to a single string if needed
        $OldScript = $Items -join [environment]::NewLine

        # If no work to do
        # We're done
        If ( -not $OldScript.Trim( " `n`r`t" ) ) { return }

        # Use the PowerShell tokenizer to break the script into identified tokens
        $Tokens = [System.Management.Automation.PSParser]::Tokenize( $OldScript, [ref]$Null )

        # Define useful, allowed comments
        $AllowedComments = @(
            'requires'
            '.SYNOPSIS'
            '.DESCRIPTION'
            '.PARAMETER'
            '.EXAMPLE'
            '.INPUTS'
            '.OUTPUTS'
            '.NOTES'
            '.LINK'
            '.COMPONENT'
            '.ROLE'
            '.FUNCTIONALITY'
            '.FORWARDHELPCATEGORY'
            '.REMOTEHELPRUNSPACE'
            '.EXTERNALHELP' )

        # Strip out the Comments, but not useful comments
        # (Bug: This will break comment-based help that uses leading # instead of multiline <#,
        # because only the headings will be left behind.)

        $Tokens = $Tokens.ForEach{
            If ( $_.Type -ne 'Comment' )
                {
                $_
                }
            Else
                {
                $CommentText = $_.Content.Substring( $_.Content.IndexOf( '#' ) + 1 )
                $FirstInnerToken = [System.Management.Automation.PSParser]::Tokenize( $CommentText, [ref]$Null ) |
                    Where-Object { $_.Type -ne 'NewLine' } |
                    Select-Object -First 1
                If ( $FirstInnerToken.Content -in $AllowedComments )
                    {
                    $_
                    }
                } }

        # Initialize script string
        $NewScriptText = ''
        $SkipNext = $False

        # If there are at least 2 tokens to process...
        If ( $Tokens.Count -gt 1 )
            {
            # For each token (except the last one)...
            ForEach ( $i in ( 0..($Tokens.Count-2) ) )
                {
                # If token is not a line continuation and not a repeated new line or semicolon...
                If (    -not $SkipNext -and
                        $Tokens[$i  ].Type -ne 'LineContinuation' -and (
                        $Tokens[$i  ].Type -notin ( 'NewLine', 'StatementSeparator' ) -or
                        $Tokens[$i+1].Type -notin ( 'NewLine', 'StatementSeparator', 'GroupEnd' ) ) )
                    {
                    # Add Token to new script
                    # For string and variable, reference old script to include $ and quotes
                    If ( $Tokens[$i].Type -in ( 'String', 'Variable' ) )
                        {
                        $NewScriptText += $OldScript.Substring( $Tokens[$i].Start, $Tokens[$i].Length )
                        }
                    Else
                        {
                        $NewScriptText += $Tokens[$i].Content
                        }

                    # If this token is not of a type that never needs a trailing space
                    # And the next token is not of a type that never needs a leading space
                    # And this token and the next are on the same line
                    # And this token and the next had white space between them in the original...
                    If (    $Tokens[$i  ].Type -notin ( 'NewLine', 'GroupStart', 'StatementSeparator' ) -and
                            $Tokens[$i+1].Type -notin ( 'NewLine', 'GroupEnd', 'StatementSeparator' ) -and
                            $Tokens[$i].EndLine -eq $Tokens[$i+1].StartLine -and
                            $Tokens[$i+1].StartColumn - $Tokens[$i].EndColumn -gt 0 )
                        {
                        # Add a space to new script
                        $NewScriptText += ' '
                        }

                    # If the next token is a new line or semicolon following
                    # an open parenthesis or curly brace, skip it
                    $SkipNext = $Tokens[$i].Type -eq 'GroupStart' -and $Tokens[$i+1].Type -in ( 'NewLine', 'StatementSeparator' )
                    }

                # Else (Token is a line continuation or a repeated new line or semicolon)...
                Else
                    {
                    # [Do not include it in the new script]

                    # If the next token is a new line or semicolon following
                    # an open parenthesis or curly brace, skip it
                    $SkipNext = $SkipNext -and $Tokens[$i+1].Type -in ( 'NewLine', 'StatementSeparator' )
                    }
                }
            }

        # If there is a last token to process...
        If ( $Tokens )
            {
            # Add last token to new script
            # For string and variable, reference old script to include $ and quotes
            If ( $Tokens[-1].Type -in ( 'String', 'Variable' ) )
                {
                $NewScriptText += $OldScript.Substring( $Tokens[-1].Start, $Tokens[-1].Length )
                }
            Else
                {
                $NewScriptText += $Tokens[-1].Content
                }
            }

        # Trim any leading new lines from the new script
        $NewScriptText = $NewScriptText.TrimStart( "`n`r;" )

        # Return the new script as the same type as the input
        If ( $Items.Count -eq 1 )
            {
            If ( $Items[0] -is [scriptblock] )
                {
                # Return single scriptblock
                return [scriptblock]::Create( $NewScriptText )
                }
            Else
                {
                # Return single string
                return $NewScriptText
                }
            }
        Else
            {
            # Return array of strings
            return $NewScriptText.Split( "`n`r", [System.StringSplitOptions]::RemoveEmptyEntries )
            }
        }
    }


Output

And as example output, here is the result of running it against itself.

function Remove-CommentsAndWhiteSpace
{Param ([parameter(ValueFromPipeline = $True)]
$Scriptblock)
Begin
{$Items = @()}
Process
{$Items += $Scriptblock}
End
{$OldScript = $Items -join [environment]::NewLine
If (-not $OldScript.Trim(" `n`r`t")) {return}
$Tokens = [System.Management.Automation.PSParser]::Tokenize($OldScript, [ref]$Null)
$AllowedComments = @('requires'
'.SYNOPSIS'
'.DESCRIPTION'
'.PARAMETER'
'.EXAMPLE'
'.INPUTS'
'.OUTPUTS'
'.NOTES'
'.LINK'
'.COMPONENT'
'.ROLE'
'.FUNCTIONALITY'
'.FORWARDHELPCATEGORY'
'.REMOTEHELPRUNSPACE'
'.EXTERNALHELP')
$Tokens = $Tokens.ForEach{If ($_.Type -ne 'Comment')
{$_}
Else
{$CommentText = $_.Content.Substring($_.Content.IndexOf('#') + 1)
$FirstInnerToken = [System.Management.Automation.PSParser]::Tokenize($CommentText, [ref]$Null) |
Where-Object {$_.Type -ne 'NewLine'} |
Select-Object -First 1
If ($FirstInnerToken.Content -in $AllowedComments)
{$_}}}
$NewScriptText = ''
$SkipNext = $False
If ($Tokens.Count -gt 1)
{ForEach ($i in (0..($Tokens.Count-2)))
{If (-not $SkipNext -and
$Tokens[$i ].Type -ne 'LineContinuation' -and ($Tokens[$i ].Type -notin ('NewLine', 'StatementSeparator') -or
$Tokens[$i+1].Type -notin ('NewLine', 'StatementSeparator', 'GroupEnd')))
{If ($Tokens[$i].Type -in ('String', 'Variable'))
{$NewScriptText += $OldScript.Substring($Tokens[$i].Start, $Tokens[$i].Length)}
Else
{$NewScriptText += $Tokens[$i].Content}
If ($Tokens[$i ].Type -notin ('NewLine', 'GroupStart', 'StatementSeparator') -and
$Tokens[$i+1].Type -notin ('NewLine', 'GroupEnd', 'StatementSeparator') -and
$Tokens[$i].EndLine -eq $Tokens[$i+1].StartLine -and
$Tokens[$i+1].StartColumn - $Tokens[$i].EndColumn -gt 0)
{$NewScriptText += ' '}
$SkipNext = $Tokens[$i].Type -eq 'GroupStart' -and $Tokens[$i+1].Type -in ('NewLine', 'StatementSeparator')}
Else
{$SkipNext = $SkipNext -and $Tokens[$i+1].Type -in ('NewLine', 'StatementSeparator')}}}
If ($Tokens)
{If ($Tokens[-1].Type -in ('String', 'Variable'))
{$NewScriptText += $OldScript.Substring($Tokens[-1].Start, $Tokens[-1].Length)}
Else
{$NewScriptText += $Tokens[-1].Content}}
$NewScriptText = $NewScriptText.TrimStart("`n`r;")
If ($Items.Count -eq 1)
{If ($Items[0] -is [scriptblock])
{return [scriptblock]::Create($NewScriptText)}
Else
{return $NewScriptText}}
Else
{return $NewScriptText.Split("`n`r", [System.StringSplitOptions]::RemoveEmptyEntries)}}}