Thursday, August 21, 2014

Exchange Health Manager bug

You hopefully know that Exchange Health Manager monitor the Exchange server by sending Probes. The result is picked up by Monitors which decide if a Responder needs to do something about the problem.

The other day I found that a HealthSet on the server was UnHealthy. This was found by running this commands.

Get-HealthReport -Server ex1 | where {$_.AlertValue -eq 'UnHealthy'} | ft –AutoSize

Result showed that “OutlookMapiHttp” HealthSet was to blame.
You can trigger a probe to execute at any time, first we need to know what the probe name is.

Get-MonitoringItemIdentity -Server ex1 -Identity OutlookMapiHttp | ft Identity, ItemType, Name

Result show the probe name for the OutlookMapiHttp HealthSet, “OutlookMapiHttpCtpProbe”.

Since we are dealing with outlook connectivity problem here we should use the Test-OutlookConnectivity cmdlet to trigger the probe to run.

Test-OutlookConnectivity -RunFromServerId ex1 -ProbeIdentity OutlookMapiHttpCtpProbe

Another way of trigger the probe would be
Invoke-MonitoringProbe -Server ex1 -Identity OutlookMapiHttp\OutlookMapiHttpCtpProbe

Both cmdlets took a about 90 seconds before they come back with a result saying “Failed to find proberesult”
Could this be a performance issue? This server showed no peformance related problem, it is actually a oversized box.
I fired up event viewer and looked in different eventlogs under Microsoft\Exchange\ManagedAvailability and Microsoft\Exchange\ActiveMonitoring. Events here show probe definitions and firing, monitors reading probe results and what responders do and the result of responders as well. The best way of reading this info is to switch from General to Details and filter on Errors.

I could find snippet of text “The remote server returned an error (404). Not Found and a reference to a Health Mailbox.
This was strange. monitoring mailboxes is created on demand when the Health Manager service do its thing, so I trigger a restart of it but no difference. waited several hours and did a reboot but no difference even when waiting until the next day.
Discussed this with en engineer at Microsoft and he eventually suggested to change the time zone on the server from the current GMT+1 to a time zone used in America. Did that and nothing happened. In desperation I rebooted the server again and bang, now the “OutlookMapiHttp” healthset showed Healthy.
I played around with time zone and restart of Exchange Health Management service and could make the problem come and go as I changed time zone.

I haven’t tried all time zones but it looks like if you’re using GMT+something this bug will surface.
f you don’t want to change time zone on server you can create on monitoring overide.

Add-ServerMonitoringOverride -Identity OutlookMapiHttp\OutlookMapiHttpCtpMonitor -ItemType monitor -PropertyName Enabled -PropertyValue 0 -Server ex1 -ApplyVersion 15.0.913.22

In a few minutes you would see your healthset go healthy. together with the OutlookMapiHttp healthset there is also another healthset that also shows unhealthy “OutlookMapiHttp.Proxy” and is subjct to same bug with time zone. to make an override for it use.

Add-ServerMonitoringOverride -Identity OutlookMapiHttp.Proxy\OutlookMapiHttpProxyTestMonitor\MSExchangeMapiFrontEndAppPool -ItemType monitor -PropertyName Enabled -PropertyValue 0 -Server ex1 -ApplyVersion 15.0.913.22

We can only hope that this is fixed in the next CU of Exchange 2013.