vRealize Operations Certificate Installation Error

vRealize Operations Certificate Installation Error

There is a good bit of information available for installing certificates for vROps 6.x. I encountered an issue that required a VMware Support case and uncovered some changes in vRealize Operations Manager 6.3 and 6.4.

My environment was using an internally signed SHA1 (128-bit) certificate. This was being updated to a 256-bit certificate.

The process to generate a certificate for vROps is more complex than typical, and is poorly documented by VMware. There are some articles with better information on how to create and install vROps certificates:
http://www.kanap.net/2015/02/install-a-custom-certificate-for-vrealize-operations-manager/
http://virtuallyeuc.com/vrealize-operations-manager-6-0-certificate-creation/

After carefully following the process to create the CSR, having the certificate signed, and building the .pem file, vROps refused to accept the new cert:

Operation failed. If the error persists, contact VMware support.

Review casa.log [/var/log/casa_logs/casa.log on your master vROps node] and look for something similar to:

2016-12-06 17:45:44,102 [x2000UIE] [ajp-nio-127.0.0.1-8011-exec-7] INFO support.subprocess.GeneralCommand:255 - Command '/usr/lib/vmware-vcopssuite/python/bin/python /usr/lib/vmware-casa/bin/vropsCertificateTool.py -i /storage/db/tmp/uploaded_cert.tmp --no_describe --json --level NONE' threw exception: CommandLineExitException: key=general.failure; args=1,; cause=
2016-12-06 17:45:44,103 [x2000UIE] [ajp-nio-127.0.0.1-8011-exec-7] INFO casa.security.SecurityService:946 - validateCertificateFile script's STDOUT:
{"errors": [], "exitCode": 0, "warnings": [{"warning_args": "/C=US/ST=State/L=City/O=Org/OU=Domain/CN=CA", "warning_code": "WARN-2", "warning_message": "Issuer and subject names match (/C=US/ST=State/L=City/O=Org/OU=Domain/CN=CA), but the issuer key does not. This is likely the wrong issuer certificate."}]}

So here’s what happened:
Starting in vROps 6.3, a change was made to how vROps handles the upload of a certificate.  The service used for the handling of the chain file has a file limit of 8192 bytes.
– My certificate chain (.pem file) was 10.7 KB, due to the fact that we have 2 intermediate certificates (Root Cert->Intermediate 1->Intermediate 2->Server Cert)
– vROps was essentially truncating the file, causing it to interpret the first intermediate certificate as the root, causing the “likely the wrong issuer certificate” error.
– I confirmed the issue exists on 6.3 and 6.4; the TSE was not able to replicate on 6.2.1
The vRealize Operations Manager supports custom security certificates with key length up to 8192 bits. An error is displayed when you try to upload a security certificate generated with a stronger key length beyond 8192 bits.

Solutions:
1) Use a certificate with only one intermediate, which results in a .pem file size of ~7 KB
– I had the option to request a GlobalSign certificate at minimal cost; we were in the process of requesting this anyway to rule out our internal CA as the problem. It is of course understood that this will not be an option for many users.
2) “The workaround is to edit the ‘maxInMemorySize’ value for the spring service which handles the upload of the cert. This should prevent the use of the temp file which in turn will bypass the part of the code that reads the temp file with the 10k limit.” – Provided by VMware TSE. I did not pursue this workaround as we already had the new GlobalSign certificate ready to test on the same day this workaround was proposed. I have asked for additional information on this workaround and will update if it is provided.

Hopefully this information helps diagnose failures in certificate installation in vRealize Operations 6.3 or 6.4. I’m not sure why 8 KB was the file size limit chosen, but it is too small for many environments with multiple intermediate CAs. Hopefully this will be resolved in the next release.

Update:
1) To edit ‘maxInMemorySize’ value for the spring service to allow a larger certificate upload, the following information was provided by VMware support:
a. Stop casa.
service vmware-casa stop

b. Edit /usr/lib/vmware-casa/casa-webapp/webapps/casa/WEB-INF/casa-servlet.xml
Locate this text around line 256:
<bean id="multipartResolver" class="org.springframework.web.multipart.commons.CommonsMultipartResolver">
<property name="uploadTempDir" value="file:///${application.data?/storage}/db/tmp"/>
</bean>

Add the following within the <bean> definition:
<property name="maxInMemorySize" value="30720"/>

c. Restart casa.
service vmware-casa start

2) I have been advised that a fix for this issue will be included in the next release, although I cannot confirm that and have no information on the vROps release schedule.

I want to publicly acknowledge VMware Global Support Services (GSS) Business Critical Support (BCS) and the excellent experience working with them to diagnose and resolve this issue, as well as VMware Technical Account Manager (TAM) Services. While not cheap, these service offerings provide high value.

* Update 3/3/2017
vRealize Operations 6.5 was released on 3/2/2017. The 8192 bit certificate size limit is still reflected in the documentation .

VCSA – Moving Behind A Firewall

VCSA – Moving Behind A Firewall

We have a situation where we have to deploy VMware vCenter Server Appliance (VCSA), then move it to a secured network behind a firewall.

I have not (yet) had success changing the hostname, including the FQDN, so make sure you can create proper DNS entries on both sides of the firewall and can do the initial deployment with the intended final FQDN. Our VCSA’s FQDN is of the format “vcenterserver.securednet.companynet.org” so this was achievable (the segmented domain is a subdomain of the company domain)

Deploy the appliance, then access the VCSA console and press F2, then login as root. At this point you may also need to Edit the VM Settings and change it to the new network (behind the firewall, in my case).
– Configure Management Network
– Set IP Configuration, IPv6 Configuration, DNS Configuration, Custom DNS Suffixes as required.
– (Esc) Exit, (Y) Yes Apply changes and restart management network

At this point, I got the following 2 screens. IPv6 is understandable; we disabled it. The DNS error appears even though we have validated the DNS entries/servers.

Now, to access the shell console. Login as root.

Command> shell.set --enabled True
Command> shell
# vi /etc/sysconfig/networking/devices/ifcfg-eth0

Quick vi reminder: hit (Insert) to edit the line, then (Esc), :wq! to save and exit.

Make sure all the IP information is correct in this file. I had to update the broadcast & gateway for sure. Then,

# vi /etc/hosts

Fix the entry here to reflect the new IP.

Validate settings:

# ifconfig eth0
# route -n

At this point, we were missing the default gateway in the route table, even thought it was defined in the ifcfg-eth0 file and in the IP Configuration screen of the console. Also the Broadcast address still reflects the old network, but this is less of a problem.

To temporarily fix the default gateway, do # route add default gateway XXX.YYY.ZZZ.1, using the correct gateway IP obviously.

To permanently resolve the gateway issue, log into the web console (https://vcenterserver:443/vsphere-client/) as Administrator@vsphere.local or whatever you defined as your SSO domain.
– Go to System Configuration under Administration in the main panel
– Nodes (in left column, under System Configuration)
– Select your vCenter server under Nodes
– “Edit” in the upper right corner of the main panel
– Expand nic0 & define Default gateway
– Ok
– Reboot to validate configuration persists

VMware: Multi-NIC vMotion Configuration with PowerShell

VMware: Multi-NIC vMotion Configuration with PowerShell

Changed configuration of numerous VMware clusters to utilize multi-NIC vMotion. As with all my PS, it’s quick & dirty and cobbled from a couple of example scripts. Assumes existing vmkernel port group named “vMotion”

<#
.SYNOPSIS
   Configure multi-NIC vMotion
.DESCRIPTION
   Miklm.com 08/19/2015
   Credit to examples from http://www.yellow-bricks.com/2011/09/17/multiple-nic-vmotion-in-vsphere-5/#comment-28153 & http://hostilecoding.blogspot.com/2014/01/vmware-virtual-standard-switch.html
#>

$viserver = "vCenter.local"
$cluster = "CLUSTERNAME"
$vswitchName = "vSwitch1"
$VMotionVLan = "908"
$vSwitch2Nics = "vmnic1","vmnic3"
$VMotionPortGroupName1 = "vMotion-01"
$VMotionPortGroupName2 = "vMotion-02"

Connect-VIServer $viserver

Foreach ($esxhost in (Get-Cluster -name $cluster | Get-VMHost))
{
  $vswitch = Get-VirtualSwitch -VMHost $esxhost -Name $vswitchName

  # Remove the original "vMotion" VMkernel Port group
  Remove-VMHostNetworkAdapter -Nic (Get-VMHostNetworkAdapter -VMHost $esxhost | where {$_.PortGroupName -eq "vMotion"}) -Confirm:$false
  Remove-VirtualPortGroup -VirtualPortGroup (Get-VirtualPortGroup -Name "vMotion" -vmhost $esxhost) -Confirm:$false

  # Configure virtual switch for Multi-NIC vMotion
  $vportgroup1 = New-VirtualPortGroup -VirtualSwitch $vswitch -Name $VMotionPortGroupName1 -VLanId $VMotionVLan
  $vportgroup2 = New-VirtualPortGroup -VirtualSwitch $vswitch -Name $VMotionPortGroupName2 -VLanId $VMotionVLan

  $vNic1 = New-VMHostNetworkAdapter -VMHost $esxhost -VirtualSwitch $vswitch -PortGroup $vportgroup1 -VMotionEnabled:$true
  $vNic2 = New-VMHostNetworkAdapter -VMHost $esxhost -VirtualSwitch $vswitch -PortGroup $vportgroup2 -VMotionEnabled:$true

  $vportgroup1 | Get-NicTeamingPolicy | Set-NicTeamingPolicy -MakeNicActive $vSwitch2Nics[0] -MakeNicStandBy $vSwitch2Nics[1]
  $vportgroup2 | Get-NicTeamingPolicy | Set-NicTeamingPolicy -MakeNicActive $vSwitch2Nics[1] -MakeNicStandBy $vSwitch2Nics[0]
}

 

Cleaning up ARP tables – VMware ESXi 5.1

Cleaning up ARP tables – VMware ESXi 5.1

We had a recent situation where a virtual appliance (Avamar proxy) was mistakenly deployed configured with the IP address of the gateway. This caused the hosts to be unreachable from vCenter. After finding the offending VM and shutting it down from the console, the ARP tables either had to be cleaned up (you could also allow them to time out but that could take up to 20 minutes)

ESXi 5.1
esxcli network ip neighbor list
vsish -e set /net/tcpip/v4/neighbor del IPADDRESS

New functionality added in 5.5 to remove ARP entries through esxcli

Find a VM by MAC in vCenter with PowerCLI
Get-vm | Select Name, @{N=“Network“;E={$_ | Get-networkAdapter | ? {$_.macaddress -eq“00:50:56:A4:22:F4“}}} |Where {$_.Network-ne “”}

Also you may need to clean up your Windows vCenter server from the command prompt
arp -a
arp -d IPADDRESS

Or,
netsh interface ip delete arpcache
Also a good idea to:
ipconfig /flushdns