Troubleshooting Deployment
This guide describes common problems encountered when deploying applications.
YAML deployment errors
The error Invalid YAML: Plan not in acceptable format: Cannot convert ... means that the text is not valid YAML. Common causes are incorrect indentation and mismatched brackets.
The error Unrecognized application blueprint format: no services defined means that the services: section is missing.
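For reference, here is a minimal sketch of a well-formed blueprint, assuming the standard Brooklyn blueprint format used by AMP. It uses consistent two-space indentation and has a non-empty services: list. The application name is a placeholder, and org.apache.brooklyn.entity.machine.MachineEntity is used only as an illustrative entity type; substitute your own.

```yaml
# Hypothetical minimal blueprint; the name is a placeholder.
name: example-app
location: localhost
services:
# Each "- " starts one entity in the services list; the keys of that
# entity line up with the first key after the dash.
- type: org.apache.brooklyn.entity.machine.MachineEntity
  name: example-machine
```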
An error reporting that an entity type (for example com.acme.Foo) cannot be found means that the type is not in the catalog or on the classpath.
An error reporting that a location (for example aws-ec3) is unknown means that it does not match any of the named locations in brooklyn.properties, any of the clouds enabled in the jclouds support, or any of the locations added dynamically through the catalog API.
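Adding a location to the catalog is one way to make a name resolvable. The catalog item below is a minimal sketch, assuming the Brooklyn catalog YAML format; the id my-aws-east, the version and the region are hypothetical values. Note that the built-in jclouds provider for EC2 is aws-ec2, so a typo such as aws-ec3 matches nothing.

```yaml
# Hypothetical catalog item registering a named location "my-aws-east".
brooklyn.catalog:
  id: my-aws-east
  version: "1.0.0"
  itemType: location
  item:
    # The jclouds provider name must be spelled exactly (aws-ec2).
    type: jclouds:aws-ec2
    brooklyn.config:
      region: us-east-1
```

Once this item is in the catalog, a blueprint could refer to it with location: my-aws-east.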
VM Provisioning Failures
There are many stages at which VM provisioning can fail. An error Failure running task provisioning means there was some problem obtaining or connecting to the machine.
An error like ... Not authorized to access cloud ... usually means that the wrong identity or credential was used.
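The identity and credential are normally given in the location definition. The fragment below is a sketch, assuming an AWS location declared in the blueprint; the region is an arbitrary choice and the values are placeholders. For AWS, identity is the access key ID and credential is the secret access key.

```yaml
# Location fragment only; a full blueprint also needs a services: section.
location:
  jclouds:aws-ec2:
    region: us-east-1
    # Placeholders: replace with the access key ID and secret access key
    # of an account that is allowed to manage EC2 instances.
    identity: YOUR_ACCESS_KEY_ID
    credential: YOUR_SECRET_ACCESS_KEY
```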
AWS requires each request to carry an X-Amz-Date header containing the date and time according to the client, in this case Cloudsoft AMP. If the clock on the machine running AMP is wrong, for example several minutes behind, you will get an authorization exception; AWS rejects such requests to prevent replay attacks. Make sure the clock is set correctly on the machine running Cloudsoft AMP. On Linux you can set the time with the ntp client (e.g. sudo ntpdate pool.ntp.org). We advise running the ntp daemon so that the clock is kept continually in sync.
An error like Unable to match required VM template constraints means that no matching image (e.g. an AMI, in AWS terminology) could be found. This could be because an incorrect explicit image id was supplied, or because the match criteria could not be satisfied by any of the images available in the given cloud. The first time this error is encountered, a listing of all images in that cloud/region is written to the debug log.
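When the match criteria are the problem, it can help to either name an image explicitly or loosen the constraints. The location fragment below is a sketch, assuming the jclouds location keys imageId, osFamily, osVersionRegex and minRam; the values are illustrative, and the commented-out image id is a placeholder that must be replaced by an image that actually exists in your region.

```yaml
location:
  jclouds:aws-ec2:
    region: us-east-1
    # Option 1 (placeholder id): name the image explicitly.
    # imageId: us-east-1/ami-xxxxxxxx
    # Option 2: describe the image and let jclouds find a match.
    osFamily: centos
    osVersionRegex: "7"
    minRam: 2048
```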
Failure to form an ssh connection to the newly provisioned VM can be reported in several different ways, depending on the nature of the error. This breaks down into failures at different points:
- Failure to reach the ssh port (e.g. ... could not connect to any ip address port 22 on node ...).
- Failure to do the very initial ssh login (e.g. ... Exhausted available authentication methods ...).
- Failure to ssh using the newly created user.
There are many possible reasons for this ssh failure, which include:
- The VM was “dead on arrival” (DOA). Sometimes a cloud will return an unusable VM. One can work around this using the machineCreateAttempts configuration option, to automatically retry with a new VM.
- Local network restrictions. On some guest Wi-Fi networks, external access to port 22 is forbidden. Check by manually trying to reach port 22 on a different machine that you have access to.
- NAT rules not set up correctly. On some clouds that have only private IPs, AMP can automatically create NAT rules to provide access to port 22. If this NAT rule creation fails for some reason, then AMP will not be able to reach the VM. If NAT rules are being created for your cloud, then check the logs for warnings or errors about the NAT rule creation.
- ssh credentials incorrectly configured. The AMP configuration is very flexible in how ssh credentials can be configured. However, if a more advanced configuration is used incorrectly (e.g. the wrong login user, or invalid ssh keys) then this will fail.
- Wrong login user. The initial login user for the new VM is inferred from the metadata that the cloud provider supplies about that image. This metadata can sometimes be incomplete, so the wrong user may be used. The login user can be set explicitly using the loginUser configuration option. An example of this is with some Ubuntu VMs, where the “ubuntu” user should be used, but on some clouds the login defaults to “root”.
- Bad choice of user. By default, AMP will create a user with the same name as the user running the AMP process; the choice of user name is configurable. If this user already exists on the machine, then the user setup will not behave as expected, and subsequent attempts to ssh as this user could then fail.
- Custom credentials on the VM. Most clouds will automatically set the ssh login details (e.g. in AWS using the key-pair, or in CloudStack by auto-generating a password). However, with some custom images the VM will have hard-coded credentials that must be used. If AMP’s configuration does not match them, then the login will fail.
- Guest customisation by the cloud. On some clouds (e.g. vCloud Air), the VM can be configured to do guest customisation immediately after the VM starts. This can include changing the root password. If AMP is not configured with the expected changed password, then the VM provisioning may fail, depending on whether AMP connects before or after the password is changed.
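Two of the causes above can be addressed in the location configuration. This fragment is a sketch: machineCreateAttempts is the option named above, while user is our assumption for the key that sets the name of the user AMP creates on the VM (check the location documentation for your AMP version); ampuser is a hypothetical value.

```yaml
location:
  jclouds:aws-ec2:
    region: us-east-1
    # Retry with a fresh VM if the first one is unusable ("dead on arrival").
    machineCreateAttempts: 3
    # Assumed key: name of the user AMP creates on the VM (hypothetical value).
    # Pick a name that does not already exist in the image.
    user: ampuser
```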
A very useful debug configuration is to set destroyOnFailure to false. The VM is then not deleted when provisioning fails, so ssh failures can be investigated more easily.
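A sketch of this debug setting in a location definition, assuming an AWS location; the region is an arbitrary choice:

```yaml
location:
  jclouds:aws-ec2:
    region: us-east-1
    # Keep the VM if provisioning fails, so that you can log in and inspect it.
    destroyOnFailure: false
```

Because the VM is kept, remember to terminate it yourself once you have finished investigating.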
java.security.KeyException when Provisioning VM
The exception java.security.KeyException can be thrown when jclouds is attempting the SSL handshake to make cloud API calls. This can happen if the version of nss (the package that includes the SSL library) is older than 3.16.
To fix this on CentOS, upgrade the nss package, for example with sudo yum update -y nss.
For a discussion of how to investigate this kind of issue, see the relevant post on the Backslasher blog.
Timeout Waiting For Service-Up
A common generic error message is that there was a timeout waiting for service-up. This just means that the entity did not get to service-up within the pre-defined time period. The default is two minutes, and can be configured using the start.timeout config key; the timer begins after the start tasks are completed.
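A sketch of raising the timeout for a single entity, assuming a VanillaSoftwareProcess entity. The launch and check scripts are hypothetical, and 10m is an arbitrary value written as a Brooklyn duration string.

```yaml
services:
- type: org.apache.brooklyn.entity.software.base.VanillaSoftwareProcess
  brooklyn.config:
    # Hypothetical start and check scripts, for illustration only.
    launch.command: ./start-my-service.sh
    checkRunning.command: ./check-my-service.sh
    # Allow up to 10 minutes (rather than the default) to reach service-up.
    start.timeout: 10m
```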
See the overview for where to find additional information, especially the section on “Entity’s Error Status”.
Invalid packet error
If you receive an “invalid packet” error when provisioning a VM, it means that the wrong username is being used to ssh to the machine. The “invalid packet” is caused by the VM sending back a response such as “Please login as the ubuntu user rather than root user.”
You can work around the issue by explicitly setting the user that AMP should use to log in to the VM (typically the OS default user).
The user is set explicitly with the loginUser configuration option when defining a location.
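A minimal sketch of such a location, assuming an AWS location in us-east-1 and an Ubuntu image whose default user is ubuntu:

```yaml
location:
  jclouds:aws-ec2:
    region: us-east-1
    # Log in first as the image's default user, instead of the user
    # inferred from the cloud's image metadata (e.g. root).
    loginUser: ubuntu
```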
SSLException close_notify Exception
The following error, when deploying a blueprint, has been shown to be caused by problems with the DNS provided by your ISP, or by traffic filtering such as child-safety filtering:
Caused by: javax.net.ssl.SSLException: Received fatal alert: close_notify
To resolve this, try disabling traffic filtering and setting your DNS to a public server, such as Google's public DNS at 8.8.8.8. Google's Public DNS documentation explains how to configure this.
Download with Curl Fails on CentOS 7.0 due to TLS Negotiation
When downloading an install artifact with curl on CentOS 7.0, one can get the failure shown below:
curl: (35) Peer reports incompatible or unsupported protocol version.
This can be caused by incompatible TLS negotiation with the web server (e.g. with GitHub). For more details, see Red Hat bug 1170339, “use the default min/max TLS version provided by NSS [RHEL-7]”.
To confirm this is the issue, try running the failing curl command on the same machine with curl -v for verbose output.
You should see a more detailed error such as:
NSS error -12286 (SSL_ERROR_NO_CYPHER_OVERLAP)
Cannot communicate securely with peer: no common encryption algorithm(s).
Closing connection 1
Possible workarounds include:
- Use a more recent version of CentOS. On AWS, a good choice is the most recent centos.org image from the AWS marketplace, although this involves first subscribing to it in the marketplace. The Amazon Linux AMI is another good choice, but it is not a normal CentOS image, so whether it is suitable depends on which distro(s) the entity was developed and tested against.
- Change your blueprint to first run sudo yum update -y curl nss, before the curl command is executed.
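The second workaround could look like the sketch below, assuming the entity is a VanillaSoftwareProcess whose install step downloads with curl. The pre.install.command runs before the install command, so the upgraded curl and nss are in place for the download. The download URL and the launch and check scripts are hypothetical placeholders.

```yaml
services:
- type: org.apache.brooklyn.entity.software.base.VanillaSoftwareProcess
  brooklyn.config:
    # Upgrade curl and nss before the install step downloads anything.
    pre.install.command: sudo yum update -y curl nss
    # Hypothetical download of an install artifact; the URL is a placeholder.
    install.command: curl -fsSLO https://example.com/artifact.tar.gz
    # Hypothetical start and check scripts.
    launch.command: ./start-my-service.sh
    checkRunning.command: ./check-my-service.sh
```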