Application life-cycle

Factor 9 of the twelve-factor app states that apps must be disposable and start quickly. This means that apps must shut down gracefully, in a way that allows the hosting platform to make the action invisible to the client. Furthermore, when running on a PaaS, an app instance must be able to signal to its hosting platform that it is in a healthy state and ready to take traffic.

This doc will explain at a high level the actions which must be taken when engineering an app to run on Cloud Foundry, such that a user or client will not notice an interruption in service during app push, scale, restart and platform upgrades.

What happens when an app starts

The start process applies to the following events:

  • cf push - new instances of a new version of the app are started to replace the old version
  • cf scale - new instances of the app are started
  • cf restart (rolling strategy) - all instances of the app are replaced
  • Platform upgrade - Diego cells follow a rolling replacement

At a very high level, the app start process is as follows:

  • The scheduler decides an instance is needed and instructs the system to start a container
  • Container starts on a Diego cell (worker)
  • The Diego cell starts a health check process
  • Application health check passes
  • The Gorouter is instructed to add the app into rotation

Application health checks

Having a health check which only returns true when the app is ready to serve traffic is critical to ensure that adding a container does not cause a client to receive an HTTP error.

Cloud Foundry supports three types of health check:

  • http - an HTTP request is sent to a specific endpoint of the app, with 200 OK expected as the response
  • port - a TCP connection is made to a designated port or ports. This is the default.
  • process - the process is running, e.g. the Python interpreter is running

It is recommended to use http, as this is the only option that can ensure the app is ready for traffic. In the case of web apps, both port and process health checks only confirm that the web server is online, not that the underlying software is able to respond to traffic. In addition, should an app become unresponsive, a port health check may still pass even though the underlying app is no longer able to respond to a request.
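As an illustration of an endpoint suited to the http health check, the following is a minimal sketch using the HTTP server bundled with the JDK. The class name ReadinessServer, the READY flag and the /health path are assumptions made for this example, not names defined by Cloud Foundry; the path must match whatever endpoint the health check is configured to call.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

/** Serves a health endpoint that reports 200 only once the app is ready (illustrative sketch). */
final class ReadinessServer {
    // Hypothetical flag: set to true once start-up work (caches, connection pools, ...) is done.
    static final AtomicBoolean READY = new AtomicBoolean(false);

    /** Starts an HTTP server on the given port (0 picks any free port). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // "/health" is an assumed path, not a platform default.
        server.createContext("/health", exchange -> {
            // 503 keeps the instance out of rotation until the app declares itself ready.
            int status = READY.get() ? 200 : 503;
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

An app would call ReadinessServer.start with the port from the PORT environment variable that Cloud Foundry provides, and set READY to true only after its start-up work has finished.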

What happens when an app crashes

In the case that an app becomes unresponsive the process is as follows:

If the Gorouter is still able to open a TCP connection to an app instance, for example because a web server is listening but is unable to respond to requests, the Gorouter will continue to send traffic to that instance. To mitigate this, it is recommended to reduce the http health check interval below the default of 30 seconds. Depending on the CPU cost of the health check, setting the value too low could have an impact on the platform.

What happens when an app stops

The stop process applies to the following events:

  • cf push - old instances of an app are stopped on a rolling basis and replaced by a new version
  • cf restart (rolling strategy) - old instances of an app are stopped on a rolling basis
  • cf stop - all instances of the app are stopped
  • Platform scale down - Diego cells are drained and removed
  • Platform upgrade - Diego cells follow a rolling replacement

At a very high level the app shutdown process is as follows:

  • The Gorouter removes the app from its routing table, meaning that no new requests will be sent to it, but responses to outstanding requests will still be honoured
  • The scheduler instructs the Diego cell to stop the app
  • The container is sent the SIGTERM signal, which the app should treat as a soft shutdown event and gracefully complete outstanding requests before stopping cleanly
  • If after 10 seconds the container has not exited, Diego then sends a SIGKILL which will terminate all processes

Should there be a need to extend the time that apps are given to shut down, this can be set system-wide, but it will have the effect that Diego maintenance events could take longer.

Each language has a different way to respond to SIGTERM.

Java shutdown

Java allows the developer to register shutdown hooks, which insert logic into the shutdown process.

The default behaviour in Java is as follows:

The last point is critical, as the JVM will not exit until all threads complete, meaning the app should be designed to take this into account.
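A minimal sketch of a shutdown hook, using only the standard library, is shown here. It assumes the app runs its work on an ExecutorService; the class GracefulShutdown and its methods drain and register are hypothetical names for this example. The hook stops new work from being accepted and waits for running tasks, but gives up after a grace period so that the JVM can exit before the platform sends a SIGKILL.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Registers a JVM shutdown hook that drains a worker pool (illustrative sketch). */
final class GracefulShutdown {
    /**
     * Stops the pool from accepting work and waits up to graceMillis for running tasks.
     * Returns true if every task finished within the grace period.
     */
    static boolean drain(ExecutorService workers, long graceMillis) {
        workers.shutdown(); // reject new tasks, let queued and running ones finish
        try {
            if (workers.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and fall through
        }
        workers.shutdownNow(); // out of time: interrupt whatever is still running
        return false;
    }

    /** Runs drain() when the JVM begins shutting down, for example after a SIGTERM. */
    static Thread register(ExecutorService workers, long graceMillis) {
        Thread hook = new Thread(() -> drain(workers, graceMillis), "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A call such as GracefulShutdown.register(pool, 8_000) would leave a margin below the 10 second limit described earlier; the 8 second figure is an assumption and should be tuned to the app.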

Spring annotation

Spring apps can use the @PreDestroy annotation to ensure a method is called before exiting.

For Java 9+ the following dependency needs to be added.

<dependency>
    <groupId>javax.annotation</groupId>
    <artifactId>javax.annotation-api</artifactId>
    <version>1.3.2</version>
</dependency>

Detecting a SIGKILL

If the following line appears in the app logs, it is proof that the app was forcibly shut down by the system after it did not respond properly to a SIGTERM.

OUT Exit status 137 (exceeded 10s graceful shutdown interval)
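The value 137 follows the common convention that a process terminated by a signal reports 128 plus the signal number, and SIGKILL is signal 9 (128 + 9 = 137). The sketch below decodes a status on that assumption; the class ExitStatus and its describe method are hypothetical names for this example, and an app can in principle choose an exit code above 128 itself.

```java
/** Interprets an exit status using the 128 + signal-number convention (illustrative sketch). */
final class ExitStatus {
    static String describe(int status) {
        if (status <= 128) {
            return "exited with code " + status;
        }
        int signal = status - 128; // e.g. 137 - 128 = 9
        switch (signal) {
            case 9:
                return "killed by SIGKILL (forced shutdown)";
            case 15:
                return "terminated by SIGTERM";
            default:
                return "terminated by signal " + signal;
        }
    }
}
```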

Testing app behaviour

Should an app team need to test that stop and start events are transparent to a client, it is recommended to run cf restart --strategy rolling in a dev environment whilst the app is under load. If the app is coded, configured and scaled correctly, the operation will be invisible to the client.
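One way to observe this from the client side is a small probe that sends requests throughout the restart and counts the failures. The following is a rough sketch using the JDK HTTP client (Java 11+), not a substitute for a proper load-testing tool; the class RestartProbe, the method countFailures and the timeout values are assumptions made for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Sends sequential GET requests and counts those that fail (illustrative sketch). */
final class RestartProbe {
    /** Returns how many of the requests errored or returned a non-2xx status. */
    static int countFailures(URI target, int requests, Duration pause) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // keep the probe simple
                .connectTimeout(Duration.ofSeconds(2)) // assumed timeout
                .build();
        HttpRequest request = HttpRequest.newBuilder(target)
                .timeout(Duration.ofSeconds(5)) // assumed timeout
                .GET()
                .build();
        int failures = 0;
        for (int i = 0; i < requests; i++) {
            try {
                int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                if (status < 200 || status >= 300) {
                    failures++; // the client saw an HTTP error
                }
            } catch (IOException e) {
                failures++; // connection refused, reset, or timed out
            }
            Thread.sleep(pause.toMillis());
        }
        return failures;
    }
}
```

Running countFailures against the app's route while the rolling restart is in progress and getting 0 back suggests that the restart was invisible to that client.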