Cloud Foundry Installation
This section covers how to install Spring Cloud Data Flow on Cloud Foundry.
Backing Services
Spring Cloud Data Flow requires a few data services to perform streaming and task or batch processing. You have two options when you provision Spring Cloud Data Flow and related services on Cloud Foundry:
- The simplest (and automated) method is to use the Spring Cloud Data Flow for PCF tile. This is an opinionated tile for Pivotal Cloud Foundry. It automatically provisions the server and the required data services, thus simplifying the overall getting-started experience. The tile documentation describes the installation in more detail.
- Alternatively, you can provision all the components manually.
The following section goes into the specifics of how to install manually.
Provisioning a RabbitMQ Service Instance
RabbitMQ is used as a messaging middleware between streaming apps and is available as a PCF tile.
You can use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example, on Pivotal Web Services, you can run the following command:
cf create-service cloudamqp lemur rabbit
Provisioning a PostgreSQL Service Instance
An RDBMS is used to persist Data Flow state, such as stream and task definitions, deployments, and executions.
You can use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example, on Pivotal Web Services, you can run the following command:
cf create-service elephantsql panda my_postgres
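The two provisioning commands above can be wrapped in a small helper so that re-running a setup script does not fail when an instance already exists. The following is a minimal sketch, not part of Spring Cloud Data Flow: ensure_service is a hypothetical helper name, and the sketch assumes that cf service exits with a non-zero status when the named instance does not exist.

```shell
# Create a service instance only if no instance with that name exists yet.
# Usage: ensure_service <offering> <plan> <instance-name>
# Assumption: "cf service <name>" fails when the instance is not found.
ensure_service() {
  if cf service "$3" >/dev/null 2>&1; then
    echo "service instance '$3' already exists"
  else
    cf create-service "$1" "$2" "$3"
  fi
}
```

With this helper, the examples above become ensure_service cloudamqp lemur rabbit and ensure_service elephantsql panda my_postgres.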
Database Connection Limits
If you intend to create and run batch jobs as Task pipelines in SCDF, you must ensure that the underlying database instance has enough connection capacity for the batch jobs, the tasks, and SCDF to connect concurrently to the same database instance without running into connection limits. This usually means you cannot use a free plan.
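To make the sizing concern concrete, the following sketch estimates how many connections a plan must allow. It is a back-of-the-envelope calculation only: required_connections is a hypothetical helper, and the pool sizes passed to it are assumptions that you need to replace with the values your servers and task applications actually use.

```shell
# Estimate the number of database connections needed at peak.
# Usage: required_connections <server-pool-size> <max-concurrent-tasks> <pool-size-per-task>
# All three inputs are assumptions about your own deployment.
required_connections() {
  echo $(( $1 + $2 * $3 ))
}
```

For example, required_connections 10 5 10 prints 60, which you can compare with the connection limit of the plan listed by cf marketplace.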
Manifest-Based Installation on Cloud Foundry
To install Data Flow on Cloud Foundry:
- Download the Data Flow server and shell applications by running the following commands:
wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server/2.6.3/spring-cloud-dataflow-server-2.6.3.jar
wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/2.6.3/spring-cloud-dataflow-shell-2.6.3.jar
- Download Skipper, to which Data Flow delegates stream lifecycle operations, such as deploying, upgrading, and rolling back. To do so, use the following command:
wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-skipper-server/2.5.2/spring-cloud-skipper-server-2.5.2.jar
- Push Skipper to Cloud Foundry
Once you have installed Cloud Foundry, you can push Skipper to Cloud Foundry. To do so, you need to create a manifest for Skipper.
You use the deployment.services setting of the "default" deployment platform in the Skipper Server configuration to configure Skipper to bind the RabbitMQ service to all deployed streaming applications. Note that "rabbitmq" is the name of the service instance in this case; use the name you gave your own RabbitMQ service instance (the earlier cf create-service example named it "rabbit"). The following example shows a typical manifest for Skipper:
---
applications:
- name: skipper-server
  host: skipper-server
  memory: 1G
  disk_quota: 1G
  instances: 1
  timeout: 180
  buildpack: java_buildpack
  path: <PATH TO THE DOWNLOADED SKIPPER SERVER UBER-JAR>
  env:
    SPRING_APPLICATION_NAME: skipper-server
    SPRING_PROFILES_ACTIVE: cloud
    JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
    SPRING_APPLICATION_JSON: |-
      {
        "spring.cloud.skipper.server" : {
          "platform.cloudfoundry.accounts": {
            "default": {
              "connection" : {
                "url" : "<cf-api-url>",
                "domain" : "<cf-apps-domain>",
                "org" : "<org>",
                "space" : "<space>",
                "username": "<email>",
                "password" : "<password>",
                "skipSslValidation" : false
              },
              "deployment" : {
                "deleteRoutes" : false,
                "services" : "rabbitmq",
                "enableRandomAppNamePrefix" : false,
                "memory" : 2048
              }
            }
          }
        }
      }
  services:
  - <services>
You need to fill in <cf-api-url>, <cf-apps-domain>, <org>, <space>, <email>, <password>, <serviceName> (RabbitMQ or Apache Kafka), and <services> (such as PostgreSQL) before running these commands. Once you have the desired configuration values in manifest.yml, you can run the cf push command to provision the skipper-server.
SSL Validation: Set Skip SSL Validation to true only if you run on a Cloud Foundry instance that uses self-signed certificates (for example, in development). Do not use self-signed certificates for production.
Buildpacks: When specifying the buildpack, our examples typically specify java_buildpack or java_buildpack_offline. Use the CF command cf buildpacks to list the buildpacks available in your environment.
- Configure and run the Data Flow Server.
One of the most important configuration details is providing credentials to the Cloud Foundry instance so that the server can itself spawn applications. You can use any Spring Boot-compatible configuration mechanism (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, and others), although some may prove more practicable than others, depending on how you typically deploy applications to Cloud Foundry.
Before installing, there are some general configuration details you should be aware of so that you can update your manifest file as needed.
General Configuration
This section covers some things to be aware of when you install into Cloud Foundry.
Unique names
You must use a unique name for your application. An application with the same name in the same organization causes your deployment to fail.
Memory Settings
The recommended minimum memory setting for the server is 2G. Also, to push apps to PCF and obtain application property metadata, the server downloads applications to a Maven repository hosted on the local disk. Although 2G is a typical maximum value for disk space on a PCF installation, you can increase this to 10G. Read about the maximum disk quota for information on how to configure this PCF property. The Data Flow server itself also implements a Least-Recently-Used algorithm to free disk space when it falls below a low-water-mark value.
Routing
If you push to a space with multiple users (for example, on PWS), the route you choose for your application name may already be taken. You can use the --random-route option to avoid this when you push the server application.
Maven repositories
If you need to configure multiple Maven repositories, a proxy, or authorization for a private repository, see Maven Configuration.
Installing using a Manifest
As an alternative to setting environment variables with the cf set-env command, you can curate all the relevant environment variables in a manifest.yml file and use the cf push command to provision the server.
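For comparison, the cf set-env route mentioned above could look like the following sketch. The helper name set_env_vars is hypothetical; the only CLI call it relies on is cf set-env APP_NAME ENV_VAR_NAME ENV_VAR_VALUE, and variables set this way take effect only after the application is restaged or restarted.

```shell
# Apply a list of KEY=VALUE pairs to an application with "cf set-env".
# Usage: set_env_vars <app-name> KEY=VALUE [KEY=VALUE ...]
set_env_vars() {
  app="$1"
  shift
  for pair in "$@"; do
    # Split on the first "=" so that values may themselves contain "=".
    cf set-env "$app" "${pair%%=*}" "${pair#*=}"
  done
}
```

A call such as set_env_vars data-flow-server SPRING_PROFILES_ACTIVE=cloud issues one cf set-env command per pair. Keeping the same values in a manifest file has the advantage that the whole configuration lives in one versionable file.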
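The following example shows a manifest file for the Data Flow server. Note that "postgresSQL" is the name of the database service instance; replace it with the name of your own instance (the earlier cf create-service example used "my_postgres"):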
---
applications:
- name: data-flow-server
  host: data-flow-server
  memory: 2G
  disk_quota: 2G
  instances: 1
  path: <PATH TO SERVER UBER-JAR>
  env:
    SPRING_APPLICATION_NAME: data-flow-server
    SPRING_PROFILES_ACTIVE: cloud
    JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
    SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI: https://<skipper-host-name>/api
    SPRING_APPLICATION_JSON: |-
      {
        "maven" : {
          "remoteRepositories" : {
            "repo1" : {
              "url" : "https://repo.spring.io/libs-snapshot"
            }
          }
        },
        "spring.cloud.dataflow" : {
          "task.platform.cloudfoundry.accounts" : {
            "default" : {
              "connection" : {
                "url" : "<cf-api-url>",
                "domain" : "<cf-apps-domain>",
                "org" : "<org>",
                "space" : "<space>",
                "username" : "<email>",
                "password" : "<password>",
                "skipSslValidation" : true
              },
              "deployment" : {
                "services" : "postgresSQL"
              }
            }
          }
        }
      }
  services:
  - postgresSQL
You must deploy Skipper first and then configure the URI location where the Skipper server runs.
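Because both manifests contain placeholders such as <org> and <password>, it can help to check that none are left before running cf push. The following is a minimal sketch under that assumption; check_placeholders is a hypothetical helper that treats any text of the form <...> in the file as an unfilled placeholder.

```shell
# Print every line of a manifest that still contains a <placeholder> token.
# Returns 0 when no placeholders remain and 1 otherwise.
# Usage: check_placeholders <path-to-manifest>
check_placeholders() {
  if grep -n '<[^<>]*>' "$1"; then
    return 1
  fi
  return 0
}
```

One possible use is check_placeholders manifest.yml && cf push, which pushes only when the check passes.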
Configuration for Prometheus
If you have installed Prometheus and Grafana on Cloud Foundry or have a separate installation of them on another cluster, update the Data Flow Server's manifest file so that the SPRING_APPLICATION_JSON environment variable contains a section that configures all stream applications to send metrics data to the Prometheus RSocket gateway.
The YAML snippet specific to this configuration is shown below.
---
applications:
- name: data-flow-server
  ...
  env:
    ...
    SPRING_APPLICATION_JSON: |-
      {
        ...
        "spring.cloud.dataflow" : {
          ...
          "applicationProperties" : {
            "stream.management.metrics.export.prometheus" : {
              "enabled" : true,
              "rsocket.enabled" : true,
              "rsocket.host" : "<prometheus-rsocket-proxy host>",
              "rsocket.port" : <prometheus-rsocket-proxy TCP or Websocket port>
            }
          },
          "grafana-info.url": "<grafana root URL>"
        }
      }
  services:
  - postgresSQL
Similarly, if you want to configure metrics collection for tasks, update the Data Flow Server's manifest file so that the SPRING_APPLICATION_JSON environment variable contains a section that configures all task applications to send metrics data to the Prometheus RSocket gateway.
The YAML snippet specific to this configuration is shown below.
---
applications:
- name: data-flow-server
  ...
  env:
    ...
    SPRING_APPLICATION_JSON: |-
      {
        ...
        "spring.cloud.dataflow" : {
          ...
          "applicationProperties" : {
            "task.management.metrics.export.prometheus" : {
              "enabled" : true,
              "rsocket.enabled" : true,
              "rsocket.host" : "<prometheus-rsocket-proxy host>",
              "rsocket.port" : <prometheus-rsocket-proxy TCP or Websocket port>
            }
          },
          "grafana-info.url": "<grafana root URL>"
        }
      }
  services:
  - postgresSQL
Configuration for Wavefront
If you have a Wavefront SaaS account, you can enable the Task and Stream metrics. To do so, extend the Data Flow server manifest by adding the following JSON to the SPRING_APPLICATION_JSON environment variable:
"management.metrics.export.wavefront": {
  "enabled": true,
  "api-token": "<YOUR API Token>",
  "uri": "<YOUR WAVEFRONT URI>",
  "source": "your-scdf-cf-source-id"
},
"spring.cloud.dataflow.applicationProperties": {
  "task.management.metrics.export.wavefront": {
    "enabled": true,
    "api-token": "<YOUR API Token>",
    "uri": "<YOUR WAVEFRONT URI>",
    "source": "your-scdf-cf-source-id"
  },
  "stream.management.metrics.export.wavefront": {
    "enabled": true,
    "api-token": "<YOUR API Token>",
    "uri": "<YOUR WAVEFRONT URI>",
    "source": "your-scdf-cf-source-id"
  }
}
Check the Wavefront Actuator endpoint for more details about the Wavefront-specific options supported through the management.metrics.export.wavefront.XXX properties.
Once you are ready with the relevant properties in your manifest file, you can issue a cf push command from the directory where this file is stored.
Configuration for InfluxDB
If you have installed InfluxDB and Grafana on Cloud Foundry or have a separate installation of them on another cluster, you can enable the Task and Stream metrics integration by adding the following JSON to the SPRING_APPLICATION_JSON environment variable in the Data Flow server manifest:
"spring.cloud.dataflow.applicationProperties": {
  "task.management.metrics.export.influx": {
    "enabled": true,
    "db": "defaultdb",
    "autoCreateDb": false,
    "uri": "https://influx-uri:port",
    "userName": "guest",
    "password": "******"
  },
  "stream.management.metrics.export.influx": {
    "enabled": true,
    "db": "defaultdb",
    "autoCreateDb": false,
    "uri": "https://influx-uri:port",
    "userName": "guest",
    "password": "******"
  }
},
"spring.cloud.dataflow.grafana-info.url": "https://grafana-uri:port"
Check the Influx Actuator properties for further details about the management.metrics.export.influx.XXX properties.
Once you are ready with the relevant properties in your manifest file, you can issue a cf push command from the directory where this file is stored.
Shell
The following example shows how to start the Data Flow Shell:
java -jar spring-cloud-dataflow-shell-2.6.3.jar
Since the Data Flow Server and shell are not running on the same host, you can point the shell to the Data Flow server URL by using the dataflow config server command in the shell.
server-unknown:>dataflow config server https://<data-flow-server-route-in-cf>
Successfully targeted https://<data-flow-server-route-in-cf>
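As an alternative to targeting the server after startup, the shell can be given the server URL when it is launched through its --dataflow.uri option. The wrapper below is a small sketch; start_dataflow_shell is a hypothetical helper name, and the jar file name follows the 2.6.3 download shown earlier.

```shell
# Launch the Data Flow shell already targeted at a given server.
# Usage: start_dataflow_shell <shell-version> <data-flow-server-url>
start_dataflow_shell() {
  java -jar "spring-cloud-dataflow-shell-$1.jar" --dataflow.uri="$2"
}
```

For example, start_dataflow_shell 2.6.3 https://<data-flow-server-route-in-cf> starts the shell against the route of the server you pushed.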
Register prebuilt applications
All the prebuilt streaming applications:
- Are available as Apache Maven artifacts or Docker images.
- Use RabbitMQ or Apache Kafka.
- Support monitoring via Prometheus and InfluxDB.
- Contain metadata for application properties used in the UI and code completion in the shell.
Applications can be registered individually by using the app register functionality or as a group by using the app import functionality.
There are also dataflow.spring.io links that represent the group of prebuilt applications for a specific release, which is useful for getting started.
You can register applications using the UI or the shell.
Since the Cloud Foundry installation guide uses RabbitMQ as the messaging middleware, register the RabbitMQ version of the applications.
dataflow:>app import --uri https://dataflow.spring.io/rabbitmq-maven-latest
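The import link above combines the messaging middleware and the packaging type. The following sketch builds such a link for the combinations named in this section (RabbitMQ or Apache Kafka, Maven or Docker). The helper app_import_uri is hypothetical, and the naming pattern is inferred from the rabbitmq-maven-latest link, so verify that the resulting URL exists before you rely on it.

```shell
# Build a dataflow.spring.io import link from a binder and a packaging type.
# Usage: app_import_uri <rabbitmq|kafka> <maven|docker>
# Assumption: links follow the pattern <binder>-<packaging>-latest.
app_import_uri() {
  case "$1" in
    rabbitmq|kafka) ;;
    *) echo "unknown binder: $1" >&2; return 1 ;;
  esac
  case "$2" in
    maven|docker) ;;
    *) echo "unknown packaging: $2" >&2; return 1 ;;
  esac
  echo "https://dataflow.spring.io/$1-$2-latest"
}
```

The printed value can then be passed to the --uri option of app import in the Data Flow shell.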