Connect to External Kafka Cluster
Pivotal Cloud Foundry does not have Apache Kafka as a managed service in the Marketplace. However, it is common for developers to develop and deploy applications to Cloud Foundry that interact with an external Kafka cluster. This recipe specifically walks through the developer expectations from Spring Cloud Stream, Spring Cloud Data Flow, and as well as the Spring Cloud Data Flow for PCF Tile.
We will review the required Spring Cloud Stream properties and how they are translated over to the applications for the following deployment options in Cloud Foundry.
- Applications run as standalone app instances.
- Applications deployed as part of a streaming data pipeline through open-source SCDF.
- Applications deployed as part of a streaming data pipeline through SCDF for PCF tile.
Prerequisite
Let's start with preparing the external Kafka cluster credentials.
Typically, a series of Kafka brokers, collectively are referred to as a Kafka cluster. The brokers individually can be reached through its external IP addresses or a well-defined DNS route could be available to use as well.
For this walk-through, though, we will stick to a simpler setup of 3-broker cluster with their DNS addresses being foo0.broker.foo
, foo1.broker.foo
, and foo2.broker.foo
. The default port of the brokers is 9092
.
If the cluster is secured, depending on the security option in use at the broker, different properties are expected to be supplied for when applications attempt to connect to the external cluster.
Again, for simplicity, we will use Kafka's JAAS set up of PlainLoginModule
with username as test
and password as bestest
.
User-provided Services vs. Spring Boot Properties
The next question Cloud Foundry developers stumble upon is whether or not to set up Kafka connection as a Cloud Foundry custom user-provided service (CUPS) or simply pass connection credentials as Spring Boot properties.
In Cloud Foundry, there isn't a Spring Cloud Connector or CF-JavaEnv support for Kafka, so by service-binding the Kafka CUPS with the application, you will not automatically be able to parse VCAP_SERVICES
and pass the connection credentials to the applications at runtime.
Even with CUPS in place, it is your responsibility to parse the VCAP_SERVICES
JSON and pass them as Boot properties, so there's no automation in place for Kafka.
For the curious, you can see an example of CUPS in action in the Spring Cloud Data Flow's reference guide.
For this walk-through, we will stick to the Spring Boot properties.
Standalone Streaming Apps
The typical Cloud Foundry deployment of an application includes a manifest.yml
file.
We will use the source-sample
source application to highlight the configurations to connect to an external Kafka cluster.
---
applications:
- name: source-sample
host: source-sample
memory: 1G
disk_quota: 1G
instances: 1
path: source-sample.jar
env:
... # other application properties
... # other application properties
SPRING_APPLICATION_JSON: |-
{
"spring.cloud.stream.kafka.binder": {
"brokers": "foo0.broker.foo,foo1.broker.foo,foo2.broker.foo",
"jaas.options": {
"username": "test",
"password":"bestest"
},
"jaas.loginModule":"org.apache.kafka.common.security.plain.PlainLoginModule"
},
"spring.cloud.stream.bindings.output.destination":"fooTopic"
}
With the above setting, when the source-sample
source is deployed to Cloud Foundry, it should be able to connect to the external cluster.
You can verify the connection credentials by accessing source-sample
's /configprops
actuator endpoint.
Likewise, you will also see the connection credentials printed in the app logs.
The Kafka connection credentials are supplied through the Spring Cloud Stream Kafka binder properties, which in
this case are all the properties with the spring.spring.cloud.stream.kafka.binder.*
prefix.
Alternatively, instead of supplying the properties through SPRING_APPLICATION_JSON
, these properties can be supplied as plain
env-vars as well.
Streaming Data Pipeline in SCDF (Open Source)
Deploying a streaming data pipeline in SCDF requires at least two applications. We will use the out-of-the-box time
as the source and the log
sink applications here.
Global Kafka Connection Configurations
Before we jump to the demo walk-through, let's review how global properties can be configured centrally in SCDF. With that flexibility, every stream application deployed through SCDF will automatically also inherit all the globally defined properties, and it can be convenient for cases like Kafka connection credentials.
---
applications:
- name: scdf-server
host: scdf-server
memory: 2G
disk_quota: 2G
timeout: 180
instances: 1
path: spring-cloud-dataflow-server-2.6.3.jar
env:
SPRING_PROFILES_ACTIVE: cloud
JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI: http://your-skipper-server-uri/api
SPRING_APPLICATION_JSON: |-
{
"spring.cloud": {
"dataflow.task.platform.cloudfoundry": {
"accounts": {
"foo": {
"connection": {
"url": <api-url>,
"org": <org>,
"space": <space>,
"domain": <app-domain>,
"username": <email>,
"password": <password>,
"skipSslValidation": true
},
"deployment": {
"services": <comma delimited list of service>"
}
}
}
},
"stream": {
"kafka.binder": {
"brokers": "foo0.broker.foo,foo1.broker.foo,foo2.broker.foo",
"jaas": {
"options": {
"username": "test",
"password":"bestest"
},
"loginModule":"org.apache.kafka.common.security.plain.PlainLoginModule"
}
},
"bindings.output.destination":"fooTopic"
}
}
}
services:
- mysql
With the above manifest.yml
, SCDF should now be in the position to automatically propagate the Kafka connection credentials to all the stream application deployments.
dataflow:>stream create fooz --definition "time | log"
Created new stream 'fooz'
dataflow:>stream deploy --name fooz
Deployment request has been sent for stream 'fooz'
When the time
and log
applications are successfully deployed and started in Cloud Foundry, they should automatically connect to the configured external Kafka cluster.
You can verify the connection credentials by accessing time
or log
's /configprops
actuator endpoints.
Likewise, you will also see the connection credentials printed in the app logs.
Explicit Stream-level Kafka Connection Configuration
Alternatively, if you intend to deploy only a particular stream with external Kafka connection credentials, you can do so when deploying a stream with explicit overrides.
dataflow:>stream create fooz --definition "time | log"
Created new stream 'fooz'
dataflow:>stream deploy --name fooz --properties "app.*.spring.cloud.stream.kafka.binder.brokers=foo0.broker.foo,foo1.broker.foo,foo2.broker.foo,app.*.spring.spring.cloud.stream.kafka.binder.jaas.options.username=test,app.*.spring.spring.cloud.stream.kafka.binder.jaas.options.password=besttest,app.*.spring.spring.cloud.stream.kafka.binder.jaas.loginModule=org.apache.kafka.common.security.plain.PlainLoginModule"
Deployment request has been sent for stream 'fooz'
When the time
and log
applications are successfully deployed and started in Cloud Foundry, they should automatically connect to the external Kafka cluster.
You can verify the connection credentials by accessing time
or log
's /configprops
actuator endpoints.
Likewise, you will also see the connection credentials printed in the app logs.
Streaming Data Pipeline in SCDF for PCF Tile
We don't yet have support to supply global configuration properties through the SCDF for PCF Tile.
However, the option discussed at Explicit Stream Configuration should still work when deploying a stream from Spring Cloud Data Flow running as a managed service in Pivotal Cloud Foundry.
Alternatively, you could supply Kafka connection credentials as CUPS when creating the service instance of SCDF for PCF Tile.
cf create-service p-dataflow standard data-flow -c '{"messaging-data-service": { "user-provided": {"brokers":"foo0.broker.foo,foo1.broker.foo,foo2.broker.foo","username":"test","password":"bestest"}}}'
With that, when deploying the stream, you'd supply the CUPS properties as values from VCAP_SERVICES
.
dataflow:>stream create fooz --definition "time | log"
Created new stream 'fooz'
dataflow:>stream deploy --name fooz --properties "app.*.spring.cloud.stream.kafka.binder.brokers=${vcap.services.messaging-<GENERATED_GUID>.credentials.brokers},app.*.spring.spring.cloud.stream.kafka.binder.jaas.options.username=${vcap.services.messaging-<GENERATED_GUID>.credentials.username},app.*.spring.spring.cloud.stream.kafka.binder.jaas.options.password=${vcap.services.messaging-<GENERATED_GUID>.credentials.password},app.*.spring.spring.cloud.stream.kafka.binder.jaas.loginModule=org.apache.kafka.common.security.plain.PlainLoginModule"
Deployment request has been sent for stream 'fooz'
Replace <GENERATED_GUID>
with the GUID of the generated messaging service-instance name, which you can find from cf services
command.
Example: messaging-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11
.