Role of Multiple Platform Deployments

The goal of this recipe is to unpack the scenarios in which multiple platform deployments become a necessity. Let's start with the use cases and then dive into the details of how to set them up in Spring Cloud Data Flow.

Use Cases

  • For certain use cases, you may want to isolate the deployment of streaming and batch data pipelines to a dedicated environment. For instance, you may want to run a predictive model training routine that requires a lot of memory. Such compute is usually provisioned with specific boundaries, and only particular workloads are allowed to run on it. In other words, you don't want regular applications to use the high-compute resource pool and saturate its availability. This is particularly important when you're paying for machines on a pay-by-use basis and want to avoid premium costs.
  • Similar to the previous use case, you might need to run the applications closer to the message broker (that is, run the business logic close to where the data is). Doing so reduces I/O latency, which helps meet high-throughput and low-latency service-level agreements (SLAs). Here too, a deployment pattern in which the streaming applications are targeted to the same VMs where the message broker is running could help with the SLAs.
  • Some users rely on a single Spring Cloud Data Flow instance to orchestrate a deployment model in which the streaming and batch data pipelines are deployed and launched to multiple environments. This pattern is primarily used to organize the deployment topologies with well-defined boundaries, while a single SCDF instance centrally orchestrates, monitors, and manages the data pipelines.

The above scenarios require Spring Cloud Data Flow to deploy streaming and batch applications with flexible platform configurations. Starting with v2.0, Spring Cloud Data Flow supports multi-platform deployments. With that, users can declaratively configure the desired number of platform accounts upfront and select one of the defined accounts at deployment time to distinguish the boundaries.

Now that we understand the use-case requirements, let's review the steps to configure multiple platform accounts in Kubernetes and Cloud Foundry.

Configurations

Kubernetes

Suppose you want to deploy a stream with three applications to the kafka-namespace and launch a batch job to the highmemory-namespace. The following configurations can be defined in SCDF's deployment files.

Since the streaming data pipelines are managed through Skipper, you'd change skipper-config-kafka.yaml as shown below.

If RabbitMQ is the broker, you'd change skipper-config-rabbit.yaml instead.

apiVersion: v1
kind: ConfigMap
metadata:
  name: skipper
  labels:
    app: skipper
data:
  application.yaml: |-
    spring:
      cloud:
        skipper:
          server:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default
                    environmentVariables: 'SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZK_SERVICE_HOST}:${KAFKA_ZK_SERVICE_PORT}'
                    limits:
                      memory: 1024Mi
                      cpu: 500m
                    readinessProbeDelay: 120
                    livenessProbeDelay: 90
                  kafkazone:
                    namespace: kafka-namespace
                    environmentVariables: 'SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZK_SERVICE_HOST}:${KAFKA_ZK_SERVICE_PORT}'
                    limits:
                      memory: 2048Mi
                      cpu: 500m
                    readinessProbeDelay: 180
                    livenessProbeDelay: 120
      datasource:
        url: jdbc:mysql://${MYSQL_SERVICE_HOST}:${MYSQL_SERVICE_PORT}/skipper
        username: root
        password: ${mysql-root-password}
        driverClassName: org.mariadb.jdbc.Driver
        testOnBorrow: true
        validationQuery: "SELECT 1"

Notice the inclusion of a platform account named kafkazone. Also, the default memory limit for pods deployed through it is set to 2GB, along with readiness and liveness probe customizations.
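Each account is a named entry under the accounts key, so the YAML nesting corresponds to dotted Spring property names. As a minimal sketch (Python is used only for illustration, and flatten is a hypothetical helper that is not part of SCDF or Skipper), the following prints the property names that a subset of the accounts above expands to:

```python
# Sketch: flatten a nested "accounts" mapping into dotted Spring property names.
# The dict mirrors only part of the ConfigMap; flatten() is a hypothetical helper.

def flatten(node, prefix=""):
    """Return {dotted.key: value} for every leaf of a nested dict."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

accounts = {
    "default": {
        "namespace": "default",
        "limits": {"memory": "1024Mi", "cpu": "500m"},
    },
    "kafkazone": {
        "namespace": "kafka-namespace",
        "limits": {"memory": "2048Mi", "cpu": "500m"},
    },
}

PREFIX = "spring.cloud.skipper.server.platform.kubernetes.accounts"
properties = flatten(accounts, PREFIX)

for name in sorted(properties):
    print(f"{name}={properties[name]}")
```

Reading the configuration this way makes it easier to see that adding a platform account amounts to adding one more named subtree under the same prefix.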

For batch data pipelines, however, you'd have to change the configurations in server-config.yaml as follows.

apiVersion: v1
kind: ConfigMap
metadata:
  name: scdf-server
  labels:
    app: scdf-server
data:
  application.yaml: |-
    spring:
      cloud:
        dataflow:
          applicationProperties:
            stream:
              management:
                metrics:
                  export:
                    prometheus:
                      enabled: true
                      rsocket:
                        enabled: true
                        host: prometheus-proxy
                        port: 7001
            task:
              management:
                metrics:
                  export:
                    prometheus:
                      enabled: true
                      rsocket:
                        enabled: true
                        host: prometheus-proxy
                        port: 7001
          grafana-info:
            url: 'https://grafana:3000'
          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default
                    limits:
                      memory: 1024Mi
                  highmemory:
                    namespace: highmemory-namespace
                    limits:
                      memory: 4096Mi
      datasource:
        url: jdbc:mysql://${MYSQL_SERVICE_HOST}:${MYSQL_SERVICE_PORT}/mysql
        username: root
        password: ${mysql-root-password}
        driverClassName: org.mariadb.jdbc.Driver
        testOnBorrow: true
        validationQuery: "SELECT 1"

Notice the inclusion of a platform account named highmemory. Also, the default memory limit for pods launched through it is set to 4GB.

With these configurations, you can select the platform when deploying a stream or launching a task from SCDF.

List the available platforms.

dataflow:>stream platform-list
╔═════════╤══════════╤═══════════════════════════════════════════════════════════════════════════════════════╗
║  Name   │   Type   │                                   Description                                         ║
╠═════════╪══════════╪═══════════════════════════════════════════════════════════════════════════════════════╣
║default  │kubernetes│master url = [https://10.0.0.1:443/], namespace = [default], api version = [v1]        ║
║kafkazone│kubernetes│master url = [https://10.0.0.1:443/], namespace = [kafka-namespace], api version = [v1]║
╚═════════╧══════════╧═══════════════════════════════════════════════════════════════════════════════════════╝

dataflow:>task platform-list
╔═════════════╤═════════════╤════════════════════════════════════════════════════════════════════════════════════════════╗
║Platform Name│Platform Type│                                   Description                                              ║
╠═════════════╪═════════════╪════════════════════════════════════════════════════════════════════════════════════════════╣
║default      │Kubernetes   │master url = [https://10.0.0.1:443/], namespace = [default], api version = [v1]             ║
║highmemory   │Kubernetes   │master url = [https://10.0.0.1:443/], namespace = [highmemory-namespace], api version = [v1]║
╚═════════════╧═════════════╧════════════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy a stream.

dataflow:>stream deploy --name foo --platformName kafkazone
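The same deployment can presumably be triggered over SCDF's REST API. The sketch below only builds the request and does not send it. It rests on several assumptions that should be checked against your SCDF version: the server listens on http://localhost:9393, streams are deployed with a POST to /streams/deployments/{name}, and the platform account is passed as the deployment property spring.cloud.dataflow.skipper.platformName. The helper build_deploy_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not taken from the recipe): server address, endpoint path,
# and the deployment property key that carries the platform account name.
DATAFLOW_URL = "http://localhost:9393"
PLATFORM_PROPERTY = "spring.cloud.dataflow.skipper.platformName"

def build_deploy_request(stream_name, platform_name):
    """Prepare (but do not send) a POST that deploys a stream to one platform account."""
    url = f"{DATAFLOW_URL}/streams/deployments/{urllib.parse.quote(stream_name)}"
    body = json.dumps({PLATFORM_PROPERTY: platform_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_deploy_request("foo", "kafkazone")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually deploy, pass `request` to urllib.request.urlopen().
```

Keeping the request construction separate from the network call makes it possible to inspect the target platform before anything is deployed.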

Verify deployment.

kubectl get svc -n kafka-namespace
NAME          TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
kafka         ClusterIP      10.0.7.155    <none>          9092/TCP                     7m29s
kafka-zk      ClusterIP      10.0.15.169   <none>          2181/TCP,2888/TCP,3888/TCP   7m29s

kubectl get pods -n kafka-namespace
NAME                                READY   STATUS    RESTARTS   AGE
foo-cassandra-v1-5d79b8bdcd-94kw4   1/1     Running   0          63s
foo-cardata-v1-6cdc98fbd-cmrr2      1/1     Running   0          63s
foo-predict-v1-758dc44575-tcdkd     1/1     Running   0          63s

Alternatively, you can use the platform dropdown in the SCDF Dashboard to select the platform when you create and launch tasks.

[Image: Launch against a platform]

Cloud Foundry

For the same use-case requirement, if you want to deploy a stream with three applications to an org/space where the Kafka service is running, and a batch job to an org/space with more compute power, the SCDF configurations for Cloud Foundry could be as follows.

Since the streaming data pipelines are managed through Skipper, you'd change Skipper's manifest.yml to include the connection credentials for the Kafka org/space.

applications:
  - name: skipper-server
    host: skipper-server
    memory: 1G
    disk_quota: 1G
    instances: 1
    timeout: 180
    buildpack: java_buildpack
    path: <PATH TO THE DOWNLOADED SKIPPER SERVER UBER-JAR>
    env:
      SPRING_APPLICATION_NAME: skipper-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_APPLICATION_JSON: |-
        {
          "spring.cloud.skipper.server" : {
            "platform.cloudfoundry.accounts" : {
              "default" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "<org>",
                  "space" : "<space>",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : false
                },
                "deployment" : {
                  "deleteRoutes" : false,
                  "services" : "rabbitmq",
                  "enableRandomAppNamePrefix" : false,
                  "memory" : 2048
                }
              },
              "kafkazone" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "kafka-org",
                  "space" : "kafka-space",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : false
                },
                "deployment" : {
                  "deleteRoutes" : false,
                  "services" : "kafkacups",
                  "enableRandomAppNamePrefix" : false,
                  "memory" : 3072
                }
              }
            }
          }
        }
services:
  - <services>

Notice the inclusion of a platform account named kafkazone. Also, the default memory for the deployed applications is set to 3GB.
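Hand-written JSON inside a YAML manifest is easy to get subtly wrong (a missing comma or an unquoted value), so one option is to generate the SPRING_APPLICATION_JSON value instead. The following is a minimal sketch, not an official tool: cf_account is a hypothetical helper, the angle-bracket placeholders are kept as-is, and the connection key is spelled skipSslValidation on the assumption that this is the property name the Cloud Foundry deployer expects.

```python
import json

def cf_account(org, space, services, memory_mb):
    """Build one Cloud Foundry platform account; placeholders are left unresolved."""
    return {
        "connection": {
            "url": "<cf-api-url>",
            "domain": "<cf-apps-domain>",
            "org": org,
            "space": space,
            "username": "<email>",
            "password": "<password>",
            "skipSslValidation": False,  # assumed spelling of the property
        },
        "deployment": {
            "deleteRoutes": False,
            "services": services,
            "enableRandomAppNamePrefix": False,
            "memory": memory_mb,
        },
    }

config = {
    "spring.cloud.skipper.server": {
        "platform.cloudfoundry.accounts": {
            "default": cf_account("<org>", "<space>", "rabbitmq", 2048),
            "kafkazone": cf_account("kafka-org", "kafka-space", "kafkacups", 3072),
        }
    }
}

# json.dumps always emits syntactically valid JSON, so commas and quoting
# cannot be forgotten the way they can in a hand-edited manifest.
spring_application_json = json.dumps(config, indent=2)
print(spring_application_json)
```

The printed value can then be pasted under SPRING_APPLICATION_JSON in the manifest, indented to match the block scalar.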

For batch data pipelines, however, you'd have to change the configurations in SCDF's manifest.yml file.

applications:
  - name: data-flow-server
    host: data-flow-server
    memory: 2G
    disk_quota: 2G
    instances: 1
    path: { PATH TO SERVER UBER-JAR }
    env:
      SPRING_APPLICATION_NAME: data-flow-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI: https://<skipper-host-name>/api
      SPRING_APPLICATION_JSON: |-
        {
          "maven" : {
            "remoteRepositories" : {
              "repo1" : {
                "url" : "https://repo.spring.io/libs-snapshot"
              }
            }
          },
          "spring.cloud.dataflow" : {
            "task.platform.cloudfoundry.accounts" : {
              "default" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "<org>",
                  "space" : "<space>",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : true
                },
                "deployment" : {
                  "services" : "postgresSQL"
                }
              },
              "highmemory" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "highmemory-org",
                  "space" : "highmemory-space",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : true
                },
                "deployment" : {
                  "services" : "postgresSQL",
                  "memory" : 5120
                }
              }
            }
          }
        }
services:
  - postgresSQL

Notice the inclusion of a platform account named highmemory. Also, the default memory for the deployed application is set to 5GB.

List the available platforms.

dataflow:>stream platform-list
╔═════════╤════════════╤════════════════════════════════════════════════════════════════════════════╗
║  Name   │    Type    │                               Description                                  ║
╠═════════╪════════════╪════════════════════════════════════════════════════════════════════════════╣
║default  │cloudfoundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]  ║
║kafkazone│cloudfoundry│org = [kafka-org], space = [kafka-space], url = [https://api.run.pivotal.io]║
╚═════════╧════════════╧════════════════════════════════════════════════════════════════════════════╝

dataflow:>task platform-list
╔═════════════╤═════════════╤══════════════════════════════════════════════════════════════════════════════════════╗
║Platform Name│Platform Type│                               Description                                            ║
╠═════════════╪═════════════╪══════════════════════════════════════════════════════════════════════════════════════╣
║default      │Cloud Foundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]            ║
║highmemory   │Cloud Foundry│org = [highmemory-org], space = [highmemory-space], url = [https://api.run.pivotal.io]║
╚═════════════╧═════════════╧══════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy a stream.

dataflow:>stream deploy --name foo --platformName kafkazone

Verify deployment.

cf apps
Getting apps in org kafka-org / space kafka-space as <email>...
OK

name                           requested state   instances   memory   disk   urls
j6wQUU3-foo-predict-v1          started           1/1         3G       1G     j6wQUU3-foo-predict-v1.cfapps.io
j6wQUU3-foo-cardata-v1          started           1/1         3G       1G     j6wQUU3-foo-cardata-v1.cfapps.io
j6wQUU3-foo-cassandra-v1        started           1/1         3G       1G     j6wQUU3-foo-cassandra-v1.cfapps.io

Alternatively, you can use the platform dropdown in the SCDF Dashboard to select the platform when you create and launch tasks.

Mixing Cloud Foundry and Kubernetes Deployments

There are cases when you want to orchestrate a deployment model where specific workloads are deployed to Kubernetes and the rest to Cloud Foundry. After all, the two platforms offer different levels of runtime support, and the flexibility to deploy workloads to different platforms is an added advantage.

Imagine a scenario where Spring Cloud Data Flow is running on Cloud Foundry. Through configuration settings alone, it is also possible to define and stage one or many Kubernetes accounts within the same SCDF instance. This flexibility opens up compelling deployment scenarios where the streaming and batch data pipelines can be deployed to a variety of platforms.

Let's take the same Cloud Foundry scenario. Apart from the default and kafkazone platform accounts, you will notice gpuzone as another account in Skipper's manifest.yml below.

applications:
  - name: skipper-server
    host: skipper-server
    memory: 1G
    disk_quota: 1G
    instances: 1
    timeout: 180
    buildpack: java_buildpack
    path: <PATH TO THE DOWNLOADED SKIPPER SERVER UBER-JAR>
    env:
      SPRING_APPLICATION_NAME: skipper-server
      SPRING_PROFILES_ACTIVE: cloud
      JBP_CONFIG_SPRING_AUTO_RECONFIGURATION: '{enabled: false}'
      SPRING_APPLICATION_JSON: |-
        {
          "spring.cloud.skipper.server" : {
            "platform.cloudfoundry.accounts" : {
              "default" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "<org>",
                  "space" : "<space>",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : false
                },
                "deployment" : {
                  "deleteRoutes" : false,
                  "services" : "rabbitmq",
                  "enableRandomAppNamePrefix" : false,
                  "memory" : 2048
                }
              },
              "kafkazone" : {
                "connection" : {
                  "url" : "<cf-api-url>",
                  "domain" : "<cf-apps-domain>",
                  "org" : "kafka-org",
                  "space" : "kafka-space",
                  "username" : "<email>",
                  "password" : "<password>",
                  "skipSslValidation" : false
                },
                "deployment" : {
                  "deleteRoutes" : false,
                  "services" : "kafkacups",
                  "enableRandomAppNamePrefix" : false,
                  "memory" : 3072
                }
              }
            },
            "platform.kubernetes.accounts" : {
              "gpuzone" : {
                "fabric8" : {
                  "masterUrl" : "<k8s-master-api-url>",
                  "namespace" : "gpuzone-namespace",
                  "trustCerts" : "true"
                }
              }
            }
          }
        }
services:
  - <services>

In this case, gpuzone targets the GPU VM node pool in Kubernetes. With simple declarative configuration, the same SCDF instance is now ready to deploy data pipelines to three different compute environments.

With this setup, you can choose among three platform accounts (default, kafkazone, and gpuzone) when deploying the streaming data pipelines.
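Because the platform is selected by account name alone, it seems prudent to keep account names unique across platform types. That is an assumption made here, not a rule stated in this recipe. The sketch below reads a trimmed-down stand-in for Skipper's SPRING_APPLICATION_JSON (connection details omitted, both account groups assumed to sit under the spring.cloud.skipper.server key) and reports which platform type each account belongs to. The helper list_accounts is hypothetical.

```python
import json
from collections import defaultdict

# Trimmed-down stand-in for Skipper's SPRING_APPLICATION_JSON.
SPRING_APPLICATION_JSON = """
{
  "spring.cloud.skipper.server": {
    "platform.cloudfoundry.accounts": {
      "default": {"connection": {"org": "<org>", "space": "<space>"}},
      "kafkazone": {"connection": {"org": "kafka-org", "space": "kafka-space"}}
    },
    "platform.kubernetes.accounts": {
      "gpuzone": {"fabric8": {"namespace": "gpuzone-namespace"}}
    }
  }
}
"""

def list_accounts(raw_json):
    """Return {account name: [platform types]} found under the Skipper server key."""
    server = json.loads(raw_json)["spring.cloud.skipper.server"]
    found = defaultdict(list)
    for key, accounts in server.items():
        parts = key.split(".")
        # Only keys shaped like "platform.<type>.accounts" are considered.
        if len(parts) == 3 and parts[0] == "platform" and parts[2] == "accounts":
            for name in accounts:
                found[name].append(parts[1])
    return dict(found)

accounts = list_accounts(SPRING_APPLICATION_JSON)
duplicates = {name: types for name, types in accounts.items() if len(types) > 1}

for name, types in accounts.items():
    print(f"{name}: {', '.join(types)}")
if duplicates:
    raise SystemExit(f"account names reused across platforms: {duplicates}")
```

Running a check like this before pushing the manifest catches both malformed JSON and accidental name clashes between the Cloud Foundry and Kubernetes accounts.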

List the available platforms.

dataflow:>stream platform-list
╔═════════╤════════════╤═══════════════════════════════════════════════════════════════════════════════════════════╗
║  Name   │    Type    │                               Description                                                 ║
╠═════════╪════════════╪═══════════════════════════════════════════════════════════════════════════════════════════╣
║default  │cloudfoundry│org = [scdf-%%], space = [space-%%%%%], url = [https://api.run.pivotal.io]                 ║
║kafkazone│cloudfoundry│org = [kafka-org], space = [kafka-space], url = [https://api.run.pivotal.io]               ║
║gpuzone  │kubernetes  │master url = [https://10.0.0.1:443/], namespace = [gpuzone-namespace], api version = [v1]  ║
╚═════════╧════════════╧═══════════════════════════════════════════════════════════════════════════════════════════╝

Create a stream.

dataflow:>stream create foo --definition "cardata | predict | cassandra"
Created new stream 'foo'

Deploy a stream.

dataflow:>stream deploy --name foo --platformName gpuzone

Verify new pods in Kubernetes.

kubectl get pods -n gpuzone-namespace
NAME                                READY   STATUS    RESTARTS   AGE
foo-cassandra-v1-aakhslff-94kw4     1/1     Running   0          73s
foo-cardata-v1-fdalsssdf2-cmrr2     1/1     Running   0          73s
foo-predict-v1-p1j35435-tcdkd       1/1     Running   0          73s

No new applications should be deployed in Cloud Foundry, however. Let's verify.

cf apps
Getting apps in org scdf-%%% / space space-%%%%% as $$$$$@com.io...
OK

name                         requested state   instances   memory   disk   urls
sabby-skipper                started           1/1         1G       1G     sabby-skipper.....
sabby-test-dataflow-server   started           1/1         1G       1G     sabby-test-dataflow-server....