Waxctl for Platform
Waxctl and the Bytewax Platform
Waxctl is the Bytewax command-line interface. It helps you manage your dataflows on Kubernetes, AWS EC2, GCP VMs, and your local machine.
Waxctl works with both open-source Bytewax dataflows and Bytewax Platform dataflows.
Deploy on the Platform
To deploy a dataflow to a Kubernetes cluster with Waxctl, run the waxctl df deploy command.
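As a minimal sketch, a deployment command might look like the following. The file name dataflow.py and the dataflow name my-dataflow are placeholders, and passing the script as a positional argument together with a --name flag is an assumption based on common Waxctl usage, not something stated on this page. The command is printed rather than executed so it can be reviewed first.

```shell
# Placeholders (assumptions, not from this page): dataflow.py, my-dataflow.
# The positional script path and the --name flag are assumed Waxctl usage.
DEPLOY_CMD="waxctl df deploy dataflow.py --name my-dataflow"

# Print the command for review; run it in a shell whose Kubernetes
# context points at the target cluster.
echo "$DEPLOY_CMD"
```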
The following flags for that command are available only on a Kubernetes cluster with the Bytewax Platform installed:
--concurrency-policy string specifies how to treat concurrent executions of a Scheduled Dataflow (the value must be Allow, Forbid or Replace) (default "Forbid")
--platform deploy the dataflow as a bytewax.io/dataflow Custom Resource (requires Bytewax Platform installed)
--recovery stores recovery files in Kubernetes persistent volumes to allow resuming after a restart (your dataflow must have recovery enabled: https://bytewax.io/docs/getting-started/recovery)
--recovery-backup back up worker state DBs to cloud storage (requires the --recovery flag and S3 parameters)
--recovery-backup-interval int system time duration in seconds to keep extra state snapshots around
--recovery-backup-s3-aws-access-key-id string AWS credentials access key id
--recovery-backup-s3-aws-secret-access-key string AWS credentials secret access key
--recovery-backup-s3-k8s-secret string name of the Kubernetes secret storing AWS credentials
--recovery-backup-s3-url string S3 URL to store state backups, for example s3://mybucket/mydataflow-state-backups
--recovery-parts int number of recovery parts (default 1)
--recovery-single-volume use only one persistent volume for all of the dataflow's pods in Kubernetes
--recovery-size string size of the persistent volume claim to be assigned to each dataflow pod in Kubernetes (default "10Gi")
--recovery-snapshot-interval int system time duration in seconds to snapshot state for recovery
--recovery-storageclass string storage class of the persistent volume claim to be assigned to each dataflow pod in Kubernetes
--schedule string dataflow schedule in Cron format, see https://en.wikipedia.org/wiki/Cron
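The flags above can be combined roughly as sketched below. This is an illustration under assumptions, not a reference invocation: dataflow.py, my-dataflow, and the secret name my-aws-credentials are placeholders, the positional script path and --name flag are assumed Waxctl usage, and including --platform alongside the other flags is a guess, since this page does not say whether it is required. The commands are printed rather than executed.

```shell
# Placeholders (assumptions): dataflow.py, my-dataflow, my-aws-credentials.
# The positional script path and --name flag are assumed Waxctl usage;
# every other flag is taken from the list above.

# Platform deployment with recovery enabled and state backups to S3.
# The Kubernetes secret holding AWS credentials is assumed to exist already.
RECOVERY_CMD="waxctl df deploy dataflow.py --name my-dataflow \
--platform \
--recovery --recovery-parts 1 --recovery-size 10Gi \
--recovery-backup \
--recovery-backup-s3-url s3://mybucket/mydataflow-state-backups \
--recovery-backup-s3-k8s-secret my-aws-credentials"

# Scheduled dataflow: Cron expression for "minute 0 of every hour",
# with the default concurrency policy (Forbid) spelled out explicitly.
SCHEDULE_CMD="waxctl df deploy dataflow.py --name my-dataflow \
--platform \
--schedule '0 * * * *' --concurrency-policy Forbid"

# Print both commands for review before running them against a cluster.
echo "$RECOVERY_CMD"
echo "$SCHEDULE_CMD"
```

The Cron expression is quoted so the shell does not expand the asterisks as file globs when the command is eventually run.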