# Dataflow CRD

## API Reference

Packages:

- bytewax.io/v1alpha1

Resource Types:

- Dataflow

## Dataflow

Dataflow is the Schema for the dataflows API.
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | bytewax.io/v1alpha1 | true |
kind | string | Dataflow | true |
metadata | object | Refer to the Kubernetes API documentation for the fields of the `metadata` field. | true |
spec | object | DataflowSpec defines the desired state of Dataflow | false |
status | object | DataflowStatus defines the observed state of Dataflow | false |
### Dataflow.spec

DataflowSpec defines the desired state of Dataflow.
Name | Type | Description | Required |
---|---|---|---|
image | object | Dataflow container image settings | true |
pythonFileName | string | Python script file to run | true |
artifactsDownload | object | Downloads a tar file from a URL, either a public URL or a private GitHub repository | false |
chartValues | string | Advanced Bytewax Helm chart values. See https://bytewax.github.io/helm-charts | false |
concurrencyPolicy | string | Specifies how to treat concurrent executions of a scheduled Dataflow. Valid values are: "Allow" (allows Dataflows to run concurrently), "Forbid" (default; forbids concurrent runs, skipping the next run if the previous run hasn't finished yet), and "Replace" (cancels the currently running Dataflow and replaces it with a new one) | false |
configMapName | string | Dataflow ConfigMap name | false |
dependencies | []string | Python dependencies needed to run the dataflow (use pip syntax, e.g. package-name==0.1.0) | false |
env | []object | Environment variables to inject into the dataflow container | false |
jobMode | boolean | Run a Kubernetes Job instead of a StatefulSet. Use this for batch processing. Default: false | false |
keepAlive | boolean | Keep the Dataflow container alive after the dataflow ends. Useful for troubleshooting. Default: false | false |
processesCount | integer | Number of processes to run. Default: 1 | false |
recovery | object | Stores recovery files in Kubernetes persistent volumes to allow resuming after a restart (your dataflow must have recovery enabled: https://bytewax.io/docs/getting-started/recovery) | false |
schedule | string | Dataflow schedule in Cron format, see https://en.wikipedia.org/wiki/Cron | false |
suspend | boolean | Suspends Dataflow execution. For scheduled Dataflows, this suspends subsequent executions; it does not affect already started executions. Default: false | false |
tarName | string | Name of the tar file stored in the dataflow ConfigMap | false |
workersPerProcess | integer | Number of workers to run in each process. Default: 1 | false |
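The two required fields (`image` and `pythonFileName`) plus a few common optional ones can be combined into a manifest like the following. This is an illustrative sketch, not a canonical example from the operator's docs: the resource name, image repository and tag, dependency, environment variable, and schedule are all placeholder values.

```yaml
apiVersion: bytewax.io/v1alpha1
kind: Dataflow
metadata:
  name: my-dataflow                 # illustrative name
spec:
  image:
    repository: bytewax/bytewax     # illustrative repository URI
    tag: "0.17.1-python3.9"         # illustrative tag; tag is the only required image field
  pythonFileName: dataflow.py
  dependencies:
    - requests==2.31.0              # pip syntax, per the dependencies field above
  env:
    - name: LOG_LEVEL               # illustrative variable
      value: debug
  processesCount: 2
  workersPerProcess: 2
  schedule: "0 * * * *"             # hourly, Cron format
  concurrencyPolicy: Forbid
```

Applied with `kubectl apply -f dataflow.yaml`, this would run the script hourly, skipping a run if the previous one is still in progress.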
### Dataflow.spec.image

Dataflow container image settings.
Name | Type | Description | Required |
---|---|---|---|
tag | string | Container image tag | true |
pullPolicy | string | Container image pull policy (must be Always, IfNotPresent, or Never). Default: Always | false |
pullSecret | string | Kubernetes secret name used to pull images. Default: default-credentials | false |
repository | string | Container image repository URI | false |
### Dataflow.spec.artifactsDownload

Downloads a tar file from a URL, either a public URL or a private GitHub repository.
Name | Type | Description | Required |
---|---|---|---|
url | string | URL of the tar file to download | true |
secretName | string | Name of the Kubernetes secret storing a Personal Access Token. It must contain the key TOKEN | false |
token | string | Personal Access Token | false |
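For a private GitHub repository, the Personal Access Token can be supplied via a Kubernetes Secret rather than inline in the spec. A sketch, with the organization, repository, and secret name as placeholders:

```yaml
spec:
  artifactsDownload:
    url: https://github.com/example-org/example-repo/tarball/main  # illustrative URL
    secretName: github-pat        # Secret must contain the key TOKEN
```

The referenced Secret could be created with `kubectl create secret generic github-pat --from-literal=TOKEN=<your-pat>`. Using `secretName` keeps the token out of the Dataflow manifest itself.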
### Dataflow.spec.env[index]
Name | Type | Description | Required |
---|---|---|---|
name | string | Environment variable name | true |
value | string | Environment variable value | true |
### Dataflow.spec.recovery

Stores recovery files in Kubernetes persistent volumes to allow resuming after a restart (your dataflow must have recovery enabled: https://bytewax.io/docs/getting-started/recovery).
Name | Type | Description | Required |
---|---|---|---|
backupInterval | integer | System time duration in seconds to keep extra state snapshots around. Default: 1 | false |
cloudBackup | object | Back up worker state DBs to cloud storage | false |
enabled | boolean | Enable the Dataflow recovery feature. Default: false | false |
partsCount | integer | Number of recovery partitions. Default: 1 | false |
persistence | object | Kubernetes persistence settings | false |
singleVolume | boolean | Use a single persistent volume for all of the dataflow's pods in Kubernetes. Default: false | false |
snapshotInterval | integer | System time duration in seconds between state snapshots for recovery. Default: 1 | false |
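A recovery block combining these fields with the persistence settings described below might look like this. The interval values, volume size, and storage class name are illustrative choices, not defaults, and the dataflow program itself must also have recovery enabled per the linked Bytewax docs:

```yaml
spec:
  recovery:
    enabled: true
    partsCount: 4               # illustrative; default is 1
    snapshotInterval: 30        # seconds; illustrative
    backupInterval: 60          # seconds; illustrative
    persistence:
      size: 20Gi                # illustrative; default is 10Gi
      storageClassName: gp2     # illustrative storage class
```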
### Dataflow.spec.recovery.cloudBackup

Back up worker state DBs to cloud storage.
Name | Type | Description | Required |
---|---|---|---|
enabled | boolean | Enables the Cloud Backup feature. Default: false | false |
s3 | object | Cloud Backup S3 settings | false |
### Dataflow.spec.recovery.cloudBackup.s3

Cloud Backup S3 settings.
Name | Type | Description | Required |
---|---|---|---|
url | string | S3 URL to store state backups, for example s3://mybucket/mydataflow-state-backups | true |
accessKeyId | string | AWS credentials access key ID | false |
secretAccessKey | string | AWS credentials secret access key | false |
secretName | string | Name of the Kubernetes Secret that stores AWS credentials. It must contain the keys LITESTREAM_ACCESS_KEY_ID and LITESTREAM_SECRET_ACCESS_KEY | false |
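As with `artifactsDownload`, credentials are better referenced through a Secret than embedded in the spec. A sketch, with the bucket path and secret name as placeholders:

```yaml
spec:
  recovery:
    enabled: true
    cloudBackup:
      enabled: true
      s3:
        url: s3://mybucket/mydataflow-state-backups   # illustrative bucket path
        secretName: aws-backup-credentials            # must contain LITESTREAM_ACCESS_KEY_ID
                                                      # and LITESTREAM_SECRET_ACCESS_KEY
```

The Secret could be created with `kubectl create secret generic aws-backup-credentials --from-literal=LITESTREAM_ACCESS_KEY_ID=<id> --from-literal=LITESTREAM_SECRET_ACCESS_KEY=<key>`.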
### Dataflow.spec.recovery.persistence

Kubernetes persistence settings.
Name | Type | Description | Required |
---|---|---|---|
size | string | Size of the persistent volume claim assigned to each dataflow pod in Kubernetes. Default: 10Gi | false |
storageClassName | string | Storage class of the persistent volume claim assigned to each dataflow pod in Kubernetes | false |
### Dataflow.status

DataflowStatus defines the observed state of Dataflow.
Name | Type | Description | Required |
---|---|---|---|
conditions | []object | Conditions describing the observed state of the Dataflow | false |
### Dataflow.status.conditions[index]

Condition contains details for one aspect of the current state of this API Resource. This struct is intended for direct use as an array at the field path `.status.conditions`. For example:

```go
type FooStatus struct {
	// Represents the observations of a foo's current state.
	// Known .status.conditions.type are: "Available", "Progressing", and "Degraded"
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"`

	// other fields
}
```
Name | Type | Description | Required |
---|---|---|---|
lastTransitionTime | string | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. Format: date-time | true |
message | string | message is a human readable message indicating details about the transition. This may be an empty string. | true |
reason | string | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. | true |
status | enum | status of the condition, one of True, False, Unknown. Enum: True, False, Unknown | true |
type | string | type of condition in CamelCase or in foo.example.com/CamelCase. Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be useful (see .node.status.conditions), the ability to deconflict is important. The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) | true |
observedGeneration | integer | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. Format: int64. Minimum: 0 | false |
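Putting the fields together, a populated condition on a running Dataflow might look like the fragment below. The `type` and `reason` values are purely illustrative, since this reference does not enumerate the condition types the operator actually sets; only the field shapes (CamelCase reason, quoted True/False/Unknown status, RFC 3339 timestamp) follow the table above.

```yaml
status:
  conditions:
    - type: Deployed                     # illustrative condition type
      status: "True"                     # one of "True", "False", "Unknown"
      reason: ResourcesCreated           # illustrative CamelCase identifier
      message: "Dataflow resources created"
      lastTransitionTime: "2024-01-01T00:00:00Z"   # date-time format
      observedGeneration: 1
```

Conditions like this can be inspected with `kubectl describe dataflow <name>` or via `.status.conditions` in `kubectl get dataflow <name> -o yaml`.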