# Dataflow CRD

## API Reference

Packages:

- bytewax.io/v1alpha1

Resource Types:

- Dataflow

## Dataflow

Dataflow is the Schema for the dataflows API.
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | bytewax.io/v1alpha1 | true |
kind | string | Dataflow | true |
metadata | object | Refer to the Kubernetes API documentation for the fields of the `metadata` field. | true |
spec | object | DataflowSpec defines the desired state of Dataflow | false |
status | object | DataflowStatus defines the observed state of Dataflow | false |
### Dataflow.spec

DataflowSpec defines the desired state of Dataflow.
Name | Type | Description | Required |
---|---|---|---|
image | object | Dataflow container image settings | true |
pythonFileName | string | Python script file to run | true |
artifactsDownload | object | Downloads a tar file from a URL, either a public URL or a private GitHub repository | false |
chartValues | string | Advanced Bytewax Helm chart values. See https://bytewax.github.io/helm-charts | false |
concurrencyPolicy | string | Specifies how to treat concurrent executions of a scheduled Dataflow. Valid values are: "Allow" (allows Dataflows to run concurrently), "Forbid" (default; forbids concurrent runs, skipping the next run if the previous run hasn't finished yet), and "Replace" (cancels the currently running Dataflow and replaces it with a new one) | false |
configMapName | string | Dataflow ConfigMap name | false |
dependencies | []string | Python dependencies needed to run the dataflow (use pip syntax, e.g. package-name==0.1.0) | false |
env | []object | Environment variables to inject into the dataflow container | false |
jobMode | boolean | Run a Kubernetes Job instead of a StatefulSet. Use this for batch processing. Default: false | false |
keepAlive | boolean | Keep the Dataflow container alive after the dataflow ends. Useful for troubleshooting. Default: false | false |
processesCount | integer | Number of processes to run. Default: 1 | false |
recovery | object | Stores recovery files in Kubernetes persistent volumes to allow resuming after a restart (your dataflow must have recovery enabled: https://bytewax.io/docs/getting-started/recovery) | false |
schedule | string | Dataflow schedule in Cron format, see https://en.wikipedia.org/wiki/Cron | false |
suspend | boolean | Suspends Dataflow execution. For scheduled Dataflows, this suspends subsequent executions; it does not affect already started executions. Default: false | false |
tarName | string | Name of the tar file stored in the dataflow ConfigMap | false |
workersPerProcess | integer | Number of workers to run in each process. Default: 1 | false |
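The two required fields (`image` and `pythonFileName`) plus a few common optional ones can be combined into a manifest like the following. This is an illustrative sketch, not a canonical example from the operator's docs: the resource name, image repository and tag, dependency, environment variable, and schedule are all placeholder values.

```yaml
apiVersion: bytewax.io/v1alpha1
kind: Dataflow
metadata:
  name: my-dataflow                 # illustrative name
spec:
  image:
    repository: bytewax/bytewax     # illustrative repository URI
    tag: "0.17.1-python3.9"         # illustrative tag; tag is the only required image field
  pythonFileName: dataflow.py
  dependencies:
    - requests==2.31.0              # pip syntax, per the dependencies field above
  env:
    - name: LOG_LEVEL               # illustrative variable
      value: debug
  processesCount: 2
  workersPerProcess: 2
  schedule: "0 * * * *"             # hourly, Cron format
  concurrencyPolicy: Forbid
```

Applied with `kubectl apply -f dataflow.yaml`, this would run the script hourly, skipping a run if the previous one is still in progress.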
### Dataflow.spec.image

Dataflow container image settings.
Name | Type | Description | Required |
---|---|---|---|
tag | string | Container image tag | true |
pullPolicy | string | Container image pull policy (must be Always, IfNotPresent, or Never). Default: Always | false |
pullSecret | string | Kubernetes secret name used to pull images. Default: default-credentials | false |
repository | string | Container image repository URI | false |
### Dataflow.spec.artifactsDownload

Downloads a tar file from a URL, either a public URL or a private GitHub repository.
Name | Type | Description | Required |
---|---|---|---|
url | string | URL of the tar file to download | true |
secretName | string | Name of the Kubernetes secret storing a Personal Access Token. It must contain the key TOKEN | false |
token | string | Personal Access Token | false |
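For a private GitHub repository, the Personal Access Token can be supplied via a Kubernetes Secret rather than inline in the spec. A sketch, with the organization, repository, and secret name as placeholders:

```yaml
spec:
  artifactsDownload:
    url: https://github.com/example-org/example-repo/tarball/main  # illustrative URL
    secretName: github-pat        # Secret must contain the key TOKEN
```

The referenced Secret could be created with `kubectl create secret generic github-pat --from-literal=TOKEN=<your-pat>`. Using `secretName` keeps the token out of the Dataflow manifest itself.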
### Dataflow.spec.env[index]
Name | Type | Description | Required |
---|---|---|---|
name | string | Environment variable name | true |
value | string | Environment variable value | true |
### Dataflow.spec.recovery

Stores recovery files in Kubernetes persistent volumes to allow resuming after a restart (your dataflow must have recovery enabled: https://bytewax.io/docs/getting-started/recovery).
Name | Type | Description | Required |
---|---|---|---|
backupInterval | integer | System time duration in seconds to keep extra state snapshots around. Default: 1 | false |
cloudBackup | object | Back up worker state DBs to cloud storage | false |
enabled | boolean | Enable the Dataflow recovery feature. Default: false | false |
partsCount | integer | Number of recovery partitions. Default: 1 | false |
persistence | object | Kubernetes persistence settings | false |
singleVolume | boolean | Use a single persistent volume for all of the dataflow's pods in Kubernetes. Default: false | false |
snapshotInterval | integer | System time duration in seconds between state snapshots for recovery. Default: 1 | false |
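A recovery block combining these fields with the persistence settings described below might look like this. The interval values, volume size, and storage class name are illustrative choices, not defaults, and the dataflow program itself must also have recovery enabled per the linked Bytewax docs:

```yaml
spec:
  recovery:
    enabled: true
    partsCount: 4               # illustrative; default is 1
    snapshotInterval: 30        # seconds; illustrative
    backupInterval: 60          # seconds; illustrative
    persistence:
      size: 20Gi                # illustrative; default is 10Gi
      storageClassName: gp2     # illustrative storage class
```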
### Dataflow.spec.recovery.cloudBackup

Back up worker state DBs to cloud storage.
Name | Type | Description | Required |
---|---|---|---|
enabled | boolean | Enables the Cloud Backup feature. Default: false | false |
s3 | object | Cloud Backup S3 settings | false |
### Dataflow.spec.recovery.cloudBackup.s3

Cloud Backup S3 settings.
Name | Type | Description | Required |
---|---|---|---|
url | string | S3 URL to store state backups, for example s3://mybucket/mydataflow-state-backups | true |
accessKeyId | string | AWS credentials access key ID | false |
secretAccessKey | string | AWS credentials secret access key | false |
secretName | string | Name of the Kubernetes Secret that stores AWS credentials. It must contain the keys LITESTREAM_ACCESS_KEY_ID and LITESTREAM_SECRET_ACCESS_KEY | false |
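As with `artifactsDownload`, credentials are better referenced through a Secret than embedded in the spec. A sketch, with the bucket path and secret name as placeholders:

```yaml
spec:
  recovery:
    enabled: true
    cloudBackup:
      enabled: true
      s3:
        url: s3://mybucket/mydataflow-state-backups   # illustrative bucket path
        secretName: aws-backup-credentials            # must contain LITESTREAM_ACCESS_KEY_ID
                                                      # and LITESTREAM_SECRET_ACCESS_KEY
```

The Secret could be created with `kubectl create secret generic aws-backup-credentials --from-literal=LITESTREAM_ACCESS_KEY_ID=<id> --from-literal=LITESTREAM_SECRET_ACCESS_KEY=<key>`.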
### Dataflow.spec.recovery.persistence

Kubernetes persistence settings.
Name | Type | Description | Required |
---|---|---|---|
size | string | Size of the persistent volume claim assigned to each dataflow pod in Kubernetes. Default: 10Gi | false |
storageClassName | string | Storage class of the persistent volume claim assigned to each dataflow pod in Kubernetes | false |
### Dataflow.status

DataflowStatus defines the observed state of Dataflow.
Name | Type | Description | Required |
---|---|---|---|
conditions | []object | Conditions describing the observed state of the Dataflow | false |
### Dataflow.status.conditions[index]

Condition contains details for one aspect of the current state of this API Resource. This struct is intended for direct use as an array at the field path `.status.conditions`. For example:

```go
type FooStatus struct {
	// Represents the observations of a foo's current state.
	// Known .status.conditions.type are: "Available", "Progressing", and "Degraded"
	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"`

	// other fields
}
```
Name | Type | Description | Required |
---|---|---|---|
lastTransitionTime | string | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. Format: date-time | true |
message | string | message is a human readable message indicating details about the transition. This may be an empty string. | true |
reason | string | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. | true |
status | enum | status of the condition, one of True, False, Unknown. Enum: True, False, Unknown | true |
type | string | type of condition in CamelCase or in foo.example.com/CamelCase. Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be useful (see .node.status.conditions), the ability to deconflict is important. The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) | true |
observedGeneration | integer | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. Format: int64. Minimum: 0 | false |
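Putting the fields together, a populated condition on a running Dataflow might look like the fragment below. The `type` and `reason` values are purely illustrative, since this reference does not enumerate the condition types the operator actually sets; only the field shapes (CamelCase reason, quoted True/False/Unknown status, RFC 3339 timestamp) follow the table above.

```yaml
status:
  conditions:
    - type: Deployed                     # illustrative condition type
      status: "True"                     # one of "True", "False", "Unknown"
      reason: ResourcesCreated           # illustrative CamelCase identifier
      message: "Dataflow resources created"
      lastTransitionTime: "2024-01-01T00:00:00Z"   # date-time format
      observedGeneration: 1
```

Conditions like this can be inspected with `kubectl describe dataflow <name>` or via `.status.conditions` in `kubectl get dataflow <name> -o yaml`.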