Recently I spent quite some time diving into Airflow and Kubernetes. While there are reports of people using them together, I could not find any comprehensive guide or tutorial. Also, there are many forks and abandoned scripts and repositories out there. Here I write down what I've found, in the hope that it is helpful to others. Because things move quickly, I've decided to put this on GitHub rather than in a blog post, so it can be easily updated. Please do not hesitate to provide updates, suggestions, fixes etc.

There are some related, but different scenarios:

- Using Airflow to schedule jobs on Kubernetes.
- Running Airflow itself on Kubernetes.

You can actually replace Airflow with X, and you will see this pattern all the time. For example, you can use Jenkins or GitLab (build servers) on a VM, but use them to deploy on Kubernetes. Or you can host them on Kubernetes, but deploy somewhere else, like on a VM. And of course you can run them in Kubernetes and deploy to Kubernetes as well. The reason I make this distinction is that you typically need to perform different steps for each scenario.

## Using Airflow to schedule jobs on Kubernetes

The simplest way to achieve this right now is by using the kubectl command-line utility (in a BashOperator) or the Python SDK. However, you can also deploy your Celery workers on Kubernetes; the Helm chart mentioned below does this.

Work is in progress that should lead to native support in Airflow for scheduling jobs on Kubernetes. The wiki contains a discussion about what this will look like, though the pages haven't been updated in a while. Progress can be tracked in Jira (AIRFLOW-1314). Development is being done in a fork of Airflow at bloomberg/airflow. This is still work in progress, so it should probably not be deployed in production.

### KubernetesPodOperator (coming in 1.10)

A subset of this functionality will be released earlier, according to AIRFLOW-1517. In the next release of Airflow (1.10), a new Operator will be introduced that leads to a better, native integration of Airflow with Kubernetes. The cool thing about this Operator is that you can define a custom Docker image per task. Previously, if a task required some Python library or other dependency, you had to install it on the workers, so your workers end up hosting the combination of all dependencies of all your DAGs. I think this can eventually replace the CeleryExecutor for many installations. Documentation for the new Operator can be found here.

## Running Airflow on Kubernetes

Airflow is implemented in a modular way. The main components are the scheduler, the webserver, and workers. If you want to distribute work across workers, you may want to use the CeleryExecutor. In that case, you'll probably want Flower (a UI for Celery) and you need a queue, like RabbitMQ or Redis.

### Docker images

The first thing you need is a Docker image that packages Airflow. Most people seem to use puckel/docker-airflow. This is a solid Docker image with an automated build set up, pushing images to Docker Hub, which means you do not need to build your own image when you are first starting out. The image has an entrypoint script that allows the container to fulfill the role of scheduler, webserver, flower, or worker. However, while puckel/docker-airflow is widely used, it isn't updated very often anymore, so it is lagging behind Airflow releases a bit, and many people are forking the repo and updating it themselves. Another option may be the image by astronomer.io, though I cannot find the Dockerfile source, so at this point I'm hesitant to run it. In short, the Airflow community is still lacking a canonical Docker image.

### Helm charts

Next, you need to create some Kubernetes manifests, or a Helm chart, to deploy the Docker image on Kubernetes. There is some work in this area, but it is not completely finished yet. There are several Helm charts to install Airflow on Kubernetes. The 'official' chart (stable/airflow) probably covers most deployment options, including Celery and non-Kubernetes options, while the others may be more opinionated (and focused). It's a bit unfortunate that the community has not yet arrived at a canonical chart, so you'll have to try your luck:

```shell
helm install --namespace "airflow" --name "airflow" stable/airflow
```

### Kubernetes Operator

There is work by Google on a Kubernetes Operator for Airflow. The name is quite confusing: operator here refers to a controller for an application on Kubernetes, not an Airflow Operator that describes a task. You would use this operator instead of the Helm chart to deploy Airflow itself.
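To make the "kubectl in a BashOperator" approach concrete, here is a minimal sketch. The helper function, pod naming convention, and image name are illustrative assumptions, not an official Airflow API; adapt them to your cluster and registry.

```python
# Sketch: launching a one-off pod per task via kubectl from a BashOperator.
# The helper below, the naming scheme, and the image are assumptions for
# illustration only.

def kubectl_run_command(task_id: str, image: str, namespace: str = "default") -> str:
    """Build a `kubectl run` command that starts a one-off pod,
    streams its logs, and blocks until the pod exits."""
    pod_name = f"airflow-{task_id}".replace("_", "-").lower()
    return (
        f"kubectl run {pod_name}"
        f" --namespace={namespace}"
        f" --image={image}"
        " --restart=Never"   # a bare pod, not a Deployment
        " --attach"          # block until completion so failures surface in Airflow
    )

# Wiring it into a DAG would look roughly like this (requires apache-airflow):
#
# from airflow.operators.bash_operator import BashOperator
# run_job = BashOperator(
#     task_id="crunch_numbers",
#     bash_command=kubectl_run_command("crunch_numbers", "my-registry/crunch:latest"),
#     dag=dag,
# )
```

Because the BashOperator only sees the exit code of kubectl, `--attach` matters: without it, the task would be marked successful as soon as the pod is *created*, not when it finishes.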
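As a starting point for the Kubernetes manifests mentioned above, a minimal Deployment for the webserver component might look like the sketch below. The resource names and labels are assumptions for illustration; a real deployment additionally needs the scheduler, a metadata database, and secrets for connections.

```yaml
# Illustrative sketch only: a bare-bones Deployment for the Airflow webserver,
# using the puckel/docker-airflow image discussed above. Names and labels are
# made up for this example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-webserver
  template:
    metadata:
      labels:
        app: airflow-webserver
    spec:
      containers:
        - name: webserver
          image: puckel/docker-airflow:latest
          args: ["webserver"]       # the image's entrypoint script picks the role
          ports:
            - containerPort: 8080   # Airflow web UI
```

The same image can serve as scheduler, flower, or worker by changing the argument passed to the entrypoint, which is what makes a single image workable for all components.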