Apache Airflow

What is Apache Airflow, and why is it important for Spark?

Apache Airflow is an open-source workflow management platform.

It started at Airbnb in October 2014 and was later open-sourced and donated to the Apache Software Foundation.

Apache Airflow is used to define and schedule data pipelines.

In production, Airflow is commonly used to schedule Spark jobs.

Apache Airflow uses Python to define workflows as DAGs (directed acyclic graphs).

Airflow stores its metadata (DAG runs, task states, connections) in a relational database; PostgreSQL is a common choice in production.
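For example, the metadata database is configured through the `sql_alchemy_conn` option in `airflow.cfg` (the credentials and host below are placeholders; in Airflow versions before 2.3 this option lives under `[core]` instead of `[database]`):

```ini
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
```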

Apache Airflow also provides a web UI to view, monitor, and trigger data pipelines.

Apache Airflow can run on a standalone VM, in Docker, or on Kubernetes.