Apache Airflow
What is Apache Airflow, and why is it important for Spark?
Apache Airflow is an open-source workflow management platform. It started at Airbnb in October 2014 and was later open-sourced to the Apache community. Apache Airflow is used to define and schedule data pipelines.
In production, most teams use Apache Airflow to schedule their Spark jobs.
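As an illustration of what such scheduling looks like, here is a minimal sketch of an Airflow DAG that submits a Spark job once a day. It assumes Airflow 2.4+ with the apache-spark provider package installed; the DAG id, application path, and connection id are placeholders, not values from the original article.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical daily pipeline: the dag_id, application path, and conn_id
# below are illustrative placeholders.
with DAG(
    dag_id="daily_spark_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",      # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    run_spark_job = SparkSubmitOperator(
        task_id="run_spark_etl",
        application="/opt/jobs/etl_job.py",  # path to the Spark application (placeholder)
        conn_id="spark_default",             # Spark connection configured in Airflow
    )
```

The scheduler picks this file up from the DAGs folder and triggers one `spark-submit` per day.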
Apache Airflow uses Python to define workflows. Airflow uses a metadata database, typically PostgreSQL, to store metadata information.
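Pointing Airflow at PostgreSQL comes down to the metadata database connection string in airflow.cfg; the host, credentials, and database name below are placeholders.

```ini
[database]
; SQLAlchemy connection string for the metadata database
; (this option lives under [core] on Airflow versions before 2.3)
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db
```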
Apache Airflow also provides an excellent web UI to view and trigger data pipelines.
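The same actions are available from the Airflow CLI; for example, listing the registered DAGs and triggering a run manually (the DAG id here is a placeholder):

```shell
# Show all DAGs the scheduler knows about
airflow dags list

# Trigger one run of a DAG, same effect as the "trigger" button in the UI
airflow dags trigger example_dag
```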
Apache Airflow can run on a standalone VM, in Docker, or on Kubernetes (k8s).
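For a quick local try-out, the official Docker image can run everything (scheduler, webserver, and a SQLite metadata database) in a single container; the image tag below is an assumption, so substitute a current release.

```shell
# Start an all-in-one Airflow instance from the official image
# and expose the web UI on http://localhost:8080
docker run -p 8080:8080 apache/airflow:2.9.3 standalone
```

The `standalone` command is meant for development only; production deployments typically use the official Helm chart on Kubernetes or docker-compose with separate scheduler and webserver containers.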
Thank You