- Introduction
  - Welcome! (4 min)
- Apache Airflow® and Gen AI
  - Introduction to GenAI (6 min)
  - Airflow for GenAI (8 min)
  - RAG Project Components (3 min)
- Project overview
  - Setup the RAG Project (4 min)
  - RAG DAG Structure (2 min)
- Build the RAG DAG
  - DAG Definition (4 min)
  - TaskFlow API (2 min)
  - Connect to Weaviate (3 min)
  - Airflow Branching (4 min)
  - Implementing Dynamic Task mapping (4 min)
  - Extracting and Processing Text (2 min)
  - Embedding with Weaviate (4 min)
- Modify the Front-end Application with Streamlit
  - The Streamlit Application (3 min)
- Running the GenAI project
  - Adapt and run the pipeline (2 min)
- Real-world RAG Pipeline
  - Explore Ask Astro (3 min)
- Wrap Up
  - Review
  - Resources
  - How was it?
Introduction to GenAI with Apache Airflow®
Create a retrieval augmented generation (RAG) application with Airflow and run it locally.
Welcome to the Introduction to GenAI with Apache Airflow® Module!
In this module, you will learn everything needed to create a retrieval augmented generation (RAG) application with Airflow.
We will begin by introducing Generative AI (GenAI): what it is and how Airflow helps power GenAI applications. Next, we will cover the module project, a content generation application with a Streamlit frontend that creates custom text based on local data. An Airflow pipeline makes that data available to the application. The pipeline consists of one DAG that ingests local files, chunks the text using LangChain, embeds it using OpenAI, and stores it in a Weaviate vector database.
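To make the four pipeline stages concrete before we build the real DAG, here is a minimal plain-Python sketch of ingest, chunk, embed, and store. It is not the module's code: the character-based `chunk` function stands in for a LangChain text splitter, the hash-based `embed` function stands in for an OpenAI embedding call, and a plain list stands in for Weaviate. All function names and the `.md` file filter are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def ingest(folder: Path) -> dict[str, str]:
    """Read every .md file in a folder into {filename: text} (assumed file type)."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.md"))}


def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Stand-in for a LangChain text splitter, which is more sophisticated.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last chunk is never just the overlap tail.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(chunk_text: str, dims: int = 8) -> list[float]:
    """Toy deterministic vector derived from a hash.

    Stand-in for a real embedding model; it carries no semantic meaning.
    """
    digest = hashlib.sha256(chunk_text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def build_records(docs: dict[str, str]) -> list[dict]:
    """Chunk and embed each document, returning records ready to be stored."""
    records = []
    for name, text in docs.items():
        for index, piece in enumerate(chunk(text)):
            records.append(
                {"source": name, "chunk_index": index, "text": piece, "vector": embed(piece)}
            )
    return records


# In-memory list standing in for the vector database.
vector_store = build_records({"example.md": "Airflow orchestrates the pipeline. " * 20})
```

Calling `build_records(ingest(Path("some_folder")))` would run the same steps over local files. In the module, each of these stages becomes one or more Airflow tasks instead of a plain function call.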
After covering the basics and explaining how the Airflow DAG powers the RAG application, you will clone a GitHub repository with a pre-built Airflow environment in which we will build the DAG step-by-step. While building the DAG, you'll learn about key Airflow features that are useful for GenAI applications such as the TaskFlow API, advanced DAG parameters, Airflow branching, and dynamic task mapping.
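If branching and dynamic task mapping are new to you, the following plain-Python model may help build intuition for the control flow. It deliberately uses no Airflow API: `pick_branch`, `map_over`, `count_words`, and the step names are hypothetical, and the condition being checked (whether a collection already exists) is an assumption chosen for illustration.

```python
from typing import Callable


def pick_branch(collection_exists: bool) -> str:
    """Branching: return the name of the one downstream step that should run."""
    return "skip_creation" if collection_exists else "create_collection"


def map_over(step: Callable[[str], int], inputs: list[str]) -> list[int]:
    """Dynamic mapping: call `step` once per input discovered at runtime."""
    return [step(item) for item in inputs]


def count_words(text: str) -> int:
    """Example step applied to each input."""
    return len(text.split())


# Hypothetical downstream steps, keyed by name.
steps = {
    "create_collection": lambda: "created",
    "skip_creation": lambda: "already there",
}

# Only the chosen branch runs; the other step is skipped.
outcome = steps[pick_branch(collection_exists=False)]()

# The number of calls depends on how many inputs exist when the run starts.
word_counts = map_over(count_words, ["one two", "three four five"])
```

In the module, Airflow takes over both roles: a branching task decides which downstream tasks run, and dynamic task mapping creates one task instance per input, so the DAG adapts without code changes when the number of input files changes.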
Finally, we will adapt the application to a custom use case and explore a real-world RAG application: Ask Astro, a chatbot with advanced knowledge of Airflow and Astronomer.
🎯 Learning Objectives
At the end of this module, you'll be able to:
- Explain what GenAI is, the challenges associated with GenAI pipelines, and how Airflow addresses those challenges.
- Identify the high-level structural components of a RAG content generation application.
- Use the TaskFlow API to write Airflow DAGs in a Pythonic way for the RAG pipeline.
- Use the Airflow Weaviate provider to interact with your Weaviate instance for the RAG pipeline.
- Use Airflow branching to run Airflow tasks conditionally in the RAG pipeline.
- Use LangChain to chunk extracted text from local files for the RAG pipeline.
- Use dynamic task mapping to create DAGs that can adapt to changing input at runtime for the RAG pipeline.
📝 Prerequisites
To complete this module, we recommend basic knowledge of:
- Airflow
- Python
- GitHub
- (Optional) pandas