Introduction to GenAI with Apache Airflow®

📚 About this Module

Welcome to the Introduction to GenAI with Apache Airflow® Module!

In this module, you will learn everything needed to create a retrieval-augmented generation (RAG) application with Airflow.

We will begin by introducing Generative AI (GenAI): what it is and how Airflow helps power GenAI applications. Next, we will cover the module project, a content generation application with a Streamlit frontend that creates custom text based on local data. The application's data will be made available by an Airflow pipeline consisting of one DAG that ingests local files, chunks the text using LangChain, embeds it using OpenAI, and stores it in a Weaviate vector database.
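To make the four pipeline stages described above concrete, here is a minimal plain-Python sketch of ingest, chunk, embed, and store. It is not the module's DAG: the character-window chunker stands in for a LangChain text splitter, the letter-frequency `embed` function stands in for an OpenAI embedding model, and `InMemoryVectorStore` stands in for Weaviate. All function and class names here are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field
from pathlib import Path


def ingest(folder: Path) -> dict[str, str]:
    """Read every .txt file in a folder into {filename: text}."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.txt"))}


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows (stand-in for a real splitter)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def embed(text: str) -> list[float]:
    """Toy embedding: unit-length vector of a-z letter counts (stand-in for a model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps records in a list."""
    records: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add(self, source: str, text: str, vector: list[float]) -> None:
        self.records.append((source, text, vector))

    def search(self, query_vector: list[float], k: int = 1) -> list[tuple[str, str]]:
        """Return the k (source, text) pairs most similar to the query vector."""
        def score(record: tuple[str, str, list[float]]) -> float:
            # Vectors are unit length, so the dot product is the cosine similarity.
            return sum(a * b for a, b in zip(record[2], query_vector))
        ranked = sorted(self.records, key=score, reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]


def run_pipeline(folder: Path, store: InMemoryVectorStore) -> int:
    """Ingest, chunk, embed, and store; return the number of chunks stored."""
    stored = 0
    for name, text in ingest(folder).items():
        for piece in chunk(text):
            store.add(name, piece, embed(piece))
            stored += 1
    return stored
```

In the module itself, each of these stages becomes part of the Airflow DAG, and the retrieval step shown in `search` is what the Streamlit application relies on when it generates text from your local data.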

After covering the basics and explaining how the Airflow DAG powers the RAG application, you will clone a GitHub repository with a pre-built Airflow environment in which we will build the DAG step-by-step. While building the DAG, you'll learn about key Airflow features that are useful for GenAI applications such as the TaskFlow API, advanced DAG parameters, Airflow branching, and dynamic task mapping.

Finally, we will conclude the module by adapting the application for a custom use case and by exploring a real-world RAG application: Ask Astro, a chatbot with advanced knowledge about Airflow and Astronomer.

🎯 Learning Objectives

At the end of this module, you'll be able to:

Explain what GenAI is, the challenges associated with GenAI pipelines, and how Airflow addresses those challenges.
Identify the high-level structural components of a RAG content generation application.
Use the TaskFlow API to write Airflow DAGs in a Pythonic way for the RAG pipeline.
Use the Airflow Weaviate provider to interact with your Weaviate instance for the RAG pipeline.
Employ Airflow branching to allow for conditional execution of Airflow tasks for the RAG pipeline.
Use LangChain to chunk extracted text from local files for the RAG pipeline.
Use dynamic task mapping to create DAGs that can adapt to changing input at runtime for the RAG pipeline.

📝 Prerequisites

To complete this module, we recommend you have basic knowledge of:

Airflow
Python
GitHub
(Optional) Pandas

Syllabus (0 hr 56 min)

Introduction
Welcome! 0 hr 4 min
Apache Airflow® and GenAI
Introduction to GenAI 0 hr 6 min
Airflow for GenAI 0 hr 8 min
RAG Project Components 0 hr 3 min
Project overview
Set up the RAG Project 0 hr 4 min
RAG DAG Structure 0 hr 2 min
Build the RAG DAG
DAG Definition 0 hr 4 min
TaskFlow API 0 hr 2 min
Connect to Weaviate 0 hr 3 min
Airflow Branching 0 hr 4 min
Implementing Dynamic Task Mapping 0 hr 4 min
Extracting and Processing Text 0 hr 2 min
Embedding with Weaviate 0 hr 4 min
Modify the Front-end Application with Streamlit
The Streamlit Application 0 hr 3 min
Running the GenAI project
Adapt and run the pipeline 0 hr 2 min
Real-world RAG Pipeline
Explore Ask Astro 0 hr 3 min
Wrap Up
Review
Resources
How was it?

Create a retrieval augmented generation (RAG) application with Airflow and run it locally.
