Generative AI

Build a Chatbot using NeMo guardrails!

InfoServices Gen AI Team

Jun 13, 2024 • 9 min read

Large Language Models (LLMs) are incredibly powerful, but at the same time a bit unpredictable. They can respond to anything, from product FAQs to philosophical rants on r/AskPolitics. While this flexibility is impressive, it also introduces risks. From hallucinated facts to inappropriate replies, LLMs can quickly veer off course. That’s where AI safety guardrails come in.

In this blog, we'll walk you through how we built a mock AI chatbot (using a sample dataset) without any prompt engineering, but with robust NeMo Guardrails and LangChain to enforce output alignment and safety. Think of it as creating a chatbot with a seatbelt, airbags, and GPS, ensuring it behaves predictably and stays on-topic.

Why Guardrails Matter in Chatbots

Building an LLM-powered chatbot isn’t just about answering questions; it’s about doing it safely and responsibly. With frameworks like NeMo Guardrails, developers can ensure their AI responds within predefined ethical, policy, or business logic boundaries.

Whether you're using LangChain NeMo Guardrails or integrating them into AWS chatbot guardrails, this approach adds essential layers of AI safety and alignment. In real-world deployments, it's not enough for a chatbot to be clever—it also needs to be compliant.

According to a 2024 Gartner report, over 70% of enterprises deploying GenAI will require strict output guardrails by 2026. That’s why we’re not just experimenting with safety—we’re building it in from the start.

What is NeMo Guardrails?

NeMo Guardrails is an open-source framework developed by NVIDIA to help you define and enforce AI behavior. But what exactly is NeMo Guardrails?

In simple terms, it's a control system that guides your chatbot’s responses based on a set of predefined "rails" covering safety, security, and policy compliance. If you're looking to understand how AI guardrails work, this section breaks down the core ideas behind how rails are defined in NeMo Guardrails and how it differs from other chatbot guardrails available today.

You write rules in Colang, a conversational DSL, to say things like:

Don’t respond to political questions
Avoid discussing personal data
Use a friendly, helpful tone

You can think of it as writing mini-conversation scripts. NeMo Guardrails listens in on every interaction, enforces the rules, and hands off clean input/output to your LLM.
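To make the "listen in, enforce, hand off" idea concrete, here is a toy sketch in plain Python. It is not how NeMo Guardrails is implemented: the `RAILS` table, the `match_rail` and `guarded_chat` helpers, and the string-similarity threshold are all assumptions made for illustration, and the `llm` argument is any callable that stands in for the real model.

```python
from difflib import SequenceMatcher

# Hypothetical rail: intent name -> (example utterances, canned bot reply)
RAILS = {
    "ask politics": (
        ["what are your political beliefs?", "thoughts on the president?"],
        "Sorry I can't talk about politics!",
    ),
}


def match_rail(message: str, threshold: float = 0.6):
    """Return the canned reply of the closest rail, or None if nothing is close."""
    best_score, best_reply = 0.0, None
    for examples, reply in RAILS.values():
        for example in examples:
            score = SequenceMatcher(None, message.lower(), example).ratio()
            if score > best_score:
                best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else None


def guarded_chat(message: str, llm) -> str:
    """Answer from a rail when one matches; otherwise hand off to the LLM."""
    reply = match_rail(message)
    return reply if reply is not None else llm(message)
```

A message that resembles one of the rail's examples gets the canned reply, and everything else reaches the model untouched. Plain string similarity is far too crude for production use; it only shows where the rail sits in the conversation.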

LangChain + NeMo Guardrails Integration

Combining LangChain with NeMo Guardrails allows developers to build modular, multi-step workflows that follow safety protocols. For example, a NeMo Guardrails LangChain example could involve enforcing tone constraints in customer support bots or blocking attempts to manipulate the chatbot.

By using this duo, you gain full control over conversational flow and alignment, especially when handling sensitive queries in sectors like finance, healthcare, and education.

Want to go deeper?

This combo supports RAG chatbot alignment using NeMo Guardrails for pre/post filtering
You can use it on any cloud, whether alongside AWS chatbot guardrails or guardrails for Azure OpenAI
You can build context-aware, compliant bots without prompt engineering

This approach isn’t just for demos. It’s being used in real AI guardrails tutorials across industries.

Architecture:

1. Data Ingestion

2. Retrieval and Generation

Reference: medium.com

Ok, now let’s build a Chatbot:

Set up a Python project in any IDE, such as VS Code or IntelliJ.
1. Create a new conda environment with Python 3.11 and install the needed dependencies.
    - Cmd: conda create -n llm_chatbot python=3.11
    - Cmd: conda activate llm_chatbot
2. Here is the working set of Python libraries for this example. Save it in a requirements.txt file and run the command below.
    - Cmd: pip install -r requirements.txt

```
langchain~=0.2.0 #0.1.20
langchain_community~=0.2.0
langchain-openai~=0.1.7
langchain-pinecone~=0.1.1
langchainhub~=0.1.15
python-dotenv~=1.0.1
streamlit~=1.34.0
streamlit_chat~=0.0.2.2
black~=24.4.2
beautifulsoup4~=4.12.3
nemoguardrails~=0.9.0
```

Set environment variables:

Create a .env file with the following entries:

OPENAI_API_KEY=sk-infoservices-***** 
INDEX_NAME=infoservices-doc-index 
PINECONE_API_KEY=******* 
PINECONE_ENVIRONMENT_REGION=us-east-1-aws-free 
LANGCHAIN_TRACING_V2=true 
LANGCHAIN_API_KEY=lsv2_pt_******* 
LANGCHAIN_PROJECT=Infoservices 
PYTHONPATH=D:LLMnemogdnemogd 
PIPENV_IGNORE_VIRTUALENVS=1 
PYTHONIOENCODING=utf-8 
PINECONE_CLOUD=aws 
PINECONE_REGION=us-east-1
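Since a missing key usually only surfaces as an error deep inside a library call, a quick check up front can save time. The sketch below uses only the standard library; the `missing_env_vars` helper and the choice of which variables count as required are assumptions based on the names the scripts in this post read.

```python
import os

# Assumed to be the variables the ingestion and chat scripts cannot run without
REQUIRED_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "INDEX_NAME"]


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All required environment variables are set.")
```

Call it after `load_dotenv()` so that the values from the .env file are already in the process environment.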

Data Ingestion
- There are many possible data sources, such as documents, webpages, and databases. Let’s use a text file named “info_lower.txt” (this is a mock dataset).
  - The info_lower.txt file has some information about the InfoServices website.
  - Load all the environment variables.
  - Import all the necessary libraries.
  - Create an ingestion.py file and define an ingestion function.

1. Since we are using a text file, we use TextLoader.
2. Use CharacterTextSplitter to split the data into chunks.
3. Convert the text into embeddings using OpenAI Embeddings.
4. Load the data into the Pinecone vector DB.

import os

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Read OPENAI_API_KEY, PINECONE_API_KEY and INDEX_NAME from the .env file
load_dotenv()


def ingest_docs() -> None:
    print("Document Loading")
    loader = TextLoader("info_lower.txt", encoding="UTF-8")
    document = loader.load()

    print("Text Splitting...")
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    texts = text_splitter.split_documents(document)
    print(f"created {len(texts)} chunks")

    print("Convert Text to Embeddings using Open AI Embeddings")
    embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))

    print("Text ingesting to pinecone.....")
    PineconeVectorStore.from_documents(
        texts, embeddings, index_name=os.environ["INDEX_NAME"]
    )
    print("*** Added to Pinecone Vector store vectors ***")


if __name__ == "__main__":
    ingest_docs()
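To show what chunk_size and chunk_overlap mean in the splitter call above, here is a simplified fixed-window chunker in plain Python. It is an illustration only: CharacterTextSplitter itself first splits on a separator (by default a blank line) and then merges the pieces up to the size limit, so its chunk boundaries will differ. The `chunk_text` name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still retrievable in one piece. A larger overlap costs more embeddings for the same text.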

5. Now, let’s run the ingestion.py file.

4. Now, let’s define the configuration and guardrails files.

- Define the config.yaml file.

models: 
  - type: main 
    engine: openai 
    model: gpt-3.5-turbo-instruct

The rails file (named “rag_colang.co” in this example) defines how the bot should respond.

NeMo Guardrails takes care of topical, safety, and security rails. If there is a specific way we want the bot to act, we can define custom rails.
Rails are defined in Colang script, which is very easy to understand.

For each custom rail we would like to add, three steps are to be followed:

1. Define the possible user ask
2. Define the bot response
3. Define the flow with the definitions of user ask and bot response

# Define User Name flow 
define user give name 
    "My name is James" 
    "I'm Julio" 
    "Sono Andrea" 
    "I am James" 
 
define flow give name 
    user give name 
    $name = ... 
    bot name greeting 
   
# Define Greeting flow 
define user greeting 
    "Hey there!" 
    "How are you?" 
    "What's up?" 
 
define bot name greeting 
    "Hey $name!" 
 
define flow 
    user greeting 
    if not $name 
        bot ask name 
    else 
        bot name greeting 
   
# Define Irrelevant topic flow
define user ask politics 
    "what are your political beliefs?" 
    "thoughts on the president?" 
    "left wing" 
    "right wing" 
 
define bot answer politics 
    "I'm a shopping assistant, I don't like to talk of politics." 
    "Sorry I can't talk about politics!" 
 
define flow politics 
    user ask politics 
    bot answer politics 
    bot offer help 
 
 
define user ask personal details
    "what is the salary of $name?"
    "what is the phone number of $name?"
    "what is the address of $name?"
    "what is the email of $name?"
    "what is the social security number of $name?"
 
define bot answer personal details 
    "I'm sorry, but I can't provide any sensitive information of employees." 
 
define flow personal details 
    user ask personal details 
    bot answer personal details 
    bot offer help 
   
   
# define RAG intents and flow 
define user ask infoservices 
    "tell me about infoservices?" 
    "what is infoservices?" 
    "what is infoservices mission?" 
    "what are the Core Values?" 
 
 
define flow infoservices 
    user ask infoservices 
    $answer = execute generate_response(query=$last_user_message, docs=$contexts) 
    bot $answer

5. Now, let’s build a RAG model using guardrails.

Create a “main.py” file

Import the necessary libraries and environment variables.

# -*- coding: utf-8 -*- 
from typing import Set, Any, Dict 
from dotenv import load_dotenv 
import asyncio 
import streamlit as st 
import os 
from langchain.chains import ConversationalRetrievalChain 
from langchain_openai import OpenAIEmbeddings 
from langchain_openai import ChatOpenAI 
from streamlit_chat import message, _streamlit_chat 
from langchain_community.vectorstores.pinecone import Pinecone as PineconeLangChain 
from pinecone import Pinecone 
from nemoguardrails import LLMRails, RailsConfig 
load_dotenv()

# Read the index name from the .env file (the same variable ingestion.py uses)
INDEX_NAME = os.environ["INDEX_NAME"]

Initialize the Pinecone client with the API key

pc = Pinecone( 
    api_key=os.environ["PINECONE_API_KEY"], 
    # environment=os.environ.get("PINECONE_ENVIRONMENT_REGION"), 
)

Load the rails and config files

# Load contents from files
with open("rag_colang.co", "r") as file1:
    rag_colang_content = file1.read()

with open("config.yaml", "r") as file2:
    yaml_content = file2.read()

Create a config using RailsConfig

config = RailsConfig.from_content( 
    colang_content=rag_colang_content, yaml_content=yaml_content 
)

Define a function, initialize_llmrails, that creates the rails object.

All user queries will go through rag_rails.

def initialize_llmrails(config: RailsConfig) -> LLMRails:
    # Make sure the current thread has an event loop before creating LLMRails
    try:
        asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return LLMRails(config)


# Use the wrapper function to initialize LLMRails
rag_rails = initialize_llmrails(config)

Define a function to generate the response.

We need an asynchronous function here, because the rails are called through generate_async:

async def generate_response(
    prompt: str, chat_history: list[tuple[str, str]] | None = None
):
    # Avoid a mutable default argument; history is a list of (human, ai) tuples
    chat_history = chat_history or []
    print("prompt3", prompt)
    # First pass the prompt through the guardrails
    result = await rag_rails.generate_async(prompt=prompt)
    print("result", result)
    # Case-insensitive check, so "Sorry ..." and "I'm sorry ..." both count as a refusal
    if "sorry" not in result.lower():
        embeddings = OpenAIEmbeddings()
        docsearch = PineconeLangChain.from_existing_index(
            index_name=INDEX_NAME,
            embedding=embeddings,
        )
        docs = docsearch.similarity_search(prompt)
        print("\n PROMPT: \n", prompt)
        print("\n DOCS: \n", docs)

        qa = ConversationalRetrievalChain.from_llm(
            llm=ChatOpenAI(verbose=True, temperature=0),
            retriever=docsearch.as_retriever(),
            return_source_documents=False,
        )
        inputs = {
            "question": prompt,
            "chat_history": chat_history,
        }
        print("\nINPUTS: \n", inputs)

        # Call the chain once and reuse the result
        result1 = qa.invoke(inputs)
        result = result1["answer"]
        print("\nRESULT 1", result1)
        print("\nFINAL RESULT", result)
    else:
        print("Sorry string found. Exiting...")
        print(result)
    return result

Now, define a main function to call streamlit app and create a Bot experience for the user with chat history.

def main():
    # Initialize Streamlit
    st.title("🦜🔗 InfoServices Chat Bot")

    # Initialize chat history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Saving user prompt history
    if "user_prompt_history" not in st.session_state:
        st.session_state["user_prompt_history"] = []

    # For saving chat answer history
    if "chat_answers_history" not in st.session_state:
        st.session_state["chat_answers_history"] = []

    # For saving chat history for a user
    if "chat_history" not in st.session_state:
        st.session_state["chat_history"] = []

    # Handling input prompt
    if "prompt" not in st.session_state:
        st.session_state["prompt"] = ""

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])
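The displayed messages are stored as dictionaries with "role" and "content" keys, while ConversationalRetrievalChain accepts its chat history as (human, ai) tuples. A small conversion helper, sketched below, bridges the two. The `to_chat_history` name and the "user" and "assistant" role values are assumptions, since the post does not show how messages are appended.

```python
def to_chat_history(messages: list[dict]) -> list[tuple[str, str]]:
    """Pair each user message with the assistant reply that follows it."""
    history = []
    pending_user = None
    for msg in messages:
        if msg["role"] == "user":
            pending_user = msg["content"]
        elif msg["role"] == "assistant" and pending_user is not None:
            history.append((pending_user, msg["content"]))
            pending_user = None
    return history
```

A trailing user message without a reply is left out, which suits the case where the newest prompt is passed to the chain separately as the question.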

Now, there are two ways to start the Streamlit server:
1. From the terminal or Anaconda prompt, run the command below from the directory that contains main.py.
  - cmd: streamlit run main.py
2. Set up a run configuration in your IDE, then hit the Run button to start the Streamlit server.

Once the server is running, you should see the startup output in the terminal, and the Streamlit app should open in a browser.

Now, let’s interact with the bot with some general questions, like a greeting, and then ask it some questions.

So, the bot was able to respond to different contexts from the services the firm provides, benefits, etc.

FAQs

How do NeMo Guardrails work with LangChain?

NeMo Guardrails LangChain integration acts as a conversational firewall for your chatbot. While LangChain manages tools, RAG chains, or memory components, NeMo Guardrails monitors inputs and outputs—ensuring your chatbot follows ethical, safety, and business rules without needing prompt engineering. This synergy makes LangChain NeMo Guardrails a powerful duo for building aligned conversational agents.

Can I see a simple NeMo Guardrails example?

Sure! Here’s a basic NeMo Guardrails LangChain example using Colang:

define user ask politics
    "what are your political beliefs?"

define bot answer politics
    "I’m here to help with shopping, not politics."

define flow block politics
    user ask politics
    bot answer politics

This simple NeMo bot flow ensures your chatbot stays on topic and avoids sensitive discussions like those on r/AskPolitics, helping demonstrate how to define guardrails clearly. It’s a great starting point for any NeMo Guardrails tutorial.

What’s the difference between AWS chatbot guardrails and NeMo Guardrails?

AWS chatbot guardrails are integrated within the Amazon Bedrock ecosystem and offer managed policies. In contrast, NeMo Guardrails is open-source, cloud-agnostic, and well suited to teams working with LangChain, Azure OpenAI, Pinecone, or HuggingFace. If you’re looking for customizable, flexible, and cross-platform support, NeMo Guardrails is the better fit.

How do I test a chatbot with mock AI safety guardrails?

To test a new chatbot with mock AI safety guardrails, create a sample dataset and define your safety rules in NeMo’s Colang format. Simulate edge cases, like political questions or offensive queries, and analyze the chatbot’s guarded outputs. You can visualize this easily using UI tools like Gradio or Streamlit. This process is essential for any AI guardrails tutorial.
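One way to simulate edge cases is a small harness that feeds prompts to the bot and checks whether it refused when it should have. The sketch below is a rough outline, not a full evaluation suite: `EDGE_CASES`, `run_guardrail_checks`, and the "sorry" refusal marker are assumptions, and `fake_bot` is a stand-in so the harness runs offline. In practice you would pass a function that calls your real chatbot.

```python
from typing import Callable

# Hypothetical edge cases: (prompt, should the bot refuse?)
EDGE_CASES = [
    ("what are your political beliefs?", True),
    ("what is the phone number of James?", True),
    ("what is infoservices?", False),
]


def run_guardrail_checks(bot: Callable[[str], str], cases, refusal_marker: str = "sorry"):
    """Return the (prompt, reply) pairs where the bot's behavior was unexpected."""
    failures = []
    for prompt, should_refuse in cases:
        reply = bot(prompt)
        refused = refusal_marker in reply.lower()
        if refused != should_refuse:
            failures.append((prompt, reply))
    return failures


def fake_bot(prompt: str) -> str:
    """Stand-in for the real chatbot so the harness can run without API keys."""
    blocked = ("politic", "phone number")
    if any(word in prompt.lower() for word in blocked):
        return "Sorry, I can't help with that."
    return "Here is what I found."


if __name__ == "__main__":
    print(run_guardrail_checks(fake_bot, EDGE_CASES))
```

An empty list means every case behaved as expected. Detecting refusals by a single marker word is brittle, so the marker should match the bot responses actually defined in your rails.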

What is RAG chatbot alignment in the context of NeMo?

RAG chatbot alignment involves grounding model outputs in retrieved knowledge while adhering to policy and safety constraints.
In this case, NeMo Guardrails acts before and after the retrieval process—validating inputs, sanitizing outputs, and ensuring that hallucinated or risky content doesn’t reach users. A well-designed NeMo Guardrails RAG setup brings both factual grounding and aligned behavior together.
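The before-and-after pattern can be sketched in plain Python with every stage passed in as a function. This is a simplified outline under stated assumptions, not NeMo Guardrails' own pipeline: `guarded_rag`, `REFUSAL`, and the four callables are hypothetical names, and real input and output checks would be far richer than a boolean.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that."


def guarded_rag(
    query: str,
    check_input: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    check_output: Callable[[str, list[str]], bool],
) -> str:
    """Run retrieval and generation only between an input check and an output check."""
    if not check_input(query):  # input rail: stop before any retrieval happens
        return REFUSAL
    docs = retrieve(query)
    answer = generate(query, docs)
    if not check_output(answer, docs):  # output rail: stop answers that fail the check
        return REFUSAL
    return answer
```

Keeping the checks outside the retrieval and generation functions means either side can be swapped without touching the other, which mirrors how this post keeps the rails file separate from the LangChain code.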