RAG (Retrieval-Augmented Generation) using Llama 3

RAG, or Retrieval-Augmented Generation, is an approach in natural language processing (NLP) that combines the strengths of retrieval and generative models to deliver detailed, accurate responses to user queries. When paired with Llama 3, Meta's open-weight large language model, RAG becomes considerably more capable. This article explores the synergy between RAG and Llama 3, outlining the benefits and providing a step-by-step guide for building a project that leverages these technologies.

What is RAG?

RAG is a framework designed to handle a range of NLP tasks, including question answering, summarization, and conversational agents. It comprises three main stages:

  1. Retrieve: Relevant documents are fetched from a corpus based on the user’s query. Where traditional search engines rely on keyword matching, RAG typically uses language-model embeddings so that retrieval is driven by semantic similarity.
  2. Augment: The retrieved documents are added to the model’s prompt as context, so the answer can be grounded in that material rather than in the model’s parametric knowledge alone.
  3. Generate: Finally, the language model produces a coherent response conditioned on both the query and the retrieved context, synthesizing, paraphrasing, or adding context as needed.
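Conceptually, the three stages form a small pipeline. The sketch below is purely illustrative: the embed, search, and generate callables are hypothetical stand-ins for an embedding model, a vector-store lookup, and an LLM, not functions from any particular library.

def answer_with_rag(query: str, embed, search, generate, top_k: int = 3) -> str:
    """Minimal RAG loop; embed/search/generate are supplied by the caller."""
    # 1. Retrieve: embed the query and fetch the most semantically similar documents.
    query_vector = embed(query)
    documents = search(query_vector, top_k)
    # 2. Augment: place the retrieved text into the prompt as grounding context.
    context = "\n\n".join(documents)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # 3. Generate: the language model produces an answer grounded in the context.
    return generate(prompt)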

Why RAG Before Fine-Tuning?

Before delving into the implementation details, it’s essential to understand why RAG is preferred over fine-tuning in certain scenarios:

  1. Improved Relevance and Context: RAG augments the model with pertinent context drawn from a large corpus of documents, leading to more accurate and contextually relevant responses.
  2. Better Data Efficiency: Because the model does not have to encode all external knowledge into its weights during pre-training or fine-tuning, RAG can draw on a large body of external knowledge at query time. This dynamic access to information reduces the amount of task-specific data needed for fine-tuning.
  3. Effectiveness with Long-Tail Queries: RAG-equipped models handle rare or unseen queries well because they can fetch relevant data from external sources to fill knowledge gaps. Fine-tuning can further improve performance on infrequent queries by teaching the model to use the retrieved information effectively.

What is Llama 3?

Llama 3 is Meta’s family of open-weight large language models. In this project it serves as the generation component of the RAG pipeline: it takes the user’s query together with the retrieved context and produces the final answer. Retrieval itself is handled by a local embedding model and a pgvector database, so documents are matched by semantic relevance rather than keyword matching alone.

Project for Extracting Insights from Documents and URLs

To showcase the integration of RAG with Llama 3, we’ll build a project using Phidata, a framework that extends language models’ capabilities. By adding memory, knowledge, and tools, Phidata enables language models to engage in more complex interactions and tasks.

Phidata Framework:

Phidata is a framework designed for Language Model augmentation. It addresses the limitations of language models by adding the following components:

  1. Memory: Allows Language Models to maintain long-term conversations by storing chat history in a database.
  2. Knowledge: Provides Language Models with contextual information by storing data in a vector database.
  3. Tools: Enables Language Models to perform actions like pulling data from APIs, sending emails, or querying databases.
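These three components map directly onto Phidata classes that appear later in this article. The snippet below is a rough sketch under the same local setup used in the guide (Ollama for the model, Postgres with pgvector for storage); the table and collection names are placeholders, and tools are omitted for brevity:

from phi.assistant import Assistant
from phi.knowledge import AssistantKnowledge
from phi.llm.ollama import Ollama
from phi.embedder.ollama import OllamaEmbedder
from phi.storage.assistant.postgres import PgAssistantStorage
from phi.vectordb.pgvector import PgVector2

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

assistant = Assistant(
    llm=Ollama(model="llama3"),
    # Memory: chat history and runs persisted to Postgres
    storage=PgAssistantStorage(table_name="demo_assistant", db_url=db_url),
    # Knowledge: documents embedded locally and stored in a pgvector collection
    knowledge_base=AssistantKnowledge(
        vector_db=PgVector2(
            db_url=db_url,
            collection="demo_documents",
            embedder=OllamaEmbedder(model="nomic-embed-text", dimensions=768),
        ),
    ),
    # Tools (e.g. API calls, database queries) would be passed to the Assistant similarly.
)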

General Steps:

  1. Download Ollama: Install Ollama from ollama.com; it is used to run Llama 3 locally.
  2. Clone Phidata Repository: Clone the Phidata Git repository or download the code from the repository.
  3. Navigate to the RAG Directory: Access the RAG directory within the Phidata repository.
  4. Set Up Environment: Create a new Python environment using Conda, then install the necessary packages.
  5. Pull the Model: Use Ollama to pull the Llama 3 model.
  6. Database Setup: Install Docker and create a PostgreSQL database using the provided command.
  7. Run the Application: Launch the application using Streamlit.

Step-by-Step Guide:

  • Step 1 – Download Ollama from ollama.com
  • Step 2 – Clone the Phidata Git repository:
git clone https://github.com/phidatahq/phidata
  • Step 3 – Navigate to the RAG directory inside the repo
phidata/cookbook/llms/ollama/rag
  • Step 4 – Setup Environment
conda create -n phidata python=3.11 -y
conda activate phidata
pip install -r requirements.txt
  • Step 5 – Pull the Llama 3 model with Ollama
ollama pull llama3
  • Step 6 – Install Docker (and create a Docker account if you don’t have one)
  • Step 7 – Run the following command from the same directory to start a PostgreSQL container with pgvector
docker run -d -e POSTGRES_DB=ai -e POSTGRES_USER=ai -e POSTGRES_PASSWORD=ai -e PGDATA=/var/lib/postgresql/data/pgdata -v pgvolume:/var/lib/postgresql/data -p 5532:5432 --name pgvector phidata/pgvector:16
  • Step 8 – Run the Application
streamlit run app.py
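If the app reports "Could not create assistant, is the database running?", you can sanity-check the Postgres container from Step 7 with a few lines of Python. This is a minimal sketch, assuming SQLAlchemy and the psycopg driver are available (the pgvector integration depends on both):

from sqlalchemy import create_engine, text

# Same connection URL that assistant.py uses for the pgvector container.
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

engine = create_engine(db_url)
with engine.connect() as conn:
    # A successful query confirms the container from Step 7 is reachable.
    print(conn.execute(text("select version()")).scalar())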
app.py (Python):
from typing import List
import streamlit as st
from phi.assistant import Assistant
from phi.document import Document
from phi.document.reader.pdf import PDFReader
from phi.document.reader.website import WebsiteReader
from phi.utils.log import logger
from assistant import get_rag_assistant  # type: ignore

st.set_page_config(
    page_title="Local RAG",
    page_icon=":orange_heart:",
)
st.title("Local RAG with Ollama and PgVector")
st.markdown("##### :orange_heart: built using [phidata](https://github.com/phidatahq/phidata)")


def restart_assistant():
    st.session_state["rag_assistant"] = None
    st.session_state["rag_assistant_run_id"] = None
    if "url_scrape_key" in st.session_state:
        st.session_state["url_scrape_key"] += 1
    if "file_uploader_key" in st.session_state:
        st.session_state["file_uploader_key"] += 1
    st.rerun()


def main() -> None:
    # Get model
    llm_model = st.sidebar.selectbox("Select Model", options=["llama3", "phi3", "openhermes", "llama2"])
    # Set assistant_type in session state
    if "llm_model" not in st.session_state:
        st.session_state["llm_model"] = llm_model
    # Restart the assistant if assistant_type has changed
    elif st.session_state["llm_model"] != llm_model:
        st.session_state["llm_model"] = llm_model
        restart_assistant()

    # Get Embeddings model
    embeddings_model = st.sidebar.selectbox(
        "Select Embeddings",
        options=["nomic-embed-text", "llama3", "openhermes", "phi3"],
        help="When you change the embeddings model, the documents will need to be added again.",
    )
    # Set assistant_type in session state
    if "embeddings_model" not in st.session_state:
        st.session_state["embeddings_model"] = embeddings_model
    # Restart the assistant if assistant_type has changed
    elif st.session_state["embeddings_model"] != embeddings_model:
        st.session_state["embeddings_model"] = embeddings_model
        st.session_state["embeddings_model_updated"] = True
        restart_assistant()

    # Get the assistant
    rag_assistant: Assistant
    if "rag_assistant" not in st.session_state or st.session_state["rag_assistant"] is None:
        logger.info(f"---*--- Creating {llm_model} Assistant ---*---")
        rag_assistant = get_rag_assistant(llm_model=llm_model, embeddings_model=embeddings_model)
        st.session_state["rag_assistant"] = rag_assistant
    else:
        rag_assistant = st.session_state["rag_assistant"]

    # Create assistant run (i.e. log to database) and save run_id in session state
    try:
        st.session_state["rag_assistant_run_id"] = rag_assistant.create_run()
    except Exception:
        st.warning("Could not create assistant, is the database running?")
        return

    # Load existing messages
    assistant_chat_history = rag_assistant.memory.get_chat_history()
    if len(assistant_chat_history) > 0:
        logger.debug("Loading chat history")
        st.session_state["messages"] = assistant_chat_history
    else:
        logger.debug("No chat history found")
        st.session_state["messages"] = [{"role": "assistant", "content": "Upload a doc and ask me questions..."}]

    # Prompt for user input
    if prompt := st.chat_input():
        st.session_state["messages"].append({"role": "user", "content": prompt})

    # Display existing chat messages
    for message in st.session_state["messages"]:
        if message["role"] == "system":
            continue
        with st.chat_message(message["role"]):
            st.write(message["content"])

    # If last message is from a user, generate a new response
    last_message = st.session_state["messages"][-1]
    if last_message.get("role") == "user":
        question = last_message["content"]
        with st.chat_message("assistant"):
            response = ""
            resp_container = st.empty()
            for delta in rag_assistant.run(question):
                response += delta  # type: ignore
                resp_container.markdown(response)
            st.session_state["messages"].append({"role": "assistant", "content": response})

    # Load knowledge base
    if rag_assistant.knowledge_base:
        # -*- Add websites to knowledge base
        if "url_scrape_key" not in st.session_state:
            st.session_state["url_scrape_key"] = 0

        input_url = st.sidebar.text_input(
            "Add URL to Knowledge Base", type="default", key=st.session_state["url_scrape_key"]
        )
        add_url_button = st.sidebar.button("Add URL")
        if add_url_button:
            if input_url is not None:
                alert = st.sidebar.info("Processing URLs...", icon="ℹ️")
                if f"{input_url}_scraped" not in st.session_state:
                    scraper = WebsiteReader(max_links=2, max_depth=1)
                    web_documents: List[Document] = scraper.read(input_url)
                    if web_documents:
                        rag_assistant.knowledge_base.load_documents(web_documents, upsert=True)
                    else:
                        st.sidebar.error("Could not read website")
                    st.session_state[f"{input_url}_uploaded"] = True
                alert.empty()

        # Add PDFs to knowledge base
        if "file_uploader_key" not in st.session_state:
            st.session_state["file_uploader_key"] = 100

        uploaded_file = st.sidebar.file_uploader(
            "Add a PDF :page_facing_up:", type="pdf", key=st.session_state["file_uploader_key"]
        )
        if uploaded_file is not None:
            alert = st.sidebar.info("Processing PDF...", icon="?")
            rag_name = uploaded_file.name.split(".")[0]
            if f"{rag_name}_uploaded" not in st.session_state:
                reader = PDFReader()
                rag_documents: List[Document] = reader.read(uploaded_file)
                if rag_documents:
                    rag_assistant.knowledge_base.load_documents(rag_documents, upsert=True)
                else:
                    st.sidebar.error("Could not read PDF")
                st.session_state[f"{rag_name}_uploaded"] = True
            alert.empty()

    if rag_assistant.knowledge_base and rag_assistant.knowledge_base.vector_db:
        if st.sidebar.button("Clear Knowledge Base"):
            rag_assistant.knowledge_base.vector_db.clear()
            st.sidebar.success("Knowledge base cleared")

    if rag_assistant.storage:
        rag_assistant_run_ids: List[str] = rag_assistant.storage.get_all_run_ids()
        new_rag_assistant_run_id = st.sidebar.selectbox("Run ID", options=rag_assistant_run_ids)
        if st.session_state["rag_assistant_run_id"] != new_rag_assistant_run_id:
            logger.info(f"---*--- Loading {llm_model} run: {new_rag_assistant_run_id} ---*---")
            st.session_state["rag_assistant"] = get_rag_assistant(
                llm_model=llm_model, embeddings_model=embeddings_model, run_id=new_rag_assistant_run_id
            )
            st.rerun()

    if st.sidebar.button("New Run"):
        restart_assistant()


main()
assistant.py (Python):
from typing import Optional

from phi.assistant import Assistant
from phi.knowledge import AssistantKnowledge
from phi.llm.ollama import Ollama
from phi.embedder.ollama import OllamaEmbedder
from phi.vectordb.pgvector import PgVector2
from phi.storage.assistant.postgres import PgAssistantStorage

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"


def get_rag_assistant(
    llm_model: str = "llama3",
    embeddings_model: str = "nomic-embed-text",
    user_id: Optional[str] = None,
    run_id: Optional[str] = None,
    debug_mode: bool = True,
) -> Assistant:
    """Get a Local RAG Assistant."""

    # Define the embedder based on the embeddings model
    embedder = OllamaEmbedder(model=embeddings_model, dimensions=4096)
    embeddings_model_clean = embeddings_model.replace("-", "_")
    if embeddings_model == "nomic-embed-text":
        embedder = OllamaEmbedder(model=embeddings_model, dimensions=768)
    elif embeddings_model == "phi3":
        embedder = OllamaEmbedder(model=embeddings_model, dimensions=3072)
    # Define the knowledge base
    knowledge = AssistantKnowledge(
        vector_db=PgVector2(
            db_url=db_url,
            collection=f"local_rag_documents_{embeddings_model_clean}",
            embedder=embedder,
        ),
        # 3 references are added to the prompt
        num_documents=3,
    )

    return Assistant(
        name="local_rag_assistant",
        run_id=run_id,
        user_id=user_id,
        llm=Ollama(model=llm_model),
        storage=PgAssistantStorage(table_name="local_rag_assistant", db_url=db_url),
        knowledge_base=knowledge,
        description="You are an AI called 'RAGit' and your task is to answer questions using the provided information",
        instructions=[
            "When a user asks a question, you will be provided with information about the question.",
            "Carefully read this information and provide a clear and concise answer to the user.",
            "Do not use phrases like 'based on my knowledge' or 'depending on the information'.",
        ],
        # Uncomment this setting to add chat history to the messages
        # add_chat_history_to_messages=True,
        # Uncomment this setting to customize the number of previous messages added from the chat history
        # num_history_messages=3,
        # This setting adds references from the knowledge_base to the user prompt
        add_references_to_prompt=True,
        # This setting tells the LLM to format messages in markdown
        markdown=True,
        add_datetime_to_instructions=True,
        debug_mode=debug_mode,
    )
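
To use the assistant outside Streamlit, a minimal sketch could look like the following (assuming Ollama and the pgvector container are running, and that documents have already been added through the app; documents can also be loaded programmatically via assistant.knowledge_base.load_documents(), as app.py does):

from assistant import get_rag_assistant

# Same defaults as the Streamlit app: llama3 for generation, nomic-embed-text for embeddings.
assistant = get_rag_assistant(llm_model="llama3", embeddings_model="nomic-embed-text")

# run() yields the response incrementally, just like the streaming loop in app.py.
response = ""
for delta in assistant.run("Summarize the uploaded document in two sentences."):
    response += delta
print(response)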

Output:

Demonstration of the App

  • First, add a URL or upload a file (screenshot: adding a source).

  • Ask any question related to the source (screenshot: asking a question).

  • After a short wait, the response is displayed (screenshot: result).

Conclusion

To sum up, using Llama 3 for RAG has shown great promise in improving natural language comprehension and generation. Llama 3’s ability to process and generate human-like text, combined with retrieval, yields greater precision and relevance in the generated content. Retrieval gives the model access to a vast library of knowledge, ensuring that responses are contextually relevant and enriched with supporting information.


