Introduction
Retrieval-Augmented Generation (RAG) has emerged as the standard architecture for allowing Large Language Models (LLMs) to query proprietary enterprise data. However, when deploying RAG in a corporate environment, security and data isolation become the primary engineering hurdles.
In this post, we discuss how DivyamStack engineers secure RAG pipelines that respect existing Identity and Access Management (IAM) permissions.
The Security Problem with Naive RAG
A naive RAG implementation chunks all company documents into a single vector database. When a user asks a question, the system retrieves the most semantically similar chunks and feeds them to the LLM.
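The issue? Retrieval ranks chunks purely by semantic similarity and never checks who is asking. If an employee's query happens to land close to a sensitive HR document or executive financials in embedding space, those chunks are placed in the context window and the LLM can show them to someone who was never authorized to see them.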
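To make the failure mode concrete, here is a minimal sketch of naive retrieval over a toy in-memory corpus. The chunk texts and three-dimensional vectors are invented for illustration (real embeddings come from a model and have hundreds of dimensions), and `naive_retrieve` is a hypothetical helper, not part of any library.

```python
import math

# Hypothetical toy corpus of (text, embedding) pairs, invented for illustration.
CHUNKS = [
    ("Office wifi setup guide", [0.9, 0.1, 0.0]),
    ("HR: salary bands", [0.1, 0.9, 0.2]),
    ("Cafeteria menu", [0.0, 0.2, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def naive_retrieve(query_embedding, top_k=1):
    """Rank every chunk by similarity; the caller's identity is never consulted."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_embedding, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# A query about pay sits closest to the HR chunk, regardless of who sends it.
salary_query = [0.2, 0.8, 0.1]
print(naive_retrieve(salary_query))
```

Nothing in `naive_retrieve` takes a user or role argument, which is exactly the gap: the only input that decides what reaches the LLM is the query vector.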
Our Approach: RBAC-Enforced Vector Search
To solve this, we implement Role-Based Access Control (RBAC) directly at the vector database level.
1. Metadata Tagging at Ingestion
During the data ingestion pipeline, every document chunk is tagged with permission metadata (e.g., allowed_roles: ['hr', 'admin']).
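A rough sketch of what the tagging step might look like follows. The `Document` class, the fixed-size `chunk_text` splitter, and the record layout are assumptions made for illustration; a real pipeline would likely use a smarter chunker and would attach an embedding to each record before writing it to the vector database.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical source document with the roles allowed to read it."""
    doc_id: str
    text: str
    allowed_roles: List[str]


def chunk_text(text: str, size: int = 200) -> List[str]:
    """Naive fixed-size splitter; stands in for a real chunking strategy."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_records(doc: Document, size: int = 200) -> List[Dict]:
    """Copy the document's permissions onto every chunk it produces."""
    records = []
    for n, chunk in enumerate(chunk_text(doc.text, size)):
        records.append({
            "id": f"{doc.doc_id}-{n}",
            "metadata": {
                "text": chunk,
                # Each chunk inherits its parent document's roles.
                "allowed_roles": list(doc.allowed_roles),
            },
        })
    return records


handbook = Document("hr-handbook", "x" * 450, ["hr", "admin"])
for record in build_records(handbook):
    print(record["id"], record["metadata"]["allowed_roles"])
```

The point of the design is that permissions travel with the chunk: once a document is split, no chunk exists in the index without an `allowed_roles` tag.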
2. Pre-Filtering at Retrieval
When a user submits a query, we validate their IAM token (issued by Auth0 or Azure AD), extract their roles from its claims, and apply a metadata filter to the vector search query.
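As a sketch of the role-extraction step, the helper below turns an already-verified set of token claims into a metadata filter. It assumes signature and expiry checks have happened upstream, and it assumes the roles live under a claim named `roles`; the actual claim name depends on how your identity provider is configured. Both function names are hypothetical.

```python
# Assumption: roles are carried in a claim called "roles". Adjust to match
# whatever claim your identity provider is configured to emit.
ROLES_CLAIM = "roles"


def roles_from_claims(claims):
    """Return a sorted, de-duplicated list of role strings from verified claims."""
    raw = claims.get(ROLES_CLAIM, [])
    if isinstance(raw, str):
        raw = [raw]  # tolerate a single role given as a bare string
    return sorted({r.strip() for r in raw if isinstance(r, str) and r.strip()})


def build_role_filter(claims):
    """Build the metadata filter, failing closed when the token has no roles."""
    roles = roles_from_claims(claims)
    if not roles:
        raise PermissionError("token carries no roles; refusing to search")
    return {"allowed_roles": {"$in": roles}}


print(build_role_filter({"sub": "user-123", "roles": ["hr", "admin", "hr"]}))
```

Raising on an empty role list is a deliberate choice in this sketch: a missing or malformed claim should block the search rather than silently fall back to an unfiltered query.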
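# Example pseudo-code for a secure Pinecone query
results = index.query(
    vector=user_query_embedding,
    top_k=5,
    # Only chunks whose allowed_roles share at least one role with the user are searched
    filter={
        "allowed_roles": {"$in": user.roles}
    },
    include_metadata=True,  # return the chunk metadata along with each match
)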
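To show why the filter has to run before ranking, here is a small in-memory sketch that mimics the filter's semantics. It is not how Pinecone implements filtering internally; the corpus, vectors, and the `filtered_search` helper are invented for illustration, and a plain dot product stands in for the similarity metric.

```python
# Hypothetical toy corpus with permission metadata, invented for illustration.
CHUNKS = [
    {"text": "Office wifi setup guide", "vector": [0.9, 0.1, 0.0],
     "allowed_roles": ["employee", "hr", "admin"]},
    {"text": "HR: salary bands", "vector": [0.1, 0.9, 0.2],
     "allowed_roles": ["hr", "admin"]},
    {"text": "Cafeteria menu", "vector": [0.0, 0.2, 0.9],
     "allowed_roles": ["employee", "hr", "admin"]},
]


def dot(a, b):
    """Dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query_vector, user_roles, top_k=5):
    """Drop unauthorized chunks first, then rank only what remains."""
    allowed = set(user_roles)
    # A chunk is a candidate only if it shares at least one role with the user.
    candidates = [c for c in CHUNKS if allowed & set(c["allowed_roles"])]
    ranked = sorted(candidates, key=lambda c: dot(query_vector, c["vector"]), reverse=True)
    return [c["text"] for c in ranked[:top_k]]


salary_query = [0.2, 0.8, 0.1]
print(filtered_search(salary_query, ["employee"]))
print(filtered_search(salary_query, ["hr"]))
```

Because the HR chunk is removed from the candidate set before any scoring happens, it cannot appear in the results for an `employee` caller no matter how close the query is, and it cannot crowd authorized chunks out of the top-k either.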
Conclusion
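By enforcing permissions before the context window is assembled, we ensure that chunks the user isn’t authorized to see never reach the LLM, so the model has nothing restricted to leak in its answer. This protection is only as good as the permission metadata applied at ingestion, and it limits what the model can see rather than preventing hallucination.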