How CBG builds high-accuracy Generative AI services on enterprise data using Amazon Kendra, Amazon Bedrock, LangChain, and large language models

Mark Fowler
5 min read · Jan 27, 2024

In this blog post, I’m excited to share how CBG is reimagining the way we address complex natural language processing challenges with the power of Generative AI (GenAI) and Large Language Models (LLMs) such as Amazon Titan, accessed through Amazon Bedrock. These advanced tools are not just changing our game; they’re redefining it, enabling us to craft more dynamic AI conversations and boost productivity through sharper, more intuitive responses. Using GenAI, we’ve been able to reduce time spent supporting SOC 1 and SOC 2 audits, accelerate finding relevant company policies, free up more time for developing value-driven features, and provide better overall customer support in a complicated healthcare environment.

At CBG, we believe that effective GenAI applications need to orbit around company data and respect user access permissions. That’s where our clever use of Retrieval Augmented Generation (RAG) comes into play. RAG smartly sifts through our vast enterprise knowledge, picking just the right info to feed into the LLMs. This precision helps in crafting responses that are not just accurate but also relevant to our internal teams.
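To make the RAG idea concrete, here is a toy sketch of the pattern: retrieve the most relevant snippets, then stuff them into the LLM prompt. The keyword-overlap scorer below is a deliberately naive stand-in for a real semantic index like Kendra, and all names are hypothetical.

```python
# Toy illustration of Retrieval Augmented Generation (RAG):
# retrieve relevant excerpts, then build a grounded prompt for the LLM.
# A real system would use a semantic index (e.g. Kendra) instead of keyword overlap.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Combine retrieved excerpts and the user question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

The key property is that the LLM only ever sees excerpts the retriever selected, which is what keeps answers grounded in enterprise content.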

Architecture diagram for LangChain, Kendra, and LLMs at CBG

To move quickly, we simply integrate Amazon Kendra with these LLMs through Bedrock and LangChain. We’re not just building GenAI applications; we’re creating top-tier conversational experiences that navigate our enterprise content with accuracy. With this architecture, our teams are able to more efficiently navigate compliance audits, query policies, and streamline workflows relying on multiple unrelated data sources. We call this architecture FRAN (Flexible Responsible AI Navigator).

Solution overview

Our approach to developing the FRAN family of Generative AI (GenAI) applications leverages Amazon Kendra to ingest and process a wide range of unstructured enterprise data from sources such as SharePoint, MS Teams, Box, Amazon S3, GitHub, Amazon RDS, Amazon Redshift, and more.

Out of the box data sources used at CBG

This architecture offers flexibility in selecting the most appropriate LLM for our specific use case. We can choose from a variety of LLMs, including those from Hugging Face, AI21 Labs, Cohere, and others hosted on an Amazon SageMaker endpoint. This selection also extends to models from companies like Anthropic and OpenAI. With Amazon Bedrock, we have the advantage of selecting Amazon Titan or partner LLMs that all use the same Bedrock API, ensuring secure and simple API interactions within the AWS ecosystem.
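A sketch of what a Bedrock call can look like from Python. The model ID and request schema below follow a Titan text model as of early 2024, but each Bedrock model family has its own body format, so treat this as an assumption to verify against the Bedrock documentation.

```python
import json

# Assumed model ID for a Titan text model; other Bedrock models use other IDs/schemas.
TITAN_MODEL_ID = "amazon.titan-text-express-v1"

def build_titan_body(prompt: str, max_tokens: int = 512) -> str:
    """Build a JSON request body in the (assumed) Titan text schema."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.2},
    })

def invoke_titan(prompt: str) -> str:
    """Call the Bedrock runtime. Requires AWS credentials and boto3."""
    import boto3  # imported here so the module loads without AWS SDK installed
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(modelId=TITAN_MODEL_ID, body=build_titan_body(prompt))
    payload = json.loads(resp["body"].read())
    return payload["results"][0]["outputText"]
```

Because every Bedrock model sits behind the same `invoke_model` call, swapping Titan for a partner LLM mostly means changing the model ID and the body-building function.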

The interaction flows of our FRAN GenAI apps are meticulously designed for efficiency and accuracy so our internal teams can quickly retrieve relevant information. A common flow looks something like this:

User Request: Users initiate the process by making a request through a user interface built in Retool or an MS Teams chatbot. All user requests are secured via AWS services.

Query Processing: The FRAN GenAI app formulates a search query based on the user’s request. This query passes through Amazon API Gateway and AWS Lambda, which forwards it to the Amazon Kendra index using LangChain.

Data Retrieval: The Amazon Kendra index responds with search results, presenting excerpts from relevant documents stored in enterprise sources such as SharePoint, Box, S3, GitHub repositories, MS Teams, and Slack.

Historical Context: The application retrieves prior conversation turns from DynamoDB tables, giving it the full context of the user’s ongoing conversation.

Prompt Creation: The user’s current request, along with the historical conversation data and search results from Kendra, are compiled to create a detailed prompt for the Large Language Model (LLM) selected for the use case.

LLM Response: The LLM hosted on Bedrock processes this prompt and generates a concise, relevant response to the user’s query.

User Response Delivery: This LLM response is then conveyed back to the user, completing the interaction cycle.

Observability and Monitoring: All user requests, querying, processing, and LLM responses are streamed to centralized logging and aggregation sites to meet compliance controls and to achieve full transparency in how AI is used at CBG.
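The Prompt Creation step above can be sketched as a pure function. The section labels and layout here are illustrative, not CBG’s actual prompt template.

```python
def build_prompt(
    history: list[tuple[str, str]], excerpts: list[str], question: str
) -> str:
    """Assemble chat history, Kendra excerpts, and the new question into one LLM prompt."""
    history_text = "\n".join(f"{role}: {msg}" for role, msg in history)
    context_text = "\n".join(f"[{i + 1}] {e}" for i, e in enumerate(excerpts))
    return (
        "You are an assistant answering from enterprise documents only.\n"
        f"Conversation so far:\n{history_text}\n"
        f"Retrieved excerpts:\n{context_text}\n"
        f"User question: {question}\n"
        "Answer:"
    )
```

Numbering the excerpts also makes it easy to ask the LLM to cite which source passage supported each claim.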

For optimal performance, our FRAN GenAI apps are designed to tailor the prompt to both the user’s request and the specific LLM in use. Managing the chat history and context is crucial for the effectiveness of conversational AI applications. To facilitate this, our developers utilize open-source frameworks such as LangChain. These frameworks provide essential modules for integrating with the chosen LLM and tools for managing chat history and engineering prompts.
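One hedged sketch of that history management: LangChain’s `DynamoDBChatMessageHistory` class (in `langchain_community` as of early 2024) backs chat history with a DynamoDB table, while a simple trimming policy keeps prompts within the model’s context window. The table name and trim policy below are placeholders, not CBG’s actual configuration.

```python
def trim_history(messages: list[str], max_turns: int = 10) -> list[str]:
    """Keep only the most recent turns so the prompt stays within context limits."""
    return messages[-max_turns:]

def load_history(session_id: str):
    """Load prior turns from DynamoDB via LangChain.

    Requires AWS credentials and the langchain_community package.
    """
    from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
    return DynamoDBChatMessageHistory(
        table_name="fran-chat-history",  # hypothetical table name
        session_id=session_id,
    )
```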

Specifically, we employ the AmazonKendraRetriever class from LangChain, which interfaces with the Amazon Kendra index to retrieve relevant data. This class uses Amazon Kendra’s Retrieve API to query the index and return the most pertinent excerpt passages for the query.
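Wiring that up might look like the sketch below. The retriever class and its `index_id`/`top_k` parameters come from `langchain_community`, but the index ID and the excerpt-formatting helper are placeholders of ours, and the construction requires AWS credentials at runtime.

```python
def make_retriever(index_id: str):
    """Create a LangChain retriever backed by Kendra's Retrieve API.

    Requires AWS credentials and the langchain_community package.
    """
    from langchain_community.retrievers import AmazonKendraRetriever
    return AmazonKendraRetriever(index_id=index_id, top_k=3)

def format_excerpts(docs) -> str:
    """Flatten retrieved LangChain documents into a context string for the prompt."""
    return "\n\n".join(
        f"Source: {d.metadata.get('source', 'unknown')}\n{d.page_content}"
        for d in docs
    )
```

Each returned document carries the excerpt text in `page_content` and provenance in `metadata`, which is what lets responses point users back to the originating policy or wiki page.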

TL;DR

The integration of Generative AI with large language models represents a significant advancement in how our teams at CBG gather and utilize insights from data. For enterprise applications, it’s essential that these insights are derived directly from enterprise content that we use every day. This approach ensures that responses remain relevant to the domain and reduces the likelihood of inaccuracies or hallucinations.

The effectiveness of this method hinges on the Retrieval Augmented Generation (RAG) approach, where the quality of the insights produced by the Large Language Model (LLM) is directly linked to the semantic relevance of the information retrieved.

Amazon Kendra and Amazon Bedrock have been vital tools in this process, offering high-accuracy semantic search results straight out of the box. Kendra’s Retrieve API, specifically tailored for RAG applications, combined with an extensive array of data source connectors, support for widely used file formats, and robust security measures, positions Amazon Kendra as an ideal retrieval mechanism for implementing Generative AI solutions in enterprise contexts.

With Amazon Kendra and Amazon Bedrock, organizations like CBG can efficiently harness the power of Generative AI, ensuring that their insights are both accurate and highly relevant to their specific enterprise needs.


Mark Fowler

Continuous learner & technologist currently focused on building healthcare entities with forward-thinking partners. Passionate about all things Cloud.