How CBG leverages AWS to eliminate waste from our Fraud Waste & Abuse program

Mark Fowler
8 min read · Jun 6, 2022

Many software engineering teams champion values, missions, and tenets focused on rapidly building meaningful software solutions for their customers. When development teams can't deliver quickly, it's often due to factors outside their control: organizational debt, misalignment with business-wide value streams, or executive prioritization.

At Cooperative Benefits Group, or CBG, we’re fortunate to have development, solutions, & operations teams aligned to value streams led by executives with common mindsets who have a united intent to push forward our CBG vision, mission, and roadmap.

What was the business problem?

At CBG, we recently partnered up with Doug Griffel to build a future-ready Fraud Waste and Abuse (FWA) solution in AWS to solve some core problems his team was facing.

“FWA is a critical component of every PBM and healthcare entity. A major component of an effective FWA program is the utilization of claims data for analysis. Utilization trends provide visibility into identifying outliers and may be a leading indicator to identify potential issues. This analysis allows us to investigate dispensing patterns of pharmacies and providers/prescribers. There are numerous scenarios that need to be included in a thorough and comprehensive FWA oversight model. The mass volume of scenarios necessitates a robust reporting suite to help identify outliers, trends and standard metrics which provide researchers and machines with a logical place to begin analyzing data, by stripping away some of the forest to enable you to see the trees, if you will.

Building the reports and queries that drive the analysis was only the beginning. My team found themselves having to manually execute the reports, modify the parameters within the queries, wait for them to complete one at a time, and export the results. Obviously, this was not an efficient use of time and there had to be a way to automate this process.”

— Doug Griffel

With these specific pain points in mind, we had some candid requirements gathering conversations with Doug and his team. From those quick huddles, our technology team plotted a course forward to see if we could solve some, if not all, of these challenges. Our technology team worked through initial architecture discussions focusing on the main business problems and identified what we felt was a promising MVP. We reviewed that architecture and MVP with Doug, received additional details and approval, then started off on a sprint towards MVP.

What does the architecture actually look like?

Many visitors to this blog post are looking to find out exactly what tech stack was used to build such an advanced product. To set context: we built this with four full-time developers and one full-time rockstar business analyst, with the developers dedicating 50% of each day to the project over a six-week period.

Building upon our values, tenets, and architecture principles, we've built a resilient and reliable cloud infrastructure in AWS with best practices in effect. This allowed us to pivot quickly and deliver using the reliable and nimble infrastructure we already had in place.

The complete architecture diagram with AWS, Retool, Datadog, Serverless, and SnapLogic references

Rather than “start with technology,” we started with the business requirements and expected outcomes. Early on, core requirements made it clear that our custom business logic needed to scale easily and be secured by default.

  • We didn’t want to worry about compute size or databases running out of space as the solution scaled.
  • We wanted to ensure running services were natively secured using least-privilege security roles and permissions.
  • And most important of all, the solution needed to facilitate self-service and eliminate manual processes through automation.
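The least-privilege requirement above can be illustrated with a scoped IAM policy. This is a minimal sketch: the resource ARNs, table, and queue names are hypothetical placeholders, not CBG's actual resources.

```python
import json

# Hypothetical least-privilege policy for a report-processing Lambda:
# it may only read one DynamoDB table and send to one SQS queue.
REPORT_LAMBDA_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadReportConfigs",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/report-configs",
        },
        {
            "Sid": "EnqueueReportRuns",
            "Effect": "Allow",
            "Action": ["sqs:SendMessage"],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:process-queue",
        },
    ],
}

print(json.dumps(REPORT_LAMBDA_POLICY, indent=2))
```

Scoping each statement to a single named resource, rather than a wildcard, is what makes the role "natively secured" as services scale out.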

Plotting a course forward

With these expected outcomes in mind, we formed a high-level architecture based on our experience in AWS.

The architecture diagram focused on AWS
  • All our executing code runs in secure VPC private subnets, restricting internet access and ensuring outbound traffic flows through our Fortigate firewalls. This keeps traffic contained to our VPCs rather than traversing the public internet.
  • All report configurations, including custom schedules, are stored in DynamoDB, a schema-less data store chosen for flexibility and single-digit millisecond response times.
  • All data flowing through messaging services, Lambdas, databases, Step Functions, API Gateways, DynamoDB, and more is encrypted in transit and at rest with AWS KMS.
  • Executing business logic pushes log activity to central locations in CloudWatch and Datadog with customizable storage retention periods.
  • To ensure that errors don't go unnoticed, any errors in processing are captured by CloudWatch alarms and Datadog monitors, which broadcast to an AWS SNS topic that pushes to all subscribers.
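Because DynamoDB is schema-less, a report configuration can carry whatever fields a report needs. The sketch below shows what one such item might look like; the field names are illustrative assumptions, not the team's actual schema.

```python
import json

def build_report_config(report_id, sql_script_s3_key, output_bucket, schedule):
    """Assemble a hypothetical report-config item, shaped as it would be
    passed to a DynamoDB put_item call. Field names are illustrative."""
    return {
        "report_id": report_id,                  # partition key
        "sql_script_s3_key": sql_script_s3_key,  # where the SQL script lives in S3
        "output_bucket": output_bucket,          # where generated reports land
        "schedule": schedule,                    # flexible, schema-less sub-document
        "enabled": True,
    }

config = build_report_config(
    "fwa-outlier-report",
    "scripts/fwa-outlier-report/v3.sql",
    "fwa-report-output",
    {"frequency": "weekly", "day": "MON"},
)
print(json.dumps(config, indent=2))
```

Because there is no fixed schema, adding a new per-report option later (say, a notification list) is just a new attribute on new items, with no migration.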

What does the UI look like?

One of the more important requirements for end-users was that we couldn't introduce yet another application for users to log in to. We harnessed Retool for its nimble ability to spin up advanced internal UIs. Retool has proven to be unmatched in building advanced UIs for our internal teams with respect to flexibility and speed of development.

MVP version of UI used for managing scheduled FWA report configurations

  • Once in the Retool UI shown above, FWA report configurations are managed in DynamoDB, a resilient schema-less datastore.
  • End users create SQL scripts used to generate report results and store them in S3 buckets for durable storage, version control, tight permissions, and simple API usage.
  • End users configure schedules and S3 bucket destinations so that generated reports are exported to the expected S3 buckets for durable storage, simple retrieval, integrated APIs, and auto-archiving.
  • When FWA reports are discovered by services, the results are published to subscribed parties for immediate or later retrieval.
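Saving a user's SQL script to S3 can be sketched as building the parameters for an S3 put. This is a minimal sketch with placeholder bucket and key names; with bucket versioning enabled, each save becomes a new retrievable version of the script.

```python
def build_put_script_request(report_id, sql_text):
    """Build the parameter dict for an S3 put_object call that saves a
    user's SQL script. Bucket and key naming here are assumptions."""
    return {
        "Bucket": "fwa-report-scripts",
        "Key": f"scripts/{report_id}.sql",
        "Body": sql_text.encode("utf-8"),
        "ServerSideEncryption": "aws:kms",  # encrypted at rest via KMS
        "ContentType": "text/plain",
    }

req = build_put_script_request("fwa-outlier-report", "SELECT pharmacy_id FROM claims;")
# In the real flow: boto3.client("s3").put_object(**req)
```

Keeping scripts in S3 rather than inline in the config gives the durability, versioning, and tight bucket-policy permissions called out above.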

Triggering on-demand and scheduled reports

For our end-users, we needed two ways to trigger automated reports: on-demand and scheduled. On-demand reports are run by sending requests from the Retool UI through API Gateway, which proxies to the report_run Lambda; that Lambda pulls the report config from DynamoDB and creates a message in the process SQS queue.

Finally, the report_run Lambda executes an AWS Step Function to process all queued messages in SQS and generate the configured reports. This state machine continues to pull and delete messages from the SQS queue until it is empty. As each message is pulled from the queue, the report_execute Lambda calls the SnapLogic pipeline with the message's instructions, which generates a report in S3 using the report configuration stored in DynamoDB.

Running a report on-demand via API
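The on-demand flow can be sketched as a pure function: given a report config pulled from DynamoDB, build the message body the report_run Lambda would place on the process queue. The field names are assumptions for illustration, not the team's actual message contract.

```python
import json
from datetime import datetime, timezone

def build_run_message(config, requested_by):
    """Build the SQS message body the report_run Lambda would enqueue.
    In the real Lambda this string would go to an SQS send_message call,
    after which the Step Function drains the queue."""
    return json.dumps({
        "report_id": config["report_id"],
        "sql_script_s3_key": config["sql_script_s3_key"],
        "output_bucket": config["output_bucket"],
        "trigger": "on-demand",
        "requested_by": requested_by,
        "requested_at": datetime.now(timezone.utc).isoformat(),
    })

msg = build_run_message(
    {"report_id": "fwa-outlier-report",
     "sql_script_s3_key": "scripts/fwa-outlier-report/v3.sql",
     "output_bucket": "fwa-report-output"},
    "analyst-01",
)
```

Putting everything the downstream report_execute Lambda needs into the message keeps each queue entry self-describing, so the state machine can process messages in any order.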

When it comes to orchestrating scheduled report runs, an EventBridge rule runs once daily. When triggered, the rule invokes the report_watch Lambda, which pulls report configs from DynamoDB and checks whether each report's schedule has been met. If so, the report config is sent to the process SQS queue. As with on-demand runs, the AWS Step Function state machine is then started to process the messages on the queue.

Running reports via automated schedules with EventBridge rule
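The daily report_watch check boils down to a predicate over each config's schedule. The schedule shape below (daily/weekly/monthly) is an assumption for illustration; the real config format may differ.

```python
from datetime import date

def schedule_is_due(schedule, today):
    """Return True if a report's schedule is met on `today`.
    Supports a hypothetical daily/weekly/monthly schedule shape."""
    freq = schedule["frequency"]
    if freq == "daily":
        return True
    if freq == "weekly":
        days = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
        return days[today.weekday()] == schedule["day"]
    if freq == "monthly":
        return today.day == schedule["day_of_month"]
    return False  # unknown frequency: never fire

# A weekly Monday report is due on Monday 2022-06-06 but not the next day.
assert schedule_is_due({"frequency": "weekly", "day": "MON"}, date(2022, 6, 6))
assert not schedule_is_due({"frequency": "weekly", "day": "MON"}, date(2022, 6, 7))
```

Because the rule fires once daily and the check is idempotent per config, a report never double-enqueues for the same day.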

How do we keep systems healthy?

To keep all logging centralized, compliant, and persistent, all our Lambdas are configured to log to CloudWatch log groups, which have subscriptions to a Kinesis Firehose stream that continually pushes to Datadog.

CloudWatch log groups with subscriptions to Kinesis Firehose
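Wiring a log group to Firehose amounts to a subscription filter. Below is a sketch of the parameters a CloudWatch Logs put_subscription_filter call expects; the ARNs and names are placeholders, not CBG's actual resources.

```python
# Parameters for a CloudWatch Logs put_subscription_filter call.
# An empty filter pattern forwards every log event to the stream.
subscription_filter = {
    "logGroupName": "/aws/lambda/report_execute",
    "filterName": "to-datadog-firehose",
    "filterPattern": "",  # empty pattern matches all events
    "destinationArn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/datadog-logs",
    "roleArn": "arn:aws:iam::123456789012:role/cwl-to-firehose",
}
# In the real flow: boto3.client("logs").put_subscription_filter(**subscription_filter)
```

The roleArn is what authorizes CloudWatch Logs to write into the Firehose stream; Firehose then handles batching and retry on the way to Datadog.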

In Datadog, we use custom-built and Datadog-provided dashboards to actively monitor the performance of this system against expected traffic patterns for production assurance. When monitored “red” states are detected, notifications go out to our fireteams via several messaging channels.

Datadog AWS monitoring dashboard

Deploying to production 15x per week?

For our developer workflows we “walk the talk” of CI/CD. Our teams align on Visual Studio Code, GitHub for source control, and GitHub Actions for continuous deployment via AWS CloudFormation, Terraform, or the Serverless Framework CLI, depending on which artifacts are being pushed. We typically hit a cadence of 2–3 releases per day for this feature.

Dev team CI/CD flow

This cadence is only possible through a team unified by strong values, a mindset to eliminate waste, and an effective prioritization mechanism at the team and executive level.

What were the business outcomes?

None of the technology matters, no matter how well it was conceived and architected, if it doesn't solve the business problem or isn't easy for end users to navigate. Nothing speaks to the success of a project quite like outcome metrics. With this project, we delivered real business outcomes that have a lasting impact on how we perform our FWA service as a PBM.

“Working with our IT partners, we were able to quickly identify a path to make the queries dynamic, eliminating the need to update the parameters every time a report needs to be run. Additionally, a solution was identified to automate the execution of the reports and exportation of the data. This automation effectively took out all front-loaded work required prior to our team being able to analyze the data. This saved approximately 20 hours of effort across 65+ reports, and also reduced the risk of manual errors. The most important byproduct of the automation was the additional 20 hours that are now available to work on claims review and analysis along with additional client requests and projects.”

— Doug Griffel

Beyond MVP

With wind in our sails, a respectable cloud-native solution, delighted end-users, and real business outcomes, it’s time to think about what comes next beyond the MVP.

“The automation of our FWA reporting engine opens the opportunity to introduce more reports, an increased audit frequency, cloud-native resiliency and scalability, and advanced AI/ML for deeper insights. This allows us to find issues sooner, communicate potential problems to our partners, and get things resolved/clarified quickly. The real beauty behind the solution provided is that it’s cross-functional. Automation of reporting and removing the need to manually run SQL queries can be leveraged across our whole enterprise. This progressive and innovative approach to a cross-functional, automated solution is not something typically found in legacy corporations with internal bureaucracies and silos.”

TL;DR

My hope is that by reading this post visitors get a sense for how CBG approaches solving business problems with modern technologies, a strong sense of culture, solid cloud-native architectural practices, and an effective prioritization mechanism focused on delighting our end users. Our solutions aren’t about the tech toys; they’re about tackling real-world challenges, identifying business outcomes, and charging hard towards those potential wins to better serve our internal and external clients.


Mark Fowler

Continuous learner & technologist currently focused on building healthcare entities with forward-thinking partners. Passionate about all things Cloud.