AWS RAG Portfolio Assistant

Grounding and Reasoning: Engineering a Serverless Portfolio RAG System

The goal was not just to ship a chatbot. It was to internalize how retrieval, reranking, and complex reasoning actually behave when you push on them in a production environment.

Stack: AWS Bedrock, S3 Vectors, Titan Text Embeddings V2, Amazon Nova Lite, Cohere Rerank 3.5, Claude Haiku 4.5, Lambda, API Gateway, React/Tailwind widget

Context

This project sits inside my certification preparation, but it is also a real product surface on the site. The assistant answers recruiter questions about my public GitHub projects and backs its answers with clickable citations. The value of this system is not that it sounds smart. It is that it remains strictly grounded in evidence and can reason critically about complex inputs like job descriptions.

Overview

The assistant is fully serverless on AWS Bedrock and designed to answer recruiter questions about my projects with evidence that can be inspected.

This write-up covers two major architectural iterations that took the assistant from generating plausible answers to providing grounded evidence, and finally to executing multi-step reasoning. This is the point where the certification material stopped being definitions and started becoming engineering intuition.

The Problem

Iteration 1 began when I watched the assistant answer a question about one project while citing code from a completely different repository.

For a system whose value is "here's the proof," wrong proof is not a minor defect. It makes the system confidently wrong. The issue was not a weak model or a bad embedding. I had misframed the retrieval problem itself.

The pipeline pushed every question through the same retrieval path. That was the mistake. Recruiter-style comparison questions and code evidence lookups do not behave like the same retrieval job, and forcing them into one path is what produced the off-target citations.

The Insight

Some questions are comparisons across projects. Some questions are direct evidence lookups against a file. Some are reasoning tasks over complex inputs such as long job descriptions. Treating them identically makes retrieval noisy, causes citations to drift, and weakens the system when it should be thinking step by step.

"Which project best shows end-to-end ownership?"
This is a comparison question. No single code chunk answers it because the judgment spans multiple projects.
"How does the computer vision system route OCR between edge and cloud?"
This is an evidence lookup. A real file either supports the answer or it does not.
"Is he a fit for this role?"
This is a reasoning task. The system has to decompose a large job description, retrieve evidence per requirement, and judge direct matches, transferable skills, and gaps.

What I Changed

Iteration 1 focused on grounding the evidence so the assistant stopped citing the wrong repository for the right-sounding answer.

Iteration 2 focused on reasoning, especially when recruiters pasted large job descriptions and expected the system to assess fit using evidence instead of keyword bingo.

Routing by question type

Intents now split into abstract and concrete buckets. Abstract questions are answered from curated project profiles, while concrete questions retrieve repository evidence and cite it.
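
The split can be sketched in a few lines of Python. This is an illustration under assumptions, not the repository's code: the intent labels, the `Bucket` enum, and the two answer callables are hypothetical names, and falling back to the concrete path for unknown intents is a choice made for this sketch.

```python
from enum import Enum
from typing import Callable


class Bucket(Enum):
    ABSTRACT = "abstract"   # answered from curated project profiles
    CONCRETE = "concrete"   # answered from repository evidence, with citations


# Hypothetical intent labels; the real classifier's label set is not shown here.
INTENT_BUCKETS = {
    "project_comparison": Bucket.ABSTRACT,
    "ownership_summary": Bucket.ABSTRACT,
    "code_lookup": Bucket.CONCRETE,
    "implementation_detail": Bucket.CONCRETE,
}


def bucket_for(intent: str) -> Bucket:
    # Assumption: unknown intents take the concrete path, so they still
    # have to be backed by retrieved evidence.
    return INTENT_BUCKETS.get(intent, Bucket.CONCRETE)


def answer(
    question: str,
    intent: str,
    from_profiles: Callable[[str], str],
    from_repository: Callable[[str], str],
) -> str:
    """Dispatch the question to the path that matches its bucket."""
    if bucket_for(intent) is Bucket.ABSTRACT:
        return from_profiles(question)
    return from_repository(question)
```

The point of the mapping is that the retrieval strategy is chosen before any vector search runs, so a comparison question never competes with code chunks for the same top-k slots.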

Decoupling framing from retrieval

Narrative rewrites now shape how the answer reads, but retrieval runs against the user's concrete wording. Framing and evidence use different queries because they serve different jobs.

Recall wide, then rerank narrow

I removed the fixed quota per repository, pooled and deduplicated broad retrieval results, and used Cohere Rerank 3.5 through Bedrock as the precision gate. The reranker scores low even on correct matches, so a light floor of 0.01 protects recall without silently discarding good answers.
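
A rough sketch of the pool, deduplicate, rerank, and floor sequence follows. The chunk shape (`id` and `text` keys), the `top_k` default, and the `score_fn` callable are assumptions for illustration; `score_fn` stands in for the Cohere Rerank 3.5 call through Bedrock, which is not reproduced here.

```python
from typing import Callable, Iterable


def pool_and_rerank(
    query: str,
    result_sets: Iterable[list[dict]],
    score_fn: Callable[[str, str], float],
    floor: float = 0.01,
    top_k: int = 5,
) -> list[dict]:
    """Pool broad retrieval results, drop duplicates, then keep the best-scoring chunks."""
    seen_ids = set()
    pooled = []
    for results in result_sets:
        for chunk in results:
            # Deduplicate by chunk id; the first occurrence wins.
            if chunk["id"] not in seen_ids:
                seen_ids.add(chunk["id"])
                pooled.append(chunk)

    # score_fn is a placeholder for the reranker's relevance score.
    scored = [(score_fn(query, chunk["text"]), chunk) for chunk in pooled]

    # A light floor removes obvious noise without discarding
    # correct matches that the reranker happens to score low.
    kept = [pair for pair in scored if pair[0] >= floor]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:top_k]]
```

Because there is no per-repository quota in this shape, a single repository can fill every slot when it genuinely holds the best evidence.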

Tightening the citation contract

The composer only cites a source when it genuinely supports the sentence, and a guardrail strips hallucinated or out-of-range markers before anything reaches the frontend.
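
The guardrail half of that contract might look like the sketch below. It assumes citations appear as numeric markers such as `[1]` that index into the list of retrieved sources, starting at 1; the marker format and the function name are hypothetical.

```python
import re

# Assumed marker format: a number in square brackets, e.g. "[2]".
MARKER = re.compile(r"\[(\d+)\]")


def strip_invalid_markers(text: str, source_count: int) -> str:
    """Remove citation markers that do not point at a retrieved source."""

    def keep_or_drop(match: re.Match) -> str:
        index = int(match.group(1))
        # Valid markers are 1..source_count; anything else is dropped.
        return match.group(0) if 1 <= index <= source_count else ""

    cleaned = MARKER.sub(keep_or_drop, text)
    # Tidy the whitespace a removed marker leaves behind.
    cleaned = re.sub(r"\s+([.,;:])", r"\1", cleaned)
    return re.sub(r" {2,}", " ", cleaned).strip()
```

For example, with two retrieved sources, `strip_invalid_markers("Routes OCR at the edge [1] and falls back to cloud [7].", 2)` keeps `[1]` and removes `[7]`. This only catches markers that point nowhere; whether a valid marker actually supports its sentence is still the composer's job.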

Decompose and conquer

Instead of embedding an 800-word job description into one muddy vector, the pipeline now intercepts fit-assessment intents and routes them to a dedicated module where a fast model extracts the real technical requirements first.
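
One piece of that module is turning the fast model's output into a clean list of requirements. The sketch below assumes the model is prompted to return a JSON array of strings; that output contract, the `max_items` cap, and the function name are assumptions made for illustration.

```python
import json


def parse_requirements(model_output: str, max_items: int = 8) -> list[str]:
    """Parse the extraction model's output into a deduplicated requirement list."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        # Malformed output degrades to "no requirements" instead of raising.
        return []
    if not isinstance(data, list):
        return []

    seen = set()
    requirements = []
    for item in data:
        if not isinstance(item, str):
            continue
        text = " ".join(item.split())  # collapse stray whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            requirements.append(text)
    # Cap the list so the number of downstream retrievals stays bounded.
    return requirements[:max_items]
```

Each surviving requirement then becomes its own short retrieval query, which is what avoids the single muddy vector.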

Parallel retrieval under hard limits

With the requirements isolated, the system executes retrieval for each requirement in parallel. Because API Gateway enforces a strict 29-second timeout, all branches share one hard deadline, and a branch that misses it degrades to no evidence instead of crashing the full request.
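
A minimal sketch of a shared deadline with graceful degradation, using only the standard library. The `retrieve` callable is a stand-in for the per-requirement retrieval call, and the 20-second default budget is an assumed value chosen to leave headroom under the 29-second limit, not a figure from the project.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def retrieve_all(
    requirements: list[str],
    retrieve: Callable[[str], list],
    budget_seconds: float = 20.0,  # assumed budget, below the 29-second limit
) -> dict[str, list]:
    """Run one retrieval per requirement in parallel under a single shared deadline."""
    if not requirements:
        return {}

    deadline = time.monotonic() + budget_seconds
    pool = ThreadPoolExecutor(max_workers=len(requirements))
    futures = {pool.submit(retrieve, req): req for req in requirements}

    # Every branch is measured against the same deadline.
    done, _ = wait(futures, timeout=max(0.0, deadline - time.monotonic()))

    evidence = {}
    for future, req in futures.items():
        if future in done and future.exception() is None:
            evidence[req] = future.result()
        else:
            # A timed-out or failed branch degrades to "no evidence".
            evidence[req] = []

    # Do not block on stragglers; cancel anything that has not started yet.
    pool.shutdown(wait=False, cancel_futures=True)
    return evidence
```

Threads that are already running cannot be interrupted, so in this sketch a slow branch finishes in the background while the response is built from whatever evidence arrived in time.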

Dual-model reasoning architecture

Amazon Nova Lite handles intent classification, query rewriting, and requirement extraction, while Claude Haiku 4.5 evaluates the evidence and judges direct matches, transferable skills, and gaps.

Treating the CV as evidence

Candidate profile and CV documents are now injected directly into the knowledge base, so the assistant can cite experience from those documents just as naturally as it cites repository code.

What It Looks Like Now

Ask how the computer vision system routes OCR between edge and cloud, and the assistant now answers against the right project and cites files from that exact repository.

Paste a long job description and ask whether I fit the role, and the assistant now extracts the real requirements, gathers evidence in parallel, and reasons about direct matches, transferable skills, and gaps.

2 major iterations: grounding first, then reasoning

29-second timeout: API Gateway hard limit shaping retrieval strategy

2 model tiers: fast routing plus deeper reasoning

None of this required complex external infrastructure. The reranker and the dual models are simply Bedrock API calls inside the existing Lambda function. The system stays serverless, cheap, and operationally light.

How It Maps To The Cert

This project turned several certification topics from textbook definitions into working intuition. Debugging grounded retrieval and parallel latency constraints teaches you significantly more than simply knowing that these concepts exist.

Retrieval and grounding

Grounding failed early on because I retrieved with the wrong query shape, not because grounding as a concept was missing.

Reranking in a RAG pipeline

Recall versus precision became a live tuning decision with a counterintuitive score distribution instead of a slide deck abstraction.

Bedrock building blocks

The system combines Knowledge Bases over S3 Vectors, the Rerank API, and multiple generation models inside one Lambda flow.

Least-privilege IAM

The reranker and cross-region inference profiles required their own specific IAM permissions, reinforcing the least-privilege posture that the certification stresses.

The biggest takeaway is broader than this project. RAG quality is not only a question of embeddings, models, and chunking. It is also about deciding whether the question in front of you requires a simple lookup or multi-step reasoning, and engineering the architecture to route it accordingly.

What's Next

The next work is less about adding features and more about refining the current retrieval and reasoning path under real usage.

Latency: reduce the end-to-end latency of the multi-step Bedrock pipeline while preserving grounded retrieval and reasoning quality

Reasoning calibration: keep refining fit-assessment prompts, evidence thresholds, and degradation behavior under hard time limits

Corpus growth: expand the knowledge base and revisit profile retrieval strategy as the portfolio and candidate documents grow

Repository

Explore the full implementation, infrastructure scripts, backend, and widget source.

Open AWS-Portfolio-RAG