AWS RAG Portfolio Assistant
Grounding and Reasoning: Engineering a Serverless Portfolio RAG System
The goal was not just to ship a chatbot. It was to internalize how retrieval, reranking, and complex reasoning actually behave when you push on them in a production environment.
Context
This project sits inside my certification preparation, but it is also a real product surface on the site. The assistant answers recruiter questions about my public GitHub projects and backs its answers with clickable citations. The value of this system is not that it sounds smart. It is that it remains strictly grounded in evidence and can reason critically about complex inputs like job descriptions.
Overview
The assistant runs fully serverless on AWS, built on Amazon Bedrock, and is designed to answer recruiter questions about my projects with evidence that can be inspected.
This write-up covers two major architectural iterations that took the assistant from generating plausible answers to providing grounded evidence, and finally to executing multi-step reasoning. This is the point where the certification material stopped being definitions and started becoming engineering intuition.
The Problem
Iteration 1 began when I watched the assistant answer a question about one project while citing code from a completely different repository.
For a system whose value is "here's the proof," wrong proof is not a minor defect. It makes the assistant confidently wrong. The issue was not a weak model or a bad embedding. I had misframed the retrieval problem itself.
The pipeline pushed every question through the same retrieval path. That was the mistake. Recruiter-style comparison questions and code evidence lookups do not behave like the same retrieval job, and forcing them into one path is what produced the off-target citations.
The Insight
Some questions are comparisons across projects. Some questions are direct evidence lookups against a file. Some are reasoning tasks over complex inputs such as long job descriptions. Treating them identically makes retrieval noisy, causes citations to drift, and leaves the system answering in a single pass when it should be reasoning step by step.
- "Which project best shows end-to-end ownership?"
This is a comparison question. No single code chunk answers it because the judgment spans multiple projects.
- "How does the computer vision system route OCR between edge and cloud?"
This is an evidence lookup. A real file either supports the answer or it does not.
- "Is he a fit for this role?"
This is a reasoning task. The system has to decompose a large job description, retrieve evidence per requirement, and judge direct matches, transferable skills, and gaps.
What I Changed
Iteration 1 focused on grounding the evidence so the assistant stopped citing the wrong repository for the right-sounding answer.
Iteration 2 focused on reasoning, especially when recruiters pasted large job descriptions and expected the system to assess fit using evidence instead of keyword bingo.
Routing by question type
Intents now split into abstract and concrete buckets. Abstract questions are answered from curated project profiles, while concrete questions retrieve repository evidence and cite it.
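A minimal sketch of this split, not the production code. The names `Intent`, `Answer`, and `route`, and the three injected callables, are hypothetical stand-ins: `classify` for the intent classifier, `answer_from_profiles` for the curated-profile path, and `retrieve_evidence` for repository retrieval. Joining chunk bodies stands in for the real answer composer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Intent(Enum):
    ABSTRACT = "abstract"  # judgment that spans several projects
    CONCRETE = "concrete"  # claim a specific file can support or refute


@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)


def route(
    question: str,
    classify: Callable[[str], Intent],
    answer_from_profiles: Callable[[str], str],
    retrieve_evidence: Callable[[str], list[tuple[str, str]]],
) -> Answer:
    """Send the question down the path that matches its intent."""
    if classify(question) is Intent.ABSTRACT:
        # Curated profiles answer comparisons; no code chunks are cited.
        return Answer(text=answer_from_profiles(question))
    # Concrete path: retrieve (source, body) chunks and cite their sources.
    chunks = retrieve_evidence(question)
    text = " ".join(body for _, body in chunks)  # placeholder for the composer
    return Answer(text=text, citations=[source for source, _ in chunks])
```

The point of the shape is that the abstract path never touches the code index, so a comparison question cannot pick up a stray citation from an unrelated repository.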
Decoupling framing from retrieval
Narrative rewrites now shape how the answer reads, but retrieval runs against the user's concrete wording. Framing and evidence use different queries because they serve different jobs.
Recall wide, then rerank narrow
I removed the fixed quota per repository, pooled and deduplicated broad retrieval results, and used Cohere Rerank 3.5 through Bedrock as the precision gate. The reranker's scores are low in absolute terms even for correct matches, so I set a light floor of 0.01 instead of a conventional high threshold. That drops clear noise without silently discarding good answers.
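The pool, dedupe, and rerank steps can be sketched as follows. This is an illustration under assumptions: chunks are dicts with hypothetical `id` and `text` keys, and `score_fn` is a stand-in for the reranker call, injected so the logic can be exercised without AWS. Only the 0.01 floor comes from the project itself.

```python
from typing import Callable, Iterable

RERANK_FLOOR = 0.01  # light floor; scores run low even for good matches


def pool_and_dedupe(result_sets: Iterable[list[dict]]) -> list[dict]:
    """Merge broad retrieval results, keeping the first copy of each chunk."""
    seen: set[str] = set()
    pooled: list[dict] = []
    for results in result_sets:
        for chunk in results:
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                pooled.append(chunk)
    return pooled


def rerank_narrow(
    query: str,
    candidates: list[dict],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
    floor: float = RERANK_FLOOR,
) -> list[dict]:
    """Score every candidate, drop those under the floor, keep the best top_k."""
    scored = [(score_fn(query, chunk["text"]), chunk) for chunk in candidates]
    kept = [pair for pair in scored if pair[0] >= floor]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:top_k]]
```

With no per-repository quota, a repository that holds all the relevant chunks can fill every slot, and one that holds none contributes nothing.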
Tightening the citation contract
The composer only cites a source when it genuinely supports the sentence, and a guardrail strips hallucinated or out-of-range markers before anything reaches the frontend.
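The guardrail half of that contract might look like the sketch below. It assumes citations appear as 1-based bracketed markers such as `[2]`, which is an assumption about the marker format, not something this write-up specifies. `strip_invalid_markers` is a hypothetical name.

```python
import re

# Assumed marker format: "[n]" with n a 1-based index into the source list.
MARKER = re.compile(r"\s*\[(\d+)\]")


def strip_invalid_markers(answer: str, source_count: int) -> str:
    """Remove citation markers that do not point at a retrieved source."""

    def keep_or_drop(match: re.Match) -> str:
        index = int(match.group(1))
        # Keep the marker (and its leading space) only when it is in range.
        return match.group(0) if 1 <= index <= source_count else ""

    return MARKER.sub(keep_or_drop, answer)


cleaned = strip_invalid_markers(
    "Routes OCR at the edge [1]. Falls back to cloud [7].", source_count=2
)
print(cleaned)  # Routes OCR at the edge [1]. Falls back to cloud.
```

A range check like this only catches markers that point nowhere. Whether an in-range source genuinely supports the sentence still depends on the composer.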
Decompose and conquer
Instead of embedding an 800-word job description into one muddy vector, the pipeline now intercepts fit-assessment intents and routes them to a dedicated module where a fast model extracts the real technical requirements first.
Parallel retrieval under hard limits
With the requirements isolated, the system executes retrieval for each requirement in parallel. Because API Gateway cuts the request off at 29 seconds, all branches share one hard deadline, and any branch that misses it degrades to no evidence instead of failing the full request.
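One way to sketch the shared deadline uses only the standard library. The function name `retrieve_all` and the `retrieve` callable are hypothetical, and the real pipeline may be structured differently. The sketch assumes Python 3.9 or later for `cancel_futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def retrieve_all(
    requirements: list[str],
    retrieve: Callable[[str], list[str]],
    budget_seconds: float,
) -> dict[str, list[str]]:
    """Run one retrieval per requirement; late or failed branches yield []."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(requirements)))
    futures = {pool.submit(retrieve, req): req for req in requirements}
    # One shared deadline for every branch, not a timeout per branch.
    done, _late = wait(futures, timeout=budget_seconds)
    # Do not block on stragglers; their results are ignored.
    pool.shutdown(wait=False, cancel_futures=True)

    evidence: dict[str, list[str]] = {req: [] for req in requirements}
    for future in done:
        if future.exception() is None:
            evidence[futures[future]] = future.result()
    return evidence
```

Inside a Lambda handler, the budget could be derived from `context.get_remaining_time_in_millis()` minus a safety margin for composing the answer. A branch that times out or raises leaves an empty list for its requirement, which the reasoning step can then report as a gap.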
Dual-model reasoning architecture
Amazon Nova Lite handles intent classification, query rewriting, and requirement extraction, while Claude Haiku 4.5 evaluates the evidence and judges direct matches, transferable skills, and gaps.
Treating the CV as evidence
Candidate profile and CV documents are now injected directly into the knowledge base, so the assistant can cite experience from those documents just as naturally as it cites repository code.
What It Looks Like Now
Ask how the computer vision system routes OCR between edge and cloud, and the assistant now answers against the right project and cites files from that exact repository.
Paste a long job description and ask whether I fit the role, and the assistant now extracts the real requirements, gathers evidence in parallel, and reasons about direct matches, transferable skills, and gaps.
None of this required complex external infrastructure. The reranker and the dual models are simply Bedrock API calls inside the existing Lambda function. The system stays serverless, cheap, and operationally light.
How It Maps To The Cert
This project turned several certification topics from textbook definitions into working intuition. Debugging grounded retrieval and parallel latency constraints teaches you significantly more than simply knowing that these concepts exist.
Retrieval and grounding
Grounding failed early on because I retrieved with the wrong query shape, not because grounding as a concept was missing.
Reranking in a RAG pipeline
Recall versus precision became a live tuning decision with a counterintuitive score distribution instead of a slide deck abstraction.
Bedrock building blocks
The system combines Knowledge Bases over S3 Vectors, the Rerank API, and multiple generation models inside one Lambda flow.
Least-privilege IAM
The reranker and cross-region inference profiles required their own specific IAM permissions, reinforcing the least-privilege posture that the certification stresses.
The biggest takeaway is broader than this project. RAG quality is not only a question of embeddings, models, and chunking. It is also about deciding whether the question in front of you requires a simple lookup or multi-step reasoning, and engineering the architecture to route it accordingly.
What's Next
The next work is less about adding features and more about refining the current retrieval and reasoning path under real usage.
Repository
Explore the full implementation, infrastructure scripts, backend, and widget source.