I rebuilt my CV as an AI assistant. Here's the architecture, the cost, and what it taught me.

How I replaced the static resume with Sensei — a conversational AI that answers questions about my work in real time.

Published: Apr 23, 2026

The problem with static CVs

Static CVs are broken. Recruiters skim them in seconds, and the candidate isn't there to answer follow-up questions. So I replaced mine with Sensei, a conversational AI grounded in my projects, skills, and decisions. This post walks through the architecture, the prompt engineering, the cost model, and the three things I'd do differently if I built it again.

A PDF resume is a one-way broadcast. It can't answer "what was your role specifically?" or "how did you handle scaling?" The recruiter either guesses, moves on, or schedules a call to ask what the document should have answered. I wanted something that could handle those follow-ups instantly — without me being online.

Architecture overview

Sensei is a serverless application running entirely on AWS. Here's the stack:

┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│  Browser     │────▶│ API Gateway  │────▶│  AWS Lambda      │
│  Chat Widget │     │ (REST)       │     │  (Node.js)       │
└──────────────┘     └──────────────┘     └────────┬─────────┘
                                                   │
                                    ┌──────────────┼──────────────┐
                                    ▼              ▼              ▼
                              ┌──────────┐  ┌──────────┐  ┌────────────┐
                              │ S3       │  │ DynamoDB │  │ Bedrock    │
                              │ RAG Store│  │ Sessions │  │ (Llama 3)  │
                              └──────────┘  └──────────┘  └────────────┘

The user types a question in the embedded chat widget. The request hits API Gateway, which invokes a Lambda function. Lambda does three things:

Retrieves context: A precomputed TF-IDF index over my project descriptions, skills, and career data sits in S3. The retriever scores the query against this index and pulls the top-k most relevant chunks (k = 3 in the current build).
Builds the prompt: The retrieved chunks are injected into a system prompt that constrains the model to answer only from the provided context, cite its sources, and stay on-topic.
Calls the LLM: AWS Bedrock (Llama 3) generates the response. DynamoDB tracks session state for multi-turn conversations.
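Here is a minimal TypeScript sketch of that request flow. It is not Sensei's actual code: the `Retriever`, `ModelClient`, and `SessionStore` interfaces, the `handleQuestion` function, the `MAX_TURNS` cap, and the in-memory session store are hypothetical stand-ins for S3, Bedrock, and DynamoDB, chosen so the flow runs without AWS credentials.

```typescript
// Hypothetical shapes for the pieces Lambda coordinates; not Sensei's real types.
interface Chunk { id: string; text: string; }
interface Turn { role: "user" | "assistant"; content: string; }
interface Retriever { topK(query: string, k: number): Chunk[]; }
interface ModelClient { complete(system: string, turns: Turn[]): Promise<string>; }
interface SessionStore {
  load(sessionId: string): Promise<Turn[]>;
  save(sessionId: string, turns: Turn[]): Promise<void>;
}

const MAX_TURNS = 10; // assumed cap on stored turns, to keep prompts bounded

async function handleQuestion(
  sessionId: string,
  question: string,
  deps: { retriever: Retriever; model: ModelClient; sessions: SessionStore },
): Promise<string> {
  // 1. Retrieve: score the question against the precomputed index.
  const chunks = deps.retriever.topK(question, 3);
  // 2. Build the prompt: inject the chunks, labeled so the model can cite them.
  const context = chunks.map((c) => `[${c.id}] ${c.text}`).join("\n");
  const system = `Answer only from the context below and cite chunk ids.\n${context}`;
  // 3. Call the LLM with earlier turns so follow-up questions keep their context.
  const history = await deps.sessions.load(sessionId);
  const turns: Turn[] = [...history, { role: "user", content: question }];
  const answer = await deps.model.complete(system, turns);
  // Store the exchange, keeping only the most recent turns.
  const updated: Turn[] = [...turns, { role: "assistant", content: answer }];
  await deps.sessions.save(sessionId, updated.slice(-MAX_TURNS));
  return answer;
}

// In-memory stand-in for the DynamoDB session table, for local runs only.
class MemorySessions implements SessionStore {
  private data = new Map<string, Turn[]>();
  async load(sessionId: string): Promise<Turn[]> {
    return this.data.get(sessionId) ?? [];
  }
  async save(sessionId: string, turns: Turn[]): Promise<void> {
    this.data.set(sessionId, turns);
  }
}
```

Keeping the AWS clients behind small interfaces like these is one way to unit-test the prompt and session logic locally; in production, the implementations would wrap the S3, Bedrock Runtime, and DynamoDB SDK clients.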

Prompt design

The system prompt is the most important part of this build. It does four things:

Persona constraint: "You are Sensei, an AI assistant that answers questions about Ousseini Oumarou's professional background." This prevents the model from wandering into general knowledge.
Source grounding: "Only answer using the provided context. If the context doesn't contain the answer, say so." This sharply reduces hallucinations about projects I never built.
Tone calibration: Concise, professional, factual. No marketing language, no superlatives.
Guardrails: Refuses personal questions, salary discussions, and off-topic queries. Redirects to LinkedIn or email for things the bot shouldn't handle.
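The four rules can be assembled into one system prompt. This sketch assumes each rule is a separate paragraph and that retrieved chunks are labeled with an id the model can cite. The persona and grounding sentences are quoted from above; the tone and guardrail wording and the `buildSystemPrompt` name are hypothetical.

```typescript
// Hypothetical prompt assembly covering the four concerns above.
interface Chunk { id: string; text: string; }

const PERSONA =
  "You are Sensei, an AI assistant that answers questions about Ousseini Oumarou's professional background.";
const GROUNDING =
  "Only answer using the provided context. If the context doesn't contain the answer, say so.";
// Assumed wording: the post describes these two rules but does not quote them.
const TONE = "Be concise, professional, and factual. Avoid marketing language and superlatives.";
const GUARDRAILS =
  "Decline personal questions, salary discussions, and off-topic requests, and point to LinkedIn or email instead.";

function buildSystemPrompt(chunks: Chunk[]): string {
  // Label each chunk so the model can cite it by id.
  const context = chunks.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return [
    PERSONA,
    GROUNDING,
    "Cite the id of every context chunk you rely on, in square brackets.",
    TONE,
    GUARDRAILS,
    "Context:",
    context.length > 0 ? context : "(no relevant context found)",
  ].join("\n\n");
}
```

Putting the rules before the context and the context last is a design choice, not a requirement; the point is that every rule lives in one place that can be versioned and compared.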

I spent more time tuning the system prompt than writing the Lambda function. The retriever picks the right context; the prompt decides what to do with it. Get the prompt wrong and you get a chatbot that confidently fabricates project details.

The retrieval strategy

I chose TF-IDF over vector embeddings for a specific reason: cost. Hosted embedding models charge per token, and for a corpus this small (roughly 15 documents covering projects, skills, and certifications), TF-IDF gives perfectly adequate retrieval quality at zero marginal cost.

The index is precomputed at build time and stored as a JSON file in S3. At query time, the Lambda function loads the index, scores the query, and returns the top 3 chunks. Total retrieval time: under 50ms.
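The post doesn't say which TF-IDF variant Sensei uses, so this is a minimal sketch assuming a common one: raw term counts, smoothed idf, and cosine similarity. `buildIndex` would run at build time and its output would be written to S3 as JSON; `search` is what the Lambda function would run per query. The tokenizer and all function names are hypothetical.

```typescript
// Minimal TF-IDF retriever (one common variant): raw term counts, smoothed idf, cosine similarity.
interface Doc { id: string; text: string; }
interface IndexedDoc { id: string; text: string; weights: Record<string, number>; norm: number; }
interface TfIdfIndex { idf: Record<string, number>; docs: IndexedDoc[]; }

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function termCounts(tokens: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  return counts;
}

// Own-property lookup, so a word like "constructor" never hits Object.prototype.
function own(obj: Record<string, number>, key: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

function norm(weights: Record<string, number>): number {
  return Math.sqrt(Object.values(weights).reduce((sum, w) => sum + w * w, 0));
}

// Build time: compute idf and one tf-idf vector per document; JSON.stringify the result for S3.
function buildIndex(docs: Doc[]): TfIdfIndex {
  const counts = docs.map((d) => termCounts(tokenize(d.text)));
  const df = new Map<string, number>();
  for (const c of counts) for (const term of c.keys()) df.set(term, (df.get(term) ?? 0) + 1);
  const idf: Record<string, number> = {};
  // Smoothed idf: log((1 + N) / (1 + df)) + 1, so no term gets a zero weight.
  for (const [term, n] of df) idf[term] = Math.log((1 + docs.length) / (1 + n)) + 1;
  const indexed = docs.map((d, i) => {
    const weights: Record<string, number> = {};
    for (const [term, tf] of counts[i]) weights[term] = tf * idf[term];
    return { id: d.id, text: d.text, weights, norm: norm(weights) };
  });
  return { idf, docs: indexed };
}

// Query time: cosine similarity between the query and every document, keep the k best.
function search(index: TfIdfIndex, query: string, k = 3): { id: string; score: number }[] {
  const q: Record<string, number> = {};
  for (const [term, tf] of termCounts(tokenize(query))) {
    const w = own(index.idf, term);
    if (w !== undefined) q[term] = tf * w; // terms unseen at build time are ignored
  }
  const qNorm = norm(q);
  if (qNorm === 0) return [];
  return index.docs
    .map((d) => {
      let dot = 0;
      for (const [term, w] of Object.entries(q)) dot += w * (own(d.weights, term) ?? 0);
      return { id: d.id, score: d.norm === 0 ? 0 : dot / (qNorm * d.norm) };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In the Lambda function, parsing the JSON once at module scope (outside the handler) lets warm invocations reuse the index instead of fetching it from S3 on every request.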

If the corpus grew to hundreds of documents, I'd switch to an embedding model (Titan Embeddings on Bedrock or a self-hosted model). But for a portfolio-sized knowledge base, TF-IDF is the right tool.

Cost model

This matters because "I built an AI chatbot" sounds expensive. It isn't.

Lambda: ~$0.20/month at current traffic (free tier covers most of it)
API Gateway: ~$0.10/month
S3: Negligible (a few KB of JSON)
DynamoDB: On-demand pricing, ~$0.05/month
Bedrock (Llama 3): ~$0.50-2.00/month depending on traffic. Input tokens are cheap; output tokens are the main variable.

Total monthly cost: under $3. That's less than a single cup of coffee in Dubai. The serverless architecture means zero cost when nobody's asking questions, and it scales automatically when traffic spikes (like after posting on LinkedIn).
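The "under $3" total is easy to check by summing the line items. This sketch does that with the figures above (S3 entered as $0), and adds a hypothetical `bedrockMonthlyCost` helper for the one item that moves with traffic. Its per-1,000-token prices are placeholders to be replaced with current Bedrock pricing, not real Llama 3 rates.

```typescript
// Monthly line items from the cost list above as [low, high] USD.
// S3 is entered as 0 because the post calls it negligible.
const lineItems: Record<string, [number, number]> = {
  lambda: [0.2, 0.2],
  apiGateway: [0.1, 0.1],
  s3: [0, 0],
  dynamodb: [0.05, 0.05],
  bedrockLlama3: [0.5, 2.0],
};

function totalRange(items: Record<string, [number, number]>): [number, number] {
  let low = 0;
  let high = 0;
  for (const [lo, hi] of Object.values(items)) {
    low += lo;
    high += hi;
  }
  // Round to whole cents so floating-point noise doesn't leak into the result.
  return [Math.round(low * 100) / 100, Math.round(high * 100) / 100];
}

// Hypothetical Bedrock estimate. Prices are per 1,000 tokens and must come from the
// current Bedrock pricing page; nothing here reflects actual Llama 3 rates.
function bedrockMonthlyCost(
  requestsPerMonth: number,
  inputTokensPerRequest: number,
  outputTokensPerRequest: number,
  usdPer1kInput: number,
  usdPer1kOutput: number,
): number {
  const perRequest =
    (inputTokensPerRequest / 1000) * usdPer1kInput +
    (outputTokensPerRequest / 1000) * usdPer1kOutput;
  return requestsPerMonth * perRequest;
}
```

With the listed figures, the items sum to roughly $0.85 to $2.35 a month, and Bedrock accounts for nearly all of the spread.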

Three things I'd do differently

1. Add streaming responses

Currently, Sensei waits for the full LLM response before sending it to the client. That produces a noticeable delay (2-4 seconds), which feels slow in a chat interface. Streaming via WebSocket or Server-Sent Events would let the first words appear almost immediately. API Gateway supports WebSocket APIs, and Bedrock offers streaming invocation (InvokeModelWithResponseStream), so this is straightforward to implement.
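For the Server-Sent Events option, the wire format itself is small. This sketch assumes the model's output arrives as an iterable of text chunks (for example, text extracted from a Bedrock streaming response) and uses a hypothetical `done` event to mark the end of an answer. It covers only the framing, not the transport or the API Gateway setup.

```typescript
// Format one Server-Sent Events frame: each line of data gets its own "data:" field,
// an optional "event:" field names the event type, and a blank line ends the event.
function toSseEvent(data: string, event?: string): string {
  const fields: string[] = event ? [`event: ${event}`] : [];
  for (const line of data.split(/\r\n|\r|\n/)) fields.push(`data: ${line}`);
  return fields.join("\n") + "\n\n";
}

// Turn streamed text chunks into SSE frames, then send a hypothetical "done" event
// so the chat widget knows the answer is complete.
function* streamAsSse(chunks: Iterable<string>): Generator<string> {
  for (const chunk of chunks) yield toSseEvent(chunk);
  yield toSseEvent("complete", "done");
}
```

On the client, the browser's `EventSource` would receive each chunk as a `message` event and the final frame as a `done` event.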

2. Track which questions go unanswered

When the retriever can't find relevant context, Sensei says "I don't have information about that." But I'm not logging these misses. If recruiters keep asking about a specific topic and getting no answer, that's a signal I need to add that content to the knowledge base. A simple CloudWatch metric on "no context found" responses would close this gap.
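One way to emit that metric is CloudWatch's Embedded Metric Format (EMF): a Lambda function prints a JSON record with an `_aws` metadata block, and CloudWatch extracts the metric from the log line. The `MIN_SCORE` threshold, the `Sensei` namespace, the `Service` dimension, and the function names are assumptions for illustration.

```typescript
// Assumed tuning knob: below this top retrieval score, treat the question as unanswered.
const MIN_SCORE = 0.1;

function isMiss(topScores: number[], minScore = MIN_SCORE): boolean {
  return topScores.length === 0 || Math.max(...topScores) < minScore;
}

// Build a CloudWatch Embedded Metric Format record. Printed to stdout from Lambda,
// it becomes a "NoContextFound" metric without a separate PutMetricData call.
function missMetricRecord(question: string, nowMs: number): string {
  return JSON.stringify({
    _aws: {
      Timestamp: nowMs,
      CloudWatchMetrics: [
        {
          Namespace: "Sensei", // hypothetical namespace
          Dimensions: [["Service"]],
          Metrics: [{ Name: "NoContextFound", Unit: "Count" }],
        },
      ],
    },
    Service: "sensei-chat", // hypothetical dimension value
    NoContextFound: 1,
    question, // plain log property, so the missed questions can be reviewed later
  });
}
```

In the handler, a line such as `if (isMiss(scores)) console.log(missMetricRecord(question, Date.now()));` would record each miss, and the logged `question` field shows which topics the knowledge base is missing.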

3. A/B test the system prompt

I iterated on the prompt manually, but I never ran structured experiments. Does a more conversational tone increase engagement? Does adding "ask me about X" suggestions at the end of responses drive follow-up questions? These are testable hypotheses that I should instrument.
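A prerequisite for those experiments is assigning each session to a prompt variant deterministically, so a conversation never switches prompts mid-way. This sketch hashes the session id with 32-bit FNV-1a (computed over UTF-16 code units rather than bytes) and splits traffic by a configurable share; the variant texts and function names are hypothetical.

```typescript
// Hypothetical prompt variants under test.
const VARIANTS = {
  control: "Be concise, professional, and factual.",
  conversational: "Be concise and friendly, and end with one suggested follow-up question.",
} as const;
type Variant = keyof typeof VARIANTS;

// 32-bit FNV-1a over UTF-16 code units: cheap, stable, and adequate for bucketing.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map the hash into [0, 1) and send `treatmentShare` of sessions to the new variant.
function assignVariant(sessionId: string, treatmentShare = 0.5): Variant {
  const bucket = fnv1a(sessionId) / 2 ** 32;
  return bucket < treatmentShare ? "conversational" : "control";
}
```

Logging the assigned variant next to each response, along with whether a follow-up question arrived, gives the data needed to compare the two.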

What this project demonstrates

Sensei is a small project, but it touches every layer of a production AI system: data preparation, retrieval, prompt engineering, LLM integration, cost management, and user experience. It's live on my homepage right now — try it by clicking the chat icon in the bottom-right corner.

If you're considering a similar assistant for customer support, an internal knowledge base, or onboarding, the architecture is the same. The hard part isn't the infrastructure; it's the prompt design and retrieval quality. Get those right and the rest follows.

Questions or want to discuss a similar build? Reach out: contact@ousseinioumarou.com · LinkedIn: @marubozu