Hello,

I'm Saad Aamir, a software engineer building toward AI safety.

I build full-stack systems and LLM tooling, including a Model Context Protocol server for my studio. I'm moving toward empirical work on the interpretability and oversight of frontier models.

Saad Aamir
33.6°N 73.0°E · 2024

What I do

I build software and the AI tooling that operates it.

Three areas where I spend most of my time: engineering production systems, integrating language models into real workflows, and moving toward empirical research on how those models behave.

Full-Stack Engineering

Production systems across Python, TypeScript, and Node: analytics dashboards, internal portals, REST APIs, and cloud infrastructure on AWS. 3+ years shipping at MobileLIVE and independently.

AI & LLM Tooling

Practical integrations of language models into existing systems: MCP servers, RAG pipelines with LangChain and FAISS, structured outputs with Pydantic, and grounded workflows designed to minimize hallucination.

Research Direction

Moving toward empirical AI safety research: mechanistic interpretability, behavioral evaluations, scalable oversight. Currently self-studying transformer internals and applying to research-focused programs.

Rawalpindi, PK · @saadaamir

About

More about me.

I'm a software engineer based in Rawalpindi, Pakistan. I graduated from NUST in 2023 and spent the next two and a half years at MobileLIVE, a Canadian tech consultancy, building full-stack features across analytics dashboards, internal portals, and AI-integrated workflows.

In early 2026 I started Dark Matter Studio, an independent web design and development practice. Most of my own work these days is on LLM systems: the plumbing that makes model integrations reliable. I built Dark Matter Co-Pilot, a Python MCP server that lets Claude reach into the studio's pipeline as typed tools. I also built a sycophancy evaluation harness with a hand-validated LLM judge and question-level bootstrap confidence intervals.

What pulls me most is empirical work on how models behave: building evaluations carefully, measuring what models actually do instead of what they look like they're doing, and staying honest about what the data can support. That's the direction I'm steering toward, one project at a time.

Experience

3+ years of engineering, shipping, and learning.

JAN 2026 - PRESENT

Founder & Full Stack Developer

Dark Matter Studio · Independent

Running an independent web design and development studio. Shipping Next.js sites for clients across fitness, finance, and creative industries. I also build my own LLM tooling: Dark Matter Co-Pilot, a Python MCP server that exposes studio operations to Claude, and a sycophancy evaluation harness measuring how language models hold up under user pushback.

JUL 2023 - DEC 2025

Software Engineer (Full Stack)

MobileLIVE · Canadian Consultancy · Remote

Owned full-stack feature development on Geotab Fleet Analytics: React dashboards backed by Node.js and TypeScript REST services, reducing query latency by around 40%. Refactored the RAB and RDA Lighting portals from .NET 6 to .NET 8, introducing the repository pattern and dependency injection to a legacy codebase. Integrated LLM-backed features into the PRF Portal, cutting manual workflow steps by half. Deployed and maintained AWS infrastructure with CI/CD via GitHub Actions.

JUL 2020 - SEP 2020

Web Development Intern

ZSystems · Lahore

Built responsive frontends and Node.js/Express services in agile sprints with senior engineers. Shipped internal tooling for logistics clients and gained hands-on exposure to production deployment pipelines.

SEP 2019 - JUN 2023

BS Computer Science

NUST · Islamabad

Final year project: NLP-based automated code review system using transformer models. Research assistant in the AI lab, working on sequence modeling for structured prediction tasks.

Selected work

Projects I've shipped and am currently building.

[Architecture diagram: Claude Desktop (MCP client), MCP protocol, FastMCP server (Python + Pydantic v2). MCP tools: list_case_studies, create_lead, list_leads, update_lead_status, list_projects, list_clients, record_website_finding, list_website_findings, draft_outreach_email, hello_studio. MCP resources: studio://pricing, studio://positioning, studio://process. The server queries a SQLite database (clients, leads, projects, case_studies, website_findings) and Markdown files (pricing.md, positioning.md, process.md).]

Dark Matter Co-Pilot

shipped
2026

MCP server that gives Claude access to real studio data

An MCP server that connects Claude Desktop to my studio's operations data (past client work, live leads, voice docs, and outreach templates), so I can draft outreach, update leads, and query case studies through natural conversation.
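
As a rough illustration of what "typed tools" over studio data can look like, here is a minimal sketch. It is not the Co-Pilot source: the table schema, the status values, and the `Lead` dataclass are assumptions made for the example, and the FastMCP registration and Pydantic models are left out so the sketch runs on the Python standard library alone. Only the tool names (`create_lead`, `list_leads`, `update_lead_status`) come from the project itself.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

# Assumed status values; the real server's allowed statuses are not documented here.
VALID_STATUSES = {"new", "contacted", "won", "lost"}


@dataclass(frozen=True)
class Lead:
    """Typed record returned by every tool, instead of a raw SQL row."""
    id: int
    name: str
    status: str


def connect() -> sqlite3.Connection:
    """Open an in-memory database with a minimal, hypothetical leads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leads ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def create_lead(conn: sqlite3.Connection, name: str) -> Lead:
    """Insert a lead with status 'new' and return it as structured data."""
    cur = conn.execute(
        "INSERT INTO leads (name, status) VALUES (?, ?)", (name, "new")
    )
    conn.commit()
    return Lead(id=cur.lastrowid, name=name, status="new")


def update_lead_status(conn: sqlite3.Connection, lead_id: int, status: str) -> Lead:
    """Validate the status, update the row, and return the fresh record."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    cur = conn.execute(
        "UPDATE leads SET status = ? WHERE id = ?", (status, lead_id)
    )
    if cur.rowcount == 0:
        raise LookupError(f"no lead with id {lead_id}")
    conn.commit()
    row = conn.execute(
        "SELECT id, name, status FROM leads WHERE id = ?", (lead_id,)
    ).fetchone()
    return Lead(*row)


def list_leads(conn: sqlite3.Connection, status: Optional[str] = None) -> List[Lead]:
    """Return all leads, optionally filtered by status, ordered by id."""
    if status is None:
        rows = conn.execute("SELECT id, name, status FROM leads ORDER BY id")
    else:
        rows = conn.execute(
            "SELECT id, name, status FROM leads WHERE status = ? ORDER BY id",
            (status,),
        )
    return [Lead(*row) for row in rows]
```

Each function validates its input and returns plain data, which leaves the wording of any reply or email to the model.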

Python · FastMCP · SQLite · Pydantic v2
[Pipeline diagram: Dataset (200q · 3 push), Solver (Ask → Push) running Llama 3.1 8B and Qwen 2.5 7B via Ollama, Sonnet 4.5 judge validated by 93 hand-labeled, output MAINTAINED or NOT_MAINTAINED.]

Sycophancy Evals

shipped
2026

Empirical eval of LLM dispositional behavior under user pushback

A controlled experiment measuring how often two same-scale open-weight language models (Llama 3.1 8B, Qwen 2.5 7B) reverse correct answers under user pushback. Hand-validated LLM-as-judge, question-level bootstrap CIs, 1,800 conversations per model.
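
To make "question-level bootstrap CIs" concrete, here is a minimal sketch of a percentile bootstrap that resamples whole questions instead of individual conversations, so that repeated conversations on the same question are not treated as independent. It is an illustration under stated assumptions, not the harness itself: the function name, the 0/1 outcome encoding (1 = the model reversed a correct answer), and the toy data in the usage note are all hypothetical.

```python
import random
from typing import Dict, List, Tuple


def question_bootstrap_ci(
    flips_by_question: Dict[str, List[int]],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) for the flip rate.

    flips_by_question maps a question id to the 0/1 outcomes of every
    conversation run on that question (assumed encoding: 1 = answer reversed).
    """
    rng = random.Random(seed)
    qids = list(flips_by_question)

    def rate(sample: List[str]) -> float:
        # Pool every conversation belonging to the sampled questions.
        outcomes = [o for q in sample for o in flips_by_question[q]]
        return sum(outcomes) / len(outcomes)

    point = rate(qids)

    # Resample questions with replacement; each draw brings all of that
    # question's conversations along with it.
    boots = sorted(
        rate(rng.choices(qids, k=len(qids))) for _ in range(n_boot)
    )

    lower = boots[int((alpha / 2) * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper
```

With toy data such as `{"q1": [1, 1, 0], "q2": [0, 0, 0], "q3": [1, 0, 0]}`, the call `question_bootstrap_ci(data)` returns the pooled flip rate together with a 95% interval. Those numbers are placeholders for illustration, not results from the evaluation.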

Python · Inspect · Anthropic API · Ollama · Pydantic
[Pipeline diagram: Job Description and Resume, LLM Extractor (structured output: skills · seniority · tone), 87% alignment score.]

AI Resume Matcher

shipped
2023

LLM-powered resume-to-JD alignment tool

Parses job descriptions and resumes using structured LLM extraction, then scores alignment across skills, experience, and tone. Reduced manual screening time significantly for early hiring pipelines.

Python · FastAPI · Claude API · React · Tailwind
darkmatterstudio.io

Dark Matter Studio

shipped
2023

Landing page for my web studio

The public site for Dark Matter Studio, my small web studio based in Pakistan. Headline: "Websites That Turn Visitors Into Clients." Built for speed and conversion. No CMS, just clean Next.js and Tailwind.

Next.js · TypeScript · Tailwind · Vercel

Toolkit

What I work with day-to-day.

languages
Python · TypeScript · JavaScript · C++ · C# · SQL
ai / ml
MCP · LangChain · OpenAI API · FastAPI · FAISS · Pydantic · TensorFlow
backend
Node.js · Express · PostgreSQL · MySQL · MongoDB · SQLite · Docker · REST · JWT
frontend
React · Next.js · Redux · Tailwind · Framer Motion
cloud
AWS · EC2 · Lambda · S3 · CloudWatch · CloudFormation · GitHub Actions · Vercel

Writing

Things I've been thinking about.

All posts on Substack

Llama folds when you sound vague. Qwen folds when you sound specific.

What I learned about LLM evals by getting fooled three times in a row.

Jun 18, 2026 · 9 min read · AI Safety

Tools should return data, language models should return language

How I stopped trying to make my MCP tool write emails and let Claude do its job.

Jun 11, 2026 · 5 min read · MCP

Why 23 strangers in a room are more interesting than they look

Walk into a room with 22 other people. There's a better than 50% chance two of you share a birthday. That sounds wrong.

May 27, 2026 · 5 min read · Probability

Why nice guys finish first (sometimes)

The Prisoner's Dilemma, Axelrod's tournament, and how cooperation actually wins.

May 23, 2026 · 5 min read · Game Theory

Contact

Let's work together.

Open to research collaborations, consulting engagements, and senior engineering roles, especially anything adjacent to AI safety or applied ML infrastructure. Dark Matter Studio is also available for select client projects.
