I'm Saad Aamir, a software engineer building toward AI safety.
I build full-stack systems and LLM tooling, including a Model Context Protocol server for my studio. I'm moving toward empirical work on the interpretability and oversight of frontier models.

What I do
I build software and the AI tooling that operates it.
Three areas where I spend most of my time: engineering production systems, integrating language models into real workflows, and moving toward empirical research on how those models behave.
Full-Stack Engineering
Production systems across Python, TypeScript, and Node: analytics dashboards, internal portals, REST APIs, and cloud infrastructure on AWS. 3+ years shipping at MobileLIVE and independently.
AI & LLM Tooling
Practical integrations of language models into existing systems: MCP servers, RAG pipelines with LangChain and FAISS, structured outputs with Pydantic, and grounded workflows that don't hallucinate.
Research Direction
Moving toward empirical AI safety research: mechanistic interpretability, behavioral evaluations, scalable oversight. Currently self-studying transformer internals and applying to research-focused programs.

About
More about me.
I'm a software engineer based in Rawalpindi, Pakistan. I graduated from NUST in 2023 and spent the next two and a half years at MobileLIVE, a Canadian tech consultancy, building full-stack features across analytics dashboards, internal portals, and AI-integrated workflows.
In early 2026 I started Dark Matter Studio, an independent web design and development practice. Most of my own work these days is on LLM systems, the plumbing that makes model integrations reliable. I built Dark Matter Co-Pilot, a Python MCP server that lets Claude reach into the studio's pipeline as typed tools. I also built a sycophancy evaluation harness with a hand-validated judge and proper statistics behind it.
What pulls me most is empirical work on how models behave: building evaluations carefully, measuring what models actually do instead of what they look like they're doing, and staying honest about what the data can support. That's the direction I'm steering toward, one project at a time.
Experience
3+ years of engineering, shipping, and learning.
JAN 2026→PRESENT
Founder & Full Stack Developer
Dark Matter Studio · Independent
Running an independent web design and development studio. Shipping Next.js sites for clients across fitness, finance, and creative industries. I also build my own LLM tooling: Dark Matter Co-Pilot, a Python MCP server that exposes studio operations to Claude, and a sycophancy evaluation harness measuring how language models hold up under user pushback.
JUL 2023→DEC 2025
Software Engineer (Full Stack)
MobileLIVE · Canadian Consultancy · Remote
Owned full-stack feature development on Geotab Fleet Analytics: React dashboards backed by Node.js and TypeScript REST services, reducing query latency by around 40%. Refactored the RAB and RDA Lighting portals from .NET 6 to .NET 8, introducing the repository pattern and dependency injection to a legacy codebase. Integrated LLM-backed features into the PRF Portal, cutting manual workflow steps by half. Deployed and maintained AWS infrastructure with CI/CD via GitHub Actions.
JUL 2020→SEP 2020
Web Development Intern
ZSystems · Lahore
Built responsive frontends and Node.js/Express services in agile sprints with senior engineers. Shipped internal tooling for logistics clients and gained hands-on exposure to production deployment pipelines.
SEP 2019→JUN 2023
BS Computer Science
NUST · Islamabad
Final year project: NLP-based automated code review system using transformer models. Research assistant in the AI lab, working on sequence modeling for structured prediction tasks.
Selected work
Projects I've shipped and am currently building.
Dark Matter Co-Pilot
Shipped · MCP server that gives Claude access to real studio data
An MCP server that connects Claude Desktop to my studio's operations data (past client work, live leads, voice docs, and outreach templates), so I can draft outreach, update leads, and query case studies through natural conversation.
Sycophancy Evals
Shipped · Empirical eval of LLM dispositional behavior under user pushback
A controlled experiment measuring how often two same-scale open-weight language models (Llama 3.1 8B, Qwen 2.5 7B) reverse correct answers under user pushback. Hand-validated LLM-as-judge, question-level bootstrap CIs, 1,800 conversations per model.
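For readers unfamiliar with question-level bootstrapping, here is a minimal sketch of the general technique, not the harness's actual code. It assumes each conversation has already been judged as 1 (the model reversed a correct answer) or 0 (it held), grouped by question. The function name, the data layout, and the toy numbers are all hypothetical. The point it illustrates: whole questions are resampled rather than individual conversations, because conversations that share a question are not independent.

```python
import random
from typing import Dict, List, Tuple


def question_level_bootstrap_ci(
    flips_by_question: Dict[str, List[int]],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for the flip rate, resampling whole questions.

    flips_by_question maps a (hypothetical) question id to the 0/1 judge
    outcomes of every conversation run on that question.
    """
    rng = random.Random(seed)
    question_ids = list(flips_by_question)

    def flip_rate(ids: List[str]) -> float:
        # Pool every conversation belonging to the sampled questions.
        outcomes = [o for q in ids for o in flips_by_question[q]]
        return sum(outcomes) / len(outcomes)

    point = flip_rate(question_ids)

    # Draw questions with replacement; each draw brings all its conversations.
    stats = sorted(
        flip_rate(rng.choices(question_ids, k=len(question_ids)))
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Toy data, invented purely for illustration.
toy = {"q1": [1, 0, 0], "q2": [0, 0, 0], "q3": [1, 1, 0], "q4": [0, 1, 0]}
point, lower, upper = question_level_bootstrap_ci(toy)
print(f"flip rate {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Resampling at the question level usually gives wider, more honest intervals than treating every conversation as an independent draw.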
AI Resume Matcher
Shipped · LLM-powered resume-to-JD alignment tool
Parses job descriptions and resumes using structured LLM extraction, then scores alignment across skills, experience, and tone. Reduced manual screening time significantly for early hiring pipelines.
Dark Matter Studio
Shipped · Landing page for my web studio
The public site for Dark Matter Studio, my small web studio based in Pakistan. Headline: "Websites That Turn Visitors Into Clients." Built for speed and conversion. No CMS, just clean Next.js and Tailwind.
Toolkit
What I work with day-to-day.
Writing
Things I've been thinking about.
All posts on Substack
Llama folds when you sound vague. Qwen folds when you sound specific.
What I learned about LLM evals by getting fooled three times in a row.
Tools should return data, Language models should return language
How I stopped trying to make my MCP tool write emails and let Claude do its job.
Why 23 strangers in a room are more interesting than they look
Walk into a room with 22 other people. There's a better than 50% chance two of you share a birthday. That sounds wrong.
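The arithmetic behind that claim is short enough to check directly. This is a minimal sketch, assuming 365 equally likely birthdays, no leap years, and independent people. The function name is illustrative and not taken from the post.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(shared_birthday_probability(23), 4))  # 0.5073
print(round(shared_birthday_probability(22), 4))  # 0.4757
```

The chance crosses 50% exactly at 23 people, because the number of possible pairs (253 at that size) grows much faster than the head count.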
Why nice guys finish first (sometimes)
The Prisoner's Dilemma, Axelrod's tournament, and how cooperation actually wins.
Contact
Let's work together.
Open to research collaborations, consulting engagements, and senior engineering roles, especially anything adjacent to AI safety or applied ML infrastructure. Dark Matter Studio is also available for select client projects.