Playbooks over heroics
Reliable systems aren't built by on-call heroes. They're built by teams with clear runbooks, defined escalation paths, and the discipline to follow them at 2am.
infrawei is where years of running infrastructure at scale meet your team.
I'm Martynas Sklizmantas. I've spent years scaling systems through rapid growth, building on-call culture from scratch, and turning operational firefighting into repeatable playbooks.
infrawei is the independent practice where I apply those lessons. I work with infrastructure teams that have outgrown ad-hoc ops but aren't ready (or don't want) to build a full SRE function in-house. The retainer model is built for deep, ongoing engagement over months and quarters.
My work sits at the intersection of systems reliability, cost engineering, and team effectiveness. Operational excellence is a discipline practiced over time. The best infrastructure work is invisible to everyone except the team that built it.
Strategic frames for every engagement.
Reliable systems aren't built by on-call heroes. They're built by teams with clear runbooks, defined escalation paths, and the discipline to follow them at 2am.
Automation without measurement is just faster chaos. Baselines come first, then SLOs, then the improvements you track against them. Code comes after.
The best SRE work changes how teams think as well as how systems behave. Operational excellence is owned across the team. The on-call engineer carries it on duty; the rest of the team carries it the other 167 hours.
Distributed Systems Architecture. Cloud Cost Engineering. CI/CD Pipeline Design. Infrastructure as Code. Kubernetes and Container Orchestration.
Incident Response and Post-Mortems. SLO/SLA Design and Tracking. On-Call Architecture. Observability and Monitoring. Chaos Engineering.
Built on-call rotations, incident response frameworks, and cost optimization practices that reduced cloud spend while scaling throughput across rapidly growing organizations.
Various technical leadership roles focused on building resilient infrastructure, designing distributed systems, and developing high-performing engineering teams.
My thinking is shaped by systems theory, complexity science, and engineering practice itself. I maintain an active reading list and regularly publish insights on operational excellence.
If the operational floor of your infrastructure is more of a question than an answer, the diagnostic call is the right place to begin.