Wildbit is the team behind Postmark, Beanstalk, DMARC Digests, and People-First Jobs. We’re self-funded, profitable, and have been working remotely since 2000. We believe that businesses are human, and our team is committed to proving you can grow a profitable company while prioritizing people first. We exist to support our team, and our values and products reflect the care and intention that we operate with. You would be joining a team of ~30 people, where you would have the opportunity to innovate on the technical foundation of all our products.
Watch this 2-minute video from Amy Chantasirivisal, our Director of Engineering, to hear more about the role and the company:
Being remote-first and working a 32-hour, 4-day week means that we need to be ruthless about prioritizing time and asynchronous communication. We're sure you have lots of questions about this role, so Amy (the hiring manager) is publicly answering them in this FAQ.
We are in the process of transitioning our software from co-located mixed-OS environments to cloud-native applications running exclusively on Linux. Our largest backend stack is primarily written in C#/.NET, and some of the technologies and platforms you will be interfacing with on a daily basis include: AWS (Lambda, RDS, ECS, etc.), Google Cloud Platform, MySQL, Elasticsearch, RabbitMQ, Benthos, Kafka, Grafana, Kibana, and more.
The Senior Site Reliability Engineer role is geared towards someone who thinks about engineering operations as the foundation for all software that is built at the company, agnostic of the product. The purpose of this role is to help accelerate and amplify the efforts of every engineer on the team through the effective introduction of tools, infrastructure, automation, and process. This role is pivotal in increasing the team’s velocity, building more confidence when we ship code, and ensuring we have controls for both proactive and reactive issue management.
An SRE’s role at Wildbit is to reduce the cognitive load of running software, so that engineers, in turn, can deliver on the promise to help our customers succeed. As such, the success of an SRE is measured by engineers’ ability to contextualize and operate their systems with as little friction as possible.
What this role isn’t: We believe all engineers need to be able to operate, monitor, and secure the code they ship. This role is not about maintaining or monitoring software written by other engineers, but rather about removing the barriers that keep engineers from performing those tasks themselves.
- Work with our architect to align systems and tools with the product and software needs
- Work with Product and Customer Success to educate and advocate for privacy and security improvements in our products
- Increase system resilience
- Own the Incident Response process and SLO metrics
- Plan long-term system capacity
- Define and drive consistency in the developer toolset across different tech stacks, product lines, and product maturity levels
- Mentor engineers on how to observe and support their software
Initial goals & projects
- Develop a vision, strategy, and roadmap for how engineers build, test, and deploy their code (taking into account the various products we build and support)
- Refine our vulnerability disclosure and remediation processes
- Implement tools that increase system observability
Wildbit engineers have a tremendous amount of autonomy over the technical direction of our products, but are held to a high standard for the reliability, maintainability, and usability of the code they write. With this autonomy, individual engineers are expected to take a big-picture view of our products and consider future directions to develop the best solution for today, while working within the constraints associated with growing successful products.
You will be a great fit for this role if your technical philosophy is informed by pragmatism, and driven by a desire to execute. You have strong opinions, loosely held, and approach your conversations with a desire to address the root cause of an issue.
This role is a blend of technical leadership and strategy, coupled with hands-on implementation.
Ideally, you have:
- A passion for cloud-based platforms, managed system infrastructure, and system automation tools
- Experience designing, building, and operating large systems with varying scalability, availability, and performance requirements
- Experience with various deployment architecture paradigms, such as zero downtime deploys, canary servers, etc.
- Experience with containers and container management/orchestration
- Experience building tools and workflows that support Twelve-Factor app principles