Stack Overflow - Site Reliability Engineer - Data Platform job opening

Stack Overflow

Site Reliability Engineer - Data Platform

Posted on 28 September, 2022

Job Details

🌍 Location

United States

⏳ Employment Type

Full Time

💼 Job

Site Reliability Engineer

💰 Salary

$149K - $201K

About Stack Overflow

Stack Overflow is a question and answer website for professional and enthusiast programmers.

The Position

We are targeting GMT +1 to GMT -8 in the countries identified, with core overlapping hours of 10am-1pm Eastern Standard Time.

At Stack Overflow, our mission is to serve developers. We build products that make millions of developers’ lives better every day. Our goal is to create a community and a company where every developer feels welcome to learn, share their knowledge, and build their careers.

Stack Overflow is growing fast, and our infrastructure needs keep growing as our products scale. Our Data Platform team needs an SRE dedicated to facilitating the build-out of our cloud-based Data Pipeline. The Data Pipeline will ingest data from many different sources and enable users across the enterprise to use this data to drive business value. We have deployed a basic pipeline in Azure, and we are gearing up to expand its scope: 20 different data sources, event-based triggers, CI/CD for infrastructure and data transformations, and CT/CD for machine learning models.

As an SRE dedicated to the Data Platform, you would be a liaison between the SRE and Data Platform teams, ensuring that the standards and practices from the SRE team are applied on the Data Platform and that the data-centric needs of the Data Platform are represented among the SREs.

We are looking for someone with experience managing data in an Azure environment. Our pipeline needs to be managed as code (Infrastructure as Code), and you should be comfortable deploying a variety of services. The list of services we use will grow as we do, but here is an idea of what we might leverage in the next couple of years: Azure Data Factory, Azure Databricks, Cognos, Azure Event Hub, Kafka, Power BI, Looker, and Fivetran.

What you’ll work on:

  • Partner with Data Engineers on the Data Platform team to understand data movement intent and help craft deployment practices.
  • Advocate for Azure cloud best practices and speak up when the Data Platform deviates from them.
  • Reduce toil through software solutions, removing or automating manual tasks, steps, and workflows as we further streamline deployments and upgrades.
  • Improve the observability of our systems to help identify issues or bottlenecks by iterating on our monitoring and alerting strategies.
  • Improve our security patching and compliance strategy for cloud solutions.
  • Participate in our on-call rotation.
  • Partner closely with your peers to accomplish goals within an agile software development lifecycle.

Our current ecosystem includes:

  • Microsoft Azure
  • Terraform, PowerShell, Go
  • Windows Server, IIS and .NET Core
  • Linux
  • SaaS: Azure Databricks and Snowflake
  • Our toolchain includes: GitHub, TeamCity (CI), CircleCI, Octopus Deploy, Elasticsearch, Redis, Argo
  • In the works: Containerization with Kubernetes

Skills & Requirements

If you don't meet all of these exact qualifications, we encourage you to apply anyway!

We’re looking for:

  • Experience writing software solutions in a high-level programming language (for example, but not limited to, Python, Golang, C#).
  • An understanding of software development lifecycle phases, from planning and development through production deployment and monitoring.
  • Willingness to learn new technologies and adapt to changing priorities.
  • Eagerness and ability to work with different types of functional groups, share knowledge, collaborate, and contribute. This is particularly important given our remote-first environment.
  • Demonstrated understanding of basic concepts in a cloud environment.

We like to see:

  • Experience with scripting languages (Bash, PowerShell).
  • Experience with Azure or an equivalent cloud platform (AWS, Google Cloud, etc.).
  • Experience with automating repetitive tasks.
  • SQL experience (Microsoft SQL Server or Azure SQL a plus).
  • Experience with Terraform or similar IaC tools.
  • Experience with containerization technologies (Docker, Swarm, Kubernetes).
  • An understanding of service level indicators and service level objectives.

What you’ll get in return:

  • Competitive Base Salary between $149K and $201K USD
  • 20 days paid vacation
  • Generous parental leave (16 weeks at 100% pay), family care leave, and paid sick days
  • Stock options
  • Completely free health insurance (no copay, no premiums)
  • 401K match
  • Gym membership reimbursement
  • Employees will never be poked with a sharp stick

If your role is not located in one of our offices… We’ll reimburse you up to $2,000 to set up a great home office.

If you want to work in our office… You’ll be in our headquarters in New York City, and enjoy additional benefits like free lunches, transportation reimbursement, and all the espresso you can drink.

Work Environment:

We’re a remote-friendly team. Whether you work remotely or work out of our New York office (re-opening voluntarily Fall 2021), you’ll be part of a remote work culture that emphasizes online communication (Slack, GitHub, Hangouts, Zoom, Stack Overflow for Teams).

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.
