Engineer II - Site Reliability Engineering - 52243

Description & Requirements

who we are

lululemon is an innovative performance apparel company for yoga, running, training, and other athletic pursuits. Setting the bar in technical fabrics and functional design, we create transformational products and experiences that support people in moving, growing, connecting, and being well. We owe our success to our innovative product, emphasis on stores, commitment to our people, and the incredible connections we make in every community we're in. As a company, we focus on creating positive change to build a healthier, thriving future. In particular, that includes creating an equitable, inclusive and growth-focused environment for our people.

about this team

Site Reliability Engineering

We are looking for a motivated Engineer to join the Foundations team, which is responsible for observability and monitoring in Site Reliability Engineering, guiding the digital organization to improve the practice of reliability here at lululemon. We are a consultative enablement team providing guidance and support to product engineering teams for the development of high-quality and resilient software systems through the use of monitoring tools and practices. SRE partners with many product engineering teams across digital and beyond to infuse the concepts and practices of reliability into engineering processes and deliverables. The Foundations team owns the management of our monitoring tools and the best practices for using those tools to provide total visibility into our systems. This role requires a vision and strategy for monitoring and how to manage it across a disparate organization.

As an SRE Engineer, you will be responsible for designing, implementing, and maintaining robust monitoring solutions, creating insightful dashboards, identifying relevant metrics, and driving efficient problem management practices. You will help digital teams identify observability maturity opportunities and roadblocks to success, and you will help clear those roadblocks. You will partner closely with Product Owners and Scrum Masters to manage scope and strike a balance between support and investment work. You are expected to clearly communicate risks to deliverables to your partners.

core responsibilities

As an Engineer II on the SRE Foundations team, you are a technical contributor and domain leader in observability and reliability. Your day-to-day responsibilities include:

Observability & Monitoring

Design, implement, and optimize observability solutions across metrics, logging, and tracing.
Build and maintain dashboards and alerts (e.g., Datadog) that provide meaningful insight into system health and performance.
Define and support adoption of Service Level Objectives (SLOs), Indicators (SLIs), and error budgets.

Incident & Problem Management

Participate in and lead incident response efforts during major outages and critical events.
Support on-call rotations, particularly during key business events (e.g., product launches, holiday traffic).
Conduct and contribute to Root Cause Analyses (RCAs) and post-incident reviews, driving follow-up actions and long-term remediation plans.
Collaborate with partner teams to enhance incident playbooks, reduce mean time to detect (MTTD) and resolve (MTTR), and improve operational readiness.
Apply principles of the ITIL framework in areas such as incident, problem, and change management, ensuring alignment with organizational reliability goals.

Team Collaboration & Enablement

Partner with digital product teams to integrate observability best practices into their development and deployment workflows.
Identify tooling and knowledge gaps; champion improvements and automation initiatives that reduce toil and increase visibility.
Support product owners and engineering leads with prioritization between support, investment, and innovation work.
Mentor junior team members and advocate for team-wide knowledge sharing and continuous improvement.

Continuous Improvement & Strategic Contribution

Stay up to date with SRE and observability trends, helping to evaluate and adopt new tools and approaches.
Contribute to domain-level standards and practices within the broader technology organization.
Influence reliability strategy by sharing insights, performance metrics, and “what’s working/what’s not” feedback with senior engineers and technical leadership.

qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
5–8 years of experience in software engineering or SRE, with deep exposure to observability and monitoring.
Strong experience with observability tools such as Datadog, Splunk, and distributed tracing frameworks.
Proven track record in incident management, RCA facilitation, and on-call response — especially during critical peak traffic events.
Understanding of ITIL concepts including Incident, Problem, and Change Management.
Experience building and maintaining dashboards, alerts, and SLOs/SLIs.
Strong debugging and root cause analysis skills across complex distributed systems.
Excellent collaboration, documentation, and communication skills.
Familiarity with infrastructure-as-code (e.g., Terraform), Kubernetes, and cloud-native systems.
Relevant certifications (e.g., Certified Kubernetes Administrator, Terraform Associate) are a plus.

Bonus

Deep expertise in observability tooling (Datadog, Splunk).
Prior experience in e-commerce or high-availability digital platforms.
Background in product ownership or leading reliability-focused initiatives.

must haves

Acknowledge the presence of choice in every moment and take personal responsibility for your life.
Possess an entrepreneurial spirit and continuously innovate to achieve great results.
Communicate with honesty and kindness and create the space for others to do the same.
Lead with courage, knowing the possibility of greatness is bigger than the fear of failure.
Foster connection by putting people first and building trusting relationships.
Integrate fun and joy as a way of being and working, aka don’t take yourself too seriously.

additional notes

Authorization to work in Canada is required for this role.

compensation and benefits package

lululemon’s compensation offerings are grounded in a pay-for-performance philosophy that recognizes exceptional individual and team performance. The typical hiring range for this position is $105,800 to $138,800 annually; the base pay offered is based on market location and may vary depending on job-related knowledge, skills, experience, and internal equity. As part of our total rewards offering, permanent employees in this position may be eligible for our competitive annual bonus program and equity offerings, subject to program eligibility requirements.

At lululemon, investing in our people is a top priority. We believe that when life works, work works. We strive to be the place where inclusive leaders come to develop and enable all to be well. Recognizing our teams for their performance and dedication, other components of our total rewards offerings include support of career development, wellbeing, and personal growth:

Extended health and dental benefits, and mental health plans
Paid time off
Savings and retirement plan matching
Generous employee discount
Fitness & yoga classes
Parenthood top-up
Extensive catalog of development course offerings
People networks, mentorship programs, and leadership series (to name a few)

Note: The incentive programs, benefits, and perks have certain eligibility requirements. The Company reserves the right to alter these incentive programs, benefits, and perks in whole or in part at any time without advance notice.

workplace arrangement

Hybrid

In-person collaboration and connection are important to our culture. Work is performed onsite, minimum 4 days per week.

