Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ's Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we're committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.
The Benefits of working at BJ's
• BJ's pays weekly
• Eligible for free BJ's Inner Circle and Supplemental membership(s)*
• Generous time off programs to support busy lifestyles*
    o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty
• Benefit plans for your changing needs*
    o Three medical plans**, Health Savings Account (HSA), two dental plans, vision plan, flexible spending
• 401(k) plan with company match (must be at least 18 years old)
*eligibility requirements vary by position
**medical plans vary by location
The Head of IT Operations & Service Excellence is the strategic and operational leader responsible for the uptime and resiliency of systems across BJ's digital and enterprise technology landscape (applications, infrastructure and security) to provide world-class experiences to our members and team members. The role sets the "north star" for what "good" looks like, defining and publishing service-level objectives and indicators (SLOs/SLIs) and operational key results, while building the organizational muscle to deliver them consistently. Reporting to the VP of Infrastructure & Operations, this leader balances real-time incident response with a multi-year service-reliability vision, enabling teams to see the forest through the trees and make data-driven trade-offs.
Key Responsibilities
Strategic Leadership
- Define and execute the multi-year IT Service Excellence maturity roadmap aligned to business objectives, cloud migration plans, and uptime and resiliency requirements.
- Craft a multi-year resiliency and cost-optimization roadmap aligned to company growth goals.
- Implement IT operations best practices.
- Collaborate with and influence product development teams to ensure reliability and scalability are considered at the design phase.
- Partner with Enterprise Architecture to define standards for building reliable applications that are highly available and resilient.
- Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services.
- Foster a high-trust, blameless culture that rewards learning, experimentation, and excellence.
- Own the IT Operations & Service Excellence budget; optimize OpEx through automation, self-service, and vendor management.
IT Operations & Incident Management (24×7 Command Center, NOC & Service Desk)
- Oversee real-time monitoring, incident triage, and major-incident management, ensuring MTTR and communications SLAs are met.
- Maintain a high-performing L1 Service Desk; drive call deflection via knowledge, AI chatbots, and self-service password reset.
- Publish operational metrics (MTTA, MTTR, FCR, abandon rate) with actionable insights.
- Lead the major incident management function, including defining escalation paths, coordinating cross-functional teams, and ensuring timely communication to stakeholders.
- Oversee the entire incident lifecycle, from identification and triage to resolution and post-incident analysis, ensuring efficient and effective processes are in place.
- Manage on-call rotations and ensure 24×7 coverage with major incident managers.
- Ensure a robust playbook is developed and followed during the major incident management (MIM) process, with clearly assigned roles, communication protocols and a well-defined triaging process.
- Matrix-manage people, processes and resources, including third parties, and resolve conflicts to move toward resolution.
Change & Release Governance
- Chair the Change Advisory Board (CAB); uphold 99%+ change success while accelerating deployment velocity.
- Implement risk-based change classification; ensure thorough end-to-end testing, automated pre-deployment checks, rollback processes, and post-implementation reviews.
Site Reliability Engineering (SRE) & Observability
- Develop and implement SRE policies, standards, and best practices for enterprise-wide systems.
- Lead SRE squads covering AWS, colocation data centers, network/edge, and SaaS platforms.
- Set error budgets, reliability targets, and chaos-engineering practices; ensure recovery time and point objectives (RTO/RPO) meet or exceed DR objectives and business expectations.
- Work with service managers overseeing SRE functions for Digital, Membership, Enterprise, and Club & Fuel systems to deliver integrated SRE.
- Drive end-to-end service design (service maps, dependency graphs, support models) to complement observability tooling.
- Lead the roadmap for logging, metrics, tracing, and AIOps platforms, delivering actionable insights and predictive alerting.
Engineering Excellence and Practices
- Understand the potential impact of system requirements and design choices across multiple cloud and on-premises technologies.
- Continuously enhance the reliability, stability, and performance of our key platforms by promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks.
- Advance problem detection and ensure service restoration processes are well defined.
- Apply Site Reliability Engineering methods, together with automated alerting and self-healing mechanisms, to improve both cloud-based and on-premises systems and strengthen the robustness and efficiency of our digital infrastructure.
Process Ownership & Continuous Improvement
- Codify SOPs and RACI matrices across Ops, SRE, Service Desk, and engineering partners to drive clarity of ownership.
- Lead Lean/Kaizen initiatives that reduce toil and amplify engineering productivity.
- Track and report OKRs; course-correct based on data.
- Drive root-cause analysis (RCA) and problem management; close systemic gaps and prevent recurrence of major incidents.
Compliance, Security & Risk
- Partner with Cybersecurity and Compliance teams to meet PCI-DSS, SOX, and data-privacy obligations.
- Ensure operational controls withstand internal and external audits.
People Development
- Lead by example, drawing on robust technical expertise, leadership qualities, and a proven track record in Site Reliability Engineering.
- Foster a culture of psychological safety, empowerment, and continuous learning.
- Coach and develop managers; build, mentor, and retain an organization spanning Service Desk, Command Center, SRE, Change Governance, Problem Management, and Analytics.
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or related discipline (Master's preferred).
- 15+ years of progressive IT Operations leadership, with 5+ years at a Director/Head level supporting large-scale retail and distributed environments.
- Proven track record of leading teams through complex system outages and scalability challenges.
- 5+ years of proven oversight of 24×7 operations (NOC, Service Desk) and SRE or DevOps functions.
- Proficiency in system design and architecture, particularly in a cloud environment.
- Demonstrated success operating hybrid cloud (AWS) and on-prem data-center environments.
- Expertise with ITIL v4/Service Management frameworks; ITIL certification strongly desired.
- Experience implementing observability, AIOps, and automation platforms (e.g., ServiceNow, OpsRamp, SolarWinds, New Relic, PagerDuty).
- Outstanding communication skills and executive presence; able to brief the C-suite on risk and performance.
Preferred Qualifications
- Retail industry experience managing store, fuel, and distribution center technologies.
- Certifications in ServiceNow.
- Lean Six Sigma or Continuous Improvement accreditation.
Leadership Competencies
- Strategic Thinking / "Forest-Through-the-Trees": Articulates long-term vision while executing tactically under pressure.
- Influence & Communication: Excellent verbal and written communication skills, with experience presenting to C-level executives and stakeholders. Translates technical concepts into business outcomes for executives and frontline associates.
- Servant Leadership: Builds inclusive teams and empowers others to experiment and learn.
- Accountability: Holds self and teams to high standards; measures what matters.
- Change Catalyst: Leads through ambiguity, driving adoption of new ways of working.
Work Environment & Travel
- Hybrid work model (Westborough, MA HQ) with periodic visits to colocation data centers, distribution centers, and club locations. After-hours or weekend availability required for major incidents or change windows. Occasional travel (<10%) to BJ's club locations and technology partners.
This is a hybrid role. Tuesday through Thursday are in-office days at BJ's Club Support Center in Marlborough, MA and Monday and Friday are remote days.
In accordance with Pay Transparency requirements, the following represents a good faith estimate of the compensation range for this position. At BJ's Wholesale Club, we carefully consider a wide range of non-discriminatory factors when determining salary. Actual salaries will vary depending on factors including but not limited to location, education, experience, and qualifications. The pay range for this position starts at $179,000.00.