Required Skills & Competences
Security @ 4, Grafana @ 4, Prometheus @ 4, Redis @ 4, Python @ 7, Java @ 4, NoSQL @ 4, RDBMS @ 4, Datadog @ 4, Leadership @ 6, API @ 4, Reporting @ 4, Splunk @ 4, Cassandra @ 4, Compliance @ 4
Details
For over 25 years, NVIDIA has been revolutionizing computer graphics, PC gaming, and accelerated computing. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
At NVIDIA, we are seeking a highly skilled Senior Engineer Operations Manager to join our world-class NGC Cloud team. In this role, you will help drive the efficiency, reliability, and scalability of the systems that power our global business operations. This is an exceptional opportunity to shape how we automate, streamline, and support critical operational workflows across the organization. You will define how we implement innovative automation and support solutions, enabling teams to operate seamlessly and deliver impact at global scale—all within an encouraging and inclusive environment.
Responsibilities
- Lead, mentor, and develop a team of 4-8 engineers, providing technical guidance, performance feedback, and career development opportunities
- Build and implement comprehensive monitoring, alerting, and reporting solutions using industry-standard tools
- Develop and maintain automation pipelines to streamline operational workflows and reduce manual overhead
- Coordinate incident, problem, and process adjustment procedures in alignment with ITSM guidelines
- Collaborate with cross-functional teams to identify operational pain points and implement solutions
- Build and maintain internal operational tools and frameworks that enhance team productivity
- Ensure alignment with security and compliance standards across all operational systems and processes
- Define key performance indicators and metrics to measure operational health and team performance
Requirements
- BS/MS in Computer Science or a related technical field, or equivalent experience, combined with 8+ years of overall hands-on experience building, supporting, and managing complex services and infrastructure
- Proven track record of 4+ years of leadership/management experience in a technical environment
- Strong proficiency in Python for automation, data handling, and tool development
- Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, CloudWatch, or Splunk
- Demonstrated expertise in ITSM practices, including incident management, problem management, and process improvement
- Ability to implement secure and compliant offboarding procedures and manage access-related tasks
- Strong understanding of IT operations, system workflows, and operational standards
- Core knowledge of Java, including Collections API, Streams API, Concurrency, and I/O
- Solid understanding of RDBMS and NoSQL databases, with hands-on experience in Cassandra, DynamoDB, or Redis
Ways to stand out
- Experience designing or implementing end-to-end automation pipelines and internal operational tools
- Prior experience in security-conscious or compliance-heavy environments (financial services, healthcare, SaaS, etc.)
- Expertise in creating comprehensive monitoring solutions, custom dashboards, and automated reporting mechanisms
- Track record of success in fast-paced, high-growth environments with constantly evolving operational needs
- Strong documentation habits and demonstrated commitment to continuous improvement and knowledge management
Compensation & Benefits
- Base salary range: 272,000 USD - 425,500 USD (base determined by location, experience, and pay of employees in similar positions)
- Eligible for equity and benefits
- NVIDIA offers a comprehensive benefits package; details at: www.nvidiabenefits.com/
Additional Information
- Applications for this job will be accepted at least until December 6, 2025
- NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment