Used Tools & Technologies
Not specified
Required Skills & Competences
Security, Grafana, Prometheus, Redis, Python, Java, NoSQL, RDBMS, CI/CD, Datadog, Communication, Dashboarding, API, Reporting, Customer Support, Splunk, Cassandra, Compliance
Details
At NVIDIA, we are seeking a highly skilled Senior Operations Engineer to join our world-class NGC Cloud team. In this role, you will help drive the efficiency, reliability, and scalability of the systems that power our global business operations. This is an exceptional opportunity to shape how we automate, streamline, and support critical operational workflows across the organization. You will define how we implement innovative automation and support solutions, enabling teams to operate seamlessly and deliver impact at global scale, all within an encouraging and inclusive environment.
Responsibilities
- Drive day-to-day interactions with NVIDIA-wide IT subsystems, ensuring smooth operational workflows across infrastructure and applications.
- Craft and maintain GitLab CI/CD pipelines to automate build, test, and deployment workflows.
- Monitor system health, build and maintain dashboards, create alerts, and produce operational reports.
- Perform user offboarding, access reviews, and compliance-related tasks across multiple systems.
- Ensure that API performance and integration stability across IT subsystems meet defined SLAs and SLOs.
- Coordinate changes and releases between engineering, operations, and security teams.
- Enforce security guidelines, manage vulnerability remediation, and collaborate with security teams on audits and assessments.
- Maintain documentation, SOPs, and process improvements to enhance operational maturity.
Requirements
- 8+ years of hands-on experience building/supporting complex services; BS/MS in Computer Science or equivalent experience.
- Knowledge of Python for automation, data handling, and tool development.
- Experience with monitoring tools such as Prometheus, Grafana, Datadog, CloudWatch, and Splunk, including reporting and dashboarding.
- Familiarity with ITSM practices, including incident, problem, and change management processes.
- Ability to perform secure and compliant offboarding and access-related tasks.
- Strong understanding of IT operations and system workflows.
- Knowledge of core Java (Collections API, Streams API, Concurrency, I/O).
- Knowledge of RDBMS and NoSQL databases (Cassandra, DynamoDB, Redis).
- Excellent communication skills and ability to collaborate across multiple teams.
- Strong documentation, problem-solving, and cross-team alignment skills.
Ways to stand out
- Experience designing or implementing automation pipelines or internal operational tools.
- Background in customer support, technical support, or customer-facing engineering roles.
- Prior work in a security-conscious or compliance-heavy environment.
- Ability to build end-to-end monitoring solutions, dashboards, and automated reporting.
- Strong documentation habits and a continuous-improvement approach.
Compensation & Other Details
- Base salary range: 184,000 USD - 287,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (see https://www.nvidia.com/en-us/benefits/).
- Applications accepted at least until December 6, 2025.
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. NVIDIA does not discriminate based on protected characteristics and values diversity in its employees.