Used Tools & Technologies
Machine Learning
GPU
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Go @ 4
Linux @ 4
DevOps @ 8
Python @ 4
Java @ 4
Hiring @ 4
Leadership @ 4
SRE @ 8
React @ 4
Jira @ 4
ServiceNow @ 4
API @ 4
LLM @ 4
AI @ 4
Agentic AI @ 4
RAG @ 4
Data Pipelines @ 4
Details
For over 25 years, NVIDIA has been at the forefront of transforming computer graphics, PC gaming, and accelerated computing, driven by a legacy of continuous innovation and exceptional talent. We are now leveraging the immense potential of AI to usher in the next era of computing, where our GPUs power the "brains" of computers, robots, and autonomous vehicles that can comprehend the world. This pioneering work demands vision, innovation, and the world's best talent. Join our diverse and supportive environment, where NVIDIANs are inspired to excel and make a profound global impact.
We're hiring a Senior Staff Software Engineer to own the engineering efforts across NVIDIA enterprise systems. You'll partner with IT leadership to transform reactive support into strategic, AI-infused, automated resolution systems that prevent problems before they occur, balancing speed, security, and an exceptional user experience for NVIDIANs.
Responsibilities
- Design and implement agentic AI workflows using LLM-based agents, tool calling, RAG patterns, and orchestration frameworks. Push the boundaries of what AI-assisted operations can achieve.
- Build robust integrations and automation pipelines across ServiceNow, identity management, monitoring platforms, and enterprise SaaS. Own the full stack from infrastructure to user-facing tools.
- Triage and resolve enterprise issues with a focus on automation and on improving mitigation and resolution times.
- Manage and troubleshoot enterprise-scale collaboration, productivity, AI, and infrastructure systems.
- Trace and root-cause complex, multi-system failures. Identify patterns in recurring tickets and build automation or self-service solutions.
- Build and maintain runbooks, troubleshooting guides, and knowledge base articles that elevate team capabilities.
- Mentor team members on troubleshooting methodology and systems thinking.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, IT, or related field (or equivalent experience).
- 12+ years of overall experience in SRE, enterprise support, or DevOps.
- Experience with SaaS, hybrid cloud, and AI/ML environments.
- Experience building production-grade agentic workflows (e.g., multi-agent systems and MCP servers).
- Software engineering fundamentals with deep experience in building products and operating large-scale systems.
- Expertise in two or more backend languages such as Go, Python, or Java with a track record of owning complex production systems.
- Full-stack engineering experience, including building user-facing web applications and operational dashboards using modern frontend frameworks such as React.js, along with backend APIs and data pipelines.
- Systems thinker who naturally traces dependencies, considers second-order effects, and asks "why did this break?" not just "how do I fix it?".
- Strong incident management skills: triage, root-cause analysis, blameless postmortems, pattern recognition.
- Expert troubleshooting across the enterprise hybrid stack, including Jira, Microsoft, operating systems (Apple, Linux, and Windows), and infrastructure systems such as compute, AI, and storage.
Compensation & Additional Information
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.
You will also be eligible for equity and benefits. Applications for this job will be accepted at least until April 21, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.