Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Go @ 4
Grafana @ 4
Prometheus @ 4
CI/CD @ 4
Distributed Systems @ 6
gRPC @ 4
IaaS @ 4
Rust @ 4
Microservices @ 4
API @ 4
OpenTelemetry @ 4
AI @ 4
Details
NVIDIA is seeking a Senior Systems Software Engineer to join an advanced infrastructure software team to design, develop, and maintain high-performance, rack-scale management solutions for datacenter environments. The role focuses on systems-level development primarily in Rust, Go, and C++, bridging hardware, firmware, and cloud-native services.
Responsibilities
- Architect, implement, and maintain core components of an internally developed IaaS (Infrastructure-as-a-Service) product and related microservices primarily in Rust, C++, or Go.
- Develop and automate workflows for device discovery, firmware updates, and health monitoring using protocols such as Redfish and other BMC interfaces.
- Build and extend distributed microservices and gRPC APIs for rack management, supporting multi-rack, multi-tenant, and multi-site deployments.
- Implement telemetry collection, aggregation, and analysis pipelines using Prometheus, OpenTelemetry, and Grafana; contribute to Health-as-a-Service initiatives.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
- 5+ years of experience in systems software engineering with a focus on distributed systems, software/firmware development, or infrastructure automation.
- Strong hands-on experience with Rust, Go, and C++ for systems-level development.
- Datacenter or computer architecture experience — understanding server, rack, and network topologies, and hardware/firmware/software interactions.
- Experience with hardware management protocols (Redfish, IPMI, BMC) and firmware update automation.
Ways to stand out
- Experience with rack-scale or data center management platforms, test automation, simulation/mocking frameworks, and CI/CD pipelines.
- Knowledge of hardware validation, health monitoring, and diagnostics (DCGM, nvbandwidth, Field Diag).
- Contributions to open-source infrastructure or systems software projects.
Compensation & Benefits
- Base salary ranges (location, experience, and level dependent):
  - Level 3: 152,000 USD - 241,500 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and additional benefits (link to NVIDIA benefits referenced in original posting).
Additional information
- Applications accepted at least until March 22, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and committed to diversity and inclusion.