System Software Engineer, Distributed Systems

at Nvidia

📍 Santa Clara, United States

USD 152,000-287,500 per year

MIDDLE

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Go @ 3, Kubernetes @ 3, Linux @ 3, Python @ 5, R @ 3, Distributed Systems @ 3, Perl @ 3, Debugging @ 6, LLM @ 6, Observability @ 3, AI @ 3

Details

NVIDIA's VLSI Productivity and Infrastructure team builds tools and platforms that support 1000+ chip design engineers. The team focuses on long shelf-life systems spanning build automation, observability, analytics, automated error detection/remediation, and codebase modernization. The core workflow infrastructure runs as userspace software on bare-metal Linux hosts (no sudo, no containers), coordinates shared state and artifacts via NFS, and launches long-running, compute-heavy workflows on IBM LSF. This role is a pragmatic, generalist systems engineering position with emphasis on distributed systems and operational excellence in a "below containers" world: coordination, reliability, performance, and safe evolution of legacy systems (including incremental modernization into Go).

Responsibilities

Design, build, and deliver core components of next-generation productivity platforms
Develop reliable userspace infrastructure for long-running engineering workflows at scale on bare-metal Linux hosts
Build state coordination over NFS without privileged operations (atomicity, idempotency/deduplication, partial-write recovery)
Build and improve orchestration around IBM LSF (submission/tracking, retries/cancellation, log capture, fairness/backpressure)
Convert legacy codebases into modern code (e.g., incremental migration from Perl to Go) with stage gates and parity strategies
Debug and improve performance and reliability across Linux and Kubernetes, including operational tooling
Collaborate with engineering users to turn ambiguous workflows into durable production systems

Requirements

B.S. in Computer Science/Electrical Engineering or equivalent experience
5+ years developing and operating production software in Go and/or Python, ideally in large codebases
Strong Linux fundamentals: processes, filesystems, permissions, synchronization/locks, concurrency, and debugging
Solid distributed-systems thinking: failures, retries/timeouts, backoff, idempotency, and operational rigor
Experience building long-running automation or services on shared compute clusters (batch schedulers, build systems)
Ability to translate high-level goals into safe delivery plans (instrumentation, staged rollout, measurable outcomes)

Ways to stand out

Hands-on experience with shared filesystems at scale (NFS) or coordination patterns on eventually-consistent storage
Experience with batch job scheduling, shared compute fleets, or build systems
Track record of incremental modernization (tests, shadow runs, canaries, rollback plans)
Experience partitioning/optimizing metadata-heavy systems and reducing I/O or read/write hot spots
Strong incident/debug tactics: root-cause analysis, remediation, guardrails, and rapid comprehension/ownership of unfamiliar codebases (including LLM-generated code)

Compensation & Benefits

Base salary ranges (determined by location, experience, and comparable roles):
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
Eligible for equity and benefits (link provided in original posting)

Other details

Applications for this job will be accepted at least until February 19, 2026.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer committed to diversity and non-discrimination.