Senior Software Engineer - AI Inference

at Bloomberg

📍 New York City, United States

USD 160,000-240,000 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 4, Distributed Systems @ 4, Machine Learning @ 4, Performance Optimization @ 4, Debugging @ 4, PyTorch @ 3, CUDA @ 3, GPU @ 3, Observability @ 4, AI @ 4, InfiniBand @ 4, vLLM @ 4, NCCL @ 3, TensorRT @ 4, NVLink @ 4

Details

Join the team building the core infrastructure for AI at Bloomberg. The Bloomberg AI Inference Platform provides production-grade managed infrastructure for hosting, deploying, and serving all machine learning models, from predictive models to cutting-edge generative ones. The platform abstracts away infrastructure complexity so that engineering teams can focus on creating intelligent applications with guaranteed scalability, performance, and governance. The platform is built on the open-source KServe project, and the CNCS AI Inference team is a primary contributor to its development.

Responsibilities

Design and build scalable infrastructure for both online and offline inference workloads.
Lead integration of high-performance inference runtimes and serving frameworks, including TensorRT, vLLM, ONNX, and Triton.
Drive architecture and technical decisions across Bloomberg’s inference platform, balancing latency, throughput, reliability, and cost.
Partner across engineering teams to improve model deployment, observability, and production performance.
Mentor junior engineers on system design, debugging, and performance optimization.

Requirements

Required

5+ years of professional software engineering experience.
Experience designing, building, and operating production distributed systems.
Strong systems intuition and a track record of debugging and optimizing performance-critical services.
Ability to own problems end-to-end and quickly ramp up in unfamiliar technical areas.
4+ years of demonstrated experience working with an object-oriented programming language.
A degree in Computer Science, Electrical Engineering, or equivalent practical experience.

Preferred / Nice to have

Experience deploying and operating machine learning systems at scale.
Experience with inference optimization techniques such as batching, caching, request scheduling, or memory-aware serving.
Familiarity with PyTorch and GPU software stacks such as CUDA and NCCL.
Exposure to high-performance interconnects and distributed computing technologies such as NVLink, InfiniBand, or MPI.
Experience with Kubernetes and cloud-native infrastructure.
Experience with load balancing, request routing, or traffic management systems.

Representative projects

Autoscaling a heterogeneous compute fleet to match supply and demand across diverse inference workloads.
Building production-grade deployment pipelines to safely roll out new models to millions of users.
Developing new inference capabilities such as structured sampling, prompt caching, and advanced serving optimizations.
Analyzing observability data from real production workloads to improve latency, throughput, and resource efficiency.

Compensation & Benefits

Salary range: USD 160,000-240,000 per year, plus benefits and bonus.
Benefits may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, and vision coverage, short- and long-term disability benefits, 401(k) with match, life insurance, and various wellness programs. (The company does not provide benefits directly to contingent workers, contractors, or interns.)