Senior Software Engineer - Core Communications Reliability

at Bloomberg

📍 New York City, United States

USD 160,000-240,000 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels are defined as follows:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: expert. Able to teach others and familiar with the technology's pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of, and adeptness in, all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Grafana @ 4
Kafka @ 3
Python @ 6
Spark @ 4
Java @ 3
Distributed Systems @ 4
Communication @ 7
SRE @ 4
Observability @ 7

Details

Bloomberg’s Core Communications platforms power real-time messaging across the global financial industry (e.g., Instant Bloomberg and MSG), handling billions of messages daily. This role focuses on improving the reliability of these large-scale, low-latency distributed systems by building tools, improving system design, and ensuring predictable behavior under load and failure.

Responsibilities

Work on large-scale distributed systems with high availability and low latency requirements.
Build tools and automation to improve how distributed systems are operated and debugged.
Define and implement service level objectives (SLOs) that reflect real user impact.
Identify and continuously assess reliability risks across services, infrastructure, and workflows; help teams prioritize work based on real impact.
Improve development and deployment workflows to drive more consistent and reliable paths to production.
Reduce time to recovery and triage effort by improving diagnostics, alerting, and system-level visibility.
Design and validate failure scenarios and resilience testing practices to ensure predictable behavior under stress.
Collaborate closely with software engineers and product teams to influence how systems are designed, built, and operated.

Requirements

4+ years of experience in software engineering.
Proficiency in Python.
Experience working with distributed systems.
Strong understanding of system reliability, observability, and performance.
Familiarity with SLOs, SLIs, and SLAs, and with how to relate system performance back to client impact.
Strong collaboration and communication skills.
A degree in Computer Science, Engineering, or equivalent practical experience.

Preferred Qualifications

Experience with monitoring or tracing tools such as Grafana, Humio, or distributed tracing systems.
Familiarity with Kafka, Java, or large-scale data systems.
Experience with chaos engineering, failure injection, or resilience testing frameworks.
Exposure to capacity planning and scaling analysis.
Contributions to open source or involvement in SRE communities.
Experience with big data technologies like Apache Spark and Amazon S3.

Why this role

Work on systems operating at very high scale (billions of messages processed daily).
Tackle complex distributed systems challenges involving latency, consistency, and failure handling.
Build tooling and frameworks used across multiple teams.
Have a direct impact on systems relied upon by the global financial industry.

Benefits

The company offers a comprehensive benefits plan that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, and vision coverage, short- and long-term disability, 401(k) with match, life insurance, and wellness programs. Note: benefits are not provided directly to contingent workers/contractors or interns.

Compensation

Salary range: USD 160,000-240,000 annually, plus benefits and bonus.