Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others, and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Grafana @ 4
TypeScript @ 6
Automated Testing @ 4
Python @ 6
GitHub @ 4
CI/CD @ 4
Networking @ 7
Jira @ 4
API @ 4
Reporting @ 4
Splunk @ 4
.NET @ 4
Details
We are the Disaster Recovery as a Service engineering team, responsible for end-to-end testing of Bloomberg's datacenters for disaster recovery scenarios across many services. The team builds and maintains tools, monitors, frameworks, interfaces, protocols, and solutions that form automated and self-healing systems used to manage services provided by Platform Services. Work includes improving uptime, provisioning and balancing resources, defining operational procedures, administering backup and recovery processes, managing replication, and overseeing workflows.
Responsibilities
- Design, develop, test, and maintain in-house tooling to automate disaster-recovery testing for clusters and managed services across datacenters and nodesites.
- Implement scalable, reliable, and self-driven systems with metrics and transparency for internal and external consumers.
- Own product/system lifecycle: system tuning, performance analysis, availability targets (SLAs/SLOs/SLIs), and operations procedures.
- Integrate services into the Bloomberg operational environment and products, collaborating with teams that design Bloomberg-specific components, APIs, and runtime environments.
- Build monitors, alarms, and dashboards to report on system performance, status, and stability.
- Ensure tooling is covered by unit and end-to-end tests and by continuous integration to provide high stability.
Requirements
- 4+ years of experience in Python and/or TypeScript.
- Degree in Computer Science, Engineering, or a similar field, or equivalent work experience.
- 5+ years of experience with Unix, Unix tools, and shell scripting.
- Experience designing stable, long-lasting APIs.
- Deep understanding of TCP/IP networking and the OSI model.
- Experience designing and automating repeatable processes in a client/server environment.
- Ability to build and maintain highly available, performant, scalable, and critical systems.
- Experience building monitors and alarms for system performance and stability.
- Experience with CI/CD systems and writing robust unit and system tests.
Nice to Have
- Basic knowledge of the Rapid framework.
- Experience analyzing existing systems and driving measurable improvements.
- Experience with Chaos Engineering.
- Experience with Splunk or Humio, and Grafana or other metric/reporting tools.
- Experience with GitHub and Jira.
- Passion for product ownership, stability, and security.
Benefits
- Salary Range: 160,000 - 240,000 USD annually, plus benefits and bonus.
- Comprehensive benefits may include merit increases, incentive compensation (exempt roles), paid holidays, paid time off, medical, dental, vision, short and long term disability, 401(k) with match, life insurance, and various wellness programs. (Benefits may vary for contingent workers/contractors and interns.)
Additional Information
- Team focus: disaster recovery tooling, automated testing, datacenter operations, replication and backup, and operational transparency.
- Apply: https://bloomberg.avature.net/careers/Login?jobId=13664&source=&tags=&user=&formValues=