Site Reliability Engineer

July 23


Syndica

Solana • Infrastructure • RPC infrastructure • Blockchain

2 - 10

Description

• Administer overall site availability, security, latency, and system health.
• Provision, install, configure, operate, and maintain services, system software, and related infrastructure.
• Develop comprehensive monitoring solutions that provide full visibility into the different system components, using tools like Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic, etc.
• Enable the development team to release code quickly and reliably by ensuring full observability of systems and automated detection of performance and integration issues.
• Formulate technical performance measures and implement them using queries, logs, code instrumentation, and other analytics tools.
• Design dashboards and visualizations that effectively convey technical measures.
• Troubleshoot issues at multiple layers of the deployment, from hardware to operating environment, network, and application; conduct root cause analysis and make recommendations from your findings.
• Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start.
• Forecast changes in demand and capacity to establish appropriate scalability plans and drive decisions on the right-sizing of servers, storage, and other resources.
• Design and perform high-throughput stress testing to determine system capacity limits and identify points of failure.
• Troubleshoot critical customer issues related to Syndica’s RPC, APIs, and App Deployments.

Requirements

• Great collaborator with 5+ years of experience in a DevOps or SRE role
• Proficiency in scripting languages (Python, Shell) and experience with at least one modern programming language (Go, Rust, TypeScript, etc.)
• Experience deploying large-scale systems reliably
• Experience using Kubernetes
• Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
• Working knowledge of information security issues
• Experience writing automation tools and eagerness to 'automate all the things'
• Commitment to implementing reliability and security best practices
• Capacity planning experience, including resource optimization and load testing
• Systematic problem-solving approach, combined with a strong sense of ownership and drive
• Experience with Prometheus/Grafana for metrics aggregation and visualization, and with other monitoring and alerting tools
• Experience with infrastructure-as-code tools such as Terraform, Ansible, Chef
• Experience building and managing virtualized systems (KVM, OVM, containers/Docker) and the ability to read and understand source code
• Knowledge of one or more load testing tools (k6, Locust, JMeter, etc.)
• Experience configuring CI/CD pipelines
