Seeking a US Military Veteran, National Guardsman, Reservist, and/or Military Spouse to work as a Site Reliability Engineer. Take over as the Cassandra expert within a team that reviews, optimizes, and improves documentation and processes. Ensure optimal performance and uptime of our portal services and infrastructure.
Responsibilities:
- Ensure optimal end-to-end performance and uptime of our portal services and infrastructure.
- Manage the existing Cassandra database environment, guide operational upgrades and advancements in the Cassandra system, extend documentation, and train team members in Cassandra administration.
- Handle Cassandra administration, operations, and architecture for a multi-data-center environment.
- Implement proactive monitoring, alerting, trend analysis, and self-healing systems.
- Participate in incident resolution processes, driving restoration and repair of service-impacting issues.
- Instrument existing code and/or write dedicated performance applications to enable fine-grained tracking of speed bottlenecks.
- Report graphically, in near real time, on Luna Control Center performance as perceived by our customers.
- Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
- Train a few team members so they can take over long-term support and retain the knowledge.
Requirements:
- US Veteran, National Guardsman, Reservist, or Military Spouse
- Bachelor's degree in Computer Science or equivalent.
- 3 years of Cassandra administration experience in a multi-data-center architecture
- 3+ years as an SRE, or in operations or system administration, supporting customer-facing, high-availability, large-scale web-based applications
- Expertise in administering and supporting Apache Cassandra database systems required
- Fluency in systems programming and/or automation, with the ability to apply that experience to solve complex problems associated with running production environments at massive scale in multi-tenant environments.
- Prior successful experience as a systems performance or site reliability engineer
- Mastery of Linux/Unix and of PHP, Perl, or Python programming.
- Administrative experience installing, configuring, troubleshooting, monitoring, and maintaining Linux infrastructure.
- Experience writing SQL and PL/SQL procedures
- Experience with orchestration tools such as Ansible
- Experience with at least one log analysis tool, such as Splunk or the ELK stack (Elasticsearch, Logstash, Kibana)
- Experience with monitoring tools like Sensu, Collectd, and Grafana
- Desire to work in a fast-paced, dynamic environment and a passion for performance excellence