Senior GPU Cluster Developer

1 month ago


Shanghai, Shanghai, China NVIDIA Full time

We are seeking a highly skilled Senior GPU Cluster Software Engineer to join our team at NVIDIA. This is a unique opportunity to work on large-scale distributed systems infrastructure with monitoring, logging, visualization, and alerting capabilities.

About the Role

As a key member of our System Software team, you will be responsible for building profiling solutions for real-world applications running on GPU compute clusters. Your primary goal will be to improve the user experience for customers and engineers supporting the cluster.

Responsibilities
  • Work in an agile and fast-paced global environment to gather requirements, architect, design, implement, test, deploy, release, and support large-scale distributed systems infrastructure with promised uptime.
  • Build internal profiling tools for real-world ML/DL applications running on HPC GPU clusters for failure and efficiency analysis.
  • Keep up with state-of-the-art developments in the ML/DL domain and work with application owners and research teams to address profiling needs for current and potential future features.

Requirements

To succeed in this role, you will need:

  • Bachelor's degree in Computer Science or related field (or equivalent experience) and 5+ years of software development experience in Python.
  • Experience with GitLab (or another source code management tool), including branch and release workflows and CI/CD pipelines.
  • Solid understanding of algorithms, data structures, and runtime/space complexity.
  • Experience working with distributed systems software architecture.
  • Basic understanding of HPC GPU clusters and Slurm.
  • Basic understanding of machine learning concepts and terminology.
  • Background in SQL and NoSQL databases (Prometheus, Elasticsearch, OpenSearch, Redis, etc.).
  • Experience with distributed data pipelines, telemetry, visualization (Kibana, Grafana, etc.), and alerting (PagerDuty, etc.).

Estimated Salary

The estimated annual salary for this role is around $150,000 - $200,000, depending on your level of experience and qualifications.



  • Shanghai, Shanghai, China NVIDIA Full time

    As a member of the System Software team at NVIDIA, you will be responsible for building and optimizing large-scale distributed systems infrastructure with monitoring, logging, visualization, and alerting capabilities. Your focus will be on creating profiling solutions for real-world applications running on GPU compute clusters to improve efficiency and user...


  • Shanghai, Shanghai, China Optiver Full time

    About the Role: Optiver is a global market maker with a presence in multiple continents, and our Shanghai office is a rapidly growing participant in the Chinese markets. We are seeking a highly skilled Senior Machine Learning Platform Engineer to join our team and help shape the future of our company. Key Responsibilities: Design and develop the infrastructure...


  • Shanghai, Shanghai, China Bosch Full time

    Job Overview: We are seeking a highly skilled Software Development Engineer to join our team at Bosch, focusing on the development of automotive instrument clusters. This role offers an exciting opportunity to work on cutting-edge technologies and collaborate with cross-functional teams. Salary: The estimated annual salary for this position is $120,000 -...


  • Shanghai, Shanghai, China Bosch Group Full time

    We are seeking a highly skilled Automotive Cluster Software Expert to join our team at Bosch Group. As a key member of our software development team, you will play a crucial role in designing and developing cutting-edge automotive instrument clusters. Job Summary: This is an exciting opportunity for an experienced software engineer to lead the development of...


  • Shanghai, Shanghai, China NVIDIA Full time

    About NVIDIA: NVIDIA is a leader in the technology industry, renowned for its innovative and cutting-edge products. Job Overview: We are seeking a skilled GPU Graphics Performance Architect to join our team. The successful candidate will be responsible for investigating and studying state-of-the-art real-time rendering techniques and their implementation on GPU,...


  • Shanghai, Shanghai, China Optiver Full time

    Company Overview: Optiver is a global market maker with offices around the world, united in its commitment to improving the market through competitive pricing, execution, and risk management. By providing liquidity on multiple exchanges across the globe, Optiver participates in safeguarding healthy and efficient markets. Salary: The estimated salary for this...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA is a global leader in the technology industry, renowned for its innovative and high-performance graphics solutions. As a GPU Graphics Performance Architect, you will be part of a dynamic team that drives the development of cutting-edge graphics architecture. What You Will Do: Investigate and study state-of-the-art real-time rendering techniques to...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA - High Performance GPU Architectural Engineer. We are seeking a skilled GPU C++ Modeling Engineer to join our team. As a key member of our organization, you will play a crucial role in designing and developing high-performance GPU architectures. About the Role: You will investigate and propose innovative architecture ideas based on thorough quantitative...


  • Shanghai, Shanghai, China NVIDIA Full time

    Job Title: GPU Graphics Performance Architect. About the Role: As a member of the graphics performance team at NVIDIA, you will contribute to the development of efficient and powerful graphics architectures. Your work will involve studying graphics workloads, testing innovative hardware and software solutions, and identifying areas for improvement. The goal is...


  • Shanghai, Shanghai, China NVIDIA Full time

    Graphics Performance Team: NVIDIA's Graphics Performance Team is responsible for delivering efficient and powerful graphics architecture every generation. The team studies graphics workloads and tests innovative HW/SW solutions on various platforms to address inefficiencies in the current architecture. Our work paves the path for real-time rendering of complex...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA, a leader in the technology world, is seeking a highly skilled and innovative GPU Graphics Performance Architect Intern. As a member of our team, you will play a crucial role in delivering cutting-edge graphics architectures that set new standards for efficiency and performance. We're looking for a talented individual with a strong background in...


  • Shanghai, Shanghai, China Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 Full time

    Job Title: Senior Graphics Architecture Engineer. About the Role: We are seeking an experienced Senior Graphics Architecture Engineer to join our team at Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93. As a key member of our graphics software development team, you will be responsible for designing and implementing high-performance...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA is now looking for an exceptional individual to join its Compute Developer Technology team as a Deep Learning Expert Intern. This role offers the opportunity to work on cutting-edge techniques in deep learning, graphs, machine learning, and data analytics. About NVIDIA: As a pioneer in the field of AI computing, NVIDIA has established itself as a leader...


  • Shanghai, Shanghai, China Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 Full time

    Job Summary: As a key member of the Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 team, you will play a pivotal role in designing, implementing, and optimizing multimedia functionalities for embedded systems. Your expertise in Linux BSP development and multimedia integration will drive the success of our projects. Key...


  • Shanghai, Shanghai, China NVIDIA Full time

    About the Role: We are seeking a Power Methodology and Analysis engineer to join our team at NVIDIA. Our company prides itself on having energy-efficient products, and we believe that maintaining this advantage over competition is key to our continued success. Our team is responsible for researching, developing, and deploying methodologies to help NVIDIA's...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA's success lies in its cutting-edge analysis tools, empowering engineers to optimize performance and power efficiency. We seek innovative individuals to join our software team, characterized by high standards and multifaceted challenges. This software engineering role involves developing analysis tools for various OS and hardware combinations, from...


  • Shanghai, Shanghai, China NVIDIA Full time

    We are seeking a skilled Deep Learning Performance Software Engineer to expand our research and development in Inference. This role involves developing highly optimized deep learning kernels for inference, working with cross-collaborative teams, and occasionally traveling to conferences and customers. As a Deep Learning Performance Software Engineer at...


  • Shanghai, Shanghai, China NVIDIA Full time

    NVIDIA is a world-leading innovator in GPU computing. Our mission is to fuel the advancements in gaming, automotive, professional visualization, HPC, datacenters, and networking. We are seeking an experienced Senior Software QA Test Development Engineer to join our team. In this role, you will collaborate with multi-functional groups to design, develop, and...


  • Shanghai, Shanghai, China NVIDIA Full time

    Transform AI Training Performance: NVIDIA is seeking senior engineers who excel at performance analysis and optimization to drive AI training efficiency. If you're passionate about squeezing every last clock cycle out of AI training, we want to hear from you. This role offers the opportunity to directly impact the hardware and software roadmap in a...


  • Shanghai, Shanghai, China NVIDIA Full time

    Job Title: Senior Custom SOC IP Verification Engineer. About the Role: NVIDIA seeks a seasoned Senior IP Verification Specialist to drive the verification of cutting-edge SoC and IP solutions. As part of our team, you will contribute to delivering innovative products that transform lives. You will be responsible for ASIC design verification for various IPs at...