Senior GPU Cluster Software Engineer
20 hours ago
We are seeking a highly skilled Senior GPU Cluster Software Engineer to join our System Software team at NVIDIA. As a member of this team, you will be responsible for designing, developing, and deploying large-scale distributed systems infrastructure with monitoring, logging, visualization, and alerting capabilities.
Key Responsibilities:
- Design and implement large-scale distributed systems infrastructure with monitoring, logging, visualization, and alerting capabilities.
- Develop internal profiling tools for real-world ML/DL applications running on HPC GPU clusters for failure and efficiency analysis.
- Collaborate with application owners and research teams to add or improve profiling support for current and potential future features.
Requirements:
- BS+ in Computer Science or related field (or equivalent experience) and 5+ years of software development experience in Python.
- Experience with GitLab (or another source code management system), including branch/release workflows and CI/CD pipelines.
- Solid understanding of algorithms, data structures, and runtime/space complexity.
- Experience working with distributed system software architecture.
- Basic understanding of HPC GPU clusters and Slurm.
- Basic understanding of machine learning concepts and terminology.
- Background with databases, both SQL and NoSQL (Prometheus, Elasticsearch, OpenSearch, Redis, etc.).
- Experience with distributed data pipelines, telemetry, visualization (Kibana, Grafana, etc.), and alerting (PagerDuty, etc.).
- Experience debugging functional and performance issues in HPC GPU clusters.
- Background in running and instrumenting distributed LLM training on a multi-GPU HPC cluster.
- Knowledge of LLM training features and libraries: checkpointing, parallelism, PyTorch, Megatron-LM, NCCL.
- Experience with HPC schedulers such as Slurm.
- Background with OpenTelemetry.
-
Software Engineer
1 week ago
Shanghai, Shanghai, China Qualcomm Full time Job Title: Software Engineer - GPU. Qualcomm is seeking a talented Software Engineer to join our GPU Software Engineering team. As a key member of our team, you will design and develop new features, debug issues, optimize software for performance and power, and work with our partners and OEMs. Responsibilities: Design and develop new features for our GPU...
-
GPU Software Engineer for AI Solutions
3 weeks ago
Shanghai, Shanghai, China NVIDIA Full time Are you passionate about developing high-performance software? We are seeking dedicated software developers to contribute to the design, development, and deployment of cuDNN: our GPU-accelerated library tailored for deep learning frameworks. The landscape of artificial intelligence is rapidly evolving, and we are at the forefront of this transformation. If...
-
Senior Product Manager
2 weeks ago
Shanghai, Shanghai, China NVIDIA Full time Senior Product Manager - Datacenter GPU. We are seeking a seasoned product leader to join our Data Center Product Management team. As a Senior Product Manager, you will be responsible for defining and marketing data center GPUs for enterprises and cloud service providers. Your expertise will help drive the growth of our GPU products, which have been used to...
-
Senior Software Development Engineer
1 week ago
Shanghai, Shanghai, China Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 Full time Job Title: Senior Software Development Engineer. This role is responsible for designing and implementing graphics software on embedded systems, including GPU middleware, drivers, and virtualization. Key Responsibilities: Develop new features for graphics and display system engines to extend existing internal frameworks, particularly for automotive...
-
Senior Software Development Engineer
3 weeks ago
Shanghai, Shanghai, China Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 Full time About the Role: This is a challenging and rewarding opportunity to join the Amazon Innovation Center (Shenzhen) Company Limited Shanghai Branch - O93 team as a Senior Software Development Engineer - Graphics Software Expert. As a key member of our team, you will be responsible for designing and implementing advanced graphics software systems for embedded...
-
Senior Machine Learning Infrastructure Engineer
20 hours ago
Shanghai, Shanghai, China Optiver Full time About Us: Optiver is a global market maker with a presence in multiple continents. Founded in 1986, we are a leading liquidity provider with a strong commitment to improving the market through competitive pricing, execution, and risk management. We provide liquidity to financial markets using our own capital, at our own risk, trading a wide range of products....
-
Senior Machine Learning Infrastructure Engineer
3 weeks ago
Shanghai, Shanghai, China Optiver Full time About Us: Optiver is a leading global market maker with a presence in multiple continents. Founded in 1986, we have grown to become a prominent liquidity provider, with a team of over 2,000 employees worldwide. Our mission is to improve the market through competitive pricing, execution, and risk management. Our Shanghai Office: Since its establishment in 2012,...
-
Senior Software Quality Assurance Engineer
3 weeks ago
Shanghai, Shanghai, China NVIDIA Full time We are seeking a highly skilled Senior Software Quality Assurance Engineer to join NVIDIA's Deep Learning Software Quality Assurance team. This team is responsible for defining, developing, and performing tests to validate the robustness and performance of NVIDIA's Deep Learning software and GPU infrastructure for various AI scenarios. The ideal candidate...
-
Senior Software Engineer
20 hours ago
Shanghai, Shanghai, China NVIDIA Full time Job Title: Senior Software Engineer. NVIDIA is seeking a highly skilled Senior Software Engineer to join its team and contribute to the development of its world-class AI Infrastructure and leading-edge software on NVIDIA's high-performance DRIVE platform for Autonomous Vehicles. Job Summary: This is a collaborative work with AV perception team, AV production...
-
Senior HPC Engineer
2 weeks ago
Shanghai, Shanghai, China NVIDIA Full time About NVIDIA: NVIDIA is a pioneer in the field of computer graphics, PC gaming, and accelerated computing. With a legacy of innovation spanning over 25 years, we're now harnessing the power of AI to redefine the future of computing. Our GPUs serve as the brains of computers, robots, and self-driving cars that can perceive and understand the world. To achieve...
-
Senior Software Quality Assurance Engineer
1 week ago
Shanghai, Shanghai, China NVIDIA Full time We are seeking a Senior Software Test Development Engineer to join NVIDIA's Deep Learning SWQA team. This role is part of NVIDIA's Deep Learning Software Quality Assurance team, which defines, develops, and performs tests to validate robustness and measure the performance of NVIDIA's Deep Learning software and GPU Infrastructure for various AI scenarios. The...
-
Senior Computer Vision Software Engineer
3 weeks ago
Shanghai, Shanghai, China NVIDIA Full time About the Role: NVIDIA is seeking a highly skilled Senior Computer Vision Software Engineer to join its team and contribute to the development of its world-class AI Infrastructure and leading-edge software on NVIDIA's high-performance DRIVE platform for Autonomous Vehicles. Key Responsibilities: Collaborate with the AV perception team, AV production team, and AI...
-
Senior Computer Vision Software Engineer
20 hours ago
Shanghai, Shanghai, China NVIDIA Full time Job Title: Senior Computer Vision Software Engineer. NVIDIA is seeking a highly skilled Senior Computer Vision Software Engineer to join its team and contribute to the development of its world-class AI Infrastructure and leading-edge software on NVIDIA's high-performance DRIVE platform for Autonomous Vehicles. Key Responsibilities: Collaborate with the AV...
-
Senior Software Engineer
3 weeks ago
Shanghai, Shanghai, China NVIDIA Full time About the Role: NVIDIA is seeking a highly skilled Senior Software Engineer to join its team and contribute to the development of its world-class AI Infrastructure and leading-edge software on NVIDIA's high-performance DRIVE platform for Autonomous Vehicles. Key Responsibilities: Collaborate with the AV perception team, AV production team, and AI infrastructure...
-
Shanghai, Shanghai, China NVIDIA Full time NVIDIA is a leader in GPU Computing, driving innovation in gaming, automotive, professional vision, HPC, datacenters, and networking. We're passionate about harnessing the power of AI to transform industries and improve lives. As a Senior Software QA Test Development Engineer, you'll play a critical role in ensuring the quality of our products, collaborating...
-
Shanghai, Shanghai, China NVIDIA Full time NVIDIA is a leading technology company in the field of GPU Computing. We are passionate about innovation in various markets, including gaming, automotive, professional vision, HPC, datacenters, and networking. Our company is also at the forefront of AI Computing, and our GPUs are the driving force behind modern Deep Learning software frameworks, accelerated...
-
Senior HPC Systems Engineer
20 hours ago
Shanghai, Shanghai, China NVIDIA Full time About NVIDIA: NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. With a legacy of innovation spanning over 25 years, we're now harnessing the power of AI to redefine the future of computing. Job Summary: We're seeking a highly skilled Senior HPC Engineer to join our Professional Services team. As a key member of our team,...