Staff Machine Learning Engineer

6 days ago


Shanghai, Beijing, Remote, China | DataDirect Networks | Full time | $120,000 - $180,000 per year


Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments" - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. 

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. 

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. 



Job Description

Staff Machine Learning Engineer - Infinia AI Performance

We are seeking a talented and experienced Staff ML Engineer to help us optimize training, inference, and Retrieval-Augmented Generation (RAG) pipelines for high-performance AI applications. You will lead the development of connectors to open-source data streaming frameworks, such as Mosaic Streaming, Ray Data, and tf.data, and of inference optimizations such as KV caching and LoRAX. You will guide a talented organization of engineers focused on an advanced end-to-end data platform for ingestion, transformation, preparation, and streaming for high-performance AI applications. Collaborating closely with software developers, product teams, and partners, you will lead experiments with state-of-the-art models using open-source tools and cloud platforms.

Key Responsibilities:

  • Design and implement the integration of data ingestion and streaming pipelines with open-source tools such as Ray Data, Mosaic Streaming, and the PyTorch DataLoader.
  • Design optimizations for training, such as asynchronous checkpointing, and for inference, such as KV caching and LoRAX.
  • Guide the integration of MLflow with DDN's Infinia product for comprehensive experiment tracking, model versioning, and deployment.
  • Drive the implementation and scaling of Retrieval-Augmented Generation (RAG) pipelines to enhance generative model performance.
  • Stay abreast of the latest developments in AIOps, AI frameworks, optimization, and accelerated execution.
  • Identify and implement solutions to optimize training and inference pipeline performance, runtime, and resource utilization on Infinia.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Science, Machine Learning, or related fields.
  • 4+ years of experience in machine learning operations (MLOps) or related roles.
  • Proven expertise in building and scaling AI/ML pipelines.
  • Strong understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo, vLLM, TensorRT-LLM).
  • Experience in deploying open-source vector databases at scale.
  • Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
  • Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code.
  • Excellent problem-solving and troubleshooting skills, with attention to detail and performance optimization.
  • Strong communication and collaboration skills.

Preferred Qualifications:

  • Implementation-level understanding of ML frameworks, data loaders and data formats.
  • Experience with scaling RAG pipelines and integrating them with generative AI models.
  • Experience in operationalizing AI/ML models in production environments.

This role includes participation in a team on-call rotation providing seven-day-a-week, out-of-hours coverage, including after-hours and weekend support work when required.



