Internship Work from home

AI/ML Inference Kernel Engineer Intern

Hubnine India Private Limited

Remote · Full Time internship

Stipend: INR 10,000 – INR 25,000 / month
Duration: 6 months
Start: Immediately
Openings: 3
Who can apply

Candidates who are available for a remote internship, can start between 29 June 2026 and 3 August 2026, can commit to the full 6-month duration, and have relevant skills and interest.

Work mode: Work from home
Resume: Required to apply

About the internship

About the Company

Hubnine India Private Limited is a software company focused on using modern AI and machine learning approaches to create better outcomes for clients.

About the Internship

This internship is for an ML Systems & Inference Engineer who will help build the core technical layer of a platform designed for Physical AI and advanced multimodal workloads. The work centers on model serving, GPU-oriented systems, performance tuning, and cloud infrastructure.

The position is not meant for a generic backend profile or a pure research profile. The selected intern should be comfortable reading model code, profiling execution, finding bottlenecks, implementing optimizations, validating the impact, and delivering dependable infrastructure solutions.

What You’ll Work On

  • Designing and improving cloud inference workflows for Physical AI, multimodal, generative, simulation, and world-model use cases.
  • Raising performance across startup time, queue latency, end-to-end latency, throughput, GPU utilization, reliability, and cost per output or job.
  • Building execution-layer components such as model packaging, warm pools, artifact and model caching, batching, queueing, scheduling, and model-aware execution rules.
  • Using methods like dynamic batching, quantized model variants, torch.compile, TensorRT, ONNX Runtime, caching, routing, and distributed execution to improve runtime efficiency.
  • Finding bottlenecks in GPU compute, memory bandwidth, CPU preprocessing, I/O, model loading, serialization, queueing, and serving overhead.
  • Creating benchmarking and evaluation tools to track latency, throughput, startup time, memory use, GPU utilization, cost, reliability, and workload quality.
  • Turning runtime telemetry into product features such as performance reports, cost visibility, configuration suggestions, and workload comparisons.
  • Developing reusable model adapters, container images, and serving templates to speed up onboarding for new model families.
  • Working with founders, design partners, and early customers to convert real use cases into product requirements and technical plans.
  • Partnering with GPU performance experts to assess and confirm runtime and kernel-level optimizations.
  • Helping shape engineering standards, technical hiring criteria, and the architecture of the core platform.

Eligibility

Applicants must be available for a work-from-home internship, able to begin between 29 June 2026 and 3 August 2026, willing to commit for 6 months, and have relevant skills and interest in the role.

Other Requirements

  • Prior exposure to serving or optimizing multimodal workloads such as video, image generation, 3D, VLM, VLA, world models, simulations, synthetic data, gaming, or similar AI applications.
  • Experience working with GPU cloud environments such as AWS, GCP, Azure, CoreWeave, Lambda, RunPod, Modal, or similar platforms.
  • Background in multi-GPU inference, cluster scheduling, or multi-tenant serving systems.
  • Familiarity with CUDA C++, Triton kernels, CUTLASS, PTX, or GPU performance optimization tools.
  • Knowledge of compiler and runtime stacks like TVM, MLIR, XLA, TorchInductor, or TensorRT.
  • Strong exposure to ML systems, inference serving, GPU-backed deployment, and performance optimization.
  • Deep PyTorch knowledge, including the ability to take research code and turn it into production-ready systems.
  • Experience running and maintaining large AI models in cloud GPU environments.
  • Comfort using profiling tools such as PyTorch Profiler and NVIDIA Nsight.
  • Hands-on work with inference frameworks such as TensorRT, Triton Inference Server, ONNX Runtime, Ray Serve, vLLM, SGLang, Modal, or BentoML.
  • Understanding of batching, scheduling, quantization, compilation, distributed inference, Docker, cloud GPUs, and backend engineering.
  • Strong problem-solving ability, good engineering judgment, and readiness to operate in a fast-moving startup setting.

Perks

  • Certificate
  • Flexible work hours
  • 5-day work week

Internship Details

This is a 6-month work-from-home internship with an immediate start; candidates must be able to begin between 29 June 2026 and 3 August 2026. The stipend is INR 10,000 to INR 25,000 per month, and there are 3 openings. The application deadline is 29 July 2026 at 11:59 PM.

Skills Mentioned

Python, algorithms, data structures, machine learning, artificial intelligence, APIs, PyTorch, MLOps, AWS Lambda, LLMOps, and Python libraries.
