About the Company

Hubnine India Private Limited is a software company focused on using modern AI and machine learning approaches to create better outcomes for clients.

About the Internship

This internship is for an ML Systems & Inference Engineer who will help build the core technical layer of a platform designed for Physical AI and advanced multimodal workloads. The work centers on model serving, GPU-oriented systems, performance tuning, and cloud infrastructure.

The position is not meant for a generic backend profile or a pure research profile. The selected intern should be comfortable reading model code, profiling execution, finding bottlenecks, implementing optimizations, validating the impact, and delivering dependable infrastructure solutions.

What You’ll Work On

Designing and improving cloud inference workflows for Physical AI, multimodal, generative, simulation, and world-model use cases.
Raising performance across startup time, queue latency, end-to-end latency, throughput, GPU utilization, reliability, and cost per output or job.
Building execution-layer components such as model packaging, warm pools, artifact and model caching, batching, queueing, scheduling, and model-aware execution rules.
Using methods like dynamic batching, quantized model variants, torch.compile, TensorRT, ONNX Runtime, caching, routing, and distributed execution to improve runtime efficiency.
Finding bottlenecks in GPU compute, memory bandwidth, CPU preprocessing, I/O, model loading, serialization, queueing, and serving overhead.
Creating benchmarking and evaluation tools to track latency, throughput, startup time, memory use, GPU utilization, cost, reliability, and workload quality.
Turning runtime telemetry into product features such as performance reports, cost visibility, configuration suggestions, and workload comparisons.
Developing reusable model adapters, container images, and serving templates to speed up onboarding for new model families.
Working with founders, design partners, and early customers to convert real use cases into product requirements and technical plans.
Partnering with GPU performance experts to assess and confirm runtime and kernel-level optimizations.
Helping shape engineering standards, technical hiring criteria, and the architecture of the core platform.

Eligibility

Applicants must be available for a work-from-home internship, able to start between 29 June 2026 and 3 August 2026, and willing to commit for 6 months. They must also have relevant skills and an interest in the role.

Other Requirements

Prior exposure to serving or optimizing multimodal workloads such as video, image generation, 3D, VLM, VLA, world models, simulations, synthetic data, gaming, or similar AI applications.
Experience working with GPU cloud environments such as AWS, GCP, Azure, CoreWeave, Lambda, RunPod, Modal, or similar platforms.
Background in multi-GPU inference, cluster scheduling, or multi-tenant serving systems.
Familiarity with CUDA C++, Triton kernels, CUTLASS, PTX, or GPU performance optimization tools.
Knowledge of compiler and runtime stacks like TVM, MLIR, XLA, TorchInductor, or TensorRT.
Strong exposure to ML systems, inference serving, GPU-backed deployment, and performance optimization.
Deep PyTorch knowledge, including the ability to take research code and turn it into production-ready systems.
Experience running and maintaining large AI models in cloud GPU environments.
Comfort using profiling tools such as PyTorch Profiler and NVIDIA Nsight.
Hands-on work with inference frameworks such as TensorRT, Triton Inference Server, ONNX Runtime, Ray Serve, vLLM, SGLang, Modal, or BentoML.
Understanding of batching, scheduling, quantization, compilation, distributed inference, Docker, cloud GPUs, and backend engineering.
Strong problem-solving ability, good engineering judgment, and readiness to operate in a fast-moving startup setting.

Perks

Certificate
Flexible work hours
5-day work week

Internship Details

This is a 6-month work-from-home internship with an immediate start window. The stipend is INR 10,000 to 25,000 per month, and there are 3 openings. The application deadline is 29 July 2026 at 11:59 PM.

Skills Mentioned

Python, algorithms, data structures, machine learning, artificial intelligence, APIs, PyTorch, MLOps, AWS Lambda, LLMOps, and Python libraries.

AI/ML Inference Kernel Engineer Intern
