Job Description
We belong to the Batch Infrastructure team, situated within the Platform organization in Bangalore. We are responsible for storing and serving exabytes of data for our data lake. Every day, hundreds of petabytes of data are served via billions of RPCs. This data powers pricing, payments, routing, and other core business functions. It is also key to our new AI efforts to train the next generation of ML models.
Uber is evolving from an on-prem data center model to an industry-first multi-cloud model. As part of this, the entire data stack is migrating to the cloud and transforming from a single-region model to a multi-region model.
Our vision is to build a unified storage layer that supports the evolving needs of data lake users in the new cloud world. With such a layer, users across the company will be able to produce data (bytes to petabytes) anywhere, from any cloud vendor, and access it from anywhere, across any other vendor or region. Such a layer has a huge cost-efficiency advantage, as it can leverage cheaper storage options in the cloud.
We will build new systems and standards (which we plan to open source) and lead the industry toward a new, unique architecture for storage.
What The Candidate Will Need / Bonus Points
What the Candidate Will Do
You will sit at the core of big data infrastructure at Uber, building systems that handle petabytes of data. You will scale at both the micro and macro level: squeezing the last bits of performance from the CPU and fitting larger and larger workloads into the same RAM, while also scaling out on multi-cloud infrastructure to divide and conquer large problems.
Basic Qualifications
BE/BS in Computer Science. Strong knowledge of at least one programming language. Good understanding of system design.
Preferred Qualifications
Experience with Java, Python, big data, and distributed systems.