Data Engineer - Graduate 2021/22
Bengaluru /
Technology – Engineering /
Full-time
Job Overview
We are looking for a savvy Data Engineer to join our growing team of
analytics experts at Upstox. The hire will be responsible for expanding
and optimising our data and data pipeline architecture, as well as
optimising data flow and collection for cross functional teams. The
ideal candidate is an experienced data pipeline builder and data
wrangler who enjoys optimising data systems and building them from the
ground up.
You will be responsible for -
- Creating complex data processing pipelines, as part of diverse, high-energy teams.
- Designing scalable implementations of the models developed by our Data Scientists.
- Hands-on programming based on TDD, usually in a pair programming environment.
- Deploying data pipelines in production based on Continuous Delivery practices.
- Create and maintain clear documentation on data models/schemas as well as transformation/validation rules.
- Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers.
- Engage with stakeholders to gather requirements to deliver data solutions.
- Advising clients on the usage of different distributed storage and computing technologies from the plethora of options available in the ecosystem.
Ideally, you should have -
- Good understanding of building and deploying large scale data processing pipelines in a production environment.
- Experience building data pipelines and data centric applications using distributed storage platforms like HDFS, S3, NoSQL databases (HBase, Cassandra, etc.) and distributed processing platforms like Hadoop, Spark, Hive, Oozie, Airflow, etc. in a production setting.
- Hands-on experience in MapR, Cloudera, Hortonworks and/or Cloud (AWS EMR, Azure HDInsight, Qubole, etc.) based Hadoop distributions.
- Strong communication and client-facing skills, with the ability to work in a consulting environment, are essential.
Desired Skills and Experience
- Comfortable working in a Linux environment.
- SQL (Expert Level)
- Hands-on experience in distributed processing platforms such as AWS EMR, MapR, Cloudera
- Distributed storage platforms like HDFS, S3, NoSQL databases