Senior Data Engineer CV Sample

Justin Lauren

Senior Data Engineer

Summary

Senior Data Engineer with 6.5+ years of experience building data-intensive applications, tackling challenging architectural and scalability problems, and managing data repositories for efficient visualization across a wide range of products.

Highly analytical team player with an aptitude for prioritising needs and risks. Constantly strives to streamline processes and experiments with optimising and benchmarking solutions. Creative troubleshooter and problem-solver who loves a challenge.

Experience implementing ML algorithms in production using the distributed paradigms of Spark/Flink on Azure Databricks and AWS SageMaker, along with NLP and recommender-system products (POC).

Experience shaping and implementing big data architecture for the medical devices, retail, banking, games, and transport logistics (IoT) domains.

Skills

  • Frameworks: Spark [Structured Streaming, MLlib, SQL], Flink, Kafka Streams
  • Databases: PostgreSQL/AWS Aurora, Neo4j/Azure Cosmos DB (Graph), Cassandra, MongoDB/Azure Cosmos DB (Document), Redshift, ClickHouse
  • Schedulers/Workflow: Airflow, Luigi, AWS Step Functions, Oozie
  • Visualization: Looker, SQL Analytics, Tableau
  • Programming Languages: Python, Scala, Java
  • Data Lakes/Blob Storage: Azure Data Lake, Databricks Delta, S3
  • Cloud Platforms: AWS, Azure, GCP, Databricks
  • DevOps: Docker, Kubernetes, Terraform
  • ML Frameworks: scikit-learn, TensorFlow

Work Experience

Senior Data Engineer

StrongArmTech, NY

Present

  • Creating streaming pipelines to ingest sensor data and process it in real time to populate dashboards and the data warehouse
  • Created pipelines for sensor data published into Kinesis (and S3 for fail-safe reprocessing), ingested by a Databricks job, and written into Azure Delta tables and ClickHouse (earlier on GCP)
  • Worked on Looker and SQL Analytics dashboards for ClickHouse/GCP (benchmarking/production)
  • Built pipelines in Python as part of a SOLID-principled codebase, including ad hoc time-bound backruns, CDC jobs for metadata entities, and production-optimised MLlib code.
  • Designed and integrated the product's entities using Azure Delta and ClickHouse tables, exposed via Python web-service APIs on AWS Lambda.

Senior Data Engineer

Jones Lang LaSalle Technologies (JLL), India

Dec 2020

Property Web-Based Product

  • Worked on ingestion from multiple API sources, dump schema creation, and entity modelling using Cosmos DB and Scala Azure Functions.
  • Worked on global multi-region sources and the associated rule-based, region-specific ETL pipelines driven by Spark notebooks on Azure Databricks, backed by Databricks Delta.
  • Integrated entities in the property domain using Azure Cosmos DB Graph and Azure Databricks notebooks, followed by Scala web-service APIs deployed on Azure HDInsight for quick search
  • Worked on the streaming-data element of the pipeline, detecting data refreshes

Competitive Analytics Platform

  • Designed the table-level schema handling, ingestion, and implementation of a data warehouse for KPI tracking, and its respective components for a full-fledged reporting data warehouse.
  • Created Spark jobs to handle daily data from MongoDB, MySQL, Postgres, and folder dumps to update the data warehouses, scheduled with Airflow.
  • Managed scaled ingestion from public competitor APIs to track relevant parameters in the analytics warehouse on Redshift.
  • Worked on complex custom Spark reporting logic that drove insightful marketing strategy.
  • Benchmarked the real-time elements of the solution with Kafka Streams.

Senior Data Engineer

Robert Bosch Engineering Solutions, Germany

Dec 2018

Kiosk Monitoring Product

  • Created Spark batch jobs derived from the incoming data model via a productionised ML model with associated business logic.
  • Implemented the Flask API layer and a simulator for the application
  • Tested the end-to-end pipeline and handled DevOps for log monitoring of the associated individual components
  • Led the overall design and development of a cloud-agnostic, lambda-architecture pipeline based on MQTT, Kafka, and Spark for data ingestion and alert detection.

Flink Scala Akka Complex Event Processing Product

  • Created a Scala Flink complex event processing and detection pipeline from the incoming data model with business logic.
  • Worked on the API layer implementation in Akka and a data simulator (ongoing)
  • Testing the end-to-end pipeline and DevOps of log monitoring for the associated individual components on AWS (ongoing).
  • Drove the overall data-format-based design and development of an MQTT-based pipeline with Kafka, Flink, an RDBMS, and Cassandra for data ingestion and event/milestone detection.

Software Developer

General Electric Corp, India

Oct 2017

GE Healthcare’s Device Monitoring Product

  • Deployed and maintained the Azure cloud-based cluster (DevOps), along with pipeline design and data-handling constraints using a data virtualization tool.
  • Implemented detection algorithms for different respiration and lung parameters, and accumulation algorithms for case-end aggregation requirements.
  • Data modeling in Cassandra for real-time data storage and case-end data aggregation.
  • Data modeling for data warehousing and UI-based consumption

Company Log Data Analytics 

  • Involved in Pig scripting and the Hive database, moving data to a staging layer for processing before loading into the final Hadoop tables
  • Worked on Oozie workflows for executing Java, Pig, and Hive actions based on decision nodes, and scheduled Oozie workflow and coordinator jobs

Education

Bachelor of Engineering

San Jose State University

Jul 2014


Languages

  • English
  • French
  • Arabic
  • German
