Job Code: DEUSRL5
Job Location: Remote, USA
The ideal candidate will play a pivotal role in building and operationalizing the minimally inclusive data necessary for the enterprise data and analytics initiatives, following industry-standard practices and tools. The bulk of the candidate’s work will be building, managing and optimizing data pipelines and then moving these pipelines effectively into production for key data and analytics consumers such as business/data analysts, data scientists or anyone else who needs curated data for data and analytics use cases across the enterprise.
• Managed data pipelines consist of a series of stages through which data flows. These data pipelines must be created, maintained and optimized as workloads move from development to production for specific use cases.
• Drive automation through effective metadata management. Use innovative and modern tools, techniques and architectures to automate the most common repeatable and tedious data preparation and integration tasks, minimizing manual and error-prone processes and improving productivity.
• Assist with renovating the data management infrastructure to drive automation in data integration and management, including learning and using modern data preparation, integration, and AI-enabled metadata management tools and techniques, tracking data consumption patterns, performing intelligent sampling and caching, and monitoring schema changes.
• Work in close relationship with data science teams and with business (data) analysts in refining their data requirements for various data and analytics initiatives and their data consumption requirements.
• Train counterparts such as data scientists, data analysts, line-of-business (LOB) users or other data consumers in these data pipelining and preparation techniques, which make it easier for them to integrate and consume the data they need for their own use cases.
• Collaborate with data governance teams to ensure that the data users and consumers use the data provisioned to them responsibly through data governance and compliance initiatives.
• Travel to customer sites to deploy solutions and deliver workshops to educate and empower customers
• Strong experience with various Data Management architectures like Data Warehouse, Data Lake, Data Hub and the supporting processes like Data Integration, Governance, Metadata Management
• Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management.
• Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include ETL/ELT, data replication/CDC, message-oriented data movement, API design and access and upcoming data ingestion and integration technologies such as stream data integration, CEP and data virtualization.
• Experience in working with data governance/data quality and data security teams and specifically information stewards and privacy and security officers in moving data pipelines into production with appropriate data quality, governance and security standards and certification. Ability to build quick prototypes and to translate prototypes into data products and services in a diverse ecosystem.
• Demonstrated success in working with large, heterogeneous datasets to extract business value using popular data preparation tools such as Trifacta, Paxata, Unifi, and others to reduce or even automate parts of the tedious data preparation tasks.
• Hands-on experience with BigQuery, Looker, Pub/Sub, Dataflow, Cloud Data Fusion, Cloud Storage, Cloud Composer, and Data Catalog, or equivalent AWS products.
• Strong experience with popular database programming languages, including SQL, PL/SQL, and others for relational databases, and certifications on upcoming NoSQL/Hadoop-oriented databases such as MongoDB, Cassandra, and others for non-relational databases.
• Strong experience in working with SQL-on-Hadoop tools and technologies, including Hive, Impala, Presto, and others from an open source perspective, and Hortonworks DataFlow (HDF), Dremio, Informatica, Talend, and others from a commercial vendor perspective.
• Strong experience with advanced analytics tools for object-oriented/object function scripting using languages such as R, Python, Java, C++, Scala, and others.
• Strong experience in working with both open-source and commercial message queuing technologies such as Kafka, JMS, Azure Service Bus, Amazon Simple Queue Service (SQS), and others; stream data integration technologies such as Apache NiFi, Apache Beam, Apache Kafka Streams, and Amazon Kinesis; and stream analytics technologies such as Apache Kafka KSQL, Apache Spark Streaming, Apache Samza, and others.
• Strong experience in working with DevOps capabilities like version control, automated builds, testing and release management capabilities using tools like Git, Jenkins, Puppet, Ansible.
• Strong experience in working with data science teams in refining and optimizing data science and machine learning models and algorithms
• Demonstrated success in working with both IT and business while integrating analytics and data science output into business processes and workflows.
• Basic experience working with popular data discovery, analytics and BI software tools such as Tableau, Qlik, Power BI and others for semantic-layer-based data discovery.
• Demonstrated ability to work across multiple deployment environments, including cloud, on-premises and hybrid, across multiple operating systems, and with containerization technologies such as Docker, Kubernetes, Amazon Elastic Container Service (ECS) and others.
• Adept in agile methodologies and capable of applying DevOps and increasingly DataOps principles to data pipelines to improve the communication, integration, reuse and automation of data flows between data managers and consumers across an organization
• 5+ years of work experience in data management disciplines including data integration, modeling, optimization and data quality, and/or other areas directly relevant to data engineering responsibilities and tasks.
• 3+ years of experience working in cross-functional teams and collaborating with business stakeholders in support of a departmental and/or multi-departmental data management and analytics initiative.
• A bachelor’s or master’s degree in computer science, statistics, applied mathematics, data management, information systems, information science or a related quantitative field or equivalent work experience is required.
• Experience in the Fintech and/or Insurance industry is preferred.