Data Engineer - Advanced Analytics - Remote
New Iron is leading the search for a Data Engineer to join an industrial manufacturing company on the cutting edge of materials science.
This is a direct-hire, fully remote role with a client based in Charlotte, NC. Remote candidates are strongly encouraged to apply.
You will join a core platform development team designing and implementing high-performance data pipelines using cutting-edge technologies.
Your primary responsibility will be to architect reliable, large-scale data ingestion pipelines that land inbound data from various data stores into our client's on-premises and cloud-based data lakes.
The team supports advanced analytics projects through data validation, automated data profiling, and CI/CD practices that keep inbound data flows maintainable.
If you are an expert in query languages and data cleansing, and have experience working on large-scale data engineering projects, we'd love to talk to you!
Responsibilities may include:
- Design, test, deploy, and maintain high-performance Apache Spark data ingestion pipelines that take data from landed to cleansed, spanning batch to streaming and structured to unstructured data
- Participate in code reviews, help improve software engineering standards and best practices, and share knowledge with peers
- Work with cross-functional teams using Agile development best practices and CI/CD methods, with the goal of automating build, integration, deployment, and monitoring
- Stay current with industry trends and technology advancements to improve your quality, productivity, and performance
- Work with data source teams to define data cleansing and enrichment requirements for landed data, as well as data ingestion requirements, including validation that landed data arrives complete and valid
- Provide support in a DevOps environment, such as monitoring tokens and overall system performance
Requirements:
- 5+ years in big data engineering roles
- Expert with the Apache Spark platform for developing batch, micro-batch, and streaming ingestion pipelines, leveraging all levels of the API (e.g., SparkContext, DataFrames, Datasets, GraphFrames, Spark SQL, Spark ML); see the illustrative sketch following this list
- Experience with AWS services such as S3, EC2, DMS, RDS, Redshift, DynamoDB, CloudTrail, EKS, IAM, and CloudWatch
- Experience with Terraform / CloudFormation
- Experience developing and maintaining ETL and ELT pipelines for data warehousing (on-premises and cloud)
- Expert in query languages (e.g., SQL or Spark SQL)
- Expert with Agile development in a CI/CD environment
- Expert with data cleansing tools
- Expert with traditional relational and polyglot persistence technologies
- Strong hands-on experience with Spark core architecture and related storage technologies and tools (e.g., S3, Parquet, and Delta Lake)
- At least one modern JVM language, such as Java or Scala
- At least one other language such as Python
- Strong hands-on understanding of notebook environments such as JupyterHub
- Values and prioritizes well-designed, testable, extensible, and maintainable code
- Excellent technical communication, collaboration, and time-management skills
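As a rough illustration of the kind of work described above, here is a minimal PySpark Structured Streaming sketch of a landed-to-cleansed ingestion step: it reads landed JSON from S3, applies simple validation rules, and appends the result to a cleansed Delta Lake table. All paths, bucket names, and the schema are hypothetical placeholders, and the snippet assumes the delta-spark and hadoop-aws packages are available.

```python
# Minimal landed-to-cleansed ingestion sketch with Spark Structured
# Streaming and Delta Lake. Paths and schema are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, TimestampType,
)

spark = (
    SparkSession.builder
    .appName("landed-to-cleansed")
    # Assumes the delta-spark package is on the classpath.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Explicit schema: streaming reads should not rely on schema inference.
schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

landed = (
    spark.readStream
    .format("json")
    .schema(schema)
    .load("s3a://example-lake/landed/sensors/")  # hypothetical landed zone
)

# Basic cleansing/validation: deduplicate within a watermark window and
# drop rows that fail simple data-quality rules.
cleansed = (
    landed
    .withWatermark("event_time", "10 minutes")
    .dropDuplicates(["sensor_id", "event_time"])
    .filter(F.col("sensor_id").isNotNull()
            & F.col("reading").between(-1e6, 1e6))
    .withColumn("ingested_at", F.current_timestamp())
)

query = (
    cleansed.writeStream
    .format("delta")
    .option("checkpointLocation", "s3a://example-lake/_checkpoints/sensors/")
    .outputMode("append")
    .start("s3a://example-lake/cleansed/sensors/")  # hypothetical cleansed zone
)
query.awaitTermination()
```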
Nice to Have:
- Master's degree in Computer Science or a related field
- Full-stack experience developing large-scale distributed systems
- Familiarity with Oracle, Microsoft SQL Server, SSIS, and SSRS
- Familiarity with enterprise ETL and integration tools such as Informatica and MuleSoft
- Familiarity with open-source data integration and DAG tools such as NiFi, Airflow, StreamSets, etc.
- Familiarity with data sources and integration solutions used in manufacturing enterprises, such as Maximo, PI Integrator, etc.
- Familiarity with reporting and analysis tools such as Power BI, Tableau, etc.
Candidates must be authorized to work in the United States on a full-time basis for any employer. Principals only. Recruiters, please do not contact this job poster.