Data Engineer

Salary undisclosed

Responsibilities:
• Selects the right tools and technologies for data ingestion, transformation, and storage
• Implements and maintains connectors for data ingestion from various sources
• Ensures the quality, integrity, and consistency of data through comprehensive transformation processes
• Configures and manages databases (SQL, NoSQL, graph, etc.), ensuring they are suitable for large-scale analytics and machine learning
• Collaborates with Data Scientists to define the appropriate data pipeline strategy (ETL vs. ELT)
• Monitors data systems to ensure smooth operation, identifying issues and resolving them promptly
• Documents database schemas, models, and pipelines, ensuring transparency and scalability
• Builds and maintains scalable search solutions to enable efficient data retrieval
• Optimizes data pipelines and storage solutions to balance performance and cost

Requirements:
• Expertise in selecting and implementing data ingestion, transformation, and storage technologies
• Strong knowledge of database systems (SQL, NoSQL, graph), data modeling, and schema design (OLAP focus)
• Experience in building and managing ETL/ELT pipelines for large-scale data processing
• Proficiency in data transformation techniques, ensuring quality, integrity, and scalability
• Familiarity with file-based storage and hybrid data storage solutions
• Knowledge of full-text search engines and semantic search techniques
• Experience with graph databases and handling complex data relationships
• Strong skills in monitoring and scaling data infrastructure according to performance and cost requirements
• Understanding of compute architectures, including memory, cache, and bandwidth considerations
• Experience in documenting and managing data models and pipelines
• Strong collaboration skills, working effectively with data scientists, engineers, and business stakeholders
• Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and distributed data processing tools