Data Scientist

$1500 - $3000, part-time
Job Description

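1. ETL process design and development: Own the design, development, and optimization of big data ETL processes, ensuring that data is accurate, complete, and timely. Understand business requirements, take part in data warehouse architecture design, and define sound ETL solutions that meet the data processing needs of different business scenarios.
2. Spark application development: Use Spark for large-scale data processing and analysis, and develop Spark applications that clean, transform, and load data. Optimize and tune Spark jobs to improve processing efficiency and reduce resource consumption.
3. Python programming and script development: Write data processing scripts and tools in Python for tasks such as data collection, preprocessing, and monitoring. Work with other teams to integrate Python code with Spark applications and build more complex data processing flows.
4. PySpark integration and development: Develop in the PySpark environment, combining the strengths of Python and Spark for efficient data processing and analysis. Resolve technical problems that arise during PySpark development, such as data type conversion, performance optimization, and memory management.
5. Data quality assurance: Define and implement data quality monitoring strategies, check and validate data throughout the ETL process, and identify and resolve data quality problems promptly. Establish a data quality reporting mechanism and report data quality status to the relevant teams on a regular basis to support data-driven decisions.
6. Team collaboration and technical support: Work closely with data analysts, data scientists, data warehouse engineers, and other team members to deliver project tasks, providing technical support and solutions. Take part in the team's technical exchanges and knowledge sharing to keep raising its overall technical level and development efficiency.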

Job Requirements

1. ETL process design and development: Participate in the design, development, and optimization of big data ETL processes to ensure the accuracy, integrity, and timeliness of data. Understand business requirements, participate in data warehouse architecture design, and formulate reasonable ETL solutions to meet data processing requirements in different business scenarios.
2. Spark application development: Use Spark for large-scale data processing and analysis, develop Spark applications, and implement operations such as data cleaning, transformation, and loading. Optimize the performance of Spark jobs and tune Spark tasks to improve data processing efficiency and reduce resource consumption.
3. Python programming and script development: Use Python to write data processing scripts and tools for tasks such as data collection, preprocessing, and monitoring. Collaborate with other teams to integrate Python code with Spark applications to achieve more complex data processing flows.
4. PySpark integration and development: Develop in the PySpark environment, making full use of the advantages of Python and Spark to achieve efficient data processing and analysis. Solve technical problems encountered during PySpark development, such as data type conversion, performance optimization, and memory management.
5. Data quality assurance: Formulate and implement data quality monitoring strategies, inspect and validate data in the ETL process, and discover and solve data quality problems in a timely manner. Establish a data quality reporting mechanism and regularly report data quality status to relevant teams to support data-driven decision-making.
6. Team collaboration and technical support: Cooperate closely with team members such as data analysts, data scientists, and data warehouse engineers to complete project tasks together and provide technical support and solutions. Participate in team technical exchanges and sharing to continuously improve the team's overall technical level and development efficiency.

Benefits

Please email your resume to romola.wang@trustalabs.com