How to Become a Big Data Engineer?

Posted on 2018-12-27 14:13:44

Last edited by 168主编 on 2018-12-27 14:14

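Demand for skilled big data engineers is set to grow rapidly. Whatever industry a company is in, succeeding in today's fiercely competitive market requires a robust software architecture for storing and accessing company data, ideally one built from the day the company is founded.
These days almost anything involving data gets labeled "big data", which is something of an exaggeration; this article simply uses the terms data engineer and data scientist.
First, what exactly does a data engineer do, and how does one become a data engineer? This article discusses this interesting field and how to enter it.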
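What Does a Data Engineer Do?
Data engineers create and maintain the analytics infrastructure that supports almost every other function in the data world. They are responsible for developing, building, maintaining, and testing big data architectures such as databases and large-scale data processing systems. Big data engineers also create the processes used for modeling, mining, acquiring, and verifying data sets.
Data engineers therefore need to master common scripting languages and tools, and to use and improve data analytics systems so that data quantity and quality keep improving.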
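How Do Data Engineers Differ from Data Scientists?
Although there is some overlap in skills and responsibilities, the two positions are increasingly diverging into distinct roles.
Data scientists are more concerned with interacting with the data infrastructure than with creating and maintaining it. They are usually responsible for market and business operations research that identifies trends and relationships, and they use a variety of sophisticated machines and methods to interact with and act on data.
Data scientists are usually proficient in machine learning and advanced data modeling, because they are expected to turn raw data into actionable, understandable content with the help of advanced mathematical models and algorithms. This information is often used as an analysis source that shows decision makers the "bigger picture".
So what makes a data scientist different from a data engineer? The main difference is one of focus. Data engineers concentrate on building the infrastructure and architecture for data generation; data scientists concentrate on mathematical and statistical analysis of the data that is generated.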
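Key Skills for Data Engineers
The following are several key skills a data engineer needs.
1. Tools and Components of Big Data Architecture
Data engineers are mostly concerned with analytics infrastructure, so most of the required skills are architecture-centric.
2. In-Depth Knowledge of SQL and Other Database Solutions
Data engineers need to be familiar with database management systems, and in-depth knowledge of SQL is essential. They should also know other database solutions, such as Cassandra or Bigtable, because not every database is built on a recognizable standard.
3. Data Warehousing and ETL Tools
Data warehousing and ETL experience is essential for data engineers. Data warehousing solutions such as Redshift or Panoply, and ETL tools such as StitchData or Segment, are all very useful. Experience with data storage and retrieval is equally important, because the volume of data being handled is astronomical.
4. Hadoop-Based Analytics (HBase, Hive, MapReduce, etc.)
A strong understanding of Apache Hadoop-based analytics is a very common requirement in this field, and working knowledge of HBase, Hive, and MapReduce is generally expected.
5. Coding
When it comes to solutions, coding and development ability is a definite plus (and a requirement for many positions). Familiarity with Python, C/C++, Java, Perl, Golang, or other such languages is very valuable.
6. Machine Learning
Although machine learning is mainly the data scientist's focus, some understanding of how to process and act on data is a plus for data engineers, for example some knowledge of statistical analysis and basic data modeling.
Machine learning has become a standard part of data science, and knowledge in this area helps you build solutions your colleagues can use. It also makes you highly marketable, because being able to "wear both hats" makes you a more formidable asset.
7. Various Operating Systems
Finally, a deep understanding of Unix, Linux, and Solaris is needed. Many math tools are based on these operating systems, because they call for a level of access to hardware and operating system functionality that Windows and Mac systems do not offer.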
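How Do You Become a Data Engineer?
Compared with other careers, data engineering calls for a more hybrid approach to education. Data engineers are usually better off holding a degree related to computer science and then building on it with vendor-specific certification programs and training courses.
A computer-related degree is important, but it is only part of the story. Obtaining the right certifications can be very valuable, and there are several certifications aimed specifically at big data engineers:
Google Certified Professional: Data Engineering. This certification shows that the student is familiar with data engineering principles and can work in the field as either an associate or a professional.
IBM Certified Data Engineer: Big Data. This certification focuses on big-data-specific applications of the data engineering skill set rather than on general skills, and many regard it as the gold standard.
CCP Data Engineer from Cloudera. This certification is specific to Cloudera's solutions and shows that the student has experience with ETL tools and analytics.
Secondary certifications, such as the MCSE (Microsoft Certified Solutions Expert), cover a broader range of topics but have specific sub-certifications such as MCSE: Data Management and Analytics.
Online education platforms also offer substantial training in this field. Udemy offers numerous courses in data engineering and data science, and others such as edX and Memrise offer similar coursework. DataCamp focuses on data science and engineering, while Galvanize covers a broader range.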
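Summary
Although these online courses can help you step into big data engineering, they rarely confer a real certification; at best they issue a certificate or diploma. They are fine for general learning, but they should not be regarded as a substitute for actual certification or hands-on practice.
We hope this article has clarified the specific knowledge, skills, and requirements expected of a data engineer. The field is developing rapidly, but it is also full of challenges and obstacles. Filling the gaps in your skill set with appropriate certification while you work is a key step toward getting the best education.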
How to Become a Data Engineer
The demand for skilled data engineers is projected to grow rapidly. That is no wonder: no matter what your company does, to succeed in today’s competitive environment you need a robust infrastructure to both store and access your company’s data, and you need it from the very beginning.
What exactly does a data engineer do, though? And how does one become a data engineer? In this article, we’re going to talk about this interesting field and how you can become a data engineer.
What Does a Data Engineer Do?
Data engineers are responsible for the creation and maintenance of analytics infrastructure that enables almost every other function in the data world. They are responsible for the development, construction, maintenance, and testing of architectures, such as databases and large-scale processing systems. As part of this, Data Engineers are also responsible for the creation of data set processes used in modeling, mining, acquisition, and verification.
Engineers are expected to have a solid command of common scripting languages and tools for this purpose and are expected to use this skill set to constantly improve data quality and quantity by leveraging and improving data analytics systems.
The Difference Between Data Engineer and Data Scientist
While there is a certain amount of overlap when it comes to skills and responsibilities, these two positions are being increasingly separated into distinct roles.
Data scientists are much more focused on the interaction with the data infrastructure rather than the building and maintenance thereof. They are often tasked with conducting high-level market and business operation research to identify trends and relations, and as part of this, they use a variety of sophisticated machines and methods to interact with and act upon data.
Data scientists are often well-versed in Machine Learning and advanced statistical modeling, as they are expected to take the raw data and turn it into actionable, understandable content with the help of advanced mathematical models and algorithms. This information is often used as an analysis source to tell the “bigger picture” to the decision makers.
So what makes a data scientist different from a data engineer? Generally speaking, the main difference is one of focus. Data engineers are much more focused on building infrastructure and architecture for data generation; data scientists are focused rather on advanced mathematics and statistical analysis on that generated data.
Key Skills for Data Engineers
Here are several of the key skills data engineers need.
Tools and Components of Data Architecture
Since data engineers are much more concerned with analytics infrastructure, most of their required skills are, predictably, architecture-centric.
In-Depth Knowledge of SQL and Other Database Solutions
Data Engineers need to understand database management, and as such, in-depth knowledge of SQL is hugely valuable. Likewise, other database solutions, such as Cassandra or Bigtable, are great to know if you plan on doing freelance or for-hire engineering, as not every database is going to be built on a recognizable standard.
Data Warehousing and ETL Tools
Data warehousing and ETL experience is essential to this position. Familiarity with data warehousing solutions like Redshift or Panoply, as well as with ETL tools such as StitchData or Segment, is hugely valuable. Experience with data storage and retrieval is equally vital, as the amount of data being dealt with is simply astronomical.
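To make the extract, transform, load idea concrete, here is a minimal sketch using only the Python standard library. It is not how Redshift, Panoply, StitchData, or Segment work internally; the sample CSV, the `orders` table, and all function names are made up for illustration, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,not_a_number
4,Carol,42.50
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with invalid amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run extract -> transform -> load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


def totals_by_customer(conn):
    """Query the loaded data with plain SQL: total amount per customer."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()
```

Calling `totals_by_customer(run_pipeline())` returns one row per customer; the row with the unparseable amount is dropped during the transform step rather than loaded.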
Hadoop-Based Analytics (HBase, Hive, MapReduce, etc.)
Having a strong understanding of Apache Hadoop-based analytics is a very common requirement in this space, with knowledge of HBase, Hive, and MapReduce often considered a requirement.
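For readers new to the MapReduce model, the following is a rough single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It does not use Hadoop or any Hadoop API; the function names are hypothetical, and a real job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))
```

For example, `word_count(["big data", "Data engineer"])` counts `data` twice because the map step lowercases each word before emitting it.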
Coding
Speaking of solutions, knowledge of coding is a definite plus here (and also possibly a requirement for many positions). Familiarity with, if not outright expertise in, Python, C/C++, Java, Perl, Golang, or other such languages is very valuable.
Machine Learning
While acting upon data is mainly the focus of the data scientist, some level of understanding of how to do so is also invaluable for Data Engineers. For this reason, some knowledge of statistical analysis and the basics of data modeling is hugely valuable.
While machine learning is technically something relegated to the Data Scientist, knowledge in this area is helpful to construct solutions usable by your cohorts. This knowledge has the added benefit of making you extremely marketable in this space, as being able to “put on both hats” in this case makes you a formidable tool.
Various Operating Systems
Finally, intimate knowledge of UNIX, Linux, and Solaris is very helpful, as many math tools are going to be based in these systems due to their unique demands for root access to hardware and operating system functionality above and beyond that of Microsoft’s Windows or Mac OS.
How Can I Become a Data Engineer?
Data engineering typically requires a more hybrid approach to education than other, more traditional careers. While teachers often have a degree specifically in teaching, Data Engineers often have a Computer Science or Information Technology degree that is then supplemented with vendor-specific certification programs and training materials.
As such, your degree, while important, is only part of the story; getting the proper certifications can be hugely valuable. There are a few data engineering-specific certifications:
  • Google’s Certified Professional — data engineering. This certification establishes that the student is familiar with data engineering principles and can function as either an associate or a professional in the field.
  • IBM Certified Data Engineer — Big Data. This certification focuses more on Big Data-specific applications of data engineering skill sets rather than general skills but is considered a gold standard by many.
  • CCP Data Engineer from Cloudera: Specific to Cloudera’s solutions, this certification shows the student has experience in ETL tools and analytics.
  • Secondary certifications, such as the MCSE (Microsoft Certified Solutions Expert), cover a wide range of topics but have specific sub-certifications such as MCSE: Data Management and Analytics.
There are, of course, online courses that purport to offer significant training in this field. Udemy offers numerous courses in data engineering and data science, and other sites, such as edX and Memrise, offer similar coursework. Some sites, such as DataCamp, are heavily focused on data science and engineering, while others, such as Galvanize, are more broad-based.
While these solutions can help you get your feet wet, so to speak, they come with the caveat that they rarely confer certification, and at best, many only offer a certificate or diploma. As such, while they are great for general learning, they should not be considered a replacement for actual certification or an accredited diploma.
Hopefully, this piece has illuminated the specific talents, skills, and requirements expected of a data engineer. While the field is rapidly growing, it is fraught with obstacles. Therefore, attaining the best education possible while filling any gaps in skill sets with proper certification is key.

Compiled by: 勇哥