https://github.com/datastacktv/data-engineer-roadmap

Modern Data Engineer Roadmap 2020

Roadmap to becoming a data engineer in 2020


This roadmap aims to give a complete picture of the modern data engineering landscape and serve as a study guide for aspiring data engineers.


Note to beginners

Beginners shouldn’t feel overwhelmed by the vast number of tools and frameworks listed here. A typical data engineer masters a subset of these tools over several years, depending on their company and career choices.


[Figure: Data Engineer Roadmap]

Nice to have 😎

[Figure: Data Engineer Roadmap Extras]

Contributions are welcome 💜

Please raise an issue to discuss your suggestions or open a Pull Request to request improvements.

Reviewers 🔎

Huge thank you to @whydidithavetobebugs, @sawidis, @marclamberti and @mpyeager for reviewing this roadmap.

About us 👋🏼

datastack.tv is the learning platform for the modern data stack. We create concise screencast video tutorials for data engineers. Browse our courses at datastack.tv!

License 🗞

Copyright © 2020 Alexandra Abbas — hello@datastack.tv
