# Roadmap to becoming a data engineer in 2020

This roadmap aims to give a complete picture of the modern data engineering landscape and serve as a study guide for aspiring data engineers.

### Note to beginners

Beginners shouldn’t feel overwhelmed by the number of tools and frameworks listed here. A typical data engineer masters only a subset of them over several years, depending on their company and career choices.

## Contributions are welcome 💜

Please raise an issue to discuss your suggestions, or open a pull request with your proposed improvements.

## Reviewers 🔎

Huge thank you to @whydidithavetobebugs, @sawidis, @marclamberti and @mpyeager for reviewing this roadmap.

datastack.tv is the learning platform for the modern data stack. We create concise screencast video tutorials for data engineers. Browse our courses here!
