Roadmap to becoming a data engineer in 2020
This roadmap aims to give a complete picture of the modern data engineering landscape and serve as a study guide for aspiring data engineers.
Beginners shouldn’t feel overwhelmed by the vast number of tools and frameworks listed here. A typical data engineer would master a subset of these tools throughout several years depending on his/her company and career choices.
Please raise an issue to discuss your suggestions or open a Pull Request to request improvements.
Copyright © 2020 Alexandra Abbas — email@example.com
Generative Adversarial Nets (GAN) have received considerable attention since the 2014 groundbreaking work by Goodfellow et al. Such attention has led to an explosion in new ideas, techniques and applications of GANs. To better understand GANs we need to understand the mathematical foundation behind them. This paper attempts to provide an overview of GANs from a mathematical point of view. Many students in mathematics may find the papers on GANs more difficulty to fully understand because most of them are written from computer science and engineer point of view. The aim of this paper is to give more mathematically oriented students an introduction to GANs in a language that is more familiar to them.
When I started out, I had a strong quantitative background (chemical engineering undergrad, was taking PhD courses in chemical engineering) and some functional skills in programming. From there, I first dove deep into one type of machine learning (Gaussian processes) along with general ML practice (how to set up ML experiments in order to evaluate your models) because that was what I needed for my project. I learned mostly online and by reading papers, but I also took one class on data analysis for biologists that wasn’t ML-focused but did cover programming and statistical thinking. Later, I took a linear algebra class, an ML survey class, and an advanced topics class on structured learning at Caltech. Those helped me obtain a broad knowledge of ML, and then I’ve gained deeper understandings of some subfields that interest me or are especially relevant by reading papers closely (chasing down references and anything I don’t understand and/or implementing the core algorithms myself).
Consistency is a long standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.
Keeping the dialogue state in dialogue systems is a notoriously difficult task. We introduce an ontology-based dialogue manage(OntoDM), a dialogue manager that keeps the state of the conversation, provides a basis for anaphora resolution and drives the conversation via domain ontologies. The banking and finance area promises great potential for disambiguating the context via a rich set of products and specificity of proper nouns, named entities and verbs. We used ontologies both as a knowledge base and a basis for the dialogue manager; the knowledge base component and dialogue manager components coalesce in a sense. Domain knowledge is used to track Entities of Interest, i.e. nodes (classes) of the ontology which happen to be products and services. In this way we also introduced conversation memory and attention in a sense. We finely blended linguistic methods, domain-driven keyword ranking and domain ontologies to create ways of domain-driven conversation. Proposed framework is used in our in-house German language banking and finance chatbots. General challenges of German language processing and finance-banking domain chatbot language models and lexicons are also introduced. This work is still in progress, hence no success metrics have been introduced yet.