Data Management and Programming
Course instructor:
Teaching hours: 36
Course description:
This course provides an introduction to programming and to data management from a data-oriented point of view. The course has two parts. The data management part introduces the data life cycle in data-oriented projects, from data collection to data exploration. While the main focus of the course is tabular data, it also includes an introduction to entity-relationship models and to relational databases. The programming part introduces the fundamental aspects of imperative programming and the use of the main Python data structures. The two parts are tightly integrated: each aspect of data management is illustrated with suitable programming constructs and uses specific Python data structures. In addition, the course provides an introduction to computational complexity and assesses the scalability of all the methods it presents.
Recommended prerequisites:
Most of the course is self-contained, but students are expected to be familiar with the mathematical tools associated with an economics curriculum: linear algebra, calculus, continuous optimization, probability and statistics, all at an undergraduate level. A significant part of the data manipulation examples in the course will make use of this mathematical knowledge. However, the course should be accessible even with only a cursory knowledge of most of the listed concepts.
Coefficient: 1 (for the M1 Affaires Internationales et Développement), 1 (for the M1 Quantitative Economics)
Skills to be acquired:
The first objective of the course is to introduce students to data-driven projects by presenting the first steps of such projects, from data collection to data exploration. Acknowledging the strong limitations of integrated software that relies solely (or mostly) on graphical user interfaces, the second major objective of the course is to provide the programming knowledge and tools needed to implement all of those data management steps using the Python language.
After attending the classes, students will be able to:
- specify a data management chain adapted to a data-driven project;
- identify how much value can be added to the data at each step of the chain;
- implement those steps in Python: data cleaning, data storage, data aggregation and other queries, data exploration;
- more generally, implement non-trivial data manipulation schemes in Python;
- assess the computational complexity of Python scripts.
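As an illustration of the kind of script these objectives refer to, here is a minimal sketch of a cleaning step followed by an aggregation step, written with built-in Python data structures only. The records, the field names (`region`, `amount`) and the function names are invented for this example and are not taken from the course material.

```python
from collections import defaultdict

# Made-up raw records, in the form they might have after reading a CSV file:
# every field is still a string, one value is missing, one label is untidy.
raw_rows = [
    {"region": "north", "amount": "10.0"},
    {"region": "north", "amount": "20.0"},
    {"region": "south", "amount": ""},
    {"region": "south", "amount": "30.0"},
    {"region": " South ", "amount": "50.0"},
]


def clean(rows):
    """Data cleaning: normalise labels and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        value = row["amount"].strip()
        if not value:
            continue  # skip rows without a usable value
        cleaned.append(
            {"region": row["region"].strip().lower(), "amount": float(value)}
        )
    return cleaned


def mean_by_region(rows):
    """Data aggregation: mean amount per region, in a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
        counts[row["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


print(mean_by_region(clean(raw_rows)))
```

Both functions visit each row once and rely on dictionary lookups, so their running time grows roughly linearly with the number of rows.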
Assessment:
The final grade will combine two components:
- a continuous assessment grade, based mostly on quiz results (approximately 50% of the grade) and also taking into account oral participation in class and regular attendance;
- a grade for a complete data-oriented project, from data collection to data exploration (preferably carried out in groups of 2 students).
Bibliography, recommended reading:
Python for Data Analysis, Wes McKinney, O'Reilly, 2017.