Colazzo Dario - CV

LAMSADE

Colazzo Dario

Full Professor

Biography

My research and teaching focus on the efficient processing of massive semi-structured datasets, relying mainly on shared-nothing parallelism and distribution (Hadoop, Spark, Flink, ...). I am particularly interested in the formal aspects and applications of static analysis techniques for processing massive graph and JSON data.
I am currently working on the following projects: schema inference for massive JSON datasets, pre-filtering techniques for context-aware recommender systems, incremental saturation of massive RDF data, and type systems for graph databases and queries.

Publications

Articles

Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2024), Validation of Modern JSON Schema: Formalization and Complexity, Proceedings of the ACM on Programming Languages, vol. 8, p. 1451-1481

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2023), Negation-closure for JSON Schema, Theoretical Computer Science, vol. 955, p. 113823

Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2022), Witness Generation for JSON Schema, Proceedings of the VLDB Endowment, vol. 15, n°13, p. 4002-4014

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Parametric schema inference for massive JSON datasets, The VLDB Journal, vol. 28, n°4, p. 497-521

Bidoit N., Colazzo D., Malla N., Sartiani C. (2018), Evaluating Queries and Updates on Big XML Documents, Information Systems Frontiers, vol. 20, n°1, p. 63-90

Colazzo D., Ghelli G., Sartiani C. (2017), Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation, ACM Transactions on Database Systems, vol. 42, n°4, p. 1-44

Camacho-Rodriguez J., Colazzo D., Manolescu I. (2015), PAXQuery: Efficient Parallel Processing of Complex XQuery, IEEE Transactions on Knowledge and Data Engineering, vol. 27, n°7, p. 1977-1991

Nguyen B., Dudouet F-X., Colazzo D., Vion A., Manolescu I., Senellart P. (2011), XML content warehousing: Improving sociological studies of mailing lists and web data, BMS: Bulletin de méthodologie sociologique, vol. 112, n°1, p. 5-31

Dudouet F-X., Nguyen B., Colazzo D., Manolescu I., Vion A. (2010), Webstand, une plateforme de gestion de données web pour applications sociologiques, TSI: Technique et Science Informatiques, vol. 29, n°8-9, p. 1055-1080

Book chapters

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2020), Scalable Saturation of Streaming RDF Triples, in Abdelkader Hameurlain, A Min Tjoa, Philippe Lamarre, Karine Zeitouni, Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV: Special Issue on Data Management – Principles, Technologies, and Applications, Springer, p. 1-40

Conference papers with proceedings

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2021), An Empirical Study on the “Usage of Not” in Real-World JSON Schema Documents, in Aditya Ghose, Jennifer Horkoff, Vítor E. Silva Souza, Jeffrey Parsons, Joerg Evermann, Berlin Heidelberg, Springer International Publishing, 102-112 p.

Baazizi M-A., Berti C., Colazzo D., Ghelli G., Sartiani C. (2020), Human-in-the-Loop Schema Inference for Massive JSON Datasets, in 23rd International Conference on Extending Database Technology, EDBT 2020, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 635-638 p.

Fruth M., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2020), Challenges in Checking JSON Schema Containment over Evolving Real-World Schemas, in Georg Grossmann, Sudha Ram, Berlin Heidelberg, Springer International Publishing, 220-230 p.

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2019), Streaming saturation for large RDF graphs with dynamic schema information, in Alvin Cheung, Kim Nguyễn, Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages (DBPL 2019), New York, NY, ACM - Association for Computing Machinery, 42-52 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), A Type System for Interactive JSON Schema Inference (Extended Abstract), in Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, Stefano Leonardi, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 101:1-101:13 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Schemas And Types For JSON Data, in Melanie Herschel, Helena Galhardas, Berthold Reinwald, et al., 22nd International Conference on Extending Database Technology (EDBT 2019), Konstanz, OpenProceedings.org, 437-439 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Schemas and Types for JSON Data: From Theory to Practice, in Peter Boncz, Stefan Manegold, SIGMOD '19 Proceedings of the 2019 International Conference on Management of Data, New York, NY, ACM - Association for Computing Machinery, 2060-2063 p.

Vahidi Ferdousi Z., Colazzo D., Negre E. (2018), CBPF: Leveraging Context and Content Information for Better Recommendations, in Guojun Gan, Bohan Li, Xue Li, Shuliang Wang, Advanced Data Mining and Applications 14th International Conference, ADMA 2018, Springer, 381-391 p.

Vahidi Ferdousi Z., Colazzo D., Negre E. (2018), Correlation-Based Pre-Filtering for Context-Aware Recommendation, in George Roussos, Achilles Kameas, 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), IEEE - Institute of Electrical and Electronics Engineers

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2017), Counting types for massive JSON datasets, New York, NY, ACM - Association for Computing Machinery, 1-12 p.

Vahidi Ferdousi Z., Negre E., Colazzo D. (2017), Context factors in context-aware recommender systems, in AISR 2017: Atelier interdisciplinaire sur les systèmes de recommandation, Paris, Conservatoire national des arts et métiers

Camacho-Rodríguez J., Colazzo D., Herschel M., Manolescu I., Roy Chowdhury S. (2016), Reuse-based Optimization for Pig Latin, in Snehasis Mukhopadhyay, ChengXiang Zhai, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM'16), New York, ACM - Association for Computing Machinery, 2215-2220 p.

Colazzo D., Sartiani C. (2015), Typing regular path query languages for data graphs, in James Cheney, Thomas Neumann, DBPL 2015 Proceedings of the 15th Symposium on Database Programming Languages, ACM - Association for Computing Machinery, 69-78 p.

Colazzo D., Sartiani C. (2014), Typing query languages for data graphs, in Elisa Bertino, Goce Trajcevski, 2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW), IEEE - Institute of Electrical and Electronics Engineers, 28-31 p.

Colazzo D., Roatis A., Manolescu I., Goasdoué F. (2014), RDF Analytics: Lenses over Semantic Graphs, in Torsten Suel, WWW '14 Proceedings of the 23rd international conference on World wide web, Seoul, ACM, 467-478 p.

Camacho-Rodríguez J., Colazzo D., Manolescu I. (2014), PAXQuery: A Massively Parallel XQuery Processor, in DanaC'14 Proceedings of Workshop on Data analytics in the Cloud, ACM - Association for Computing Machinery, 1-4 p.

Bidoit N., Colazzo D., Malla N., Ulliana F., Nolè M., Sartiani C. (2013), Processing XML queries and updates on map/reduce clusters, in Giovanna Guerrini, Norman W. Paton, EDBT '13 Proceedings of the 16th International Conference on Extending Database Technology, ACM - Association for Computing Machinery, 745-748 p.

Conference papers without proceedings

Attouche L., Baazizi M-A., Colazzo D., Ding Y., Fruth M., Ghelli G., Sartiani C., Scherzinger S. (2021), A Test Suite for JSON Schema Containment, ER Posters/Demo 2021 - 40th International Conference on Conceptual Modeling, St. John's, Canada

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2020), Not Elimination and Witness Generation for JSON Schema (short version), 36ème Conférence sur la Gestion de Données – Principes, Technologies et Applications, Paris, France

Camacho-Rodriguez J., Colazzo D., Herschel M., Manolescu I., Roy Chowdhury S. (2014), Reuse-based Optimization for Pig Latin, BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France

Camacho-Rodriguez J., Colazzo D., Manolescu I. (2014), PAXQuery: Efficient Parallel Processing of Complex XQuery, BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France

Colazzo D., Goasdoué F., Manolescu I., Roatis A. (2013), Warehousing RDF Graphs, BDA'2013: 29e journées Bases de Données Avancées, Oct 2013, Nantes, France

Vion A., Manolescu I., Nguyen B., Colazzo D., Senellart P., Dudouet F-X. (2009), The WebStand Project, WebSci'09: Society On-Line Conference, Athens, Greece

Preprints / Working papers

Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2023), Validation of Modern JSON Schema: Formalization and Complexity, Paris, Preprint Lamsade, 1-49 p.

Reports

Nguyen B., Dudouet F-X., Colazzo D., Manolescu I. (2008), A Source Centric Temporal Model, 6 p.
