Csaba István Sidló, Ph.D.
I am a researcher and software engineer with more than 10 years of experience in large-scale data management and analytics. My main interests are “big data” and “data science”, with applications that require scalable and elastic data management, e.g. processing and analyzing IoT, cellphone or web data. I hold a PhD in informatics. I am keen to learn, use and develop new technologies and to work in motivated teams, always broadening my skills and experience.
Education
- Eötvös Loránd University, Budapest
- 2003: MSc in Computer Science
- 2012: PhD in Informatics, Information Systems program, thesis: Business Intelligence on Scalable Architectures
- 2002: Friedrich Schiller Universität, Jena, Germany
Research Interests and Selected Publications
- Business intelligence on scalable architectures: data integration and cleaning
- Entity resolution, or deduplication, is my primary research area: the task of identifying and merging database records that refer to the same real-world entity is computationally hard, yet required in many analytics scenarios.
- Csaba István Sidló, András Garzó, András Molnár, András A. Benczúr: Infrastructures and Bounds for Distributed Entity Resolution. In Proceedings of the 9th International Workshop on Quality in Databases (QDB) in conjunction with VLDB 2011, 2011.
- Large scale real-time analytics
- Analyzing big data sets (e.g. IoT or mobile phone data) with low latency requires data streaming approaches, where achieving scalability and elasticity is still challenging.
- Garzó A, Benczúr A A, Sidló Cs I, Tahara D, Wyatt E F: Real-time streaming mobility analytics. In: Xiaohua Hu (editor), IEEE International Conference on Big Data, IEEE, 2013, pp. 697-702, ISBN 978-1-4799-1292-6.
- IoT data processing, crowd-sensing
- Viharos Zs J, Sidló Cs I, Benczúr A A, Csempesz J, Kis K B, Petrás I, Garzó A: “Big Data” Initiative as an IT Solution for Improved Operation and Maintenance of Wind Turbines. In: European Wind Energy Association (EWEA), Vienna, pp. 184-188, 2013.
- K. Farkas, G. Fehér, A. Benczúr, Cs. Sidló: Crowdsensing Based Public Transport Information Service in Smart Cities. IEEE Communications Magazine 53(8), pp. 158-165, 2015.
Selected Projects
- Extensive comparison of (IoT) data ingestion solutions, 2018
- Collecting and loading large amounts of data (produced, for example, by IoT devices) into analytical data processing platforms is a challenging task, where traditional business intelligence ETL tools can easily become a bottleneck. We comprehensively reviewed, tested and ranked the most promising solutions to the distributed “data ingestion” big data problem: NiFi, Kafka Connect, Gobblin, Spring Integration, StreamSets, Flume and Camel / ServiceMix.
- Mobile session drop prediction models, 2017-2018
- We built ML models that predict mobile call and network session drops on cell phones. Activities included maintaining Android data collection and test applications, managing and analyzing the gathered data, and building prediction models.
- Background data services for an IoT data market, 2016-2017
- Building a scalable background service to support storing, sharing and transforming large-scale IoT data (e.g. GPS locations or home automation data), applying distributed and cloud-enabled technologies such as Kafka, Spark and Couchbase, with intensive testing and profiling as main priorities.
- Integration and cleaning of insurance client data, 2015-2016
- Providing an integrated view of the client data of a Hungarian insurance company, in order to start a new customer master database and service. The task required scalable methods for deduplication, data transformation and other ETL tasks.
- Network data visualization and search, 2009-
- Graph (network) data is hard to search and analyze. We develop and provide client- and server-side tools and methods to efficiently search, analyze and visualize network data, e.g. insurance data for fraud detection.
- Data warehousing IT audit logs and webserver logs, 2009-2011
- Collecting, processing, storing and analyzing information-access events; designing and implementing data warehouses for an insurer and a telecom company.
Grants and Awards
- 2003: CEEPUS scholarship, Plovdiv, Bulgaria
- 2004: scientific team grant, ELTE, Faculty of Informatics, for being a co-author of the book “Algorithms of Informatics”
- 2005: ETIK grant (Inter-University Centre for Telecommunications and Informatics)
- 2008, 2011: MTA SZTAKI grants