Tutorials

Tutorial 1: Mining smartphone and mobility data

Spiros Papadimitriou, Rutgers University
Tina Eliassi-Rad, Northeastern University
Katharina Morik, TU Dortmund
Dimitrios Gunopulos, University of Athens

Tuesday 13th, 11:00h-16:10h - Room A1

The availability of fast mobile data networks has spurred demand for more capable mobile computing devices. Conversely, the emergence of new devices has increased demand for better networks, creating an innovation cycle. Although smartphones (always-connected computing devices with multiple sensors) are less than a decade old, they will soon outnumber “traditional” computers, enabling data collection and analysis across a broad range of applications. We survey the state of the art in mining mobility data across different application areas, in three parts. (1) We summarize the possibilities and challenges in data collection from various sensing modalities. (2) We outline cross-cutting algorithms for mobile mining, such as context-aware analytics and resource-constrained models. (3) We focus on how these can be usefully applied to broad classes of applications, including app usage mining, mobile advertising, and search. We conclude by showcasing the opportunities for new data collection techniques and mining methods to meet the challenges and applications that are unique to the mobile arena.

URL: http://mobilemining.clusterhack.net

Spiros Papadimitriou is mainly interested in data mining for graphs and streaming data, clustering, time series, large-scale data processing, and mobile applications. His interests span from the very small (embedded devices and sensors; Arduino) to the very large (large-scale data processing and analysis; Hadoop). He has published more than forty papers on these topics in refereed conferences and journals. He received the best paper award at SDM 2008, has three invited journal publications in best paper issues, has authored several book chapters, and has filed multiple patents. He has also been invited to give keynote talks and tutorials on various topics, including graph and social network analysis, time series, stream mining, and large-scale analytics. In the past, he has also developed and released a number of Android applications that have over 50,000 downloads. He is currently an associate professor at Rutgers University (MSIS-RBS). Prior to that, he was a research scientist at Google and a research staff member at IBM Research. He was a Siebel scholarship recipient in 2005. He obtained his MSc and PhD degrees from Carnegie Mellon University.

 

Tina Eliassi-Rad is an Associate Professor of Computer Science at Northeastern University in Boston, MA. She is also on the faculty of Northeastern’s Network Science Institute. Prior to joining Northeastern, Tina was an Associate Professor of Computer Science at Rutgers University; and before that she was a Member of Technical Staff and Principal Investigator at Lawrence Livermore National Laboratory. Tina earned her Ph.D. in Computer Sciences (with a minor in Mathematical Statistics) at the University of Wisconsin-Madison. Her research is rooted in data mining and machine learning; and spans theory, algorithms, and applications of massive data from networked representations of physical and social phenomena. Tina’s work has been applied to personalized search on the World-Wide Web, statistical indices of large-scale scientific simulation data, fraud detection, mobile ad targeting, and cyber situational awareness. Her algorithms have been incorporated into systems used by the government and industry (e.g., IBM System G Graph Analytics) as well as open-source software (e.g., Stanford Network Analysis Project). In 2010, she received an Outstanding Mentor Award from the Office of Science at the US Department of Energy. For more details, visit http://eliassi.org.

 

Katharina Morik is a full professor of computer science at TU Dortmund University, Germany. She earned her Ph.D. (1981) at the University of Hamburg and her habilitation (1988) at the TU Berlin. Starting with natural language processing, her interest moved to machine learning, ranging from inductive logic programming to statistical learning, and then to the analysis of very large data collections, high-dimensional data, and resource awareness. Her aim of sharing scientific results leads her to strongly support both open-source products and the students who contribute to them. For instance, RapidMiner started out at her lab, which continues to contribute to it. Since 2011 she has led the collaborative research center SFB876 on resource-aware data analysis, an interdisciplinary center comprising 12 projects, 19 professors, and about 50 Ph.D. students or postdocs. She was a member of the first Steering Committee of the IEEE International Conference on Data Mining and chaired the program of that conference in 2004. She was the program chair of the European Conference on Machine Learning (ECML) in 1989 and one of the program chairs of ECML PKDD 2008. She is on the editorial boards of the international journals “Knowledge and Information Systems” and “Data Mining and Knowledge Discovery”.

 

Dimitrios Gunopulos received his PhD from Princeton University in 1995. He was a Postdoctoral Fellow at the Max-Planck-Institut for Informatics, Research Associate at the IBM Almaden Research Center, Visiting Researcher at the University of Helsinki, Assistant, Associate, and Full Professor at the Department of Computer Science and Engineering at the University of California, Riverside, and Visiting Researcher at Microsoft Research, Silicon Valley. His research is in the areas of Data Mining, Data Management, Databases, Sensor and Peer-to-Peer systems, and Algorithms. He has co-authored a book and over a hundred journal and conference papers that have been widely cited (h-index 62). He has 12 Ph.D. students who have joined industry labs or taken academic positions. His research has been supported by NSF, the DoD, the European Commission, the General Secretariat of Research and Technology, AT&T, Yahoo, and Nokia. He has served as a General co-Chair of SIAM SDM 2017, HDMS 2011 and IEEE ICDM 2010, and as a PC co-Chair of ECML/PKDD 2011, IEEE ICDM 2008, ACM SIGKDD 2006, SSDBM 2003 and DMKD 2000.

Tutorial 2: Network Representation Learning: A Revisit in Big Data Era

Peng Cui, Tsinghua University
Wenwu Zhu, Tsinghua University

Tuesday 13th, 16:30h-18:10h - Room A1

Nowadays, more and more applications are based on larger and larger networks. It is well recognized that network data is sophisticated and challenging. To process graph data effectively, the first critical challenge is network data representation, that is, how to represent networks properly so that advanced analytic tasks, such as pattern discovery, analysis, and prediction, can be conducted efficiently in both time and space. In this tutorial, we will present recent thoughts and achievements on network representation. More specifically, we will discuss the fundamental problems in network representation learning: why we need to revisit network representation, what the research goals of network representation are, and how network representations can be learned.

Peng Cui is an Assistant Professor at Tsinghua University. He received his PhD degree from Tsinghua University in 2010. His research interests include network representation learning, social dynamics modeling and human behavioral modeling. He has published more than 60 papers in prestigious conferences and journals in data mining and multimedia. His recent research won the ICDM 2015 Best Student Paper Award, SIGKDD 2014 Best Paper Finalist, IEEE ICME 2014 Best Paper Award, ACM MM12 Grand Challenge Multimodal Award, and MMM13 Best Paper Award. He has served as Area Chair of ICDM 2016, ACM MM 2014-2015, IEEE ICME 2014-2015, and ICASSP 2013, and is an Associate Editor of ACM TOMM and the Elsevier journal Neurocomputing. He was the recipient of the ACM China Rising Star Award in 2015.

 

Wenwu Zhu is with the Computer Science Department of Tsinghua University as a Professor of the “1000 People Plan” of China. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and the Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as a Member of Technical Staff during 1996-1999. Wenwu Zhu is an IEEE Fellow, SPIE Fellow and ACM Distinguished Scientist. He has published over 200 refereed papers in the areas of multimedia computing, communications and networking. He is inventor or co-inventor of over 40 patents. His current research interests are in the area of social media computing and multimedia communications and networking. He has served on various editorial boards, including as Guest Editor for the Proceedings of the IEEE, IEEE T-CSVT, and IEEE JSAC, and as Associate Editor for IEEE Transactions on Mobile Computing, IEEE Transactions on Multimedia, and IEEE Transactions on Circuits and Systems for Video Technology. He served as TPC Co-Chair of IEEE ISCAS 2013 and ACM Multimedia 2014.

Tutorial 3: The Evolution of Natural Language Understanding and Prediction Technologies

Nicolae Duta, Microsoft New England Research and Development Center

Wednesday 14th, 11:00h-16:10h - Room A1

Scientists have long dreamed of creating machines humans could interact with by voice. Although one no longer believes Turing’s prophecy that machines will be able to converse like humans in the near future, real progress has been made in voice- and text-based human-machine interaction. After five decades of research, natural language understanding and prediction technology has become an essential part of many human-machine interaction systems (and even human-to-human ones: automated translation and speech-to-speech systems). There are now voice-based personal assistants, search and transactional systems for most smart phone platforms. The technology is pushed even further by search engines, which have evolved from simple keyword search to semantic search (they can now provide direct answers to a wide range of questions).

This tutorial aims to give the machine learning and data mining community an overview of the deployed natural language technologies and their historical evolution. We review two fundamental problems involving natural language: the language prediction problem and the language understanding problem. The presentation focuses on the theory and algorithms used to build voice/text-based human-computer interaction systems, from the early automated directory assistance to today’s smart-phone virtual assistants and semantic web search.

Nicolae Duta received the B.S. degree in applied mathematics from the University of Bucharest (Romania) in 1991, the D.E.A. degree in statistics from the University of Paris-Sud (France) in 1992, the M.S. degree in computer science from the University of Iowa in 1996 and the Ph.D. degree in computer science and engineering from Michigan State University in 2000. He is currently a senior scientist in the Advanced Data Science Group at Microsoft in Cambridge, MA, working on machine learning technologies applied to speech, language and vision systems. Previously he was part of the Applications & Services Group at Microsoft, working on query understanding for Bing and the Cortana personal assistant. From 2006 to 2013 he was a member of the Natural Language Understanding and Language Modeling groups at Nuance Communications, Burlington, MA, where he developed Dragon Go – the first generation of voice-based personal assistants for smart phones. From 2000 to 2005 he was a scientist in the Speech and Language Processing department at BBN Technologies, Cambridge, MA. He also held temporary research positions at INRIA-Rocquencourt (France) in 1993 and Siemens Corporate Research (Princeton, NJ) from 1997 to 1999. He is a member of IEEE and his current research interests include computer vision, pattern recognition, language understanding, automatic translation, machine and biological learning.

Tutorial 4: Core Decomposition of Networks: concepts, algorithms and applications

Fragkiskos D. Malliaros, University of California, San Diego
Apostolos N. Papadopoulos, Aristotle University of Thessaloniki, Greece
Michalis Vazirgiannis, Ecole Polytechnique, France

Thursday 15th, 11:00h-16:10h - Room A1

Graph mining is an important research area with a plethora of practical applications. Core decomposition in networks is a fundamental operation strongly related to more complex mining tasks such as community detection, dense subgraph discovery, identification of influential nodes, network visualization, and text mining, to name just a few. In this tutorial, we present in detail the basic concepts and properties related to core decomposition in graphs, the associated algorithms for its efficient computation, and some of the most important applications that benefit from it.

URL: http://fragkiskos.me/projects/core_tutorial/

Fragkiskos D. Malliaros is currently a data science postdoctoral scholar in the Department of Computer Science and Engineering at UC San Diego. Immediately before that, he was a postdoctoral researcher at Ecole Polytechnique, France, where he also received his Ph.D. degree in 2015. He obtained his Diploma and his M.Sc. degree from the University of Patras, Greece, in 2009 and 2011 respectively. He is the recipient of the 2012 Google European Doctoral Fellowship in Graph Mining and the 2015 Thesis Prize by Ecole Polytechnique. During the summer of 2014, he was a research intern at the Palo Alto Research Center (PARC), working on anomaly detection in social networks. His research interests span the broad areas of data mining, algorithmic data analysis and data management, with a focus on mining and analysis of large, time-evolving graphs.

 

Apostolos N. Papadopoulos received his 5-year Diploma degree in Computer Science and Engineering from the University of Patras and his Ph.D. degree from Aristotle University of Thessaloniki in 1994 and 2000 respectively. His research interests include databases, data mining and big data analytics. In 2008, the paper entitled “SkyGraph: An Algorithm for Important Subgraph Discovery” received the award for the best Knowledge Discovery paper in ECML/PKDD 2008. Moreover, the paper “Metric-Based Top-k Dominating Queries”, presented at EDBT 2014, was selected as the best paper, and an extended version appears in ACM Transactions on Database Systems. Currently, he is an Associate Professor at the Department of Informatics of Aristotle University of Thessaloniki and a member of the Data Science and Engineering Lab.

 

Michalis Vazirgiannis is a Professor at Ecole Polytechnique, France and the leader of the Data Science and Mining (DaSciM) team. He holds a degree in Physics and an M.Sc. in Robotics, both from the University of Athens, Greece, and an M.Sc. in Knowledge Based Systems from Heriot-Watt University (Edinburgh, UK). He received his Ph.D. degree from the Dept. of Informatics, University of Athens. He has worked as a researcher in different places: NTUA; GMD-IPSI (currently Fraunhofer-IPSI), Germany; Fern-Universitaet Hagen; project VERSO (later GEMO) at INRIA/Paris; IBM India Research Laboratory; and MPI fuer Informatik (Saarbruecken, Germany). He held a Marie Curie Intra-European fellowship in the area of P2P Web Search, hosted by INRIA FUTURS, Paris. His research interests are in graph mining, text mining and recommendation algorithms. He holds the “AXA Data Science” chair at Ecole Polytechnique and has collaborations with industry, including Google and Airbus.

ICDM 2016