COMAD 2016

Title:
How to Stop Under-Utilization and Love Multicores

Abstract:
Hardware trends oblige software to overcome three major challenges to system scalability: (1) taking advantage of the implicit/vertical parallelism within a core that is enabled by aggressive microarchitectural features, (2) exploiting the explicit/horizontal parallelism provided by multicores, and (3) achieving predictably efficient execution despite the variability in communication latencies among cores on multisocket multicores. In this tutorial, we shed light on these three challenges and survey recent proposals to alleviate them. The first part of the tutorial describes the instruction- and data-level parallelism opportunities in a core, from both the hardware and the software side. In addition, it examines the sources of under-utilization in a modern processor and presents insights and hardware/software techniques to better exploit the microarchitectural resources of a processor by improving cache locality at the right level of the memory hierarchy. The second part focuses on the scalability bottlenecks of database applications at the level of multicore and multisocket multicore architectures. It first presents a systematic way of eliminating such bottlenecks in online transaction processing workloads, based on minimizing unbounded communication, and shows several techniques that minimize bottlenecks in major components of database management systems. Then, it demonstrates the data- and work-sharing opportunities for analytical workloads, and reviews advanced scheduling mechanisms that are aware of non-uniform memory accesses and alleviate bandwidth saturation.

Presenter Biography: Danica Porobic is a final-year PhD student working under the supervision of Professor Anastasia Ailamaki in the Data-Intensive Applications and Systems (DIAS) Laboratory at EPFL. Her research focuses on designing scalable transaction processing systems for non-uniform hardware platforms. She graduated at the top of her class with an MSc and a BSc in Informatics from the University of Novi Sad and has worked at Oracle Labs and Microsoft SQL Server.

Title:
Towards Querying Uncertain Graphs

Abstract:
Large-scale, highly interconnected networks pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data due to a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database research community. The challenges in uncertain graph processing are driven by both semantics and computation. From the semantics perspective, there is no uniform model of uncertain graphs; rather, the assignment and interpretation of the probabilities are application specific. For example, how can we define the shortest path between two nodes in an uncertain graph? From the computation perspective, many classical graph queries, such as reachability and shortest path queries, become #P-complete and hence more expensive in uncertain graphs. At the same time, various complex queries are emerging over uncertain networks, e.g., pattern matching, information diffusion, and influence maximization queries. In this tutorial, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, and the complexities of and algorithmic advances in uncertain graph processing in the context of both classical and emerging graph queries. We emphasize the current challenges and highlight some future research directions.

Presenter Biography: Arijit Khan is an assistant professor in the School of Computer Engineering at Nanyang Technological University. His research interests span big data, big graphs, and graph systems. He received his PhD from the Department of Computer Science, University of California, Santa Barbara, and did a post-doc in the Systems group at ETH Zurich. Arijit is the recipient of the prestigious IBM PhD Fellowship in 2012-13. He has published several papers in premier database and data mining conferences and journals including SIGMOD, VLDB, TKDE, ICDE, SDM, EDBT, and CIKM. Arijit co-presented tutorials on emerging graph queries, big-graph systems, and uncertain graphs at ICDE 2012, VLDB 2014, and VLDB 2015, and has served on the program committees of KDD, SIGMOD, ICDM, EDBT, WWW, and CIKM. Arijit served as co-chair of the Big-O(Q) workshop co-located with VLDB 2015.

Title:
Tools, Techniques and Case Studies for Text Analytics for Real World Problems

Abstract:
In this tutorial, we will present commonly used techniques for Entity Extraction, Entity Resolution & Text Mining to handle real-world problems. The tutorial will focus on a few case studies based on social data. We will present the unique challenges that we encountered during client engagements and how we extended the traditional techniques to handle the nuances. We will also cover some of the state-of-the-art tools for this purpose. Finally, we will conclude the tutorial with some open-ended problems and in-progress work. A brief tentative outline follows:
Introduction to Big Text Data
Key Steps in Extracting Insights from Text Data
Motivating examples to show nuances and finer points in Sentiment Mining
Introduction to Entity Extraction and Integration Languages
Case Study - Monitoring Social Channels for Elections
Joint Analysis of Text and Links - Followers, Friends, etc.
Open/WIP Research Problems: Identifying News Broadcasters & Activity Miner

Speaker Biographies: Sameep Mehta is a Senior Researcher and Manager at IBM Research - India. He received his Ph.D. in Data Mining and Visualization from Ohio State University. His current research interests are Data Mining, Text Mining, Machine Learning, Big Data Technologies, Social Data Analytics and Knowledge Graphs. He has published extensively in top conferences in Data Mining, Services and Visualization. He is a regular speaker at conferences and was PC chair for the Big Data Analytics Conference 2014. He also serves as Adjunct Faculty at IIIT-Delhi in the area of Data Analytics.

Dr. Dhaval Patel received his PhD from the National University of Singapore in 2011. His PhD work focused on analyzing complex medical data to build efficient classifiers. After 2011, Dr. Patel worked for two years as a post-doctoral research fellow in the area of spatio-temporal data analysis. Dr. Patel joined IIT-Roorkee in 2013. His current research interest lies in analyzing real-world datasets (mostly text and location-aware) to build useful applications. He continues to publish papers in top-tier conferences (IJCAI-2015, CIKM-2015, DASFAA-2015, DaWaK-2015).