COMAD 2017

Keynote Talks

Day 1:

Challenges for an Analytical Ecosystem
Ramesh Bhashyam, Teradata Fellow and CTO, Teradata India

Day 2 and 3 (Joint with CODS)

Lise Getoor (Univ. California, Santa Cruz)
Sriram Raghavan (Director, IBM Research India) (Industrial Keynote)
Srini Parthasarathy (Ohio State Univ.)

Challenges for an Analytical Ecosystem

Ramesh Bhashyam, Teradata Fellow and CTO, Teradata India

Bio:

Ramesh Bhashyam has been with Teradata Corporation for over 20 years. His areas of interest include query optimization and parallel execution. He was voted a Teradata Fellow in 2010. Prior to Teradata, he worked for about 6 years at Inference Corporation, an AI company in Los Angeles. Ramesh has a bachelor's in electrical engineering and a master's in computer science, and has several patents to his credit.

Abstract:

Data generation has evolved from transactional data, first to behavioral data and then to observational, or sensor, data. The first step in this evolution was from transactional data to web log and machine-generated data. Social media data grew as more devices, such as smartphones, enabled human interactions. Sensor data marks the next phase of this evolution.

It is not data volume alone that requires attention; so does the complexity of the data model. There is strongly coupled, well-structured data, such as transactional data in relational tables. There is loosely coupled, semi-structured data, such as XML and JSON. Finally, there is unstructured data from free text, voice, and social media. Each of these has a different complexity and a different data model, or none at all.

Deriving value from such data requires combining these different sources. Unlike in the past, when analysis was restricted to highly structured transactional data, analytic value now comes from combining the different sources. Data becomes more meaningful when context is added to the data items. For example, knowing the repair history of a part makes it possible to derive meaning from its current sensor data. More context adds more insight. We show some use cases for such analysis in our talk.

An integrated analytic ecosystem is needed to derive value from the different forms of data. The ecosystem can be divided into three categories:
1) Data ingest, where all data is acquired and stored. The metric here is cost per TB of storage.
2) Data platform, where the right platform is used to store and manage the data. Data warehouses are a part of this category. Because there are different types of data, one consideration in this category is whether different forms of file system are appropriate or whether a single polymorphic file system should store all types.
3) Analytics, where machine learning and other forms of deep analytics are applied to the data. Considerations such as standalone systems versus multi-genre analytics are a part of this category.

There are many challenges in building and managing such an analytic ecosystem. Programming for access to cross-platform data and cross-analytic engines has several aspects. The options range from virtual data frames using procedural languages like R and Python to enhancements to SQL. Query optimization is no longer about optimizing for a single platform but across platforms, with more dimensions that must be optimized. Support must be provided for complex analytics, from UDFs in SQL to standalone open-source deep learning systems such as TensorFlow. Finally, acceleration technologies such as GPUs must find a place in the ecosystem to speed up execution. There are many other challenges. We will consider some of these in our talk.