Title: Big Data Curation (Slides)
A new mode of inquiry, problem solving, and decision making has become pervasive in our society, consisting of applying computational, mathematical, and statistical models to infer actionable information from large quantities of data. This paradigm, often called Big Data Analytics or simply Big Data, requires new forms of data management to deal with the volume, variety, and velocity of Big Data. Many of these data management problems can be described as data curation. Data curation includes all the processes needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to data. In this talk, I describe our experience in curating some open data sets. I give an overview of how we have adapted some of the traditional solutions for aligning data and creating semantics to account for (and take advantage of) Big Data.
Prof. Renée Miller received BS degrees in Mathematics and in Cognitive Science from the Massachusetts Institute of Technology. She received her MS and PhD degrees in Computer Science from the University of Wisconsin in Madison, WI. She is a Fellow of the Royal Society of Canada (Canada's National Academy) and the Bell Canada Chair of Information Systems at the University of Toronto. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers, as well as the National Science Foundation Career Award. She is a Fellow of the ACM, a former President of the VLDB Endowment, and was the Program Chair for ACM SIGMOD 2011 in Athens, Greece. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She was a co-recipient of the ICDT Test-of-Time Award for an influential 2003 paper establishing the foundations of data exchange.
Title: The Sublinear Approach to Big Data Problems (Slides)
We will discuss approaches to solving Big Data problems that use sublinear resources, such as storage, communication, time, and processors. We will also discuss potential models of computing that arise from this perspective. Finally, we will discuss new Big Data problems that arise from social network analysis, including ranking, scoring, and others.
Muthu is a Professor at Rutgers University, currently on leave. His research focuses on algorithms and databases. His recent work is on analyzing massive data streams and on economics and optimization problems in online ad systems.
Title: Lessons Learned in Building Real-time Big Data Systems (Slides)
In the Age of the Customer, enterprises must modernize their application infrastructure to use real-time big data to attract, engage, and retain consumers across devices, media, and channels. As more companies realize the impact that processing massive amounts of data in real time will have on business, making this change has become a top priority.
In his presentation, Srini will address the scale and scope of issues that developers face when building out real-time big data systems.
Srini brings 20-plus years of experience in designing, developing, and operating Web-scale infrastructures, and he holds over a dozen patents in database, Internet, mobile, and distributed system technologies. Srini co-founded Aerospike to solve the scaling problems he experienced with Oracle databases at Yahoo!, where, as senior director of engineering, he had global responsibility for the development, deployment, and 24x7 operations of Yahoo!'s mobile products, in use by tens of millions of users. Srini was also chief architect of IBM's DB2 Internet products, and he served as senior architect of digital TV products at Liberate Technologies. Srini has a B.Tech in Computer Science from IIT Madras and an M.S. and PhD in Databases from the University of Wisconsin-Madison.