========================================================================

Impact of Cloud Computing on Emerging Software System Architecture and Solutions

Hamid Pirahesh, IBM Almaden Research Center

Abstract

Information technology is going through a fundamental change, influenced primarily by (1) flexible provisioning and scalability of cloud computing, (2) the rise of analytics around semi-structured and unstructured data in the context of semantically rich data objects in mainstream data processing, (3) much increased human interaction with the web due to the use of mobile devices, particularly for more critical financial transactions and for purchasing goods and services, (4) the web-scale programming community with Web 2.0, search, and open software, and (5) the rise of SaaS (Software as a Service). The continuous arrival of huge amounts of data from numerous sources requires continuous discovery of information. Unstructured and semi-structured data dominate this space. Web-scale solutions require new approaches to integration and information composition, such as Web 2.0 mash-ups. The variability of incoming information requires semi-structured repositories with flexible schemas and an associated query language. Cloud computing is driven mainly by commercial applications. However, high-performance scientific computing can also benefit significantly from cloud computing. There is a particular emphasis on breaking the complexity barrier of today's solutions through simplification. The lifetime cost of ownership of solutions is dominated by the human time spent building, operating, and evolving them. The much greater compute power available in cloud computing enables us to reduce this complexity by replacing fragile and complex machine-optimized programs with simpler, more stable, and more scalable ones. The flexibility and much quicker provisioning of cloud computing, combined with a much reduced cost per terabyte/flop, are key factors in much faster deployment of solutions.

About the Speaker

Hamid Pirahesh, Ph.D., is an IBM Fellow and the manager of the DataBase Technology Institute (DBTI) at the IBM Almaden Research Center in San Jose, California. Pirahesh is an IBM Master Inventor and a member of the IBM Academy. He also has direct responsibilities in various aspects of IBM information management products, including DB2, in the areas of architecture, design, and development. He is a senior manager responsible for the exploratory database research department at the IBM Almaden Research Center.

========================================================================

Scalable Text Processing with MapReduce

Jimmy Lin, University of Maryland

Abstract

Over the past couple of decades, text processing has seen the emergence and later dominance of empirical techniques and data-driven research. An impediment to progress today is the need for scalable algorithms to cope with the vast quantities of available data. The only practical solution to large-data challenges today is to distribute the computation across multiple machines. Cluster computing, however, is fraught with difficulties ranging from scheduling to synchronization. Recently, MapReduce has emerged as an attractive alternative to traditional programming models. It provides a simple functional abstraction that hides many system-level issues, allowing the researcher to focus on actually solving the problem. In this talk, I will give an overview of "cloud computing" projects at the University of Maryland, illustrating how we've used MapReduce to tackle a range of problems in text processing.
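The functional abstraction mentioned above can be illustrated with a minimal, single-machine sketch in Python. It is not taken from the talk: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical, and a real MapReduce framework would run the map and reduce calls in parallel across a cluster and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every token in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(docs):
    """Run map, shuffle, and reduce sequentially over a dict of documents."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in docs.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical two-document corpus used only for illustration.
docs = {"d1": "the cat sat", "d2": "the dog sat down"}
counts = word_count(docs)
```

The user writes only the map and reduce functions; everything in shuffle and word_count stands in for work the framework would otherwise do.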

About the Speaker

Jimmy Lin is an assistant professor in the iSchool at the University of Maryland. He is affiliated with the Laboratory for Computational Linguistics and Information Processing (CLIP) and the Human-Computer Interaction Laboratory (HCIL) in UMD's Institute for Advanced Computer Studies (UMIACS). Jimmy graduated with a Ph.D. in computer science from MIT in 2004, and his research lies at the intersection of information retrieval and natural language processing. He leads the Google/IBM Academic Cloud Computing Initiative at the University of Maryland.

========================================================================

Scalable Text Processing with MapReduce

Jim Rivera, Salesforce

Abstract

As we look to build upon the success of the Software-as-a-Service delivery model, custom application platforms delivered as a service have emerged as one of the hottest topics of the day. In this talk, we will discuss the concept of Platform-as-a-Service (PaaS), look at some of the different types of PaaS offerings available today, and explore Force.com, the PaaS offering from Salesforce.com.

About the Speaker

Jim Rivera is VP of Product Management for the Force.com platform at Salesforce.com. Jim is a frequent speaker on Software-as-a-Service, distributed computing, and Service-Oriented Architecture, and has over 15 years of experience in software development, distributed computing systems, and electrical engineering. Prior to joining Salesforce.com, Jim held various product leadership positions at enterprise software companies, including BEA Systems and Cape Clear Software.

========================================================================

Hive: Data Warehousing and Analytics on Hadoop

Joydeep Sen Sarma and Ashish Thusoo, Facebook

Abstract

Facebook uses Hadoop for a variety of tasks: classic log aggregation, graph mining, text analysis, and indexing. The first half of this talk will cover Hive, a soon-to-be open-source data warehousing layer on top of Hadoop that allows SQL-like queries (along with streaming-based scripting) on log and dimension tables stored as flat files and directories in Hadoop. Hive includes loading utilities to import data from various sources, support for object-oriented data types (from RecordIO/Thrift), a shell to browse tables and issue queries, sampling, and more. In the second half, we will focus on a summary of our experience with Hadoop: feature requests, lessons learned, and our plans to contribute some features back to the open-source community.

About the Speakers

Joydeep is an engineer in the Facebook Data Infrastructure team. He has worked at Oracle, NetApp and Yahoo in the past. He is interested in building scalable systems for data storage and processing, advertising systems for the internet and data mining.

Ashish is an engineer in the Facebook Data Infrastructure team. In the past, he worked at Oracle in the Parallel Query group and then the XML DB group. He is interested in large-scale distributed systems for analytics, data warehousing, and data mining.

========================================================================

Take an internal look at Hadoop

Hairong Kuang, Yahoo

Abstract

The talk will present the architecture of Hadoop, an efficient, scalable, and reliable framework for storing and processing petabytes of data on large clusters built of commodity hardware. Hadoop implements a computational paradigm named Map/Reduce. It also provides a distributed file system, HDFS, that splits data into blocks and stores multiple replicas of each block on the compute nodes.
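The idea of storing a file as replicated blocks can be sketched in a few lines of Python. This is an illustration only: the block size, replication factor, node names, and the round-robin placement rule are all assumptions chosen for the example, and none of it reflects HDFS's actual placement policy or API.

```python
import math


def split_into_blocks(file_size, block_size):
    """Return the number of fixed-size blocks needed to hold file_size bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    A toy placement rule for illustration, not HDFS's real policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


MB = 1024 * 1024
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster

# Assumed parameters: a 200 MB file, 64 MB blocks, 3 replicas per block.
num_blocks = split_into_blocks(200 * MB, 64 * MB)
placement = place_replicas(num_blocks, nodes, replication=3)
```

Because every block lives on several distinct nodes, losing any single node in this sketch still leaves at least two copies of each block available.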

About the Speaker

Hairong Kuang received her Ph.D. from the Department of Information and Computer Science at the University of California, Irvine. She started her career as an Assistant Professor in the Department of Computer Science at the California State Polytechnic University, Pomona. She is now a senior software engineer working in the grid computing team at Yahoo.

========================================================================

App Engine: Building a Scalable Web Application on Google's infrastructure

Mano Marks, Google

Abstract

Google App Engine enables you to build web applications on the same scalable systems that power Google applications. This talk will give an overview of Google App Engine, cover some best practices for taking full advantage of App Engine, and cover the new features recently added, the Memcache and Image APIs.

About the Speaker

Mano Marks is a Developer Advocate at Google. He works on helping developers use Google APIs and infrastructure to build their applications.

========================================================================

Jaql: Querying JSON data on Hadoop

Kevin Beyer, IBM Almaden Research

Abstract

We introduce Jaql, a query language for the JSON data model. JavaScript Object Notation, or JSON, has become a popular data format for many Web-based applications because of its simplicity and modeling flexibility. In contrast to XML, which was originally designed as a markup language, JSON was actually designed for data. JSON makes it easy to model a wide spectrum of data, ranging from homogeneous flat data to heterogeneous nested data, and it can do this in a language-independent format. We believe that these characteristics make JSON an ideal data format for many Hadoop applications and databases in general. This talk will describe the key features of Jaql and show how it can be used to process JSON data in parallel using Hadoop's map/reduce framework.
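To make the point about heterogeneous nested data concrete, here is a small sketch written in plain Python rather than Jaql, whose syntax the abstract does not show. The records, field names, and threshold are invented for illustration; the example only demonstrates that JSON records with differing shapes can be filtered and projected uniformly.

```python
import json

# Hypothetical records: each has a different set of fields and nesting.
raw = """
[
  {"user": "ann", "clicks": 3},
  {"user": "bob", "clicks": 7, "tags": ["mobile", "beta"]},
  {"user": "cy", "profile": {"country": "US"}, "clicks": 5}
]
"""

records = json.loads(raw)

# Filter and project in one pass; .get() tolerates fields some records lack.
active = [
    {"user": r["user"], "tags": r.get("tags", [])}
    for r in records
    if r.get("clicks", 0) >= 5
]
```

A filter followed by a projection of this kind maps naturally onto a map-only job, since each record can be handled independently.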

About the Speaker

Kevin Beyer is a Research Staff Member at the IBM Almaden Research Center. His research interests are in information management, including query languages, analytical processing, and indexing techniques. He has been designing and implementing Jaql, in one form or another, for the past several years. Previously, he led the design and implementation of the XML indexing support in DB2 pureXML.

========================================================================

DryadLINQ - a language for data-parallel computation on computer clusters

Mihai Budiu, Microsoft

Abstract

Dryad is a distributed execution platform for computer clusters. LINQ is an extension to .NET for declarative data-parallel programming. DryadLINQ is a compiler that maps LINQ programs to distributed computations running on the Dryad platform. It is designed to bring together the "best of all worlds": reliable distributed computation (using Dryad), a strongly typed high-level language (LINQ), and a great development environment (Visual Studio).

About the Speaker

Mihai Budiu is a researcher at Microsoft Research in Silicon Valley. He has been working on high-level hardware synthesis, reconfigurable hardware, program security, and distributed computing.

========================================================================

Cloud Architectures – New way to design architectures by building it in the cloud

Jinesh Varia, Amazon

Abstract

In this session, Seattle-based Jinesh Varia, Evangelist for Amazon Web Services, will discuss the latest innovations and new technology trends, such as utility computing (paying by the hour and by the gigabyte used), virtualization, and web services in the cloud, and, most importantly, a new and emerging way of designing architectures.

In this session, we will learn about a unique way to build service-oriented architectures in the cloud. Cloud Architectures are service-oriented architectures designed for on-demand infrastructure. The infrastructure comes alive when triggered by a user request or a job request, scales elastically up and down, and disposes of itself automatically once the job has been processed. We will learn how developers can take advantage of this new concept to scale up their infrastructure quickly and programmatically (using web services), without any heavy upfront infrastructure investment. We will also see how these technologies, often termed cloud computing, are changing the way we do business today.

Amazon Web Services provides Amazon Elastic Compute Cloud (Amazon EC2), which allows machines to be requisitioned on demand with a simple web service call and paid for by the hour. We will also look at Amazon Simple Storage Service (Amazon S3), which is infinite storage in the cloud, and Amazon SimpleDB, which is the database in the cloud, and at how these services can help local companies scale out and go live quickly. Finally, we will see some exciting apps and unique business models built on AWS, some of which have become profitable businesses and others that are simply cool to see.

About the Speaker

JINESH VARIA, EVANGELIST, AMAZON WEB SERVICES

As a Technology Evangelist at Amazon, Jinesh Varia helps developers take advantage of disruptive technologies that are going to change the way we think about computer applications and the way businesses compete in the new web world. Jinesh has spoken at more than 50 conferences and user groups. He is focused on furthering awareness of web services and often helps developers on a 1:1 basis in implementing their own ideas using Amazon’s innovative services. Jinesh has over 9 years of experience in XML and web services and has worked with standards-based working groups in XBRL. Prior to joining Amazon as an evangelist, he held several positions at UBmatrix, including Solutions Architect, Enterprise Team Lead, and Software Engineer, working on various financial services projects, including the Call Modernization Project at the FDIC. He was also lead developer at the Penn State Data Center, Institute of Regional Affairs. Jinesh’s work has been published by ACM and IEEE. Jinesh is originally from India and holds a Master’s degree in Information Systems from Penn State University. He plays tennis and loves to trek.