Topic: Next Generation Web and Many Core Computing

Abstract: Computing platforms and the World Wide Web are undergoing significant architectural transitions, which have the potential to enable a new class of user capabilities and experiences. This talk explores the growing need for a general-purpose processing platform that can model events, objects, and concepts based on user inputs to build a representative model that can be iteratively refined in real time. We find immense processing needs at the heart of a new set of emerging applications and services. This set spans diverse market segments such as graphics, gaming, media mining, unstructured information management, financial analytics, and interactive virtual communities, and these applications present a focused set of common platform challenges going forward.

Speakers' Biographies

Dr. Kathy Yelick, Professor, UC Berkeley

Katherine Yelick is a Professor in the EECS Department at the University of California at Berkeley and Head of the Future Technologies Group at Lawrence Berkeley National Laboratory. Her research in high performance computing addresses parallel programming languages, compiler analyses for explicitly parallel code, and optimization techniques for communication and memory system architectures. Her parallel language and compiler projects include Split-C, UPC, and Titanium. She also led the compiler effort for the Berkeley IRAM project, a single-chip system that combines vector processing with a low-power Processor-in-Memory design, and the Sparsity code generation system for automatic tuning of sparse matrix kernels. She currently leads the Berkeley Institute for Performance Studies, which does performance benchmarking, analysis, and modeling, and the Berkeley UPC team, which has built a widely used open-source compiler for UPC. She co-leads the Titanium language project and the BeBOP (Berkeley Benchmarking and Optimization) group, which is focused on developing autotuning technology for modern machines.

Professor Yelick received her Bachelor's, Master's, and Ph.D. degrees from the Massachusetts Institute of Technology, where she worked on parallel programming methods and automatic theorem proving. She won MIT's award for an outstanding Ph.D. dissertation, the ARO Young Investigator award, the Okawa Foundation award, and multiple teaching awards from the EECS departments at both MIT and Berkeley, and was named one of HPCwire's "People to Watch" in 2006.

Topic: The Berkeley View: Applications-Driven Research in Parallel Programming Models and Architectures

Abstract: The sequential processor era is now officially over, as the IT industry has bet its future on multiple processors per chip. The new trend is doubling the number of cores per chip every two years instead of the regular doubling of uniprocessor performance. This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs in novel software and architectures for parallelism; instead, this plunge into parallelism is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures.

A diverse group of researchers from Berkeley met for two years to discuss parallelism from many angles: circuit design, computer architecture, massively parallel computing, computer-aided design, embedded hardware and software, programming languages, compilers, scientific programming, and numerical analysis. (See view.eecs.berkeley.edu for a technical report.) The overarching conclusion was the need for better programming methods that make it easy to write programs that execute efficiently on highly parallel computing systems. In this talk I will summarize the main recommendations of this report, which include a focus on chips with thousands of cores, a shift away from benchmarks toward a deeper understanding of communication and computation patterns captured in a list called the "dwarfs," and an alternative to traditional compilers based on autotuning software. I will also describe some specific work on programming models for parallel machines, autotuners for a variety of algorithmic domains, and architectural evaluation on existing multicore platforms.
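To make the autotuning recommendation concrete, here is a minimal C++ sketch of the idea (my illustration, not code from the report): generate several candidate implementations of a kernel, time each on the target machine, and dispatch to the fastest. The kernel names are hypothetical; real autotuners such as Sparsity search far larger spaces of blocking factors and data structures.

    // Minimal autotuning sketch: time several candidate kernels on the
    // target machine, then select the fastest one for later use.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    using Kernel = double (*)(const double*, std::size_t);

    double sum_simple(const double* x, std::size_t n) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }

    double sum_unrolled4(const double* x, std::size_t n) {
        // Same computation, different schedule: four partial sums.
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
        }
        for (; i < n; ++i) s0 += x[i];
        return s0 + s1 + s2 + s3;
    }

    int main() {
        std::vector<double> data(1 << 20, 1.0);
        Kernel candidates[] = {sum_simple, sum_unrolled4};
        Kernel best = nullptr;
        double best_time = 1e30;
        for (Kernel k : candidates) {  // search the (tiny) variant space
            auto t0 = std::chrono::steady_clock::now();
            volatile double r = k(data.data(), data.size());
            (void)r;  // keep the call from being optimized away
            auto t1 = std::chrono::steady_clock::now();
            double dt = std::chrono::duration<double>(t1 - t0).count();
            if (dt < best_time) { best_time = dt; best = k; }
        }
        std::printf("selected kernel runs in %g s\n", best_time);
        return best != nullptr ? 0 : 1;
    }

The point is that the choice is made empirically on the machine at hand rather than predicted by a compiler's static model.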

Dr. Lawrence Meadows, Principal Engineer, Intel Corporation

Larry Meadows is a Principal Engineer at Intel, working on Cluster OpenMP, a system to run OpenMP programs on shared-nothing clusters. Larry began his career at Floating Point Systems in Beaverton, Oregon. In 1989, he co-founded the Portland Group (PGI), a leading compiler vendor. After leaving PGI in 1999, he worked at Sun for five years on compilers and tools before moving to Intel in 2004.

Topic: Parallel programming: It's not just for servers anymore

Abstract: Programming for parallel shared-memory architectures is not a new problem. The challenge that multicore brings is its pervasiveness: everyone will have multiple processors on their desk or lap. How do hardware vendors encourage applications that exploit this ubiquitous hardware? Is there any hope? This talk will touch on parallelism in the client space and ways to encourage it, with some observations on current practice, enabling tools, and some directions for the future.
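As a touchstone for what shared-memory client programming can look like, here is a toy OpenMP example in C++ (my sketch, not from the talk): the pragma is the only change to otherwise sequential code, which is much of OpenMP's appeal for desktop applications.

    // Toy OpenMP example: parallel dot product on a shared-memory machine.
    // Compile with an OpenMP-capable compiler, e.g. g++ -fopenmp dot.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const long n = 1 << 24;
        std::vector<double> a(n, 0.5), b(n, 2.0);
        double dot = 0.0;
        // The pragma asks the runtime to split the loop across the cores
        // and combine per-thread partial sums; the loop body is unchanged,
        // and without -fopenmp the program simply runs sequentially.
        #pragma omp parallel for reduction(+ : dot)
        for (long i = 0; i < n; ++i)
            dot += a[i] * b[i];
        std::printf("dot = %f\n", dot);
    }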

Kevin Haas, Research Manager, IBM Almaden Research

Kevin Haas is a manager in IBM's Almaden Services Research team. He earned his M.S. in Computer Science from Stanford University in 1996 with an emphasis on semistructured object-oriented database systems. His interests include enterprise solutions, unstructured analytics, and database systems. Before joining IBM Research, Kevin was an IT architect for IGS, responsible for successful implementations of complex enterprise applications.

Dr. Daniel Gruhl, Researcher, IBM Almaden Research

Dr. Daniel Gruhl is a researcher at the IBM Almaden Research Center. He earned his Ph.D. in electrical engineering from the Massachusetts Institute of Technology in 2000 with thesis work on distributed text analytics systems. His interests include steganography (visual, audio, text, and database), machine understanding, user modeling, and very large scale text analytics.

Gruhl is the chief architect for WebFountain, with responsibility for overall hardware, software and systems design. He is also co-architect of IBM's Unstructured Information Management Architecture.

Topic: Impact of Multicore Technologies on Business Solutions

Abstract: The adoption of multi-core processors has had a range of effects on the class of "big data" computational business solutions. In some cases they transform a problem from infeasible to possible, and in others they eliminate substantial costs from the computing environment. In this talk we will illustrate some technical challenges where we have found the use of multi-core processors to be a win, and some where it has not mattered at all. Additionally, we will talk about where the big impact of multicore on solutions has been; that impact usually has less to do with complex problem solving and more to do with net resource reduction.

Dr. John Owens, Assistant Professor, UC Davis

John Owens is Assistant Professor of Electrical and Computer Engineering at the University of California, Davis, where he leads research projects in graphics hardware/software and GPGPU. Prior to his appointment at Davis, John earned his Ph.D. (2002) and M.S. (1997) in electrical engineering from Stanford University. At Stanford he was an architect of the Imagine Stream Processor and a member of the Concurrent VLSI Architecture Group and the Computer Graphics Laboratory. John earned his B.S. in electrical engineering and computer sciences from the University of California, Berkeley, in 1995.

Topic: GPU Computing: Past, Present, and Future

Abstract: Over the last five years, commodity graphics processors (GPUs) have evolved from fixed-function graphics units into powerful, programmable data-parallel processors. These streaming processors are capable of sustaining computation rates substantially greater than those of today's CPUs, with technology trends indicating a widening gap in the future. Researchers in the rapidly evolving field of general-purpose computation on graphics processors (GPGPU) have demonstrated mappings to these processors for a wide range of computationally intensive tasks.

In this talk I will begin by discussing the motivation and background for GPU computing and describe some of the recent advances in the field. The field of GPU computing has changed substantially over its short lifetime due to new applications, techniques, programming models, and hardware. As parallel computing has decidedly moved into the mainstream, the lessons of GPU computing apply both to today's systems and to the designers of tomorrow's systems.
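To illustrate the data-parallel kernel model that GPGPU work maps computations onto, here is a CPU-side C++ sketch (mine, purely illustrative): a scalar "kernel" function applied independently at every index. A GPU would execute the same kernel as thousands of lightweight threads, one per index, rather than as a sequential loop.

    // CPU sketch of the GPU programming model: a "kernel" is a scalar
    // function applied independently at every index; a GPU replaces the
    // sequential driver loop below with massively parallel threads.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // The per-element kernel: y[i] = a * x[i] + y[i]  (SAXPY)
    void saxpy_kernel(std::size_t i, float a, const float* x, float* y) {
        y[i] = a * x[i] + y[i];
    }

    int main() {
        std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
        for (std::size_t i = 0; i < x.size(); ++i)  // parallel on a GPU
            saxpy_kernel(i, 3.0f, x.data(), y.data());
        std::printf("y[0] = %f\n", y[0]);
    }

Because each index is independent, the hardware is free to run the kernel invocations in any order and in parallel, which is what lets GPUs sustain such high computation rates.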

Dr. Lurng-Kuo Liu, Solutions Architect, IBM T.J. Watson Research

Lurng-Kuo Liu is a Solutions Architect and Research Staff Member at the IBM T.J. Watson Research Center. He is currently leading several emerging solutions development projects as part of IBM's strategic directions for the Cell Broadband Engine (Cell/B.E.) processor. Prior to his current position, he was a Program Manager for the Blue Gene/L (BG/L) system in IBM's Exploratory Server Systems department, where he helped lead BG/L to its success and its No. 1 ranking on the TOP500 supercomputer list. He has worked on a broad range of projects such as video codec processors, media signal processors, broadband e-commerce, interactive TV, set-top boxes, MP3 audio, video compression (MPEG-2, MPEG-4, H.263, etc.), immersive computer game systems, vision-enhanced human-computer interface (HCI) systems, and high performance computing (HPC) systems. His research interests include digital signal processing, multimedia, computer vision, interactive games, broadband e-business, mobile computing, financial modeling, and HPC. Dr. Liu received a Ph.D. in electrical engineering from the University of Maryland at College Park in 1993.

Topic: Cell Broadband Engine Processor: From Game Console to HPC Server

Abstract: Traditionally, increasing clock frequency has been the main dimension along which conventional processors achieve higher performance. This technique has reached a point of diminishing returns. Multicore processors, also known as chip multiprocessors (CMPs), promise a dramatic increase in performance and are becoming more prevalent in vendors' solutions. The Cell Broadband Engine™ (Cell/B.E.) processor, which serves as the core of the Sony PS3, is a state-of-the-art multicore processor. It is the result of a collaboration between Sony, Toshiba, and IBM known as STI. The Cell/B.E. is an innovative and powerful microprocessor architecture based on Power Architecture™ technology. It has the potential to accelerate numerically intense, graphical, and streaming applications, among others, in a variety of industries. In this talk, I will present an overview of the Cell/B.E. processor, including the history of the Cell/B.E., the hardware architecture, and the software development environment. I will then discuss the Cell/B.E. blade server offering and its application affinity.

Dr. Christos Kozyrakis, Assistant Professor, Stanford University

Christos Kozyrakis is an Assistant Professor of Electrical Engineering and Computer Science at Stanford University. He holds a B.S. degree from the University of Crete in Greece and a Ph.D. from the University of California at Berkeley. Christos works on architectures, runtime environments, and programming models for parallel computer systems. His current work focuses on transactional memory, architectural support for security, and power management techniques. For further information: https://csl.stanford.edu/~christos

Topic: Practical Parallel Programming with Transactional Memory

Abstract: As multi-core chips become ubiquitous, it is critical to make parallel programming practical for the average software developer. Transactional Memory (TM) is emerging as a promising technology to address this challenge. The key idea behind TM is to provide programmers with a high-level abstraction for parallelism (transactions) and transfer the responsibility of low-level concurrency control to the system. Hence, the programmer simply identifies tasks in her program that are likely to be independent and should execute as atomic transactions. The system is responsible for detailed scheduling and contention management.
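As a rough illustration of this programming model (my C++ sketch, not code from the talk), the programmer wraps a likely-independent task in an atomic region and leaves concurrency control to the system. Here a single global lock stands in for the TM runtime; a real TM system would instead execute transactions optimistically, tracking reads and writes and rolling back on conflict.

    // Sketch of the TM programming model. The programmer marks a region
    // atomic; the global mutex below is only a stand-in for the TM
    // runtime, which would run transactions speculatively and retry
    // them on conflict rather than serialize everything.
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex tm_runtime;  // stand-in for the transactional runtime

    template <typename F>
    void atomic_transaction(F body) {
        // Real TM: buffer reads/writes, commit only if no concurrent
        // transaction touched the same data; all of that is hidden here.
        std::lock_guard<std::mutex> g(tm_runtime);
        body();
    }

    int account_a = 100, account_b = 0;

    int main() {
        std::vector<std::thread> ts;
        for (int t = 0; t < 4; ++t)
            ts.emplace_back([] {
                for (int i = 0; i < 10; ++i)
                    atomic_transaction([] {  // likely-independent task
                        account_a -= 1;      // both updates appear to
                        account_b += 1;      // other threads atomically
                    });
            });
        for (auto& th : ts) th.join();
        std::printf("a=%d b=%d total=%d\n", account_a, account_b,
                    account_a + account_b);
    }

The programmer's contract is only "this block is atomic"; whether that is enforced by locks, hardware TM, or software TM is the system's concern, which is exactly the division of labor the abstract describes.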

This talk will review transactional memory technology. Apart from parallel programming with transactions, we will discuss how TM facilitates a range of tuning and debugging tools that further simplify parallel application development. We will describe the TM implementation space and the role that hardware and software can play in such a system. Finally, we will discuss future directions in parallel computing research that follow in the spirit of TM: provide the user with easy-to-use, high-level abstractions and offload the detailed management of parallelism to the system infrastructure.

Dr. Pradeep K. Dubey, Senior Principal Engineer, Intel Corporation

Pradeep Dubey is a senior principal engineer and manager of Innovative Platform Architecture (IPA) in the Microprocessor Technology Lab, part of Intel's Corporate Technology Group. His research focuses on computer architectures that efficiently handle new application paradigms for the future computing environment. Dubey previously worked at IBM's T.J. Watson Research Center and at Broadcom Corporation. He was one of the principal architects of the AltiVec* multimedia extension to the PowerPC* architecture. He also worked on the design, architecture, and performance of various microprocessors, including the Intel® i386™, i486™, and Pentium® processors. He holds 26 patents and has published extensively. Dr. Dubey received a B.S. in electronics and communication engineering from the Birla Institute of Technology, India, an M.S.E.E. from the University of Massachusetts at Amherst, and a Ph.D. in electrical engineering from Purdue University. He is a Fellow of the IEEE.