Cloud Computing: Data-Intensive Computing and Scheduling (Chapman & Hall/CRC Numerical Analysis and Scientific Computing Series)
As more and more data is generated at a faster-than-ever rate, processing large volumes of data is becoming a challenge for data analysis software. Addressing performance issues, Cloud Computing: Data-Intensive Computing and Scheduling explores the evolution of classical techniques and describes completely new methods and innovative algorithms. The book delineates many concepts, models, methods, algorithms, and software used in cloud computing.
After a general introduction to the field, the text covers resource management, including scheduling algorithms for real-time tasks and practical algorithms for user bidding and auctioneer pricing. It next explains approaches to data analytical query processing, including pre-computing, data indexing, and data partitioning. Applications of MapReduce, a new parallel programming model, are then presented. The authors also discuss how to optimize multiple group-by query processing and introduce a MapReduce real-time scheduling algorithm.
A useful reference for studying and using MapReduce and cloud computing platforms, this book presents various technologies that demonstrate how cloud computing can meet business requirements and serve as the infrastructure of multidimensional data analysis applications.
priority of one task depends on its release rate: the higher the rate, the higher the priority. The period Ti is the length of time between two successive instances of a task, and the computation time Ci is the time spent on one execution of the task. Since the release rate is the inverse of the period, Ti is usually the direct criterion for determining task priority. A schedulability test determines whether the temporal constraints of the tasks can be met at runtime. Exact tests are ideal but intractable, because the complexity of
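The priority rule in the excerpt above (shorter period Ti, hence higher release rate, means higher priority) can be illustrated with a minimal Python sketch. The `Task` class, the function names, and the sample task set are hypothetical and do not come from the book. The schedulability check shown is the classical Liu and Layland utilization bound, a well-known sufficient (not exact) test for rate-based fixed priorities; it is used here only as an assumption, since the excerpt is cut off before it names the test the authors rely on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task: period Ti and computation time Ci."""
    name: str
    period: float        # Ti: time between two successive instances
    computation: float   # Ci: time spent on one execution


def rate_monotonic_order(tasks):
    """Return tasks from highest to lowest priority.

    A shorter period means a higher release rate, hence a higher priority.
    """
    return sorted(tasks, key=lambda t: t.period)


def utilization(tasks):
    """Total processor utilization: the sum of Ci / Ti over all tasks."""
    return sum(t.computation / t.period for t in tasks)


def liu_layland_bound(n):
    """Utilization bound n * (2^(1/n) - 1) for n tasks (Liu and Layland)."""
    return n * (2 ** (1 / n) - 1)


def passes_sufficient_test(tasks):
    """True means the set is schedulable under rate-based priorities.

    False is inconclusive: the test is sufficient, not exact.
    """
    if not tasks:
        return True
    return utilization(tasks) <= liu_layland_bound(len(tasks))


# Illustrative task set (made-up numbers).
tasks = [Task("logger", 20, 5), Task("sensor", 5, 1), Task("control", 10, 2)]
order = [t.name for t in rate_monotonic_order(tasks)]
print(order)
print(utilization(tasks), liu_layland_bound(len(tasks)))
print(passes_sufficient_test(tasks))
```

A set that fails this check is not necessarily unschedulable; deciding that requires an exact test, which is the costly case the excerpt alludes to.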
resources in a dynamic and scalable manner over a network. According to the National Institute of Standards and Technology, the five essential characteristics of the cloud are the following: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured services. The applications of cloud computing are huge and impact nearly all sectors. For example, some automotive firms are convinced that the future lies in cloud computing: in some recent vehicles, the software
slightly different. Each Group-by query contained in a GROUPING SETS query could have more than one group-by dimension, i.e., one Group-by query aggregates over more than one dimension, whereas in this work, one Group-by query aggregates over only one dimension. In a data exploration environment, processing Multiple Group-by queries poses several challenges. The first challenge is the large data volume: very commonly, the historical dataset is large. The generated materialized view is
the worker nodes, and worker nodes sending intermediate aggregate tables to the master node. Considering the size of the transmitted data and the network status, we estimate the communication cost as:
\[ C_{cmm} = C_n \cdot \left( nb_m \cdot \gamma \cdot size_m + \gamma \cdot \sum_{i=1}^{nb_m} size_{agg_i} \right) \]
where $size_{agg_i}$ represents the size of the ith aggregate table produced by mappers.
7.5.1 Horizontal partitioning
For the implementation over horizontal partitions, the mapper takes a horizontal partition as input data, searches in its Lucene
a useful reference for people who want to study and utilize MapReduce and cloud computing platforms. The various technologies presented in this book demonstrate the wide aspects of interest in cloud computing, and the many possibilities and venues that exist in the research in this area. This interest is only going to further evolve, and many exciting developments are still awaiting us.
Frédéric Magoulès, Ecole Centrale Paris, France
Jie Pan, Klee Group, France
Fei Teng, Southwest Jiaotong