Big Data in the Cloud explains the relationship between these two information technologies, which is worth understanding before working with large networked systems. As mentioned in other articles, the cloud is often nothing more than a group of virtualized servers: a computing resource that presents itself as a normal server. This is generally called Infrastructure as a Service (IaaS) and is offered by platforms such as Rackspace Cloud or Amazon EC2. You buy resources from these services, then install and configure your own software, such as Hadoop or a NoSQL database. Most of the solutions described below can be deployed on IaaS services.
Big Data in the Cloud : Basics
You can use an orchestration framework, which handles resource management, together with infrastructure automation tools that take care of server installation and configuration. For example, RightScale provides a multi-cloud management platform that mitigates some of the problems of managing servers in the cloud. Frameworks such as OpenStack and Eucalyptus help present a uniform interface to both private data centers and the public cloud. The current challenge is to make private clouds and IaaS services easier to use. Over the next two years, using cloud resources should become more straightforward as companies adopt new standards. There will be a uniform interface, whether you use a private cloud, a public cloud, or a mixture of both.
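The idea of a uniform interface over private and public clouds can be illustrated with a minimal sketch. All class and method names here (`CloudProvider`, `launch_server`, `HybridCloud`) are hypothetical and chosen for illustration; they are not the OpenStack, Eucalyptus or RightScale APIs, and no real servers are started.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical uniform interface; not a real cloud framework API."""

    @abstractmethod
    def launch_server(self, name: str, cpus: int) -> str:
        """Start a virtual server and return an identifier for it."""


class PrivateCloud(CloudProvider):
    """Simulated private data center with a fixed number of CPUs."""

    def __init__(self, capacity_cpus: int) -> None:
        self.free_cpus = capacity_cpus

    def launch_server(self, name: str, cpus: int) -> str:
        if cpus > self.free_cpus:
            raise RuntimeError("private cloud is out of capacity")
        self.free_cpus -= cpus
        return f"private:{name}"


class PublicCloud(CloudProvider):
    """Simulated public cloud; a real one would call an IaaS API here."""

    def launch_server(self, name: str, cpus: int) -> str:
        return f"public:{name}"


class HybridCloud(CloudProvider):
    """Prefers the private cloud and overflows to the public one."""

    def __init__(self, private: CloudProvider, public: CloudProvider) -> None:
        self.private = private
        self.public = public

    def launch_server(self, name: str, cpus: int) -> str:
        try:
            return self.private.launch_server(name, cpus)
        except RuntimeError:
            return self.public.launch_server(name, cpus)


# The caller uses the same method regardless of where the server ends up.
cloud = HybridCloud(PrivateCloud(capacity_cpus=8), PublicCloud())
servers = [cloud.launch_server(f"hadoop-node-{i}", cpus=4) for i in range(3)]
print(servers)
```

In this sketch the first two nodes fit in the simulated private cloud and the third overflows to the public one, while the calling code never changes.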
For big data specifically, several configuration tools already target Hadoop explicitly. Among them are tooling from Dell, which aims to make cluster deployment and configuration easier, and Apache Whirr, which specializes in running Hadoop services and other clustered data processing systems. If you want a hosted service, there is a wide choice of IaaS cloud providers; alternatively, you can run a private cloud and keep complete control over the infrastructure.
In the latter case, we are responsible for deploying, managing and maintaining our own cluster. Infrastructure as a Service lets us run big data applications by handling the creation of storage and computing resources, but it does not address higher-level concerns: configuring Hadoop, Hive or similar solutions is still up to us. Beyond IaaS, several cloud services provide application-level support for working with big data. Sometimes called managed solutions, or more commonly Platform as a Service (PaaS), these services eliminate the need to configure or scale databases and frameworks, reducing your workload and your maintenance burden.
Furthermore, PaaS providers can take care of hosting at the application level and save their customers a lot of money. We will look at one example.
Big Data in the Cloud : Google and Prediction API
Google's cloud platform stands out from its competitors because, instead of offering virtualization, it provides an application container with APIs and services defined by Google. Developers do not need to worry about hardware; applications running in the cloud have access to exactly the processing power they require, within certain limits.
The drawback of the Google platform is having to work within the constraints of its APIs. Google App Engine offers a tool for parallel computation over data, but it is most useful as part of complex applications that perform analytics on data. BigQuery and the Prediction API, by contrast, form the core of Google's big data offering, providing options for analysis and machine learning. Both services are available only through a REST API, consistent with Google's vision of web-based computing.
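To give a feel for what a REST-only service means for client code, here is a minimal sketch that builds, but does not send, a JSON request over HTTPS using only the Python standard library. The endpoint URL, the `query` field name, the token and the `logs` table are placeholders assumed for illustration; they are not Google's actual BigQuery or Prediction API.

```python
import json
import urllib.request

# Placeholder endpoint: an assumption for illustration, NOT a real Google URL.
ENDPOINT = "https://api.example.com/v1/queries"


def build_query_request(sql: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-over-HTTPS request for a query service."""
    # The payload layout is hypothetical; a real service defines its own schema.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_query_request("SELECT COUNT(*) FROM logs", token="hypothetical-token")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a real service, the request would be sent with `urllib.request.urlopen(request)` and the JSON response decoded; that step is left out here because the endpoint is fictional.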
BigQuery is currently available in beta; we covered it here a few days ago.
Prediction API
The term machine learning covers uses such as classification, analysis and recommendation generation. To meet these needs, Google offers the Prediction API. Applications that use it work by creating and training a hosted model within Google's systems. Once a model has been trained, it can be used to make predictions, for example for spam detection and prevention. Google is currently working to allow these models to be shared, optionally for a fee.
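The train-then-predict workflow described above can be sketched locally. The example below is a stand-in, not the Prediction API: it uses a small naive Bayes text classifier with add-one smoothing, and the class name `TinyTextModel`, its methods and the training messages are all made up for illustration. With a hosted service, the training and the model would live on the provider's side rather than in your own process.

```python
import math
from collections import Counter, defaultdict


class TinyTextModel:
    """Local stand-in for a hosted model: naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Learn word frequencies from (label, text) pairs."""
        for label, text in examples:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        words = text.lower().split()
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total_examples)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                seen = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(seen / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: train the model on labeled examples (invented data).
model = TinyTextModel()
model.train([
    ("spam", "win a free prize now"),
    ("spam", "free money click now"),
    ("ham", "meeting moved to friday"),
    ("ham", "lunch on friday with the team"),
])

# Step 2: use the trained model to classify new messages.
print(model.predict("claim your free prize"))   # spam
print(model.predict("team meeting on friday"))  # ham
```

The two-step shape is the point: training happens once, and the resulting model is then queried repeatedly for predictions.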