The elegance of big data solutions belies the complexity of the underlying infrastructure.
Cloud computing technology connects large pools of resources through private or public networks. It simplifies infrastructure planning and provides dynamically scalable infrastructure for cloud-based applications, data, and file storage. The key features of this infrastructure are:
- Resource pooling
- Multi-tenancy
- Metered self-service
- Broad network access
- Elasticity and flexibility
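In practice, elasticity usually means resizing a pool of nodes in response to measured load. The sketch below assumes a simple proportional rule, similar in spirit to the one described for the Kubernetes Horizontal Pod Autoscaler; the function name, parameters, and default thresholds are hypothetical and not taken from any particular cloud provider.

```python
import math


def desired_node_count(current_nodes: int, avg_utilization: float,
                       target_utilization: float = 0.6,
                       min_nodes: int = 1, max_nodes: int = 100) -> int:
    """Return the node count that brings average utilization close to
    target_utilization, clamped to the range [min_nodes, max_nodes].

    Hypothetical helper for illustration only.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Work currently being done, expressed as a number of fully busy nodes.
    busy_nodes = current_nodes * avg_utilization
    # Nodes needed to carry the same work at the target utilization.
    desired = math.ceil(busy_nodes / target_utilization)
    # Stay within the pool's configured floor and ceiling.
    return max(min_nodes, min(max_nodes, desired))


# Load spike: 10 nodes at 90% busy, targeting 50%, so the pool scales out.
print(desired_node_count(10, 0.9, target_utilization=0.5))
```

With 10 nodes averaging 90% utilization and a 50% target, the rule asks for 18 nodes; when load drops, the same formula shrinks the pool, which is where metered billing pays off.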
Usually, the database and its operating environment are implemented on servers, called nodes, which are grouped together into clusters. Clusters can manage fragments of the same data set, processing those fragments in parallel to increase performance and throughput. Clusters can also host different data sets to serve different workloads and applications. A cluster may be implemented in one physical location, though complex solutions require many clusters distributed across different geographies to deliver global applications. It is even possible to implement virtual data centers that are operational copies of each other, running in an active-active configuration in which each one serves live workloads while also acting as a backup for the others. All of this can be installed on your own premises and servers, or run on public cloud services from Amazon, Rackspace, and other providers.
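One common way to split a single data set into fragments across the nodes of a cluster is to hash each record's key. The sketch below assumes a consistent-hash ring with virtual nodes, the general technique used by Dynamo-style systems such as Apache Cassandra; it is not tied to any specific product mentioned here, and `HashRing`, `node_for`, and the node names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    # Python's built-in hash() is salted per process for strings, so use a
    # deterministic digest that every node computes identically.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring that maps record keys to cluster nodes."""

    def __init__(self, nodes, vnodes_per_node: int = 100):
        self._vnodes = vnodes_per_node
        self._hashes: list[int] = []   # sorted ring positions
        self._owners: list[str] = []   # node owning each ring position
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several ring positions ("virtual nodes") so that
        # its share of the key space is spread out more evenly.
        for i in range(self._vnodes):
            h = stable_hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def remove_node(self, node: str) -> None:
        kept = [(h, n) for h, n in zip(self._hashes, self._owners) if n != node]
        self._hashes = [h for h, _ in kept]
        self._owners = [n for _, n in kept]

    def node_for(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring has no nodes")
        # Walk clockwise to the next ring position, wrapping past the end.
        idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._owners[idx]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("customer:1001"))
```

The appeal of a ring over a plain `hash(key) % node_count` scheme is that adding a node only moves the keys that the new node takes over, rather than reshuffling almost every fragment in the cluster.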
One of the key tools for managing this infrastructure is a graphical control panel that eases the management and operation of the entire data infrastructure. It lets administrators provision new servers and remove excess capacity. New nodes and clusters can be deployed rapidly across the entire hosting geography to place resources closest to where customers reside. Data can be replicated within or across nodes and clusters. Security policies and data governance can be managed centrally across the entire enterprise. External components can be added to or removed from the overall infrastructure, and interoperability and integration with the network and application layers can be monitored and managed. Collectively, this provides a comprehensive system for managing the big data infrastructure and all of its key capabilities.
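Behind a setting such as "keep two copies of this data in every region", the system needs a deterministic rule for choosing which nodes hold the replicas. The sketch below assumes rendezvous (highest-random-weight) hashing applied per region; the region names, node names, and `place_replicas` function are hypothetical, and real products may use rack- or datacenter-aware strategies of their own.

```python
import hashlib


def rendezvous_score(key: str, node: str) -> int:
    # Deterministic pseudo-random weight for the (key, node) pair.
    digest = hashlib.sha256(f"{key}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(key: str, topology: dict[str, list[str]],
                   per_region: int) -> dict[str, list[str]]:
    """Choose `per_region` distinct replica nodes for `key` in every region
    using rendezvous (highest-random-weight) hashing."""
    placement: dict[str, list[str]] = {}
    for region, nodes in topology.items():
        if per_region > len(nodes):
            raise ValueError(f"region {region!r} has only {len(nodes)} nodes")
        # Rank the region's nodes by weight; the top entries hold the copies.
        ranked = sorted(nodes, key=lambda node: rendezvous_score(key, node),
                        reverse=True)
        placement[region] = ranked[:per_region]
    return placement


topology = {
    "us-east": ["use-1", "use-2", "use-3"],
    "eu-west": ["euw-1", "euw-2", "euw-3"],
}
print(place_replicas("order:42", topology, per_region=2))
```

Rendezvous hashing keeps no shared ring state, and when a node is removed only the keys that had chosen it need a new replica, which suits a control panel that adds and retires capacity frequently.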