Every field of study, and especially the sciences, generates massive amounts of data. Such data can be described as big data if it exhibits the six Vs: volume, variety, velocity, variability, validity, and volatility.
Data has grown enormously in both the commercial and scientific sectors, owing to advances in data generation and collection technologies. The current approach the world over is to gather as much data as possible. Big data analytics rests on the expectation that this data will add great value to human understanding, and the full extent of its future uses has not yet been enumerated.
In domains such as astrophysics, earth sciences, climate modeling, drug discovery, and molecular imaging, for example, the potential volume of data is enormous. There is therefore an urgent need for seamless, affordable, and efficient connectivity to these databases so that scientific collaboration can be pursued both nationally and globally.
If climate data is efficiently integrated with morbidity data, patterns that are so far poorly understood may emerge. Similarly, climate change projections can be integrated with energy demand data to enable better planning and more efficient service.
The fundamental challenge is to analyze the huge amount of data the community currently generates and to plan for the future. To date, traditional approaches have worked well for analyzing the data and visualizing the results. However, there is now a need to deploy large-scale distributed processing frameworks such as Hadoop on such data, so that it can be processed in parallel using the MapReduce model.
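To illustrate the MapReduce model that Hadoop is built around, here is a minimal single-process sketch in plain Python. It does not use Hadoop's API: the functions `map_phase`, `shuffle`, `reduce_phase`, and `run_job`, and the "station,year,temperature" input format, are hypothetical names chosen for illustration. On a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle across machines.

```python
from collections import defaultdict

# Hypothetical input: one "station,year,temperature" reading per line.
LINES = [
    "S1,2019,31.5",
    "S2,2019,29.0",
    "S1,2019,34.2",
    "S2,2020,30.1",
    "S1,2020,33.0",
]

def map_phase(line):
    """Map: turn one input record into (key, value) pairs, here (year, temperature)."""
    _station, year, temp = line.split(",")
    yield year, float(temp)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse each key's values to one result, here the maximum temperature."""
    return key, max(values)

def run_job(lines):
    """Chain the three phases over the whole input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

result = run_job(LINES)
print(result)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed independently, which is what lets the model scale.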
Big data thus needs to be reduced and focused onto an area of interest; this is essentially what data analytics does. Once this is done, statistical relationships in the data can be used to predict future outcomes. Technologies that process large quantities of data efficiently, within a tolerable elapsed time, are therefore extremely pertinent to any big data analytics program.
In India, there is as yet no clear policy or systematic approach to developing national databases. The Office of the Principal Scientific Adviser to the Government of India has taken an initiative to recreate national databases in specific domains and make them available to the various stakeholders who need this data for different purposes.
The initiative envisages a virtual community in each specific domain, connecting research centers across the country and giving them access to global research centers. The National Knowledge Network (NKN) is seen as the way forward for creating this virtual community, and for science more broadly now that science has become collaborative and itself generates big data.
NKN at present connects about 1,600 knowledge institutions, spanning universities; research and development laboratories of the Department of Atomic Energy, Department of Space, Defence Research and Development Organisation, Council of Scientific and Industrial Research, and Ministry of Earth Sciences; medical institutions of the Central and State Governments; agricultural universities; Centrally Funded Technical Institutions (CFTI) of the Ministry of Human Resource Development; and some select institutions in higher education and research.
However, working with big data poses various other challenges as well. These include data cleansing, high dimensionality and data reduction, data representation and distributed data sources, data sampling, scalability of algorithms, data visualization, parallel and distributed data processing, real-time analysis and decision making, crowdsourcing and semantic input for improved data analysis, data discovery and integration, parallel and distributed computing, exploratory data analysis and interpretation, integration of heterogeneous data, and the development of new models for massive data computation, among others.