Recent studies indicate that 90% of all data worldwide has been created in the last two years: humanity now captures, stores, processes, and provides ten or more times as much information as in all its previous years combined. Current estimates indicate that the amount of data generated in 2018 reached 33 zettabytes (one zettabyte, or ZB, equals 1 billion terabytes), 16 times more than the amount generated in the previous 10 years. This Big Bang of data continues to accelerate, and the total is expected to exceed 175 zettabytes by 2025, more than five times the 2018 figure.
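The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal (SI) units for the terabyte and zettabyte; the implied yearly rate additionally assumes constant growth between 2018 and 2025, which is a simplification made for illustration and not a claim from the estimates themselves.

```python
# Decimal (SI) byte units are assumed here; binary units (TiB, ZiB) would differ.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

# Terabytes in one zettabyte: 10**9, i.e. 1 billion.
tb_per_zb = BYTES_PER_ZB // BYTES_PER_TB

# Figures quoted in the text, in zettabytes.
data_2018_zb = 33
data_2025_zb = 175
years = 2025 - 2018

# Ratio between the 2025 projection and the 2018 estimate.
growth_factor = data_2025_zb / data_2018_zb

# Assumption: the same growth rate every year (a simplification).
annual_growth = growth_factor ** (1 / years) - 1

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"2025 vs 2018: x{growth_factor:.1f}")
print(f"Implied yearly growth: {annual_growth:.0%}")
```

Under these assumptions the ratio comes out slightly above five, and the implied growth is a little over a quarter per year.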

This new scenario of massive information production forces us to improve our capacity for data-driven decision-making, a relevant factor in public policy, the private sector, and the community in general. Sound use and analysis of data make it possible to gain insights into an event, whether to understand it and its causes or to anticipate its repercussions, so that decisions rest on a solid and reliable basis. However, this scenario also creates the need to develop mechanisms for data capture, storage, processing, security, and availability, as well as the ability to use data in a simple and comprehensive way. The ability to acquire, understand, process, extract value from, visualize, and communicate data will be an enormously important skill for decades to come.

Data Science can be defined as the use of data to achieve specific objectives by designing or applying computational methods for inference or prediction. It encompasses the study of data: where they come from, what they represent, and the ways in which they can be transformed into valuable contributions and resources for scientific, commercial, and social strategies. It is an interdisciplinary field that involves scientific methods, processes, and systems to extract knowledge from, or gain a better understanding of, (large volumes of) data. Some of the characteristics of Data Science are:

  1. Achieve Specific Objectives: depending on the domain and context, this can mean exploration, discovery, decision-making, prediction, optimization, or similar objectives and tasks;
  2. Design or Apply: this covers activities such as designing, understanding, or examining inference methods (for example, studying learning from data in machine learning [ML]) or applying methods in a particular problem context (for example, using statistical analysis or inference methods);
  3. Computational Methods: this refers to the use of computers to conduct a direct search or to help a human formulate or optimize a model;
  4. Inference or Prediction: this includes automated hypothesis formulation, automated exploration of definitions of new attributes or representations, etc., as well as producing an optimized predictive model without necessarily obtaining information on how it works;
  5. Data (Structured or Not): this requires the acquisition, cleaning, transformation, quality estimation, curation, security, and provision of data.
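As a rough illustration of how these characteristics fit together, the sketch below cleans a small set of records, fits a simple predictive model, and uses it for prediction. It is only a toy example: the records are made-up values, the function names (`clean`, `fit_line`, `predict`) are hypothetical, and ordinary least squares is chosen here merely as one simple computational method among many.

```python
from statistics import mean

# Hypothetical raw records (made-up values, for illustration only).
# Some fields are missing or malformed, as is common in acquired data.
raw = [("1", "2.1"), ("2", "3.9"), ("3", ""), ("4", "8.2"), ("x", "5.0"), ("5", "9.8")]


def clean(records):
    """Data cleaning: keep only rows where both fields parse as numbers."""
    rows = []
    for x, y in records:
        try:
            rows.append((float(x), float(y)))
        except ValueError:
            continue  # drop rows with missing or non-numeric fields
    return rows


def fit_line(rows):
    """Computational method: ordinary least squares for y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def predict(model, x):
    """Prediction: evaluate the fitted line at a new input."""
    a, b = model
    return a + b * x


rows = clean(raw)        # data: acquisition and cleaning
model = fit_line(rows)   # design or apply a computational method
print(f"kept {len(rows)} of {len(raw)} records")
print(f"prediction for x = 6: {predict(model, 6):.2f}")  # specific objective
```

The model is used here without inspecting why it works, which mirrors the point in item 4 that a predictive model can be optimized without necessarily explaining the underlying mechanism.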

The Data Observatory (DO) focuses on developing and promoting national datasets that are relevant in their specific areas, and on developing innovative solutions that add value to areas such as science, the economy, and society through data management and solution innovation. In particular, the main actions to be taken in these areas are as follows:

1. Development of datasets. DO will acquire and manage datasets from different industrial areas, analysing the offers it receives in terms of exploration and visualization. It will also develop tools for data access and governance in order to maximize the exploitation of high-value datasets, combining methods used for public access with other aspects.

2. Development of solutions. DO and its members will create solutions to address the challenges that arise from analysing the valuable data they acquire. Data can be used for more than analysis: they can also support new responses and predictive models that use the processed information to face emerging challenges. The objective of these solutions is to create value for the community beyond the domain in which the data originated.