Recent studies indicate that 90% of all data worldwide was created in the last two years, a period in which humanity captured, stored, processed, and delivered ten or more times as much information as in all previous years of its history. Current estimates put the amount of data generated in 2018 at 33 zettabytes (one zettabyte, or ZB, equals 1 billion terabytes), 16 times more than was generated in the previous 10 years. This Big Bang of data continues to accelerate: the total is expected to exceed 175 zettabytes by 2025, more than quintupling the 2018 amount.
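
As a rough check on the figures quoted above, the following minimal sketch converts the 2018 volume to terabytes and derives the growth factor and the compound annual growth rate implied by moving from 33 ZB in 2018 to 175 ZB in 2025. The function and variable names are illustrative assumptions, and the calculation assumes smooth year-over-year growth, which the estimates themselves do not claim.

```python
# Back-of-the-envelope check of the data-volume estimates quoted in the text.
# All names here are illustrative; only the 33 ZB and 175 ZB figures come from the text.

ZB_IN_TB = 10**9  # 1 ZB = 10^21 bytes and 1 TB = 10^12 bytes, so 1 ZB = 10^9 TB

volume_2018_zb = 33
volume_2025_zb = 175


def growth_factor(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def implied_annual_growth(start, end, years):
    """Compound annual growth rate, assuming smooth growth over `years`."""
    return (end / start) ** (1 / years) - 1


factor = growth_factor(volume_2018_zb, volume_2025_zb)
rate = implied_annual_growth(volume_2018_zb, volume_2025_zb, 2025 - 2018)

print(f"2018 volume: {volume_2018_zb * ZB_IN_TB:,} TB")
print(f"Growth factor 2018 -> 2025: {factor:.1f}x")
print(f"Implied compound annual growth: {rate:.1%}")
```

Under these assumptions the growth factor comes out at roughly 5.3, which corresponds to growth of about 27% per year.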

This new scenario of massive information production forces us to improve our capacity for data-driven decision-making, which has become a relevant factor in public policy, the private sector, and the community in general. Sound use and analysis of data makes it possible to gain insights into an activity, whether to understand an event and its causes or to anticipate its repercussions. In this way, data can support decisions made on a solid and reliable basis. However, this scenario also brings the need to develop mechanisms for data capture, storage, processing, security, and availability, as well as the ability to use data in a simple and comprehensive way. The ability to acquire data, understand it, process it, extract value from it, visualize it, and communicate it will be an enormously important skill for decades to come.

Data Science can be defined as the use of data to achieve specific objectives by designing or applying computational methods for inference or prediction. It encompasses the study of data: where they come from, what they represent, and how they can be transformed into valuable inputs and resources for scientific, commercial, and social strategies. It is an interdisciplinary field that involves scientific methods, processes, and systems for extracting knowledge from, or gaining a better understanding of, (large volumes of) data. Some of the characteristics of Data Science are:

  1. Achieve specific objectives: depending on the domain and context, this can mean exploration, discovery, decision-making, prediction, optimization, or similar objectives and tasks;
  2. Design or apply: this covers activities such as designing, understanding, or examining inference methods (for example, studying how learning from data works in machine learning [ML]) or applying methods in a particular problem context (for example, using statistical analysis or inference methods);
  3. Computational methods: this refers to the use of computers to conduct a direct search or to help a human formulate or optimize a model;
  4. Inference or prediction: this includes automated hypothesis formulation, automated exploration of new attribute definitions or representations, and so on, as well as producing an optimized predictive model without necessarily gaining insight into how it works;
  5. Data (structured or not): this requires the acquisition, cleaning, transformation, quality estimation, curation, security, and provision of data.
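
The characteristics above can be illustrated with a minimal sketch, assuming a tiny made-up data set: the records are cleaned (data), a straight line is fitted by ordinary least squares (a computational method applied to the problem), and the fitted model is used to predict an unseen value (prediction toward a specific objective). The records, the function names `clean`, `fit_line`, and `predict`, and the choice of a linear model are all hypothetical and serve only as an example.

```python
from statistics import mean

# Hypothetical raw records as (input, output) pairs; None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, None), (5.0, 9.8)]


def clean(records):
    """Data cleaning: drop any record with a missing value."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Computational method: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Prediction: evaluate the fitted line at a new input."""
    slope, intercept = model
    return slope * x + intercept


data = clean(raw)        # 4 of the 6 records survive cleaning
model = fit_line(data)   # (slope, intercept)
print(f"slope={model[0]:.3f}, intercept={model[1]:.3f}")
print(f"prediction for x=6.0: {predict(model, 6.0):.3f}")
```

Dropping incomplete records is the simplest possible cleaning policy; a real project would also estimate data quality and might impute missing values instead.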

The Data Observatory (DO) is focused on developing and promoting relevant national data sets within its areas of competence, as well as on developing innovative solutions that add value in science, the economy, and society through data management and solution innovation. In particular, the main actions in this area are the following.

  1. Development of data sets. The DO will acquire and manage data sets from different industries and sectors, for which it will analyze the existing offerings in terms of exploration and visualization; it will also develop access and governance tools to maximize the exploitation of high-value data sets, combining public and other modes of access.
  2. Development of solutions. The DO and its members will create solutions to the challenges that arise from analyzing the valuable data sets acquired over time. Data serve not only for analysis but also for building responses and predictive models that address existing challenges using the processed information. The goal of these solutions will be to create value for the community beyond the field in which the data originated.