In Ecuador, two main projects have collected national soil information: a) “Generación de Geoinformación para la Gestión de territorio y valoración de tierras rurales de la Cuenca del Río Guayas, escala 1:25.000” (2007-2015), developed by the Instituto Espacial Ecuatoriano (IEE), and b) “Generación De Geoinformación Para La Gestión Del Territorio A Nivel Nacional” (2009-2012), developed by the Sistema Nacional de Información de Tierras Rurales e Infraestructura Tecnológica (SIGTIERRAS). The two projects followed a similar methodology to collect and analyze soil information. However, the resulting databases differ in their data structures and in the way they store and present soil information.
Only a portion of the original databases had been digitized; most of the data were available only as PDF files, which needed to be converted into an easy-to-manage format (e.g., *.csv). The difficulty is that each PDF represents a single soil profile containing morphological and analytical soil information. Given the volume of soil information spread across hundreds of PDF files, manual extraction (i.e., capturing soil data profile by profile) was not feasible. Therefore, an automatic procedure for extracting the soil information from each PDF file was developed using open-source tools for data management and statistical computing (Python and R). The soil information from both projects has now been digitized and unified into one harmonized database. We present a new database for Ecuador containing soil information from 13,542 soil profiles, of which 5,368 are from the IEE project and 8,174 are from the SIGTIERRAS project. The new database includes data from 51,692 soil horizons and about 20 morphological and 46 analytical variables.
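The kind of automatic extraction described above can be illustrated with a minimal Python sketch. It is not the projects' actual pipeline: it assumes the text of one profile sheet has already been pulled out of the PDF by some PDF-to-text tool, and the sheet layout, the field labels, and every output column name except ID_PER are hypothetical.

```python
import csv
import io
import re

# Hypothetical text of one profile sheet after a PDF-to-text step.
# The labels and values are made up; the real sheets may look different.
SAMPLE_TEXT = """\
Profile code: P-0001
Latitude: -1.2345
Longitude: -79.5678
Horizon: A  Depth (cm): 0-20  pH: 5.8
Horizon: Bw  Depth (cm): 20-55  pH: 6.1
"""

# One pattern per site-level field (assumed labels).
PROFILE_FIELDS = {
    "ID_PER": re.compile(r"Profile code:\s*(\S+)"),
    "latitude": re.compile(r"Latitude:\s*(-?\d+(?:\.\d+)?)"),
    "longitude": re.compile(r"Longitude:\s*(-?\d+(?:\.\d+)?)"),
}

# One pattern for a whole horizon line (assumed layout).
HORIZON_ROW = re.compile(
    r"Horizon:\s*(?P<designation>\S+)\s+"
    r"Depth \(cm\):\s*(?P<top_cm>\d+)-(?P<bottom_cm>\d+)\s+"
    r"pH:\s*(?P<ph>\d+(?:\.\d+)?)"
)


def parse_profile(text):
    """Return (profile_record, horizon_records) parsed from one sheet's text."""
    profile = {}
    for name, pattern in PROFILE_FIELDS.items():
        match = pattern.search(text)
        # Missing fields are kept as empty strings rather than guessed.
        profile[name] = match.group(1) if match else ""
    horizons = []
    for match in HORIZON_ROW.finditer(text):
        row = match.groupdict()
        # Carry the profile identifier into every horizon record.
        row["ID_PER"] = profile["ID_PER"]
        horizons.append(row)
    return profile, horizons


def to_csv(rows, fieldnames):
    """Serialize a list of dict records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


profile, horizons = parse_profile(SAMPLE_TEXT)
horizon_csv = to_csv(horizons, ["ID_PER", "designation", "top_cm", "bottom_cm", "ph"])
print(horizon_csv)
```

Applied in a loop over all files, a parser of this kind would yield one profile record and several horizon records per PDF, which can then be appended to two *.csv files.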
Given the difficulty of designing a single structure that links soil profile and soil horizon information, the database was organized into two related databases (a soil profile database and a soil horizon database) linked by a unique identifier. This relational structure allows information to be queried more efficiently. The unique identifier (e.g., column ID_PER) makes it possible to couple soil horizon records with the corresponding soil profile records. Both databases are provided as *.csv files and can easily be imported into statistical software (such as R) for analysis.
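As an illustration of how the identifier couples the two files, the following Python sketch joins horizon records to their profile record using only the standard library. Only the ID_PER column name is taken from the description above; the other column names and all values are invented for the example, and the real files would be read from disk rather than from inline strings.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the two CSV files. Only ID_PER comes from the text;
# the remaining columns and all values are hypothetical.
PROFILES_CSV = """\
ID_PER,soil_order,slope_class
P-0001,Andisols,steep
P-0002,Inceptisols,gentle
"""
HORIZONS_CSV = """\
ID_PER,designation,top_cm,bottom_cm
P-0001,A,0,20
P-0001,Bw,20,55
P-0002,Ap,0,15
"""


def read_rows(text):
    """Parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


# Index profiles by identifier: one record per ID_PER.
profiles = {row["ID_PER"]: row for row in read_rows(PROFILES_CSV)}

# Group horizons by identifier: several records per ID_PER.
horizons_by_profile = defaultdict(list)
for row in read_rows(HORIZONS_CSV):
    horizons_by_profile[row["ID_PER"]].append(row)

# One-to-many join: each horizon row is extended with its profile's columns.
# Horizons whose ID_PER has no matching profile are skipped.
joined = [
    {**profiles[pid], **horizon}
    for pid, rows in horizons_by_profile.items()
    if pid in profiles
    for horizon in rows
]

for record in joined:
    print(record["ID_PER"], record["soil_order"], record["designation"])
```

Keeping site-level attributes in one file and horizon-level attributes in the other avoids repeating the profile columns on every horizon row, while the shared identifier still allows the combined view to be rebuilt on demand.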
The soil profile database (Table 1) contains site-level information for each soil profile. The variables include geographic location, taxonomic information (soil taxonomy, soil moisture and temperature regimes), environmental characteristics associated with soil-forming factors (landscape attributes, land cover type, slope), land use, and soil conservation state. The soil horizon database (Table 2) contains the detailed information from the soil profile description, recorded horizon by horizon. Its variables include both qualitative and quantitative information. Morphological information covers, for example, horizon designation and depth, presence or absence of roots, and abundance of rock fragments. The database also holds over 40 analytical variables representing physical soil properties (e.g., textural properties and bulk density) and chemical soil properties, including the organic fraction, fertility, and salinity.