Abstract:
The process of building data analytics systems, including big data systems, is currently
being investigated from various perspectives that generally focus on specific aspects, such
as data security or privacy, to the detriment of an engineering perspective on systems
development. To address this limitation, we propose developing analytics systems through a
reuse-based approach that spans every stage, from problem definition to results analysis,
by identifying variations and building reusable, context-based assets.
This study illustrates the reuse process through two case studies that address the
water table level prediction problem in two different contexts: the irrigated period and the
non-irrigated period within the same study area. The objective of this study is to demonstrate
the influence of context on the performance of widely used predictive models for this
problem, including long short-term memory (LSTM), artificial neural networks (ANNs),
and support vector machines (SVMs), as well as the potential for reusing the developed
analytics system. Additionally, we applied permutation feature importance (PFI) to
determine the contribution of individual variables to the predictions. The results confirm that
the same problem hypotheses yield different performance in each case in terms of the coefficient
of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and
mean square error (MSE). They also show that the best-performing predictive models
differ for some of the hypotheses (ANN in one case and LSTM in another), supporting
the assumption that context can influence model selection and performance. Reusing
assets allows these alternatives to be evaluated more efficiently during development,
resulting in analytics systems that are better aligned with their real-world context, while also
offering the advantages of software system composition.
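The evaluation metrics and the permutation feature importance named above can be illustrated in a few lines. The following is a minimal, dependency-free Python sketch, assuming a trained model exposed as a plain `predict(row)` callable and PFI measured as the mean increase in MSE after shuffling one feature column. The function names, the `n_repeats` and `seed` parameters, and the synthetic data are hypothetical illustrations, not the tooling used in the study.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical, dependency-free sketch; not the study's implementation.

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot

def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    metric: Callable[[Sequence[float], Sequence[float]], float] = mse,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in an error metric when each feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = []
            for row, value in zip(X, column):
                perturbed = list(row)
                perturbed[j] = value
                X_perm.append(perturbed)
            total += metric(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total / n_repeats)
    return importances

if __name__ == "__main__":
    # Hypothetical synthetic data: the target depends only on feature 0.
    data_rng = random.Random(42)
    X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(50)]
    y = [2.0 * row[0] for row in X]
    model = lambda row: 2.0 * row[0]  # stand-in for a trained LSTM, ANN, or SVM
    print(permutation_importance(model, X, y))  # second value is 0.0
```

Shuffling a column keeps that variable's distribution but breaks its association with the target, so a variable the model ignores scores zero, as feature 1 does here. For R², where higher is better, one would instead measure the drop in score after shuffling.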