This study aims to determine the optimal model for predicting the Total Construction Spending of Health Care using the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The SARIMA model was fitted to 22 years of monthly data, from January 2002 to December 2023, on the Total Construction Spending of Health Care (SHC), in millions of dollars, from Federal Reserve Economic Data. The researcher concluded that the estimated model for the first-order difference of the logarithm of the SHC series (DLSHC) is SARIMA (1,1,2) (0,1,2)12, with coefficients C = 0.003845, AR (1) = 0.970015, MA (1) = -1.147784, MA (2) = 0.219215, SMA (12) = -0.089710 and SMA (24) = -0.227258. More than 50% of the model's coefficients are statistically significant at the 5% level. The F-statistic for joint significance equals (3.893122) with a P-value of (0.000981), and the S.E. of regression equals (0.019284). The predictive ability of the SARIMA (1,1,2) (0,1,2)12 model is satisfactory, with high predictive power: the Theil Inequality Coefficient equals (0.000898) and the bias proportion equals (0.000087).
Health care is the core of a community. It is among the most significant services because it gives genuine benefits to people. Health care requires a substantive expansion in its facilities, from clinics to hospitals and beyond. Health systems are organizations established to meet the health needs of targeted populations. According to the World Health Organization (WHO), a well-functioning healthcare system requires a financing mechanism, a sufficient, well-organized and adequately paid workforce, reliable data on which to base decisions and policies, and well-maintained health facilities to deliver high-quality medicines and technologies. A competent healthcare system can contribute significantly to a country's economy, development, and industrialization. Health care is conventionally regarded as the most important determinant in promoting general physical and mental health, and it serves the well-being of people around the world (WHO, 2019). Healthcare facilities may vary across nations and communities according to several factors that are influenced by socio-economic conditions as well as political factors. Providing health care services means "the timely use of personal health services to achieve the best possible health outcomes" (Millman M., 1993; Sultan MA., 2023).
Many empirical papers have applied the SARIMA model. Prista N. et al. (2011), in “Use of the SARIMA Models to Assess Data-Poor Fisheries: A Case Study with A Sciaenid Fishery Off Portugal”, conclude that the SARIMA model adequately fitted and forecasted the time series of meagre landings (12-month forecasts; mean error: 3.5 tons (t); annual absolute percentage error: 15.4%), in spite of the limited sample size. They also derive model-based prediction intervals and demonstrate how these can be used to detect problematic situations in the fishery. Chhabra et al. (2023), in “A Comparative Study of ARIMA and SARIMA Models to Forecast Lockdowns due to SARS-CoV-2”, present a brief comparison between trained ARIMA and SARIMA models, in which the ARIMA model gained the upper hand due to its accuracy. Additionally, the models are able to predict confirmed deaths and confirmed cases of COVID. Liu et al. (2023), in “Application of SARIMA model in forecasting and analyzing inpatient cases of acute mountain sickness”, conclude that AMS inpatient cases show evident periodicity and seasonality. The SARIMA model performs well and is accurate in short-term prediction. It helps in exploring the characteristics of AMS and in allocating relevant medical resources for AMS inpatients.
Monthly data on the Total Construction Spending of Health Care in the United States (SHC), in millions of dollars, were obtained from Federal Reserve Economic Data (https://fred.stlouisfed.org/series/TLHLTHCONS). The SARIMA model was fitted to the 22 years from January 2002 to December 2023. A stationarity test (the Augmented Dickey-Fuller unit root test) was performed on the SHC series, and the autocorrelation and partial autocorrelation function graphs were examined to determine the order of differencing and the appropriate transformation needed to convert it to a stationary series. The researcher determines the appropriate SARIMA (p, d, q) (P, D, Q)S model by selecting the model with the most significant coefficients and the highest R-squared value, along with the smallest values of the Akaike Info Criterion, Schwarz Criterion and SIGMASQ (Box et al., 2015; Gujarati et al., 2009; Fan et al., 2009).
SARIMA extends the ARIMA (Autoregressive Integrated Moving Average) method with a seasonal component. The model assumes that the Total Construction Spending of Health Care in the United States (SHC) data comprise trend, seasonal, and irregular components. For the ARMA (p, q) equation we use L, which denotes the lag operator,
where $L^n x_t = x_{t-n}$:
$$x_t = \alpha + \sum_{i=1}^{p} \alpha_i L^i x_t + \mu + \sum_{i=1}^{q} \theta_i L^i \varepsilon_t + \varepsilon_t \tag{1}$$
Which, leaving the constant terms aside, can be represented as follows:
$$x_t = \alpha(L)^p x_t + \theta(L)^q \varepsilon_t + \varepsilon_t \tag{2}$$
The ARIMA (p, d, q) equation then becomes:
$$\Delta^d x_t = \alpha(L)^p \Delta^d x_t + \theta(L)^q \Delta^d \varepsilon_t + \Delta^d \varepsilon_t \tag{3}$$
By using seasonal lags and an ARMA (P, Q) model on the differenced values, we can extract any remaining structure. In other words, we use $L^S$ rather than the standard lag operator $L$. Here P and Q are the seasonal lag orders.
$$\Delta_S^D x_t = A(L^S)^P \Delta_S^D x_t + \vartheta(L^S)^Q \Delta_S^D \varepsilon_t + \Delta_S^D \varepsilon_t \tag{4}$$
We can now apply another ARIMA (p, d, q) model to $\Delta_S^D x_t$, multiplying the seasonal model by the new ARIMA model in order to remove any remaining seasonality and obtain a mathematical representation of SARIMA (p, d, q) (P, D, Q)S:
$$\Delta^d \Delta_S^D x_t = \alpha(L)^p A(L^S)^P \Delta^d \Delta_S^D x_t + \theta(L)^q \vartheta(L^S)^Q \Delta^d \Delta_S^D \varepsilon_t + \Delta^d \Delta_S^D \varepsilon_t \tag{5}$$
(Gujarati et al., 2009; Carter et al., 2011). The Seasonal Autoregressive Integrated Moving Average (SARIMA) model was used to:
• Analyze and explore the intrinsic structure of the series.
• Determine the seasonal variations.
• Determine the optimum model for prediction.
• Analyze the performance of the SARIMA model.
• Forecast the monthly values for the next year using the SARIMA model.
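The differencing operators that appear in Eqs. (3) to (5) can be illustrated with a short sketch. The function names and the toy series below are hypothetical and are not part of the EViews workflow used in the study; the sketch only shows that a regular difference (1 - L) removes a linear trend and a seasonal difference (1 - L^S) with S = 12 removes an exactly repeating monthly pattern.

```python
def difference(x, s=1):
    """Apply (1 - L^s): return x_t - x_{t-s} for t = s, ..., len(x) - 1."""
    return [x[t] - x[t - s] for t in range(s, len(x))]


def sarima_differences(x, d=1, D=1, S=12):
    """Apply the regular difference d times, then the seasonal difference D times."""
    y = list(x)
    for _ in range(d):
        y = difference(y, 1)
    for _ in range(D):
        y = difference(y, S)
    return y


# Hypothetical series (not the SHC data): a linear trend plus a 12-month pattern.
pattern = [0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 2, -3]
series = [0.5 * t + pattern[t % 12] for t in range(36)]

w = sarima_differences(series, d=1, D=1, S=12)
# One observation is lost per regular difference and S per seasonal difference.
print(len(series), "->", len(w))
print("largest remaining value:", max(abs(v) for v in w))
```

In the study the logarithm of SHC is taken before differencing; the same functions would then be applied to the logged series.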
The data were analyzed with Econometrics Views (EViews) Release 10.
Fig. 1: Monthly Data of the Total Construction Spending of Health Care in USA during January 2002 - December 2023.
The above figure shows that the SHC series has an exponential shape and some seasonal effect.
Table 1: Descriptive Statistics for Monthly Data of the Total Construction Spending of Health Care in USA during January 2002 – December 2023.
According to the above table, the Total Construction Spending of Health Care, in millions of dollars, ranges between (25438) and (61749), with a mean value of (41570.63), a median value of (41262) and a std. dev. of (7122.802).
Table 2: Augmented Dickey-Fuller Unit Root Test on SHC.
Table 2 shows that the Augmented Dickey-Fuller statistic is (-0.287389) with P-value (0.9235), which is not statistically significant at the 1%, 5% or 10% level. Therefore, we cannot reject the null hypothesis that SHC has a unit root, and we conclude that the SHC series is non-stationary. As in Fig. 1, the original series has an exponential shape, so we try to eliminate its non-stationarity by using the logarithm of SHC (LSHC).
Fig. 2: The LSHC Data during January 2002 - December 2023.
Table 3: Augmented Dickey-Fuller Unit Root Test on LSHC.
According to Fig. 2 and Table 3, the Augmented Dickey-Fuller statistic of LSHC is (-1.437988) with P-value (0.5634), which is not statistically significant at the 1%, 5% or 10% level. Therefore, we cannot reject the null hypothesis that LSHC has a unit root, and we conclude that the LSHC series is still non-stationary. Next, the first-order difference is taken and the D (LSHC) series is obtained, as in the following table:
Table 4: Augmented Dickey-Fuller Unit Root Test on D (LSHC).
The Augmented Dickey-Fuller statistic of D (LSHC) is (-18.57187) with P-value (0.0000), which is statistically significant at the 1%, 5% and 10% levels. Therefore, we reject the null hypothesis that D (LSHC) has a unit root, and we conclude that the D (LSHC) series is stationary. The autocorrelation and partial autocorrelation functions of the D (LSHC) series are plotted in the correlogram below (Table 5).
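The decision rule behind the three unit root tests can be summarized in a few lines. This is a minimal sketch: the P-values are the ones reported in the text for Tables 2 to 4, while the function name and the choice of the 5% level as the default are illustrative assumptions. The null hypothesis of the Augmented Dickey-Fuller test is that the series has a unit root, so a small P-value leads to rejecting it.

```python
# ADF P-values as reported in the text for Tables 2-4.
adf_p_values = {"SHC": 0.9235, "LSHC": 0.5634, "D(LSHC)": 0.0000}


def adf_decision(p_value, alpha=0.05):
    """H0: the series has a unit root. Reject H0 when p_value < alpha."""
    if p_value < alpha:
        return "stationary (reject H0)"
    return "non-stationary (fail to reject H0)"


decisions = {name: adf_decision(p) for name, p in adf_p_values.items()}
for name, verdict in decisions.items():
    print(f"{name}: {verdict}")
```

The same verdicts hold at the 1% and 10% levels, since 0.9235 and 0.5634 exceed 0.10 and the P-value for D (LSHC) is reported as 0.0000.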
In Table 5, the autocorrelation of the D (LSHC) series is significantly non-zero when the lag order is q = 1 or q = 2, and it lies essentially within the confidence band when the lag order is greater than 2. The same holds for the partial autocorrelation, where we take p = 1 or p = 2. Hence orders 0, 1 and 2 are used for the autoregressive and moving average terms in the pre-estimation on the sample series. In the seasonal part, we likewise take Q = 1 or Q = 2 and P = 1 or P = 2.
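How a correlogram flags "significantly non-zero" lags can be sketched as follows. The sample autocorrelation and the approximate 95% band of plus or minus 1.96 divided by the square root of n are standard; the simulated series is a hypothetical stand-in for D (LSHC), not the FRED data, and its coefficients are arbitrary. With 264 monthly observations and one lost to differencing, n = 263.

```python
import math
import random


def sample_acf(x, max_lag):
    """Return the sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def confidence_band(n, z=1.96):
    """Approximate 95% band for the autocorrelations of white noise."""
    return z / math.sqrt(n)


# 264 monthly observations minus one lost to first differencing.
n_obs = 264 - 1
band = confidence_band(n_obs)

# Hypothetical stand-in for D(LSHC): a seeded MA(2) series, NOT the FRED data.
rng = random.Random(0)
eps = [rng.gauss(0.0, 0.02) for _ in range(n_obs + 2)]
x = [eps[t] - 0.6 * eps[t - 1] + 0.3 * eps[t - 2] for t in range(2, n_obs + 2)]

acf = sample_acf(x, max_lag=12)
outside = [k for k, r in enumerate(acf, start=1) if abs(r) > band]
print(f"band = +/-{band:.4f}; lags outside the band: {outside}")
```

For a pure MA(2) process the theoretical autocorrelation is zero beyond lag 2, which is the cut-off pattern used in the text to choose q; in a finite sample an occasional higher lag can still fall outside the band by chance.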
Table 5: Correlogram of D (LSHC).
Table 6: Automatic ARMA Forecasting.
According to the Akaike Information Criteria in Fig. 3 and the Automatic ARMA Forecasting in Table 6, the selected ARMA model is (1,2)(0,2) with AIC* value (-5.026258). It is the best of the 81 estimated ARMA models: it has significant parameters, the highest R-squared value and the lowest values of the Akaike Info Criterion, Schwarz Criterion and SIGMASQ.
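The count of 81 candidate models is what one obtains if each of the four orders (p, q, P, Q) ranges over 0, 1 and 2, since 3^4 = 81; this reading of the search grid is an assumption, as the text does not list the grid explicitly. The sketch below enumerates that grid and picks the model with the lowest AIC. Only the AIC of the (1,2)(0,2) model comes from the text; the other two values are invented placeholders to make the example run.

```python
from itertools import product

# Assumed search grid: each of p, q, P, Q takes the values 0, 1, 2.
orders = range(3)
candidates = list(product(orders, repeat=4))  # tuples (p, q, P, Q)
print(len(candidates), "candidate models")


def select_by_aic(aic_by_order):
    """Return the order tuple with the smallest (most negative) AIC."""
    return min(aic_by_order, key=aic_by_order.get)


# Only the first AIC value is reported in the text; the others are hypothetical.
aic_table = {
    (1, 2, 0, 2): -5.026258,
    (1, 1, 0, 1): -5.010000,  # hypothetical
    (2, 2, 1, 1): -4.980000,  # hypothetical
}
best = select_by_aic(aic_table)
print("selected (p, q, P, Q):", best)
```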
Fig. 3: Akaike Information Criteria.
Table 7: The Estimated Results of SARIMA (1, 1, 2) (0, 1, 2)12 Model.
According to the results shown in Table 7, the estimated SARIMA (1, 1, 2) (0, 1, 2)12 model has more than 50% of its coefficients statistically significant at the 5% level. The R-squared value equals (0.085765), and the F-statistic for joint significance equals (3.893122) with P-value (0.000981). The Durbin-Watson statistic (1.967391) is close to 2, so there is no first-order autocorrelation, either positive or negative. In addition, the Durbin-Watson statistic is greater than R-squared, which indicates that the model is not spurious. So, the estimated SARIMA (1, 1, 2) (0, 1, 2)12 model of the D (LSHC) series is:
DLSHC = 0.003845 + 0.970015 AR(1) - 1.147784 MA(1) + 0.219215 MA(2) - 0.089710 SMA(12) - 0.227258 SMA(24)
with S.E. of regression equal to (0.019284). Using residual diagnostics, we examine the normality of the residuals of the SARIMA (1, 1, 2) (0, 1, 2)12 model, as shown in the following figure:
Fig. 4: Normality Test of the Model SARIMA (1, 1, 2) (0, 1, 2)12.
The Jarque-Bera normality test statistic equals (1.245555), which is not statistically significant at the 5% level; so we fail to reject the null hypothesis that the residuals are normally distributed.
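A short sketch of the Jarque-Bera calculation may help here. It assumes that the reported value 1.245555 is the test statistic rather than a probability, since a P-value cannot exceed 1. Under the null hypothesis of normality the statistic is asymptotically chi-square distributed with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-JB / 2). The function names are illustrative, and the textbook formula below may differ in small-sample details from the software output.

```python
import math


def jarque_bera(residuals):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the skewness and K the kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def jb_p_value(jb):
    """Upper-tail probability of a chi-square(2) variable: exp(-jb / 2)."""
    return math.exp(-jb / 2.0)


# Assumption: 1.245555 is the JB statistic reported with Fig. 4.
p = jb_p_value(1.245555)
print(f"implied P-value: {p:.3f}")
```

Under that assumption the implied P-value is roughly 0.54, well above 0.05, which agrees with the conclusion that normality of the residuals is not rejected.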
The autocorrelation and partial autocorrelation functions of the residual series in Table 8 show that the residuals are white noise, which indicates that the model is valid.
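The Durbin-Watson statistic reported with Table 7 is a complementary check on the residuals. As a minimal sketch with illustrative names, it is the sum of squared successive differences of the residuals divided by their sum of squares, and it is approximately 2(1 - r1), where r1 is the first-order autocorrelation of the residuals.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Toy residuals (hypothetical): a perfectly alternating sign pattern gives a
# value well above 2, the signature of negative first-order autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))

# First-order autocorrelation implied by the reported DW, using DW ~ 2 * (1 - r1).
reported_dw = 1.967391
implied_r1 = 1.0 - reported_dw / 2.0
print(f"implied r1 = {implied_r1:.4f}")
```

An implied r1 of about 0.016 is very close to zero, which is consistent with the statement that the residuals show no first-order autocorrelation.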
Table 8: Correlogram of the Residuals of SARIMA (1, 1, 2) (0, 1, 2)12.
Fig. 5: Actual, Fitted, Residual Graph.
As shown in Fig. 5, the actual and fitted series pass through the 50% confidence interval, so the forecasting of D (LSHC) is significant and the forecasting ability of the model is satisfactory. First, we forecast inside the sample to check the forecasting power of the model (Hossain et al., 2020).
The above graph shows that the forecast value of LSHC in 2023M05 is (0.01503) while the actual value is (-0.01436), a small error of 2.93%, so the forecast value is close to the actual value. This indicates that the model fits well.
Fig. 6: Forecast LSHC.
As shown in the above figure, the root mean squared error equals (0.019092), while the Theil Inequality Coefficient equals (0.000898), which is close to zero; this means that the predictive power of this model is very strong. The bias proportion equals (0.000087), which means there is no obvious gap between the actual LSHC and the predicted values: they move closely together and pass through the 50% confidence interval. So the forecasting of LSHC is significant and the forecasting ability of the SARIMA (1, 1, 2) (0, 1, 2)12 model is satisfactory. Second, using the Box-Jenkins approach to forecast SHC over the upcoming year, from 2024M01 to 2024M12, gives the results shown in the table below:
Table 10: Forecasting of the Total Construction Spending of Health Care in USA: Outside the Sample from January 2024 to December 2024.
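The forecast-evaluation statistics quoted for the in-sample forecast of LSHC (root mean squared error, Theil Inequality Coefficient and bias proportion) can be sketched from their usual definitions. The function name and the three-point toy data are hypothetical. The last lines are a rough plausibility check only: they assume that both the actual and the forecast LSHC lie near the logarithm of the mean SHC value from Table 1, which is an approximation and not a figure taken from the study.

```python
import math


def forecast_metrics(actual, forecast):
    """Return (RMSE, Theil inequality coefficient, bias proportion)."""
    n = len(actual)
    mse = sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n
    rmse = math.sqrt(mse)
    # Theil U: RMSE scaled by the root mean squares of forecast and actual.
    scale = (math.sqrt(sum(f * f for f in forecast) / n)
             + math.sqrt(sum(a * a for a in actual) / n))
    theil = rmse / scale
    # Bias proportion: share of the MSE due to the gap between the two means.
    mean_gap = sum(forecast) / n - sum(actual) / n
    bias = mean_gap ** 2 / mse if mse > 0 else 0.0
    return rmse, theil, bias


# Hypothetical toy data: every forecast is one unit too high, so all of the
# error is bias.
rmse, theil, bias = forecast_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(rmse, round(theil, 4), bias)

# Rough check (assumption: LSHC is close to log(mean SHC) for both series).
approx_theil = 0.019092 / (2.0 * math.log(41570.63))
print(f"approximate Theil coefficient: {approx_theil:.6f}")
```

The approximate value is close to the reported 0.000898. Because the coefficient divides the error by the level of the series, a logged series with values near 10.6 yields a small coefficient even for a moderate RMSE.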
The Seasonal Autoregressive Integrated Moving Average model SARIMA (1, 1, 2) (0, 1, 2)12 is acceptable for forecasting the Total Construction Spending of Health Care in the USA (SHC):
DLSHC = 0.003845 + 0.970015 AR(1) - 1.147784 MA(1) + 0.219215 MA(2) - 0.089710 SMA(12) - 0.227258 SMA(24)
with S.E. of regression equal to (0.019284), Durbin-Watson statistic (1.967391) and probability of the F-statistic equal to (0.000981). The forecasting ability of the SARIMA (1, 1, 2) (0, 1, 2)12 model is satisfactory and carries high predictive power, with the Theil Inequality Coefficient equal to (0.000898) and the bias proportion equal to (0.000087).
With due respect and obeisance, I would like to express my utmost gratitude, indebtedness and appreciation to my family for their immaterial support.
The author confirms that there is no conflict of interest.
Academic Editor
Dr. Abduleziz Jemal Hamido, Deputy Managing Editor (Health Sciences), Universe Publishing Group (UniversePG), Haramaya, Ethiopia.
Assistant Professor of Statistics, Dept. of Mathematics, Faculty of Science, University of Hafr Al Batin, Saudi Arabia.
Sultan MA. (2023). Predicting the total construction spending of health care by using SARIMA model: United States case, Eur. J. Med. Health Sci., 5(5), 159-165. https://doi.org/10.34104/ejmhs.023.01590165