Long Short-Term Memory (LSTM) networks have proven their worth in predicting stock prices under uncertain market conditions. This research examines the quality of LSTM predictions when different activation functions are applied to noisy market data. We used 25 stocks from diverse stock exchanges and compared the predictions produced with the ReLU, ELU, and tanh activation functions. Accuracy is assessed in terms of average loss accumulation and price predictions for the stock sample. To keep conditions comparable, every run uses a similar number of epochs, the same training and testing period, and the same LSTM feature parameters, irrespective of the stock exchange (SE). The multivariable prediction method reached an accuracy of 80% with the hyperbolic tangent (tanh) activation function, suggesting that tanh is the best choice for LSTM-based multivariable price prediction.
Stock prediction has interested investors since the beginning of investment-based business. Walter E. Brown argued that, among the various investment alternatives, stocks are the best option, offering the lowest volatility with the highest possible return (W. E. Brown, 1982). Yu & Yan supported predicting stocks with a deep neural network, which they found more accurate than SVM and a random forest regressor (Yu & Yan, 2019). JW Lee showed that, among the various prediction methods, those based on reinforcement learning were the most capable of making accurate predictions (JW Lee, 2001). Nikou et al. showed that deep learning-based price prediction algorithms give better prediction quality than machine learning-based methods (Nikou et al., 2019). From these findings we can infer that deep learning-based prediction is suitable for stock price forecasting. It has also been observed that LSTM can predict stock prices effectively in highly unstable markets such as the DSE (Dhaka Stock Exchange) (Sami et al., 2021). Sami et al. also showed that LSTM can predict stock prices in stable markets such as the NYSE (Sami et al., 2021).
In this research, we evaluate which activation functions allow stock prices to be predicted with the highest accuracy. Farzad et al. found that, among various activation functions, sigmoid is the most popular for prediction (Farzad et al., 2019). Elsayed et al. found that the most popular and effective activation functions for accurate prediction are ReLU, ELU, and tanh (Elsayed et al., 2018). Rana et al. agree with the suggestion of Elsayed et al. to use the ReLU, ELU, and tanh activation functions for predicting stock prices with high accuracy (Rana et al., 2019).
Hence, in this research, we evaluate the ReLU, ELU, and tanh activation functions and their capability to make accurate predictions with multivariable price parameters.
Literature Review
Campbell & Kyle suggested that noise trading occurs when the market remains highly volatile and that such behavior is prevalent in stock trading (Campbell & Kyle, 1993). For price forecasting, LSTM is designed to mitigate the exploding and vanishing gradient problems, which helps reduce the impact of noise on forecasts. Hence LSTM can be considered a price prediction method for stocks (Siami et al., 2019). When selecting stocks from different markets, it is imperative that the chosen activation function gives the best price prediction accuracy. Chunhachinda et al. have shown that stocks drawn from different stock exchanges around the world provide the best possible data sample (Chunhachinda et al., 1997). Haroon & Rizvi have shown how different stock markets differ in liquidity and easy profit opportunity (Haroon & Rizvi, 2020). Our research evaluates the competitive quality of activation functions applied in the LSTM process and how well they deliver accurate price predictions. The liquidity and easy profit opportunity, which can be understood through fundamental financial analysis, are compared with statistical evaluations through LSTM. Istiake et al. have suggested that, for LSTM-based price predictions, it is best to run more than 35 and up to 100 epochs, because the loss remains much the same up to 1,000 epochs as the data becomes noisy (Istiake et al., 2020). Istiake et al. have also shown that a run of at least 35 epochs makes the loss process effective enough for all stock prices. Hence this research uses a minimum of 50 epochs for effective price predictions. Beyaz et al. have shown that, for stock price forecasting, the ideal training and testing split assigns 80% of the data to training (Beyaz et al., 2018). Following this split, Bloomberg and Goldman Sachs obtained the best price prediction results (Jiang et al., 2017), using machine learning parameters with feature settings suggested for predicting stock market conditions.
Classification algorithms play a large role in the selection of stocks in this research. Using K Nearest Neighbor, HM Sami successfully created a portfolio of stocks selected from the NYSE (Sami HM, 2021). Similarly, HM Sami has shown that K Nearest Neighbor is well suited to selecting qualified human resources (Sami HM, 2021). Chen and Hao also showed that the K Nearest Neighbor algorithm, with financial ratios as background, can select stocks effectively (Chen & Hao, 2017).
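The 80% training split cited above can be illustrated with a short sketch. It assumes the split is chronological (the earliest 80% of observations for training, the remainder for testing), which is the usual choice for time series but is not stated explicitly in the sources; the function name `chronological_split` and the sample prices are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling.

    Assumption: observations are ordered from oldest to newest, so the
    test set always lies after the training set in time.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical closing prices for ten trading days (illustration only).
prices = [100.0 + 0.5 * day for day in range(10)]
train, test = chronological_split(prices)
print(len(train), len(test))
```

Keeping the order intact avoids leaking future prices into the training set, which a shuffled split would do.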
Karaven et al. have found Tanh important for making accurate weather predictions (Karaven et al., 2020). According to their research, weather predictions for a specific time slot, with multivariable factors taken into consideration, have a high degree of similarity with noisy stock prices. Hence Tanh is considered an important activation function for this research. As financial time series forecasting has implications for both the forget gate and the output gate, it was found that integrating both ReLU and ELU over multivariable price series of the S&P 500 index can make the most accurate predictions (Borovykh et al., 2018).
Theoretical Background
LSTM (Long Short-Term Memory)
LSTM is a neural network architecture based on artificial recurrent neural networks and is well suited to sequence prediction. It can analyze data sequences because of its network structure. The LSTM unit is made up of a cell, an input gate, an output gate, and a forget gate (Jiang et al., 2017).
(a) The cell remembers values over arbitrary time intervals.
(b) Gates regulate the flow of information into and out of the cell.
The goal of the LSTM architecture is to address the vanishing gradient problem that arises when training a conventional RNN. When a vanilla RNN is trained on nonlinear time series via backpropagation, the computations involve finite-precision numbers (rounding errors), so the backpropagated gradients can vanish (tend to zero) or explode (tend to infinity). LSTM units partially solve this because they also allow gradients to flow through the unit unchanged (Wu et al., 2018). In general, LSTMs include a forget gate: error values propagating backward from the output layer are fed back to the gates at each step until the gates learn to cut them off (Siami et al., 2019). This loop allows the LSTM to be trained with gradient-based weight updates that remain valid for future values (Alahi et al., 2016). Finally, in terms of precision, the actual weight updates and the growth of the gradient determine how error propagates in a time series prediction for any values in the training and test sets.
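The vanishing and exploding behavior described above can be seen in a deliberately simplified sketch. It assumes a scalar, linear recurrence in which every time step multiplies the backpropagated gradient by the same recurrent weight; real networks have matrices and nonlinearities, so this only illustrates the tendency. The function name `backpropagated_factor` is hypothetical.

```python
def backpropagated_factor(recurrent_weight, steps):
    """Product of `steps` identical per-step gradient factors.

    Simplifying assumption: a scalar linear RNN, where the gradient is
    scaled by the same recurrent weight at every time step.
    """
    factor = 1.0
    for _ in range(steps):
        factor *= recurrent_weight
    return factor


# Weights below 1 shrink the gradient toward zero, weights above 1 blow it up.
for weight in (0.5, 1.0, 1.5):
    print(weight, backpropagated_factor(weight, 50))
```

Only a factor of exactly 1 leaves the gradient unchanged over many steps, which is the behavior the LSTM cell state is designed to make possible.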
Activation Functions
ELU (Exponential Linear Unit)
The Exponential Linear Unit (ELU) is a neural network activation function. ELUs, unlike ReLUs, have negative values, allowing them to push mean unit activations closer to zero, similar to batch normalization, but with less computational complexity. Because of the reduced bias shift effect, mean shifts toward zero accelerate learning by bringing the normal gradient closer to the unit natural gradient. Although LReLUs and PReLUs have negative values, they do not guarantee a noise-resistant deactivation state. With smaller inputs, ELUs saturate to a negative value, reducing forward propagated variation and information.
Tanh
The Tanh (also "tanh" and "TanH") function is another name for the hyperbolic tangent activation function. It is very similar to, and even has the same S-shape as, the sigmoid activation function. The function accepts any real value as input and returns values ranging from -1 to 1. The larger the input (more positive), the closer the output to 1.0, whereas the smaller the input (more negative), the closer the output to -1.0.
RELU (Rectified Linear Unit)
The rectifier or ReLU (Rectified Linear Unit) activation function is defined as the positive part of its argument: it returns the input directly if the input is positive; otherwise, it outputs zero. It has become the default activation function for many types of neural networks because it is easier to train and often results in better performance. Training deep neural networks using stochastic gradient descent with backpropagation of errors requires an activation function that looks and acts like a linear function but is actually nonlinear, allowing complex relationships in the data to be learned. In addition, the function must be more sensitive to the activation sum input and avoid easy saturation.
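The three activation functions described above can be compared side by side with a minimal sketch in plain Python. The ELU scale `alpha=1.0` is an assumed, commonly used default and is not a value given in this paper.

```python
import math


def relu(x):
    """Rectified Linear Unit: the positive part of the input."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential Linear Unit; saturates toward -alpha for very negative x.

    alpha=1.0 is an assumption for illustration, not a setting from the paper.
    """
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def tanh(x):
    """Hyperbolic tangent: S-shaped, with outputs between -1 and 1."""
    return math.tanh(x)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  elu={elu(x):.4f}  tanh={tanh(x):.4f}")
```

The printout shows the qualitative differences discussed in the text: ReLU is zero for all negative inputs, ELU approaches a negative floor, and tanh is bounded on both sides.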
The activation function is the first step in the RNN process in general. Using the activation function, the weighted sum of input is converted into an output from a node or nodes in a network layer. The symbol a
All of these parameters affect the RNN structure, which is used to produce the following results:
The activation function's bias is related to its weight allocation: Wax for the input and Waa for the previous activation. Similarly, for the output activity:
The resulting activation is linked to the output bias factor and the output allocated weights, which are represented by Wya. It is related to bias and market factor g1 for price prediction, while g2 remains the activation functions that take effect one after the other. Because of all of these functions, the recurrent network is effective. As a result, the RNN unit as a whole is defined as
Whereas each time step motion allows for successful training based on the data generated by the network. Each of these processes exemplifies the positive aspect of the previous asset price and its relationship to the current asset price.
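Using the symbols named above (Waa, Wax, Wya, g1, and g2), a simple recurrent unit is conventionally written as follows. This is the standard textbook form, given here as a sketch rather than as the exact equations of this study; the bias symbols b_a and b_y and the angle-bracket time index are notational assumptions.

```latex
a^{\langle t \rangle} = g_1\!\left( W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a \right), \qquad
y^{\langle t \rangle} = g_2\!\left( W_{ya}\, a^{\langle t \rangle} + b_y \right)
```

Here x^{<t>} is the input at time step t, a^{<t>} is the activation passed to the next step, and y^{<t>} is the output, for example the predicted price.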
As many examples as possible are incorporated into this feed-forward neural network system so that it is trained to acquire appropriate information in response to situational developments. The RNN design allows biased conditional methods to be completely reliant on the training process. Bias is the most important factor in price prediction because each stock is related to the market via beta but also has its own performance capacity (Stosic et al., 2019). The LSTM process employs a variety of gated structures, such as the ones listed below:
The symbols above represent the gate notation in the RNN procedure, namely the update, relevance, forget, and output gates. The gates therefore enable features to be recalled for price prediction or data to be obliterated. The following function shows how gates are linked:
Fig. 1: Illustration of the research process.
Application through LSTM
When used in conjunction with RNN, LSTM also has multiple gates within the processing unit. The goal of LSTM is to aid in the development of effective evaluation and training solutions that preserve the output values feeding the activation functions for future values. RNN clipping techniques are used to intentionally manage the exploding gradient problem, as RNN encounters both exploding gradient and vanishing gradient problems. By preserving parallel processing during the boundary line looping process, RNN decreases the risk of infinite gradients. Although RNN can solve the exploding gradient problem, it cannot solve the vanishing gradient problem. When we look at the gated structure of the RNN unit in the LSTM, we can see that it has four gates. C
In the case of the final output function c
Furthermore, because the activation function remembers which essential information is required for the prediction process at each phase of the final output, this procedure makes the activation function for the following stage very efficient.
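The gated update described in this section can be sketched as one step of a standard LSTM cell. This is the common textbook formulation with forget, update (input), and output gates, reduced to scalar state for readability; it is not necessarily the exact variant used in this study, and the relevance gate named earlier is omitted. The `params` dictionary layout and all names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used by the gates; outputs lie in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, a_prev, c_prev, params):
    """One step of a scalar LSTM cell (standard textbook form, a sketch).

    params maps each of "forget", "update", "output", "candidate" to a
    tuple (w_a, w_x, b): weight on the previous activation, weight on the
    current input, and bias. This layout is assumed for illustration.
    """
    def affine(name):
        w_a, w_x, b = params[name]
        return w_a * a_prev + w_x * x_t + b

    gamma_f = sigmoid(affine("forget"))        # how much old memory to keep
    gamma_u = sigmoid(affine("update"))        # how much new candidate to write
    gamma_o = sigmoid(affine("output"))        # how much memory to expose
    c_tilde = math.tanh(affine("candidate"))   # candidate memory value
    c_t = gamma_f * c_prev + gamma_u * c_tilde  # new cell state
    a_t = gamma_o * math.tanh(c_t)              # new activation (output)
    return a_t, c_t


# With all-zero parameters every gate is 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "update", "output", "candidate")}
a_t, c_t = lstm_step(x_t=1.0, a_prev=0.0, c_prev=1.0, params=zero_params)
print(a_t, c_t)
```

The tanh calls are where the activation function compared in this paper enters the cell; swapping it for ReLU or ELU changes the range of the candidate value and of the emitted activation.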
Epochs for Loss Reduction
When the prior step in an RNN supplies irrelevant information to the forward step, the loss function identifies where the loss occurs during the training process. If the main output has less loss than the final output, then the prior information should be used in the feed-forward process. Because LSTM uses its gated structure to achieve this step-by-step loss reduction, the prediction loss decreases with successful training. As seen in our extended epoch computation, 50-100 epochs produced effective forecasts. This epoch range follows scholarly suggestions.
Back propagation Method
To make financial price forecasting work well with backpropagation through time, the derivative of the loss L with respect to the weight matrix W must be computed at each time step. As the relevant information is updated and passed through the activation stages, the corresponding weights used to predict the price are altered at each step, according to the findings.
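In the usual formulation of backpropagation through time, which the paragraph above describes, the total loss is the sum of the per-step losses, and the gradient with respect to a shared weight matrix accumulates over the time steps. The symbols T (number of time steps), L^{(t)} (loss at step t), and \eta (learning rate) are notation assumed here for illustration and do not come from the original text.

```latex
\mathcal{L} = \sum_{t=1}^{T} \mathcal{L}^{(t)}, \qquad
\frac{\partial \mathcal{L}}{\partial W} = \sum_{t=1}^{T} \frac{\partial \mathcal{L}^{(t)}}{\partial W}, \qquad
W \leftarrow W - \eta \, \frac{\partial \mathcal{L}}{\partial W}
```

Because the same W is reused at every step, each step contributes a term to the sum, which is why the weights shift after every pass through the sequence.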
It has been shown that, using LSTM's observation stages and feed-forward method, the loss for large datasets can be reduced over time. The loss does not become greater if the same number of training steps is taken on smaller datasets. Furthermore, it has been found that smaller datasets and shorter training stages result in lower losses.
In this research, we evaluated various stocks from stock exchanges (SEs) in Bangladesh, Thailand, India, Malaysia, Japan, and Indonesia. The research aims to find the accuracy of LSTM with different activation functions by comparing the accuracy of each predicted price output. With this method, we found that tanh gives the best accuracy of the three activation functions under similar LSTM parameters. We observed 80% accuracy in predicting stock prices with tanh (hyperbolic tangent), compared with 72% for ELU and 48% for ReLU on average for these stocks.
We found that ReLU provided the least positive output. As the accuracy seems acceptable for most of the stocks from the different SEs, LSTM could be used as an important time series forecasting (TSF) method by financial organizations to predict future asset prices. Similarly, classification through segregation has been used to predict qualified customers, with feature sets defined by acceptable criteria for selecting loan customers with the Naïve Bayes classifier (HM Sami et al., 2021). Classifying the predicted results by accuracy is likewise performed here to find the best activation function for predicting stock prices.
Table 1: Prediction accuracy using different activation functions.
Fig. 2: Illustration of the accuracy of each activation function.
Moreover, we found that, with the same activation function parameters and 50-100 epochs, the loss starts to fall from epoch 10 onwards and remains extremely low through epoch 20, which shows that following scholarly suggestions yields effective predictions of stock prices.
Drawbacks & Further improvements
In this research, we did not base predictions on the standard market practices suggested by Wall Street or other major trading organizations, but on scholarly suggestions. We have therefore not yet evaluated stock price predictions based on parameters used in market practice. Moreover, we selected assets randomly rather than according to market criteria. As a result, although the predictions were good, nearly 60% of the selected assets showed negative growth. Our prediction method therefore needs to rely more on financial benchmarks.
LSTM is a strongly recommended method that strengthens time series forecasting, especially for financial asset pricing. Our findings show an average price prediction accuracy of 67%. Moreover, tanH was the most effective activation function for accurate price prediction, with an accuracy rate of above 80%. Hence, by evaluating stocks from different stock exchanges and market conditions under a similar training and testing split, similar epoch runs, similar time frames for the test period, and similar LSTM feature sets, we can justify tanH as the activation function that gives the most accurate price predictions.
We acknowledge the sources of the asset information. The online databases of Yahoo Finance, Simply Wall St., and Investing have greatly aided in knowledge acquisition and dataset selection. The scholarly writings, research papers, and blogs that assisted in the creation of this paper are properly cited throughout this article and in the reference list that follows.
All authors declare that they have no conflicts of interest with the contents of this research work.
Academic Editor
Dr. Toansakul Tony Santiboon, Professor, Curtin University of Technology, Bentley, Australia.
Department of Accounting & Finance, North South University, Dhaka, Bangladesh
Sami HM, Ahshan KA, and Rozario PN. (2023). Determining the best activation functions for predicting stock prices in different (stock exchanges) through multivariable time series forecasting of LSTM. Aust. J. Eng. Innov. Technol., 5(2), 63-71. https://doi.org/10.34104/ajeit.023.063071