Training a practical and effective model for stock selection has long been a problem of great concern in the field of artificial intelligence. Because financial markets are uncertain and sensitive, many factors can influence stock prices, such as significant events, economic conditions, and political turmoil.
To handle these complicated factors and make precise predictions, many scholars have turned to machine learning to build models. Shunrong Shen et al. used data from different global financial markets with SVM (support vector machine) and reinforcement learning to predict stock index movements in the U.S. market. Kai Chen et al. applied LSTM (long short-term memory) to stock index prediction using low-frequency data.
Although experts have already created low-frequency features, constructing useful high-frequency features that carry high-level information is difficult. Moreover, many existing features are calculated from U.S. stock market indices, which differ from the Chinese stock market. Therefore, we prefer methods that do not require us to construct features by hand. In this paper, we apply two machine learning algorithms, LSTM (long short-term memory) and CNN (convolutional neural network), to find the most profitable stock trading strategy in the Chinese stock market.
We chose the closing price, opening price, highest price, lowest price, trading volume, transaction amount, number of transactions, commission ratio, volume ratio, commission purchase, and commission sale from the Chinese A-share market. These 11 price-volume features describe the state of a stock. To train on stocks that represent the market and to reduce data inconsistency (such as stock suspensions) and noise, we selected the constituents of the CSI 300 index, denoted as I, as the source of the sample data set. The model uses two types of data: 15-minute data and 120-minute data. The sample data is exhibited as follows: (picture) To speed up the convergence of the neural network and to eliminate the negative influence of differing feature scales on the model, the feature vector x of each stock i at each time t is normalized by the equation as follows:
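The normalization equation itself is not reproduced in this text. As a minimal sketch only, assuming a per-feature z-score computed over the time axis for one stock (an assumption, not necessarily the formula used in this paper), the step could look like the following in Python with NumPy. The function name `zscore_normalize`, the array layout, and the toy data are hypothetical.

```python
import numpy as np

def zscore_normalize(x, eps=1e-8):
    """Standardize each feature of one stock over time.

    x: array of shape (T, F) with T time steps and F features (F = 11 here).
    Assumption: per-feature z-score; the paper's exact formula may differ.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0, keepdims=True)  # per-feature mean over time
    std = x.std(axis=0, keepdims=True)    # per-feature standard deviation
    return (x - mean) / (std + eps)       # eps guards against constant features

# Toy data: 80 time steps, 11 features on very different scales
# (for example prices versus volumes).
rng = np.random.default_rng(0)
scales = np.logspace(0, 6, 11)
raw = rng.normal(loc=scales, scale=scales * 0.1, size=(80, 11))
normed = zscore_normalize(raw)
```

After this step every feature has roughly zero mean and unit variance, so no single large-valued feature such as transaction amount dominates the gradient updates.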
Because the model addresses a classification problem, we chose the cross-entropy cost function as the loss function:
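For reference, the standard categorical cross-entropy of a single sample is the negative log of the probability the model assigns to the true class. The sketch below illustrates that textbook definition for a four-class problem; the helper names and the example probabilities are hypothetical, and the paper's exact formulation (for instance any averaging or weighting) is not shown in this text.

```python
import math

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of one sample: -log(probability of the true class)."""
    return -math.log(max(probs[label], eps))  # eps avoids log(0)

def mean_cross_entropy(batch_probs, labels):
    """Average cross-entropy over a batch of predictions."""
    total = sum(cross_entropy(p, y) for p, y in zip(batch_probs, labels))
    return total / len(labels)

# Four classes; the true class is index 0 in both cases.
uniform = [0.25, 0.25, 0.25, 0.25]   # an uninformed prediction
confident = [0.7, 0.1, 0.1, 0.1]     # most mass on the true class

loss_uniform = cross_entropy(uniform, 0)      # equals log(4)
loss_confident = cross_entropy(confident, 0)  # lower than loss_uniform
batch_loss = mean_cross_entropy([uniform, confident], [0, 0])
```

A uniform prediction over four classes gives a loss of log 4, which is a useful baseline when reading training curves.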
For the optimizer, we tested three adaptive optimizers, Adam, Adadelta, and RMSProp (learning rate 0.001), on I data from January 2019 to May 2019. The performance of the different optimizers is represented as follows. The test uses mini-batch gradient descent: there are 30 samples per batch, and all samples go through 50 iterations. All the following tests use the same settings. We use early stopping at the 30th epoch to avoid overfitting. As can be seen from the graph, the Adadelta optimizer is inferior, while the Adam and RMSProp optimizers are equally effective. Based on this result, we chose the Adam optimizer for our model.
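The batching and early-stopping logic described above can be sketched in plain Python. The text does not state the stopping criterion, so a patience-based rule on validation loss is assumed here; the function names, the `patience` value, and the example loss values are all hypothetical.

```python
def iterate_minibatches(n_samples, batch_size=30):
    """Yield (start, end) index pairs that cover n_samples in batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the best validation loss seen before
    training stops, i.e. before `patience` epochs pass without improvement."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# 100 samples with 30 per batch give three full batches and one partial batch.
batches = list(iterate_minibatches(100))

# Made-up validation losses: the best value occurs at epoch 3.
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6])
```

In this example training halts at epoch 6, so the lower loss at epoch 7 is never observed; a larger `patience` trades extra training time for a lower risk of stopping too early.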
In the end, we chose CNN+2Dense as the final framework and fed it 15-minute price-volume trading data from the past 5 consecutive days. For the sample set, we chose data from July 1, 2014 to December 31, 2018, during which the Chinese stock market witnessed periods of sharp rise, sharp fall, slight rise, and slight fall, providing sufficient samples for each of our four labels. The accuracy approaches 42%, compared with 25% if stocks are chosen at random. Finally, we chose January 1, 2019 to May 31, 2019 as the backtest period. Without taking transaction fees into account, the strategy based on our CNN model outperforms the market. We therefore conclude that our model is effective for stock return prediction with high-frequency primary price-volume data.
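To make the CNN+2Dense data flow concrete, the following NumPy sketch runs one forward pass of a one-dimensional convolution followed by two dense layers and a four-way softmax. It is an illustration under stated assumptions, not the trained model from this paper: the input length of 80 assumes 16 fifteen-minute bars per four-hour A-share trading day over 5 days, and the kernel size, filter count, pooling choice, hidden width, and random weights are all hypothetical.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time followed by ReLU.

    x: (T, F); kernels: (K, F, C); bias: (C,). Returns (T - K + 1, C).
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        # Contract the (K, F) window against each of the C filters.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(x, p):
    h = conv1d_relu(x, p["conv_w"], p["conv_b"])
    h = h.max(axis=0)                                # global max pooling (assumed)
    h = np.maximum(h @ p["d1_w"] + p["d1_b"], 0.0)   # first dense layer
    return softmax(h @ p["d2_w"] + p["d2_b"])        # second dense layer, 4 labels

rng = np.random.default_rng(0)
params = {
    "conv_w": rng.normal(scale=0.1, size=(3, 11, 8)),  # kernel 3, 11 features, 8 filters
    "conv_b": np.zeros(8),
    "d1_w": rng.normal(scale=0.1, size=(8, 16)),
    "d1_b": np.zeros(16),
    "d2_w": rng.normal(scale=0.1, size=(16, 4)),
    "d2_b": np.zeros(4),
}
x = rng.normal(size=(80, 11))        # one normalized sample: 80 bars, 11 features
probs = forward(x, params)           # probabilities over the four labels
predicted = int(np.argmax(probs))    # index of the most likely label
```

With untrained weights the output is close to uniform, which corresponds to the 25% random baseline for four labels.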
We would like to express our sincere appreciation to Maxwell Liu from ShingingMidas Private Fund and Xingyu Fu from Sun Yat-sen University for their generous guidance throughout the project. We are also grateful to Kangkang Jiang from Sun Yat-sen University for his assistance all the way. Without their support, we could not have accomplished such a challenging task.