Hi! AI seems to be taking over the world these days, so today I will introduce a method for trading stocks using AI (a simple technique, so I hope you will understand it easily). The information in this post is for study and experimentation only. I don't recommend buying or selling with this algorithm, and if you do, please don't blame me, haha. We will follow this paper, and all the code used in this experiment is in this repository (in the stock_trading branch).
Reinforcement learning teaches an agent to predict the reward of each action and then choose the action with the best predicted reward. To do this, we define a reward function and a state space for the game, and use linear regression or another algorithm to estimate the reward.
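To make the idea concrete, here is a minimal sketch of "predict the reward of each action, then take the best one", using one linear model per action. The names `predict_rewards` and `choose_action` are hypothetical, and the epsilon-greedy exploration step is my own assumption; it is not taken from the paper or the repository.

```python
import random

def predict_rewards(weights, state):
    """One linear model per action: predicted reward = dot(weights[a], state)."""
    return [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]

def choose_action(weights, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice (an assumption): usually take the action with the
    highest predicted reward, but explore a random action with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    rewards = predict_rewards(weights, state)
    return max(range(len(rewards)), key=rewards.__getitem__)
```

With `epsilon=0` the agent is purely greedy, so it always returns the index of the action whose linear model predicts the highest reward.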
Let's start. First, we select the price chart of a stock (in my case ABICO, a stock from the Stock Exchange of Thailand (SET), because I live in Thailand). In this experiment I use all of the daily data: open, close, high, low and volume. The picture shows the price chart of ABICO from 1990 to 1998.
Because reinforcement learning is mostly applied to games, I built a game from the stock data. The game has 4 actions (buy, wait to buy, sell, wait to sell). It starts with 5000 units of money, and a buy or sell action means buying or selling with everything you have. To evaluate the algorithm we need a baseline to compare against, and for that I used random actions. Each action is chosen with equal probability, and the game is played 50,000 times. The result of the random strategy is shown below: on average about 2500 units of assets are left. (In fact the price trends downward over this period, so the way to win the game is to wait and buy nothing, which leaves you with 5000 units.)
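The game and the random baseline can be sketched roughly as follows. This is not the code from the repository: `TradingGame`, `play_random` and `random_baseline` are hypothetical names, and I am assuming fractional shares, no trading fees, trades at a single daily price, and that an action which is not possible (for example selling when you hold nothing) simply counts as waiting.

```python
import random

ACTIONS = ("buy", "wait_to_buy", "sell", "wait_to_sell")

class TradingGame:
    """Toy all-in/all-out trading game over a list of daily prices."""

    def __init__(self, prices, start_cash=5000.0):
        self.prices = list(prices)
        self.start_cash = start_cash
        self.reset()

    def reset(self):
        self.day = 0
        self.cash = self.start_cash
        self.shares = 0.0

    def asset_value(self):
        """Cash plus the value of held shares at the current day's price."""
        return self.cash + self.shares * self.prices[self.day]

    def done(self):
        return self.day >= len(self.prices) - 1

    def step(self, action):
        price = self.prices[self.day]
        if action == "buy" and self.shares == 0:
            # Spend all cash on shares.
            self.shares = self.cash / price
            self.cash = 0.0
        elif action == "sell" and self.shares > 0:
            # Convert all shares back to cash.
            self.cash = self.shares * price
            self.shares = 0.0
        # Waiting actions and impossible actions leave the position unchanged.
        self.day += 1
        return self.asset_value()

def play_random(prices, rng):
    """Play one game, picking each of the 4 actions with equal probability."""
    game = TradingGame(prices)
    while not game.done():
        game.step(rng.choice(ACTIONS))
    return game.asset_value()

def random_baseline(prices, n_games=1000, seed=0):
    """Average final asset value over n_games random games."""
    rng = random.Random(seed)
    results = [play_random(prices, rng) for _ in range(n_games)]
    return sum(results) / len(results)
```

Calling `random_baseline(prices, n_games=50000)` on the list of daily prices would reproduce the kind of comparison described above; the exact average depends on the assumptions listed.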
To use reinforcement learning, we must define a reward so that the agent can tell whether an action was good. For this game, I used the percentage profit at the time of selling as the reward: positive for a profit and negative for a loss. The sell action is also the terminal state for each buy. You can see the following algorithm, which comes from the paper. For the Ø function, I used historical data: the previous 60 days. To reduce computation time, I defined 4 functions, one per action, each doing its own gradient descent. The difference from the paper is that I did not use a deep learning architecture; instead I used an ordinary neural network with 4 outputs, one for each action.
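As a rough illustration of the pieces described above, here is a sketch of the reward, the 60-day history state, and a training target. All three function names are hypothetical. The `history_features` function is only a stand-in for the Ø function, and I assume the window covers the 60 days before the current day. The `q_target` formula is the standard Q-learning target with a discount factor `gamma`; I am assuming that this is what the algorithm from the paper uses.

```python
def sell_reward(buy_price, sell_price):
    """Percent profit realised when a position bought at buy_price is sold."""
    return (sell_price - buy_price) * 100.0 / buy_price

def history_features(rows, day, window=60):
    """Flatten the `window` daily rows before `day` into one state vector.

    rows[i] is (open, close, high, low, volume) for day i.
    Assumption: the window does not include `day` itself.
    """
    if day < window:
        raise ValueError("not enough history for a full window")
    return [value for row in rows[day - window:day] for value in row]

def q_target(reward, next_q_values, terminal, gamma=0.99):
    """Standard Q-learning target; a sell is terminal, so no future term is added."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

For example, buying at 100 and selling at 110 gives a reward of 10 (percent), and because the sell is terminal, that reward is used directly as the target for the sell output of the network.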
The result of this experiment surprised me: the agent buys and sells almost continuously, with very little waiting. Even so, it finishes with around 5000 units left. My algorithm needs a lot of improvement, but I am proud that it beats the random strategy. I think it would be smarter if it had more patience to wait before buying or selling.
Finally, thank you for reading. I hope some of you will be inspired to program AI yourselves, so that we will have the best AI of this century. Please wish me luck with a better agent; maybe next time it will perform as well as I expect. If you have any suggestions, please leave a comment.