Bitcoin Trading Program Using Reinforcement Learning (13) - Optimizing Reinforcement Learning

아빠는 벌레잡이 2023. 2. 14. 23:54

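For decades, data scientists have wanted to build a kind of database that works on a different concept from today's RDBMS.
One example is a database that collects every piece of data relevant to a weather forecast.
The idea was to predict tomorrow's weather from the current temperature, humidity, wind speed, and precipitation.
But the RDBMS showed that sorting the data and deriving an answer becomes impractical as the volume of data grows.
Still, the scientists kept at it.
They built Hadoop to store even more data, and tried to break away from the relational model of the RDBMS with MongoDB.
For data analysis they turned visualization into an academic discipline and created the R language.
They also kept improving Python.
The further they went, the clearer it became that this approach could not work.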
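So a new idea emerged.
Train a deep learning model on all of the data, then use the model to find the answer.
Now tomorrow's weather can be predicted in a matter of seconds.
Such a program can be built on an ordinary computer, without a supercomputer.
We now have data of every kind.
And a machine can learn from all of it.
Some fields still need a classical database, but these days this odd kind of database, a model built by deep learning, is often preferred.
For example, a computer that stored every Go game record and searched that data for its next move would probably lose to a human on time.
Today we have a "database" trained with deep learning instead.
Rather than a supercomputer, hundreds of computers share the work and produce the result.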
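Our trading program works the same way.
It clusters 500,000 chart images with unsupervised learning.
There is no way to know in advance whether that yields 1,000 clusters, more than 10,000, or only 100.
So the slots that these clusters of images fall into, that is, qsize, must be correspondingly large.
Setting it to the largest value that memory allows is fine.
However, for training to actually finish, you need to find a value that suits your own machine.
A value of around 124 should cause no trouble.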
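As a rough way to see what a given qsize costs, the sketch below reuses the output-size formula from the commented-out conv2d_size_out helper in the DQN class of the source and counts how many features reach the final linear layer. The 36 x 50 input size is an assumption about how large the chart thumbnail ends up, and flattened_features is a hypothetical helper that is not part of the original source.

```python
def conv2d_size_out(size, kernel_size=5, stride=2):
    """Output length of one Conv2d dimension (no padding, no dilation)."""
    return (size - (kernel_size - 1) - 1) // stride + 1


def flattened_features(height, width, qsize, layers=3):
    """Features entering the linear head after `layers` stride-2 convolutions.

    Hypothetical helper: assumes every conv layer uses kernel_size=5 and
    stride=2 and that the last one has `qsize` output channels.
    """
    h, w = height, width
    for _ in range(layers):
        h = conv2d_size_out(h)
        w = conv2d_size_out(w)
    return h * w * qsize


# Assumed thumbnail size: 36 pixels high, 50 pixels wide.
for q in (124, 512):
    print(q, flattened_features(36, 50, q))
```

Under that assumed input size the spatial output shrinks to 1 x 3, so the result is 3 * qsize, which agrees with the linear_input_size used in the source. A different thumbnail size would change that factor.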
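Next, tune the learning rate.
The learning rate can be thought of as how far each gradient descent step jumps.
If this value is too large, the loss graph shows sharp spikes; if it is too small, the loss falls only slowly and can stall well above zero.
With a suitable value, the graph is roughly L-shaped: a steep drop followed by a flat tail.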
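The effect of the step size is easy to see on a toy problem. The sketch below is not the trading model: it runs plain gradient descent on f(x) = x**2, and the three learning rates are arbitrary values chosen only for illustration.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimise f(x) = x**2 and return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # one gradient descent step
        losses.append(x * x)
    return losses


too_large = gradient_descent(lr=1.05)   # overshoots the minimum, loss keeps growing
too_small = gradient_descent(lr=0.001)  # moves so little that the loss barely drops
reasonable = gradient_descent(lr=0.3)   # drops quickly, then flattens near zero

for name, losses in [("too_large", too_large),
                     ("too_small", too_small),
                     ("reasonable", reasonable)]:
    print(name, losses[0], losses[-1])
```

On a real network the picture is noisier, but the same three regimes show up in the loss plot.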
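Because the trained model is saved to a pt file, training can proceed in stages. Start with a learning rate of about 0.001. When the loss stops decreasing and shows sharp spikes, stop training (ctrl + c, or on Linux ctrl + z followed by kill %1), lower the learning rate to about 0.0001, and continue training until the loss gets as close to 0 as possible. If still more training is needed, multiply the learning rate by 0.001 to make it smaller again.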
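The decision of when to lower the learning rate is made by eye here. As a minimal sketch of how that rule of thumb could be written down, the hypothetical helper below compares the mean of the most recent losses with the mean of the window before it; the window size, threshold, and factor are assumptions, not values from the original post.

```python
def next_learning_rate(losses, lr, window=5, min_improvement=1e-3, factor=0.1):
    """Return a reduced learning rate when the loss has stopped improving.

    Hypothetical helper: the loss counts as stalled when the mean of the last
    `window` values improved by less than `min_improvement` over the mean of
    the `window` values before them.
    """
    if len(losses) < 2 * window:
        return lr  # not enough history to judge yet
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous - recent < min_improvement:
        return lr * factor
    return lr


still_improving = [1.0 - 0.1 * i for i in range(10)]
stalled = [0.5] * 10
print(next_learning_rate(still_improving, 0.001))  # learning rate is kept
print(next_learning_rate(stalled, 0.001))          # learning rate is reduced
```

PyTorch also ships torch.optim.lr_scheduler.ReduceLROnPlateau, which automates a similar rule inside the training loop.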
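Measure the accuracy as a percentage. Because action_kind classes 0 and n-1 must always be predicted correctly, measure the accuracy for those two classes separately as well.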
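The separate measurement for the two extreme classes can be sketched on plain lists as follows. It mirrors the tot and corr counters used at the end of the source, but extreme_class_accuracy and the sample labels are made up for illustration.

```python
def extreme_class_accuracy(labels, predictions, action_kind):
    """Accuracy in percent, counted only where the true label is 0 or action_kind - 1."""
    extremes = (0, action_kind - 1)
    total = sum(1 for y in labels if y in extremes)
    correct = sum(1 for y, p in zip(labels, predictions)
                  if y in extremes and y == p)
    return 100.0 * correct / total if total else 0.0


# Made-up labels and predictions for action_kind = 5 (classes 0..4).
labels = [0, 1, 4, 2, 4, 0]
predictions = [0, 2, 4, 2, 3, 1]
print(extreme_class_accuracy(labels, predictions, action_kind=5))
```

Four of the six samples have an extreme label here, and two of those are predicted correctly, so the middle classes do not affect the result.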
The source below needs a few small changes to fit your own setup.
The difficulty is practically zero.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torchvision
import torch.nn.functional as F

# https://pytorch.org/docs/stable/torchvision/datasets.html
# torchvision.datasets bundles MNIST and many other datasets in a ready-to-use form.
# With it, data can be used directly without reorganizing it for training.
import torchvision.datasets as dset

# https://pytorch.org/docs/stable/torchvision/transforms.html?highlight=transforms
# torchvision.transforms provides functions that crop, scale, and otherwise transform image data.
import torchvision.transforms as transforms

# https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader
# DataLoader collects preprocessed data into batches of the given size and hands them over.
from torch.utils.data import DataLoader
import pandas as pd

import numpy as np
import matplotlib.pyplot as pltx

import plotly.graph_objects as go
import plotly.subplots as ms
import plotly.express as px
import plotly as plt
import pymysql
import time
import talib
from PIL import Image
import io
import os.path as path
import joblib

ticker = 'ETH'
print(torch.__version__)

batch_size = 500
learning_rate = 0.002
num_epoch = 100
data_size = 100
action_kind = 5
screen_height = 50
screen_width  = 70
qsize = 512

df = pd.read_csv("data_all.dat")
# df = df.tail((df.index.max() // 5) * 5)
df = df.head((df.index.max() // 5) * 5)
# df = df.head(120)
# df = df.tail(250000)

def get_chart(df, idx, max_data:int=300, i_w:int=140, i_h:int=100):
    ndf = df.head(idx).tail(max_data)
    ndf.reset_index(drop=True, inplace=True)
    
    candle = go.Candlestick(x=ndf.index,open=ndf['open'],high=ndf['high'],low=ndf['low'],close=ndf['close'], increasing_line_color = 'red',decreasing_line_color = 'blue', showlegend=False)
    upper = go.Scatter(x=ndf.index, y=ndf['upper'], line=dict(color='red', width=2), name='upper', showlegend=False)
    ma20 = go.Scatter(x=ndf.index, y=ndf['ma20'], line=dict(color='black', width=2), name='ma20', showlegend=False)
    lower = go.Scatter(x=ndf.index, y=ndf['lower'], line=dict(color='blue', width=2), name='lower', showlegend=False)

    volume = go.Bar(x=ndf.index, y=ndf['volume'], marker_color='red', name='volume', showlegend=False)

    MACD = go.Scatter(x=ndf.index, y=ndf['macd'], line=dict(color='blue', width=2), name='MACD', legendgroup='group2', legendgrouptitle_text='MACD')
    MACD_Signal = go.Scatter(x=ndf.index, y=ndf['signal'], line=dict(dash='dashdot', color='green', width=2), name='MACD_Signal')
    MACD_Oscil = go.Bar(x=ndf.index, y=ndf['flag'], marker_color='purple', name='MACD_Oscil')

    fast_k = go.Scatter(x=ndf.index, y=ndf['fast_k'], line=dict(color='skyblue', width=2), name='fast_k', legendgroup='group3', legendgrouptitle_text='%K %D')
    slow_d = go.Scatter(x=ndf.index, y=ndf['slow_d'], line=dict(dash='dashdot', color='black', width=2), name='slow_d')

    PB = go.Scatter(x=ndf.index, y=ndf['PB']*100, line=dict(color='blue', width=2), name='PB', legendgroup='group4', legendgrouptitle_text='PB, MFI')
    MFI10 = go.Scatter(x=ndf.index, y=ndf['MFI10'], line=dict(dash='dashdot', color='green', width=2), name='MFI10')

    RSI = go.Scatter(x=ndf.index, y=ndf['rsi14'], line=dict(color='red', width=2), name='RSI', legendgroup='group5', legendgrouptitle_text='RSI')
    
    # Layout
    fig = ms.make_subplots(rows=5, cols=2, specs=[[{'rowspan':4},{}],[None,{}],[None,{}],[None,{}],[{},{}]], shared_xaxes=True, horizontal_spacing=0.03, vertical_spacing=0.01)

    fig.add_trace(candle,row=1,col=1)
    fig.add_trace(upper,row=1,col=1)
    fig.add_trace(ma20,row=1,col=1)
    fig.add_trace(lower,row=1,col=1)

    fig.add_trace(volume,row=5,col=1)

    fig.add_trace(candle,row=1,col=2)
    fig.add_trace(upper,row=1,col=2)
    fig.add_trace(ma20,row=1,col=2)
    fig.add_trace(lower,row=1,col=2)

    fig.add_trace(MACD,row=2,col=2)
    fig.add_trace(MACD_Signal,row=2,col=2)
    fig.add_trace(MACD_Oscil,row=2,col=2)

    fig.add_trace(fast_k,row=3,col=2)
    fig.add_trace(slow_d,row=3,col=2)

    fig.add_trace(PB,row=4,col=2)
    fig.add_trace(MFI10,row=4,col=2)

    fig.add_trace(RSI,row=5,col=2)

    # Trend following
    # trend_fol = 0
    # trend_refol = 0
    # for i in ndf.index:
    #     if ndf['PB'][i] > 0.8 and ndf['MFI10'][i] > 80:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='orange', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)
    #     elif ndf['PB'][i] < 0.2 and ndf['MFI10'][i] < 20:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='darkblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)

    # Counter-trend
    # for i in ndf.index:
    #     if ndf['PB'][i] < 0.05 and ndf['IIP21'][i] > 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='purple', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)  # purple
    #         fig.add_trace(trend_refol,row=1,col=1)
    #     elif ndf['PB'][i] > 0.95 and ndf['IIP21'][i] < 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='skyblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)  # sky blue
    #         fig.add_trace(trend_refol,row=1,col=1)

    # fig.add_trace(trend_fol,row=1,col=1)
    # Mark trend-following signals on the candlestick chart.

    # fig.add_trace(trend_refol,row=1,col=1)
    # Mark counter-trend signals on the candlestick chart.
    
    # fig.update_layout(autosize=True, xaxis1_rangeslider_visible=False, xaxis2_rangeslider_visible=False, margin=dict(l=50,r=50,t=50,b=50), template='seaborn', title=f'({ticker})의 날짜: ETH [추세추종전략:오↑파↓] [역추세전략:보↑하↓]')
    # fig.update_xaxes(tickformat='%y년%m월%d일', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray', showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_yaxes(tickformat=',d', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray',showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_traces(xhoverformat='%y년%m월%d일')
    # size = len(img)
    # img = plt.io.to_image(fig, format='png')
    img = Image.open(io.BytesIO(plt.io.to_image(fig, format='png')))
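    # convert() returns a new image and the result is discarded here, so img keeps
    # its original mode (presumably RGBA, matching the 4 input channels of conv1).
    img.convert("RGB")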
    img.thumbnail((i_h, i_w), Image.LANCZOS)
    return img

class Account():
    def __init__(self, df, origin) -> None:
        self.BASIC_FEES = 0.0005
        self.hold_score = 0
        self.byu_score = 0
        self.sell_score = 0        
        self.orgin_money = origin
        self.money = origin
        self.balance = origin
        self.unit  = 0
        self.buy_index = 0
        self.max_rate = 0
        self.df = df
        self.rate = 0
        self.old_rate = 0
        self.bak_unit = 0
        self.bak_balance = 0
    
    def reset(self):
        self.balance = self.orgin_money
        self.money = self.orgin_money
        self.old_rate = 0
        self.unit = 0
        self.rate = 0
        self.buy_index = 0
        
    def select(self, action:int): 
        ret_action = 0
        if self.unit > 0 :
            ret_action = 0 if action == 1 else action
        else:
            ret_action = 0 if action == 2 else action
        return ret_action
    
    def exec_action(self, action, idx):
        real_action = self.select(action)
        if real_action == 1:
            return self.unit_buy(idx), real_action
        elif real_action == 2:
            return self.unit_sell(idx), real_action
        else:
            return self.unit_hold(idx), real_action
    
    def unit_buy(self, index):
        self.back_up()
        buy_balance = self.balance
        self.buy_index = index
        self.old_rate = self.rate
        while True:
            buy_unit = buy_balance / self.df.loc[index, 'close']
            amount = (buy_unit * self.df.loc[index, 'close']) * self.BASIC_FEES + buy_unit * self.df.loc[index, 'close']
            if self.balance < amount :
                buy_balance -= self.balance * 0.0001
            else:
                self.balance -= amount
                self.unit = buy_unit
                self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
                # if index%50 == 0 : print("buy index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
                print("buy index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
                break
        return self.rate
        
    def unit_sell(self, index):
        self.back_up()
        self.old_rate = self.rate
        sell_balance = self.unit * self.df.loc[index, 'close'] - (self.unit * self.df.loc[index, 'close']) * self.BASIC_FEES
        self.balance += sell_balance
        self.unit = 0
        self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
        # if index%50 == 0 : print("sell index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        print("sell index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        self.money = self.balance
        self.max_rate = 0
        return self.rate
    
    def unit_hold(self, index):
        self.old_rate = self.rate
        self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
        # print("index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        return self.rate
    
    def get_newaction(self, index):
        if self.unit > 0:
            hold_rate = ((self.unit * self.df.loc[index, "sma3"] + self.balance) - self.orgin_money) * 100 / self.orgin_money
            sell_rate = ((self.unit * self.df.loc[index, "close"] + self.balance) - self.orgin_money) * 100 / self.orgin_money
            return 0 if hold_rate > sell_rate else 2
        else:
            buy_balance = self.balance
            self.buy_index = index
            self.old_rate = self.rate
            while True:
                buy_unit = buy_balance / self.df.loc[index, 'close']
                amount = (buy_unit * self.df.loc[index, 'close']) * self.BASIC_FEES + buy_unit * self.df.loc[index, 'close']
                if self.balance < amount :
                    buy_balance -= self.balance * 0.0001
                else:
                    self.balance -= amount
                    self.unit = buy_unit
                    buy_rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
                    break
            hold_rate = (self.balance - self.orgin_money) * 100 / self.orgin_money
            return 0 if hold_rate > buy_rate else 1
        
    def back_up(self):
        self.bak_balance = self.balance
        self.bak_unit = self.unit

    def  back_ward(self):
        self.balance = self.bak_balance 
        self.unit = self.bak_unit
                
    def is_bankrupt(self):
        return (self.money < 10000)

class DQN(nn.Module):
    def __init__(self, h, w, outputs, qsize):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, qsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(qsize)
        self.conv3 = nn.Conv2d(qsize, qsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(qsize)

        # Number of Linear input connections depends on output of conv2d layers
        # and therefore the input image size, so compute it.
        # def conv2d_size_out(size, kernel_size = 5, stride = 2):
        #     return (size - (kernel_size - 1) - 1) // stride  + 1
        # convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        # convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        # print("convw[%d]  convh[%d]" % (convw, convh))
        linear_input_size = 3 * qsize
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return self.head(x.view(x.size(0), -1))

def select_action(df, idx):
    return ((df.loc[idx, "close"] - df.loc[idx, "closemin"]) * (action_kind-1) // (df.loc[idx, "closemax"] - df.loc[idx, "closemin"]))

# Use the GPU as the device when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
account = Account(df, 50000000)
account.reset()

converter = torchvision.transforms.ToTensor()
charts = []
for idx in df.index:
    if idx <= data_size:
        continue
    else:
        img = get_chart(df, idx, data_size, i_w = screen_width, i_h = screen_height)
        act = select_action(df, idx)
        # display(img)
        img = converter(img).unsqueeze(0)
        act = torch.tensor(act, dtype=torch.int64)
        charts.append([img,act])

joblib.dump(charts, "charts_{0:d}.dmp".format(action_kind))
train_loader = DataLoader(charts,batch_size=batch_size, shuffle=True,num_workers=2,drop_last=True)
test_loader = DataLoader(charts,batch_size=batch_size, shuffle=False,num_workers=2,drop_last=True)

# Move the model to the selected device.
model = DQN(screen_height, screen_width, action_kind, qsize).to(device)

if path.exists("pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)):
    model.load_state_dict(torch.load("pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)))
model.train()
# Use cross-entropy as the loss function.
loss_func = nn.CrossEntropyLoss()

# Use Adam as the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

loss_arr =[]
for i in range(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device).squeeze(1)
        # print(x)
        y_= label.to(device)
        # print(y_)
        
        optimizer.zero_grad()
        output = model.forward(x)
        loss = loss_func(output,y_)
        loss.backward()
        optimizer.step()
        
        if j % 100 == 0:
            print(loss.cpu().detach().numpy())
            loss_arr.append(loss.cpu().detach().numpy())

pltx.plot(loss_arr)
pltx.show()

# Variables that hold the number of correct predictions and the total count.
correct = 0
total = 0.000000000001

# Switch to evaluation mode and disable gradient tracking for inference.
model.eval()
with torch.no_grad():
    # Load images and labels from the test loader.
    for image,label in test_loader:
        # Move both tensors to the device.
        x = image.to(device).squeeze(1)
        y_= label.to(device)

        # Feed the data to the model and get the output.
        output = model.forward(x)
        
        # https://pytorch.org/docs/stable/torch.html?highlight=max#torch.max
        # torch.max returns both the maximum value and its index.
        # Only the index is needed here, so the value is discarded.
        _,output_index = torch.max(output,1)
        
        # Accumulate the total from the number of labels in each batch.
        # The total is known in advance, but batch_size and drop_last can cause some samples to be dropped.
        total += label.size(0)
        
        # Add the number of predictions whose index matches the label to correct.
        correct += (output_index == y_).sum().float()

    # After processing all of the test data, compute the accuracy.
    print("Accuracy of Test Data: {}%".format(100*correct/total))
    torch.save(model.state_dict(),"pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device))

    tot = 0.0000000000001
    corr = 0.0
    for idy in df.index:
        x = get_chart(df, idy, data_size, i_w = screen_width, i_h = screen_height)
        x = converter(x).unsqueeze(0).to(device).squeeze(1)

        output = model.forward(x)

        _,action = torch.max(output,1)
        # print("action:", action)
        action = action.cpu().numpy()[0]
        # print("action:", action)
        sel_act = select_action(df, idy)
        tot += 1 if sel_act in [0, action_kind-1] else 0
        corr += 1 if sel_act in [0, action_kind-1] and sel_act == action else 0
        reward, real_action = account.exec_action(2 if action == (action_kind - 1) else (1 if action == 0 else 0) , idy)
        if idy % 100 == 0:
            print("idy:%d action[%d:%d] price [%.4f] unit[%.4f] agent rate:%.05f remind money:%.02f accuracy:%.02f" 
                % (idy, sel_act, action, df.loc[idy, 'close'], account.unit, account.rate, account.balance + account.unit * df.loc[idy, 'close'], 100*corr/tot))
