'dqn' 태그의 글 목록

dqn

강화학습을 이용한 비트코인 매매프로그램(13) - 강화학습 최적화 2023.02.14 1
강화학습을 이용한 비트코인 매매프로그램(12) - 실거래 적용 2023.01.30 5
강화학습을 이용한 비트코인 매매프로그램(7)-Back Test 2022.12.24 1
강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장 2022.12.21
강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델 2022.12.20
강화학습을 이용한 비트코인 매매프로그램(6)-Buroto Force학습 2022.12.04 17

강화학습을 이용한 비트코인 매매프로그램(13) - 강화학습 최적화

아빠는 벌레잡이 2023. 2. 14. 23:54

2023. 2. 14. 23:54

데이터 과학자들은 수세기 동안 지금의 RDBMS와는 다른 개념의 데이터베이스를 만들고자 했습니다.
예를 들면 일기 예보같은 데이타를 전부 모은 데이터베이스를 만드는 것입니다.
지금 현재의 기온,습도,바람의 세기,강수량으로 내일의 날씨를 맞추려고 했습니다.
하지만 지금의 RDBMS는 데이터를 정렬하고 결과를 알아내는데 (데이터가 많아지면 많아질수록)불가능하다는 것을 알게 했습니다.
그러나 과학자들은 계속 고집을 부렸습니다.
하둡을 만들어서 더 많은 데이터를 쌓으려 하고 망고DB로 RDBMS의 릴레이션을 깨트리려 했습니다.
데이터 분석을 위해 비주얼라이제이션을 학문화하고 R이라는 언어도 만들었습니다.
그리고 파이썬을 개선해나갔습니다.
그럴수록 이것이 불가능하다는 것을 알게 됩니다.
그래서 한가지 아이디어를 냅니다.
딥런닝으로 모든 데이터를 학습해서 모델을 만들고 모델을 가지고서 답을 찾는거죠.
이제 수초 정도면 내일의 날씨를 예측할 수 있습니다.
슈퍼컴이 아니어도 일반컴으로도 그런 프로그램을 만들 수 있습니다.
이제 우리는 모든 종류의 데이타를 가지고 있습니다.
또 모든 데이터를 (기계가) 학습할 수 있습니다.
아직 고전적 방법의 데이터베이스를 필요로 하는 분야도 있지만 지금은 딥런닝이 만든 모델을 이용한 이 요상한 데이터베이스 방식을 더 선호합니다.
예를 들면 바둑기보를 다 저장한 컴퓨터가 그 데이터를 기반으로 다음 수를 찾는 다면 아마도 시간초과로 사람에게 질겁니다.
지금은 딥런링으로 학습된 데이터베이스를 가지고 있습니다.
슈퍼컴이 아닌 수백대의 컴퓨터가 분산작업을 하고 결과를 냅니다.
우리의 매매프로그램도 마찬가지입니다.
50만장의 차트이미지를 비지도 학습으로 군단화합니다.
결과가 1000가지가 될지 10000가지가 넘을지 아니면 100가지 일지는 알 수 없습니다.
그만큼 이 이미지들의 군단이 들어갈 구멍 즉 qsize가 커야 합니다.
메모리가 허용하는한 최대값으로 해도 상관이 없습니다.
그러나 학습이 완료되게 하려면 자신에 맞게 그 값을 찾아야 합니다.
124정도의 값을 하면 무리가 없을 듯 합니다.
이제 학습률을 조정합니다.
학습률은 경사하강법시 얼마나 Jump up할지의 값(뛸지의 값)이라고 보면 됩니다.
이 값이 크지면 로스(loss)그래프는 뽀족한 파형을 만들 것이고 이 값이 적다면 로스는 처음은 떨어지다가 나중에는 점점 커지게 될 것입니다.
적당한 값이라면 역L자의 그래프를 그릴 것 입니다.
학습된 모델은 pt파일로 저장하므로 처음은 0.001정도로 셋팅을 하여 학습을 진행하다가 로스값이 값이 더 이상 줄지않고
뽀족한 파장을 보인다면 학습을 중단한(ctrl + c 또는 ctrl + z 후 kill %1-linux라면) 다음 0.0001(더 학습이 필요하다면 * 0.001을 꼽하여 학습률을 더 작게 만듭니다.) 정도로 낮추어 학습을 진행해서 로스값이 0에 최대로 가까이 가게 학습합니다.
정확도가 몇프로 정도인지 측정하데 action_kind의 0과 n-1은 무조건 맞아야 하므로 그 값들의 정확도를 다시 측정합니다.
아래 소스는 자신에게 맞게 약간의 수정이 필요합니다.
난이도는 거의 제로라고 보시면 됩니다.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torchvision
import torch.nn.functional as F

# https://pytorch.org/docs/stable/torchvision/datasets.html
# 파이토치에서는 torchvision.datasets에 MNIST 등의 다양한 데이터를 사용하기 용이하게 정리해놨습니다.
# 이를 사용하면 데이터를 따로 학습에 맞게 정리하거나 하지 않아도 바로 사용이 가능합니다.
import torchvision.datasets as dset

# https://pytorch.org/docs/stable/torchvision/transforms.html?highlight=transforms
# torchvision.transforms에는 이미지 데이터를 자르거나 확대 및 다양하게 변형시키는 함수들이 구현되어 있습니다. 
import torchvision.transforms as transforms

# https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader
# DataLoader는 전처리가 끝난 데이터들을 지정한 배치 크기에 맞게 모아서 전달해주는 역할을 합니다.
from torch.utils.data import DataLoader
import pandas as pd

import numpy as np
import matplotlib.pyplot as pltx

import plotly.graph_objects as go
import plotly.subplots as ms
import plotly.express as px
import plotly as plt
import pymysql
import pandas as pd
import numpy as np
import time
import talib
from PIL import Image
import io
import os.path as path
import joblib

ticker = 'ETH'
print(torch.__version__)

batch_size = 500
learning_rate = 0.002
num_epoch = 100
data_size = 100
action_kind = 5
screen_height = 50
screen_width  = 70
qsize = 512

df = pd.read_csv("data_all.dat")
# df = df.tail((df.index.max() // 5) * 5)
df = df.head((df.index.max() // 5) * 5)
# df = df.head(120)
# df = df.tail(250000)

def get_chart(df, idx, max_data:int=300, i_w:int=140, i_h:int=100):
    ndf = df.head(idx).tail(max_data)
    ndf.reset_index(drop=True, inplace=True)
    
    candle = go.Candlestick(x=ndf.index,open=ndf['open'],high=ndf['high'],low=ndf['low'],close=ndf['close'], increasing_line_color = 'red',decreasing_line_color = 'blue', showlegend=False)
    upper = go.Scatter(x=ndf.index, y=ndf['upper'], line=dict(color='red', width=2), name='upper', showlegend=False)
    ma20 = go.Scatter(x=ndf.index, y=ndf['ma20'], line=dict(color='black', width=2), name='ma20', showlegend=False)
    lower = go.Scatter(x=ndf.index, y=ndf['lower'], line=dict(color='blue', width=2), name='lower', showlegend=False)

    volume = go.Bar(x=ndf.index, y=ndf['volume'], marker_color='red', name='volume', showlegend=False)

    MACD = go.Scatter(x=ndf.index, y=ndf['macd'], line=dict(color='blue', width=2), name='MACD', legendgroup='group2', legendgrouptitle_text='MACD')
    MACD_Signal = go.Scatter(x=ndf.index, y=ndf['signal'], line=dict(dash='dashdot', color='green', width=2), name='MACD_Signal')
    MACD_Oscil = go.Bar(x=ndf.index, y=ndf['flag'], marker_color='purple', name='MACD_Oscil')

    fast_k = go.Scatter(x=ndf.index, y=ndf['fast_k'], line=dict(color='skyblue', width=2), name='fast_k', legendgroup='group3', legendgrouptitle_text='%K %D')
    slow_d = go.Scatter(x=ndf.index, y=ndf['slow_d'], line=dict(dash='dashdot', color='black', width=2), name='slow_d')

    PB = go.Scatter(x=ndf.index, y=ndf['PB']*100, line=dict(color='blue', width=2), name='PB', legendgroup='group4', legendgrouptitle_text='PB, MFI')
    MFI10 = go.Scatter(x=ndf.index, y=ndf['MFI10'], line=dict(dash='dashdot', color='green', width=2), name='MFI10')

    RSI = go.Scatter(x=ndf.index, y=ndf['rsi14'], line=dict(color='red', width=2), name='RSI', legendgroup='group5', legendgrouptitle_text='RSI')
    
    # 스타일
    fig = ms.make_subplots(rows=5, cols=2, specs=[[{'rowspan':4},{}],[None,{}],[None,{}],[None,{}],[{},{}]], shared_xaxes=True, horizontal_spacing=0.03, vertical_spacing=0.01)

    fig.add_trace(candle,row=1,col=1)
    fig.add_trace(upper,row=1,col=1)
    fig.add_trace(ma20,row=1,col=1)
    fig.add_trace(lower,row=1,col=1)

    fig.add_trace(volume,row=5,col=1)

    fig.add_trace(candle,row=1,col=2)
    fig.add_trace(upper,row=1,col=2)
    fig.add_trace(ma20,row=1,col=2)
    fig.add_trace(lower,row=1,col=2)

    fig.add_trace(MACD,row=2,col=2)
    fig.add_trace(MACD_Signal,row=2,col=2)
    fig.add_trace(MACD_Oscil,row=2,col=2)

    fig.add_trace(fast_k,row=3,col=2)
    fig.add_trace(slow_d,row=3,col=2)

    fig.add_trace(PB,row=4,col=2)
    fig.add_trace(MFI10,row=4,col=2)

    fig.add_trace(RSI,row=5,col=2)

    # 추세추종
    # trend_fol = 0
    # trend_refol = 0
    # for i in ndf.index:
    #     if ndf['PB'][i] > 0.8 and ndf['MFI10'][i] > 80:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='orange', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)
    #     elif ndf['PB'][i] < 0.2 and ndf['MFI10'][i] < 20:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='darkblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)

    # 역추세추종
    # for i in ndf.index:
    #     if ndf['PB'][i] < 0.05 and ndf['IIP21'][i] > 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='purple', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)  #보라
    #         fig.add_trace(trend_refol,row=1,col=1)
    #     elif df['PB'][i] > 0.95 and ndf['IIP21'][i] < 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='skyblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)  #하늘
    #         fig.add_trace(trend_refol,row=1,col=1)    

    # fig.add_trace(trend_fol,row=1,col=1)
    # 추세추총전략을 통해 캔들차트에 표시합니다.

    # fig.add_trace(trend_refol,row=1,col=1)
    # 역추세 전략을 통해 캔들차트에 표시합니다.
    
    # fig.update_layout(autosize=True, xaxis1_rangeslider_visible=False, xaxis2_rangeslider_visible=False, margin=dict(l=50,r=50,t=50,b=50), template='seaborn', title=f'({ticker})의 날짜: ETH [추세추종전략:오↑파↓] [역추세전략:보↑하↓]')
    # fig.update_xaxes(tickformat='%y년%m월%d일', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray', showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_yaxes(tickformat=',d', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray',showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_traces(xhoverformat='%y년%m월%d일')
    # size = len(img)
    # img = plt.io.to_image(fig, format='png')
    img = Image.open(io.BytesIO(plt.io.to_image(fig, format='png')))
    img.convert("RGB")
    img.thumbnail((i_h, i_w), Image.LANCZOS)
    return img

class Account():
    def __init__(self, df, origin) -> None:
        self.BASIC_FEES = 0.0005
        self.hold_score = 0
        self.byu_score = 0
        self.sell_score = 0        
        self.orgin_money = origin
        self.money = origin
        self.balance = origin
        self.unit  = 0
        self.buy_index = 0
        self.max_rate = 0
        self.df = df
        self.rate = 0
        self.old_rate = 0
        self.bak_unit = 0
        self.bak_balance = 0
    
    def reset(self):
        self.balance = self.orgin_money
        self.money = self.orgin_money
        self.old_rate = 0
        self.unit = 0
        self.rate = 0
        self.buy_index = 0
        
    def select(self, action:int): 
        ret_action = 0
        if self.unit > 0 :
            ret_action = 0 if action == 1 else action
        else:
            ret_action = 0 if action == 2 else action
        return ret_action
    
    def exec_action(self, action, idx):
        real_action = self.select(action)
        if real_action == 1:
            return self.unit_buy(idx), real_action
        elif real_action == 2:
            return self.unit_sell(idx), real_action
        else:
            return self.unit_hold(idx), real_action
    
    def unit_buy(self, index):
        self.back_up()
        buy_balance = self.balance
        self.buy_index = index
        self.old_rate = self.rate
        while True:
            buy_unit = buy_balance / self.df.loc[index, 'close']
            amount = (buy_unit * self.df.loc[index, 'close']) * self.BASIC_FEES + buy_unit * self.df.loc[index, 'close']
            if self.balance < amount :
                buy_balance -= self.balance * 0.0001
            else:
                self.balance -= amount
                self.unit = buy_unit
                self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
                # if index%50 == 0 : print("buy index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
                print("buy index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
                break
        return self.rate
        
    def unit_sell(self, index):
        self.back_up()
        self.old_rate = self.rate
        sell_balance = self.unit * self.df.loc[index, 'close'] - (self.unit * self.df.loc[index, 'close']) * self.BASIC_FEES
        self.balance += sell_balance
        self.unit = 0
        self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
        # if index%50 == 0 : print("sell index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        print("sell index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        self.money = self.balance
        self.max_rate = 0
        return self.rate
    
    def unit_hold(self, index):
        self.old_rate = self.rate
        self.rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
        # print("index[{}]hold unit[{:,.6f}] remind money[{:,.6f}] rate[{:,.8f}] expected[{:,.8f}]".format(index, self.unit, self.balance, self.rate, (self.unit * self.df.loc[index, 'close'] + self.balance)))
        return self.rate
    
    def get_newaction(self, index):
        if self.unit > 0:
            hold_rate = ((self.unit * self.df.loc[index, "sma3"] + self.balance) - self.orgin_money) * 100 / self.orgin_money
            sell_rate = ((self.unit * self.df.loc[index, "close"] + self.balance) - self.orgin_money) * 100 / self.orgin_money
            return 0 if hold_rate > sell_rate else 2
        else:
            buy_balance = self.balance
            self.buy_index = index
            self.old_rate = self.rate
            while True:
                buy_unit = buy_balance / self.df.loc[index, 'close']
                amount = (buy_unit * self.df.loc[index, 'close']) * self.BASIC_FEES + buy_unit * self.df.loc[index, 'close']
                if self.balance < amount :
                    buy_balance -= self.balance * 0.0001
                else:
                    self.balance -= amount
                    self.unit = buy_unit
                    buy_rate = ((self.unit * self.df.loc[index, 'close'] + self.balance) - self.orgin_money) * 100 / self.orgin_money
                    break
            hold_rate = (self.balance - self.orgin_money) * 100 / self.orgin_money
            return 0 if hold_rate > buy_rate else 1
        
    def back_up(self):
        self.bak_balance = self.balance
        self.bak_unit = self.unit

    def  back_ward(self):
        self.balance = self.bak_balance 
        self.unit = self.bak_unit
                
    def is_bankrupt(self):
        return (self.money < 10000)

class DQN(nn.Module):
    def __init__(self, h, w, outputs, qsize):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, qsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(qsize)
        self.conv3 = nn.Conv2d(qsize, qsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(qsize)

        # Number of Linear input connections depends on output of conv2d layers
        # and therefore the input image size, so compute it.
        # def conv2d_size_out(size, kernel_size = 5, stride = 2):
        #     return (size - (kernel_size - 1) - 1) // stride  + 1
        # convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        # convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        # print("convw[%d]  convh[%d]" % (convw, convh))
        linear_input_size = 3 * qsize
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return self.head(x.view(x.size(0), -1))

def select_action(df, idx):
    return ((df.loc[idx, "close"] - df.loc[idx, "closemin"]) * (action_kind-1) // (df.loc[idx, "closemax"] - df.loc[idx, "closemin"]))

# gpu가 사용 가능한 경우에는 device를 gpu로 설정하고 불가능하면 cpu로 설정합니다.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
account = Account(df, 50000000)
account.reset()

converter = torchvision.transforms.ToTensor()
charts = []
for idx in df.index:
    if idx <= data_size:
        continue
    else:
        img = get_chart(df, idx, data_size, i_w = screen_width, i_h = screen_height)
        act = select_action(df, idx)
        # display(img)
        img = converter(img).unsqueeze(0)
        act = torch.tensor(act, dtype=torch.int64)
        charts.append([img,act])

joblib.dump(charts, "charts_{0:d}.dmp".format(action_kind))
train_loader = DataLoader(charts,batch_size=batch_size, shuffle=True,num_workers=2,drop_last=True)
test_loader = DataLoader(charts,batch_size=batch_size, shuffle=False,num_workers=2,drop_last=True)

# 모델을 지정한 장치로 올립니다.
model = DQN(screen_height, screen_width, action_kind, qsize).to(device)

if path.exists("pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)):
    model.load_state_dict(torch.load("pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)))
model.train()
# 손실함수로는 크로스엔트로피를 사용합니다.
loss_func = nn.CrossEntropyLoss()

# 최적화함수로는 Adam을 사용합니다.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

loss_arr =[]
for i in range(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device).squeeze(1)
        # print(x)
        y_= label.to(device)
        # print(y_)
        
        optimizer.zero_grad()
        output = model.forward(x)
        loss = loss_func(output,y_)
        loss.backward()
        optimizer.step()
        
        if j % 100 == 0:
            print(loss.cpu().detach().numpy())
            loss_arr.append(loss.cpu().detach().numpy())

pltx.plot(loss_arr)
pltx.show()

# 맞은 개수, 전체 개수를 저장할 변수를 지정합니다.
correct = 0
total = 0.000000000001

# 인퍼런스 모드를 위해 no_grad 해줍니다.
model.eval()
with torch.no_grad():
# 테스트로더에서 이미지와 정답을 불러옵니다.
    for image,label in test_loader:
        # 두 데이터 모두 장치에 올립니다.
        x = image.to(device).squeeze(1)
        y_= label.to(device)

        # 모델에 데이터를 넣고 결과값을 얻습니다.
        output = model.forward(x)
        
        # https://pytorch.org/docs/stable/torch.html?highlight=max#torch.max
        # torch.max를 이용해 최대 값 및 최대값 인덱스를 뽑아냅니다.
        # 여기서는 최대값은 필요없기 때문에 인덱스만 사용합니다.
        _,output_index = torch.max(output,1)
        
        # 전체 개수는 라벨의 개수로 더해줍니다.
        # 전체 개수를 알고 있음에도 이렇게 하는 이유는 batch_size, drop_last의 영향으로 몇몇 데이터가 잘릴수도 있기 때문입니다.
        total += label.size(0)
        
        # 모델의 결과의 최대값 인덱스와 라벨이 일치하는 개수를 correct에 더해줍니다.
        correct += (output_index == y_).sum().float()

    # 테스트 데이터 전체에 대해 위의 작업을 시행한 후 정확도를 구해줍니다.
    print("Accuracy of Test Data: {}%".format(100*correct/total))
    torch.save(model.state_dict(),"pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device))

    tot = 0.0000000000001
    corr = 0.0
    for idy in df.index:
        x = get_chart(df, idy, data_size, i_w = screen_width, i_h = screen_height)
        x = converter(x).unsqueeze(0).to(device).squeeze(1)

        output = model.forward(x)

        _,action = torch.max(output,1)
        # print("action:", action)
        action = action.cpu().numpy()[0]
        # print("action:", action)
        sel_act = select_action(df, idy)
        tot += 1 if sel_act in [0, action_kind-1] else 0
        corr += 1 if sel_act in [0, action_kind-1] and sel_act == action else 0
        reward, real_action = account.exec_action(2 if action == (action_kind - 1) else (1 if action == 0 else 0) , idy)
        if idy % 100 == 0:
            print("idy:%d action[%d:%d] price [%.4f] unit[%.4f] agent rate:%.05f remind money:%.02f accuracy:%.02f" 
                % (idy, sel_act, action, df.loc[idy, 'close'], account.unit, account.rate, account.balance + account.unit * df.loc[idy, 'close'], 100*corr/tot))

LIST

'python > 자동매매 프로그램' 카테고리의 다른 글

Colab에서 살아남기 (3)	2023.03.07
강화학습을 이용한 비트코인 매매프로그램(14) - 학습 파일 단위 분리 (5)	2023.02.25
강화학습을 이용한 비트코인 매매프로그램(12) - 실거래 적용 (5)	2023.01.30
강화학습을 이용한 비트코인 매매프로그램(11) - ResNet + RNN 적용 모델 (1)	2022.12.25
강화학습을 이용한 비트코인 매매프로그램(7)-Back Test (1)	2022.12.24

강화학습을 이용한 비트코인 매매프로그램(12) - 실거래 적용

아빠는 벌레잡이 2023. 1. 30. 07:34

2023. 1. 30. 07:34

from lib.UpbitTrade import UpbitTrade
from lib.ColoredLogger import COLORS_LOG as cl
from lib.Upsert import aio_data_update
# from lib.upbitSellBuy import *
from lib.dl_upbitSellBuy import predict
from lib.Common import send_slack
from lib.Common import make_real_rate
from lib.Common import dfTrend
from lib.Common import dfInclease
import pyupbit
import time
import sys
import os
import aiomysql
import pymysql
import asyncio

import logging
import logging.config
import json

import common.constrant as Const
import config.localconf as conf
import pandas as pd
import numpy as np

config = json.load(open('config/logging.json'))
logging.config.dictConfig(config)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

table_name="xxx.TB_ETH_TRADE"
table_report="xxx.TB_ETH_DAILY_REPORT"
table_config="xxx.TB_ETH_CASH_CONFIG"
table_crashed="xxx.TB_CRASHED_REPORT"
table_realrate="xxx.TB_REAL_REPORT"

pd.set_option('display.max_columns', 13)
pd.set_option('display.width', 200)

def get_avg_buy_price():
    try : 
        amount = upbit.get_amount('ALL')
        unit = trade.get_balance()[0]
        if unit == 0:
            return 0
        else:
            return amount / unit
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.error("get_avg_buy_price exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
        return 0

def get_avg_sell_price():
    try:
        balances, req = upbit.get_balances(contain_req=True)
        if len(balances) == 0:
            return 0

        avg_sell_price = 0
        for x in balances:
            if x['currency'] == conf.TRADE_TICKER:
                avg_sell_price = float(x['avg_sell_price'])
                break
        return avg_sell_price
    except Exception as x:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.error("get_avg_sell_price exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
        return 0

async def read_all(sqlcon) -> pd.DataFrame:
    #read Backup DB 
    try:
        cursor = await sqlcon.cursor(aiomysql.cursors.DictCursor)
        str_query = "select * from %s" % table_name
        await cursor.execute(str_query)
        data = await cursor.fetchall()
        datadf = pd.DataFrame(data)
        return datadf
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.error("read from db exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
    finally:
        await cursor.close()

async def read_from(sqlcon, ntime) -> pd.DataFrame:
    try:
        cursor = await sqlcon.cursor(aiomysql.cursors.DictCursor)
        str_query = "select * from %s where `time` > %d" % (table_name, ntime)
        # logger.info("str_query:%s", str_query)
        await cursor.execute(str_query)
        data = await cursor.fetchall()
        datadf = pd.DataFrame(data)
        # logger.info("return df")
        # logger.info(datadf)
        return datadf
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.error("read from db exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
    finally:
        await cursor.close()

async def trade_proc():
    last_time = ""
    krw = 0
    first_time = True
    avg_buy_price = 0.0
    avg_sell_price = 0.0
    slack_flag = 0
    old_rate = 0
    rate_trend = ""

    try:
        pool = await aiomysql.create_pool(host="193.xxx.xxx.xx", user="xxxxxx", password='xxxxxxx', db='xxxxx')
        async with pool.acquire() as sqlcon:
            
            df = await read_all(sqlcon)
            idx = df.index.max()
            idx_time = df.loc[idx, 'time']
            sqlcon.close()
        while True:
            async with pool.acquire() as sqlcon:
                try:
                    await get_trade_yn(sqlcon)
                    start_time = time.time()

                    add_df = await read_from(sqlcon, idx_time)
                    df = pd.concat([df, add_df])
                    df = df.drop_duplicates(['time'], keep='first', inplace=False, ignore_index=True)
                    try:
                        avg_buy_price = get_avg_buy_price()
                        logger.info(cl["BOLD"] + cl["RED"] + "평균매수단가:%.2f" + cl["RESET"], avg_buy_price)
                    except Exception as ex:
                        exc_type, exc_obj, exc_tb = sys.exc_info()
                        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                        logger.error("get_avg_buy_price -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))

                    ###잔고 조회
                    current_unit = 0.0
                    try:
                        balance = trade.get_balance()
                        price = pyupbit.get_current_price("KRW-" + conf.TRADE_TICKER)
                        logger.info("현재가 : %.4f" % price)
                        limit_cash = (balance[0] * price + balance[1]) * conf.REINVESTMENT_RATE if balance[0] != 0 or balance[1] != 0 else base_balance
                        logger.info("LIMIT_CASH: %.2f", limit_cash)
                        trade.set_limitcash(Const.LIMIT_CASH)
                        rate = ((balance[0]*price + balance[1] - base_balance)*100)/base_balance if base_balance is not None and base_balance != 0 else 0
                        trend_rate = old_rate - rate
                        old_rate = rate
                        if trend_rate == 0:
                            rate_trend += "="
                        elif trend_rate > 0:
                            rate_trend += "V"
                        else:
                            rate_trend += "^"

                        if len(rate_trend) > 5:
                            rate_trend = rate_trend[-5:]

                        logger.info(cl["ITELIC"] + cl["RED"] + "평가금액:%.2f 수익률: %.2f%% %s" + cl["RESET"], balance[0]*price + balance[1], rate, rate_trend)
                        #현재의 원화 잔고 얻기
                        krw = balance[1]
                        current_unit = balance[0]
                        logger.info(cl["BOLD"] + cl["RED"] + "잔고조회: %.5f %.2f" + cl["RESET"], balance[0], balance[1])
                    except Exception as ex:
                        exc_type, exc_obj, exc_tb = sys.exc_info()
                        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                        logger.error("get_current_price -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))

                    logger.info(cl["BOLD"] + cl["GREEN"] + "clsmin %.2f(%s), clsmax:%.2f(%s), close:%.2f(%s)" + cl["RESET"], 
                        df.loc[idx, 'clsmin'], dfInclease(df.index.max(), df, "clsmin"), df.loc[idx, 'clsmax'], 
                        dfInclease(df.index.max(), df, "clsmax"), df.loc[idx, 'close'], dfInclease(df.index.max(), df, "close"))
                    logger.info(cl["BOLD"] + cl["GREEN"] + "min_rate:%.2f%%(%s) %s max_rate:%.2f%%(%s) %s" + cl["RESET"], 
                        df.loc[idx, 'min_rate'], dfInclease(df.index.max(), df, "min_rate"), 
                        dfTrend(df.index.max(), df, "minratedeg"), df.loc[idx, 'max_rate'], 
                        dfInclease(df.index.max(), df, "max_rate"), 
                        dfTrend(df.index.max(), df, "maxratedeg"))
                    logger.info("time date:" + time.strftime("%Y%m%d%H%M%S", time.localtime(idx_time)))
                    logger.info(cl["BOLD"] + cl["GREEN"] + "trade_ok:%s" + cl["RESET"], str(trade_ok))
                    dx = df
                    pred = predict(logger, dx, idx)
                    logger.info(cl["BOLD"] + cl["GREEN"] + "predict action:%s" + cl["RESET"], str(pred.get_dl_action()))
                    if pred.fsell() and current_unit > 0 :
                        logger.info("매도 타이밍: %s ", time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(idx_time)))
                        if trade_ok == True:
                            ret = trade.sell_crypto_currency()
                            if ret is not None:
                                cursor = await sqlcon.cursor(aiomysql.cursors.DictCursor)
                                try:
                                    ret = upbit.get_order("KRW-" + conf.TRADE_TICKER)
                                    while len(ret) != 0 : 
                                        ret = upbit.get_order("KRW-" + conf.TRADE_TICKER)
                                        await asyncio.sleep(0.01)
                                    str_query = """SELECT * 
                                        FROM main.TB_TRADE_HIST 
                                        WHERE user_id='%s' 
                                        AND qty > 0 
                                        AND end_dt = 0""" % user
                                    await cursor.execute(str_query)
                                    data = await cursor.fetchall()
                                    if len(data) > 0:
                                        fatch_result = pd.DataFrame(data)
                                        start_dt = fatch_result['start_dt'][0]
                                        qty = fatch_result['qty'][0]
                                        # logger.info("start_dt:%d", start_dt)
                                        str_query = """
                                        UPDATE main.TB_TRADE_HIST 
                                        SET 
                                            sell_qty = qty
                                            , qty=0
                                            , close=%f 
                                            , sell_sum =%f
                                            , end_dt=%d
                                        WHERE user_id = '%s' AND start_dt = %d""" % (
                                            df.loc[idx, 'close'], df.loc[idx, 'close']*qty, time.time(),  user, start_dt
                                        )
                                        await cursor.execute(str_query)
                                        await sqlcon.commit()
                                except Exception as ex:
                                    exc_type, exc_obj, exc_tb = sys.exc_info()
                                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                                    logger.error("read db error -> exception! " + str(ex) + "` %s: %d", fname, exc_tb.tb_lineno)
                                finally:
                                    await cursor.close()
                    elif pred.fbuy():
                        logger.info("매수 타이밍 %s", time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(idx_time)))
                        if trade_ok == True:
                            ret = trade.buy_crypto_currency()
                            if ret is not None:
                                try:
                                    ret = upbit.get_order("KRW-" + conf.TRADE_TICKER)
                                    while len(ret) != 0 :
                                        ret = upbit.get_order("KRW-" + conf.TRADE_TICKER)
                                        await asyncio.sleep(0.01)
                                    avg_buy_price = get_avg_buy_price()
                                    balance = trade.get_balance()
                                    trdhist_clmns = ['user_id', 'start_dt', 'ticker', 'qty', 'cost', 'cost_sum', 'sell_qty', 'close', 'sell_sum', 'end_dt']
                                    trdhist_data  = [[user, df.loc[idx, 'time'], conf.TRADE_TICKER, balance[0], avg_buy_price, balance[0] * avg_buy_price,
                                                    0, 0, 0, 0]]
                                    trdhist_df = pd.DataFrame.from_records(data=trdhist_data, columns=trdhist_clmns)
                                    await aio_data_update(trdhist_df, sqlcon, "main.TB_TRADE_HIST", "replace")
                                except Exception as ex:
                                    exc_type, exc_obj, exc_tb = sys.exc_info()
                                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                                    logger.error("read db error -> exception! " + str(ex) + "` %s: %d", fname, exc_tb.tb_lineno)
                                finally:
                                    await cursor.close()

                    logger.info(df.tail(5))

                    idx = df.index.max()
                    idx_time = df.loc[idx, 'time']
                    sqlcon.close()
                except Exception as ex:
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                    logger.error("`main -> exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
                    time.sleep(60)
                finally:
                    end_time = time.time()
                    if (end_time - start_time) > 0 and (end_time - start_time) <= Const.SLEEP_TIME:
                        await asyncio.sleep(Const.SLEEP_TIME - (end_time - start_time))
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        logger.error("`main -> exception! %s ` : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
        time.sleep(60)
    finally:
        pool.close()
        await pool.wait_closed()         

async def get_trade_yn(sqlcon):
    global trade_ok
    global base_balance
    user = os.getenv('user')
    
    str_sql = """
        SELECT 
            a.TRADE_YN
            ,a.BASE_BALANCE
        FROM main.TN_PAYMENT_ACNT a
        WHERE a.USER_ID = '%s'
    """ % user
    cursor = await sqlcon.cursor(aiomysql.cursors.DictCursor)
    await cursor.execute(str_sql)
    user_info = pd.DataFrame(await cursor.fetchall())
    await cursor.close()

    trade_yn = np.asarray(user_info)[0][0]
    base_balance = float(np.asarray(user_info)[0][1])
    trade_ok = (trade_yn == 'Y')

def get_env():
    global upbit
    global trade
    global user
    user = os.getenv('user')
    logger.info("user[%s]", user)
    
    sqlcon = pymysql.connect(host='193.xxx.xxx.xx', user='xxxxxx', password='xxxxxxx', db='xxxx')
    str_sql = """
        SELECT 
            a.PUB_KEY
            ,a.APP_KEY
            ,a.SECU_KEY
            ,a.BASE_BALANCE
        FROM TN_PAYMENT_ACNT a
        WHERE a.USER_ID = '%s'
    """ % user
    cursor = sqlcon.cursor(pymysql.cursors.DictCursor)
    cursor.execute(str_sql)
    user_info = pd.DataFrame(cursor.fetchall())
    cursor.close()

    pub_key = np.asarray(user_info)[0][0]
    app_key = np.asarray(user_info)[0][1]
    sec_key = np.asarray(user_info)[0][2]
    base_balance = float(np.asarray(user_info)[0][3])

    # aes = AESCryptoCBC(pub_key)

    enc_app_key = app_key
    enc_sec_key = sec_key

    upbit = pyupbit.Upbit(enc_app_key, enc_sec_key)
    trade  = UpbitTrade(upbit, logger, conf.TRADE_TICKER, base_balance, base_balance * conf.REINVESTMENT_RATE)

if __name__ == '__main__':
    get_env()
    asyncio.run(trade_proc())

import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F

import plotly.graph_objects as go
import plotly.subplots as ms
import plotly as plt
from PIL import Image
import io
import os.path as path
import talib
import numpy as np

data_size = 100
action_kind = 4
screen_height = 50
screen_width  = 70

class DQN(nn.Module):
    def __init__(self, device, h, w, outputs, qsize):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, qsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(qsize)
        self.device = device
        self.conv3 = nn.Conv2d(qsize, qsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(qsize)

        linear_input_size = 3 * qsize
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        # print("x1------->", x.size())
        x = F.relu(self.bn2(self.conv2(x)))
        # print("x2------->", x.size())
        x = F.relu(self.bn3(self.conv3(x)))
        # print("x3------->", x.size())
        return self.head(x.view(x.size(0), -1))

def get_chart(df, idx, max_data:int=300, i_w:int=140, i_h:int=100):
    ndf = df.head(idx).tail(max_data)
    ndf.reset_index(drop=True, inplace=True)
    
    candle = go.Candlestick(x=ndf.index,open=ndf['open'],high=ndf['high'],low=ndf['low'],close=ndf['close'], increasing_line_color = 'red',decreasing_line_color = 'blue', showlegend=False)
    upper = go.Scatter(x=ndf.index, y=ndf['upper'], line=dict(color='red', width=2), name='upper', showlegend=False)
    ma20 = go.Scatter(x=ndf.index, y=ndf['ma20'], line=dict(color='black', width=2), name='ma20', showlegend=False)
    lower = go.Scatter(x=ndf.index, y=ndf['lower'], line=dict(color='blue', width=2), name='lower', showlegend=False)

    volume = go.Bar(x=ndf.index, y=ndf['volume'], marker_color='red', name='volume', showlegend=False)

    MACD = go.Scatter(x=ndf.index, y=ndf['macd'], line=dict(color='blue', width=2), name='MACD', legendgroup='group2', legendgrouptitle_text='MACD')
    MACD_Signal = go.Scatter(x=ndf.index, y=ndf['signal'], line=dict(dash='dashdot', color='green', width=2), name='MACD_Signal')
    MACD_Oscil = go.Bar(x=ndf.index, y=ndf['flag'], marker_color='purple', name='MACD_Oscil')

    fast_k = go.Scatter(x=ndf.index, y=ndf['fast_k'], line=dict(color='skyblue', width=2), name='fast_k', legendgroup='group3', legendgrouptitle_text='%K %D')
    slow_d = go.Scatter(x=ndf.index, y=ndf['slow_d'], line=dict(dash='dashdot', color='black', width=2), name='slow_d')

    PB = go.Scatter(x=ndf.index, y=ndf['PB']*100, line=dict(color='blue', width=2), name='PB', legendgroup='group4', legendgrouptitle_text='PB, MFI')
    MFI10 = go.Scatter(x=ndf.index, y=ndf['MFI10'], line=dict(dash='dashdot', color='green', width=2), name='MFI10')

    RSI = go.Scatter(x=ndf.index, y=ndf['rsi14'], line=dict(color='red', width=2), name='RSI', legendgroup='group5', legendgrouptitle_text='RSI')
    
    # 스타일
    fig = ms.make_subplots(rows=5, cols=2, specs=[[{'rowspan':4},{}],[None,{}],[None,{}],[None,{}],[{},{}]], shared_xaxes=True, horizontal_spacing=0.03, vertical_spacing=0.01)

    fig.add_trace(candle,row=1,col=1)
    fig.add_trace(upper,row=1,col=1)
    fig.add_trace(ma20,row=1,col=1)
    fig.add_trace(lower,row=1,col=1)

    fig.add_trace(volume,row=5,col=1)

    fig.add_trace(candle,row=1,col=2)
    fig.add_trace(upper,row=1,col=2)
    fig.add_trace(ma20,row=1,col=2)
    fig.add_trace(lower,row=1,col=2)

    fig.add_trace(MACD,row=2,col=2)
    fig.add_trace(MACD_Signal,row=2,col=2)
    fig.add_trace(MACD_Oscil,row=2,col=2)

    fig.add_trace(fast_k,row=3,col=2)
    fig.add_trace(slow_d,row=3,col=2)

    fig.add_trace(PB,row=4,col=2)
    fig.add_trace(MFI10,row=4,col=2)

    fig.add_trace(RSI,row=5,col=2)

    # 추세추종
    # trend_fol = 0
    # trend_refol = 0
    # for i in ndf.index:
    #     if ndf['PB'][i] > 0.8 and ndf['MFI10'][i] > 80:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='orange', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)
    #     elif ndf['PB'][i] < 0.2 and ndf['MFI10'][i] < 20:
    #         trend_fol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='darkblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)
    #         fig.add_trace(trend_fol,row=1,col=1)

    # 역추세추종
    # for i in ndf.index:
    #     if ndf['PB'][i] < 0.05 and ndf['IIP21'][i] > 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='purple', marker_size=20, marker_symbol='triangle-up', opacity=0.7, showlegend=False)  #보라
    #         fig.add_trace(trend_refol,row=1,col=1)
    #     elif df['PB'][i] > 0.95 and ndf['IIP21'][i] < 0:
    #         trend_refol = go.Scatter(x=[ndf.index[i]], y=[ndf['close'][i]], marker_color='skyblue', marker_size=20, marker_symbol='triangle-down', opacity=0.7, showlegend=False)  #하늘
    #         fig.add_trace(trend_refol,row=1,col=1)    

    # fig.add_trace(trend_fol,row=1,col=1)
    # 추세추총전략을 통해 캔들차트에 표시합니다.

    # fig.add_trace(trend_refol,row=1,col=1)
    # 역추세 전략을 통해 캔들차트에 표시합니다.
    
    # fig.update_layout(autosize=True, xaxis1_rangeslider_visible=False, xaxis2_rangeslider_visible=False, margin=dict(l=50,r=50,t=50,b=50), template='seaborn', title=f'({ticker})의 날짜: ETH [추세추종전략:오↑파↓] [역추세전략:보↑하↓]')
    # fig.update_xaxes(tickformat='%y년%m월%d일', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray', showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_yaxes(tickformat=',d', zeroline=True, zerolinewidth=1, zerolinecolor='black', showgrid=True, gridwidth=2, gridcolor='lightgray',showline=True,linewidth=2, linecolor='black', mirror=True)
    # fig.update_traces(xhoverformat='%y년%m월%d일')
    # size = len(img)
    # img = plt.io.to_image(fig, format='png')
    # canvas = FigureCanvasBase(fig)
    # img = Image.frombytes(mode='RGB', size=(700, 500), data=fig.to_image().to, decoder_name='raw')
    # img = Image.frombytes('RGBA', (700, 500), plt.io.to_image(fig, format='png'), 'raw')
    # img = Image.fromarray('RGBA', (700, 500), np.array(plt.io.to_image(fig, format='png')), 'raw')
    # img = Image.fromarray(np.array(plt.io.to_image(fig, format='png')), 'RGB')
    # img = Image.fromarray(np.array(plt.io.to_image(fig, format='png')), 'L')
    # img = Image.open(io.BytesIO(plt.io.to_image(fig, format='png', width=140, height=100)))
    # img = Image.open(io.BytesIO(plt.io.to_image(fig, format='png', width=700, height=500)))
    img = Image.open(io.BytesIO(plt.io.to_image(fig, format='png')))
    # print(img)
    img.convert("RGB")
    # print(img)
    img.thumbnail((i_h, i_w), Image.ANTIALIAS)
    # print(img)
    # img = Image.Image(img, 'RGB')
    # img.show()
    return img

from lib.addAuxiliaryData import Auxiliary

class predict():
    def __init__(self, logger, df, idx) -> None:
        df['open'] = df['start']
        df['sma3'] = talib.SMA(np.asarray(df['close']), 3)
        aux = Auxiliary(logger)
        aux.add_macd(df,12,26,9)
        aux.add_bbands(df)
        aux.add_relative_strength(df)
        aux.add_stock_cast(df)
        aux.add_iip(df)
        aux.add_mfi(df)
        # df.dropna(inplace=True)
        df.reset_index(drop=True,inplace=True)
        # gpu가 사용 가능한 경우에는 device를 gpu로 설정하고 불가능하면 cpu로 설정합니다.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        converter = torchvision.transforms.ToTensor()
        # 모델을 지정한 장치로 올립니다.
        self.model = DQN(device, screen_height, screen_width, action_kind, 4).to(device)
        # model = CNNRNN(device, screen_height, screen_width, action_kind, 4).to(device)
        # model = CNN().to(device)
        self.model = nn.DataParallel(self.model, device_ids=[0,1]).to(device)

        if path.exists("/home/yubank/pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)):
            self.model.load_state_dict(torch.load("/home/yubank/pt/train_dqn_{0:02d}_{1}.pt".format(action_kind, device)))

        self.model.eval()

        self.x = get_chart(df, idx, data_size, i_w = screen_width, i_h = screen_height)
        self.x = converter(self.x).unsqueeze(0).to(device).squeeze(1)

    def get_dl_action(self):
        output = self.model.forward(self.x)
        _,action = torch.max(output,1)
        # print("action:", action)
        return action.cpu().numpy()[0]

    def fbuy(self):
        return True if self.get_dl_action() == 0 else False

    def fsell(self):
        return True if self.get_dl_action() == (action_kind - 1) else False

우여 곡절이 중간에 조금 있었습니다.

학습된 모델을 pt파일로 저장을 한 다음 predict 클래스를 만들었습니다.

파이선의 클래스는 static method와 일반 method의 개념이 없어서 그냥 함수 묶음으로도 사용이 가능합니다.

적용 후 실제 upbit에서 거래가 발생했고 한 차례도 손실없이 매매를 하고 있습니다.

1주일을 일단 집에서 돌리고 고로케이션을 알아보고 있는데 최소 30만원(랙 마운트 케이스 및 파워구매, 대충 4u 로케이션 가격 15만원 설치비 5만원(IDC내부 회선 설치 및 전원공사임 내서버 설치는 별도임))의 추가 비용이 발생합니다.

수익률이 안나온다면 코로케이션을 진행할 수 없을 것 같네요.

맺은말

궁금한 점이나 이상하다고 생각되시면 답글로 남겨 주세요.

세세하고 친절히 답변 드리겠습니다.

LIST

저작자표시 비영리 (새창열림)

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(14) - 학습 파일 단위 분리 (5)	2023.02.25
강화학습을 이용한 비트코인 매매프로그램(13) - 강화학습 최적화 (1)	2023.02.14
강화학습을 이용한 비트코인 매매프로그램(11) - ResNet + RNN 적용 모델 (1)	2022.12.25
강화학습을 이용한 비트코인 매매프로그램(7)-Back Test (1)	2022.12.24
강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장 (0)	2022.12.21

강화학습을 이용한 비트코인 매매프로그램(7)-Back Test

아빠는 벌레잡이 2022. 12. 24. 21:13

2022. 12. 24. 21:13

import random
from collections import deque, namedtuple
from typing import Dict, List, NamedTuple, Optional, Text, Tuple, Union
from IPython.display import display, Math
from Account import Account
import math
from itertools import count

import os.path as path
import numpy as np
import plotly as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as T
from Market import Market
from PIL import Image
from plotly import express as px
import torchvision
import time
import joblib
import sys,os

action_kind = 4
max_episode = 5000
screen_height = 50
screen_width  = 70
visit_cnt = [0] * action_kind
# replay_buffer = deque()
epsilon = 0.3
dis = 0.9
data_size = 600

BATCH_SIZE = 8
GAMMA = 0.999
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200
TARGET_UPDATE = 10
steps_done = 0
loss = any

# for i in range(torch.cuda.device_count()):
#     print(torch.cuda.get_device_name(i))

# if gpu is to be used
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("cuda")
# device_str = "cpu"
device_str = "cuda"
device = torch.device(device_str)

from dqn import DQN
    
converter = torchvision.transforms.ToTensor()
market = Market()
policy_net = DQN(device, screen_height, screen_width, action_kind, 32).to(device)
policy_net = nn.DataParallel(policy_net, device_ids=[0,1]).to(device)

def get_chart(market, idx, max_data):
    # img = Image.fromarray(np.uint8(cm.gist_earth(plt.io.to_image(fig, format='png')*255)))
    # im = Image.fromarray(img, bytes=True)
    # im = Image.fromarray(np.uint8(cm.gist_earth(img))/255)
    # im = Image.fromarray(np.uint8(img)/255)
    # img = img.resize((700, 500), resample=Image.BICUBIC)
    # display(img)
    # img = Image.fromarray(cm.gist_earth(plt.io.to_image(fig, format='png'), bytes=True))
    img = market.get_chart(idx, max_data=max_data)
    img.convert("RGB")
    # print(img)
    img.thumbnail((screen_high, screen_width), Image.ANTIALIAS)
    return chart

def plot_durations(last_chart, curr_chart):
    plt.figure()
    # plt.subplot(1,2,1)
    img = plt.imshow(last_chart.cpu().squeeze(0).permute(1, 2, 0).numpy(), interpolation='none')
    plt.title('Example extracted screen')
    plt.figure(2)
    # plt.subplot(1,2,2)
    plt.clf()
    # durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    # plt.plot(durations_t.numpy())
    # plt.show()
    # Take 100 episode averages and plot them too
    # if len(durations_t) >= 100:
    #     means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
    #     means = torch.cat((torch.zeros(99), means))
    #     plt.plot(means.numpy())

    img.set_data(curr_chart.cpu().squeeze(0).permute(1, 2, 0).numpy())
    plt.pause(0.01)  # pause a bit so that plots are updated
#     if is_ipython:
    display.clear_output(wait=True)
    display.display(plt.gcf())

def select_action(df, idx):
    return ((df.loc[idx, "close"] - df.loc[idx, "closemin"]) * (action_kind-1) // (df.loc[idx, "closemax"] - df.loc[idx, "closemin"]))
            
def main():
    if path.exists("pt/train_net_{}.pt".format(device_str)):
        policy_net.load_state_dict(torch.load("pt/train_net_{}.pt".format(device_str)))
    policy_net.eval()
    
    for epoch in range(max_episode):
        df = market.get_data()
        account = Account(df, 50000000)
        # account.back_up()
        account.reset()
        count = 0
        correct = 0
        last_chart = None
        while last_chart == None:
            last_chart = get_chart(market, data_size, data_size)

        for idx, _ in enumerate(df.index, start=data_size + 1):
            try:
                since = time.time()
                curr_chart = get_chart(market, idx, data_size)
                if curr_chart is None:
                    continue

                state = curr_chart - last_chart
                curr_chart = last_chart
                dqn_action = policy_net(curr_chart)
                _,dqn_action = torch.max(dqn_action,1)
                dqn_action = dqn_action.cpu().numpy()[0]
                plc_action = select_action(df, idx)
                count += 1
                correct += 1 if dqn_action == plc_action else 0
                
                print("idx:%d==>dqn_action:%d:%d, price:%.2f, match:%.2f"%(idx, dqn_action, plc_action, df.loc[idx, 'close'], correct * 100/count))

                reward, real_action = account.exec_action(2 if dqn_action == (action_kind - 1) else (1 if dqn_action == 0 else 0) , idx)
                reward = torch.tensor([reward], device=device) 
                real_action = torch.tensor([[real_action]], device=device, dtype=torch.long)
                last_chart = curr_chart
                
                if account.is_bankrupt():
                    break
                spend = time.time() - since
                print("idx:%d price [%.4f] unit[%.4f] used time[%.2f] agent rate:%.05f remind money:%.02f" 
                        % (idx, df.loc[idx, 'close'], account.unit, spend, account.rate, account.balance + account.unit * df.loc[idx, 'close']))
            except Exception as ex:
                exc_type, exc_obj, exc_tb = sys.exc_info()
                fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                print("`epoch loop -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))

        # torch.save(policy_net.state_dict(),"pt/policy_net.pt")
        # print("epoch[%d] epsode is next loss[%.05f]" % (epoch, loss.item()))
        if path.exists("pt/train_net_{}.pt".format(device_str)) and epoch % 100:
            policy_net.load_state_dict(torch.load("pt/train_net_{}.pt".format(device_str)))
            policy_net.eval()

        # if loss.item() < 0.00001 : 
        #     break
    
    print('Complete')

if __name__ == "__main__":
    main()

오늘은 강화학습으로 pt파일이 만들어졌다면 그것을 이용하여 실제 매매이전에 back test를 진행하는 main_agent.py에 대하여 말씀 드리겠습니다.
get_chart() 함수를 이용하여 torch.tensor로 변환된 PIL 이미지를 받습니다.
최초 실행시 0번 이미지를 미리 받아 두고 0번 이미지와 1번 이미지의 차를 구합니다.
이것이 강화학습에서 말하는 상태 S1이 됩니다.
시간 t1에 대한 S1이 생성되면 이것을 DQN에 넣고 행동 a1을 받습니다.
a1은 0,1,2 중 하나의 값입니다.
0은 관망 1은 매수 2는 매도를 하게 됩니다.
앞 장에서 다룬 내용중 Markget.get_data()함수의 데이터 갯수를 줄이거나 늘려서 매매 횟수를 변경할 수 있으며 epsilone 값을 변경 하면 매매프로그램이 얼마나 자주 모험적인 선택을 할지를 결정 할 수 있습니다.

학습에 사용된 pt/train_net.pt 파일은 학습 및 응용과 지금 main_agent.py에서도 공유됩니다.
공유된 DQN의 하이퍼 파라메트에 의해서 main_agent.py는 훈련 없이 바로 detection이 가능합니다.
실제 실행 되는 상태를 보고 어느 정도의 수익이 발생하는 지를 과거의 데이터로 예상할 수 있습니다.
여기 발견된 오 작동이나 문제점은 main_buroto.py 파일의 recall_history()함수에서 반영되어 새로운 history Series 를 생성하여야 합니다.
새로 생성된 history를 이용하여 학습을 진행 하여 main_agent.py를 실행하면 학습된 내용은 자동으로 반영이 되겠죠.

학습 데이터는 작은 량에서 점점 큰 량으로 변화 시키면서 과적합이 발생하는지 확인 합니다.
학습이 정상적으로 이루어 지면 다시 main_agent.py를 실행하여 detection을 진행한다.
과거의 데이터를 이용하지만 DQN은 학습된 Q(s,a)함수를 실행하는 것이지 과거의 학습된 내용을 바탕으로 한 데이터를 따라 가는 것이 아니다.
실제 history내용과 비교 해 보면 그것을 알 수 있다.
아래 그림은 왼쪽은 과거 데이터를 이용하여 학습을 진행하고 학습된 모델을 이용하여 agent가 실행되는 모습을 담은 사진입니다.
agent가 6%대의 수익을 내고 있으며 학습하는 프로세스가 loss가 적어질 수 록 agent의 수익률은 조금식 증가합니다.
다음 시간에는 실전 매매 프로그램에 DQN 학습 모델을 연동하여 실시간에서는 어떤 동작을 하는지 알아 보겠습니다.

현재 학습이 진행되어 loss가 0.0221정도로 줄어든 로그입니다.

main_agent.py를 실행하면 학습된 training_cuda.pt파일을 읽어서 매매를 실행하여 테스트를 진행하는 사진입니다.
수익률은 44.81%정도 됩니다.
차트를 그리는 시간이 2초 이상걸려서 그리 빠르지는 않습니다.
하루에 약30000건을 테스트 중입니다.
개인 PC사양에 영향을 받을 것 같습니다.

LIST

저작자표시 비영리 (새창열림)

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(12) - 실거래 적용 (5)	2023.01.30
강화학습을 이용한 비트코인 매매프로그램(11) - ResNet + RNN 적용 모델 (1)	2022.12.25
강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장 (0)	2022.12.21
강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델 (0)	2022.12.20
강화학습을 이용한 비트코인 매매프로그램(8) - wsl이용하기 (3)	2022.12.12

강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장

아빠는 벌레잡이 2022. 12. 21. 14:49

2022. 12. 21. 14:49

지난차의 CNN+RNN모델에서 변경 가능한 파라메트는 hidden_dim값 일것입니다. 여기서 궁금점이 하나 생기는 것이 CNN+RNN,CNN+ISTM,CNN+GRU의 성능의 차이가 있을까? 하는것 입니다.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNRNN2(nn.Module):
    def __init__(self, device, h, w, embedding_size, outputs, hdnsize, num_layer):
        super(CNNRNN2, self).__init__()
        self.device = device
        self.hidden_size = hdnsize
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, hdnsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(hdnsize)
        self.conv3 = nn.Conv2d(hdnsize, hdnsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(hdnsize)
        self.embedding_size = embedding_size
        self.num_layer = num_layer

        self.encoder = nn.Embedding(54 * hdnsize, embedding_size)
        self.rnn = nn.RNN(embedding_size, hdnsize, num_layer)
        self.decoder = nn.Linear(hdnsize, outputs)
        
    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)
    
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        hidden = self.init_hidden().to(self.device)
        x, hidden = self.rnn(x, hidden)
        x = self.decoder(x.view(8, -1))
        
        return x

embedding을 포함한 cnn+rnn 모델 cnnrnn2.py 입니다.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNRNN2(nn.Module):
    def __init__(self, device, h, w, embedding_size, outputs, hdnsize, num_layer):
        super(CNNRNN2, self).__init__()
        self.device = device
        self.hidden_size = hdnsize
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, hdnsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(hdnsize)
        self.conv3 = nn.Conv2d(hdnsize, hdnsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(hdnsize)
        self.embedding_size = embedding_size
        self.num_layer = num_layer

        self.encoder = nn.Embedding(126 * hdnsize, embedding_size)
        self.rnn = nn.LSTM(embedding_size, hdnsize, num_layer)
        self.decoder = nn.Linear(hdnsize, outputs)
        
    def init_hidden(self):
        hidden = torch.zeros(num_layer, 8, self.hidden_size) #8 = batch_size
        cell = torch.zeros(num_layer, 8, self.hidden_size)   #8 = batch_size
        return hidden, cell
    
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        hidden = self.init_hidden().to(self.device)
        x, (hidden, cell) = self.rnn(x, (hidden, cell))
        x = self.decoder(x.view(8, -1))
        
        return x

embedding을 포함한 cnn+lstm모델인 cnnlstm.py입니다.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNRNN2(nn.Module):
    def __init__(self, device, h, w, embedding_size, outputs, hdnsize, num_layer):
        super(CNNRNN2, self).__init__()
        self.device = device
        self.hidden_size = hdnsize
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, hdnsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(hdnsize)
        self.conv3 = nn.Conv2d(hdnsize, hdnsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(hdnsize)
        self.embedding_size = embedding_size
        self.num_layer = num_layer

        self.encoder = nn.Embedding(126 * hdnsize, embedding_size)
        self.rnn = nn.GRU(embedding_size, hdnsize, num_layer)
        self.decoder = nn.Linear(hdnsize, outputs)
        
    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)
    
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        hidden = self.init_hidden().to(self.device)
        x, hidden = self.rnn(x, hidden)
        x = self.decoder(x.view(8, -1))
        
        return x

embedding을 포함한 cnn+gru모델인 cnngru.py입니다.

import random
from collections import deque, namedtuple
from IPython.display import display, Math
from Account import Account
import math
from itertools import count
import os

import sys
import os.path as path
import plotly as plt
import torch
import torch.nn as nn
import torch.optim as optim
from Market import Market
import torchvision
import time
from cnnrnn2 import CNNRNN2
from memory import ReplayMemory, Experience
# from transformers import get_cosine_schedule_with_warmup
import transformers

action_kind = 3
max_episode = 5000
screen_height = 100
screen_width  = 140
data_size = 250
visit_cnt = [0] * action_kind
# replay_buffer = deque()
epsilon = 0.3
dis = 0.9

BATCH_SIZE = 8
# GAMMA = 0.999
GAMMA = 0.99
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200
TARGET_UPDATE = 10
steps_done = 0
loss = any
WINDOW_START = 0
WINDOW_SIZE  = 1500

# for i in range(torch.cuda.device_count()):
#     print(torch.cuda.get_device_name(i))
device_str = "cuda"
device = torch.device(device_str)
    
memory = ReplayMemory(BATCH_SIZE)
converter = torchvision.transforms.ToTensor()
market = Market()
train_net = CNNRNN2(device, screen_height, screen_width, 10, action_kind, 32, 2).to(device)
train_net = nn.DataParallel(train_net, device_ids=[0,1]).to(device)

episode_durations = []
optimizer = optim.RMSprop(train_net.parameters(), lr=0.0001, eps=0.00000001)
# scheduler = get_cosine_schedule_with_warmup(optimizer, 5, base_lr=0.3, final_lr=0.01)
scheduler = transformers.get_cosine_schedule_with_warmup(optimizer, 
                                                         num_warmup_steps=5, 
                                                         num_training_steps=25)
def optimize_action(memory):
    if len(memory) < BATCH_SIZE:
        return None
    optimizer.zero_grad()
    epsode = memory.pop(BATCH_SIZE)
    batch = Experience(*zip(*epsode))
    state_batch = torch.cat(batch.state)
    action_batch = torch.cat(batch.action)
    state_action_values = train_net(state_batch).gather(1, action_batch)
    # optimizer = optim.RMSprop(train_net.parameters(), 0.01)
    
    criterion = nn.SmoothL1Loss()
    loss = criterion(state_action_values, action_batch)

    # Optimize the model
    # optimizer.zero_grad()
    loss.backward()
    for param in train_net.parameters():
        param.grad.data.clamp_(-1, 1)
    optimizer.step()
    return loss

# def select_action(df, idx):
#     try:
#         global steps_done
#         sample = random.random()
#         eps_threshold = EPS_END + (EPS_START - EPS_END) * math.exp(-1. * steps_done / EPS_DECAY)
#         steps_done += 1
#         if sample > eps_threshold:
#             if df.loc[idx, "closemax"] == df.loc[idx, "close"]:
#                 action = 2
#                 return  action
#             elif df.loc[idx, "closemin"] == df.loc[idx, "close"]:
#                 action = 1
#                 return  action
#             else:
#                 return 0
#         else:
#             action = random.randrange(action_kind)
#             return  action
#     except Exception as ex:
#         exc_type, exc_obj, exc_tb = sys.exc_info()
#         fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
#         print("`select_action -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
#         return 0
def select_action(df, idx):
    try:
        if df.loc[idx, "closemax"] == df.loc[idx, "close"]:
            action = 2
            return  action
        elif df.loc[idx, "closemin"] == df.loc[idx, "close"]:
            action = 1
            return  action
        else:
            return 0
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        print("`select_action -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
        return 0
        
def get_chart(market, idx, max_data):
    img = market.get_chart(idx, max_data=max_data)
    if img is None:
        return None
    # img = Image.fromarray(np.uint8(cm.gist_earth(plt.io.to_image(fig, format='png')*255)))
    # im = Image.fromarray(img, bytes=True)
    # im = Image.fromarray(np.uint8(cm.gist_earth(img))/255)
    # im = Image.fromarray(np.uint8(img)/255)
    # img = img.resize((700, 500), resample=Image.BICUBIC)
    # img = Image.fromarray(cm.gist_earth(plt.io.to_image(fig, format='png'), bytes=True))
    # display(img)
    # chart = converter(img).unsqueeze(0).to(device)
    chart = converter(img).unsqueeze(0)
    return chart

def plot_durations(last_chart, curr_chart):
    plt.figure()
    # plt.subplot(1,2,1)
    img = plt.imshow(last_chart.cpu().squeeze(0).permute(1, 2, 0).numpy(), interpolation='none')
    plt.title('Example extracted screen')
    plt.figure(2)
    # plt.subplot(1,2,2)
    plt.clf()
    durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(durations_t.numpy())
    # plt.show()
    # Take 100 episode averages and plot them too
    if len(durations_t) >= 100:
        means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
        means = torch.cat((torch.zeros(99), means))
        plt.plot(means.numpy())

    img.set_data(curr_chart.cpu().squeeze(0).permute(1, 2, 0).numpy())
    plt.pause(0.01)  # pause a bit so that plots are updated
    display.clear_output(wait=True)
    display.display(plt.gcf())
            
def main():
    if path.exists("pt/train_cnnrnn2_{}.pt".format(device_str)):
        train_net.load_state_dict(torch.load("pt/train_cnnrnn2_{}.pt".format(device_str)))
    train_net.train()
        
    for _ in range(10):    
        df = market.get_data()
        # df = df.head(WINDOW_SIZE)

        for epoch in range(max_episode):
            account = Account(df, 50000000)
            account.reset()
            for idx,_ in enumerate(df.index, start=data_size):
                try:
                    since = time.time()
                    curr_chart = get_chart(market, idx, data_size)
                    reward = 0
                    num_action = select_action(df, idx)
                    reward, real_action = account.exec_action(num_action, idx)
                    print("idx:%d==>action:%d, price:%.2f"%(idx, num_action, df.loc[idx, 'close']))
                    reward = torch.tensor([reward], device=device)
                    action = torch.tensor([[num_action]], device=device, dtype=torch.int64)
                                    
                    memory.push(curr_chart, action, reward)
                    while len(memory) >= BATCH_SIZE:
                        optimizer.zero_grad()
                        epsode = memory.pop(BATCH_SIZE)
                        batch = Experience(*zip(*epsode))
                        state_batch = torch.cat(batch.state)
                        action_batch = torch.cat(batch.action)
                        state_action_values = train_net(state_batch).gather(1, action_batch)
                        # optimizer = optim.RMSprop(train_net.parameters(), 0.01)
                        
                        criterion = nn.SmoothL1Loss()
                        loss = criterion(state_action_values, action_batch)

                        # Optimize the model
                        # optimizer.zero_grad()
                        loss.backward()
                        for param in train_net.parameters():
                            param.grad.data.clamp_(-1, 1)
                        optimizer.step()
                        if loss is not None:
                            print("epoch[%d:%d] epsode is next loss[%.10f]" % (epoch, idx, loss.item()))

                    if idx % TARGET_UPDATE == 0:
                        torch.save(train_net.state_dict(),"pt/train_cnnrnn2_{}.pt".format(device_str))
                    
                    spend = time.time() - since
                    print("idx:%d price [%.4f] unit[%.4f] used time[%.2f] agent rate:%.05f remind money:%.02f" 
                            % (idx, df.loc[idx, 'close'], account.unit, spend, account.rate, account.balance + account.unit * df.loc[idx, 'close']))
                    if account.is_bankrupt():
                        break
                    if idx == df.index.max():
                        break
                except Exception as ex:
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                    print("`recall_training -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
            scheduler.step()

        print("end training DQN")
    
    print('Complete Training')

if __name__ == "__main__":
    main()

위의 3가지 모델을 학습하기 위한 학습 파일인 main_cnnrnn2.py 입니다. import 부분만 변경하여 훈련이 가능 할 것으로 예상됩니다.

대표이미지 출처:https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr

Long Short-Term Memory (LSTM) 이해하기

이 글은 Christopher Olah가 2015년 8월에 쓴 글을 우리 말로 번역한 것이다. Recurrent neural network의 개념을 쉽게 설명했고, 그 중 획기적인 모델인 LSTM을 이론적으로 이해할 수 있도록 좋은 그림과 함께

dgkim5360.tistory.com

LIST

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(11) - ResNet + RNN 적용 모델 (1)	2022.12.25
강화학습을 이용한 비트코인 매매프로그램(7)-Back Test (1)	2022.12.24
강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델 (0)	2022.12.20
강화학습을 이용한 비트코인 매매프로그램(8) - wsl이용하기 (3)	2022.12.12
강화학습을 이용한 비트코인 매매프로그램(6)-Buroto Force학습 (17)	2022.12.04

강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델

아빠는 벌레잡이 2022. 12. 20. 07:37

2022. 12. 20. 07:37

현재 강화학습 모델은 과거 데이터로 히스토리를 만든 다음 주가차트와 보조지표가 다 있는 이미지로 env의 상태 tensor를 만들어 프로세싱한 모델이 강제로 3가지로 라벨링을 하기 때문에 비슷한 결과에도 어떤때는 보유 또는 관망이 되고 어떤때는 매매가 되므로 optimizer가 하이퍼 파라메트를 결정하기가 쉽지 않게 됩니다. 그래서 loss가 충분히 소멸하지 않고 계속 널뛰기를 하는 현상이 있습니다. DQN 내부의 CNN의 outputs값은 분류 총 값이지만 실제 비지도 학습으로 분류된 값은 약 500~800개 정도로 넓은 범위를 가지고 있습니다. 이것을 최종 nn.Linear함수를 거치면서 모든 값이 소멸하고 3가지로 축소 됩니다. 만약에 차트가 조금 상승할때 팔고 조금하락할때 사고를 반복하면 수익률은 점점 가파르게 감소하여 잔고가 0이 되게 됩니다. 실제 히스토리 함수가 매매를 자주 하지 않는 이유는 작은 변동에 매매가 발생할 경우 수익도 크지 않을뿐더러 낙폭이 크지는 경우 큰 손실이 발생한 다음은 손실을 복구하지 못하는 상황이 발생하게 됩니다. 그래서 매매를 자주 하지 않게 설계되어 있습니다. 이 부분은 실제로는 수학적 손실,손익과는 논리적으로 맞지 않기 때문에 이상적인 목표를 가진 CNN의 입장에서는 모두 loss로 인식이 되게 됩니다. 이 문제를 극복하기 위해서는 CNN자체를 upgrade하는 방법과 CNN과 RNN을 병합하는 방법을 연구해야 합니다. 저도 어느게 맞다고 말할 수 없습니다. 아직도 강화학습은 연구 대상이지 실용단계로 보기는 힘듭니다. RNN이란 말을 들어시고 뭔가 이상함을 느꼈다면 제가 앞에서 이야기한 인간이 개입한 데이터는 시계열 데이터로 보기 힘들다는 이야기 때문일 겁니다. 그러나 RNN은 주가분석같은 시계열 데이터 분석뿐만 아니라 자연어 학습에도 사용됩니다. 우리가 연구 중인 주식이나 코인의 매매 차트를 분류하면 조금 전에 말씀드린것처럼 약 800개의 패턴이 발생합니다. 그 이상을 넘어 가지는 않습니다. RNN 자연어 분석 예제를 보시면 입력값이 입력 글짜 수(여기에 차트의 비지도 학습 분류 패턴 수를 입력),hidden layer 갯수, 출력 글짜 수(여기에 매매프로그램이 사용할 행동갯수를 입력) 입니다. 이 부분을 잘 응용하면 CNN의 출력수(비지도 분류값 약 800개)를 RNN 의 입력값으로 hidden값은 init_hidden함수로 자동 계산되고 출력값은 우리가 구하려는 3가지(매도,매수,관망)일겁니다. 여기에 임베딩(GRU를 사용할 경우 encoding을 사용하여 자연어 분석시 학습률을 높일 수 있음)을 더하면 조금 더 복잡해지겠지만 단순히 차트만으로 분류하는게 아닌 순서적 흐름을 이용한 분류가 가능해 질것 입니다. 대부분의 값은 관망값을 가져야 합니다. 그렇다면 히든값은 무엇을 의미할까요? RNN에 들어갈 첫번째 값이 있다면 이 값은 시간적인 값t0의 값을 가질겁니다. 히든값은 그 다음에 올 수 있는 수 많은 가능성의 t1값의 tensor일 것입니다. 두 값의 연산에 의해서 t1이 결정되면 hidden은 가능성의값 t2의 텐서가 될것이고 그로 인해 실제 t2가 결정될것 입니다. 학습이 진행되면 정확도는 증가할 것으로 예상됩니다.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNRNN(nn.Module):
    def __init__(self, device, h, w, outputs, hdnsize):
        super(CNNRNN, self).__init__()
        self.device = device
        self.hidden_size = hdnsize
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, hdnsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(hdnsize)
        self.conv3 = nn.Conv2d(hdnsize, hdnsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(hdnsize)
        
        self.i2h = nn.Linear(54 * hdnsize, hdnsize)
        self.h2h = nn.Linear(hdnsize, hdnsize)
        self.i2o = nn.Linear(hdnsize, outputs)
        self.act_fn = nn.Tanh()

    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)
    
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        hidden = self.init_hidden().to(self.device)
        x= self.i2h(x.view(x.size(0), -1))
        hidden = self.h2h(hidden)
        hidden = self.act_fn(x + hidden)
        return self.i2o(hidden)

cnnrnn.py 입니다.

import random
from collections import deque, namedtuple
from IPython.display import display, Math
from Account import Account
import math
from itertools import count
import os

import sys
import os.path as path
import plotly as plt
import torch
import torch.nn as nn
import torch.optim as optim
from Market import Market
import torchvision
import time
from cnnrnn import CNNRNN
from memory import ReplayMemory, Experience

action_kind = 3
max_episode = 5000
screen_height = 100
screen_width  = 140
data_size = 250
visit_cnt = [0] * action_kind
# replay_buffer = deque()
epsilon = 0.3
dis = 0.9

BATCH_SIZE = 8
# GAMMA = 0.999
GAMMA = 0.99
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200
TARGET_UPDATE = 10
steps_done = 0
loss = any
WINDOW_START = 0
WINDOW_SIZE  = 1500

# for i in range(torch.cuda.device_count()):
#     print(torch.cuda.get_device_name(i))
device_str = "cuda"
device = torch.device(device_str)
    
memory = ReplayMemory(BATCH_SIZE)
converter = torchvision.transforms.ToTensor()
market = Market()
train_net = CNNRNN(device, screen_height, screen_width, action_kind, 32).to(device)
train_net = nn.DataParallel(train_net, device_ids=[0,1]).to(device)

episode_durations = []
    
optimizer = optim.RMSprop(train_net.parameters())

def optimize_action(memory):
    if len(memory) < BATCH_SIZE:
        return None
    epsode = memory.pop(BATCH_SIZE)
    batch = Experience(*zip(*epsode))
    state_batch = torch.cat(batch.state)
    action_batch = torch.cat(batch.action)
    train_net.train()
    state_action_values = train_net(state_batch).gather(1, action_batch)
    
    criterion = nn.SmoothL1Loss()
    loss = criterion(state_action_values, action_batch)

    # Optimize the model
    optimizer.zero_grad()
    loss.backward()
    for param in train_net.parameters():
        param.grad.data.clamp_(-1, 1)
    optimizer.step()
    return loss

def select_action(df, idx):
    try:
        global steps_done
        sample = random.random()
        eps_threshold = EPS_END + (EPS_START - EPS_END) * math.exp(-1. * steps_done / EPS_DECAY)
        steps_done += 1
        if sample > eps_threshold:
            if df.loc[idx, "closemax"] == df.loc[idx, "close"]:
                action = 2
                return  action
            elif df.loc[idx, "closemin"] == df.loc[idx, "close"]:
                action = 1
                return  action
            else:
                return 0
        else:
            action = random.randrange(action_kind)
            return  action
    except Exception as ex:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
        print("`select_action -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
        return 0
        
def get_chart(market, idx, max_data):
    img = market.get_chart(idx, max_data=max_data)
    if img is None:
        return None
    # img = Image.fromarray(np.uint8(cm.gist_earth(plt.io.to_image(fig, format='png')*255)))
    # im = Image.fromarray(img, bytes=True)
    # im = Image.fromarray(np.uint8(cm.gist_earth(img))/255)
    # im = Image.fromarray(np.uint8(img)/255)
    # img = img.resize((700, 500), resample=Image.BICUBIC)
    # img = Image.fromarray(cm.gist_earth(plt.io.to_image(fig, format='png'), bytes=True))
    # display(img)
    chart = converter(img).unsqueeze(0).to(device)
    return chart

def plot_durations(last_chart, curr_chart):
    plt.figure()
    # plt.subplot(1,2,1)
    img = plt.imshow(last_chart.cpu().squeeze(0).permute(1, 2, 0).numpy(), interpolation='none')
    plt.title('Example extracted screen')
    plt.figure(2)
    # plt.subplot(1,2,2)
    plt.clf()
    durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(durations_t.numpy())
    # plt.show()
    # Take 100 episode averages and plot them too
    if len(durations_t) >= 100:
        means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
        means = torch.cat((torch.zeros(99), means))
        plt.plot(means.numpy())

    img.set_data(curr_chart.cpu().squeeze(0).permute(1, 2, 0).numpy())
    plt.pause(0.01)  # pause a bit so that plots are updated
    display.clear_output(wait=True)
    display.display(plt.gcf())
            
def main():
    if path.exists("pt/train_cnnrnn_{}.pt".format(device_str)):
        train_net.load_state_dict(torch.load("pt/train_cnnrnn_{}.pt".format(device_str)))
        
    for _ in range(10):    
        df = market.get_data()
        # df = df.head(WINDOW_SIZE)

        for epoch in range(max_episode):
            account = Account(df, 50000000)
            account.reset()

            last_chart = get_chart(market, data_size, data_size)
            account.reset()

            for idx,_ in enumerate(df.index, start=(data_size + 1)):
                try:
                    since = time.time()
                    curr_chart = get_chart(market, idx, data_size)
                    if curr_chart is not None:
                        state = curr_chart - last_chart
                    else:
                        continue
                    last_chart = curr_chart
                    reward = 0
                    num_action = select_action(df, idx)
                    reward, real_action = account.exec_action(num_action, idx)
                    print("idx:%d==>action:%d, price:%.2f"%(idx, num_action, df.loc[idx, 'close']))
                    reward = torch.tensor([reward], device=device)
                    action = torch.tensor([[num_action]], device=device, dtype=torch.int64)
                                    
                    memory.push(state, action, reward)
                    while len(memory) >= BATCH_SIZE:
                        loss = optimize_action(memory)
                        if loss is not None:
                            print("epoch[%d:%d] epsode is next loss[%.10f]" % (epoch, idx, loss.item()))

                    if idx % TARGET_UPDATE == 0:
                        torch.save(train_net.state_dict(),"pt/train_cnnrnn_{}.pt".format(device_str))
                    
                    spend = time.time() - since
                    print("idx:%d price [%.4f] unit[%.4f] used time[%.2f] agent rate:%.05f remind money:%.02f" 
                            % (idx, df.loc[idx, 'close'], account.unit, spend, account.rate, account.balance + account.unit * df.loc[idx, 'close']))
                    if account.is_bankrupt():
                        break
                    if idx == df.index.max():
                        break
                except Exception as ex:
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                    print("`recall_training -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))

        print("end training DQN")
    
    print('Complete Training')

if __name__ == "__main__":
    main()

main_cnnrnn.py 입니다.
CNN 단독 모델과 CNN + RNN 모델의 학습 증가율을 비교해 보시기 바랍니다.

대표이미지출처:https://blog.kakaocdn.net/dna/IFp6A/btqASygMYCA/AAAAAAAAAAAAAAAAAAAAAHPRKM8z8_i1rJvLfrqQ8vMKOITsjcC7Q0zi2JxkpFQQ/img.png?credential=yqXZFxpELC7KVnFOS48ylbz2pIh7yKj8&expires=1759244399&allow_ip=&allow_referer=&signature=mFEIH8jDzqhtbOVg9uF%2FFSmH2NY%3D
RNN 소스코드 출처 : 책 파이토치 첫걸음에서 발췌 및 응용

LIST

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(7)-Back Test (1)	2022.12.24
강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장 (0)	2022.12.21
강화학습을 이용한 비트코인 매매프로그램(8) - wsl이용하기 (3)	2022.12.12
강화학습을 이용한 비트코인 매매프로그램(6)-Buroto Force학습 (17)	2022.12.04
강화학습을 이용한 비트코인 매매프로그램(5)-데이터에 Min/Max 추가하기 (0)	2022.11.30

강화학습을 이용한 비트코인 매매프로그램(6)-Buroto Force학습

아빠는 벌레잡이 2022. 12. 4. 23:34

2022. 12. 4. 23:34

import random
from collections import deque, namedtuple
from typing import Dict, List, NamedTuple, Optional, Text, Tuple, Union
# from IPython.display import display, Math
from Account import Account
import math
from itertools import count
import os

import sys
import os.path as path
import numpy as np
import plotly as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as T
from Market import Market
from PIL import Image
from plotly import express as px
import torchvision
import time
import joblib
import pandas as pd
from dqn import DQN
from memory import ReplayMemory, Experience

action_kind = 21
max_episode = 5000
screen_height = 100
screen_width  = 140
data_size = 600
epsilon = 0.3
dis = 0.9

BATCH_SIZE = 8
# GAMMA = 0.999
GAMMA = 0.99
EPS_START = 0.9
EPS_END = 0.05
EPS_DECAY = 200
TARGET_UPDATE = 10
steps_done = 0
loss = any
WINDOW_START = 0
WINDOW_SIZE  = 1500

# for i in range(torch.cuda.device_count()):
#     print(torch.cuda.get_device_name(i))

# if gpu is to be used
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("cuda")
# device_str = "cpu"
device_str = "cuda"
device = torch.device(device_str)
    
memory = ReplayMemory(BATCH_SIZE)
converter = torchvision.transforms.ToTensor()
market = Market()
train_net = DQN(device, screen_height, screen_width, action_kind, 32).to(device)
train_net = nn.DataParallel(train_net, device_ids=[0,1]).to(device)

episode_durations = []
    
optimizer = optim.RMSprop(train_net.parameters(), lr=0.000001)

# def select_action(df, idx):
#     try:
#         global steps_done
#         sample = random.random()
#         eps_threshold = EPS_END + (EPS_START - EPS_END) * math.exp(-1. * steps_done / EPS_DECAY)
#         steps_done += 1
#         clslvl = ((df.loc[idx, "close"] - df.loc[idx, "closemin"]) * 40 // (df.loc[idx, "closemax"] - df.loc[idx, "closemin"]))
#         if sample > eps_threshold:
#             if df.loc[idx, "closemax"] == df.loc[idx, "close"]:
#                 action = clslvl
#                 return  action
#             elif df.loc[idx, "closemin"] == df.loc[idx, "close"]:
#                 action = clslvl
#                 return  action
#             else:
#                 return clslvl
#         else:
#             action = random.randrange(action_kind)
#             return  action
#     except Exception as ex:
#         exc_type, exc_obj, exc_tb = sys.exc_info()
#         fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
#         print("`select_action -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))
#         return 0

def select_action(df, idx):
    return ((df.loc[idx, "close"] - df.loc[idx, "closemin"]) * (action_kind-1) // (df.loc[idx, "closemax"] - df.loc[idx, "closemin"]))
        
def get_chart(market, idx, max_data):
    img = market.get_chart(idx, max_data=max_data)
    if img is None:
        return None
    # if  idx > (df.index.max() - data_size - 1):
    #     return None
    # img = Image.fromarray(np.uint8(cm.gist_earth(plt.io.to_image(fig, format='png')*255)))
    # im = Image.fromarray(img, bytes=True)
    # im = Image.fromarray(np.uint8(cm.gist_earth(img))/255)
    # im = Image.fromarray(np.uint8(img)/255)
    # img = img.resize((700, 500), resample=Image.BICUBIC)
    # display(img)
    # img = Image.fromarray(cm.gist_earth(plt.io.to_image(fig, format='png'), bytes=True))
    chart = converter(img).unsqueeze(0)
    return chart

def plot_durations(last_chart, curr_chart):
    plt.figure()
    # plt.subplot(1,2,1)
    img = plt.imshow(last_chart.cpu().squeeze(0).permute(1, 2, 0).numpy(), interpolation='none')
    plt.title('Example extracted screen')
    plt.figure(2)
    # plt.subplot(1,2,2)
    plt.clf()
    durations_t = torch.tensor(episode_durations, dtype=torch.float)
    plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Duration')
    plt.plot(durations_t.numpy())
    # plt.show()
    # Take 100 episode averages and plot them too
    if len(durations_t) >= 100:
        means = durations_t.unfold(0, 100, 1).mean(1).view(-1)
        means = torch.cat((torch.zeros(99), means))
        plt.plot(means.numpy())

    img.set_data(curr_chart.cpu().squeeze(0).permute(1, 2, 0).numpy())
    plt.pause(0.01)  # pause a bit so that plots are updated
    # display.clear_output(wait=True)
    # display.display(plt.gcf())
            
def main():
    if path.exists("pt/train_net_{}.pt".format(device_str)):
        train_net.load_state_dict(torch.load("pt/train_net_{}.pt".format(device_str)))
        
    for _ in range(10):    
        df = market.get_data()
        # df = df.head(WINDOW_SIZE)

        for epoch in range(max_episode):
            account = Account(df, 50000000)
            account.reset()

            last_chart = get_chart(market, data_size, data_size)
            account.reset()

            for idx,_ in enumerate(df.index, start=(data_size + 1)):
                try:
                    since = time.time()
                    curr_chart = get_chart(market, idx, data_size)
                    if curr_chart is None or last_chart is None:
                        continue
                    else:
                        state = curr_chart - last_chart
                        
                    last_chart = curr_chart
                    reward = 0
                    num_action = select_action(df, idx)
                    reward, real_action = account.exec_action(2 if num_action == (action_kind - 1) else (1 if num_action == 0 else 0) , idx)
                    print("idx:%d==>action:%d,real_action==>%d price:%.2f"%(idx, num_action, real_action, df.loc[idx, 'close']))
                    # reward = torch.tensor([reward], device=device)
                    num_action = torch.tensor([[num_action]], device=device, dtype=torch.int64)
                                    
                    memory.push(curr_chart, num_action, reward)
                    if len(memory) >= BATCH_SIZE:
                        epsode = memory.pop(BATCH_SIZE)
                        batch = Experience(*zip(*epsode))
                        # non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
                        #                                       batch.next_state)), device=device, dtype=torch.bool)
                        # non_final_next_states = torch.cat([s for s in batch.next_state
                        #                                             if s is not None])
                        state_batch = torch.cat(batch.state)
                        action_batch = torch.cat(batch.action)
                        # reward_batch = torch.cat(batch.reward)
                        train_net.train()
                        state_action_values = train_net(state_batch).gather(1, action_batch)
                        # next_state_values = torch.zeros(BATCH_SIZE, device=device)
                        # next_state_values[non_final_mask] = train_net(non_final_next_states).max(1)[0].detach()
                        # expected_state_action_values = (next_state_values * GAMMA) + reward_batch
                        # print(action_batch)
                        # print(state_action_values)
                        optimizer.zero_grad()
                        criterion = nn.SmoothL1Loss()
                        loss = criterion(state_action_values, action_batch)
                        # print(loss)
                        # loss = criterion(state_action_values, action_batch.unsqueeze(1))
                        # loss = criterion(market_status, train_status)

                        # Optimize the model
                        loss.backward()
                        for param in train_net.parameters():
                            param.grad.data.clamp_(-1, 1)
                        optimizer.step()
                        print("epoch[%d:%d] epsode is next loss[%.10f]" % (epoch, idx, loss.item()))

                    if idx % TARGET_UPDATE == 0:
                        torch.save(train_net.state_dict(),"pt/train_net_{}.pt".format(device_str))
                    
                    spend = time.time() - since
                    print("idx:%d price [%.4f] unit[%.4f] used time[%.2f] agent rate:%.05f remind money:%.02f" 
                            % (idx, df.loc[idx, 'close'], account.unit, spend, account.rate, account.balance + account.unit * df.loc[idx, 'close']))
                    if account.is_bankrupt():
                        break
                    if idx == df.index.max():
                        break
                        # train_net.load_state_dict(train_net.state_dict())
                    # save_file_model(degreeQ)                
                except Exception as ex:
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                    print("`recall_training -> exception! %s : %s %d" % (str(ex) , fname, exc_tb.tb_lineno))

        print("end training DQN")
    
    print('Complete Training')

if __name__ == "__main__":
    main()

main_buroto.py 파일입니다.

import random
from collections import deque, namedtuple

Experience = namedtuple(
    'Experience',
    ('state', 'action', 'reward')
)

class ReplayMemory(object):
    def __init__(self, capacity):
        self.memory = deque([],maxlen=capacity)

    def push(self, *args):
        """Save a transition"""
        self.memory.append(Experience(*args))

    def pop(self, batch_size):
        arr = []
        for _ in range(batch_size):
            arr.append(self.memory.popleft())
        return arr
    
    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

memory.py 부분을 분리했습니다.

import torch.nn as nn
import torch.nn.functional as F

class DQN(nn.Module):
    def __init__(self, device, h, w, outputs, qsize):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(4, h*w, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(h*w)
        self.conv2 = nn.Conv2d(h*w, qsize, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(qsize)
        self.device = device
        self.conv3 = nn.Conv2d(qsize, qsize, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(qsize)

        # Number of Linear input connections depends on output of conv2d layers
        # and therefore the input image size, so compute it.
        def conv2d_size_out(size, kernel_size = 5, stride = 2):
            return (size - (kernel_size - 1) - 1) // stride  + 1
        convw = conv2d_size_out(conv2d_size_out(conv2d_size_out(w)))
        convh = conv2d_size_out(conv2d_size_out(conv2d_size_out(h)))
        # print("convw[%d]  convh[%d]" % (convw, convh))
        linear_input_size = 54 * qsize
        self.head = nn.Linear(linear_input_size, outputs)

    # Called with either one element to determine next action, or a batch
    # during optimization. Returns tensor([[left0exp,right0exp]...]).
    def forward(self, x):
        x = x.to(self.device)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return self.head(x.view(x.size(0), -1))

dqn.py를 분리했습니다.

main함수부터 설명 드리면 가장 이상적인 매매 프로세스인 저점에서 매수하고 고점에서 매도하는 방식을 rolling함수를 이용하여 구하였습니다.
그리고 이것을 전체 데이터에 적용을 하여 history에 저장합니다.
매매 길이에 따라서 수천프로이상의 수익이 발생할 것입니다.
이것은 말 그대로 이상적인 매매 프로세스 입니다.
백프로 매치되는 것은 아니지만 이상적인 grid world의 마르코프 실행 프로세서(MDP) 처럼은 우리도 이것을 MDP로 가정합니다.
이제 실행된 결과를 DQN(CNN)에 학습합니다.
optimize함수가 그 역할을 합니다.
loss함수는 nn.SmoothL1Loss를 사용하여 해당 S의 action a와 기대 이상 행동 a'를 구하여 둘의 차이를 구하여 계산합니다.
아쉽지만 windows에서는 NCCL이 적용되지 않아 pytorch함수에서 에러가 발생합니다.
NCCL은 nvidia의 GPU를 서로 연동하여 최상의 성능을 내는 Api lib입니다.
현재 ubuntu와 red hat에서만 지원하고 있습니다.
그래서 지금 저는 WSL2를 이용한 ubuntu LTS(최종안정화버젼)인 20.04.5버젼으로 이전을 하려고 하는데 windows10과 wsl2의 병행 운영은 엄청난 cpu 제원을 요구 하네요.
어쩔 수 없이 서버를 ubuntu로 다시 설치하고 노트북으로 접속하여 이용하는 방법을 실행하려고 하고 있습니다.
불행히도 제 노트북이 현재 아작이 난 상태라 알리에서 검색해서 LCD판넬을 주문한 상태입니다만 가장 빠른 날짜가 12월 15일 도착이라고 하네요.
Nvidia 홈페이지 상으로는 현재로써는 windows에 NCCL이 적용이 안되어서 여러장의 GPU카드가 있어도 한장의 효과만 있다는 거죠.
현재 소스에서 보시면 nn.DataParallel이 DQN을 감싸고 있습니다.
실제 성능탭으로 보면 그래도 nn.DataParallel을 적용하기 전에는 한개의 GPU만 사용하게 작동이 되었으나 지금은 어느 정도 반반씩 부담하여 사용하는게 보입니다.
중요한 점은

recall_history()함수에서 저장된 데이터를 recall_training 함수에서 DQN을 학습하게 됩니다. 학습된 결과는 pt/training_net.pt파일로 저장을 하게 됩니다. 저의 경우는 데이터가 너무 많은 관계로 과적합 상태를 보이나 로스차이가 그렇게 크지는 않습니다. 적당한 데이터의 크기는 5000~10000개 정도의 데이터를 반복 학습하면서 loss를 최소화 한다음 다시 데이터를 샘플링하고 적용하고 하는 반복 작업을 통하여 실제 DQN이 적은 loss로 동작하는지가 중요합니다.

이 작업이 어느 정도 끝나면 다음은 main_agent.py 파일을 작성하여 실제 상황에서 detection을 발생하여 거래를 하게 하여 어느 정도의 수익률이 나오는지 분석을 해 보도록 하겠습니다.
저도 처음에는 강화학습이 데이터가 많이 적용되면 더욱 딕텍션이 안정될 거라 생각했지만 실제로는 데이터가 많을 수록 ai는 거래를 하지 않으려하네요.
프로그램을 조금 수정해서 pt파일을 여러 버젼으로 저장하는게 어떨까 제안 합니다.
다음 차에서 agent파일을 돌려 보면 알겠지만 DQN은 가장 안정화된 방법을 선택하려 합니다.
즉 매매를 하지 않아 수익도 손실도 없는 상태를 만들려고 합니다.
그렇지만 어떤 경우 훈련이 잘 진행되어 100%이상의 수익을 내기도 합니다.
아무런 if문 없이 수익을 낸다는 게 신기합니다만 retro.gym을 경험해 보신분들은 아실 겁니다.
사실 retro.gym을 훈련하는 과정에도 인간의 생각 판단 따위는 중요하지 않습니다.
훈련은 우연의 연속입니다.
다만 그 우연들 중에서 가장 스코어가 높은 쪽으로만 AI는 움직이려 합니다.
그렇게 하는것은 인간의 계산에서 epsilon값을 어떻게 판단하는냐에 따라 결과가 달라 질 겁니다.
epsilon을 0.5 정도로 하면 절반은 모험값으로 채워질 겁니다.
그리고 결과를 수익률 또는 남은 돈으로 정할 때 그 값이 최고가 되는 모델들만 history에 저장하고 학습을 돌린다면 엔진은 공격적인 매매를 해서 높은 수익률을 올릴 것입니다.
다음 편 실 agent 시뮬레이션 편을 기대 해 주세요.

LIST

저작자표시 비영리 (새창열림)

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델 (0)	2022.12.20
강화학습을 이용한 비트코인 매매프로그램(8) - wsl이용하기 (3)	2022.12.12
강화학습을 이용한 비트코인 매매프로그램(5)-데이터에 Min/Max 추가하기 (0)	2022.11.30
강화학습을 이용한 비트코인 매매프로그램(4)-LSTM의 주가분석 문제점 (1)	2022.11.30
강화학습을 이용한 비트코인 매매프로그램(3) - 데이터로 차트 그리기 (1)	2022.11.28

PREV 이전 1 NEXT 다음

오늘도 아빠는 벌레잡는 중

dqn

강화학습을 이용한 비트코인 매매프로그램(13) - 강화학습 최적화

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(12) - 실거래 적용

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(7)-Back Test

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(10) - CNN+RNN 모델 확장

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(9) - CNN+RNN 모델

'python > 자동매매 프로그램' 카테고리의 다른 글

강화학습을 이용한 비트코인 매매프로그램(6)-Buroto Force학습

'python > 자동매매 프로그램' 카테고리의 다른 글

+ Recent posts

티스토리툴바