Learning AlphaGo Through Tic-Tac-Toe: tensorflow-1 Edition, Match Play

1/20/2023

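1. Overview

I am studying tic-tac-toe as part of learning how AlphaGo works. I think it is an important step toward understanding the basic ideas, and I am also drawing on material available online. First, I wrote a class that encodes the rules of tic-tac-toe and made it possible to play games with it. This post is the third installment of the walkthrough that uses deep learning.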

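2. Details

(1) Overview

The tic-tac-toe board is treated as a 3x3 image, and the approach used for handwritten character recognition is applied to it. The environment is tensorflow-1.15. The input data comes from the move sequences explored with the minimax method (9! = 362880 orderings): for each sequence, the board state at the moment the game is decided and the result (win, loss, or draw) are used.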
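As a rough check on the size of that data set, the sketch below counts games directly. It is not the author's minimax code: the board encoding (0 for empty, 1 and 2 for the two players) and the names WIN_LINES, winner, and count_games are assumptions made for illustration. 9! counts every ordering of the nine cells, whereas stopping each sequence at the moment the game is decided leaves fewer distinct finished games.

```python
import math

# Assumed encoding for this sketch: 0 = empty, 1 = first player, 2 = second player.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or 2 if that player has completed a line, otherwise 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def count_games(board, player, counts):
    """Play out every move sequence, stopping as soon as the game is decided."""
    for cell in range(9):
        if board[cell] != 0:
            continue
        board[cell] = player
        won = winner(board)
        if won == 1:
            counts["first_wins"] += 1
        elif won == 2:
            counts["second_wins"] += 1
        elif 0 not in board:
            counts["draws"] += 1
        else:
            count_games(board, 3 - player, counts)
        board[cell] = 0  # undo the move before trying the next cell

counts = {"first_wins": 0, "second_wins": 0, "draws": 0}
count_games([0] * 9, 1, counts)
total_games = sum(counts.values())

print(math.factorial(9))    # orderings of the nine cells: 362880
print(total_games, counts)  # finished games when play stops at a decision
```

Each finished game contributes one (board, result) pair, so the three counters also give a rough idea of how the result classes are balanced.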

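The data is converted into a format TensorFlow can use, a model is trained on it, and the model is then used to play tic-tac-toe. Because the source data comes from minimax analysis, the best possible outcome is for the TensorFlow model to reach the same level of play as the minimax method. The rough procedure is as follows.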

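(1) Create the training input data from the minimax analysis
(2) Build a model with TensorFlow from the training input data
(3) Play actual games using the TensorFlow model

These steps are described over three posts, and this is the third.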
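The conversion in step (2) can be sketched as follows. The exact layout of the training data is not shown in this post, so the encoding here is an assumption: nine cell values as floats, plus a one-hot label in the order "result 1, result -1, draw", which appears consistent with how the play script indexes the model output. The names encode_sample and RESULT_TO_CLASS are hypothetical.

```python
# Assumed mapping from the game result (1, -1, 0) to a class index.
RESULT_TO_CLASS = {1: 0, -1: 1, 0: 2}

def encode_sample(fields, result):
    """Turn one finished board and its result into (features, one-hot label)."""
    features = [float(v) for v in fields]
    label = [0.0, 0.0, 0.0]
    label[RESULT_TO_CLASS[result]] = 1.0
    return features, label

# Hypothetical finished board: 1 and 2 mark the two players, 0 is empty.
features, label = encode_sample([1, 1, 1, 2, 2, 0, 0, 0, 0], 1)
print(features)
print(label)
```

In the play script, the board is converted in a similar way with np.array(...).astype(np.float32) before it is passed to model.predict.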

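(2) Details

(3) Playing actual games using the TensorFlow model
Create ttttensorflow.py. It runs in the tensorflow-1.15 environment. The tictactoe.py it imports is the montecarlo version.

The trained model (dlmodel.h5) is loaded. At first I used only the model's output, but, as with the minimax method, it could not recognize a reach (a line that is one move away from completion). For that reason, is_reach(), which was used in the alphabeta version, is used here as well. My impression is that it plays at roughly the same level as the minimax method.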

from tictactoe import Tictactoe
import random

import tensorflow as tf
from tensorflow.keras.models import load_model
import numpy as np

def random_select(actions):
    index = random.randint(0, len(actions) - 1)
    return actions[index]

def input_select(actions):
    while True:
        print(actions)
        action = int(input('select actions='))
        if action in actions:
            break
        else:
            print('input again')
    return action

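def tensorflow_select(actions):
    # Note: the model is reloaded from disk on every call.
    model = load_model('dlmodel.h5')
    # An odd number of remaining moves means the first player is to move.
    if (len(actions) % 2) == 1:
        flg = 1
    else:
        flg = 2
    result = []
    for action in actions:
        # If there is a reach, play that move immediately.
        reach = obj.is_reach()
        if reach is not None:
            print("reach action ", reach)
            return reach
        # Try the move, score the resulting board with the model, then undo it.
        score = obj.do_game(action)
        f1 = [obj.fields]
        a1 = np.array(f1)
        a2 = a1.astype(np.float32)
        predictions = model.predict(a2)
        l1 = predictions.tolist()
        l1[0].append(action)
        result.append(l1[0])
        obj.undo_game(action)

    # Each item is the model output followed by the action at index 3.
    # Pick the action with the highest score for the side to move.
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg-1]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[3]

    return maxaction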

def montecarlo_select(actions):
    if (len(actions) % 2) == 1:
        flg = 1
    else:
        flg = 2
    result = []
    for action in actions:
        reach = obj.is_reach()
        if reach is not None:
            print("reach action ", reach)
            return reach
        score = obj.do_game(action)
        # init = [action, count of score 1, count of score -1, count of score 0]
        init = [action,0,0,0]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)

    print(result)

    maxvalue = -1
    maxaction = None
    maxlist = []
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
            maxlist = [item[0]]
        elif value == maxvalue:
            maxlist.append(item[0])
    print('maxlist ', maxlist)
    # Break ties between equally scored actions at random.
    if len(maxlist) != 1:
        maxaction = maxlist[random.randint(0, len(maxlist) - 1)]
    print('maxaction ', maxaction)
    return maxaction

def alphabeta_select(actions):
    if (len(actions) % 2) == 1:
        flg = 1
    else:
        flg = 2
    result = []
    for action in actions:
        reach = obj.is_reach()
        if reach is not None:
            print("reach action ", reach)
            return reach
        score = obj.do_game(action)
        init = [action,0,0,0]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)

    print(result)

    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]

    return maxaction

def minimax_select(actions):
    if (len(actions) % 2) == 1:
        flg = 1
    else:
        flg = 2
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)

    print(result)

    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]

    return maxaction

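def minimax(actions, result):
    # Play out every continuation and tally the outcomes in result:
    # result[1] counts score 1, result[2] counts score -1, result[3] counts draws.
    for action in actions:
        score = obj.do_game(action)
        if score == 1:
            result[1] += 1
        elif score == -1:
            result[2] += 1
        elif score == 0:
            result[3] += 1
        else:
            minimax(obj.next_action(), result)
        obj.undo_game(action)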

if __name__ == "__main__":

    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]

    for i in range(9):

        # The model plays on "my turn"; the opponent plays at random.
        if obj.myturn == True:
            print('my turn')
            action = tensorflow_select(actions)
        else:
            print('other turn')
            action = random_select(actions)

        print(actions)
        print("select", action)
        result = obj.do_game(action)
        print(obj.game_state())

        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break

        actions = obj.next_action()
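
The selection logic in tensorflow_select() can be illustrated without TensorFlow. In the sketch below, find_reach stands in for is_reach(), whose implementation is not shown in this post, and predict_stub stands in for model.predict. Both are hypothetical, as are select_move and the board encoding (0 for empty, 1 and 2 for the players). The idea mirrors the listing: take a reach move if one exists, otherwise try each move and keep the one with the highest predicted score for the side to move.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def find_reach(board, player):
    """Return a cell that wins for player, else one that blocks the opponent, else None.

    Assumption: the real is_reach() may behave differently.
    """
    for who in (player, 3 - player):
        for line in WIN_LINES:
            values = [board[i] for i in line]
            if values.count(who) == 2 and values.count(0) == 1:
                return line[values.index(0)]
    return None

def select_move(board, player, predict):
    """Pick a move: reach first, otherwise the highest predicted score for player."""
    reach = find_reach(board, player)
    if reach is not None:
        return reach
    best_action, best_value = None, -1.0
    for action in [i for i, v in enumerate(board) if v == 0]:
        board[action] = player   # try the move
        scores = predict(board)  # assumed order: [player 1 wins, player 2 wins, draw]
        board[action] = 0        # undo it
        if scores[player - 1] > best_value:
            best_value, best_action = scores[player - 1], action
    return best_action

def predict_stub(board):
    """Hypothetical stand-in for model.predict: favors whoever holds the center."""
    if board[4] == 1:
        return [0.6, 0.2, 0.2]
    if board[4] == 2:
        return [0.2, 0.6, 0.2]
    return [0.3, 0.3, 0.4]

print(select_move([0] * 9, 1, predict_stub))
print(select_move([1, 1, 0, 2, 2, 0, 0, 0, 0], 1, predict_stub))
```

With a real model, predict would wrap model.predict on a float32 array. Loading dlmodel.h5 once, outside the selection function, would avoid reloading it on every move.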

References
[External site]
・ミニマックス法で最強の3目並べAIを実装する話 (Implementing the strongest tic-tac-toe AI with the minimax method)
