Learning AlphaGo Through Tic-Tac-Toe: Minimax

12/16/2022

1. Overview

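I am studying tic-tac-toe as part of learning how AlphaGo works. I think it is important for understanding the basic ideas, and I am also drawing on material available online. First, following the reference material, I wrote a class that encodes the rules of tic-tac-toe and got manual input playing against random input. This post covers the minimax method. In practice, the first player chooses the move that leads to the most games in which three ○ line up, and the second player chooses the move that leads to the most games in which three ✕ line up.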

2. Details

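My implementation differs slightly from the reference material. The reason is that, when the reference is followed as written, a first player using minimax does not choose the center of the board (position 4). I had assumed that playing this position is advantageous, so I examined every move sequence, confirmed that the best opening move is the center, and programmed the first player to select it. I also improved the second player so that it plays good moves when it uses minimax.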
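
One plausible explanation for why plain minimax does not prefer the center: with perfect play on both sides, every opening move in tic-tac-toe leads to a draw, so all nine openings receive the same value and the first one examined wins the tie. The sketch below is a generic textbook minimax written for this illustration, not the code from the reference material. The names LINES, winner, minimax_value and opening_values are hypothetical, and the lru_cache memoization is an addition to keep it fast.

```python
from functools import lru_cache

# The eight winning lines: rows, columns, diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 (o) or -1 (x) if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


@lru_cache(maxsize=None)
def minimax_value(board, player):
    """Value of `board` from o's point of view: 1 = o wins, 0 = draw, -1 = x wins.

    `board` is a tuple of nine cells (0 empty, 1 o, -1 x); `player` moves next.
    """
    won = winner(board)
    if won != 0:
        return won
    empty = [i for i, cell in enumerate(board) if cell == 0]
    if not empty:
        return 0  # full board without a line is a draw
    values = []
    for i in empty:
        child = board[:i] + (player,) + board[i + 1:]
        values.append(minimax_value(child, -player))
    # o maximizes the value, x minimizes it.
    return max(values) if player == 1 else min(values)


def opening_values():
    """Minimax value of each of the nine opening moves for o."""
    empty_board = (0,) * 9
    return [minimax_value(empty_board[:i] + (1,) + empty_board[i + 1:], -1)
            for i in range(9)]


print(opening_values())  # every entry is 0: each opening is a draw under perfect play
```

Because all nine values tie, a selection loop that keeps the first maximum returns position 0, which matches the behavior described above.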

(1) Overview

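For the rules of tic-tac-toe described in the previous post, I reuse the implementation in which calling do_game(self, action) advances the game by one move. For minimax, the program examines all move sequences (at most 9! = 362880, fewer in practice because games that end early are cut off), counts for each candidate move the number of games that end in a win, a loss, and a draw, and plays the move with the highest win count. The opening move has the most sequences to examine, but it is still processed in a few seconds.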

(a) tictactoe.py: game rules (undo_game(self, action) was added to support minimax)
(b) tttminimax.py: minimax input versus random input
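
The win/loss/draw counting described above can be sketched independently of the two files. This is a minimal illustration under my own assumptions, not the program listed in this post: the board is an immutable tuple, the names LINES, has_line, place and tally are hypothetical, and lru_cache memoization is added so the whole tree is counted quickly.

```python
from functools import lru_cache

# The eight winning lines: rows, columns, diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
EMPTY = (0,) * 9


def has_line(board, player):
    """True if `player` (1 = o, -1 = x) occupies a complete line."""
    return any(all(board[i] == player for i in line) for line in LINES)


def place(board, index, player):
    """Return a new board tuple with `player` placed at `index`."""
    return board[:index] + (player,) + board[index + 1:]


@lru_cache(maxsize=None)
def tally(board, player):
    """Count (o wins, x wins, draws) over every complete game that can
    follow `board` when `player` is to move. `board` must not be finished."""
    o_wins = x_wins = draws = 0
    for i in range(9):
        if board[i] != 0:
            continue
        child = place(board, i, player)
        if has_line(child, player):      # the move just made ends the game
            if player == 1:
                o_wins += 1
            else:
                x_wins += 1
        elif 0 not in child:             # board full, nobody won
            draws += 1
        else:                            # game continues with the other player
            o, x, d = tally(child, -player)
            o_wins, x_wins, draws = o_wins + o, x_wins + x, draws + d
    return o_wins, x_wins, draws


# Outcome counts for each opening move by o, then the move with most o wins.
per_move = {i: tally(place(EMPTY, i, 1), -1) for i in range(9)}
best = max(range(9), key=lambda i: per_move[i][0])
print(per_move)
print("opening with the most o wins:", best)
```

Counted this way, the full tree contains 255,168 complete games, well under the 9! bound, and the opening with the most first-player wins is the center (position 4).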

(2) Details

(a) tictactoe.py

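class Tictactoe:

    def __init__(self, fields=None):
        # Cell values: 0 = empty, 1 = o (first player), -1 = x (second player)
        self.fields = fields if fields is not None else [0] * 9
        self.myturn = True  # True while it is o's turn
        # The eight winning lines: rows, columns, diagonals
        self.data = [[0,1,2],[3,4,5],[6,7,8],
                     [0,3,6],[1,4,7],[2,5,8],
                     [0,4,8],[2,4,6]]

    def do_game(self, action):
        # Place the current player's mark, evaluate the board, then pass the turn.
        # Returns 1 (o wins), -1 (x wins), 0 (draw) or None (game continues).
        self.fields[action] = self.do_play()
        check = self.do_check()
        self.myturn = not self.myturn
        return check

    def undo_game(self, action):
        # Take back the move at `action` and hand the turn back.
        self.fields[action] = 0
        self.myturn = not self.myturn

    def do_play(self):
        # Mark of the player whose turn it is.
        if self.myturn:
            return 1
        else:
            return -1

    def do_check(self):
        # Called before the turn is switched, so do_play() is the player who just moved.
        if self.is_win():
            return self.do_play()
        if self.is_draw():
            return 0
        return None

    def is_win(self):
        # A line sums to 3 (all o) or -3 (all x) when one player owns it.
        for i in range(8):
            count = 0
            for j in range(3):
                count = self.fields[self.data[i][j]] + count
            if count == 3 or count == -3:
                return True
        return False

    def is_draw(self):
        # The board is full when no empty cell remains.
        count = 0
        for i in range(9):
            if self.fields[i] == 0:
                count += 1
        if count == 0:
            return True
        else:
            return False

    def next_action(self):
        # Indices of the empty cells, i.e. the legal moves.
        actionlist = []
        for i in range(len(self.fields)):
            if self.fields[i] == 0:
                actionlist.append(i)
        return actionlist

    def game_state(self):
        # Render the board as three text rows of 'o', 'x' and '_'.
        text = ''
        for i in range(9):
            if self.fields[i] == 1:
                text += 'o'
            elif self.fields[i] == -1:
                text += 'x'
            else:
                text += '_'
            if i % 3 == 2:
                text += '\n'
        return text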
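
The reason undo_game matters is that the search can walk the whole game tree on a single board object: make a move, recurse, take the move back. The following is a small self-contained sketch of that do/undo pattern. Board is a hypothetical, trimmed-down stand-in for the Tictactoe class, and count_games is an illustrative helper, not part of the files in this post.

```python
class Board:
    """Hypothetical, trimmed-down stand-in for the Tictactoe class."""

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.fields = [0] * 9   # 0 = empty, 1 = o, -1 = x
        self.myturn = True      # True while it is o's turn

    def do_game(self, action):
        """Play `action`; return 1/-1 for a win, 0 for a draw, None otherwise."""
        mark = 1 if self.myturn else -1
        self.fields[action] = mark
        self.myturn = not self.myturn
        if any(all(self.fields[i] == mark for i in line) for line in self.LINES):
            return mark
        if 0 not in self.fields:
            return 0
        return None

    def undo_game(self, action):
        """Take back `action` and hand the turn back."""
        self.fields[action] = 0
        self.myturn = not self.myturn

    def next_action(self):
        """Indices of the empty cells."""
        return [i for i, cell in enumerate(self.fields) if cell == 0]


def count_games(board):
    """Count complete games reachable from `board`, mutating it in place
    and restoring it with undo_game after each branch."""
    total = 0
    for action in board.next_action():
        result = board.do_game(action)
        if result is not None:
            total += 1                    # this move ended the game
        else:
            total += count_games(board)   # keep playing on the same object
        board.undo_game(action)           # restore before trying the next move
    return total


board = Board()
for move in (4, 0, 8):                    # o center, x corner, o opposite corner
    board.do_game(move)

before = (list(board.fields), board.myturn)
games = count_games(board)
after = (list(board.fields), board.myturn)
print(games, before == after)             # the second value is True: state is restored
```

The design choice here is to mutate one board and undo, rather than copy the board for every branch, which avoids allocating a new list at each node of the search.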

(b) tttminimax.py

from tictactoe import Tictactoe
import random

# Note: the functions below use the module-level `obj` created in the main block.

def random_select(actions):
    # Pick one of the legal moves uniformly at random.
    return random.choice(actions)

def input_select(actions):
    # Ask until the user enters one of the legal moves.
    while True:
        print(actions)
        action = int(input('select actions='))
        if action in actions:
            break
        else:
            print('input again')
    return action

def minimax_select(actions):
    # An odd number of empty cells means o (first player) is to move.
    if (len(actions) % 2) == 1:
        flg = 1  # compare o win counts
    else:
        flg = 2  # compare x win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        # Bug fix: if this move ends the game, count it directly
        # instead of searching past a finished game.
        if score == 1:
            init[1] += 1
        elif score == -1:
            init[2] += 1
        elif score == 0:
            init[3] += 1
        else:
            minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)

    # Choose the move with the highest win count for the player to move.
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]

    return maxaction

def minimax(actions, result):
    # Play out every continuation and accumulate the outcomes in `result`.
    for action in actions:
        score = obj.do_game(action)
        if score == 1:
            result[1] += 1
        elif score == -1:
            result[2] += 1
        elif score == 0:
            result[3] += 1
        else:
            minimax(obj.next_action(), result)
        obj.undo_game(action)

if __name__ == "__main__":

    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]

    for i in range(9):

        if obj.myturn:
            print('my turn')
            action = minimax_select(actions)
        else:
            print('other turn')
            action = random_select(actions)

        print(actions)
        print("select", action)
        result = obj.do_game(action)
        print(obj.game_state())

        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break

        actions = obj.next_action()


References

AlphaZero 深層学習・強化学習・探索 人工知能プログラミング実践入門
by 布留川英一 (Hidekazu Furukawa)
