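AlphaGo Study Notes Using Tic-Tac-Toe (Random Edition / Minimax Edition / Alpha-Beta Edition / Monte Carlo Edition)

1. Overview

I am studying tic-tac-toe as part of learning AlphaGo. I think it is important for understanding the basic ideas, so I also consult material available online. First, following the reference material, I wrote a class that encodes the rules of tic-tac-toe and got a match between manual input and random input working. This article describes that work. From here I will modify the program step by step to make it play tic-tac-toe better.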

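2. Details: Random Edition

(1) Overview

The rules of tic-tac-toe are implemented so that calling do_game(self, action) advances the game. action is a number from 0 to 8 that identifies one of the nine cells of the board, numbered row by row from the top left. I wrote two Python files.

(a) tictactoe.py: game rules

(b) tttrandam.py: match between manual input and random input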
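
As a small illustration of the 0 to 8 numbering described above, the sketch below converts between a cell index and its (row, column) position. The helper names cell_to_row_col and row_col_to_cell are hypothetical and do not appear in the article's files; the row-by-row layout is taken from how game_state prints the board.

```python
def cell_to_row_col(action):
    """Convert a cell index 0..8 into a 0-based (row, column) pair."""
    if not 0 <= action <= 8:
        raise ValueError("action must be between 0 and 8")
    return divmod(action, 3)


def row_col_to_cell(row, col):
    """Convert a 0-based (row, column) pair back into a cell index 0..8."""
    return row * 3 + col


# Print the mapping for all nine cells.
for action in range(9):
    print(action, cell_to_row_col(action))
```

With this layout the center of the board is cell 4 and the four corners are cells 0, 2, 6 and 8.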

(2) Details

(a) tictactoe.py

class Tictactoe:
    # Cell values: 1 = first player (o), -1 = second player (x), 0 = empty.
    def __init__(self, fields=None):
        self.fields = fields if fields is not None else [0] * 9
        self.myturn = True  # True while it is the first player's (o) turn
        # The eight winning lines: rows, columns, diagonals.
        self.data = [[0,1,2],[3,4,5],[6,7,8],
                     [0,3,6],[1,4,7],[2,5,8],
                     [0,4,8],[2,4,6]]

    def do_game(self, action):
        # Place the current player's mark and switch turns.
        # Returns 1 (o wins), -1 (x wins), 0 (draw) or None (game continues).
        self.fields[action] = self.do_play()
        check = self.do_check()
        self.myturn = not self.myturn
        return check

    def do_play(self):
        # Mark of the player to move.
        if self.myturn:
            return 1
        else:
            return -1

    def do_check(self):
        if self.is_win():
            return self.do_play()
        if self.is_draw():
            return 0
        return None

    def is_win(self):
        # A line sums to +3 or -3 only when one player owns all three cells.
        for line in self.data:
            count = 0
            for cell in line:
                count += self.fields[cell]
            if count == 3 or count == -3:
                return True
        return False

    def is_draw(self):
        # Draw when no empty cell remains (called only after is_win fails).
        return 0 not in self.fields

    def next_action(self):
        # Indices of the empty cells.
        actionlist = []
        for i in range(len(self.fields)):
            if self.fields[i] == 0:
                actionlist.append(i)
        return actionlist

    def game_state(self):
        # Render the board as three lines of 'o', 'x' and '_'.
        text = ''
        for i in range(9):
            if self.fields[i] == 1:
                text += 'o'
            elif self.fields[i] == -1:
                text += 'x'
            else:
                text += '_'
            if i % 3 == 2:
                text += '\n'
        return text

(b) tttrandam.py

from tictactoe import Tictactoe
import random

def random_select(actions):
    # Pick one of the legal moves uniformly at random.
    return random.choice(actions)

def input_select(actions):
    # Keep asking until the user types one of the legal moves.
    while True:
        print(actions)
        try:
            action = int(input('select actions='))
        except ValueError:
            action = None  # not a number: ask again
        if action in actions:
            return action
        print('input again')

if __name__ == "__main__":
    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]
    for i in range(9):
        if obj.myturn:
            print('my turn')
            action = input_select(actions)
        else:
            print('other turn')
            action = random_select(actions)
        result = obj.do_game(action)
        print(obj.game_state())
        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break
        actions = obj.next_action()

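3. Details: Minimax Edition

The implementation differs slightly from the reference material. The reason is that, when I follow the reference material, a first player using the minimax method does not choose the center of the board (position 4). I had assumed that playing there is advantageous, so I examined every move sequence, confirmed that the first move comes out as the center, and programmed the first player to choose it. I also improved the program so that it plays good moves when the minimax method is the second player.

(1) Overview

This edition reuses the tic-tac-toe rules from the previous edition, where calling do_game(self, action) advances the game. For the minimax method, the program examines every move sequence (at most 9! = 362,880 orderings; 255,168 finished games once early wins are taken into account), counts for each candidate move how many continuations end in a win, a loss, or a draw, and plays the move with the highest number of wins. In other words, moves are ranked by win counts over all continuations rather than by the min/max backup of textbook minimax. The first move has the most sequences to examine, but it is still processed in a few seconds.

(a) tictactoe.py: game rules (undo_game(self, action) added to support the minimax method)

(b) tttminimax.py: match between minimax input and random input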
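
The counting idea described above can be sketched as a standalone script. This is a minimal illustration, not the article's tttminimax.py: it uses a plain list instead of the Tictactoe class so that it runs on its own, and the names LINES, has_line, count_outcomes and tally_opening_moves are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def has_line(fields, mark):
    """True if `mark` (1 for o, -1 for x) owns a complete line."""
    for a, b, c in LINES:
        if fields[a] == mark and fields[b] == mark and fields[c] == mark:
            return True
    return False


def count_outcomes(fields, mark, totals):
    """Try `mark` on every empty cell and tally finished games.

    totals is a list [o wins, x wins, draws] that is updated in place.
    """
    for cell in range(9):
        if fields[cell] != 0:
            continue
        fields[cell] = mark
        if has_line(fields, mark):
            totals[0 if mark == 1 else 1] += 1
        elif 0 not in fields:
            totals[2] += 1
        else:
            count_outcomes(fields, -mark, totals)
        fields[cell] = 0  # undo the move, like undo_game


def tally_opening_moves():
    """Return {first move: [o wins, x wins, draws]} for all nine openings."""
    table = {}
    for first in range(9):
        fields = [0] * 9
        fields[first] = 1
        totals = [0, 0, 0]
        count_outcomes(fields, -1, totals)
        table[first] = totals
    return table


table = tally_opening_moves()
for move, (o_wins, x_wins, draws) in table.items():
    print(move, o_wins, x_wins, draws)
best = max(table, key=lambda move: table[move][0])
print("opening with the most o wins:", best)
print("finished games in total:", sum(sum(t) for t in table.values()))
```

Ranking the openings by the o-win column puts the center (cell 4) first, which matches the behavior described in this section. The total over all openings is the number of distinct finished games.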

(2) Details

(a) tictactoe.py

class Tictactoe:
    # Cell values: 1 = first player (o), -1 = second player (x), 0 = empty.
    def __init__(self, fields=None):
        self.fields = fields if fields is not None else [0] * 9
        self.myturn = True  # True while it is the first player's (o) turn
        # The eight winning lines: rows, columns, diagonals.
        self.data = [[0,1,2],[3,4,5],[6,7,8],
                     [0,3,6],[1,4,7],[2,5,8],
                     [0,4,8],[2,4,6]]

    def do_game(self, action):
        # Place the current player's mark and switch turns.
        # Returns 1 (o wins), -1 (x wins), 0 (draw) or None (game continues).
        self.fields[action] = self.do_play()
        check = self.do_check()
        self.myturn = not self.myturn
        return check

    def undo_game(self, action):
        # Take back the move at `action` and give the turn back.
        self.fields[action] = 0
        self.myturn = not self.myturn

    def do_play(self):
        # Mark of the player to move.
        if self.myturn:
            return 1
        else:
            return -1

    def do_check(self):
        if self.is_win():
            return self.do_play()
        if self.is_draw():
            return 0
        return None

    def is_win(self):
        # A line sums to +3 or -3 only when one player owns all three cells.
        for line in self.data:
            count = 0
            for cell in line:
                count += self.fields[cell]
            if count == 3 or count == -3:
                return True
        return False

    def is_draw(self):
        # Draw when no empty cell remains (called only after is_win fails).
        return 0 not in self.fields

    def next_action(self):
        # Indices of the empty cells.
        actionlist = []
        for i in range(len(self.fields)):
            if self.fields[i] == 0:
                actionlist.append(i)
        return actionlist

    def game_state(self):
        # Render the board as three lines of 'o', 'x' and '_'.
        text = ''
        for i in range(9):
            if self.fields[i] == 1:
                text += 'o'
            elif self.fields[i] == -1:
                text += 'x'
            else:
                text += '_'
            if i % 3 == 2:
                text += '\n'
        return text

(b) tttminimax.py

from tictactoe import Tictactoe
import random

def random_select(actions):
    # Pick one of the legal moves uniformly at random.
    return random.choice(actions)

def input_select(actions):
    # Keep asking until the user types one of the legal moves.
    while True:
        print(actions)
        try:
            action = int(input('select actions='))
        except ValueError:
            action = None  # not a number: ask again
        if action in actions:
            return action
        print('input again')

def minimax_select(actions):
    # Uses the module-level `obj` created in the __main__ block below.
    # An odd number of empty cells means the first player (o) is to move.
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        # The return value is not used here, so a move that ends the game
        # immediately is not treated specially.
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    # Pick the first move with the highest win count for the side to move.
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
    return maxaction

def minimax(actions, result):
    # Walk every continuation and tally finished games into result[1..3].
    for action in actions:
        score = obj.do_game(action)
        if score == 1:
            result[1] += 1
        elif score == -1:
            result[2] += 1
        elif score == 0:
            result[3] += 1
        else:
            minimax(obj.next_action(), result)
        obj.undo_game(action)

if __name__ == "__main__":
    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]
    for i in range(9):
        if obj.myturn:
            print('my turn')
            action = minimax_select(actions)
        else:
            print('other turn')
            action = random_select(actions)
        print(actions)
        print("select", action)
        result = obj.do_game(action)
        print(obj.game_state())
        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break
        actions = obj.next_action()

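4. Details: Alpha-Beta Edition

(1) Overview

The minimax program has a weakness: even in a reach position (two marks in a line with the third cell empty), it does not pick the move that completes three in a row, and instead picks a move that creates two reaches at once. So, as an alphabeta method that cuts away wasted work, I add a rule to the minimax method: in a reach position, play the move that turns the reach into a win. Because is_reach returns the empty cell of the first line that holds two identical marks, the same rule also blocks the opponent's reach.

With the minimax method as the first player and the alphabeta method as the second player, the game ends in a draw, so I think the program is more complete than before.

(a) tictactoe.py: game rules (is_reach(self) added to support the alphabeta method)

(b) alphabeta.py: match between first player (minimax) and second player (alphabeta)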
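
The reach rule described above returns the first reach it finds, whichever player it belongs to. The sketch below shows one possible variant that prefers completing one's own line and only then blocks the opponent. It is an illustration under assumptions, not the article's is_reach: the names LINES, find_reach and reach_move are hypothetical, and the board is a plain list so the example runs standalone.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def find_reach(fields, mark):
    """Return the empty cell that completes a line for `mark`, or None."""
    for line in LINES:
        values = [fields[cell] for cell in line]
        if values.count(mark) == 2 and values.count(0) == 1:
            return line[values.index(0)]
    return None


def reach_move(fields, mark):
    """Prefer winning at once; otherwise block the opponent's reach."""
    win = find_reach(fields, mark)
    if win is not None:
        return win
    return find_reach(fields, -mark)


# x (-1) to move. x can win at cell 2, while o (1) threatens cell 6.
fields = [1, 0,  0,
          1, 0, -1,
          0, 1, -1]
print("x should play:", reach_move(fields, -1))
print("o's threat is at:", find_reach(fields, 1))
```

In this position a scan in line order meets o's reach on the left column before x's reach on the right column, so a first-found rule would block at 6 even though x can win at 2.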

(2) Details

(a) tictactoe.py

class Tictactoe:
    # Cell values: 1 = first player (o), -1 = second player (x), 0 = empty.
    def __init__(self, fields=None):
        self.fields = fields if fields is not None else [0] * 9
        self.myturn = True  # True while it is the first player's (o) turn
        # The eight winning lines: rows, columns, diagonals.
        self.data = [[0,1,2],[3,4,5],[6,7,8],
                     [0,3,6],[1,4,7],[2,5,8],
                     [0,4,8],[2,4,6]]

    def do_game(self, action):
        # Place the current player's mark and switch turns.
        # Returns 1 (o wins), -1 (x wins), 0 (draw) or None (game continues).
        self.fields[action] = self.do_play()
        check = self.do_check()
        self.myturn = not self.myturn
        return check

    def undo_game(self, action):
        # Take back the move at `action` and give the turn back.
        self.fields[action] = 0
        self.myturn = not self.myturn

    def do_play(self):
        # Mark of the player to move.
        if self.myturn:
            return 1
        else:
            return -1

    def do_check(self):
        if self.is_win():
            return self.do_play()
        if self.is_draw():
            return 0
        return None

    def is_win(self):
        # A line sums to +3 or -3 only when one player owns all three cells.
        for line in self.data:
            count = 0
            for cell in line:
                count += self.fields[cell]
            if count == 3 or count == -3:
                return True
        return False

    def is_draw(self):
        # Draw when no empty cell remains (called only after is_win fails).
        return 0 not in self.fields

    def is_reach(self):
        # Return the empty cell of the first line that holds two identical
        # marks and one empty cell (sum of +2 or -2), or None if there is none.
        for line in self.data:
            count = 0
            action = None
            for cell in line:
                count += self.fields[cell]
                if self.fields[cell] == 0:
                    action = cell
            if count == 2 or count == -2:
                return action
        return None

    def next_action(self):
        # Indices of the empty cells.
        actionlist = []
        for i in range(len(self.fields)):
            if self.fields[i] == 0:
                actionlist.append(i)
        return actionlist

    def game_state(self):
        # Render the board as three lines of 'o', 'x' and '_'.
        text = ''
        for i in range(9):
            if self.fields[i] == 1:
                text += 'o'
            elif self.fields[i] == -1:
                text += 'x'
            else:
                text += '_'
            if i % 3 == 2:
                text += '\n'
        return text

(b) alphabeta.py

from tictactoe import Tictactoe
import random

def random_select(actions):
    # Pick one of the legal moves uniformly at random.
    return random.choice(actions)

def input_select(actions):
    # Keep asking until the user types one of the legal moves.
    while True:
        print(actions)
        try:
            action = int(input('select actions='))
        except ValueError:
            action = None  # not a number: ask again
        if action in actions:
            return action
        print('input again')

def alphabeta_select(actions):
    # Uses the module-level `obj` created in the __main__ block below.
    # If any line is one move from completion, play its empty cell at once
    # and skip the full search.
    reach = obj.is_reach()
    if reach is not None:
        print("reach action ", reach)
        return reach
    # Otherwise fall back to the same win counting as minimax_select.
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
    return maxaction

def minimax_select(actions):
    # An odd number of empty cells means the first player (o) is to move.
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    # Pick the first move with the highest win count for the side to move.
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
    return maxaction

def minimax(actions, result):
    # Walk every continuation and tally finished games into result[1..3].
    for action in actions:
        score = obj.do_game(action)
        if score == 1:
            result[1] += 1
        elif score == -1:
            result[2] += 1
        elif score == 0:
            result[3] += 1
        else:
            minimax(obj.next_action(), result)
        obj.undo_game(action)

if __name__ == "__main__":
    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]
    for i in range(9):
        if obj.myturn:
            print('my turn')
            action = minimax_select(actions)
        else:
            print('other turn')
            action = alphabeta_select(actions)
        print(actions)
        print("select", action)
        result = obj.do_game(action)
        print(obj.game_state())
        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break
        actions = obj.next_action()

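5. Details: Monte Carlo Edition

(1) Overview

This edition again reuses the tic-tac-toe rules from the earlier editions, where calling do_game(self, action) advances the game. The minimax method examines every move sequence (at most 9! = 362,880 orderings), counts for each candidate move how many continuations end in a win, a loss, or a draw, and plays the move with the highest number of wins.

With the minimax method as the first player and the alphabeta method as the second player, the game ends in a draw. However, it repeats the same pattern every time. The cause was that, even when several moves tie on the selection criterion, the program always chose the first one. So I added a scheme that uses a random number to choose among the tied moves. This makes it possible to see a variety of positions.

(a) tictactoe.py: game rules (same as in the alphabeta edition, so omitted)

(b) montecarlo.py: match between first player (minimax) and second player (montecarlo)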
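
The tie-breaking step described above can be isolated as a small helper. This is a minimal sketch, assuming the candidate moves are given as (action, value) pairs; the name pick_best_with_ties and the sample scores are made up for illustration and are not part of montecarlo.py.

```python
import random


def pick_best_with_ties(scores):
    """Return a random action among those sharing the highest value.

    scores is a non-empty list of (action, value) pairs.
    """
    best_value = max(value for _, value in scores)
    candidates = [action for action, value in scores if value == best_value]
    return random.choice(candidates)


# Hypothetical win counts: moves 2 and 6 tie for the best value.
scores = [(0, 12), (2, 20), (6, 20), (7, 5)]
print(pick_best_with_ties(scores))
```

Collecting all tied candidates first and drawing once with random.choice gives each tied move the same probability, whereas always keeping the first maximum makes the game deterministic.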

(2) Details

(a) tictactoe.py (omitted)

(b) montecarlo.py

from tictactoe import Tictactoe
import random

def random_select(actions):
    # Pick one of the legal moves uniformly at random.
    return random.choice(actions)

def input_select(actions):
    # Keep asking until the user types one of the legal moves.
    while True:
        print(actions)
        try:
            action = int(input('select actions='))
        except ValueError:
            action = None  # not a number: ask again
        if action in actions:
            return action
        print('input again')

def montecarlo_select(actions):
    # Uses the module-level `obj` created in the __main__ block below.
    # Same as alphabeta_select, except that ties are broken at random.
    reach = obj.is_reach()
    if reach is not None:
        print("reach action ", reach)
        return reach
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    maxvalue = -1
    maxaction = None
    maxlist = []  # every move that shares the best win count
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
            maxlist = [item[0]]
        elif value == maxvalue:
            maxlist.append(item[0])
    if len(maxlist) > 1:
        maxaction = random.choice(maxlist)
    return maxaction

def alphabeta_select(actions):
    # If any line is one move from completion, play its empty cell at once.
    reach = obj.is_reach()
    if reach is not None:
        print("reach action ", reach)
        return reach
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
    return maxaction

def minimax_select(actions):
    # An odd number of empty cells means the first player (o) is to move.
    if (len(actions) % 2) == 1:
        flg = 1  # compare o-win counts
    else:
        flg = 2  # compare x-win counts
    result = []
    for action in actions:
        score = obj.do_game(action)
        init = [action,0,0,0]  # [move, o wins, x wins, draws]
        minimax(obj.next_action(), init)
        result.append(init)
        obj.undo_game(action)
    # Pick the first move with the highest win count for the side to move.
    maxvalue = -1
    maxaction = None
    for item in result:
        value = item[flg]
        if value > maxvalue:
            maxvalue = value
            maxaction = item[0]
    return maxaction

def minimax(actions, result):
    # Walk every continuation and tally finished games into result[1..3].
    for action in actions:
        score = obj.do_game(action)
        if score == 1:
            result[1] += 1
        elif score == -1:
            result[2] += 1
        elif score == 0:
            result[3] += 1
        else:
            minimax(obj.next_action(), result)
        obj.undo_game(action)

if __name__ == "__main__":
    obj = Tictactoe()
    actions = [0,1,2,3,4,5,6,7,8]
    for i in range(9):
        if obj.myturn:
            print('my turn')
            action = minimax_select(actions)
        else:
            print('other turn')
            action = montecarlo_select(actions)
        print(actions)
        print("select", action)
        result = obj.do_game(action)
        print(obj.game_state())
        if result == 1:
            print("o Win")
            break
        if result == -1:
            print("x Win")
            break
        if result == 0:
            print("Draw")
            break
        actions = obj.next_action()

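(3) Evaluation

I thought the program had become unbeatable, but it cannot cope when the first player is a human entering moves by hand (see the position below). The program cannot read out the sequence in which the first player creates two reaches at once. Even tic-tac-toe is surprisingly hard.

Position after first player (4) -> second player (8) -> first player (0)

o _ _

_ o _

_ _ x

In this position, every reply by the second player other than 2 or 6 leads to a win for the first player. However, the program plays 5 or 7. The second player has to use its own reach to stop the first player from getting a double reach, but evaluating that is difficult.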
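
The claim about this position can be checked with a textbook minimax search, which backs up the best reply for each side instead of counting wins. This is a different technique from the win counting used in the article's programs, and the sketch below is only an illustration: the names LINES, winner, minimax_value and evaluate_moves are hypothetical, and the board is a plain list so the example runs standalone.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(fields):
    """Return 1 if o has a line, -1 if x has a line, otherwise 0."""
    for a, b, c in LINES:
        if fields[a] != 0 and fields[a] == fields[b] == fields[c]:
            return fields[a]
    return 0


def minimax_value(fields, mark):
    """Value with `mark` to move: +1 o wins, 0 draw, -1 x wins (best play)."""
    won = winner(fields)
    if won != 0:
        return won
    if 0 not in fields:
        return 0
    values = []
    for cell in range(9):
        if fields[cell] == 0:
            fields[cell] = mark
            values.append(minimax_value(fields, -mark))
            fields[cell] = 0
    # o (1) maximizes the value, x (-1) minimizes it.
    return max(values) if mark == 1 else min(values)


def evaluate_moves(fields, mark):
    """Return {cell: value after `mark` plays there} for every empty cell."""
    result = {}
    for cell in range(9):
        if fields[cell] == 0:
            fields[cell] = mark
            result[cell] = minimax_value(fields, -mark)
            fields[cell] = 0
    return result


# The position above: o on cells 0 and 4, x on cell 8, x (-1) to move.
fields = [1, 0, 0,
          0, 1, 0,
          0, 0, -1]
values = evaluate_moves(fields, -1)
print(values)
```

In the printed dictionary, a value of 0 means the reply holds the draw and a value of 1 means the first player wins with best play. Only cells 2 and 6 come out as 0.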

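6. Remarks

For tic-tac-toe it is possible to compute every case so that the program never loses, because the logic can be written out explicitly. Next, I will investigate how strong the program can become through CNN-based learning, without writing that logic.

Reference book

AlphaZero 深層学習・強化学習・探索 人工知能プログラミング実践入門

by 布留川英一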


4/23/2026


Ubuntu User Blog