Berkeley AI: Pacman

Per the rules of the University of California, Santa Cruz, the source code for this project cannot be shared publicly. To view the source code, please send me an email (my address can be found in the contact section of this website) and state a reason for requesting it.

Project Info

  • Role: Programmer
  • Team Size: 1
  • Time Frame: 2 Months (Oct - Nov 2024)
  • Language: Python

About

The Pacman AI project is a course-long project used at many universities across the globe. In the course I took at UCSC, CSE140: Artificial Intelligence, my professor broke the assignment into four parts. The first three parts were individual work, while the final part was completed in a team of four.

Assignment PDF: [Link]

Part 1

Overview

Part One of the project consists of writing multiple search algorithms: Depth First Search, Breadth First Search, Uniform Cost Search, and A* search. The search algorithms are integrated into the Pacman code so that the Pacman agent traverses each layout along the path returned by the chosen algorithm.

Algorithm Descriptions

Each search follows its standard definition. Depth First Search expands the deepest, left-most unexplored node first and backtracks up the tree when it reaches a dead end. Breadth First Search expands the nodes closest to the root first, then works outward level by level. Uniform Cost Search is similar to the previous two searches but uses a priority queue, where priority is given to whichever node has the minimum path cost. Finally, A* search considers both the cost of reaching a node and an estimate of the distance to the goal; it accepts a predefined heuristic function, making it an informed search.
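The four searches differ mainly in how the frontier is ordered. Below is a minimal, self-contained sketch over a plain adjacency-list graph; the `graph`, `costs`, and `heuristic` arguments are illustrative stand-ins, not the course's actual search API.

```python
import heapq
from collections import deque

# Illustrative sketch only: a plain adjacency-list graph stands in for the
# course's search problem interface.

def depth_first_search(graph, start, goal):
    """Expand the deepest unexplored node first (LIFO stack)."""
    frontier = [(start, [start])]
    visited = set()
    while frontier:
        node, path = frontier.pop()          # LIFO -> DFS
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            frontier.append((neighbor, path + [neighbor]))
    return None

def breadth_first_search(graph, start, goal):
    """Expand the shallowest unexplored node first (FIFO queue)."""
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        node, path = frontier.popleft()      # FIFO -> BFS
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, path + [neighbor]))
    return None

def a_star_search(graph, costs, start, goal, heuristic=lambda n: 0):
    """Priority queue ordered by g(n) + h(n); with h = 0 this is UCS."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            new_g = g + costs.get((node, neighbor), 1)
            if new_g < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_g
                heapq.heappush(frontier, (new_g + heuristic(neighbor),
                                          new_g, neighbor, path + [neighbor]))
    return None
```

With the heuristic left at its default of zero, a_star_search orders the frontier purely by path cost and behaves like Uniform Cost Search.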

Part 2

Overview

Part Two of the assignment consists of writing a minimax search agent, a minimax search agent with alpha-beta pruning, and an expectimax agent.

Algorithm Descriptions

The minimax search agent recursively evaluates the possible outcomes of different moves, assuming both players play optimally. At each level of the tree, the maximizing player chooses the move with the highest score, while the minimizing player chooses the move with the lowest score. With alpha-beta pruning added, the agent can disregard parts of the tree whenever no child node could yield a better option than one already found. That part of the tree is removed, or pruned, which improves the overall efficiency of the search. Expectimax is an extension of the minimax algorithm in which a chance node represents the different possible outcomes of a probabilistic event. The algorithm computes the expected value at chance nodes by taking the average of the possible outcomes, weighted by their probabilities.
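As a rough illustration, here is a sketch of the three values computed over a generic two-player game interface. The `game.is_terminal`, `game.evaluate`, `game.actions`, and `game.result` names are hypothetical placeholders, not the course's GameState API, and the chance nodes here assume equally likely opponent moves.

```python
import math

# Illustrative sketch only: `game` is a hypothetical interface, not the
# course's GameState API.

def minimax(game, state, depth, maximizing):
    """Value of a state assuming both players play optimally."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    values = [minimax(game, game.result(state, a), depth - 1, not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)

def alpha_beta(game, state, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value with alpha-beta pruning of hopeless subtrees."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alpha_beta(game, game.result(state, a),
                                          depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # no child here can beat an option found elsewhere
                break           # prune the rest of this subtree
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            value = min(value, alpha_beta(game, game.result(state, a),
                                          depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break           # prune
        return value

def expectimax(game, state, depth, maximizing):
    """Like minimax, but opponent turns are chance nodes (uniform here)."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    children = [expectimax(game, game.result(state, a), depth - 1, not maximizing)
                for a in game.actions(state)]
    if maximizing:
        return max(children)
    return sum(children) / len(children)   # expected value at the chance node
```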

Part 3

Overview

Part Three of the project consisted of writing a value iteration function and a Q-learning agent on a Pacman grid. In Parts One and Two, the Pacman agent went exactly where it was told. Now, the agent has an 80% chance of going the way it is told and a 10% chance each of veering left or right.
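For illustration, this noisy movement can be modeled as a small probability distribution over outcomes. The direction names and the `LEFT_OF`/`RIGHT_OF` tables below are hypothetical stand-ins, not the course's code.

```python
# Hypothetical noise model: the intended direction happens with probability
# 0.8, and each perpendicular direction with probability 0.1.
LEFT_OF  = {'North': 'West', 'West': 'South', 'South': 'East', 'East': 'North'}
RIGHT_OF = {'North': 'East', 'East': 'South', 'South': 'West', 'West': 'North'}

def transition_distribution(action, noise=0.2):
    """Return [(resulting_direction, probability), ...] for a noisy move."""
    return [(action, 1.0 - noise),
            (LEFT_OF[action], noise / 2),
            (RIGHT_OF[action], noise / 2)]
```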

Value Iteration

The value iteration function takes in a Markov Decision Process (MDP) on initialization and runs value iteration for a given number of iterations using a supplied discount factor. Value iteration produces an optimal policy: a solution that states what the agent should do in any state it might reach. The most important function to complete is getQValue(), which returns the Q-value of a state-action pair, i.e. the expected utility of taking a given action in a given state.

To compute the Q-value, use the equation: Q(s, a) = ∑ P(s'|s, a)[R(s, a, s') + 𝛾U[s']] where:

P(s'|s, a) is the probability of reaching state s' if action a is taken in state s.

R(s, a, s') is the reward the agent receives for transitioning from s to s' via action a.

𝛾U[s'] is the discount factor 𝛾 multiplied by the value (utility) of the successor state s'.
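The Bellman backup above can be sketched roughly as follows. The MDP accessor names (`getStates`, `getPossibleActions`, `getTransitionStatesAndProbs`, `getReward`) mirror a common MDP interface for this style of project, but they are assumptions here rather than the course's verified API.

```python
# Illustrative sketch only: the MDP accessor names are assumed, not taken
# from the course code.

def get_q_value(mdp, values, state, action, discount):
    """Q(s, a) = sum over s' of P(s'|s, a) * (R(s, a, s') + discount * U(s'))."""
    q_value = 0.0
    for next_state, prob in mdp.getTransitionStatesAndProbs(state, action):
        reward = mdp.getReward(state, action, next_state)
        q_value += prob * (reward + discount * values[next_state])
    return q_value

def value_iteration(mdp, discount, iterations):
    """Batch value iteration: each sweep reads the previous sweep's values."""
    values = {s: 0.0 for s in mdp.getStates()}
    for _ in range(iterations):
        new_values = dict(values)
        for state in mdp.getStates():
            actions = mdp.getPossibleActions(state)
            if actions:
                new_values[state] = max(
                    get_q_value(mdp, values, state, a, discount) for a in actions)
        values = new_values
    return values
```

Once the values converge, the optimal policy can be read off by choosing, in each state, the action with the highest Q-value.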

Q-Learning Agent

The algorithm maintains a Q-value for each state-action pair, representing the expected cumulative reward of taking a specific action in a given state and following the optimal policy thereafter. Through exploration of the environment, the agent updates its Q-values based on the observed rewards and refines its policy over time.

To compute the Q-value in the update function, use the equation: Q(s, a) ← (1 - α)Q(s, a) + α[R(s, a, s') + 𝛾 max_a' Q(s', a')] where:

𝛾 is the discount factor

α is the learning rate

s is the current state

s' is the next state

a is the current action

a' is the next action; the max is taken over all actions available in s'
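A rough sketch of the tabular update, paired with an epsilon-greedy exploration policy, might look like the following. The class and its interface are hypothetical, not the course's Q-learning agent.

```python
import random
from collections import defaultdict

# Illustrative sketch only: a hypothetical tabular Q-learner, not the
# course's agent class.

class QLearner:
    def __init__(self, actions, alpha=0.5, discount=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> Q-value
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.discount = discount      # gamma
        self.epsilon = epsilon        # exploration probability

    def get_action(self, state):
        """Explore with probability epsilon, otherwise act greedily."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state, reward):
        """Q(s,a) <- (1 - alpha)Q(s,a) + alpha[R + gamma * max_a' Q(s',a')]."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        sample = reward + self.discount * best_next
        self.q[(state, action)] = ((1 - self.alpha) * self.q[(state, action)]
                                   + self.alpha * sample)
```

Each call to update() nudges Q(s, a) toward the observed sample R + 𝛾 max_a' Q(s', a'), with α controlling how far it moves.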