Disclaimer: This thread was imported from the old forum. As a result, some formatting may not be displayed correctly. The original thread begins in the second post of this thread.

**Question regarding assignment 9.2**

In exercise 9.2, we assume the transition function T(s, a, s’) modelling (quote) the probability that, if player A does action a in state s, player B will play such that we end up in state s’.

In the first subtask, we are required to write down the Bellman equation for this scenario. I assume that we should substitute T(s, a, s’) for P(s’ | s,a) here. But there is another probability given on the exercise sheet, P(s’), which is the probability that B picks a move ending in s’ if A picks a given action a. To me, that sounds exactly the same as T(s, a, s’) for a specific a, so I used P(s’) in place of P(s’ | s,a) in my equation.

Unfortunately, that reduces the sum in the equation to just a sum over c for all possible next states s’ (since U(s’)/U(s’) = 1). This really confuses me, because then all state-related information is lost and it does not matter which action we pick.
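Written out, the collapse described here looks roughly as follows. This is only a sketch: it assumes the sum in the Bellman equation has the usual form of a sum over P(s’ | s,a) U(s’), and it uses P(s’) = c/U(s’) from the exercise sheet. The symbol S'(s,a), the set of states reachable after playing a in s, is notation introduced here for illustration and does not appear on the sheet.

$$
\sum_{s' \in S'(s,a)} P(s')\,U(s')
= \sum_{s' \in S'(s,a)} \frac{c}{U(s')}\,U(s')
= \sum_{s' \in S'(s,a)} c
= c \cdot |S'(s,a)|
$$

In this form the individual U(s’) terms cancel, but the result still contains c and the number of reachable successor states.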

Additionally, I have no idea how I am supposed to calculate c, or how that would change anything.

Yes. Since T(s,a,s’) corresponds to the probability that we end up in state s’ after executing action a in state s, and since we have no influence on what player B does, this is the only meaningful way to model a two-player game situation in terms of a transition function.

Not necessarily. How you compute P(s’ | s,a) is up to you, so to speak.

*Another*? Which other probability is given?

Again: *how* you compute P(s’ | s,a) is for you to find out

You’re contradicting yourself: A sum over c for all possible next states s’ intrinsically *has* to depend on which action we pick, since that action determines which states are reachable at all.

Well, since P(s’) depends on c, c necessarily influences the probabilities in the transition function. So it has to change „everything“.

You can definitely calculate c: in every state s, c is uniquely determined by the requirement that P(s’) is a probability distribution.

First of all, thank you for your quick reply and for trying to make me understand.

The only thing I can think of right now is that it depends on the number of states reachable from s, which doesn’t seem to make any sense to me.

I’m assuming you are referencing the „has to sum up to 1“ property - for that, I think c has to be 1/sum_s’ U(s’), which wouldn’t make it a constant in my eyes, but I’m probably misunderstanding something here.

But doesn’t P(s’ | s,a) have to come from the transition model? Or are you saying that defining the transition model is up to me?

I’d appreciate it if you asked your questions without describing in detail what you think the correct solution might be.

which wouldn’t make it a constant in my eyes

Quote:

In a state s, if A picks given action a, then the probability that B picks a move ending in s′ is P(s′) =c/U(s′) for some constant c

c is a constant in P(s’), meaning it must not depend on s’. However, we are in a context: „In a state s and assuming we pick action a“ means c is allowed to depend on s and a, since those are „fixed“ now.
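To illustrate the point about c being fixed per context, here is a minimal Python sketch. It assumes all successor utilities are positive, so that c/U(s’) is a valid probability. The function name `opponent_distribution`, the state labels and the utility values are made up for this example and are not taken from the exercise sheet.

```python
from fractions import Fraction


def opponent_distribution(successor_utilities):
    """Return (c, P) where P(s') = c / U(s') over the given successor states.

    successor_utilities maps each state reachable in one fixed context (s, a)
    to its utility U(s'). Utilities are assumed to be positive.
    """
    if not successor_utilities:
        raise ValueError("need at least one successor state")
    if any(u <= 0 for u in successor_utilities.values()):
        raise ValueError("this sketch assumes positive utilities")
    # Normalisation: sum over s' of c / U(s') must equal 1,
    # so c = 1 / (sum over s' of 1 / U(s')).
    c = 1 / sum(Fraction(1) / Fraction(u) for u in successor_utilities.values())
    probs = {s: c / Fraction(u) for s, u in successor_utilities.items()}
    return c, probs


# Two hypothetical contexts (s, a) with different sets of reachable states.
c1, p1 = opponent_distribution({"s1": 1, "s2": 2, "s3": 4})
c2, p2 = opponent_distribution({"s1": 1, "s2": 2})

print(c1, p1)
print(c2, p2)
```

Within each call, c is the same for every s’, but the two contexts yield different values of c because the sets of reachable states differ.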