Please SSOT / Reinforcement Learning? [EX4]

I have a question regarding the computational expense and the algorithms of the fourth challenge.

Disclaimer: This thread was imported from the old forum, so some formatting may not be displayed correctly. The original thread begins in the second post of this thread.

Dear SAKI-Team,

I have a question regarding the fourth challenge.

First, the slides state that we should use a 3x2 warehouse, but now it is 3x3. The number of states has therefore grown drastically, from around 24 to over 15 million. How should we implement this? For example, a NumPy array of shape 6x17Mx17M with dtype=np.float16 would take 32 TiB. Even if I allocated it partially using memory mapping, or used HDF5, it would simply be too big for my hard drive, and I guess it is too big for any hardware that any of us has at home. I am not sure, but I think even you would have quite a hard time with it on the HPC? ^^
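The size estimate can be checked with a few lines of arithmetic. The sketch below assumes a dense array of shape (actions, states, states) with 2 bytes per entry, which is what np.float16 uses; the helper name `dense_transition_bytes` is made up for illustration. Under these assumptions, roughly 32 TiB corresponds to about 1.7 million states, while 17 million states would need about 3 PiB, so the table is out of reach either way.

```python
TIB = 2**40  # bytes per tebibyte

def dense_transition_bytes(n_actions: int, n_states: int, itemsize: int = 2) -> int:
    """Bytes for a dense (n_actions, n_states, n_states) array.

    itemsize=2 matches np.float16 (2 bytes per entry).
    """
    return n_actions * n_states * n_states * itemsize

# Compare the small layout with two readings of the large state count.
for n_states in (24, 1_700_000, 17_000_000):
    size = dense_transition_bytes(6, n_states)
    print(f"{n_states:>12,} states: {size / TIB:,.4f} TiB")
```

Because the size grows with the square of the state count, a tenfold increase in states means a hundredfold increase in memory.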

Furthermore, there are several slide sets in the drive folder; it would be nice to have just one single source of truth. It would also be great to have more resources on calculating P. In our lecture on Reinforcement Learning (MLTS WS20/32 Lecture 13, FAU TV, Dr. Mutschler), we learned that one usually uses SARSA or Q-learning for applications like this, because the table representation is simply not feasible, as stated above.
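To make the comparison concrete, here is a minimal sketch of tabular Q-learning. It is not the challenge's warehouse: the 5-cell corridor in `step`, the two actions, and all hyperparameters are hypothetical choices for illustration. The point is that Q-learning learns from sampled transitions and never builds P, and that storing Q in a dictionary only uses memory for states that were actually visited.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # hypothetical toy problem: 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical 5-cell corridor: reaching cell 4 pays reward 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-values are created lazily, so only visited states take up memory.
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
print({s: [round(v, 2) for v in q[s]] for s in sorted(q)})
```

SARSA differs only in the target: it bootstraps from the action actually taken in the next state instead of the maximum over actions.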

Thank you very much for your time and effort. I hope nobody has asked this question already; if so, I am very sorry for taking up your time twice.

Best Regards,

You can treat it as a 2x2 layout.